
I am using the Metis M.2 Card on my host PC. It is detected using `lspci` and `axdevice`. I am able to run inference using `./inference.py yolov5m-v7-coco-tracker usb:0`. However, when I try to use the AxRuntime API (Python), I get the following:
 

```python
from axelera.runtime import Context

context = Context()
model_path = "/home/user/axelera/testing/data/yolov5/compile/compiled_model/model.json"
model = context.load_model(model_path)
batch_size = 1
connection = context.device_connect(None, batch_size)
instance = connection.load_model_instance(model, num_sub_devices=batch_size, aipu_cores=batch_size)
```

```
[ERROR][axeDeviceMemoryAllocate]: Not enough memory: free memory 1531904, request memory 16520704.
[ERROR][axeDeviceMemAlloc]: Device memory allocate failed: size 16520704.
[ERROR][axeMemAllocDevice]: Device memory allocate failed: 0x70010001.
Error at zeMemAllocDevice(context, &desc, size, alignment, device, &addr): mem_alloc_device: 249: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
Failed to allocate memory pool pool_l2_const
Failed to initialise memory pool pool_l2_const
Failed to initialize V1Executor
```

```
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Cell In[7], line 1
----> 1 instance = connection.load_model_instance(model, num_sub_devices=batch_size, aipu_cores=batch_size)

File ~/.cache/axelera/venvs/3252ae77/lib/python3.10/site-packages/axelera/runtime/objects.py:496, in Connection.load_model_instance(self, model, **kwargs)
    494 instance = axr.load_model_instance(self._obj, model._obj, props)
    495 if not instance:
--> 496     _raise_error(self.context)
    497 return ModelInstance(instance, self.context)

File ~/.cache/axelera/venvs/3252ae77/lib/python3.10/site-packages/axelera/runtime/objects.py:83, in _raise_error(ctx, err_no)
     81 msg = axr.last_error_string(pctx).decode("utf-8")
     82 exc = _exceptions.get(err_no, _exceptions[axr.Result.UNKNOWN_ERROR])
---> 83 raise exc(f"{err}: {msg}")

InternalError: AXR_ERROR_INTERNAL_ERROR: Error at zeMemAllocDevice(context, &desc, size, alignment, device, &addr): mem_alloc_device: 249: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
```

What is happening here and how do I solve it?

(Rebooting the system or using `axdevice --refresh` does not seem to help.)

Hi there @Siegfried

I did see something similar a short while ago regarding a host PC’s PCIe memory allocation settings; specifically, enabling the “Above 4G Decoding” option (sometimes called “Large Memory Support”) in your BIOS.

Do you know if that’s enabled? Might be worth a shot as a first step, and then let us know if the problem continues and we can look a little deeper!

 



“Above 4G Decoding” was not enabled. However, after enabling it, the error still appears.
Any thoughts?


Hi again @Siegfried!

This is similar to something we’ve seen on platforms like the Orange Pi 5 Plus (in the Modify Device Tree of Orange Pi 5 Plus section).

Not sure if that’s going to be the same solution (it probably won’t be), but maybe it’ll spark some ideas? Do you have some details on your host platform, its OS, and so on?


It does not seem to be the same error, but `sudo dmesg | grep metis` and `lspci -vv` give me these results:

```
02:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
	Subsystem: Axelera AI Metis AIPU (rev 02)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 146
	Region 0: Memory at a2010000 (64-bit, non-prefetchable) [size=4K]
	Region 2: Memory at a0000000 (32-bit, non-prefetchable) [size=32M]
	Expansion ROM at a2000000 [disabled] [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: axl
	Kernel modules: metis
```

Not sure if this helps; since normal inference with model zoo models does work, I don’t think there is anything wrong with the setup?
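As a quick sanity check on those BAR sizes: the failed 16520704-byte request would actually fit in the 32M Region 2 window, so the PCIe aperture itself doesn’t look like the bottleneck. A rough sketch in Python (the size strings are copied from the lspci output above):

```python
import re

# BAR size strings quoted from the lspci output above
bars = {"Region 0": "4K", "Region 2": "32M"}

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(size: str) -> int:
    """Convert an lspci size string like '32M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", size)
    return int(match.group(1)) * UNITS[match.group(2)]

requested = 16_520_704  # request size from the allocation error
for name, size in bars.items():
    print(f"{name}: {to_bytes(size)} bytes, fits request: {requested <= to_bytes(size)}")
```

This prints `fits request: True` for the 32M Region 2, which is what makes me suspect the failure is on the device side rather than in the host’s PCIe mapping.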

After further investigation, I found that the requested memory of 16520704 bytes corresponds to the size of the pool_l2_const memory pool.

There seems to be something wrong with my model.json. It was created with the CLI `compile` command from a yolov5s.onnx model. When I deploy yolov5s-v7-coco from the model zoo instead, the model.json is different and there is no memory-allocation error. Should I not use the `compile` command, or is it something else?
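To compare the two model.json files, I used a small helper that walks the JSON and picks out anything that looks like a pool size. The key names below are just my guess at the structure (the real schema may differ), so adjust it to whatever your file actually contains:

```python
import json

def find_pool_sizes(node, path=""):
    """Recursively collect numeric values whose key contains 'pool'."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            sub = f"{path}/{key}"
            if "pool" in key.lower() and isinstance(value, (int, float)):
                found[sub] = int(value)
            else:
                found.update(find_pool_sizes(value, sub))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.update(find_pool_sizes(item, f"{path}[{i}]"))
    return found

# Example with a made-up structure standing in for model.json:
model = {"memory": {"pool_l2_const": 16520704, "pool_ddr": 1048576}}
print(find_pool_sizes(model))
```

Running it over both files (e.g. `find_pool_sizes(json.load(open(model_path)))`) is how I spotted the pool_l2_const difference.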

 

I leave some of my system information anyway:

  • Operating System: Ubuntu 22.04.5 LTS
  • Kernel: Linux 6.8.0-57-generic
  • Architecture: x86-64
  • Hardware Vendor: Micro-Star International Co., Ltd.
  • Hardware Model: MS-7C75

Thanks in advance!



Hi @Siegfried, thanks for the comparison table, that’s a great idea!

I think you’re right that the issue comes down to the size of the pool_l2_const block in your custom-compiled yolov5s model. As I recall (and I’ll check this internally), each core has 2 MB of SRAM; looking at the comparison table, your custom model’s pool, at around 16 MB, far exceeds that. Hence, most likely, the allocation failure in AxRuntime.

By comparison, the zoo’s yolov5s-v7-coco version has a pool_l2_const size closer to 200KB.
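To put rough numbers on it (treating the 2 MB SRAM figure and the ~200 KB zoo size as approximate):

```python
SRAM_PER_CORE = 2 * 1024 * 1024      # ~2 MiB per core (to be confirmed internally)

custom_pool_l2_const = 16_520_704    # from the allocation error: roughly 15.8 MiB
zoo_pool_l2_const = 200 * 1024       # roughly 200 KiB for the zoo yolov5s-v7-coco

print(custom_pool_l2_const > SRAM_PER_CORE)  # custom pool exceeds per-core SRAM
print(zoo_pool_l2_const > SRAM_PER_CORE)     # zoo pool fits comfortably
```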

What about trying to compile and deploy your model with the deploy.py tool instead of `compile`?

