
I am using the Metis M.2 Card on my host PC. It is detected using `lspci` and `axdevice`. I am able to run inference using `./inference.py yolov5m-v7-coco-tracker usb:0`. However, when I try to use the AxRuntime API (Python), I get the following:
 

```python
from axelera.runtime import Context

context = Context()
model_path = "/home/user/axelera/testing/data/yolov5/compile/compiled_model/model.json"
model = context.load_model(model_path)
batch_size = 1
connection = context.device_connect(None, batch_size)
instance = connection.load_model_instance(model, num_sub_devices=batch_size, aipu_cores=batch_size)
```

```
[ERROR][axeDeviceMemoryAllocate]: Not enough memory: free memory 1531904, request memory 16520704.
[ERROR][axeDeviceMemAlloc]: Device memory allocate failed: size 16520704.
[ERROR][axeMemAllocDevice]: Device memory allocate failed: 0x70010001.
Error at zeMemAllocDevice(context, &desc, size, alignment, device, &addr): mem_alloc_device: 249: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
Failed to allocate memory pool pool_l2_const
Failed to initialise memory pool pool_l2_const
Failed to initialize V1Executor
```

```
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Cell In[7], line 1
----> 1 instance = connection.load_model_instance(model, num_sub_devices=batch_size, aipu_cores=batch_size)

File ~/.cache/axelera/venvs/3252ae77/lib/python3.10/site-packages/axelera/runtime/objects.py:496, in Connection.load_model_instance(self, model, **kwargs)
    494 instance = axr.load_model_instance(self._obj, model._obj, props)
    495 if not instance:
--> 496     _raise_error(self.context)
    497 return ModelInstance(instance, self.context)

File ~/.cache/axelera/venvs/3252ae77/lib/python3.10/site-packages/axelera/runtime/objects.py:83, in _raise_error(ctx, err_no)
     81 msg = axr.last_error_string(pctx).decode("utf-8")
     82 exc = _exceptions.get(err_no, _exceptions[axr.Result.UNKNOWN_ERROR])
---> 83 raise exc(f"{err}: {msg}")

InternalError: AXR_ERROR_INTERNAL_ERROR: Error at zeMemAllocDevice(context, &desc, size, alignment, device, &addr): mem_alloc_device: 249: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
```

What is happening here and how do I solve it?

(Rebooting the system or using `axdevice --refresh` does not seem to help.)

Hi there @Siegfried

I did see something similar a short while ago regarding a host PC’s PCIe memory allocation settings; specifically, enabling the “Above 4G Decoding” option (sometimes called “Large Memory Support”) in your BIOS.

Do you know if that’s enabled? Might be worth a shot as a first step, and then let us know if the problem continues and we can look a little deeper!

 



“Above 4G Decoding” was not enabled. However, after enabling it, the error still appears.
Any thoughts?


Hi again @Siegfried!

This is similar to something we’ve seen on platforms like the Orange Pi 5 Plus (in the Modify Device Tree of Orange Pi 5 Plus section).

Not sure if that’s going to be the same solution (it probably won’t be), but maybe it’ll spark some ideas? Do you have some details on your host platform, its OS, and so on?


It does not seem to be the same error, but `sudo dmesg | grep metis` and `lspci -vv` give me these results:

```
02:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
	Subsystem: Axelera AI Metis AIPU (rev 02)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 146
	Region 0: Memory at a2010000 (64-bit, non-prefetchable) [size=4K]
	Region 2: Memory at a0000000 (32-bit, non-prefetchable) [size=32M]
	Expansion ROM at a2000000 [disabled] [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: axl
	Kernel modules: metis
```

Not sure if this helps; since normal inference with model zoo models does work, I don’t think there is anything wrong with the setup?
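As a quick sanity check on those BAR sizes: the failed 16520704-byte request would actually fit in the 32M Region 2 window, so the PCIe aperture itself doesn’t look like the bottleneck. A rough sketch in Python (the size strings are copied from the lspci output above):

```python
import re

# BAR size strings quoted from the lspci output above
bars = {"Region 0": "4K", "Region 2": "32M"}

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def to_bytes(size: str) -> int:
    """Convert an lspci size string like '32M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", size)
    return int(match.group(1)) * UNITS[match.group(2)]

requested = 16_520_704  # request size from the allocation error
for name, size in bars.items():
    print(f"{name}: {to_bytes(size)} bytes, fits request: {requested <= to_bytes(size)}")
```

This prints `fits request: True` for the 32M Region 2, which is what makes me suspect the failure is on the device side rather than in the host’s PCIe mapping.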

After further investigation, I found that the requested memory of 16520704 bytes corresponds to the size of the pool_l2_const memory pool.

There seems to be something wrong with my model.json. It was created with the CLI `compile` command from a yolov5s.onnx model. When I deploy yolov5s-v7-coco from the model zoo instead, the model.json is different and there is no memory-allocation error. Should I not use the `compile` command, or is it something else?
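To compare the two model.json files, I used a small helper that walks the JSON and picks out anything that looks like a pool size. The key names below are just my guess at the structure (the real schema may differ), so adjust it to whatever your file actually contains:

```python
import json

def find_pool_sizes(node, path=""):
    """Recursively collect numeric values whose key contains 'pool'."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            sub = f"{path}/{key}"
            if "pool" in key.lower() and isinstance(value, (int, float)):
                found[sub] = int(value)
            else:
                found.update(find_pool_sizes(value, sub))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.update(find_pool_sizes(item, f"{path}[{i}]"))
    return found

# Example with a made-up structure standing in for model.json:
model = {"memory": {"pool_l2_const": 16520704, "pool_ddr": 1048576}}
print(find_pool_sizes(model))
```

Running it over both files (e.g. `find_pool_sizes(json.load(open(model_path)))`) is how I spotted the pool_l2_const difference.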

 

I leave some of my system information anyway:

  • Operating System: Ubuntu 22.04.5 LTS
  • Kernel: Linux 6.8.0-57-generic
  • Architecture: x86-64
  • Hardware Vendor: Micro-Star International Co., Ltd.
  • Hardware Model: MS-7C75

Thanks in advance!



Hi @Siegfried, thanks for the comparison table, that’s a great idea!

I think you’re right that the issue comes down to the size of the pool_l2_const block in your custom-compiled yolov5s model. As I recall (and I’ll check this internally), each core has 2 MB of SRAM; looking at the comparison table, your custom model’s pool, at around 16 MB, far exceeds that. Hence, most likely, the allocation failure in AxRuntime.

By comparison, the zoo’s yolov5s-v7-coco version has a pool_l2_const size closer to 200KB.
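To put rough numbers on it (treating the 2 MB SRAM figure and the ~200 KB zoo size as approximate):

```python
SRAM_PER_CORE = 2 * 1024 * 1024      # ~2 MiB per core (to be confirmed internally)

custom_pool_l2_const = 16_520_704    # from the allocation error: roughly 15.8 MiB
zoo_pool_l2_const = 200 * 1024       # roughly 200 KiB for the zoo yolov5s-v7-coco

print(custom_pool_l2_const > SRAM_PER_CORE)  # custom pool exceeds per-core SRAM
print(zoo_pool_l2_const > SRAM_PER_CORE)     # zoo pool fits comfortably
```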

What about trying to compile and deploy your model with the deploy.py tool instead of `compile`?

