Question

Aetina board with M2 Metis: error timeout for querying an inference

  • August 29, 2025
  • 10 replies
  • 147 views

lucagessi
Cadet

Hello everyone!
I am Luca Gessi, from Italy. 
I am the (lucky) owner of an Aetina board with an M.2 Axelera Metis.
Unfortunately, the board arrived without the SDK installed, so I installed it following the GitHub guide. The procedure hit several issues and broken packages, and I had to run it several times, but in the end the installation completed.

I then tried to run a simple command to test the board (as suggested in the getting started guide):

./inference.py --no-display yolov5s-v7-coco dataset

 

However the inference failed:

 


(venv) aetina@aetina:~/voyager-sdk$ ./inference.py --no-display yolov5s-v7-coco dataset
ERROR   : timeout for querying an inference
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO    : Using dataset val
INFO    : Dataset 'COCO2017' split 'val' downloaded successfully to /home/aetina/.cache/axelera/data/coco
INFO    : Dataset 'COCO2017' split 'labels' downloaded successfully to /home/aetina/.cache/axelera/data/coco
INFO    : Dataset 'COCO2017' split 'annotations' downloaded successfully to /home/aetina/.cache/axelera/data/coco
Creating new label cache: /home/aetina/.cache/axelera/data/coco/labels/val2017/val_coco_objdet.cache
Labels found: 4952, corrupt images: 0
Background images: 48, missing label files: 48, empty label files: 0
Detecting...                               :   0%|▎                                                                                                                       | 13/5000 [00:15<1:41:32,  1.22s/frames][libtriton_linux.c:505] DMABUF_METIS_XFER failed: Connection timed out
[AxeleraDmaBuf.cpp:240] DMA transfer failed: Connection timed out
[ERROR][axeShareMemoryExecute]: Dmabuf transfer failed.
[ERROR][axeCommandQueueExecuteCommandListsAsync]: Level-zero memory operation failed: 0x70010001.
terminate called after throwing an instance of 'std::runtime_error'
  what():  axr_run_model failed with Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
 

 

The board seems to be recognized by the system; axdevice outputs:
Device 0: metis-0:1:0 1GiB m2 flver=1.2.0-rc2 bcver=1.0 clock=800MHz(0-3:800MHz) mvm=0-3:100%

The Metis module also has its heat sink fitted.
Thank you for your assistance.
 

 

10 replies

lucagessi
Cadet
  • Author
  • September 8, 2025

Hi.

I tried to run a pretrained model from the model zoo, but it also fails.

aetina@aetina:~/voyager-sdk$ . venv/bin/activate
(venv) aetina@aetina:~/voyager-sdk$ axdevice
Device 0: metis-0:1:0 1GiB m2 flver=1.2.0-rc2 bcver=1.0 clock=800MHz(0-3:800MHz) mvm=0-3:100%
(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --pipe=torch-aipu --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO : Using dataset val
INFO : Restricting yolov8n-coco-onnx to 1 core(s) because running pipe torch-aipu.
INFO : Using CPU based torch
Detecting... : 0%|▏ | 9/5000 [01:54<16:51:46, 12.16s/frames] [libtriton_linux.c:505] DMABUF_METIS_XFER failed: Connection timed out
[AxeleraDmaBuf.cpp:240] DMA transfer failed: Connection timed out
[ERROR][axeShareMemoryExecute]: Dmabuf transfer failed.
[ERROR][axeCommandQueueExecuteCommandListsSync]: Level-zero memory operation failed: 0x70010001.
ERROR : Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
ERROR : TorchPipe terminated due to ExecutionError at /home/aetina/.cache/axelera/venvs/0ce26488/lib/python3.10/site-packages/axelera/runtime/objects.py:83: AXR_ERROR_RUNTIME_ERROR: Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
ERROR : Full traceback:
ERROR : File "/home/aetina/voyager-sdk/axelera/app/pipe/torch.py", line 53, in _loop
ERROR : image, result, meta = model_pipe.inference.exec_torch(
ERROR : File "/home/aetina/voyager-sdk/axelera/app/operators/inference.py", line 1084, in exec_torch
ERROR : self._axr_modeli.run(inputs, outputs)
ERROR : File "/home/aetina/.cache/axelera/venvs/0ce26488/lib/python3.10/site-packages/axelera/runtime/objects.py", line 529, in run
ERROR : _raise_error(self.context, res.value)
ERROR : File "/home/aetina/.cache/axelera/venvs/0ce26488/lib/python3.10/site-packages/axelera/runtime/objects.py", line 83, in _raise_error
ERROR : raise exc(f"{err}: {msg}")
INFO : Model: yolov8n-coco-onnx
INFO : Dataset: CocoDataset-COCO2017
INFO : Date: 2025-09-08 08:52:12.930523
INFO : Inference Time: 89270.30ms
INFO : Evaluation Time: 127.65ms
INFO : Evaluation Metrics:
INFO : ==========================
INFO : | mAP_box | 52.65% |
INFO : | mAP50_box | 74.90% |
INFO : | precision_box | 59.25% |
INFO : | recall_box | 63.97% |
INFO : ==========================
INFO : Key Metric (mAP_box): 52.65%
Core Temp : 40.0°C
CPU % : 1.5%
End-to-end : 0.1fps
Latency : 0.0ms (min:inf max:-inf σ:0.0 x̄:0.0)ms
(venv) aetina@aetina:~/voyager-sdk$

Do you have any idea?
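For anyone triaging a similar failure, the recurring error lines in the two runs above can be counted with a few lines of Python. This is only a rough sketch: the `LOG` sample and the signature strings are copied from the output quoted in this thread, while `count_signatures` is a hypothetical helper and not part of the Voyager SDK.

```python
from collections import Counter

# Sample lines copied from the failing run quoted above.
LOG = """\
[libtriton_linux.c:505] DMABUF_METIS_XFER failed: Connection timed out
[AxeleraDmaBuf.cpp:240] DMA transfer failed: Connection timed out
[ERROR][axeShareMemoryExecute]: Dmabuf transfer failed.
[ERROR][axeCommandQueueExecuteCommandListsSync]: Level-zero memory operation failed: 0x70010001.
"""

# Substrings taken verbatim from the log; extend this tuple as needed.
SIGNATURES = ("DMABUF_METIS_XFER failed", "DMA transfer failed", "0x70010001")


def count_signatures(log_text, signatures=SIGNATURES):
    """Count how many log lines contain each signature (hypothetical helper)."""
    hits = Counter()
    for line in log_text.splitlines():
        for sig in signatures:
            if sig in line:
                hits[sig] += 1
    return hits


hits = count_signatures(LOG)
for sig in SIGNATURES:
    print(f"{sig!r}: {hits[sig]} line(s)")
```

To check a real run, the captured output of ./inference.py could be passed to `count_signatures` in place of the `LOG` sample.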


lucagessi
Cadet
  • Author
  • September 8, 2025

I enabled the firmware update, and the procedure detected an RC firmware version. The tool forced the firmware update, and now it seems to work properly.

(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --pipe=torch-aipu --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO : Using dataset val
INFO : Restricting yolov8n-coco-onnx to 1 core(s) because running pipe torch-aipu.
INFO : Using CPU based torch
INFO : Model: yolov8n-coco-onnx
INFO : Dataset: CocoDataset-COCO2017
INFO : Date: 2025-09-08 09:17:07.443879
INFO : Inference Time: 475084.50ms
INFO : Evaluation Time: 11687.16ms
INFO : Evaluation Metrics:
INFO : ==========================
INFO : | mAP_box | 36.07% |
INFO : | mAP50_box | 51.04% |
INFO : | precision_box | 55.75% |
INFO : | recall_box | 50.57% |
INFO : ==========================
INFO : Key Metric (mAP_box): 36.07%
Core Temp : 41.0°C
CPU % : 59.8%
End-to-end : 10.4fps
Latency : 89.6ms (min:46.9 max:155.6 σ:13.7 x̄:91.7)ms
(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO : Using dataset val
INFO : Model: yolov8n-coco-onnx
INFO : Dataset: CocoDataset-COCO2017
INFO : Date: 2025-09-08 09:19:39.082672
INFO : Inference Time: 115371.71ms
INFO : Evaluation Time: 86951.65ms
INFO : Evaluation Metrics:
INFO : ==========================
INFO : | mAP_box | 35.79% |
INFO : | mAP50_box | 50.94% |
INFO : | precision_box | 54.74% |
INFO : | recall_box | 51.05% |
INFO : ==========================
INFO : Key Metric (mAP_box): 35.79%
Core Temp : 42.0°C
CPU % : 28.3%
End-to-end : 43.0fps
Latency : 495.4ms (min:380.8 max:629.5 σ:32.8 x̄:496.8)ms

Thank you


  • Ensign
  • September 8, 2025

Hi Luca, you could try the additional argument ‘--timeout 0’; this helped me. I hope it helps you too.


lucagessi
Cadet
  • Author
  • September 8, 2025

Hi Luca, you could try the additional argument ‘--timeout 0’; this helped me. I hope it helps you too.

Hi. I think I have fixed my problem. I enabled the firmware update (here is the tutorial). The procedure then detected an RC firmware version and forced the update.

After that and a reboot, it seems to work.

(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO    : Using dataset val
INFO    : Model:      yolov8n-coco-onnx
INFO    : Dataset:    CocoDataset-COCO2017
INFO    : Date:       2025-09-08 09:19:39.082672
INFO    : Inference Time: 115371.71ms
INFO    : Evaluation Time: 86951.65ms
INFO    : Evaluation Metrics:
INFO    : ==========================
INFO    : | mAP_box       | 35.79% |
INFO    : | mAP50_box     | 50.94% |
INFO    : | precision_box | 54.74% |
INFO    : | recall_box    | 51.05% |
INFO    : ==========================
INFO    : Key Metric (mAP_box): 35.79%
Core Temp  : 42.0°C
CPU %      : 28.3%
End-to-end : 43.0fps
Latency    : 495.4ms (min:380.8 max:629.5 σ:32.8 x̄:496.8)ms
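The RC firmware mentioned above is visible in the `flver` field of the `axdevice` output quoted earlier in this thread (flver=1.2.0-rc2). A minimal sketch for flagging it before running inference follows. It assumes the line format stays as quoted, and both helper functions are hypothetical names, not part of the Voyager SDK.

```python
import re

# Line format copied from the axdevice output quoted in this thread.
SAMPLE = ("Device 0: metis-0:1:0 1GiB m2 flver=1.2.0-rc2 bcver=1.0 "
          "clock=800MHz(0-3:800MHz) mvm=0-3:100%")


def firmware_version(axdevice_line):
    """Return the value of the flver= field, or None if it is absent."""
    match = re.search(r"\bflver=(\S+)", axdevice_line)
    return match.group(1) if match else None


def is_release_candidate(version):
    """Assumption: an -rc suffix (e.g. 1.2.0-rc2) marks a pre-release build."""
    return re.search(r"-rc\d*$", version, re.IGNORECASE) is not None


version = firmware_version(SAMPLE)
if version is None:
    print("no flver field found")
elif is_release_candidate(version):
    print(f"firmware {version} is a release candidate; consider updating")
else:
    print(f"firmware {version} looks like a final release")
```

On a live system, `SAMPLE` could be replaced by the text captured from running axdevice, for example through `subprocess.run(["axdevice"], capture_output=True, text=True).stdout`.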


  • Ensign
  • September 8, 2025

Glad you fixed it.
You’re facing the same problem I have: low fps and high latency.


lucagessi
Cadet
  • Author
  • September 8, 2025

Glad you fixed it.
You’re facing the same problem I have: low fps and high latency.

Do you think the performance I am getting is bad?
What should we expect with yolov8n?


  • Ensign
  • September 8, 2025

They are in the docs/reference/model_zoo.md


lucagessi
Cadet
  • Author
  • September 8, 2025

They are in the docs/reference/model_zoo.md

Ok, it should achieve at least around 500 fps. Did you find any reason for this mismatch?
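To put a number on the mismatch, here is a rough sketch that parses the summary lines from the post-update run quoted earlier in this thread and compares them with the figure mentioned here. The 500 fps reference is the approximate value quoted in this thread rather than one read from model_zoo.md, and `parse_summary` is a hypothetical helper.

```python
import re

# Summary lines copied from the post-update run quoted in this thread.
SUMMARY = """\
End-to-end : 43.0fps
Latency    : 495.4ms (min:380.8 max:629.5 σ:32.8 x̄:496.8)ms
"""

# Assumption: approximate reference throughput as quoted in this thread.
REFERENCE_FPS = 500.0


def parse_summary(text):
    """Extract end-to-end fps and the headline latency in ms (hypothetical helper)."""
    fps = re.search(r"End-to-end\s*:\s*([\d.]+)fps", text)
    latency = re.search(r"Latency\s*:\s*([\d.]+)ms", text)
    return (float(fps.group(1)) if fps else None,
            float(latency.group(1)) if latency else None)


fps, latency_ms = parse_summary(SUMMARY)
print(f"measured {fps:.1f} fps, {fps / REFERENCE_FPS:.1%} of the reference")
# Little's law: throughput x latency estimates how many frames are in flight.
print(f"estimated frames in flight: {fps * latency_ms / 1000:.1f}")
```

With the numbers above, this works out to about 8.6% of the reference throughput and roughly 21 frames in flight, which suggests the latency figure includes time spent queued in the pipeline rather than time on the accelerator alone.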


  • Ensign
  • September 8, 2025

Nope.

I have opened a ticket: 

 

 


Spanner
Axelera Team
  • September 8, 2025

Hi ​@lucagessi!  Thanks for sharing the detailed updates. Great to see you got the board running after updating the firmware. Solid troubleshooting. 👍

And thanks ​@npi  for stepping in on this and helping!

The timeout errors look resolved now, is that right? And what you’re both seeing is a performance mismatch compared to the reference numbers in the Voyager model zoo docs?

In the meantime, really appreciate you both taking the time to test and report back like this 👍