Question

Aetina board with M2 Metis: error timeout for querying an inference

August 29, 2025
10 replies
193 views

lucagessi
Cadet

Hello everyone!
I am Luca Gessi, from Italy.
I am a (lucky) owner of an Aetina board with an M.2 Axelera Metis.
Unfortunately, the board arrived without the SDK installed, so I installed it using the GitHub guide. The procedure hit several issues and broken packages. I had to run it several times, but in the end the installation completed.

After that, I tried to run a simple command to test the board (as suggested in the getting started guide):

./inference.py --no-display yolov5s-v7-coco dataset

However the inference failed:

(venv) aetina@aetina:~/voyager-sdk$ ./inference.py --no-display yolov5s-v7-coco dataset
ERROR : timeout for querying an inferencearm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO : Using dataset val
INFO : Dataset 'COCO2017' split 'val' downloaded successfully to /home/aetina/.cache/axelera/data/coco
INFO : Dataset 'COCO2017' split 'labels' downloaded successfully to /home/aetina/.cache/axelera/data/coco
INFO : Dataset 'COCO2017' split 'annotations' downloaded successfully to /home/aetina/.cache/axelera/data/coco
Creating new label cache: /home/aetina/.cache/axelera/data/coco/labels/val2017/val_coco_objdet.cache
Labels found: 4952, corrupt images: 0
Background images: 48, missing label files: 48, empty label files: 0
Detecting... : 0%|▎ | 13/5000 [00:15<1:41:32, 1.22s/frames][libtriton_linux.c:505] DMABUF_METIS_XFER failed: Connection timed out
[AxeleraDmaBuf.cpp:240] DMA transfer failed: Connection timed out
[ERROR][axeShareMemoryExecute]: Dmabuf transfer failed.
[ERROR][axeCommandQueueExecuteCommandListsAsync]: Level-zero memory operation failed: 0x70010001.
terminate called after throwing an instance of 'std::runtime_error'
what(): axr_run_model failed with Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
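
For anyone comparing their own logs against this one, here is a minimal sketch that pulls the two telltale signatures out of a saved log: the DMA timeout message and the Level Zero error code. The function names and the regular expression are illustrative assumptions and are not part of the Voyager SDK; the patterns are based only on the lines pasted above.

```python
import re

# Pattern inferred from the log lines in this post (an assumption, not an SDK format spec).
ZE_ERROR = re.compile(r"error code: (0x[0-9a-fA-F]+) : (ZE_\w+)")


def has_dma_timeout(log_text: str) -> bool:
    """Return True if the log contains the DMA transfer timeout message."""
    return "DMA transfer failed: Connection timed out" in log_text


def find_level_zero_errors(log_text: str) -> list[tuple[str, str]]:
    """Return (hex_code, symbolic_name) pairs for each Level Zero error found."""
    return ZE_ERROR.findall(log_text)


sample_log = """\
[AxeleraDmaBuf.cpp:240] DMA transfer failed: Connection timed out
[ERROR][axeShareMemoryExecute]: Dmabuf transfer failed.
what(): axr_run_model failed with Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
"""

print(has_dma_timeout(sample_log))
print(find_level_zero_errors(sample_log))
```

Seeing both signatures together suggests the host lost contact with the device during a transfer, rather than a problem with the model or the dataset.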

The board seems to be recognized by the system; axdevice outputs:
Device 0: metis-0:1:0 1GiB m2 flver=1.2.0-rc2 bcver=1.0 clock=800MHz(0-3:800MHz) mvm=0-3:100%
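
The `flver` field in that line ends in `-rc2`, which matters later in this thread. As a rough sketch, the line can be split into its key=value fields and the firmware version checked for a pre-release suffix. The helper names are hypothetical, and the field layout is inferred from this single line only, not from any documented `axdevice` output format.

```python
import re


def parse_axdevice_line(line: str) -> dict[str, str]:
    """Collect the key=value fields from one line of axdevice output."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def is_prerelease(version: str) -> bool:
    """Return True if the version ends in a suffix such as -rc2, -alpha or -beta1."""
    return re.search(r"-(rc|alpha|beta)\d*$", version) is not None


line = (
    "Device 0: metis-0:1:0 1GiB m2 flver=1.2.0-rc2 bcver=1.0 "
    "clock=800MHz(0-3:800MHz) mvm=0-3:100%"
)
fields = parse_axdevice_line(line)
print(fields["flver"], is_prerelease(fields["flver"]))
```

A check like this only flags the suffix; whether a given firmware is compatible with the installed SDK is something the SDK's own update tool decides.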

The Metis also has its heatsink fitted.
Thank you for your assistance.

lucagessi
Author
Cadet
September 8, 2025

Hi.

I tried to run a pretrained model from the model zoo, but it also fails.

aetina@aetina:~/voyager-sdk$ . venv/bin/activate
(venv) aetina@aetina:~/voyager-sdk$ axdevice
Device 0: metis-0:1:0 1GiB m2 flver=1.2.0-rc2 bcver=1.0 clock=800MHz(0-3:800MHz) mvm=0-3:100%
(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --pipe=torch-aipu --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO    : Using dataset val
INFO    : Restricting yolov8n-coco-onnx to 1 core(s) because running pipe torch-aipu.
INFO    : Using CPU based torch
Detecting...                               :   0%|▏                                                                                                        | 9/5000 [01:54<16:51:46, 12.16s/frames]          [libtriton_linux.c:505] DMABUF_METIS_XFER failed: Connection timed out
[AxeleraDmaBuf.cpp:240] DMA transfer failed: Connection timed out
[ERROR][axeShareMemoryExecute]: Dmabuf transfer failed.
[ERROR][axeCommandQueueExecuteCommandListsSync]: Level-zero memory operation failed: 0x70010001.
ERROR   : Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
ERROR   : TorchPipe terminated due to ExecutionError at /home/aetina/.cache/axelera/venvs/0ce26488/lib/python3.10/site-packages/axelera/runtime/objects.py:83: AXR_ERROR_RUNTIME_ERROR: Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319: Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE
ERROR   : Full traceback:
ERROR   :   File "/home/aetina/voyager-sdk/axelera/app/pipe/torch.py", line 53, in _loop
ERROR   :     image, result, meta = model_pipe.inference.exec_torch(
ERROR   :   File "/home/aetina/voyager-sdk/axelera/app/operators/inference.py", line 1084, in exec_torch
ERROR   :     self._axr_modeli.run(inputs, outputs)
ERROR   :   File "/home/aetina/.cache/axelera/venvs/0ce26488/lib/python3.10/site-packages/axelera/runtime/objects.py", line 529, in run
ERROR   :     _raise_error(self.context, res.value)
ERROR   :   File "/home/aetina/.cache/axelera/venvs/0ce26488/lib/python3.10/site-packages/axelera/runtime/objects.py", line 83, in _raise_error
ERROR   :     raise exc(f"{err}: {msg}")
INFO    : Model:      yolov8n-coco-onnx
INFO    : Dataset:    CocoDataset-COCO2017
INFO    : Date:       2025-09-08 08:52:12.930523
INFO    : Inference Time: 89270.30ms
INFO    : Evaluation Time: 127.65ms
INFO    : Evaluation Metrics:
INFO    : ==========================
INFO    : | mAP_box       | 52.65% |
INFO    : | mAP50_box     | 74.90% |
INFO    : | precision_box | 59.25% |
INFO    : | recall_box    | 63.97% |
INFO    : ==========================
INFO    : Key Metric (mAP_box): 52.65%
Core Temp  : 40.0°C
CPU %      : 1.5%
End-to-end : 0.1fps
Latency    : 0.0ms (min:inf max:-inf σ:0.0 x̄:0.0)ms
(venv) aetina@aetina:~/voyager-sdk$

Do you have any ideas?

lucagessi
Author
Cadet
September 8, 2025

I enabled the firmware update, and the procedure detected an RC firmware version. The tool forced the firmware update, and now it seems to work properly.

(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --pipe=torch-aipu --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO    : Using dataset val
INFO    : Restricting yolov8n-coco-onnx to 1 core(s) because running pipe torch-aipu.
INFO    : Using CPU based torch
INFO    : Model:      yolov8n-coco-onnx
INFO    : Dataset:    CocoDataset-COCO2017
INFO    : Date:       2025-09-08 09:17:07.443879
INFO    : Inference Time: 475084.50ms
INFO    : Evaluation Time: 11687.16ms
INFO    : Evaluation Metrics:
INFO    : ==========================
INFO    : | mAP_box       | 36.07% |
INFO    : | mAP50_box     | 51.04% |
INFO    : | precision_box | 55.75% |
INFO    : | recall_box    | 50.57% |
INFO    : ==========================
INFO    : Key Metric (mAP_box): 36.07%
Core Temp  : 41.0°C
CPU %      : 59.8%
End-to-end : 10.4fps
Latency    : 89.6ms (min:46.9 max:155.6 σ:13.7 x̄:91.7)ms
(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO    : Using dataset val
INFO    : Model:      yolov8n-coco-onnx
INFO    : Dataset:    CocoDataset-COCO2017
INFO    : Date:       2025-09-08 09:19:39.082672
INFO    : Inference Time: 115371.71ms
INFO    : Evaluation Time: 86951.65ms
INFO    : Evaluation Metrics:
INFO    : ==========================
INFO    : | mAP_box       | 35.79% |
INFO    : | mAP50_box     | 50.94% |
INFO    : | precision_box | 54.74% |
INFO    : | recall_box    | 51.05% |
INFO    : ==========================
INFO    : Key Metric (mAP_box): 35.79%
Core Temp  : 42.0°C
CPU %      : 28.3%
End-to-end : 43.0fps
Latency    : 495.4ms (min:380.8 max:629.5 σ:32.8 x̄:496.8)ms

Thank you

npi
Ensign
September 8, 2025

Hi Luca, you could try adding the argument ‘--timeout 0’; this helped me. I hope it helps you too.

lucagessi
Author
Cadet
September 8, 2025

Hi. I think I have fixed my problem. I enabled the firmware update (here is the tutorial). From there, the procedure detected an RC firmware version and forced the update.

After that and a reboot, it seems to work.

(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov8n-coco-onnx dataset --no-display
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
INFO : Using dataset val
INFO : Model: yolov8n-coco-onnx
INFO : Dataset: CocoDataset-COCO2017
INFO : Date: 2025-09-08 09:19:39.082672
INFO : Inference Time: 115371.71ms
INFO : Evaluation Time: 86951.65ms
INFO : Evaluation Metrics:
INFO : ==========================
INFO : | mAP_box | 35.79% |
INFO : | mAP50_box | 50.94% |
INFO : | precision_box | 54.74% |
INFO : | recall_box | 51.05% |
INFO : ==========================
INFO : Key Metric (mAP_box): 35.79%
Core Temp : 42.0°C
CPU % : 28.3%
End-to-end : 43.0fps
Latency : 495.4ms (min:380.8 max:629.5 σ:32.8 x̄:496.8)ms

npi
Ensign
September 8, 2025

Glad you fixed it.
You’re now facing the same problem I have: low fps and high latency.

lucagessi
Author
Cadet
September 8, 2025

Do you think the performance I am getting is bad?
What should we expect with yolov8n?

npi
Ensign
September 8, 2025

The expected figures are in docs/reference/model_zoo.md.

lucagessi
Author
Cadet
September 8, 2025

OK, so it should achieve around 500 fps at least. Did you find any reason for this mismatch?
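
To put a number on the mismatch, here is a small sketch that recomputes throughput from the "Inference Time" lines posted earlier and compares it with the roughly 500 fps figure mentioned above. It assumes each run processed all 5000 frames shown in the progress bar and that "Inference Time" covers the whole run; both are assumptions read off the logs, not documented behaviour.

```python
FRAMES = 5000          # frame count shown in the progress bar (assumed to be the full run)
EXPECTED_FPS = 500.0   # approximate figure quoted in this thread from model_zoo.md


def fps_from_run(frames: int, inference_ms: float) -> float:
    """Average frames per second over a run lasting inference_ms milliseconds."""
    return frames / (inference_ms / 1000.0)


# "Inference Time" values copied from the two successful runs posted above.
runs = {
    "--pipe=torch-aipu": 475084.50,
    "default pipeline": 115371.71,
}

for name, inference_ms in runs.items():
    fps = fps_from_run(FRAMES, inference_ms)
    share = fps / EXPECTED_FPS
    print(f"{name}: {fps:.1f} fps ({share:.1%} of {EXPECTED_FPS:.0f} fps)")
```

The recomputed values land close to the "End-to-end" figures in the logs (10.4 fps and 43.0 fps), so the gap to the reference number is about an order of magnitude and not a reporting quirk.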

npi
Ensign
September 8, 2025

Nope.

I have opened a ticket:

Spanner
Axelera Team
September 8, 2025

Hi @lucagessi! Thanks for sharing the detailed updates. Great to see you got the board running after updating the firmware. Solid troubleshooting. 👍

And thanks @npi for stepping in on this and helping!

The timeout errors look resolved now, is that right? And what you’re both seeing is a performance mismatch compared to the reference numbers in the Voyager model zoo docs?

In the meantime, really appreciate you both taking the time to test and report back like this 👍
