
Multiple issues on Aetina RK3588 (gstreamer display issue, boot instability, YOLOv7 onnx accuracy mismatch)

  • March 25, 2026
  • 3 replies
  • 8 views

Hey, we ordered the Aetina RK3588 setup to minimise the time needed to get up and running. When it works, the numbers are quite impressive, but unfortunately I've run into three separate issues while testing this RK3588 setup.
They might be unrelated, but I've grouped them here for context.

Current setup:

Aetina RK3588 with the most recent SDK installed directly on it (no Docker, to reduce points of failure and save storage).
Axdevice: Device 0: metis-0:1:0 1GiB m2 flver=1.4.0 bcver=7.0 clock=800MHz(0-3:800MHz) mvm=0-3:100%

Issue 1: Gstreamer + display not working

Working: pipe=gst --no-display, pipe=torch-aipu (with display), pipe=torch (with display)

Not working: pipe=gst (with display), e.g.:

(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4

2026-03-25 13:35:42.378368936 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"

arm_release_ver: g13p0-01eac0, rk_so_ver: 9

WARNING : pyglet could not access the display, OpenGL is not available: No standard config is available.

INFO : Deploying model yolov5s-v7-coco for 4 cores. This may take a while...

Detecting... : 0%| | 1/6682 [00:01<1:57:43, 1.06s/frames]

Segmentation fault

Sometimes it gets through several frames before crashing, but I never see any video: the display window is still opening when the crash happens.

(venv) aetina@aetina:~/voyager-sdk$ ./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4

2026-03-25 19:45:53.476303296 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"

arm_release_ver: g13p0-01eac0, rk_so_ver: 9

WARNING : pyglet could not access the display, OpenGL is not available: No standard config is available.

Detecting... : 1%|▋ | 46/6682 [00:01<02:42, 40.73frames/s]

Segmentation fault

It is the same with -vv enabled (last lines of output):

DEBUG   :axelera.app.pipe.gst: Received first frame from gstreamer

DEBUG   :axelera.app.pipe.gst: Finished building gst pipeline - build time = 1.562

TRACE   :axelera.app.inf_tracers: $ triton_trace --device metis-0:1:0 --slog-level err                                  

TRACE   :axelera.app.inf_tracers: > triton_trace retcode=0 stdout= stderr=  

TRACE   :axelera.app.inf_tracers: $ triton_trace --device metis-0:1:0 --slog-level inf:collector   

TRACE   :axelera.app.inf_tracers: > triton_trace retcode=0 stdout= stderr=  

TRACE   :axelera.app.inf_tracers: Running stdbuf -oL triton_trace --device metis-0:1:0 --clear-buffer --slog as subprocess to collect log              

TRACE   :axelera.app.inf_tracers: CPU Usage is 101.0 on 8 cores == 12.6%    

DEBUG   :axelera.app.display: System memory: 1611.54 MB axelera: 861.92 MB, vms = 4230.73 MB    display queue size: 1                                   

Detecting... : 0%| | 1/6682 [00:01<1:58:10, 1.06s/frames]

Segmentation fault

The reason I'm on the current setup is that, directly after unboxing, I couldn't get through the quick start guide because of the following errors:

(venv) root@aetina:/home/ubuntu/voyager-sdk# ./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --pipe=gst --no-display --aipu-cores=4

WARNING : Failed to get OpenCL platforms : clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR

WARNING : Please check the documentation for installation instructions

[libtriton_linux.c:1037] Failed to allocate AI cores: Invalid argument

[ERROR][axeDeviceAllocateContext]: Fail to alloc ctx associate to 4 device.

[ERROR][axeCreateContextObject]: axeDeviceAllocateContext failed with 0x70010001.

[ERROR][axeContextCreateEx]: Create context object failed.

terminate called after throwing an instance of 'std::runtime_error'

  what():  axr_device_connect failed : Error at zeContextCreateEx(driver, &ctx_desc, num_sub_devices, sub_device_handles.data() + first_sub, &context): new_connection: 1116: Exit with error code: 0x78000007 : ZE_RESULT_ERROR_INVALID_NULL_POINTER

Aborted (core dumped)

./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4   --pipe=gst   --no-display   --aipu-cores=4   --disable-opencl -vv         

pipeline_dot_file.write_text(Gst.debug_bin_to_dot_data(pipeline, Gst.DebugGraphDetails.ALL))

TRACE   :axelera.app.pipe.gst_helper: Pipeline state change: pipeline0, written graph to /home/ubuntu/voyager-sdk/build/yolov5s-v7-coco/logs/pipeline_graph_NULL_to_READY.svg

TRACE   :axelera.app.pipe.gst_helper: Received new pad decodebin0.src_0

TRACE   :axelera.app.pipe.gst_helper: Linked decodebin0.src_0 with decodebin-link0.sink

Illegal instruction (core dumped)

These errors went away after I updated the firmware, reflashed the board and installed the newest SDK, which is how I moved from the delivered setup to the current one. I still haven't managed to solve the Issue 1 error above, though.

Issue 2: Bootloading error

Twice now I've had a blank screen and no network connection (via Ethernet) after booting. It sounds exactly like https://community.axelera.ai/metis-m-2-3/blank-screen-issue-with-metis-m-2-eval-system-with-aetina-rk3588-industrial-motherboard-163 and, both times, reflashing the board fixed it. Both times it had been working fine immediately before and then suddenly stopped. I hadn't done much work on the board yet, so little was lost, but reflashing isn't a sustainable solution if this happens often.

Issue 3: Accuracy mismatch for yolov7-tiny-coco-onnx from the model zoo

I tested several models and all were fine except yolov7-tiny-coco-onnx from the zoo. I also tried my own ONNX file exported from the official repo, and it behaved exactly the same.

Good accuracy: torch and torch-aipu pipes (mAP_box=41.75%)
Accuracy drop: gst pipe (mAP_box=19.09%)

./inference.py yolov7-tiny-coco-onnx dataset --no-display --show-stats --aipu-cores=1 --pipe=torch-aipu                                              

INFO    : Evaluation Metrics:                                  

INFO    : ==========================                           

INFO    : | mAP_box       | 41.75% |                           

INFO    : | mAP50_box     | 57.47% |                           

INFO    : | precision_box | 64.36% |                           

INFO    : | recall_box    | 44.86% |                           

INFO    : ==========================                           

INFO    : Key Metric (mAP_box): 41.75% 

./inference.py yolov7-tiny-coco-onnx dataset --no-display --show-stats --aipu-cores=1                  

INFO    : Model:      yolov7-tiny-coco-onnx                                

INFO    : Dataset:    CocoDataset-COCO2017

INFO    : Date:       2026-03-12 14:46:21.136822

INFO    : Evaluation Metrics:

INFO    : ==========================

INFO    : | mAP_box       | 19.09% |

INFO    : | mAP50_box     | 54.16% |

INFO    : | precision_box | 59.57% |

INFO    : | recall_box    | 43.32% |

INFO    : ==========================

INFO    : Key Metric (mAP_box): 19.09%

As you can see, only the mAP_box metric drops significantly; the others are quite similar. This suggested a possible anchor mismatch (although in theory there shouldn't be one). I then edited the anchors by hand in build/yolov7-tiny-coco-onnx/model_info.json: accuracy for the gst pipe dropped further, to ~2%, while the torch-aipu pipe didn't change at all, even with different anchors. Why does the gst pipeline behave differently for this model?
Where are the anchors loaded from in the torch pipelines (evidently not the same place as for the gst pipe)?
Since YOLOv7's licensing suits us well, it would be great to get it working correctly.

In general, the system seems quite unstable. Every time the power is disconnected, I need to follow the steps from this article again:

https://support.axelera.ai/hc/en-us/articles/29308064843794-How-to-solve-Metis-driver-failure-to-persist-after-host-reboot

Is that expected?

Thank you very much for your help. Looking forward to your answers. Sorry for the long post. If you need any specific outputs, just tell me. 

3 replies

Spanner
Axelera Team
  • March 27, 2026

Hi @erib001! Thanks for the detailed info! Let's work through the issues one by one…

Before I dig into the individual issues, could you confirm which SDK version you're running? You mention the most recent one, but I'd like the precise version; you can check with pip show axelera-framework or look for a version file in the SDK directory. Also, which BSP version is on the Aetina (you can check with cat /etc/axelera-version or similar)?

 

Issue 1: gst + display segfault

The pyglet/OpenGL warning is cosmetic, as far as I know. It appears on any system without working OpenGL and doesn't cause problems by itself (we probably need to work on the warning language here!). The actual segfault looks like it's happening in the display rendering thread, as your debug log shows display queue size: 1 just before the crash.

There was a known bug where the gst pipeline elements could segfault on systems where OpenCL headers are present but no platform is available, which, from what I can see, is exactly the situation on this RK3588. It was fixed in a recent SDK release, which is why it'd be good to confirm your version: it'll tell us whether you've got the fix.

In the meantime, could you try running with both OpenCL and OpenGL explicitly disabled?

./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --disable-opencl --disable-opengl

That should rule out the GPU-related stuff. 👍


Spanner
Axelera Team
  • March 27, 2026

Issue 2: blank screen after boot

As I recall, this is a known issue on RK3588-based platforms: sometimes after a power cycle, the host doesn't power on the PCIe devices (including Metis) properly. The recommended workaround is a full power off/on cycle rather than a soft reboot. If the entire board goes to a blank screen (not just Metis missing), it's more likely an Aetina BSP issue, and reimaging is the usual recovery route (which you've already discovered, is that right?).

If this keeps happening frequently, let me know and we can dig deeper into whether there's a power supply or hardware factor at play. 👍


Spanner
Axelera Team
  • March 27, 2026

Issue 3: YOLOv7-tiny accuracy mismatch
This is interesting. The pattern you're seeing (mAP_box dropping significantly while mAP50, precision and recall stay similar) could suggest that the detections are landing in roughly the right places, but the boxes aren't accurate enough to pass the stricter IoU thresholds.
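A minimal sketch of why this pattern is plausible, assuming mAP_box is the usual COCO metric (averaged over IoU thresholds 0.50, 0.55, ..., 0.95) while mAP50_box uses only the 0.5 threshold. The boxes are made-up numbers: the prediction has the correct centre but is 25% too wide and too tall, which is the kind of error a mis-scaled anchor would produce.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (assumed for mAP_box)
thresholds = [0.5 + 0.05 * i for i in range(10)]

gt = (100.0, 100.0, 200.0, 200.0)    # hypothetical ground-truth box, 100x100
pred = (87.5, 87.5, 212.5, 212.5)    # same centre, 125x125 (25% too large)

score = iou(gt, pred)
hits = [t for t in thresholds if score >= t]
print(f"IoU = {score:.3f}")  # IoU = 0.640
print(f"match at {len(hits)} of {len(thresholds)} thresholds")  # match at 3 of 10 thresholds
```

A box like this counts as a true positive for mAP50_box but only at 3 of the 10 thresholds that go into mAP_box, so mAP50 can stay close while mAP_box drops sharply.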

Your observation that editing the anchors in model_info.json affects the gst pipeline but not torch-aipu is a really good find, nice work! The two pipelines source their decoder configuration differently, so there may be a mismatch in how the YOLOv7-specific anchors are handled in the gst path.
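To illustrate why an anchor mismatch would hurt box accuracy without moving detections much: in the upstream YOLOv5/YOLOv7 detection head, box centres come from the grid cell offset, while widths and heights are an anchor (in pixels) scaled by the network output. The sketch below assumes that upstream formula; it is not taken from the Voyager SDK, `decode_box` is a hypothetical helper, and the anchor values are only examples. It says nothing about where each pipeline actually reads its anchors from.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(raw, grid_xy, anchor_wh, stride):
    """Decode one raw prediction (tx, ty, tw, th) into (cx, cy, w, h) in pixels.

    Assumes the upstream YOLOv5/YOLOv7 decode:
      cx = (2*sigmoid(tx) - 0.5 + gx) * stride
      w  = (2*sigmoid(tw))**2 * anchor_w
    """
    tx, ty, tw, th = raw
    gx, gy = grid_xy
    aw, ah = anchor_wh
    cx = (2.0 * sigmoid(tx) - 0.5 + gx) * stride  # centre ignores the anchor
    cy = (2.0 * sigmoid(ty) - 0.5 + gy) * stride
    w = (2.0 * sigmoid(tw)) ** 2 * aw  # size scales linearly with the anchor
    h = (2.0 * sigmoid(th)) ** 2 * ah
    return cx, cy, w, h

raw = (0.0, 0.0, 0.0, 0.0)  # same network output in both cases
cell, stride = (10, 10), 8
good = decode_box(raw, cell, (10.0, 13.0), stride)   # example "correct" anchor
bad = decode_box(raw, cell, (12.5, 16.25), stride)   # same anchor, 25% too large
print("correct anchor:", good)  # (84.0, 84.0, 10.0, 13.0)
print("wrong anchor:  ", bad)   # (84.0, 84.0, 12.5, 16.25)
```

With identical network output, the wrong anchor leaves the centre unchanged but rescales every box, which fits detections that still roughly match at IoU 0.5 while failing stricter thresholds.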

I'm going to flag this one to the engineering team as it looks like a genuine pipeline-level issue worth investigating. I'll update you here once I hear back.

Driver persistence after power cycle
No, needing to re-run the driver setup steps after every power cycle isn't expected behaviour. Once we know your SDK and BSP versions, we can check whether a kernel/driver mismatch is causing it.

So… let me try and remember where we’re at! 😆 Step one, let me know the SDK and BSP versions, and try the --disable-opencl --disable-opengl test for Issue 1. We'll go from there!