Question

Metis Compute Board – Persistent ERROR: No devices found even after reboot

  • December 19, 2025
  • 15 replies
  • 145 views


Hello,

I am facing an issue with a Metis Compute Board where any inference command (or any command using the accelerator) immediately fails with the error:

ERROR: No devices found

This happens regardless of the model or the command I run.

Yesterday, I managed to recover temporarily by removing and reinstalling the Docker container with the Voyager SDK, after which inference started working again. I assumed this was because I had not used the Metis for a month, so maybe it needed to be reinstalled.

However, today the same error appears again, even though I did not change anything in my setup.

I have already tried rebooting the board, but the problem persists. At the moment, no inference works at all, and reinstalling the container no longer fixes the issue.

I would like to know if this is a known issue and if there is a recommended recovery procedure for this situation.

Thanks in advance for your help.

Best regards,

Thibaut

15 replies

Spanner
Axelera Team
  • Axelera Team
  • December 19, 2025

Hi @tchretien! Sorry to hear you’re having difficulties here.

On the Metis Compute Board, the “No devices found” error typically means the SDK inside your Docker container can’t access the AIPU properly.

As a starting point, maybe if you run this from within your activated Voyager SDK environment we can see what happens:

axdevice --refresh

Even on the Metis Compute Board, this command forces the SDK to:

  • Re-enumerate available devices

  • Reload the AIPU firmware

  • Re-establish communication with the integrated Metis accelerator

This can resolve cases where the Docker environment loses its connection to the hardware after a restart or long period of inactivity. Let me know how that goes — if the issue persists, we can dig deeper! 👍


  • Author
  • Ensign
  • December 22, 2025

After running 

axdevice --refresh

I get this error message:

0000:01:00.0 : Device
[libtriton_linux.c:1093] Device communication timed out: device did not respond within 1 seconds. (4294967295)
ERROR: Failed to load runtime stage0: /opt/axelera/device-1.4.2-1/omega/bin/start_axelera_runtime_stage0.bin

And now when I try my model with inference.py, I get this error message:

2025-12-22 08:01:06.629668922 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
[libtriton_linux.c:1093] Device communication timed out: device did not respond within 1 seconds. (4294967295)
Failed to load firmware: /opt/axelera/device-1.4.2-1/omega/bin/start_axelera_runtime.elf
arm_release_ver: g13p0-01eac0, rk_so_ver: 11
[libtriton_linux.c:799] Failed to open '/dev/metis-0:1:0': No such device
[AxeleraDevice.cpp:71] Device not found: metis-0:1:0
WARNING : Failed to open device 0(metis-0:1:0)
ERROR   : Failed to detect metis-0:1:0
 


  • Axelera Team
  • December 22, 2025

Hi @tchretien,


Here’s a PCIe troubleshooting guide that might help resolve your issue. 

 

Also do you happen to know which SDK version was working with your device? A new voyager-sdk release rolled out a couple weeks ago (1.5) which comes with some new Metis firmware. It looks like you’re still using the old FW from SDK 1.4. 


  • Author
  • Ensign
  • December 23, 2025

OK thanks I’ll try that ! 

Yes, I’m on version 1.4.2. I was waiting for your response before reinstalling with the new version.

Also, do you know if there is a guide for upgrading the SDK version? Do I just delete everything and start from scratch with the new version, or is there a command to upgrade?

Thanks a lot

Thibaut


  • Author
  • Ensign
  • December 23, 2025

OK I see the problem… 

There is no metis directory in /sys/class/

So I can’t run most of the commands in the guide, and I can’t upgrade the version:

$AXELERA_DEVICE_DIR/firmware/interactive_flash_update.sh
>>> Metis Flash Update Script
>>> =========================
>>> ERROR: Could not extract firmware version from output
>>> Output: [libtriton_linux.c:995] Could not open directory '/sys/class/metis/': No such file or directory
Fail to get device name
 

I don’t know what is going on but it’s really blocking everything…

Maybe the only solution is to delete EVERYTHING and then install the latest version of the SDK?

Is there a guide for a proper uninstallation?


  • Author
  • Ensign
  • December 23, 2025

I performed a full OS reflash and installed Voyager SDK 1.5.0 on my metis compute board.
After a clean install, the Metis device is still not enumerated at the PCIe level:

lspci -tv
-[0000:00]-

This occurs before any driver or runtime initialization, and inference fails with “No devices found”.
This strongly suggests a hardware-level issue (PCIe/AIPU not powered or not responding).
Could you confirm whether this board should be replaced (RMA) or if there is a known hardware recovery procedure?

Has anyone seen something like this before?


  • Author
  • Ensign
  • December 23, 2025

OK, I found the problem (I think).

The error has now returned to:

2025-12-23 13:44:33.762946035 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
[libdmabuf.c:320] Found kernel driver version 1.2.3, but at least version 1.4.1 is required. Please update the kernel driver
[AxeleraDevice.cpp:53] Device not found: metis-0:1:0
Failed to open device 0(metis-0:1:0)
Unsupported tracer: core_temp: valid tracers are: cpu_usage, end_to_end_fps, end_to_end_infs, latency
arm_release_ver: g13p0-01eac0, rk_so_ver: 11
ERROR   : Failed to detect metis-0:1:0

And when I run triton_multi_ctx --fwver (as the PCIe troubleshooting page advises), I get this message:

[libdmabuf.c:320] Found kernel driver version 1.2.3, but at least version 1.4.1 is required. Please update the kernel driver
Could not init device >metis-0:1:0<. 

Isn’t this the source of the problem?

I am currently looking for a way to update my kernel. 

Do you have any advice or solution ?
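
For anyone reading along, the check that libdmabuf reports here comes down to a numeric comparison of dotted version strings. Below is a minimal Python sketch, assuming the message keeps the exact wording quoted above. The function names are hypothetical and are not part of the Voyager SDK.

```python
import re

# Wording copied from the libdmabuf message in this thread (an assumption:
# other SDK releases may phrase it differently).
_DRIVER_MSG = re.compile(
    r"Found kernel driver version (\d+(?:\.\d+)*), "
    r"but at least version (\d+(?:\.\d+)*) is required"
)


def parse_version(text):
    """Turn '1.4.1' into (1, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_mismatch(log_text):
    """Return (found, required) if the log reports a too-old driver, else None."""
    match = _DRIVER_MSG.search(log_text)
    if match is None:
        return None
    found, required = (parse_version(v) for v in match.groups())
    return (found, required) if found < required else None


if __name__ == "__main__":
    line = (
        "[libdmabuf.c:320] Found kernel driver version 1.2.3, "
        "but at least version 1.4.1 is required. Please update the kernel driver"
    )
    print(driver_mismatch(line))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "1.10.0" as older than "1.4.1".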

Thks

 

--

 

Edit: I found this page: https://support.axelera.ai/hc/en-us/articles/29335553753874-Build-and-load-Metis-driver-on-host-manually
I think the missing kernel drivers are simply not supported on the Metis.

I did : dpkg -l | grep linux-headers
ii  linux-headers-5.19.0-41-generic                   5.19.0-41.42~22.04.1                    arm64        Linux kernel headers for version 5.19.0 on ARMv8 SMP
then uname -r
6.1.148-rockchip-standard

and /usr/src/linux-headers-5.19.0-41-generic/drivers$ sudo apt install -y linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linux-headers-6.1.148-rockchip-standard
E: Couldn't find any package by glob 'linux-headers-6.1.148-rockchip-standard'
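
The mismatch above (headers for 5.19.0-41-generic installed, kernel 6.1.148-rockchip-standard running) can be spotted with a small script. This is only a sketch that parses the text of `dpkg -l` and compares it with the `uname -r` string; the helper names are hypothetical, and it assumes the usual `dpkg -l` column order (status, package name, version, architecture, description).

```python
HEADERS_PREFIX = "linux-headers-"


def installed_header_releases(dpkg_output):
    """Collect the kernel releases that have an installed linux-headers package."""
    releases = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # "ii" marks a fully installed package in dpkg -l output.
        if len(fields) >= 2 and fields[0] == "ii" and fields[1].startswith(HEADERS_PREFIX):
            releases.append(fields[1][len(HEADERS_PREFIX):])
    return releases


def headers_match_kernel(dpkg_output, kernel_release):
    """True if headers for the running kernel release are installed."""
    return kernel_release in installed_header_releases(dpkg_output)


if __name__ == "__main__":
    dpkg_output = (
        "ii  linux-headers-5.19.0-41-generic  5.19.0-41.42~22.04.1  arm64  "
        "Linux kernel headers for version 5.19.0 on ARMv8 SMP"
    )
    print(installed_header_releases(dpkg_output))
    print(headers_match_kernel(dpkg_output, "6.1.148-rockchip-standard"))
```

With the values from this post, the second check comes out false, which is consistent with apt not finding a headers package for the running kernel.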

Maybe I need an older version of the SDK?

Thanks for your help🙏

--

re edit 😂 

Sorry, there are A LOT of messages now…

My Metis just went back to the “no devices found” error…

But now, the metis folder does not exist in /sys/class/

I really don’t know if it is a problem with my Metis or a version conflict between the SDK and the Metis…

If you need more details, I can write a full message summarizing my problem.

Thks


Spanner
Axelera Team
  • Axelera Team
  • December 24, 2025

Wow, excellent work on this, @tchretien!

Lots to unpack, but one quick thing I wanted to suggest is the BSP version. I think it needs to be newer than v1.3 when using SDK v1.5.x. That could be causing some issues. The getting started guide covers the BSP, I believe.

Also, I don’t think you can compile the Metis driver from source (no kernel headers available as it’s a Yocto build). The driver needs to be installed via the pre-built .deb package.

Hope this gives us something to look at in the meantime as we dig deeper into the info you’ve shared 👍

 


  • Author
  • Ensign
  • December 24, 2025

Ok I’ll try this.

In this guide, BSP version 1.3.1 is the latest… Do you have a link to a more recent version?


Spanner
Axelera Team
  • Axelera Team
  • January 5, 2026

Do you have a link to a more recent version ? 

The one in the guide is currently the latest release. The guide will be updated whenever a new release is available, and I’ll also drop a message here in the community when anything goes public. 👍


  • Author
  • Ensign
  • January 6, 2026

I keep having this issue… When I reflash the BSP image and reinstall the SDK at version 1.4.2 (to match the latest BSP image, which is 1.3.1), it works for a day, and then I have to reflash everything or I get the “NO DEVICES FOUND” error every time...


Spanner
Axelera Team
  • Axelera Team
  • January 6, 2026

Looking into this with the team, @tchretien, and will get back to you ASAP!


  • Axelera Team
  • January 7, 2026

Hello @tchretien,

Could you please share the logs from your attempt to flash the latest BSP v1.3.1?

After a successful flash of BSP 1.3.1, you should observe:

  • Metis driver version
    • sh-5.1$ cat /sys/class/metis/version
    • 1.4.4
  • Successful Metis enumeration
    • sh-5.1$ lspci -tv
    • -[0000:00]---00.0-[01-ff]----00.0 Axelera AI Metis AIPU (rev 02)

If either of these checks fails, it may indicate an issue with the BSP flashing process or a potentially defective sample. In that case, we can initiate the RMA process and arrange a replacement.
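
The two checks above can also be scripted so they are easy to repeat after each reboot. The following is a rough Python sketch, not an official tool: the expected values are simply the ones quoted in this post for BSP 1.3.1, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

EXPECTED_DRIVER = "1.4.4"                 # driver version quoted above for BSP 1.3.1
EXPECTED_LSPCI = "Axelera AI Metis AIPU"  # device name quoted above from lspci -tv


def driver_ok(version_text, expected=EXPECTED_DRIVER):
    """Compare the contents of /sys/class/metis/version with the expected value."""
    return version_text.strip() == expected


def enumerated(lspci_text, marker=EXPECTED_LSPCI):
    """True if any line of the lspci output mentions the Metis AIPU."""
    return any(marker in line for line in lspci_text.splitlines())


def read_driver_version(path="/sys/class/metis/version"):
    """Return the file contents, or None if the sysfs entry does not exist."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def run_lspci():
    """Return the output of `lspci -tv`, or an empty string if lspci is unavailable."""
    try:
        result = subprocess.run(
            ["lspci", "-tv"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    version = read_driver_version()
    print("driver version:", "missing" if version is None else version.strip())
    print("driver matches expected:", version is not None and driver_ok(version))
    print("Metis enumerated:", enumerated(run_lspci()))
```

A missing /sys/class/metis directory shows up here as "missing", which matches the symptom reported earlier in the thread.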


  • Author
  • Ensign
  • January 9, 2026

Hey!

It’s now fixed! I think my Metis sometimes does not activate properly and some files are missing (like /sys/class/metis), but when I reboot and switch to ADB (or to SSH if I was using ADB), it works!

Weird problem, but I really think this is a hardware problem.

Thanks


Spanner
Axelera Team
  • Axelera Team
  • January 9, 2026

Ah, great work @tchretien! Keep me posted on how it goes.