Question

DMABUF error

  • June 23, 2026
  • 3 replies
  • 26 views

  • Hi,

    when running inference on a Metis M.2 on a Raspberry Pi CM5, we noticed these errors occurring after a couple of minutes.

    error log:

    [libtriton_linux.c:531] DMABUF_METIS_XFER failed: Input/output error
    [AxeleraDmaBuf.cpp:222] DMA transfer failed: Input/output error
    [ERROR][axeShareMemoryExecute]: Dmabuf transfer failed.
    [ERROR][axeCommandQueueExecuteCommandListsAsync]: Level-zero memory operation failed: 0x70010001.
    terminate called after throwing an instance of 'std::runtime_error'
    what():  axr_run_model failed with Error at zeCommandQueueExecuteCommandLists(cmdqueue, n_cmdlists, cmdlists, nullptr): cmdqueue_run_cmdlists: 319:
    Exit with error code: 0x70010001 : ZE_RESULT_ERROR_NOT_AVAILABLE

    kernel error log:

    axl 0001:01:00.0: DMA error WR CH0 (ctrl 0x1)

    setup:

    - Runtime: runtime-1.5.3-1
    - Voyager SDK: v1.5.2
    - metis-dkms: 1.4.4
    - Firmware: flash version 1.5.0
    - bcver 7.1

3 replies

Spanner
Axelera Team
  • June 23, 2026

Hi ​@mato!

Hmm, the kernel line:

axl 0001:01:00.0: DMA error WR CH0

might suggest a low-level DMA/PCIe/device-state issue rather than the model itself, I wonder...
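To see how often the kernel reports this and on which channel, the log can be scanned for lines of the same shape. This is a minimal sketch: the pattern is modeled only on the single line quoted in this thread, the `RD` direction is an assumption (only `WR` appears in the report), and `find_dma_errors` is a hypothetical helper name.

```python
import re

# Pattern modeled on the one kernel line quoted in this thread.
# Assumption: other driver versions may word the message differently,
# and "RD" is a guess at how a read-channel error would be labeled.
DMA_ERROR_RE = re.compile(
    r"axl (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"DMA error (?P<direction>WR|RD) CH(?P<channel>\d+)"
)


def find_dma_errors(log_text):
    """Return one dict per DMA error line found in kernel log text."""
    errors = []
    for line in log_text.splitlines():
        match = DMA_ERROR_RE.search(line)
        if match:
            errors.append({
                "bdf": match.group("bdf"),
                "direction": match.group("direction"),
                "channel": int(match.group("channel")),
            })
    return errors
```

Feeding it the output of `dmesg` (which may need root on some systems) gives a quick count of errors per channel between cold boots.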

As a first thing to try, a full power-off shutdown of the Raspberry Pi CM5 + Metis M.2 is always worthwhile. Remove power entirely, give it a minute, then boot up again. If you’re running inside Docker, restarting the container before running inference again is often helpful.

This is different from using axdevice --reboot or a normal warm reboot, and it’s often kicked things back into action for people.

Let me know if that does anything, and we can go from there!


  • Author
  • Cadet
  • June 23, 2026

Hi ​@Spanner,

yes, a power restart does get the device back into a good state, but since we use the Metis device in a real-time control system, having to restart the device to recover is really bad for us. We would prefer to prevent this error from happening in the first place.

Just curious if you have any insight or previous experience with this error?

The error occurs whether we configure PCIe Gen 2 or Gen 3 in config.txt on Linux.
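To confirm which link speed was actually negotiated after changing config.txt, the Linux PCI sysfs attribute `current_link_speed` can be read for the device. A minimal sketch, assuming the device address `0001:01:00.0` from the kernel log above; `parse_link_speed` and `read_link_gen` are hypothetical helper names.

```python
from pathlib import Path

# Per-lane transfer rate in GT/s mapped to the PCIe generation.
GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}


def parse_link_speed(text):
    """Map a sysfs string such as '8.0 GT/s PCIe' to a generation, or None."""
    parts = text.split()
    if not parts:
        return None
    try:
        rate = float(parts[0])
    except ValueError:
        # Some kernels report "Unknown" when the link is down.
        return None
    return GEN_BY_RATE.get(rate)


def read_link_gen(bdf="0001:01:00.0"):
    """Read the negotiated generation; bdf is taken from the kernel log above."""
    path = Path("/sys/bus/pci/devices") / bdf / "current_link_speed"
    return parse_link_speed(path.read_text())
```

If `read_link_gen()` reports a lower generation than the one configured, the link has trained down, which would point toward signal integrity or power rather than the runtime.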
 


Spanner
Axelera Team
  • June 23, 2026

Yeah, it’s useful for diagnostic purposes, but you can’t rely on a cold restart as part of its daily operation 😅

Certainly I think setting it to PCIe Gen 3 is the way to go, but it sounds like you’ve tried that. It’s not a simple power issue, is it? I know the RPi can get temperamental in some weird ways if it’s not getting quite enough juice. I suspect you’ve checked that, but just in case.
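One quick way to rule the power question in or out is the Raspberry Pi firmware's `vcgencmd get_throttled` flag word, which records under-voltage and throttling events. The sketch below decodes it; `decode_throttled` and `read_throttled` are hypothetical helper names. Note that this reflects the Pi's own supply monitoring, so a clean result does not prove the M.2 slot rail is healthy.

```python
import subprocess

# Bit meanings documented for `vcgencmd get_throttled` on Raspberry Pi.
THROTTLE_FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output):
    """Decode a string like 'throttled=0x50005' into flag descriptions."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd on the Pi and decode its answer."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    )
    return decode_throttled(result.stdout)
```

Calling `read_throttled()` right after a DMA error and again after a cold boot shows whether an under-voltage event lines up with the failure.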

 Let’s see if ​@Victor Labian has any insights, too - he’s done a lot of awesome work with a Metis and RPi5 combo. 👍