

Metis M.2 Bus error loading runtime firmware - Kernel 6.18 / SMMU-v3 compatibility issue

April 25, 2026

malbero
Cadet

Hello Axelera support team,

I am running the Axelera Metis M.2 accelerator on a Radxa Rock 5B (RK3588 SoC) with Armbian and kernel 6.18.10-current-rockchip64. After extensive debugging over several days, I have identified a compatibility issue between the metis driver and the ARM SMMU-v3 on this platform. I would appreciate your guidance on resolving this.

HARDWARE & SOFTWARE

Board: Radxa Rock 5B (RK3588, 8GB RAM, PCIE3.0-4X)
OS: Armbian, kernel 6.18.10-current-rockchip64 (mainline)
Accelerator: Axelera Metis M.2
Driver version: metis 1.4.16
SDK/Runtime version: voyager-sdk / runtime-1.6.0-1
PSU: 65W

WHAT WORKS

The Metis is detected correctly via PCIe: lspci shows "Axelera AI Metis AIPU (rev 02)" at 0000:01:00.0
After patching the Device Tree to expand the PCIe memory window (the default 14MB window was too small; expanded the 64-bit prefetchable range at 0x900000000 to 128MB), the driver loads successfully:
- /dev/metis0 -> metis-0:1:0 is created
- "Axelera AIPU PCIe Driver, version 1.4.16, init OK" appears in dmesg
- PCIe link runs at 8GT/s x4 correctly
Model compilation and deployment complete correctly (yolov8s-coco-onnx for 4 cores) when the CPU frequency is limited to 1.8 GHz or below

THE PROBLEM

After the model deploy completes, the system fails with a Bus error when the SDK attempts to load the runtime firmware into the Metis via DMA:

[libaxldev_linux.c:1515] Device communication timed out: device did not respond within 1 seconds.
Failed to load firmware: /opt/axelera/device-1.6.0-1/omega/bin/start_axelera_runtime.elf
Bus error

With the default MSI configuration (32 MSI), the system produces a silent hard reset between 25-50 seconds into the deploy process instead. The relevant dmesg pattern before the reset is:

axl 0000:01:00.0: vmsi configured
axl 0000:01:00.0: IRQ MSI timeout (12 1)
axl 0000:01:00.0: vmsi configured

Important observations about the reset behaviour:

kernel.panic=10 has no effect — it is a hardware-level reset, not a kernel crash
No AER errors are recorded in /sys/bus/pci/devices/0000:01:00.0/aer_dev_fatal or aer_rootport_total_err_fatal
CPU load at time of reset is ~20%, temperature is ~40°C — ruling out thermal or CPU overload causes
With additional external cooling keeping temperatures below 40°C, the reset takes longer (50 seconds vs 25 seconds without extra cooling). This correlates with CPU thermal throttling: without extra cooling the CPU runs at higher frequencies causing more aggressive DMA, which triggers the SMMU fault faster
Limiting CPU frequency to 1008MHz allows the deploy to complete (though it takes ~28 minutes), but the Bus error still occurs when loading the runtime firmware afterwards
The problem persists regardless of the number of cores configured: both AXELERA_CONFIGURE_BOARD=,30 (4 cores) and AXELERA_CONFIGURE_BOARD=,10 (1 core) produce the same Bus error
The reset is completely silent — no kernel messages whatsoever before the system goes down
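
For anyone repeating the AER check above, the counters can be read programmatically instead of by eye. The sketch below parses the text of an aer_dev_fatal-style sysfs file, assuming one counter per line with the count as the last whitespace-separated field. The function names and the sample text are illustrative and were not captured from this board.

```python
def parse_aer_counters(text):
    """Map each AER counter name to its integer count.

    Assumes one counter per line, with the count as the last
    whitespace-separated field (counter names may contain spaces).
    Lines that do not fit that shape are skipped.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        counters[parts[0]] = int(parts[1])
    return counters


def nonzero_counters(text):
    """Return only the counters that recorded at least one error."""
    return {name: n for name, n in parse_aer_counters(text).items() if n}


# Illustrative sample text, not real output from the Rock 5B.
sample = "Undefined 0\nDLP 0\nTLP 0\nTOTAL_ERR_FATAL 0\n"
print(nonzero_counters(sample))
```

Feeding it the contents of /sys/bus/pci/devices/0000:01:00.0/aer_dev_fatal before and after a run would show whether any counter moved.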

The root cause identified is the ARM SMMU-v3 (fc900000) intercepting MSI interrupts and DMA from the Metis:

arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 sid: 0x100 ssid: 0x0 iova: 0x30 ipa: 0x0
arm-smmu-v3 fc900000.iommu: unpriv data write s1 "Input address caused fault"

The SMMU-v3 is initialized by TF-A/BL31 firmware before the kernel boots. Even with status="disabled" in the Device Tree for the iommu@fc900000 node, the kernel finds it already active and continues using it. Kernel parameters iommu=off and iommu.passthrough=1 have no effect.
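
The sid in the fault line can be cross-checked against the PCIe address. A PCIe requester ID packs bus, device and function as (bus << 8) | (dev << 3) | fn, so 01:00.0 gives 0x100. That matches sid: 0x100 if the iommu-map for this root port maps requester IDs to stream IDs one-to-one, which is an assumption here because the map itself is not shown in this post. A minimal sketch with illustrative function names:

```python
import re

# Matches the arm-smmu-v3 event line format quoted in this post.
EVENT_RE = re.compile(
    r"event: (?P<event>\S+) client: (?P<client>\S+) "
    r"sid: (?P<sid>0x[0-9a-fA-F]+) .*?iova: (?P<iova>0x[0-9a-fA-F]+)"
)


def requester_id(bdf):
    """Convert a 'DDDD:BB:DD.F' address into the 16-bit PCIe requester ID."""
    _domain, bus, devfn = bdf.split(":")
    dev, fn = devfn.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def parse_smmu_event(line):
    """Extract event type, client, stream ID and IOVA, or None if no match."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return {
        "event": m.group("event"),
        "client": m.group("client"),
        "sid": int(m.group("sid"), 16),
        "iova": int(m.group("iova"), 16),
    }


line = ("arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 "
        "sid: 0x100 ssid: 0x0 iova: 0x30 ipa: 0x0")
ev = parse_smmu_event(line)
# True when the stream ID equals the requester ID (identity iommu-map assumed).
print(ev["sid"] == requester_id(ev["client"]))
```

A mismatch between the two values would point at the iommu-map rather than at the driver.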

THINGS ALREADY TRIED

iommu.passthrough=1, iommu=off, pci=noaer kernel parameters — no effect
status="disabled" on iommu@fc900000 in Device Tree — SMMU still active (TF-A initializes it before kernel)
Removing iommu-map from pcie@fe150000 DT node — causes C_BAD_STREAMID errors, same reset
modprobe metis single_msi=1 — changes error from F_TRANSLATION to C_BAD_STREAMID + IRQ MSI timeout + Bus error
modprobe metis single_msi=1 dma_poll=1 — no improvement
Changing msi-map to use ITS0 (0x89) instead of ITS1 (0x132) — same result
pcie_acs_override=downstream,multifunction — not compiled in this kernel
Disabling SMMU via sysfs bypass — rejected (EINVAL, group shares Root Port)
Limiting CPU to 1.8 GHz: deploy completes but Bus error persists when loading runtime firmware
Using AXELERA_CONFIGURE_BOARD=,10 (single core) — same Bus error, problem is not related to number of cores
Monitoring AER counters in real time — no errors recorded before reset

KEY OBSERVATION

This platform uses kernel 6.18 (mainline) with CONFIG_IOMMU_DEFAULT_TRANSLATED. The ARM SMMU-v3 cannot be disabled at kernel level because TF-A initializes it before the kernel. The metis driver 1.4.16 does not appear to support operation under an active ARM SMMU-v3 with translated DMA in mainline kernels.

The Axelera documentation references the Orange Pi 5 Plus (also RK3588) as a supported platform. That board typically runs a BSP kernel (5.10 or 6.1) where the SMMU is not active for PCIe. Could you confirm whether the metis driver supports ARM SMMU-v3 with mainline kernels, and if so, what configuration is required?

QUESTIONS

Does metis driver 1.4.16 support operation with ARM SMMU-v3 active (mainline kernel, translated DMA mode)?
Is there a known workaround for RK3588 platforms with mainline kernel 6.x?
Is a driver update planned that adds proper SMMU-v3 support?
Can you share the exact kernel configuration used for the Orange Pi 5 Plus reference setup?
Is there a way to configure the DMA operations in the SDK to work within the SMMU constraints?

Thank you for your time. I am happy to provide additional logs, dmesg output, or test any patches you may have.

Best regards, Miguel

Spanner
Axelera Team
April 29, 2026

Hi @malbero!

Cracking diagnostic write-up, thanks for the detail. 😃

I wonder if trying a Rockchip BSP kernel on your Rock 5B would show us whether the firmware-load step gets through? Maybe the legacy-rk35xx (5.10) or vendor-rk35xx (6.1) image on the Armbian Rock 5B page. It's worth trying legacy-rk35xx first, since it looks like that lines up most closely with the kernel base used on the validated RK3588 reference platforms.

While you're at it, a clean install of Voyager SDK v1.6 wouldn’t hurt. If the BSP kernel gets you past the bus error, we've localised the problem to the mainline-kernel + active SMMU path, which is useful context to know. If you still hit the same fault on the BSP kernel, post the new dmesg and we'll look at it again from there.

What do you think?

malbero
Author
Cadet
May 14, 2026

Update - May 2026: Significant progress with custom kernel

Hi @Spanner,

Following your suggestion to investigate the mainline kernel + SMMU issue, I have made significant progress by compiling a custom kernel. Here is a full update:

Why we did not use vendor/legacy BSP kernels:

You suggested trying legacy-rk35xx (5.10) or vendor-rk35xx (6.1) Armbian images. We attempted this but hit a known incompatibility: the Armbian vendor/legacy images for Rock 5B do not boot on hardware revision v1.43 (2023-06-14). The vendor bootloader stack (SPL/ddr.bin/BL31) in those images is not compatible with this board revision. The system powers on but hangs before U-Boot with no HDMI output and no serial output. Only the current (mainline) image boots correctly on this board.

Installing the vendor kernel on top of the current image (sudo apt install linux-image-vendor-rk35xx) is theoretically possible but the resulting system would still use the mainline U-Boot and bootloader chain, which may not correctly initialize the PCIe/SMMU for the vendor kernel. We did not pursue this path further as the custom mainline kernel approach was already showing progress.

What has been solved:

I compiled kernel 6.18.25 with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y using the Armbian build system. With this kernel:

IOMMU is now in Passthrough mode: iommu: Default domain type: Passthrough (set via kernel command line)
ARM SMMU-v3 no longer blocks PCIe DMA: arm-smmu-v3 fc900000.iommu: msi_domain absent - falling back to wired irqs
No more F_TRANSLATION errors in dmesg
No more IRQ MSI timeout errors
No more silent hard resets
/dev/metis0 is created correctly
axdevice --refresh -v works perfectly and loads firmware v1.6.0 successfully
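
The passthrough state can also be confirmed per device rather than from dmesg alone. Linux exposes each IOMMU group's default domain type in /sys/kernel/iommu_groups/<group>/type, where identity corresponds to passthrough and DMA or DMA-FQ to translated mode. The sketch below reads that file. The helper names and the root parameter are illustrative, and the group number is whatever lspci -vv prints on the "IOMMU group" line for the device.

```python
from pathlib import Path

SYSFS_IOMMU_GROUPS = "/sys/kernel/iommu_groups"


def group_domain_type(group, root=SYSFS_IOMMU_GROUPS):
    """Return the default domain type string of an IOMMU group."""
    return (Path(root) / str(group) / "type").read_text().strip()


def is_passthrough(domain_type):
    """'identity' means DMA addresses bypass translation for the group."""
    return domain_type == "identity"
```

On the target board, is_passthrough(group_domain_type(<group>)) should return True once the passthrough kernel is running. The root parameter exists only so the helper can be exercised against a scratch directory.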

Additionally, I had to blacklist the hantro_vpu and rockchip_vdec modules, which were causing a kernel deadlock (queued_spin_lock_slowpath in vsi_iommu_resume). This is unrelated to Metis but was blocking the system.

Remaining problem - Bus error in configure_device:

After all the above fixes, there is still a Bus error when the SDK attempts to configure the device via DMA. The exact stacktrace is:

Fatal Python error: Bus error
Current thread (most recent call first):
  File "axelera/runtime/objects.py", line 352 in configure_device
  File "axelera/app/device_manager.py", line 148 in _configure_boards
  File "axelera/app/device_manager.py", line 156 in configure_boards_and_tracers
  File "axelera/app/pipe/manager.py", line 442 in __init__

The bus error occurs inside axr.configure_device() → libaxruntime.so → dmabuf memory mapping. The runtime uses dmabuf (mem_alloc_shared_dmabuf, import_dmabuf) to access Metis memory, and this mmap fails silently.

dmesg at the time of Bus error is completely clean:

axl 0000:01:00.0: vmsi configured
axl 0000:01:00.0: vmsi configured
[silence - bus error occurs with no kernel messages]

Current system state:

Kernel: 6.18.25-current-rockchip64 with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y
IOMMU: Passthrough
PCIe link: 8GT/s x4 (Gen3, correct)
Region 0: Memory at 0x902010000 (4K)
Region 2: Memory at 0x900000000 (32MB, marked [virtual] by lspci)
axdevice --refresh: firmware v1.6.0 loads correctly, device reports 1GiB m2, 4 cores at 800MHz
Driver: metis 1.4.16 (with PCI_IRQ_INTX patch to fix modprobe hang on kernel 6.18)

Question:

The Region 2 (32MB, 0x900000000) is marked as [virtual] by lspci. Is this expected? Could this be causing the dmabuf mmap to fail when libaxruntime.so tries to map Metis device memory into userspace?

Is there a known configuration for the dmabuf/DMA memory access in libaxruntime.so that would work with kernel 6.18 mainline on RK3588?

Thank you, Miguel

Spanner
Axelera Team
May 15, 2026

Wow, awesome work @malbero ! You’ve done some serious leg work here and made great progress!

I think this is along the same lines as the issue you’re seeing. Or at least, it’s worth checking out:

Maybe your custom build is missing the dmabuf options that libaxruntime needs? Looks like the following are a requirement:

CONFIG_DMABUF_HEAPS=y
CONFIG_DMABUF_HEAPS_SYSTEM=y
CONFIG_DMABUF_HEAPS_CMA=y
CONFIG_SYNC_FILE=y
CONFIG_SW_SYNC=y
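
A quick way to check a custom build against this list is to scan the kernel config text. The sketch below is a minimal, unofficial checker: it takes the config as a string and reports which of the options above are not set to y. The function name is illustrative, and treating only =y as satisfied simply mirrors how the options are listed here.

```python
REQUIRED = (
    "CONFIG_DMABUF_HEAPS",
    "CONFIG_DMABUF_HEAPS_SYSTEM",
    "CONFIG_DMABUF_HEAPS_CMA",
    "CONFIG_SYNC_FILE",
    "CONFIG_SW_SYNC",
)


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skips blanks and '# CONFIG_FOO is not set' comment lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]
```

The config text would typically come from /boot/config-$(uname -r), or from /proc/config.gz when the kernel was built with CONFIG_IKCONFIG_PROC. An empty list means all five options are enabled.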

malbero
Author
Cadet
May 18, 2026

Hi @Spanner,

Following your suggestion about dmabuf options, I have recompiled the kernel with all required options enabled. Here is the full update:

Kernel changes applied:

CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y — SMMU no longer blocks PCIe DMA
CONFIG_SW_SYNC=y — added as per your suggestion
CONFIG_DMABUF_HEAPS=y
CONFIG_DMABUF_HEAPS_SYSTEM=y
CONFIG_DMABUF_HEAPS_CMA=y
CONFIG_SYNC_FILE=y

Verified after boot:

$ ls /dev/dma_heap/
default_cma_region  reserved  system

Issue 1 — BUS_ADRALN alignment fault:

After adding all the dmabuf options, the bus error still occurred but strace revealed the exact cause:

mmap(NULL, 33554432, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xffff59a50000
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x55, 0xb, 0x8), 0xffffdffb10f0) = 0
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xffff59a500cc} ---

Key observations:

The mmap of the Metis device memory (32MB, Region 2 at 0x900000000) succeeds
The SIGBUS is BUS_ADRALN (alignment error), not BUS_ADRERR (invalid address)
The fault occurs at offset 0xcc (204 bytes) into the mapped region
All preceding ioctls return 0 (success)

Root cause: offset 0xcc maps exactly to DMA_READ_DONE_IMWR_LOW_OFF defined in axl-aipu-pcie-edma.h:

DMA_READ_DONE_IMWR_LOW_OFF = 0xcc,
    u64 reg; /* 0x00cc..0x00d0 */
    u32 lsb; /* 0x00cc */

0xcc = 204, 204 % 8 = 4 → not 8-byte aligned. libaxruntime.so attempts a u64 access at this offset. On BSP kernels (5.10/6.1) unaligned MMIO accesses are handled silently. On mainline kernel 6.18 strict alignment is enforced for MMIO regions on ARM64, causing SIGBUS BUS_ADRALN.
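
The arithmetic above can be reproduced from the strace values alone. This small sketch subtracts the mmap base from si_addr and tests natural alignment for 4-byte and 8-byte accesses. The helper names are illustrative.

```python
def fault_offset(mmap_base, si_addr):
    """Offset of the faulting address inside the mapping."""
    return si_addr - mmap_base


def is_naturally_aligned(offset, access_size):
    """True if an access of access_size bytes at offset is naturally aligned."""
    return offset % access_size == 0


base = 0xFFFF59A50000      # return value of the mmap call in the strace
si_addr = 0xFFFF59A500CC   # si_addr reported with SIGBUS / BUS_ADRALN

off = fault_offset(base, si_addr)
print(hex(off))                      # 0xcc
print(is_naturally_aligned(off, 4))  # True: a u32 access is aligned
print(is_naturally_aligned(off, 8))  # False: a u64 access is not
```

So a 32-bit access at this offset is aligned, while a 64-bit access is not.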

Workaround we applied for Issue 1:

Since we cannot modify libaxruntime.so, we patched the metis driver to map Region 2 with pgprot_writecombine instead of pgprot_noncached in axl-aipu-core.c:

// line 352 - changed from:
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
// to:
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

This resolved the BUS_ADRALN fault. The proper fix should be in libaxruntime.so — either replace the single u64 access at offset 0xcc with two separate u32 accesses, or add alignment padding so that DMA_READ_DONE_IMWR_LOW_OFF falls on an 8-byte boundary.
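
A likely explanation for why the mapping change helps, not confirmed in this thread: arm64 maps pgprot_noncached as Device memory, where unaligned accesses always raise an alignment fault, and pgprot_writecombine as Normal non-cacheable memory, where they are permitted. The first proposed fix, two u32 accesses instead of one u64, can be sketched against an ordinary byte buffer standing in for the register window. The little-endian layout with the low word at the lower offset is an assumption based on the register name, and the helper names are illustrative.

```python
import struct


def read_u32(buf, offset):
    """Read a little-endian u32, insisting on 4-byte alignment."""
    if offset % 4:
        raise ValueError(f"offset {offset:#x} is not 4-byte aligned")
    return struct.unpack_from("<I", buf, offset)[0]


def read_u64_split(buf, offset):
    """Build a 64-bit value from two aligned 32-bit reads (low word first)."""
    low = read_u32(buf, offset)
    high = read_u32(buf, offset + 4)
    return (high << 32) | low


# A bytearray stands in for the mapped region; 0xcc is the offset from the post.
window = bytearray(0x100)
struct.pack_into("<Q", window, 0xCC, 0x1122334455667788)
print(hex(read_u64_split(window, 0xCC)))
```

Two separate reads are not atomic, so on real hardware this only suits registers where a torn read is acceptable.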

Issue 2 — Clock profile name incompatibility:

After fixing Issue 1, inference works correctly with --pipe torch-aipu and 1 core at 13.5fps. However the default GStreamer pipeline with 4 cores fails:

[libaxldev_linux.c:1539] Device command CMD_SET_CLOCK_AICORE_FREQ returned an error code.
ERROR: Failed to set clock frequency for core 0 to 800

axtrace --slog shows:

[err] app: Invalid clock name aicore0ock_profi

Investigation reveals a naming incompatibility:

libaxruntime.so v1.6.0 sends clock name: clock_profile_core_0
Firmware flver=1.3.2 expects clock name: aicore0
The truncated string aicore0ock_profi in the log confirms the firmware receives clock_profile_core_0 but cannot process it

This is confirmed by inspecting the binaries:

start_axelera_runtime.elf contains: aicore0, aicore1, aicore2, aicore3
libaxruntime.so contains: clock_profile_core_0, clock_profile_core_1, clock_profile_core_2, clock_profile_core_3
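
The binary inspection above can be repeated without external tools. The sketch below does roughly what strings piped into grep would do: it collects runs of printable ASCII from a byte string and keeps those containing a given substring. The function name and the sample bytes are illustrative, and the sample is synthetic rather than taken from either binary.

```python
import re


def find_strings(data, needle, min_len=4):
    """Return sorted unique printable-ASCII runs in data that contain needle."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    hits = {
        m.group().decode("ascii")
        for m in pattern.finditer(data)
        if needle in m.group()
    }
    return sorted(hits)


# Synthetic bytes for demonstration only.
blob = b"\x00\x01aicore0\x00aicore1\x00\xffclock_profile_core_0\x00xx"
print(find_strings(blob, b"aicore"))
print(find_strings(blob, b"clock_profile"))
```

Running it over the contents of start_axelera_runtime.elf and libaxruntime.so, read in binary mode, should reproduce the two lists above.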

Current system state:

Kernel: 6.18.25-current-rockchip64 with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y
IOMMU: Passthrough
PCIe: Gen3 x4, Region 2 at 0x900000000 (32MB)
/dev/dma_heap/: system, reserved, default_cma_region present
axdevice --refresh: firmware v1.6.0 loads correctly, flver=1.3.2, bcver=1.4, 4 cores at 800MHz
Inference working with --pipe torch-aipu --aipu-cores 1 at 13.5fps

Questions:

Is flver=1.3.2 compatible with runtime v1.6.0 for the GStreamer pipeline?
Is there a firmware update available that supports clock_profile_core_N naming?
Is there a workaround to force the old aicore0 clock naming with the current runtime?
Will the pgprot_writecombine workaround be officially supported or is a fix planned for libaxruntime.so?

Thank you, Miguel

Spanner
Axelera Team
May 19, 2026

Wow, nice work on the diagnostics, @malbero ! That’s impressively thorough!

I was just talking with @mipallaro, who suggested that the following bits of info would help get to the root of this:

lspci -tv
sudo lspci -s [RC ID] -vv (the Root Complex, you'll spot the ID from the tree in step 1)
sudo lspci -s [Metis ID] -vv (your 01:00.0 from earlier)
With your Voyager venv activated: axcmd --fwver
The Device Tree patch you applied to expand the PCIe memory window

If you’ve got those, we'll have a much clearer picture to work from. 👍

malbero
Author
Cadet
May 20, 2026

Hi again @Spanner,

Here is the information that @mipallaro asked for:

root@rock-5b:/home/miguelon# lspci -tv
-[0000:00]---00.0-[01-ff]----00.0  Axelera AI Metis AIPU (rev 02)
-[0002:20]---00.0-[21]--
-[0004:40]---00.0-[41]----00.0  Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller

root@rock-5b:/home/miguelon# sudo lspci -s 0000:00:00.0 -vv
0000:00:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 125
    IOMMU group: 14
    Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
    I/O behind bridge: [disabled] [16-bit]
    Memory behind bridge: 00000000-020fffff [size=33M] [32-bit]
    Prefetchable memory behind bridge: [disabled] [64-bit]
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
    Expansion ROM at f0200000 [virtual] [disabled] [size=64K]
    BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable+ Count=16/32 Maskable+ 64bit+
        Address: 00000000fe670040  Data: 0000
        Masking: fffffcff  Pending: 00000000
    Capabilities: [70] Express (v2) Root Port (Slot-), MSI 08
        DevCap:    MaxPayload 256 bytes, PhantFunc 0
            ExtTag+ RBE+
        DevCtl:    CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <16us
            ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 8GT/s, Width x4
            TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
        RootCap: CRSVisible-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP+ LTR+
             10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
             AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled, ARIFwd-
             AtomicOpsCtl: ReqEn- EgressBlck-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
             EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [b0] MSI-X: Enable- Count=128 Masked-
        Vector table: BAR=4 offset=00020000
        PBA: BAR=4 offset=00028000
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
        RootCmd: CERptEn+ NFERptEn+ FERptEn+
        RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
             FirstFatal- NonFatalMsg- FatalMsg- IntMsg 9
        ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
    Capabilities: [148 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [190 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- L1_PM_Substates-
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
        L1SubCtl2:
    Capabilities: [1d0 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
    Capabilities: [2d0 v1] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
    Kernel driver in use: pcieport

root@rock-5b:/home/miguelon# sudo lspci -s 0000:01:00.0 -vv
0000:01:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
    Subsystem: Axelera AI Metis AIPU (rev 02)
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 139
    IOMMU group: 14
    Region 0: Memory at 902010000 (64-bit, non-prefetchable) [size=4K]
    Region 2: Memory at 900000000 (32-bit, non-prefetchable) [virtual] [size=32M]
    Expansion ROM at 902000000 [virtual] [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable+ Count=32/32 Maskable+ 64bit+
        Address: 00000000fe670040  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
        DevCtl:    CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <16us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 8GT/s, Width x4
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
             EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq+ ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:    RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
        AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [148 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [168 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [170 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Capabilities: [180 v1] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
    Kernel driver in use: axl
    Kernel modules: metis

(venv) root@rock-5b:/home/miguelon/metis-driver/voyager-sdk# axcmd --fwver
Firmware version: v1.3.2+bl1-stage0

    pcie@fe150000 {
        compatible = "rockchip,rk3588-pcie\0rockchip,rk3568-pcie";
        #address-cells = <0x03>;
        #size-cells = <0x02>;
        bus-range = <0x00 0x0f>;
        clocks = <0x21 0x140 0x21 0x145 0x21 0x13b 0x21 0x14a 0x21 0x14f 0x21 0x174>;
        clock-names = "aclk_mst\0aclk_slv\0aclk_dbi\0pclk\0aux\0pipe";
        device_type = "pci";
        interrupts = <0x00 0x107 0x04 0x00 0x00 0x106 0x04 0x00 0x00 0x105 0x04 0x00 0x00 0x104 0x04 0x00 0x00 0x103 0x04 0x00>;
        interrupt-names = "sys\0pmc\0msg\0legacy\0err";
        #interrupt-cells = <0x01>;
        interrupt-map-mask = <0x00 0x00 0x00 0x07>;
        interrupt-map = <0x00 0x00 0x00 0x01 0x131 0x00 0x00 0x00 0x00 0x02 0x131 0x01 0x00 0x00 0x00 0x03 0x131 0x02 0x00 0x00 0x00 0x04 0x131 0x03>;
        linux,pci-domain = <0x00>;
        max-link-speed = <0x03>;
        msi-map = <0x00 0x132 0x00 0x1000>;
        edp1_out = "/edp@fded0000/ports/port@1";
        hdmi_receiver = "/hdmi_receiver@fdee0000";
        pcie3x4 = "/pcie@fe150000";
        pcie3x4_intc = "/pcie@fe150000/legacy-interrupt-controller";
        pcie3x4_ep = "/pcie-ep@fe150000";
        pcie3x2 = "/pcie@fe160000";
        pcie3x2_intc = "/pcie@fe160000/legacy-interrupt-controller";
        pcie2x1l0 = "/pcie@fe170000";
        pcie2x1l0_intc = "/pcie@fe170000/legacy-interrupt-controller";
        gmac0 = "/ethernet@fe1b0000";
        mdio0 = "/ethernet@fe1b0000/mdio";
        gmac0_stmmac_axi_setup = "/ethernet@fe1b0000/stmmac-axi-config";
        gmac0_mtl_rx_setup = "/ethernet@fe1b0000/rx-queues-config";
        gmac0_mtl_tx_setup = "/ethernet@fe1b0000/tx-queues-config";
        sata1 = "/sata@fe220000";
        hdptxphy1 = "/phy@fed70000";
        usbdp_phy1 = "/phy@fed90000";
        combphy1_ps = "/phy@fee10000";
        pcie30phy = "/phy@fee80000";
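
As a cross-check of the msi-map entry in the dump above against the stream ID reported in the SMMU fault (sid: 0x100), the arithmetic can be sketched in Python. This assumes the standard four-cell msi-map layout (rid-base, MSI controller phandle, msi-base, length) and that the stream ID equals the PCIe requester ID; the helper names are hypothetical and nothing here talks to the hardware.

```python
def requester_id(bus: int, dev: int, fn: int) -> int:
    """PCIe requester ID: 8-bit bus, 5-bit device, 3-bit function."""
    return (bus << 8) | (dev << 3) | fn


def msi_map_covers(msi_map: tuple, rid: int) -> bool:
    """True when rid falls inside [rid-base, rid-base + length)."""
    rid_base, _phandle, _msi_base, length = msi_map
    return rid_base <= rid < rid_base + length


# msi-map = <0x00 0x132 0x00 0x1000> from the pcie@fe150000 node
MSI_MAP = (0x00, 0x132, 0x00, 0x1000)

rid = requester_id(0x01, 0x00, 0x0)  # the Metis at 0000:01:00.0
print(hex(rid))
print(msi_map_covers(MSI_MAP, rid))
```

Under those assumptions the Metis requester ID falls inside the range the msi-map entry describes, so the map itself does not look like the part that is missing.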

Regards, Miguel.

malbero
Author
Cadet
May 25, 2026

Hello again @Spanner,

Further update with new findings after our last message.

BUS_ADRALN fix — confirmed working:

We identified and fixed the BUS_ADRALN issue ourselves by patching the metis driver. The fix was changing pgprot_noncached to pgprot_writecombine in sysctl_mmap() in axl-aipu-core.c:

// Before (causes BUS_ADRALN on ARM64 mainline kernel 6.18)
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

// After (works correctly)
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

With this fix, inference works correctly with --pipe torch-aipu and 1 core at 13.5 fps. We consider this a workaround until libaxruntime.so is fixed to use aligned u32 accesses instead of an unaligned u64 access at offset 0xcc.
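
The alignment arithmetic behind this workaround can be sketched in Python. On arm64, pgprot_noncached maps the region as Device memory, where an unaligned access faults, while pgprot_writecombine maps it as Normal non-cacheable memory, where unaligned accesses are tolerated. The sketch below only models the offsets: the helper names are hypothetical, and the bytearray is a stand-in for the mapped region, not the real device.

```python
import struct

REG_OFFSET = 0xCC  # offset named in the post


def is_aligned(offset: int, size: int) -> bool:
    """True when an access of `size` bytes at `offset` is naturally aligned."""
    return offset % size == 0


def read_u64_as_two_u32(buf, offset: int) -> int:
    """Decode a little-endian 64-bit value as two 32-bit words."""
    low, high = struct.unpack_from("<II", buf, offset)
    return low | (high << 32)


# Stand-in for the mapped region (assumption: plain memory, not the real BAR).
region = bytearray(0x100)
struct.pack_into("<Q", region, REG_OFFSET, 0x1122334455667788)

print(is_aligned(REG_OFFSET, 8))  # 0xcc % 8 == 4, so a u64 access is unaligned
print(is_aligned(REG_OFFSET, 4))  # 0xcc % 4 == 0, so u32 accesses are aligned
print(hex(read_u64_as_two_u32(region, REG_OFFSET)))
```

Splitting the access into two 32-bit halves at 0xcc and 0xd0 keeps every access naturally aligned, which is the kind of change we would hope to see in the library.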

New issue — Clock profile name incompatibility:

With the BUS_ADRALN fix in place, the GStreamer pipeline with 4 cores fails with:

[libaxldev_linux.c:1539] Device command CMD_SET_CLOCK_AICORE_FREQ returned an error code.
ERROR: Failed to set clock frequency for core 0 to 800

axtrace --slog shows:

[err] app: Invalid clock name aicore0ock_profi

Full firmware update performed:

We performed a full firmware update using axdevice interactive_flash_update:

flver: 1.3.2 → 1.6.0 ✓
bcver: 1.4 → 7.4 ✓

The direct jump from bcver=1.4 to 7.4 worked correctly on our Metis M.2 (board_type=ortles). The script flagged a warning but also stated that the update was possible for PCIe/M.2 cards, which proved correct.

After the update, axdevice --refresh confirms:

Device 0: metis-0:1:0 1GiB m2 flver=1.6.0 bcver=7.4 clock=800MHz(0-3:800MHz) mvm=0-3:100%
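
For anyone who wants to check these values from a script before retrying, a minimal sketch follows. It assumes the key=value layout seen in the single line above, which is not a documented format, and the function name is hypothetical.

```python
import re


def parse_axdevice_line(line: str) -> dict:
    """Collect every key=value token on the line into a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


line = ("Device 0: metis-0:1:0 1GiB m2 flver=1.6.0 bcver=7.4 "
        "clock=800MHz(0-3:800MHz) mvm=0-3:100%")
fields = parse_axdevice_line(line)
print(fields["flver"], fields["bcver"], fields["clock"], fields["mvm"])
```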

However, the clock error persists even with flver=1.6.0 and bcver=7.4.

Critical finding — Internal SDK v1.6.0 incompatibility:

After investigating the binaries, we found a mismatch inside SDK v1.6.0 itself:

libaxruntime.so v1.6.0 contains:

handle_clock_profile_core() → sends "clock_profile_core_0"
clock_profile_core_0
clock_profile_core_1
clock_profile_core_2
clock_profile_core_3

start_axelera_runtime.elf v1.6.0 contains:

aicore0, aicore1, aicore2, aicore3  ← old naming
CMD_SET_CLOCK_PROFILE               ← profile-based, not per-core
NO handler for "clock_profile_core_N"

The runtime library and the firmware binary are out of sync within the same SDK release. libaxruntime.so sends clock_profile_core_0 but start_axelera_runtime.elf only understands aicore0, causing the truncated string aicore0ock_profi visible in axtrace.
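
The logged string can be taken apart with a few lines of Python. The three strings are copied from the output above, and the helper name is hypothetical. The result is consistent with a fixed 16-character name field that holds pieces of both names, but that is an inference from the string alone, not something confirmed against the firmware source.

```python
OBSERVED = "aicore0ock_profi"   # from axtrace --slog
SENT = "clock_profile_core_0"   # string found in libaxruntime.so
OLD_NAME = "aicore0"            # string found in start_axelera_runtime.elf


def split_observed(observed: str, old_name: str, sent_name: str):
    """Return (tail, index of tail inside sent_name), or None if the prefix differs."""
    if not observed.startswith(old_name):
        return None
    tail = observed[len(old_name):]
    return tail, sent_name.find(tail)


print(len(OBSERVED))
print(split_observed(OBSERVED, OLD_NAME, SENT))
```

The observed name is 16 characters long, begins with the old aicore0 name, and its remainder is a fragment of clock_profile_core_0 starting at index 2.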

Repository investigation — confirmed release bug:

We checked the Axelera apt repository and confirmed:

SDK v1.6.0 is the latest available release
Our installed packages (axelera-device-1.6.0 and axelera-runtime-1.6.0) are exactly the same versions as in the repository
No newer version is available

This means the incompatibility between libaxruntime.so sending clock_profile_core_0 and start_axelera_runtime.elf only understanding aicore0 is a bug in the final v1.6.0 release, not a pre-release issue.

Current system state:

Kernel: 6.18.25-current-rockchip64 with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y
IOMMU: Passthrough
PCIe: Gen3 x4
flver=1.6.0, bcver=7.4
/dev/dma_heap/: system, reserved, default_cma_region present
Inference works with --pipe torch-aipu and 1 core at 13.5 fps
Inference fails with GStreamer pipeline (4 cores) due to clock naming mismatch

Questions:

Is there a hotfix or patch available for the clock_profile_core_N vs aicore0 naming mismatch in SDK v1.6.0? Is a v1.6.1 planned?
Can you confirm that pgprot_writecombine is a safe workaround for the BUS_ADRALN issue on ARM64 mainline kernels, or is a fix planned for libaxruntime.so?
Is there any environment variable or configuration flag to force the old aicore0 clock naming in libaxruntime.so as a temporary workaround?

Thank you, Miguel

Spanner
Axelera Team
May 27, 2026

Hi Miguel, thanks for the detailed update!

One point to clarify: the release is validated on our supported ARM64 RK3588 AISBC platform, but your Rock 5B + Armbian mainline 6.18 + custom DT/kernel + Metis M.2 setup is a slightly different integration path, so we should treat that part as experimental. 👍

So, for the 4-core GStreamer clock issue, maybe we try skipping Voyager’s automatic board reconfiguration, since your device already reports the expected clock/MVM state?

axdevice --refresh -v
axdevice -v
AXELERA_CONFIGURE_BOARD=0 python3 ./inference.py <same-model> <same-input> --pipe gst --aipu-cores 4

If needed, preconfigure the device first with:

axdevice --set-core-clock 800
axdevice --set-mvm-limitation 100
axdevice --refresh -v

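Then rerun the same command, again with AXELERA_CONFIGURE_BOARD=0 set:

AXELERA_CONFIGURE_BOARD=0 python3 ./inference.py <same-model> <same-input> --pipe gst --aipu-cores 4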

This skips Voyager’s automatic per-core board configuration path, which could be where the clock_profile_core_N configuration is failing in your setup.
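
If it is easier to drive this from a script, here is a minimal sketch of the same idea in Python. The wrapper name is hypothetical, and the demo launches a harmless child process that only echoes the variable, so swap in your actual inference.py command line when you use it.

```python
import os
import subprocess
import sys


def run_without_board_config(cmd: list) -> subprocess.CompletedProcess:
    """Run cmd with AXELERA_CONFIGURE_BOARD=0 added to a copy of the environment."""
    env = dict(os.environ, AXELERA_CONFIGURE_BOARD="0")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Demo child process: prints the variable it received.
demo_cmd = [sys.executable, "-c",
            "import os; print(os.environ['AXELERA_CONFIGURE_BOARD'])"]
result = run_without_board_config(demo_cmd)
print(result.stdout.strip())
```

Only the child process sees the variable, so the rest of your shell session keeps its normal behaviour.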

Let’s see if that gets us a step forward! Let me know dude!
