
Environment

  • SoC/Board: Rockchip RK3588, Radxa ROCK 5B Plus (it has two M.2 slots)

  • OS/Kernel: Debian bookworm, 6.1.43-15-rk2312 (aarch64)

  • RAM: 16GB (also tested with mem=4G kernel parameter; total reported ~3.8Gi)

  • Device: Axelera Metis PCIe accelerator, PCI ID 1f9d:1100 (shows as “Axelera AI Metis AIPU”)

  • PCIe topology: Endpoint at 0001:11:00.0 behind upstream bridge 0001:10:00.0 (Gen3 x2)

  • Drivers: axl/metis-dkms 1.2.3; “Kernel driver in use: axl” shown; /dev/metis-1:11:0 device node present when driver bound

  • Container: Voyager SDK tested in Ubuntu 22.04 container (privileged); device access tests moved back to host after hangs observed inside container
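The "driver bound" check from the list above can be scripted. This is only a small sketch: the helper name driver_in_use is my own, and it just filters the "Kernel driver in use" line that lspci -k prints.

```bash
# Print the kernel driver bound to a device, given `lspci -k` output on stdin.
# Prints nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage on the host (device address taken from the post):
#   lspci -k -s 0001:11:00.0 | driver_in_use
```

An empty result after a reset or rescan would mean the driver did not rebind.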

Device-tree overlay (added to expand ranges window to 48MB)

Reason: the default 32-bit non-prefetchable MEM window under fe160000.pcie was ~14MB, but Metis BAR2 requests 32MB (plus headroom). I therefore moved the window to a 48MB region at a different address that appears to be free. The overlay below is applied via extlinux overlays:

text

/dts-v1/;
/plugin/;

/ {
    compatible = "rockchip,rk3588";

    fragment@0 {
        target-path = "/pcie@fe160000";
        __overlay__ {
            #address-cells = <3>;
            #size-cells = <2>;
            ranges = <0x81000000 0x0 0xf1100000 0x0 0xf1100000 0x0 0x00100000>,
                     <0x82000000 0x0 0xf5000000 0x0 0xf5000000 0x0 0x03000000>,
                     <0xc3000000 0x9 0x40000000 0x9 0x40000000 0x0 0x40000000>;
        };
    };
};

Boot log confirms the active window on fe160000.pcie:

  • “MEM 0x00f5000000..0x00f7ffffff -> 0x00f5000000”
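To double-check that the boot log line matches the overlay, the window can be recomputed from the base and size cells of the 32-bit MEM ranges entry. A minimal sketch (window_span is a name I made up for illustration):

```bash
# Given the CPU base address and size of a `ranges` entry, print the
# inclusive address span and the size in MiB.
window_span() {
    base=$(( $1 ))
    size=$(( $2 ))
    printf '0x%08x..0x%08x (%d MiB)\n' \
        "$base" "$(( base + size - 1 ))" "$(( size / 1048576 ))"
}

# Base and size of the 0x82000000 (32-bit MEM) entry in the overlay.
window_span 0xf5000000 0x03000000
```

For the values in the overlay this prints 0xf5000000..0xf7ffffff (48 MiB), which agrees with the MEM range in the boot log.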

BARs and bridge window as assigned by kernel

  • lspci -vv -s 0001:11:00.0 shows:

    • Region 0: Memory at f5010000 (64-bit, non-prefetchable) [size=4K]

    • Region 2: Memory at f6000000 (32-bit, non-prefetchable) [size=32M]

    • Expansion ROM at f5000000 [disabled] [size=64K]

  • dmesg shows for 0001:10:00.0:

    • “Bridge window: [mem 0xf5000000-0xf7ffffff]”

    • “Decoded memory behind bridge: f5000000-f7ffffff”

    • “Memory behind bridge is sufficient. Skipping reset.”
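As a sanity check on the assignments above, each region should lie entirely inside the bridge window. A rough sketch of that check, using the addresses and sizes from the lspci and dmesg output (fits_in_window and check_bar are hypothetical helper names):

```bash
# Succeed if [base, base+size) lies inside the inclusive window [win_lo, win_hi].
fits_in_window() {
    base=$(( $1 )); size=$(( $2 )); win_lo=$(( $3 )); win_hi=$(( $4 ))
    [ "$base" -ge "$win_lo" ] && [ "$(( base + size - 1 ))" -le "$win_hi" ]
}

# Report one region against the bridge window 0xf5000000-0xf7ffffff.
check_bar() {
    if fits_in_window "$2" "$3" 0xf5000000 0xf7ffffff; then
        echo "$1: inside window"
    else
        echo "$1: outside window"
    fi
}

check_bar "BAR0" 0xf5010000 0x1000      # 4K
check_bar "BAR2" 0xf6000000 0x2000000   # 32M
check_bar "ROM"  0xf5000000 0x10000     # 64K
```

BAR2 ends exactly at 0xf7ffffff, so it uses the last 32MB of the 48MB window.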

Link status and capability excerpts

  • lspci -vv -s 0001:11:00.0 (stable across reboots):

    • LnkSta: Speed 8GT/s, Width x2 (downgraded), DLActive-

    • LnkSta2: EqualizationComplete+, Phase1+, Phase2+, Phase3+

    • DevCtl: MaxPayload 128, MaxReadReq typically 512 (also tested set to 256)

    • MSI: Enable+ Count=32/32 Maskable+ 64bit+

    • AER: present; “All AER errors masked” printed by axl during probe

  • dmesg (earlier runs):

    • “pcieport 0001:10:00.0: Data Link Layer Link Active not set in 1000 msec”
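The LnkSta fields can also be read as a raw word and decoded by hand, which makes it easy to compare the endpoint with the upstream port 0001:10:00.0. This is a sketch under the assumption that the word is read with something like setpci -s <device> CAP_EXP+0x12.w (setpci prints hex without a 0x prefix, so add one). The function name and the sample value are illustrative, not taken from my logs.

```bash
# Decode a raw PCIe Link Status word (PCIe capability offset 0x12).
# Bits 3:0 = current link speed code, bits 9:4 = negotiated width,
# bit 13 = Data Link Layer Link Active.
decode_lnksta() {
    val=$(( $1 ))
    case $(( val & 0xf )) in
        1) speed="2.5GT/s" ;;
        2) speed="5GT/s" ;;
        3) speed="8GT/s" ;;
        4) speed="16GT/s" ;;
        *) speed="unknown" ;;
    esac
    echo "speed=$speed width=x$(( (val >> 4) & 0x3f )) dlactive=$(( (val >> 13) & 1 ))"
}

# 0x0023 is an illustrative value, not copied from the logs above.
decode_lnksta 0x0023
```

The sample value decodes to speed=8GT/s width=x2 dlactive=0, the same state that lspci reports for the endpoint.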

Minimal reproducible read failure (BAR2)

  • Command:

    • set -o pipefail; sudo timeout 5 dd if=/sys/bus/pci/devices/0001:11:00.0/resource2 bs=4 count=1 status=none of=/dev/null; echo "rc=$?"

  • Result (consistent):

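    • dd: error reading '/sys/bus/pci/devices/0001:11:00.0/resource2': Input/output error

    • rc=1

    • Caveat: the kernel's sysfs-pci documentation describes the resourceN files as mmap-only for memory BARs (read/write is implemented only for I/O port regions), so a read() through dd returns EIO here regardless of device health. This test alone does not show that BAR2 is inaccessible.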

  • Pipeline check (to avoid masked exit codes):

    • sudo timeout 5 dd if=.../resource2 bs=4 count=1 status=none | hexdump -C; echo "dd_rc=${PIPESTATUS[0]} hexdump_rc=${PIPESTATUS[1]}"

    • Output: dd_rc=1 hexdump_rc=0
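Whether a BAR is a memory or an I/O port resource affects how sysfs exposes it, so it is worth confirming before testing access. A small sketch that classifies each entry of the per-device resource file by its flags column. The helper names are my own, and 0x100 / 0x200 are the IORESOURCE_IO / IORESOURCE_MEM bits from the kernel's include/linux/ioport.h.

```bash
# Classify a resource by its flags word (third column of
# /sys/bus/pci/devices/<addr>/resource, one line per resource).
bar_kind() {
    flags=$(( $1 ))
    if [ $(( flags & 0x100 )) -ne 0 ]; then
        echo "io"        # IORESOURCE_IO
    elif [ $(( flags & 0x200 )) -ne 0 ]; then
        echo "mem"       # IORESOURCE_MEM
    else
        echo "unused"
    fi
}

# Print the kind of every entry; the argument is the sysfs device directory.
list_bar_kinds() {
    n=0
    while read -r start end flags; do
        echo "entry $n: $(bar_kind "$flags")"
        n=$(( n + 1 ))
    done < "$1/resource"
}

# Usage on the host:
#   list_bar_kinds /sys/bus/pci/devices/0001:11:00.0
```

Entries 0 to 5 correspond to BAR0 to BAR5. Later lines cover the expansion ROM and other resources.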

Attempts performed and outcomes

  • Disable ASPM:

    • Kernel params: pcie_aspm=off pcie_port_pm=off pcie_hp=nomsi in extlinux; verified ASPM disabled in LnkCtl.

    • Runtime writes: setpci cleared ASPM bits on both upstream and endpoint.

    • Outcome: link speed remains 8GT/s x2, LnkSta still shows DLActive-, BAR2 reads still EIO.

  • Force D0 and tune payload/read request:

    • setpci -s 0001:11:00.0 CAP_PM+4.w=0000

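    • setpci -s 0001:11:00.0 CAP_EXP+0x08.w=0x0028 (intended: MPS=128, MRRS=256; note that 0x0028 sets bits 3 and 5 of Device Control, which decodes as MPS=256 with MRRS=128, since MPS is bits 7:5 and MRRS is bits 14:12)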

    • Outcome: No change in BAR2 behavior.

  • Hot reset and rescan:

    • Driver unbind: echo 0001:11:00.0 > /sys/bus/pci/devices/.../driver/unbind

    • SBR via upstream port Bridge Control bit 6: assert, wait, deassert; rescan.

    • At one point, lspci showed:

      • “!!! Unknown header type 7f” for 0001:11:00.0 after SBR.

    • Recovery via bus rescan or reboot restored enumeration; BAR2 still EIO, DLActive negative.

  • Force Gen1 and retrain:

    • Endpoint Link Control 2: setpci -s 0001:11:00.0 0xA0.w=0001 (Target Link Speed 2.5GT/s)

    • Retrain: setpci -s 0001:11:00.0 0x80.w=0020

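    • Outcome: LnkCtl2 shows Target Link Speed: 2.5GT/s, but LnkSta remains at 8GT/s x2; BAR2 still EIO. (The Retrain Link bit is defined for downstream ports and reserved on endpoints, so the retrain would have to be issued on the upstream bridge 0001:10:00.0 to take effect.)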

  • Memory limit to <4GB (test):

    • Kernel param added: mem=4G; post-boot free -h reports ~3.8Gi total.

    • Outcome: BAR2 still EIO; DLActive remains negative.
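For the MPS/MRRS step, it helps to decode the Device Control word before and after writing it, because both sizes are encoded as 128 << field rather than as byte counts. The sketch below uses my own helper names and one hypothetical register value. It also builds a new word that changes only those two fields, since writing a whole word replaces unrelated bits such as the error reporting enables.

```bash
# Decode MPS and MRRS from a raw Device Control word (PCIe capability
# offset 0x08). Max_Payload_Size is bits 7:5, Max_Read_Request_Size is
# bits 14:12; each encodes a size of 128 << field bytes.
decode_devctl() {
    val=$(( $1 ))
    echo "MPS=$(( 128 << ((val >> 5) & 7) )) MRRS=$(( 128 << ((val >> 12) & 7) ))"
}

# Build a new Device Control word with only the MPS/MRRS fields changed,
# keeping every other bit of the current value (read-modify-write).
# Arguments: current word, MPS code, MRRS code (0=128, 1=256, 2=512, ...).
devctl_set_sizes() {
    cur=$(( $1 ))
    printf '0x%04x\n' "$(( (cur & ~0x70e0 & 0xffff) | ($2 << 5) | ($3 << 12) ))"
}

decode_devctl 0x0028            # the value written in the attempt above
devctl_set_sizes 0x2810 0 1     # 0x2810 is a hypothetical current value
```

The first call prints MPS=256 MRRS=128 and the second prints 0x1810, which decodes as MPS=128 MRRS=256. As far as I know, setpci's bits:mask form (for example CAP_EXP+0x08.w=1000:7000) achieves the same masked update in one command.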

Additional logs around probe

  • dmesg during probe:

    • “pci 0001:11:00.0: Found target device: TRITON_OMEGA_DEVICE_ID”

    • “axl 0001:11:00.0: enabling device (0000 -> 0002)”

    • “axl 0001:11:00.0: MSI registered 32 (32)”

    • “axl 0001:11:00.0: Data Link Layer Link Active Reporting capability”

    • “axl 0001:11:00.0: Register directory 0001:11:00.0”

Current status summary

  • Device consistently enumerates with correct VID:DID and BAR assignments under the 48MB bridge window.

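  • Link consistently trains to Gen3 x2 and equalization completes, but LnkSta on the endpoint shows DLActive-. (On an endpoint this bit is typically hardwired to 0, since link-active reporting is a downstream-port feature, so the upstream port 0001:10:00.0 is the more telling place to check it.)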

  • All reads from BAR2 via sysfs return EIO, including single dword reads; dd exit code is 1.

  • Disabling ASPM, forcing D0, tuning MPS/MRRS, forcing Gen1 and retraining, performing SBR/rescan, and limiting system memory to 4GB did not change the behavior.

Request

  • Please help me get this working in my setup.

Thank you.

Update: Solution Found - Required OS/Installation Method Change

TL;DR

The original PCIe/driver issues could not be resolved on Debian Bookworm (6.1.43-15-rk2312) with the containerized SDK. Most probably the metis-dkms installation is broken for this combination. There may also be a problem with the containerized SDK on this kernel, or in general.

Working solution:

  1. Flash Ubuntu 22.04 image from https://joshua-riek.github.io/ubuntu-rockchip-download/boards/rock-5b-plus.html
  2. Perform native (non-container) SDK installation directly on host
  3. Fix GStreamer plugin build issue (see separate post below)

What Didn't Work

Despite extensive troubleshooting (ASPM disable, link retraining, SBR, memory limits, etc.), the combination of:

  • Debian Bookworm kernel (6.1.43-15-rk2312)
  • Containerized Voyager SDK (Ubuntu 22.04 container on Debian host)

resulted in persistent BAR2 read failures (EIO) and DLActive negative status that couldn't be resolved through PCIe configuration changes.

What Worked

Switched to joshua-riek's Ubuntu 22.04 image (purpose-built for Rock 5B Plus) and installed Voyager SDK natively on the host (no container).

Device now enumerates correctly, driver binds, and BAR access works.

Additional Issue Encountered

After getting the driver working, I hit a different problem: the GStreamer plugins (axinplace, etc.) failed to build on ARM64 due to an undocumented dependency.

The solution is documented in the post titled: Missing GStreamer Plugins on ARM64/RK3588 - Root Cause & Solution

Short version: Install librga-dev before building:

bash

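# Install the Rockchip RGA development package the plugins depend on
sudo apt-get install librga-dev
# Remove the stale CMake cache so the new dependency is picked up
cd ~/voyager-sdk/operators
rm -rf CMakeCache.txt CMakeFiles/
# Reconfigure and rebuild
cmake . && make -j"$(nproc)"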