Environment
- SoC/Board: Rockchip RK3588, Radxa ROCK 5B Plus (it has two M.2 slots)
- OS/Kernel: Debian bookworm, 6.1.43-15-rk2312 (aarch64)
- RAM: 16GB (also tested with the mem=4G kernel parameter; total reported ~3.8Gi)
- Device: Axelera Metis PCIe accelerator, PCI ID 1f9d:1100 (shows as “Axelera AI Metis AIPU”)
- PCIe topology: Endpoint at 0001:11:00.0 behind upstream bridge 0001:10:00.0 (Gen3 x2)
- Drivers: axl/metis-dkms 1.2.3; “Kernel driver in use: axl” shown; /dev/metis-1:11:0 device node present when the driver is bound
- Container: Voyager SDK tested in a privileged Ubuntu 22.04 container; device access tests moved back to the host after hangs were observed inside the container
Device-tree overlay (added to expand ranges window to 48MB)
Reason: The default 32-bit non-prefetchable MEM window under fe160000.pcie was ~14MB, but Metis BAR2 requests 32MB (plus headroom), so I moved the window to a 48MB region at 0xf5000000 that appears to be unused. The overlay below is applied via extlinux overlays:
/dts-v1/;
/plugin/;
/ {
    compatible = "rockchip,rk3588";
    fragment@0 {
        target-path = "/pcie@fe160000";
        __overlay__ {
            #address-cells = <3>;
            #size-cells = <2>;
            ranges = <0x81000000 0x0 0xf1100000 0x0 0xf1100000 0x0 0x00100000>,
                     <0x82000000 0x0 0xf5000000 0x0 0xf5000000 0x0 0x03000000>,
                     <0xc3000000 0x9 0x40000000 0x9 0x40000000 0x0 0x40000000>;
        };
    };
};
Boot log confirms the active window on fe160000.pcie:
- “MEM 0x00f5000000..0x00f7ffffff -> 0x00f5000000”
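As a quick sanity check on the overlay, the 32-bit MEM entry in ranges can be turned back into the window that the boot log prints. This is only a minimal sketch in plain shell arithmetic; the function name is made up for illustration, and the two values are copied from the overlay.

```shell
#!/bin/sh
# Hypothetical helper: print the inclusive window and its size in MiB
# for a ranges entry given as <start> <size> (hex literals accepted).
mem_window() {
    start=$(( $1 ))
    size=$(( $2 ))
    end=$(( start + size - 1 ))
    printf '0x%010x..0x%010x (%d MiB)\n' "$start" "$end" $(( size / 1048576 ))
}

# 32-bit non-prefetchable entry from the overlay: base 0xf5000000, size 0x03000000
mem_window 0xf5000000 0x03000000
```

If the printed range differs from the "MEM ..." line in the boot log, the overlay is probably not the one being applied.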
BARs and bridge window as assigned by kernel
- lspci -vv -s 0001:11:00.0 shows:
  - Region 0: Memory at f5010000 (64-bit, non-prefetchable) [size=4K]
  - Region 2: Memory at f6000000 (32-bit, non-prefetchable) [size=32M]
  - Expansion ROM at f5000000 [disabled] [size=64K]
- dmesg shows for 0001:10:00.0:
  - “Bridge window: [mem 0xf5000000-0xf7ffffff]”
  - “Decoded memory behind bridge: f5000000-f7ffffff”
  - “Memory behind bridge is sufficient. Skipping reset.”
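The assignments above can be cross-checked mechanically. The sketch below tests whether each region reported by lspci lies inside the bridge window from dmesg; the helper name is hypothetical, and the sizes are the lspci values (4K, 32M, 64K) written out in hex.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the region [base, base+size) lies inside
# the inclusive window [win_start, win_end].
fits_in_window() {
    base=$(( $1 )); size=$(( $2 ))
    win_start=$(( $3 )); win_end=$(( $4 ))
    [ "$base" -ge "$win_start" ] && [ $(( base + size - 1 )) -le "$win_end" ]
}

WIN_START=0xf5000000
WIN_END=0xf7ffffff

# name, base, size as reported by lspci for 0001:11:00.0
while read -r name base size; do
    if fits_in_window "$base" "$size" "$WIN_START" "$WIN_END"; then
        echo "$name: inside window"
    else
        echo "$name: OUTSIDE window"
    fi
done <<'EOF'
BAR0 0xf5010000 0x1000
BAR2 0xf6000000 0x2000000
ROM 0xf5000000 0x10000
EOF
```

With these numbers BAR2 ends exactly at 0xf7ffffff, the top of the 48MB window, so the placement looks consistent and the address assignment itself is unlikely to be the problem.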
Link status and capability excerpts
- lspci -vv -s 0001:11:00.0 (stable across reboots):
  - LnkSta: Speed 8GT/s, Width x2 (downgraded), DLActive-
  - LnkSta2: EqualizationComplete+, Phase1+, Phase2+, Phase3+
  - DevCtl: MaxPayload 128, MaxReadReq typically 512 (also tested set to 256)
  - MSI: Enable+ Count=32/32 Maskable+ 64bit+
  - AER: present; “All AER errors masked” printed by axl during probe
- dmesg (earlier runs):
  - “pcieport 0001:10:00.0: Data Link Layer Link Active not set in 1000 msec”
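To compare both ends of the link without relying on lspci's formatting, the raw Link Status word can be decoded by hand. The sketch below assumes the standard register layout (speed in bits 3:0, width in bits 9:4, Data Link Layer Link Active in bit 13); the function name is hypothetical and the sample value is illustrative, not read from this board. As far as I understand the PCIe specification, DLActive reporting is defined for downstream ports, so DLActive- on the endpoint may be normal on its own, and the value on the upstream port 0001:10:00.0 is likely the more informative one.

```shell
#!/bin/sh
# Hypothetical decoder for a raw 16-bit Link Status value, for example the
# output of: setpci -s <BDF> CAP_EXP+0x12.w   (hex digits without 0x prefix)
decode_lnksta() {
    v=$(( 0x$1 ))
    speed=$(( v & 0xf ))            # bits 3:0, Current Link Speed
    width=$(( (v >> 4) & 0x3f ))    # bits 9:4, Negotiated Link Width
    dlactive=$(( (v >> 13) & 1 ))   # bit 13, Data Link Layer Link Active
    case $speed in
        1) gts=2.5 ;; 2) gts=5 ;; 3) gts=8 ;; 4) gts=16 ;; 5) gts=32 ;; *) gts="?" ;;
    esac
    echo "speed=${gts}GT/s width=x${width} dlactive=${dlactive}"
}

decode_lnksta 0023   # illustrative value only, not read from the board
```

Running it on the value from both 0001:10:00.0 and 0001:11:00.0 gives a side-by-side view of what each port believes about the link.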
Minimal reproducible read failure (BAR2)
- Command:
  - set -o pipefail; sudo timeout 5 dd if=/sys/bus/pci/devices/0001:11:00.0/resource2 bs=4 count=1 status=none of=/dev/null; echo "rc=$?"
- Result (consistent):
  - dd: error reading '/sys/bus/pci/devices/0001:11:00.0/resource2': Input/output error
  - rc=1
- Pipeline check (to avoid masked exit codes):
  - sudo timeout 5 dd if=.../resource2 bs=4 count=1 status=none | hexdump -C; echo "dd_rc=${PIPESTATUS[0]} hexdump_rc=${PIPESTATUS[1]}"
  - Output: dd_rc=1 hexdump_rc=0
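One thing that may be worth ruling out before treating this EIO as a hardware symptom: as far as I understand the kernel's sysfs-pci documentation, resourceN files for memory BARs support mmap only, and read()/write() are implemented just for I/O port BARs. If that applies here, dd would return EIO even on a healthy device. The sketch below is one possible alternative under stated assumptions: it pulls the BAR2 physical range out of the sysfs resource file and prints (does not run) a devmem-style command. The helper name is hypothetical, and both busybox devmem being installed and /dev/mem being usable on this kernel are assumptions. A real MMIO read can also hang or fault if the link is genuinely down.

```shell
#!/bin/sh
# Hypothetical helper: print "start end" for BAR <index> from a sysfs
# 'resource' file, which holds one "start end flags" line per resource.
bar_range() {
    sed -n "$(( $2 + 1 ))p" "$1" | awk '{ print $1, $2 }'
}

RES=/sys/bus/pci/devices/0001:11:00.0/resource
if [ -r "$RES" ]; then
    set -- $(bar_range "$RES" 2)
    echo "BAR2 spans $1..$2"
    # Printed only; run it by hand once the address looks right.
    echo "sudo busybox devmem $1 32"
fi
```

An mmap of resource2 from a small program would be the other way to do the same read without going through /dev/mem.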
Attempts performed and outcomes
- Disable ASPM:
  - Kernel params: pcie_aspm=off pcie_port_pm=off pcie_hp=nomsi in extlinux; verified ASPM disabled in LnkCtl.
  - Runtime writes: setpci cleared ASPM bits on both the upstream port and the endpoint.
  - Outcome: link remains at 8GT/s x2, DLActive remains clear (DLActive-), BAR2 reads still return EIO.
- Force D0 and tune payload/read request:
  - setpci -s 0001:11:00.0 CAP_PM+4.w=0000
  - setpci -s 0001:11:00.0 CAP_EXP+0x08.w=0x0028 (intended: MPS=128, MRRS=256)
  - Outcome: No change in BAR2 behavior.
- Hot reset and rescan:
  - Driver unbind: echo 0001:11:00.0 > /sys/bus/pci/devices/.../driver/unbind
  - SBR via upstream port Bridge Control bit 6: assert, wait, deassert; rescan.
  - At one point, lspci showed:
    - “!!! Unknown header type 7f” for 0001:11:00.0 after SBR.
  - Recovery via bus rescan or reboot restored enumeration; BAR2 reads still return EIO, DLActive still clear.
- Force Gen1 and retrain:
  - Endpoint Link Control 2: setpci -s 0001:11:00.0 0xA0.w=0001 (Target Link Speed 2.5GT/s)
  - Retrain: setpci -s 0001:11:00.0 0x80.w=0020
  - Outcome: LnkCtl2 shows Target Link Speed: 2.5GT/s, but LnkSta remains at Speed 8GT/s x2; BAR2 still EIO.
- Memory limit to <4GB (test):
  - Kernel param added: mem=4G; after boot, free -h reports ~3.8Gi total.
  - Outcome: BAR2 still EIO; DLActive remains clear.
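Two of the setpci steps above may not have done what was intended, so here is a hedged sketch for double-checking them. First, decoding 0x0028 with the standard Device Control layout (Max_Payload_Size in bits 7:5, Max_Read_Request_Size in bits 14:12) does not seem to give MPS=128, MRRS=256. Second, as far as I understand the PCIe specification, the Retrain Link bit in Link Control is reserved on endpoints, so a retrain would normally be requested on the downstream port, here assumed to be 0001:10:00.0. The function names are hypothetical, and the retrain commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical decoder for a raw 16-bit Device Control value (hex, no 0x).
# Field encodings 0..5 map to 128..4096 bytes; 6 and 7 are reserved.
decode_devctl() {
    v=$(( 0x$1 ))
    mps=$(( 128 << ((v >> 5) & 7) ))     # bits 7:5, Max_Payload_Size
    mrrs=$(( 128 << ((v >> 12) & 7) ))   # bits 14:12, Max_Read_Request_Size
    echo "MPS=${mps} MRRS=${mrrs}"
}

decode_devctl 0028   # the value written in the Force D0 attempt
decode_devctl 1000   # MPS=128, MRRS=256 with all enable bits left clear

# Print (do not run) a Gen1 retrain sequence aimed at a downstream port.
# setpci's value:mask form changes only the bits set in the mask.
retrain_cmds() {
    port=$1
    echo "setpci -s $port CAP_EXP+0x30.w=0001:000f"   # Link Control 2: Target Link Speed = 2.5GT/s
    echo "setpci -s $port CAP_EXP+0x10.w=0020:0020"   # Link Control: Retrain Link (bit 5)
}

retrain_cmds 0001:10:00.0
```

Using CAP_EXP-relative offsets and masked writes avoids depending on the capability sitting at a fixed address and avoids clobbering the other bits in each register.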
Additional logs around probe
- dmesg during probe:
  - “pci 0001:11:00.0: Found target device: TRITON_OMEGA_DEVICE_ID”
  - “axl 0001:11:00.0: enabling device (0000 -> 0002)”
  - “axl 0001:11:00.0: MSI registered 32 (32)”
  - “axl 0001:11:00.0: Data Link Layer Link Active Reporting capability”
  - “axl 0001:11:00.0: Register directory 0001:11:00.0”
Current status summary
- Device consistently enumerates with the correct VID:DID, and its BARs are assigned inside the 48MB bridge window.
- Link consistently trains to Gen3 x2 and equalization completes, but LnkSta shows DLActive- (not set).
- All reads from BAR2 via sysfs return EIO, including single dword reads; dd exit code is 1.
- Disabling ASPM, forcing D0, tuning MPS/MRRS, forcing Gen1 and retraining, performing SBR/rescan, and limiting system memory to 4GB did not change the behavior.
Request
- Please help me get the Metis working in this setup.
Thank you.