Hi,
Bit of a weird issue with the M.2 Evaluation System (SBC model AIB-MR1B-A1), and I’m worried it could be a hardware fault. I somehow got into a state where I would see:
AXR_ERROR_CONNECTION_ERROR: No target device found in lspci output
lspci didn’t show the accelerator card, but instead listed “Non-VGA unclassified device”:
00:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3588 (rev 01)
01:00.0 Non-VGA unclassified device: Synopsys, Inc. DWC_usb3 / PCIe bridge
dmesg showed:
[Tue Sep 2 19:51:14 2025] rk-pcie fe170000.pcie: PCIe Linking... LTSSM is 0x3
[Tue Sep 2 19:51:16 2025] rk-pcie fe170000.pcie: PCIe Link Fail
[Tue Sep 2 19:51:16 2025] rk-pcie fe170000.pcie: failed to initialize host
After trying a few things, I resorted to re-imaging the SBC. Then, on the first log-in from adb shell, I still saw Non-VGA unclassified device, but after a reboot, SSH’ing into the Eval System, I saw it as:
00:00.0 PCI bridge: Rockchip Electronics Co., Ltd Device 3588 (rev 01)
01:00.0 Processing accelerators: Device 1f9d:1100
And then after installing the Voyager SDK:
00:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3588 (rev 01)
01:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
Incidentally I always shut down using the sudo poweroff command, i.e. never an uncontrolled shutdown. All fine, but again today when I came to use it, I saw Non-VGA unclassified device. I rebooted, and lspci showed nothing at all.
root@aetina:~# lspci
root@aetina:~# sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
root@aetina:~# lspci -nn
root@aetina:~# lspci
root@aetina:~# dmesg | grep -iE 'rk-pcie|pcie link|nvme'
[    1.810292] rk-pcie fe150000.pcie: invalid prsnt-gpios property in node
[    1.810321] rk-pcie fe170000.pcie: invalid prsnt-gpios property in node
[    1.815781] rk-pcie fe170000.pcie: missing legacy IRQ resource
[    1.815800] rk-pcie fe170000.pcie: IRQ msi not found
[    1.815811] rk-pcie fe170000.pcie: use outband MSI support
[    1.815819] rk-pcie fe170000.pcie: Missing *config* reg space
[    1.815832] rk-pcie fe170000.pcie: host bridge /pcie@fe170000 ranges:
[    1.815854] rk-pcie fe170000.pcie: err 0x00f2000000..0x00f20fffff -> 0x00f2000000
[    1.815870] rk-pcie fe170000.pcie: IO 0x00f2100000..0x00f21fffff -> 0x00f2100000
[    1.815886] rk-pcie fe170000.pcie: MEM 0x00f2200000..0x00f2ffffff -> 0x00f2200000
[    1.815898] rk-pcie fe170000.pcie: MEM 0x0980000000..0x09bfffffff -> 0x0980000000
[    1.815931] rk-pcie fe170000.pcie: Missing *config* reg space
[    1.815959] rk-pcie fe170000.pcie: invalid resource
[    1.826810] rk-pcie fe150000.pcie: missing legacy IRQ resource
[    1.826836] rk-pcie fe150000.pcie: IRQ msi not found
[    1.826845] rk-pcie fe150000.pcie: use outband MSI support
[    1.826865] rk-pcie fe150000.pcie: host bridge /pcie@fe150000 ranges:
[    1.826905] rk-pcie fe150000.pcie: IO 0x00f0100000..0x00f01fffff -> 0x00f0100000
[    1.826939] rk-pcie fe150000.pcie: MEM 0x0900000000..0x091fffffff -> 0x0040000000
[    1.826957] rk-pcie fe150000.pcie: MEM 0x0920000000..0x093fffffff -> 0x0060000000
[    1.827026] rk-pcie fe150000.pcie: invalid resource
[    2.021962] rk-pcie fe170000.pcie: PCIe Linking... LTSSM is 0x3
[    2.031961] rk-pcie fe150000.pcie: PCIe Linking... LTSSM is 0x0
[    2.047510] rk-pcie fe170000.pcie: PCIe Linking... LTSSM is 0x3
[    2.057516] rk-pcie fe150000.pcie: PCIe Linking... LTSSM is 0x0
<truncated similar lines>
[   27.187539] rk-pcie fe150000.pcie: PCIe Linking... LTSSM is 0x1
[   27.207577] rk-pcie fe170000.pcie: PCIe Linking... LTSSM is 0x3
[   27.214288] rk-pcie fe150000.pcie: PCIe Linking... LTSSM is 0x0
[   29.164223] rk-pcie fe170000.pcie: PCIe Link Fail
[   29.164294] rk-pcie fe170000.pcie: failed to initialize host
[   29.170893] rk-pcie fe150000.pcie: PCIe Link Fail
[   29.170958] rk-pcie fe150000.pcie: failed to initialize host
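To compare boots without reading every line, the log above could be condensed per controller. This is only a rough sketch: the function name summarize_link_training is made up for illustration, and the regular expressions assume the rk-pcie message wording stays exactly as it appears in the dmesg output above. It counts the LTSSM values each controller reported and notes whether a "PCIe Link Fail" line followed; it does not attempt to interpret what the LTSSM values mean.

```python
import re
from collections import Counter, defaultdict

# Patterns assume the rk-pcie message format seen in the dmesg output above.
LINE_RE = re.compile(r"rk-pcie (?P<ctrl>\S+?): (?P<msg>.*)")
LTSSM_RE = re.compile(r"PCIe Linking\.\.\. LTSSM is (0x[0-9a-fA-F]+)")


def summarize_link_training(dmesg_text):
    """Return, per controller, the LTSSM values seen and whether the link failed."""
    states = defaultdict(Counter)  # controller -> Counter of LTSSM values
    failed = set()                 # controllers that logged "PCIe Link Fail"
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        ctrl, msg = match.group("ctrl"), match.group("msg")
        ltssm = LTSSM_RE.search(msg)
        if ltssm:
            states[ctrl][ltssm.group(1)] += 1
        elif "PCIe Link Fail" in msg:
            failed.add(ctrl)
    return {
        ctrl: {"ltssm_counts": dict(states[ctrl]), "link_failed": ctrl in failed}
        for ctrl in set(states) | failed
    }
```

Feeding it the text of `dmesg` from a good boot and from a bad boot would make it easier to see whether the two controllers (fe150000.pcie and fe170000.pcie) behave differently between the two.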
After another reboot, I’m back to the Non-VGA unclassified device.
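Since the board has now shown three different states at slot 01:00.0 (nothing listed, "Non-VGA unclassified device", and "Processing accelerators"), logging which one each boot lands in might help spot a pattern. A minimal sketch, assuming the lspci wording stays as quoted in this post; classify_lspci and its return labels are hypothetical names chosen for illustration.

```python
# Substrings taken from the lspci output quoted in this post.
UNCLASSIFIED = "Non-VGA unclassified device"
ACCELERATOR = "Processing accelerators"


def classify_lspci(lspci_text, slot="01:00.0"):
    """Return 'absent', 'unclassified', 'accelerator' or 'other' for the slot."""
    for line in lspci_text.splitlines():
        if line.strip().startswith(slot):
            if UNCLASSIFIED in line:
                return "unclassified"
            if ACCELERATOR in line:
                return "accelerator"
            return "other"
    # No line for the slot at all, e.g. when lspci prints nothing.
    return "absent"
```

The input can be the captured output of `lspci`, for example via `subprocess.run(["lspci"], capture_output=True, text=True).stdout`, appended to a file together with a timestamp after each boot.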
I can try re-imaging the system again, but I’m concerned the same thing will recur unless I figure out what went wrong.
I also tried retrieving the live device-tree using sudo dtc -I dtb -O dts -o live.dts /sys/firmware/fdt and it is attached.
Has anyone seen a similar issue? Any debugging steps I should take? If it is, or is likely to be, an SBC hardware issue, then I’ll try to purchase another SBC or Eval System, but I would like to be fairly sure that it is hardware-related before I go down that route.
Incidentally I have tried to re-seat the accelerator card, but it made no difference. I was fairly sure that couldn’t have been an issue anyway, since the board is protected in a cover with just fan holes (no dust or knocks possible to unseat or affect the connections), but figured it was worth a try.
Many thanks!