Dear all,
I am trying to get the Metis PCIe card working on my current PC, which has the following configuration:
- mainboard: Asus Prime X570-P
- CPU: AMD Ryzen 9 3900X
- OS: Ubuntu 24.04.2 LTS, 6.8.0-58-generic x86_64
The first issue is that a warm reset does not seem to work for me: if I reboot without power-cycling the system, the Metis device does not show up in lspci, whereas after a full power-down (with the PSU physically switched off) it is there. This is reproducible and, in my experience, points to either my mainboard BIOS issuing a wrong reset sequence to the PCIe slot, or the device itself not performing a full reset on warm boot. Since I know how to get the card detected, this is only an annoyance, but I saw that another user with an AMD64 system in the forum has similar issues, so the problem might go deeper.
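To compare a warm reboot with a cold boot without reading the lspci output by hand each time, a small check along these lines could be run after every boot. It is only a sketch: it assumes the standard Linux sysfs layout under /sys/bus/pci/devices, it takes the vendor ID 1f9d from the kernel log quoted further down, and the function name find_pci_devices is just illustrative.

```python
from pathlib import Path

# Vendor ID as it appears in the kernel log below ([1f9d:1100]).
AXELERA_VENDOR_ID = "0x1f9d"


def find_pci_devices(vendor_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses of all devices whose vendor ID matches."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    matches = []
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
        except OSError:
            continue  # attribute missing or unreadable, skip this entry
        if vendor == vendor_id.lower():
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    found = find_pci_devices(AXELERA_VENDOR_ID)
    if found:
        print("Metis visible at:", ", ".join(found))
    else:
        print("Metis not enumerated on the PCI bus")
```

Running this once after a warm reboot and once after a full power cycle should show the same difference I see in lspci.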
This leads to the second issue I am observing. I have already filed a support request for it, but realized that this would be the better place to discuss it.
According to David M., the way to check whether the card is properly installed at the hardware and driver level is to run
axdevice --pcie-rescan && axdevice --reload-firmware
within the Docker environment, which on my side fails with
(venv) root@zefir-PC:~/voyager-sdk# axdevice --pcie-rescan
0000:05:00.0 : Axelera AI Metis AIPU (rev 02)
WARNING:axelera.runtime:4PCI device count mismatch: lspci=1, triton=0
While this failure happens, the kernel log shows
[ 131.149382] pci_bus 0000:05: busn_res: [bus 05] is released
[ 133.174548] pci 0000:03:02.0: [1022:57a3] type 01 class 0x060400 PCIe Switch Downstream Port
[ 133.174598] pci 0000:03:02.0: PCI bridge to [bus 05]
[ 133.174981] pci 0000:03:02.0: PME# supported from D0 D3hot D3cold
[ 133.175833] pci 0000:05:00.0: [1f9d:1100] type 00 class 0x120000 PCIe Endpoint
[ 133.175898] pci 0000:05:00.0: BAR 0 [mem 0xfa010000-0xfa010fff 64bit]
[ 133.175902] pci 0000:05:00.0: BAR 2 [mem 0xf8000000-0xf9ffffff]
[ 133.175911] pci 0000:05:00.0: ROM [mem 0xfa000000-0xfa00ffff pref]
[ 133.176025] pci 0000:05:00.0: supports D1
[ 133.176027] pci 0000:05:00.0: PME# supported from D0 D1 D3hot
[ 133.176281] pci 0000:03:02.0: PCI bridge to [bus 05]
[ 133.176408] pci 0000:03:02.0: bridge window [mem size 0x03000000]: can't assign; no space
[ 133.176411] pci 0000:03:02.0: bridge window [mem size 0x03000000]: failed to assign
[ 133.176415] pci 0000:05:00.0: BAR 2 [mem size 0x02000000]: can't assign; no space
[ 133.176417] pci 0000:05:00.0: BAR 2 [mem size 0x02000000]: failed to assign
[ 133.176419] pci 0000:05:00.0: ROM [mem size 0x00010000 pref]: can't assign; no space
[ 133.176421] pci 0000:05:00.0: ROM [mem size 0x00010000 pref]: failed to assign
[ 133.176423] pci 0000:05:00.0: BAR 0 [mem size 0x00001000 64bit]: can't assign; no space
[ 133.176425] pci 0000:05:00.0: BAR 0 [mem size 0x00001000 64bit]: failed to assign
[ 133.176427] pci 0000:03:02.0: PCI bridge to [bus 05]
[ 133.176897] axl 0000:05:00.0: Failed to request resources
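To make the log above easier to read, the failed assignments can be pulled out and summed per device. The following is a minimal sketch, not part of the SDK: the regular expression is written only against the "can't assign; no space" lines quoted above, and the names failed_assignments and requested_bytes are made up for illustration.

```python
import re

# Matches kernel lines of the form quoted above, e.g.
#   pci 0000:05:00.0: BAR 2 [mem size 0x02000000]: can't assign; no space
NO_SPACE = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): (?P<res>[^\[\n]+?) "
    r"\[mem size (?P<size>0x[0-9a-f]+)[^\]]*\]: can't assign; no space"
)


def failed_assignments(log_text):
    """Return (device, resource, size_in_bytes) for each failed assignment."""
    return [
        (m["dev"], m["res"], int(m["size"], 16))
        for m in NO_SPACE.finditer(log_text)
    ]


def requested_bytes(rows, device):
    """Sum the sizes of all failed resources belonging to one device."""
    return sum(size for dev, _res, size in rows if dev == device)


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[ 133.176408] pci 0000:03:02.0: bridge window [mem size 0x03000000]: can't assign; no space
[ 133.176415] pci 0000:05:00.0: BAR 2 [mem size 0x02000000]: can't assign; no space
[ 133.176419] pci 0000:05:00.0: ROM [mem size 0x00010000 pref]: can't assign; no space
[ 133.176423] pci 0000:05:00.0: BAR 0 [mem size 0x00001000 64bit]: can't assign; no space
"""

rows = failed_assignments(SAMPLE)
for dev, res, size in rows:
    print(f"{dev} {res}: {size:#x} bytes")
print(f"endpoint total: {requested_bytes(rows, '0000:05:00.0'):#x} bytes")
```

For the full picture, the output of dmesg can be passed to failed_assignments instead of SAMPLE. The sketch only reports what the kernel could not place; it does not explain why the bridge window at 0000:03:02.0 has no space after the rescan.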
Note that I tried all the suggestions proposed in the other thread (amd_iommu=off intel_iommu=off pcie_aspm=off), and none of them made a difference. Also, the failure message above (4PCI device count mismatch: lspci=1, triton=0) appears with every tool in the SDK that tries to use the device.
Any suggestions?


