Hey everyone,

 

I’m having trouble getting my Axelera Metis PCIe AI Accelerator to be recognized by `lspci` (and by my system in general). I tested the card on two different systems.

 

On my AMD setup, I am using an AMD 5950X with an ASUS B550-F. I tried installing the card in the slot I normally use for my RTX 3080 as well as in another slot that meets the specifications. In both cases, `lspci` does not list the card, even though the fan spins and I know it is getting power. The Voyager SDK doesn’t recognize the card either.

 

I also tried it on an Intel system with an Intel i5-8500T on a Supermicro X11SCA-F motherboard. The same issue occurs: the card is powered (the fan spins) but not recognized.
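
For anyone repeating this check, here is a minimal sketch of filtering `lspci` output for the card. It assumes the device description contains the string "Axelera", as it does in the log output later in this thread; the helper name `find_axelera_devices` is made up for illustration and is not part of any SDK.

```python
import shutil
import subprocess


def find_axelera_devices(lspci_output: str) -> list[str]:
    """Return the lspci lines whose description mentions Axelera."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "axelera" in line.lower()  # assumption: the description names the vendor
    ]


if __name__ == "__main__":
    if shutil.which("lspci") is None:
        print("lspci not found; install pciutils first")
    else:
        output = subprocess.run(
            ["lspci"], capture_output=True, text=True, check=False
        ).stdout
        matches = find_axelera_devices(output)
        print("\n".join(matches) if matches else "No Axelera device listed by lspci")
```

An empty result only says the card did not enumerate on the PCIe bus; it does not say why.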

 

I noticed that the boot time increases significantly when the accelerator is installed. This makes me think that UEFI might be attempting to detect something, even though no error messages are shown. I have already tried a lot without getting it running, so I would appreciate any ideas on what else I can test.

Could there be a problem with the BIOS on the card, similar to the issues sometimes seen with GPUs (video BIOS)? If so, is there a way to flash or update it? I think I also have the necessary tools to attach to the debug port, if that helps.

 

Any insights or suggestions would be greatly appreciated. Thanks in advance!

Hello Victor,

I followed your instructions and retested the setup. Here are the details for completeness:

  • Kernel Parameters:
    Initially, running:

    cat /proc/cmdline

    produced:

    BOOT_IMAGE=/boot/vmlinuz-6.8.0-57-generic root=UUID=60bd6ac3-6af8-4ba0-9722-935f61fb73a3 ro quiet splash amd_iommu=off vt.handoff=7

    I then noticed that the pcie_aspm parameter was missing, so I updated my configuration. The updated output is now:

    BOOT_IMAGE=/boot/vmlinuz-6.8.0-57-generic root=UUID=60bd6ac3-6af8-4ba0-9722-935f61fb73a3 ro quiet splash amd_iommu=off pcie_aspm=off vt.handoff=7
  • Refresh Command Output:
    With the corrected kernel parameter in place, I executed the refresh command (using axdevice --refresh -v) several times. Here’s a representative output:

    INFO:axelera.runtime.axdevice:Removing 0000:03:00.0
    INFO:axelera.runtime.axdevice:PCIE rescan
    0000:04:00.0 : Axelera AI Metis AIPU (rev 02)
    INFO:axelera.runtime:Found PCI device: 04:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
    INFO:axelera.runtime:Found AIPU driver: metis 90112 0
    WARNING:axelera.runtime:4PCI device count mismatch: lspci=1, triton=0
    Traceback (most recent call last):
      File "/home/tripton/.cache/axelera/venvs/93f45ae3/bin/axdevice", line 8, in <module>
        sys.exit(entrypoint_main())
      File "/home/tripton/.cache/axelera/venvs/93f45ae3/lib/python3.10/site-packages/axelera/runtime/axdevice.py", line 625, in entrypoint_main
        main(args)
      File "/home/tripton/.cache/axelera/venvs/93f45ae3/lib/python3.10/site-packages/axelera/runtime/axdevice.py", line 608, in main
        devices = _find_devices(found_devices, device_id)
      File "/home/tripton/.cache/axelera/venvs/93f45ae3/lib/python3.10/site-packages/axelera/runtime/axdevice.py", line 187, in _find_devices
        raise RuntimeError("No devices found, use -v for more information")
    RuntimeError: No devices found, use -v for more information

The output shows that although the device is being detected by the system (as reflected in the logs), it still isn’t found correctly by the driver, and the device count mismatch remains.
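
In case it helps others following along, here is a small sketch of how the kernel parameter check above could be automated. The function name `missing_kernel_params` is hypothetical, and the required list simply reflects the parameters discussed in this thread for an AMD host; it is not an official requirement list.

```python
from pathlib import Path


def missing_kernel_params(cmdline: str, required: list[str]) -> list[str]:
    """Return the required parameters that are absent from the kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


if __name__ == "__main__":
    # Parameters mentioned in this thread for an AMD host (assumption);
    # on an Intel host the IOMMU switch would be intel_iommu=off instead.
    required = ["pcie_aspm=off", "amd_iommu=off"]
    cmdline_path = Path("/proc/cmdline")  # only exists on Linux
    if cmdline_path.exists():
        missing = missing_kernel_params(cmdline_path.read_text(), required)
        print("missing:", missing if missing else "none")
```

Run against the first `cat /proc/cmdline` output above, this reports `pcie_aspm=off` as missing; against the updated one, nothing is missing.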

Please let me know if you require any additional details or further tests.

Thanks,

Tripton


Hello @tripton,

Just to double-check: did you add pcie_aspm=off to your GRUB configuration, then run sudo update-grub, and then sudo reboot?
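
As a rough illustration of that sequence, the sketch below appends a parameter to the `GRUB_CMDLINE_LINUX_DEFAULT` line. It assumes an Ubuntu-style `/etc/default/grub` where that variable is a double-quoted string; the helper `add_grub_param` is a made-up name, and it only transforms text rather than writing the file.

```python
import re


def add_grub_param(grub_text: str, param: str) -> str:
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT if it is not already there."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def _append(match: re.Match) -> str:
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return pattern.sub(_append, grub_text)


if __name__ == "__main__":
    example = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off"\n'
    print(add_grub_param(example, "pcie_aspm=off"), end="")
```

After saving the edited file (which needs root), sudo update-grub regenerates the boot configuration, and the new parameter should appear in /proc/cmdline after the reboot.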

 

I have some additional requests:

  • Can you tell us which host systems you are testing on?
  • I see you are now testing your AMD host. Can you also try all the suggestions from my previous messages on the Intel host? Note that for Intel the parameter is intel_iommu=off.
  • There should be a serial number on your Metis card. Can you share it with us?

Thank you in advance,

Victor


Hi @tripton,

Do you have any updates on:

  • Can you tell us which host systems you are testing on?
  • I see you are now testing your AMD host. Can you also try all the suggestions from my previous messages on your Intel host? Note that for Intel the parameter is intel_iommu=off.
  • There should be a serial number on your Metis card. Can you share it with us?

@Spanner Please keep track of this post while I am off, if possible (I am back on Wednesday next week). Thanks!


Hello Victor and team,

Sorry for the late response; I’ve been busy with other tasks.

  • Host System Information:
    I’m currently testing on an AMD host with an ASUS B550-F motherboard, an AMD 5950X CPU, and an NVIDIA RTX 3080. The system is still running Ubuntu 22.04. Let me know if you need any additional details about this host.

  • Kernel Parameter Confirmation:
    I did run sudo update-grub after adding pcie_aspm=off and rebooted. I can confirm the parameter is active, as seen in my dmesg output:

    [    0.059275] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.0-57-generic root=UUID=60bd6ac3-6af8-4ba0-9722-935f61fb73a3 ro quiet splash amd_iommu=off pcie_aspm=off vt.handoff=7
  • Intel Host Testing:
    I will test the Intel host next week and share the results with you.

  • Serial Number on the Metis Card:
    I’m not completely sure where the serial number is located. If it’s on the sticker with the barcode, it appears as either “AD-PEG-AM1A” or “A96BD002489.”

Please let me know if any more information or tests are needed. I’ll keep you posted about the Intel system.

Thanks,
Tripton

