Hi all,

I am trying to enable the use of the Metis PCIe rev02 on RISC-V hosts, in particular the SiFive P550. So far I have (at least I think so) successfully installed the drivers, and the device is correctly identified and mapped in the system. I attach the logs just to be sure:

dmesg (after rescan):

[  858.500785] pci 0000:01:00.0: [1f9d:1100] type 00 class 0x120000
[  858.500843] pci 0000:01:00.0: reg 0x10: [mem 0x04380000-0x04380fff 64bit]
[  858.500865] pci 0000:01:00.0: reg 0x18: [mem 0x08000000-0x09ffffff]
[  858.500922] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[  858.500943] pci 0000:01:00.0: Max Payload Size set to 512 (was 128, max 512)
[  858.501115] pci 0000:01:00.0: supports D1
[  858.501121] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
[  858.516166] pci_bus 0000:01: busn_res: [bus 01] end is updated to 01
[  858.516197] pcieport 0000:00:00.0: BAR 14: assigned [mem 0x42000000-0x44ffffff]
[  858.516209] pci 0000:01:00.0: BAR 2: assigned [mem 0x42000000-0x43ffffff]
[  858.516222] pci 0000:01:00.0: BAR 6: assigned [mem 0x44000000-0x4400ffff pref]
[  858.516229] pci 0000:01:00.0: BAR 0: assigned [mem 0x44010000-0x44010fff 64bit]
[  858.518612] metis: loading out-of-tree module taints kernel.
[  858.518630] metis: module verification failed: signature and/or required key missing - tainting kernel
[  858.521107] pci 0000:01:00.0: Found target device: TRITON_OMEGA_DEVICE_ID
[  858.521116] pci 0000:01:00.0: Found target device: 0000:01:00.0
[  858.521123] pcieport 0000:00:00.0: Found bridge device: 0000:00:00.0
[  858.521130] Invalid memory base and limit values: base=0xfff00000, limit=0x0
[  858.528281] axl: Bridge not reset becuse of a previously reported error: 4294967274
[  858.528286] axl: This is not fatal and is normal for passtrough devices
[  858.528290] axl: The module will continue to load without attempting bridge reset
[  858.528371] triton: root directory for triton
[  858.532286] axl 0000:01:00.0: Adding to iommu group 5
[  858.532379] axl 0000:01:00.0: enabling device (0000 -> 0002)
[  858.533898] axl 0000:01:00.0: MSI registered 32 (1)
[  858.533909] axl 0000:01:00.0: irq vec number 92
[  858.534011] axl 0000:01:00.0: Init irq handler for single msi
[  858.541250] axl 0000:01:00.0: Data Link Layer Link Active Reporting capability
[  858.541445] axl 0000:01:00.0: Register directory 0000:01:000
[  858.541598] Triton Linux Driver, version 0.07.16, init OK

 

lspci

01:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
        Subsystem: Axelera AI Metis AIPU (rev 02)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 92
        IOMMU group: 5
        Region 0: Memory at 44010000 (64-bit, non-prefetchable) [size=4K]
        Region 2: Memory at 42000000 (32-bit, non-prefetchable) [size=32M]
        Expansion ROM at 44000000 [virtual] [disabled] [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: axl
        Kernel modules: metis

 

After this I could compile libuio and use “triton_multi_ctx” to read the different characteristics of the device.

 ./triton_multi_ctx --fwver
[libdmabuf.c:1860] Device 0: metis-0:1:0
[libdmabuf.c:1281] edma init
[libdmabuf.c:1330] msg_tv_sec = 1, krn_tv_sec = 120, dma_tv_sec = 5
[libdmabuf.c:1725] uio_dev_msg msg_tv_sec 1
[libdmabuf.c:1664] Sending device command: opcode=111 size=0 msg=[00 ]
[libdmabuf.c:617] Wait MSG 12 (timeout 1sec)
[libdmabuf.c:542] Wait MSI 12 (timeout 1sec)
[libdmabuf.c:550] Wait MSI 12 DONE
[libdmabuf.c:1664] Received device response: status=1 size=21 msg=[76 31 2e 32 2e 30 2d 72 63 32 2b 62 6c 31 2d 73 74 61 67 65 30 00 ]
Firmware version: v1.2.0-rc2+bl1-stage0
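
As a quick sanity check that the message channel is returning intact data, the payload bytes in the "Received device response" line can be decoded by hand. This is only a small sketch: it assumes the payload is a NUL-terminated ASCII string, which is a guess based on the trailing 00 byte, and the helper name decode_payload is made up for illustration.

```python
# Payload bytes copied from the "Received device response" log line.
payload_hex = "76 31 2e 32 2e 30 2d 72 63 32 2b 62 6c 31 2d 73 74 61 67 65 30 00"


def decode_payload(hex_dump: str) -> str:
    """Turn a space-separated hex dump into text, stopping at the first NUL.

    Assumption: the payload is NUL-terminated ASCII (hypothetical helper,
    not part of libdmabuf or any Axelera tool).
    """
    raw = bytes.fromhex(hex_dump)          # fromhex() ignores the spaces
    text_bytes = raw.split(b"\x00", 1)[0]  # drop the terminator and anything after it
    return text_bytes.decode("ascii")


firmware_version = decode_payload(payload_hex)
print(firmware_version)
```

The decoded text matches the "Firmware version" line printed by the tool, and its length matches the size=21 field, so at least the small message path (command out, MSI back, response read) looks consistent.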

Right now I am trying to debug data transfer between the Metis and the host using the provided unit tests. All the initial tests pass, but the data transfer tests fail. The memory read/write test to/from the device using ‘Device’ fails on both writing and reading, while the ‘DmaBuf’ one fails only on the read. I don’t really know the cause. It might be that the device is mapped in a different address range than it expects (I can see 0x80000000 hard-coded in many .sh scripts), whereas the BAR regions are at:

41000000-4fffffff : pcie@0x54000000
  41000000-410fffff : 0000:00:00.0
  42000000-44ffffff : PCI Bus 0000:01
    42000000-43ffffff : 0000:01:00.0
      42000000-43ffffff : triton-0:1:0
    44000000-4400ffff : 0000:01:00.0
    44010000-44010fff : 0000:01:00.0
      44010000-44010fff : triton-0:1:00.0

… (but I can also see this in /proc/iomem):

8000000000-81ffffffff : pcie@0x54000000
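
To make the address comparison concrete, here is a rough sketch that parses the /proc/iomem excerpts above and reports which resources contain a given address. The function names are made up for illustration, and the sample text is just the lines pasted in this post. Note the assumption baked into the check: it treats 0x80000000 as a host physical address. If the value in the scripts is really a device-side address (inside the Metis address space), it would not be expected to show up in /proc/iomem at all.

```python
import re

# Lines copied from the /proc/iomem excerpts in this post.
IOMEM_SAMPLE = """\
41000000-4fffffff : pcie@0x54000000
  41000000-410fffff : 0000:00:00.0
  42000000-44ffffff : PCI Bus 0000:01
    42000000-43ffffff : 0000:01:00.0
      42000000-43ffffff : triton-0:1:0
    44000000-4400ffff : 0000:01:00.0
    44010000-44010fff : 0000:01:00.0
      44010000-44010fff : triton-0:1:00.0
8000000000-81ffffffff : pcie@0x54000000
"""

LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def parse_iomem(text):
    """Return a list of (start, end, name) tuples, one per resource line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3).strip()))
    return entries


def owners_of(addr, entries):
    """Names of every resource whose inclusive range contains addr."""
    return [name for start, end, name in entries if start <= addr <= end]


entries = parse_iomem(IOMEM_SAMPLE)
for addr in (0x80000000, 0x42000000, 0x8000000000):
    print(hex(addr), owners_of(addr, entries))
```

With the excerpt above, 0x80000000 falls in none of the listed PCIe windows, while 0x8000000000 (two more zeros) is the start of the second pcie@0x54000000 window, so it is worth double-checking which of the two the scripts actually mean. On a live system, reading /proc/iomem as root is needed to see real addresses instead of zeros.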

In the “Device” code path, when using DMA transfer, the main error is: [AxeleraDevice.cpp:341] UIO_IOCTL_USR_DMA_XFER failed: Connection timed out. For “DmaBuf” it is: [AxeleraDmaBuf.cpp:262] DMABUF_METIS_WAIT failed: Connection timed out.

The normal method and the mmap method seem to succeed in the transfer, though. I will attach the complete log in case it helps. Some extra notes: this host is known not to have I/O coherency, and the only way of getting DMA to work (on other projects, in my experience) is when the memory is requested via dma_alloc_coherent(), as they patched the page configuration at kernel level with an uncached bit for this purpose. Using normal memory allocation will result in reading the wrong value from the cache. I hope I didn’t add too much info, but it seemed necessary to explain the current state ahhahaha. Any help is more than welcome, as I am not really an expert in this.

Thanks for the detailed post @jjpr! This is really helpful stuff.

From the logs and the tests you’ve shared, the setup looks well advanced, and it’s great that you’ve got the driver loaded, the device identified, and the firmware version retrieved. As a starting point, one useful test might be to check the PCIe communication and firmware status using Axelera’s axdevice tool with the --refresh flag.

source venv/bin/activate
axdevice --refresh

This command:
    •    Rescans and reinitialises any Metis devices.
    •    Reloads the firmware.
    •    Helps clear up inconsistencies between what the OS thinks is available and what’s actually ready for use.

Let’s start with this and see what it reports back. If you could share the result of that test, we can dig deeper into where we go next 👍

