Hello Axelera team and community,
We are running a Metis PCIe card (16GB, rev 02) in a Proxmox VE 9.1.9 cluster with VFIO PCIe passthrough into a KVM guest. We understand this is listed as Beta in SDK 1.6, and we want to share detailed diagnostics to help improve it.
Environment:
- Host: Proxmox VE 9.1.9, kernel 7.0.0-3-pve (Linux 6.14), Intel VT-d enabled (intel_iommu=on iommu=pt)
- Metis firmware: flver=1.6.0, bcver=7.4
- DKMS driver: metis-dkms=1.4.16
- SDK: axelera-rt==1.6.0, axelera-runtime==1.6.0
- Guest: Ubuntu 24.04.3, kernel 6.8.0-111-generic
- IOMMU group 22: only the Metis card (isolated, ideal for passthrough)
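For anyone reproducing this, the group isolation can be confirmed from sysfs on the host. Below is a minimal Python sketch, assuming the standard Linux layout /sys/kernel/iommu_groups/<group>/devices/. The function name and the sysfs_root parameter are our own, introduced only for illustration.

```python
from pathlib import Path


def iommu_group_devices(group, sysfs_root="/sys/kernel/iommu_groups"):
    """Return the sorted PCI addresses of all devices in one IOMMU group.

    sysfs_root is a hypothetical override so the function can be pointed
    at a copy of the tree; on a real host the default should apply.
    """
    devices_dir = Path(sysfs_root) / str(group) / "devices"
    if not devices_dir.is_dir():
        # Group does not exist (or the IOMMU is disabled on this host).
        return []
    return sorted(entry.name for entry in devices_dir.iterdir())


if __name__ == "__main__":
    # Group 22 is the group reported above; adjust for other hosts.
    print(iommu_group_devices(22))
```

A single entry in the returned list means the card shares its group with no other device, which is what we see on our host.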
What works (bare metal on Proxmox host):
axelera.runtime Python API works perfectly on the host:
- ResNet18: 1.29ms per inference
- RetinaFace: 5.32ms per inference, face detected with confidence=1.000
- load_model_instance completes instantly
What fails (inside KVM guest via VFIO):
conn.load_model_instance(model, num_sub_devices=1, aipu_cores=4) fails:
[libaxldev_linux.c:675] DMA_GET_XFER_SYNC_STATUS failed: Connection timed out
[AxeleraDmaBuf.cpp:256] DMABUF_METIS_WAIT failed: Connection timed out
[libaxldev_linux.c:721] USR_DMA_XFER failed: Connection timed out
[AxeleraDevice.cpp:672] axl_dma_xfer failed: Connection timed out
[ERROR][load]: Failed to write module binary to device memory.
Both DMA paths fail: UIO (USR_DMA_XFER) and DMABUF (DMABUF_METIS_WAIT).
Root cause from host dmesg:
We captured the host kernel log during the failure. Two distinct IOMMU faults appear for device [86:00.0]:
1. DMA Write blocked:
DMAR: [DMA Write NO_PASID] Request device [86:00.0]
fault addr 0x149800000 [fault reason 0x05] PTE Write access is not set
The Metis device (inside guest) attempts DMA writes to guest RAM. The host IOMMU blocks them: the PTE exists but has no write permission.
2. MSI interrupt remapping blocked:
DMAR: [INTR-REMAP] Request device [86:00.0] fault index 0x28
[fault reason 0x22] Present field in the IRTE entry is clear
DMA completion interrupts (MSI) are also blocked: the IRTE Present bit is 0.
Both issues occur together on every load_model_instance call.
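To make it easier to compare captures between runs, the two fault types can be pulled out of a saved host log. Below is a minimal Python sketch, assuming the log text has the same shape as the excerpts above. The regular expressions and the parse_dmar_faults name are our own and have only been matched against these two message formats.

```python
import re

# DMA fault, e.g.:
#   DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0x... [fault reason 0x05] ...
DMA_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)[^\]]*\] Request device \[(?P<dev>[0-9a-f:.]+)\]\s+"
    r"fault addr (?P<addr>0x[0-9a-f]+)\s+\[fault reason (?P<reason>0x[0-9a-f]+)\]"
)
# Interrupt-remapping fault, e.g.:
#   DMAR: [INTR-REMAP] Request device [86:00.0] fault index 0x28 [fault reason 0x22] ...
IRQ_FAULT = re.compile(
    r"DMAR: \[INTR-REMAP\] Request device \[(?P<dev>[0-9a-f:.]+)\]\s+"
    r"fault index (?P<index>0x[0-9a-f]+)\s+\[fault reason (?P<reason>0x[0-9a-f]+)\]"
)


def parse_dmar_faults(log_text, device):
    """Return one dict per DMAR fault reported for `device` (e.g. "86:00.0").

    DMA faults are listed first, then interrupt-remapping faults.
    """
    faults = []
    for m in DMA_FAULT.finditer(log_text):
        if m["dev"] == device:
            faults.append({
                "kind": "dma_" + m["op"].lower(),
                "addr": int(m["addr"], 16),
                "reason": int(m["reason"], 16),
            })
    for m in IRQ_FAULT.finditer(log_text):
        if m["dev"] == device:
            faults.append({
                "kind": "intr_remap",
                "index": int(m["index"], 16),
                "reason": int(m["reason"], 16),
            })
    return faults
```

Feeding it the output of dmesg captured on the host during a failing load_model_instance call should yield one dma_write entry (reason 0x05) and one intr_remap entry (reason 0x22) per attempt, if the behaviour matches what we observed.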
What we tried (all unsuccessful):
- allow_unsafe_interrupts=1 for vfio_iommu_type1
- QEMU -mem-prealloc
- 2MB hugepages with NUMA enabled
- QEMU -machine q35,kernel-irqchip=split
- Unloading host metis module before VM start
- Guest kernel mem=3G to limit DMA to low memory
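Since several of these settings are easy to lose across reboots, it may help to verify that the host-side ones are actually in effect before each test. Below is a minimal Python sketch, assuming the usual /proc/cmdline and /sys/module/<module>/parameters/<name> locations. The helper names and the path-override parameters are our own, for illustration.

```python
from pathlib import Path


def kernel_cmdline_flags(cmdline_path="/proc/cmdline"):
    """Return the kernel command line as a set of whitespace-separated tokens."""
    try:
        return set(Path(cmdline_path).read_text().split())
    except OSError:
        # File unreadable or missing (e.g. not running on Linux).
        return set()


def module_parameter(module, name, sysfs_root="/sys/module"):
    """Return a loaded module's parameter value as a string, or None if absent."""
    param = Path(sysfs_root) / module / "parameters" / name
    try:
        return param.read_text().strip()
    except OSError:
        # Module not loaded, or parameter not exported through sysfs.
        return None


if __name__ == "__main__":
    flags = kernel_cmdline_flags()
    print("intel_iommu=on:", "intel_iommu=on" in flags)
    print("iommu=pt:", "iommu=pt" in flags)
    print(
        "allow_unsafe_interrupts:",
        module_parameter("vfio_iommu_type1", "allow_unsafe_interrupts"),
    )
```

Boolean module parameters normally read back as Y or N, so a value of Y for allow_unsafe_interrupts would indicate the option was applied.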
Historical note:
With firmware 1.3.2, create_inference_stream (GStreamer path) worked inside the VM. It was slow (32 s/frame, due to pipeline re-init per request) but functional. After upgrading to firmware 1.6.0, both create_inference_stream and axelera.runtime fail with a DMA timeout inside the VM.
Questions:
- Is there a specific QEMU/VFIO configuration required for KVM passthrough with firmware 1.6.0?
- The host IOMMU has PTEs for the guest memory but without write permission. Is this a known issue?
- Is there a recommended workaround for Proxmox/VFIO environments?
We are happy to provide additional diagnostics. Thank you.

