Question

KVM/VFIO PCIe Passthrough: DMA Write blocked by IOMMU (Proxmox, firmware 1.6.0)

  • May 7, 2026
  • 1 reply
  • 16 views

Hello Axelera team and community,

We are running a Metis PCIe card (16GB, rev 02) in a Proxmox VE 9.1.9 cluster with VFIO PCIe passthrough into a KVM guest. We understand this is listed as Beta in SDK 1.6, and we want to share detailed diagnostics to help improve it.

Environment:

  • Host: Proxmox VE 9.1.9, kernel 7.0.0-3-pve (Linux 6.14), Intel VT-d enabled (intel_iommu=on iommu=pt)
  • Metis firmware: flver=1.6.0, bcver=7.4
  • DKMS driver: metis-dkms=1.4.16
  • SDK: axelera-rt==1.6.0, axelera-runtime==1.6.0
  • Guest: Ubuntu 24.04.3, kernel 6.8.0-111-generic
  • IOMMU group 22: only the Metis card (isolated, ideal for passthrough)
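
For anyone who wants to repeat the isolation check on their own host, here is a minimal sketch that lists the devices sharing an IOMMU group by reading the standard sysfs layout. The helper names are hypothetical, and the `0000:` PCI domain prefix is an assumption (the log only shows `86:00.0`).

```python
from pathlib import Path


def iommu_group_devices(group, root="/sys/kernel/iommu_groups"):
    """Return the sorted PCI addresses that share the given IOMMU group."""
    devices_dir = Path(root) / str(group) / "devices"
    if not devices_dir.is_dir():
        return []  # group does not exist (or IOMMU is disabled)
    return sorted(entry.name for entry in devices_dir.iterdir())


def is_isolated(group, pci_address, root="/sys/kernel/iommu_groups"):
    """True when the group contains exactly the one expected device."""
    return iommu_group_devices(group, root) == [pci_address]


if __name__ == "__main__":
    # Group 22 is the value from this report; the 0000: domain is assumed.
    print(iommu_group_devices(22))
    print(is_isolated(22, "0000:86:00.0"))
```

The `root` parameter only exists so the helpers can be pointed at a copy of the sysfs tree; on a real host the default path applies.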

What works (bare metal on Proxmox host):

axelera.runtime Python API works perfectly on the host:

  • ResNet18: 1.29ms per inference
  • RetinaFace: 5.32ms per inference, face detected with confidence=1.000
  • load_model_instance completes instantly

What fails (inside KVM guest via VFIO):

conn.load_model_instance(model, num_sub_devices=1, aipu_cores=4) fails with:

[libaxldev_linux.c:675] DMA_GET_XFER_SYNC_STATUS failed: Connection timed out
[AxeleraDmaBuf.cpp:256] DMABUF_METIS_WAIT failed: Connection timed out
[libaxldev_linux.c:721] USR_DMA_XFER failed: Connection timed out
[AxeleraDevice.cpp:672] axl_dma_xfer failed: Connection timed out
[ERROR][load]: Failed to write module binary to device memory.

Both DMA paths fail: UIO (USR_DMA_XFER) and DMABUF (DMABUF_METIS_WAIT).

Root cause from host dmesg:

We captured the host kernel log during the failure. Two distinct IOMMU faults appear for device [86:00.0]:

1. DMA Write blocked:

DMAR: [DMA Write NO_PASID] Request device [86:00.0] 
fault addr 0x149800000 [fault reason 0x05] PTE Write access is not set

The Metis device (inside guest) attempts DMA writes to guest RAM. The host IOMMU blocks them — the PTE exists but has no write permission.

2. MSI interrupt remapping blocked:

DMAR: [INTR-REMAP] Request device [86:00.0] fault index 0x28 
[fault reason 0x22] Present field in the IRTE entry is clear

DMA completion interrupts (MSI) are also blocked — the IRTE Present bit is 0.

Both issues occur together on every load_model_instance call.
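
In case it helps with collecting more data, here is a rough sketch of how the two DMAR fault types could be pulled out of a saved host log. The regular expression is written only against the two messages quoted in this post, so other DMAR message variants may not match; the function name and the dictionary layout are our own choices, not part of any tool.

```python
import re

# Matches the two fault shapes quoted in this post. \s+ also tolerates the
# line wrap that appears when the message is pasted into a forum.
FAULT_RE = re.compile(
    r"DMAR: \[(?P<kind>[^\]]+)\]\s+"
    r"Request device \[(?P<device>[0-9a-fA-F:.]+)\]\s+"
    r"fault (?P<field>addr|index) (?P<value>0x[0-9a-fA-F]+)\s+"
    r"\[fault reason (?P<reason>0x[0-9a-fA-F]+)\]\s+(?P<text>[^\n]+)"
)


def parse_dmar_faults(log_text):
    """Return one dict per DMAR fault found in the given log text."""
    faults = []
    for m in FAULT_RE.finditer(log_text):
        faults.append({
            "kind": m["kind"],                      # e.g. "INTR-REMAP"
            "device": m["device"],                  # PCI bus:dev.fn
            m["field"]: int(m["value"], 16),        # "addr" or "index"
            "reason": int(m["reason"], 16),         # numeric fault reason
            "text": m["text"].strip(),              # human-readable reason
        })
    return faults


SAMPLE = """\
DMAR: [DMA Write NO_PASID] Request device [86:00.0]
fault addr 0x149800000 [fault reason 0x05] PTE Write access is not set
DMAR: [INTR-REMAP] Request device [86:00.0] fault index 0x28
[fault reason 0x22] Present field in the IRTE entry is clear
"""

if __name__ == "__main__":
    for fault in parse_dmar_faults(SAMPLE):
        print(fault)
```

Grouping the parsed faults by `device` and `reason` makes it easy to confirm that both faults show up once per load_model_instance attempt.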

What we tried (all unsuccessful):

  • allow_unsafe_interrupts=1 for vfio_iommu_type1
  • QEMU -mem-prealloc
  • 2MB hugepages with NUMA enabled
  • QEMU -machine q35,kernel-irqchip=split
  • Unloading host metis module before VM start
  • Guest kernel mem=3G to limit DMA to low memory
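
To confirm that settings like these are actually in effect on the host, a small check along the following lines may be useful. It is only a sketch: it assumes the usual `/sys/module/<module>/parameters/<param>` layout and a kernel command line as found in `/proc/cmdline`, and both helper names are hypothetical.

```python
from pathlib import Path


def read_module_param(module, param, sys_root="/sys"):
    """Return a module parameter as a string, or None if it is not exposed."""
    path = Path(sys_root) / "module" / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None  # module not loaded or parameter not readable


def cmdline_flags(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    flags = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        flags[key] = value if sep else None
    return flags


if __name__ == "__main__":
    # Boolean module parameters are normally reported as "Y" or "N".
    print(read_module_param("vfio_iommu_type1", "allow_unsafe_interrupts"))
    try:
        flags = cmdline_flags(Path("/proc/cmdline").read_text())
    except OSError:
        flags = {}
    print(flags.get("intel_iommu"), flags.get("iommu"))
```

On the host described above, the expectation would be `Y` for allow_unsafe_interrupts and `on` / `pt` for the two IOMMU flags.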

Historical note:

With firmware 1.3.2, create_inference_stream (GStreamer path) worked inside the VM (slow — 32s/frame due to pipeline re-init per request, but functional). After upgrading to firmware 1.6.0, both create_inference_stream and axelera.runtime fail with DMA timeout inside the VM.

Questions:

  1. Is there a specific QEMU/VFIO configuration required for KVM passthrough with firmware 1.6.0?
  2. The IOMMU creates PTEs without write permission — is this a known issue?
  3. Is there a recommended workaround for Proxmox/VFIO environments?

We are happy to provide additional diagnostics. Thank you.

1 reply

Spanner
Axelera Team
  • May 7, 2026

Wow, that’s some excellent feedback and really useful diagnostics, @tenor! Let me pass this along to the team internally - I think they’ll be really interested. Cheers!