Question

Multi-Card Model Sharding / Unified Workloads — Roadmap or Workaround?

  • June 29, 2026

Hey everyone,

I am evaluating the Metis lineup for an edge AI project and have a specific question about multi-card architecture that I haven't found a definitive answer to in the docs or SDK references.

What I understand is supported today:

  • The Voyager SDK detects multiple Metis cards in a single host.

  • Each card runs its own independent inference pipeline.

  • A real-world surveillance example shows three 4-chip PCIe cards running five primary models + one secondary model in parallel across 48 AIPU cores.

My question: Is there any current or planned SDK support for splitting a single neural network across multiple Metis cards (e.g., layer sharding or tensor parallelism across two M.2 cards)? Or is the architecture strictly "one model must fit entirely within a single card's memory and AIPU"?

Context for my use case: I am looking at a host with dual M.2 slots (e.g., Radxa Orion O6N) and weighing whether two standard Metis M.2 cards could act as a unified 2-GB / 428-TOPS accelerator for a single large model, or whether I should instead plan for a single M.2 Max (16 GB) and treat the dual-M.2 path as strictly for agent-swarming / multi-model parallelism.
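
To make the sizing question concrete, here is a rough sketch of the arithmetic I am doing. The per-card figures are simply half of the pooled 2-GB / 428-TOPS numbers above, the 1.5 GB model footprint is a made-up example, and the function names are my own, not anything from the Voyager SDK. It also ignores activation memory and runtime overhead, so treat it as a back-of-the-envelope check only.

```python
# Back-of-the-envelope SKU sizing. All names here are hypothetical helpers,
# not Voyager SDK calls.

STANDARD_M2_MEMORY_GB = 1.0   # assumed: half of the pooled 2 GB for two cards
STANDARD_M2_TOPS = 214.0      # assumed: half of the pooled 428 TOPS
M2_MAX_MEMORY_GB = 16.0       # M.2 Max figure quoted in this post


def pooled_resources(card_count: int) -> tuple[float, float]:
    """Return (memory in GB, TOPS) if standard M.2 cards could be pooled."""
    return (STANDARD_M2_MEMORY_GB * card_count, STANDARD_M2_TOPS * card_count)


def viable_options(model_gb: float, sharding_supported: bool) -> list[str]:
    """List the hardware options whose memory can hold a single model.

    Without sharding, a model must fit on one card; with sharding, two
    standard cards are treated as one pooled memory space.
    """
    options = []
    if model_gb <= STANDARD_M2_MEMORY_GB:
        options.append("single standard M.2")
    pooled_gb, _ = pooled_resources(2)
    if sharding_supported and model_gb <= pooled_gb:
        options.append("dual standard M.2 (sharded)")
    if model_gb <= M2_MAX_MEMORY_GB:
        options.append("single M.2 Max")
    return options


if __name__ == "__main__":
    example_model_gb = 1.5  # hypothetical model footprint
    print(pooled_resources(2))
    print(viable_options(example_model_gb, sharding_supported=False))
    print(viable_options(example_model_gb, sharding_supported=True))
```

For a model in that in-between range, the dual standard M.2 option only appears when sharding is supported, which is why the answer changes which SKU I should order.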

If unified multi-card execution is not on the roadmap, a clear confirmation would help me (and likely others) size the right SKU upfront rather than over-provisioning hardware.

Thanks for any insight you can share.