Question

Multi-Card Model Sharding / Unified Workloads — Roadmap or Workaround?

June 29, 2026

Meraclus
Cadet

Hey everyone,

I am evaluating the Metis lineup for an edge AI project and have a specific question about multi-card architecture that I haven't found a definitive answer to in the docs or SDK references.

What I understand is supported today:

- The Voyager SDK detects multiple Metis cards in a single host.
- Each card runs its own independent inference pipeline.
- A real-world surveillance example shows three 4-chip PCIe cards running five primary models + one secondary model in parallel across 48 AIPU cores.

My question: Is there any current or planned SDK support for splitting a single neural network across multiple Metis cards (e.g., layer sharding or tensor parallelism across two M.2 cards)? Or is the architecture strictly "one model must fit entirely within a single card's memory and AIPU"?

Context for my use case: I am looking at a host with dual M.2 slots (e.g., Radxa Orion O6N). I am weighing whether two standard Metis M.2 cards could act as a unified 2 GB / 428 TOPS accelerator for a single large model, or whether I should instead plan for a single M.2 Max (16 GB) and treat the dual-M.2 path as strictly for agent-swarming / multi-model parallelism.
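To make the sizing question concrete, here is a rough sketch of the decision I am trying to make. The per-card figure of 1 GB is simply the dual-card total quoted above divided by two, the 1.5 GB model footprint is a made-up example, and the `sharding` flag is purely hypothetical: it stands in for the unified multi-card execution I am asking about, not for any existing Voyager SDK option. The check also ignores runtime overhead such as activations and buffers, so treat it as an upper bound on what fits.

```python
# Sizing sketch only: none of these names correspond to Voyager SDK APIs.

M2_MEMORY_GB = 2 / 2       # per-card memory, inferred from the 2 GB dual-card total
M2_MAX_MEMORY_GB = 16      # M.2 Max capacity as quoted in this post


def fits(model_gb: float, card_memory_gb: float, num_cards: int, sharding: bool) -> bool:
    """Return True if one model of size model_gb can be placed.

    Without sharding, the model must fit entirely on a single card.
    With (hypothetical) sharding, the memory of all cards is pooled.
    """
    usable_gb = card_memory_gb * num_cards if sharding else card_memory_gb
    return model_gb <= usable_gb


model_gb = 1.5  # hypothetical model footprint

# Two standard M.2 cards, one model per card (what is supported today)
dual_m2_today = fits(model_gb, M2_MEMORY_GB, num_cards=2, sharding=False)

# Two standard M.2 cards acting as one pooled accelerator (the open question)
dual_m2_sharded = fits(model_gb, M2_MEMORY_GB, num_cards=2, sharding=True)

# One M.2 Max
single_max = fits(model_gb, M2_MAX_MEMORY_GB, num_cards=1, sharding=False)

print(dual_m2_today, dual_m2_sharded, single_max)
```

If the answer to my question is "no sharding", only the first and third cases matter, and any model larger than a single standard card's memory pushes me to the M.2 Max.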

If unified multi-card execution is not on the roadmap, a clear confirmation would help me (and likely others) size the right SKU upfront rather than over-provisioning hardware.

Thanks for any insight you can share.
