Question

Support for audio models (STT and TTS)

  • February 21, 2026
  • 3 replies
  • 64 views

Hello,

 

I would like to run Voxtral-Mini for Speech-To-Text (STT) and Qwen3-TTS 0.6B for Text-To-Speech (TTS).

Qwen3-TTS would probably fit on an M.2 Metis AI board, but Voxtral is a 4B model, so it would probably need an M.2 Metis AI Max.

I have a few questions to start:

  • I find plenty of examples running Metis with images, but audio is almost nonexistent. Do you support it? If yes, can you point me to the specific documentation, please?
  • If I use the 16GB of RAM of the M.2 Metis AI Max, can I have two models loaded and running in parallel on the board? Voice applications are very sensitive to latency; being able to load two models and keep them ready to work would keep latency low.

Thank you,

 

Ottavio

3 replies

Spanner
Axelera Team
  • Axelera Team
  • February 23, 2026

Hi @Ottavio! Hmm, in all honesty Metis is designed and built for vision inference, so STT/TTS isn't really a use case at the moment. That said, there’s ongoing work experimenting with running something like Whisper on the host CPU alongside Metis doing the vision side, rather than trying to get it onto the AIPU itself. So nothing’s entirely off the table!

On running two models in parallel, yep, absolutely — this is one of Voyager's strengths for vision models. You can run models in parallel or cascade them sequentially, all defined in a single YAML. There's a good blog post covering exactly this: Simplifying Model and Pipeline Deployment with the Voyager SDK.
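For a feel of what a multi-model pipeline definition looks like, here's a rough sketch of such a YAML. Note this is a hypothetical illustration only (the model names and keys here are made up, not the actual Voyager SDK schema) — the blog post and SDK docs have the real format:

```yaml
# Hypothetical sketch only — see the Voyager SDK docs for the actual schema.
# Two vision models defined in one pipeline on the same board.
pipeline:
  - name: detector        # e.g. an object detection model
    model: yolov8s
  - name: classifier      # a second model, run in parallel or cascaded
    model: resnet50
```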

The caveat here is the same, though: that applies to vision models. Audio models loaded and ready to serve in parallel the way you're describing isn't something the platform supports natively. Not yet, anyway - could be an interesting project though! It didn’t support LLMs, until it did...


  • Author
  • Cadet
  • February 23, 2026

Yes, I agree audio could be an interesting project.

 

About running two models in parallel, let me rephrase my question, because I think we are not aligned on the desired outcome. Is it currently possible to run two distinct processes, each with its own pipeline, sharing a single board?


Spanner
Axelera Team
  • Axelera Team
  • March 3, 2026

Hi @Ottavio!

You can run two separate processes, each with their own pipeline, sharing a single Metis board, yep! As long as the combined resource usage doesn't exceed what the device has available, anyway. Since Metis has 4 AI cores, the allocation is proportional: a 1-core process gets ~1/4 of device memory, a 2-core process gets ~1/2, and so on.
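As a back-of-the-envelope illustration of that proportional split (assuming the 16GB figure from the question and a strictly linear cores-to-memory mapping, as described above — a sketch, not official allocator behavior):

```python
# Sketch: proportional device-memory split on a 4-core Metis board.
# Assumes 16 GB total (per the M.2 Metis AI Max mentioned earlier) and a
# strictly linear cores-to-memory mapping.
TOTAL_CORES = 4
TOTAL_MEM_GB = 16

def memory_share_gb(cores_requested: int) -> float:
    """Approximate device memory a process holds for its core allocation."""
    if not 1 <= cores_requested <= TOTAL_CORES:
        raise ValueError("a process can hold between 1 and 4 AI cores")
    return TOTAL_MEM_GB * cores_requested / TOTAL_CORES

print(memory_share_gb(1))  # 4.0  -> ~1/4 of device memory
print(memory_share_gb(2))  # 8.0  -> ~1/2 of device memory
```

So two 2-core processes would together account for all four cores and (roughly) all device memory; a third process would find no free cores to claim.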

The main constraint is that resources are allocated at startup and held for the lifetime of the process. So if the second process tries to start and there aren't enough free cores or memory, it could fail.