Hello,
I would like to run Voxtral-Mini for Speech-To-Text (STT) and Qwen3 TTS 0.6B for Text-To-Speech (TTS).
Qwen3 TTS would probably fit on an M.2 Metis AI board, but Voxtral is a 4B model, so it would probably need an M.2 Metis AI Max.
I have a few questions to start:
- I can find plenty of examples of running Metis on images, but almost none for audio. Do you support audio models? If so, could you please point me to the relevant documentation?
- If I use the 16 GB of RAM on the M.2 Metis AI Max, can I have two models loaded and running in parallel on the board? Voice applications are very sensitive to latency, so keeping both models loaded and ready to run would help keep latency low.
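
For context on the second question, here is my rough back-of-the-envelope estimate of the weight memory for both models together. It is only a sketch: the parameter counts are the ones mentioned above, while the precisions (FP16, INT8), the assumption that all 16 GB is usable for models, and the helper names are my own guesses. It also ignores activations, caches, and any runtime overhead.

```python
# Rough weight-memory estimate. Parameter counts come from the post;
# the precisions and the usable-memory figure are assumptions.
GIB = 1024 ** 3

# Billions of parameters per model.
MODELS = {"Voxtral-Mini (STT)": 4.0, "Qwen3 TTS (TTS)": 0.6}

# Assumes the full 16 GB is available for models (probably optimistic).
BOARD_GIB = 16.0


def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in GiB (no activations, caches, or overhead)."""
    return params_billion * 1e9 * bytes_per_param / GIB


def total_weight_gib(bytes_per_param: float) -> float:
    """Combined weight memory of all models at the given precision."""
    return sum(weight_gib(p, bytes_per_param) for p in MODELS.values())


if __name__ == "__main__":
    for label, bytes_per_param in [("FP16", 2), ("INT8", 1)]:
        total = total_weight_gib(bytes_per_param)
        fits = total < BOARD_GIB
        print(f"{label}: {total:.1f} GiB of weights, below {BOARD_GIB:.0f} GiB: {fits}")
```

By this estimate the weights alone would fit in both cases, but I do not know how much of the 16 GB is actually available or whether the runtime can keep two models resident at once, hence the question.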
Thank you,
Ottavio
