Question

Qwen 32b instruct A3b on the edge

  • January 26, 2026
  • 9 replies
  • 100 views

I am looking for a way to run Qwen VL 32B Instruct A3B at the edge on Axelera M.2 form factor hardware. But I would need a lot of RAM to make this happen, at least 48GB. Could anyone shed some light on my options? I need to do video inference and scene understanding on request from the user. My idea was to grab a few frames whenever a request is initiated and then choose the best frame to run inference on.
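For the "grab a few frames, pick the best" step, here's a minimal sketch of one common approach. It assumes frames arrive as NumPy arrays (e.g. decoded from a camera or video stream); the Laplacian-variance sharpness score is just one standard heuristic for frame selection, not anything Axelera-specific:

```python
import numpy as np

def sharpness(frame: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian response over a grayscale
    frame; higher values indicate a sharper (less blurry) image."""
    f = frame.astype(np.float32)
    lap = (-4.0 * f[1:-1, 1:-1]
           + f[:-2, 1:-1] + f[2:, 1:-1]
           + f[1:-1, :-2] + f[1:-1, 2:])
    return float(lap.var())

def best_frame(frames: list) -> np.ndarray:
    """Return the frame with the highest sharpness score.
    In the pipeline above, this frame would then be handed to the VLM."""
    return max(frames, key=sharpness)
```

You'd buffer N frames when the user request comes in, then run the VLM once on `best_frame(buffer)` instead of on every frame.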

9 replies

  • Cadet
  • January 27, 2026

I don’t think this is possible with the current hardware. The model would need to run directly from the card's own RAM, and the largest Metis card has 16GB. In any case, the performance with Metis would also be terrible. You would have to wait until Europa is released, which should happen in Q1 2026. I'm also not sure about the performance there: “best in class” is a lot of marketing blabla, and we don’t have any raw numbers on the product page.
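To put rough numbers on the memory question, here's a weights-only back-of-the-envelope sketch (my own arithmetic, not vendor figures; real usage adds KV cache and activations on top):

```python
def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 2**30 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# A ~32B-parameter model, weights only:
#   FP16 (2 bytes/param) -> ~60 GB
#   INT8 (1 byte/param)  -> ~30 GB
#   INT4 (0.5 byte/param)-> ~15 GB
```

So even aggressively quantized, a 32B model is tight against a 16GB card once you account for the KV cache, which is roughly consistent with the OP's 48GB estimate for less aggressive precision.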


  • Author
  • Cadet
  • January 27, 2026

Thanks for the reply. Just got to know about Europa :-). Hopefully that will help; until then I'm going to slog it out on the Jetson Orin 64GB, I guess.


Spanner
  • Axelera Team
  • January 27, 2026

Sounds like a great project, but as you say, the Metis M.2 doesn’t have the memory for that. There’s a quad-core Metis with 64GB, but it’s a fair price difference from the M.2 😄

As ​@jclsn points out, Europa could bring you closer. It’s designed for VLMs up to 8B parameters on a single chip. As soon as there’s a launch date, I’ll make sure to come back here and drop it in 👍

What’s the project you’re working on, ​@Rhinostar ?


  • Author
  • Cadet
  • January 27, 2026

Hi Spanner, price isn't a factor for me, but portability and power draw are. The project really is scene understanding, with specific job roles given to the model along with fine-tuned data (role based). The reason for the Qwen 30B is that it seems to be better at understanding the scene for now, until of course a better distilled model comes out with fewer parameters that fits in a smaller memory footprint.
You say Europa is designed for up to 8B; why is that? Is this a limitation of the compute, the memory, the bandwidth, or any/all of them?


Spanner
  • Axelera Team
  • January 28, 2026

Dur, sorry, my mistake there! I’m writing my replies while reading the original post at the same time 🤣 8B is actually for Metis.

Europa's intended to solve that memory bottleneck you mention for models up to 32B parameters per chip. So Qwen VL 32B should be just within Europa's single-chip capability. The multi-chip configs (70B+ across systems) would be for even larger models.

Apologies for the confusion there 👍


  • Author
  • Cadet
  • January 28, 2026

@Spanner Would the Europa chip be available as a standalone part, with schematics for a typical use case, so that I can design my own circuit around it?


Spanner
  • Axelera Team
  • January 28, 2026

Not to my knowledge - I think currently, at least, the plan is for M.2 and PCIe form factors, much like Metis. Titania potentially, although that’s still on the horizon. But do stay tuned - if that changes with Europa, I’ll make sure to bring that news to the community right away 👍


  • Author
  • Cadet
  • January 30, 2026

@Spanner are there any schematics available for the Metis M.2 board? Thanks


Spanner
  • Axelera Team
  • February 2, 2026

There are some datasheets and integration info, but I don't think any full schematics have been made available. I will ask the team, though 👍

https://support.axelera.ai/hc/en-us/articles/25577654318354-Metis-M-2-Integration-Requirements