Question

Deploy available LLMs with custom weights

  • December 4, 2025
  • 5 replies
  • 157 views

Hi!

First of all, thank you very much for your work — the Axelera tools are really impressive and perform exceptionally well!

I’m currently working on deploying LLMs on the METIS PCIe accelerator using the Voyager SDK v1.5.1.  
From my understanding, the LLMs currently available are already precompiled, and we simply download the compiled model files.

Would it be possible to access the deployment or compilation process so that we can deploy our own fine-tuned LLMs — even if they share the same architecture as the models already available in the model zoo?

 

5 replies

Spanner
Axelera Team
  • Axelera Team
  • December 8, 2025

Hi there @NVigne, glad you’re enjoying all things Axelera so far!

It’s actually still quite early days for LLMs, although they are getting a lot of attention (both internally and externally). For supported architectures (like the Llama family), loading custom weights from Hugging Face should theoretically work, but the exact process hasn’t been documented yet.

I will make sure that your request is circulated though, so hopefully we’ll be able to help out soon. What models are you looking at using?


  • Cadet
  • December 11, 2025

I’m also interested in running LLMs on the METIS board. I’d love to see you add Qwen 2.5 VL to your build list. However, SLMs are not very useful without custom fine-tuning such as LoRA. I hope you can publish the process soon; it’s holding back potential users.
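For context on why LoRA fine-tunes should in principle fit the existing pipeline: a LoRA adapter only adds a low-rank update to the base weights, so merging it back yields a checkpoint with exactly the base model’s architecture and tensor shapes. A minimal sketch of the merge arithmetic in plain Python (tiny hypothetical shapes, not any SDK API):

```python
# LoRA merge: W' = W + (alpha / r) * (B @ A)
# The merged matrix W' has the same shape as W, so the architecture
# (and any architecture-specific compiler support) is unchanged --
# only the weight values differ.

def matmul(B, A):
    """Multiply an (m x r) matrix by an (r x n) matrix (lists of lists)."""
    m, r, n = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]

def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into the base weight matrix."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Tiny example: 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]            # r x n = 1 x 2
B = [[0.5], [0.25]]         # m x r = 2 x 1
merged = merge_lora(W, A, B, alpha=2.0, r=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]] -- same 2x2 shape as W
```

This is why a merged LoRA checkpoint is “just” a Llama checkpoint with different numbers in it, and should be compilable once the process is opened up.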


  • Author
  • Cadet
  • December 11, 2025

Hi @Spanner,

Thank you for your response. To give a more complete picture of what I’m trying to achieve: I want to run multi-modal inferences on Axelera (using image + text as input and text as output).

Since version v1.5.1 aligned the vision and language components of the SDK in terms of library versioning, I’ve been able to deploy a custom visual encoder and a multi-modal projector together with a LLaMA model. However, because custom weights cannot currently be used for LLM deployment, the visual tokens are not understood by the available precompiled LLMs.
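To illustrate the coupling described above: the multi-modal projector maps visual-encoder features into the LLM’s token-embedding space, so the resulting visual tokens only carry meaning for the LLM whose weights the projector was trained against. A toy sketch in plain Python (dimensions and values invented for illustration; no SDK calls):

```python
# Multi-modal pipeline: image patches -> visual encoder -> projector -> LLM.
# The projector is trained jointly with *specific* LLM weights, so swapping
# in a different (precompiled) LLM breaks the learned alignment even when
# all the tensor shapes still match.

D_VISION = 4   # hypothetical visual-encoder feature size
D_LLM = 3      # hypothetical LLM hidden size

def project(features, P):
    """Apply a (D_VISION x D_LLM) projector to each patch feature vector."""
    return [[sum(f[k] * P[k][j] for k in range(D_VISION)) for j in range(D_LLM)]
            for f in features]

patches = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0]]          # 2 patch features from the encoder
P = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]                     # projector weights
visual_tokens = project(patches, P)
# Each visual token now has the LLM's hidden size and is consumed like a
# text-token embedding -- but what it *means* depends on the LLM it was
# trained with.
assert all(len(t) == D_LLM for t in visual_tokens)
```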

Regarding loading custom weights from Hugging Face, from what I can tell when reading the .yaml files of the supported LLMs, all downloaded components are already compiled and have gone through an optimization and deployment process — which doesn’t seem to have been publicly released yet (though it may be available internally).
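As a rough illustration of that observation (the structure below is invented for the example; the real Voyager SDK YAML schema may well differ): the useful distinction is between components shipped as compiled artifacts and those shipped as source weights, since only the latter could be swapped for fine-tuned checkpoints without a recompilation step:

```python
# Hypothetical shape of a model-zoo config after YAML parsing.
# Field names and file extensions are invented for illustration.
config = {
    "name": "llama-example",
    "components": [
        {"name": "decoder", "artifact": "llama_decoder.compiled.bin"},
        {"name": "tokenizer", "artifact": "tokenizer.json"},
    ],
}

def precompiled_components(cfg):
    """List components whose artifact looks like a compiled binary rather
    than source weights -- these cannot simply be replaced with
    fine-tuned weights without rerunning the deployment pipeline."""
    return [c["name"] for c in cfg["components"]
            if c["artifact"].endswith(".compiled.bin")]

print(precompiled_components(config))  # ['decoder']
```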

In any case, I’ll be happy to try this out once it becomes available! 🙂


Spanner
Axelera Team
  • Axelera Team
  • December 12, 2025

Good to know about any new models people are interested in, so thanks for mentioning Qwen too, @cortexist!

And you’ve hit the nail on the head there @NVigne - as you say, the existing models have been optimised quite extensively, which is definitely something that’ll get documented and shared (and, actually, the process itself should become a lot more streamlined). It’s not quite there yet, as LLMs aren’t the primary focus for Axelera, so I’m not sure exactly when.

But the Metis M.2 Max and Europa do make LLMs (and VLMs) far more accessible and practical, so it’s great to hear about people’s enthusiasm - the whole team takes note of this, and hearing from you really does shape the Axelera roadmap!

Do feel free to add new model requests to the Launchpad if you like. The more upvotes they get, the stronger the case is for prioritising them internally 👍 And feel free to add as many as you like - giving each model its own idea post would actually be really helpful!


  • Cadet
  • December 12, 2025

@Spanner, that’s awesome. At this point, many are waiting for the Hailo-10H and the Metis M.2 Max; whichever reaches developers first will likely gain a foothold, and we’ll see projects using it spread on GitHub.