Question

Custom Model on Metis SBC

  • May 28, 2026
  • 3 replies
  • 11 views

Hello,

I am trying to compile an IBM Granite embedding model for the Metis AIPU.

Current test model:

```text
ibm-granite/granite-embedding-311m-multilingual-r2
https://huggingface.co/ibm-granite/granite-embedding-311m-multilingual-r2
```

My final target model is:

```text
ibm-granite/granite-switch-4.1-3b-preview
https://huggingface.co/ibm-granite/granite-switch-4.1-3b-preview
```

Target hardware and software:

```text
Metis SBC
16 GB board RAM
4 GB AIPU
Python 3.12
Voyager SDK 1.6.0
```

I tried to compile the embedding model with `axcompile`, but it fails during quantization.

Command:

```bash
axcompile \
  --input granite_embedding_metis_work/granite_embedding_311m_static.onnx \
  --config granite_embedding_metis_work/granite_embedding_metis_4gb.json \
  --output granite_embedding_metis_work/axcompile_out \
  --overwrite \
  --dataset-len 2 \
  --log-level DEBUG \
  --quantize-only
```

The ONNX export succeeds.

Current ONNX details:

```text
ONNX opset: 17
Input: inputs_embeds [1, 128, 768]
Output: embeddings [1, 768]
```

The model reaches calibration, then fails after calibration completes.

Error:

```text
Calibrating... | 100% | 11.05s/it | 2it |

RuntimeError: External op model_layers_dot_0_attn_squeeze_const_input1 found in the model (<class 'qtoolsv2.intermediate_representation.operators.constant.Constant'> op). QTools may have issues quantizing this model.
```

ONNX operator summary:

```text
Constant: 789
Mul: 226
Add: 154
MatMul: 134
Slice: 132
Transpose: 68
Squeeze: 66
Cast: 52
Concat: 46
Reshape: 44
Neg: 44
LayerNormalization: 44
Div: 44
Gather: 23
Split: 22
Softmax: 22
Shape: 22
Erf: 22
Unsqueeze: 4
ConstantOfShape: 2
Equal: 2
Where: 2
Expand: 2
Cos: 2
Sin: 2
```
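A summary like the one above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the `onnx` package is installed, and the helper names `summarize_ops` and `op_types_from_onnx` are hypothetical, not part of the Voyager SDK. It counts nodes in the top-level graph only, so operators inside control-flow subgraphs would not be included.

```python
from collections import Counter


def summarize_ops(op_types):
    """Count operator types and return (name, count) pairs, most frequent first."""
    return Counter(op_types).most_common()


def op_types_from_onnx(path):
    """Yield the op_type of every node in the top-level graph of an ONNX file."""
    import onnx  # imported lazily so summarize_ops works without onnx installed

    model = onnx.load(path)
    for node in model.graph.node:
        yield node.op_type


# Usage (path taken from the axcompile command above):
# ops = op_types_from_onnx(
#     "granite_embedding_metis_work/granite_embedding_311m_static.onnx"
# )
# for name, count in summarize_ops(ops):
#     print(f"{name}: {count}")
```

Keeping the counting separate from the file loading makes it easy to compare two exports (for example FP32 versus FP16) by diffing their summaries.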

I tried several changes already:

```text
Moved tokenizer and token embedding lookup to CPU.
Changed the ONNX input to inputs_embeds [1, 128, 768].
Used a fixed attention mask.
Moved final L2 normalization to CPU.
Tried FP32 export instead of FP16.
Used dataset-len 2 because dataset-len 1 fails.
```
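For context, moving the final L2 normalization to the CPU means the compiled graph returns the raw `[1, 768]` embedding and the host scales it to unit length afterwards. A minimal sketch of that host-side step, assuming the output has already been flattened to a plain list of floats (the function name `l2_normalize` and the `eps` default are illustrative assumptions, not taken from the export script):

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    scale = 1.0 / max(norm, eps)
    return [x * scale for x in vector]


# Usage: normalize a raw embedding before storing it in a vector database.
# unit_embedding = l2_normalize(raw_embedding)
```

In a real pipeline the same operation would more likely be done with NumPy on the whole output array; the pure-Python version is used here only to keep the sketch dependency-free.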

My question is:

Can Voyager SDK 1.6.0 compile Transformer embedding models like this through the generic `axcompile` ONNX path?

Or do IBM Granite models need a precompiled Metis package, similar to the LLM flow with `precompiled_url`?

I can provide these files if useful:

```text
metadata.json
cli_args.json
conf.json
compilation_log.txt
compilation_report.json
ONNX operator summary
the export script
```

I would really appreciate guidance on this.

Thank you,
Peter

 

3 replies

Spanner
Axelera Team
  • Axelera Team
  • May 28, 2026

Hi there @spectral369! Welcome to the show, and thanks for the detailed write-up. You've clearly put a lot of work into the export already, nice one. 👍

So, for LLMs on Metis, the supported path at present (I believe) is the precompiled AxLLM flow rather than generic `axcompile`. The LLM tutorial goes into more depth on this. All the SLMs on Metis are precompiled, and arbitrary models can't be loaded unless they've been compiled for the platform (a catch-22, at the moment anyway, though this is getting a lot of work right now!). I bet the QTools error you're hitting is the compiler saying it doesn't have rules for some of the Constant ops, which in turn fits with Transformer-class embedding models like Granite not being on the generic ONNX compile path at the moment.

Before going further with the export, maybe it’d be illuminating to try running one of the precompiled SLMs as a sanity check? Something like `axllm llama-3-2-1b-1024-4core-static --prompt "Hello"` should run on the MCB. The walkthrough is in that same LLM tutorial. If that runs cleanly, we know your Computer Board and SDK install are good and you've seen the supported LLM path working. If it errors, we can dig in deeper from there.

Out of curiosity, what's the wider use case?


  • Author
  • Cadet
  • May 28, 2026

Hello,

Thank you for your swift response!

I tried the SBC tutorial, and it worked as described. At first, I was a bit confused by the stripped-down OS (just Docker with no additional packages), but I figured it out after manually installing a few from the Debian repositories 😅. The Llama model worked, and that was the extent of it, since I had to build my demo app on my PC and so far haven’t had a reason to push it to the SBC.

The demo use case is both… naive and complex. At its core, it’s a custom RAG system using Docling and Qdrant for sensitive data. Since Granite doesn’t support my native language, I’m also using [Helsinki-NLP](https://huggingface.co/Helsinki-NLP). For now, the data ingestion is at a fairly decent level. I could elaborate, but it would take a few paragraphs.

Is there an experimental workflow for my case, so I can test it myself?
Is there anything I can do besides waiting for LLM precompilation?

Thank you,
Peter



 


Spanner
Axelera Team
  • Axelera Team
  • May 28, 2026

That’s not naive at all, it’s a great use case! Really interested to see how that progresses.

So, the custom model deployment tutorial we currently have is very much focused on vision models, since that’s primarily Axelera’s bread and butter. It doesn’t currently have an LLM angle to it, so unfortunately wouldn’t get you closer to Granite.

At this stage, probably the best thing you can do is add it as a feature request in the Launchpad section. The team does keep a close eye on that when working on priorities and the roadmap, so if it’s in there, it’ll get seen. And I’ll also pass your request along internally (we have a group chat specifically about new models people are looking for) to give it a bump there, too.

So, is it Finnish you’re looking to use? (Guessing from the model name!) I’m just wondering: if it's one of the languages Llama 3.2 supports, there might be a path that drops the Helsinki-NLP translation step out of your stack entirely. Not a like-for-like Granite replacement, but maybe worth considering depending on your accuracy requirements.