Hello,
I am trying to compile an IBM Granite embedding model for the Metis AIPU.
Current test model:
```text
ibm-granite/granite-embedding-311m-multilingual-r2
https://huggingface.co/ibm-granite/granite-embedding-311m-multilingual-r2
```
My final target model is:
```text
ibm-granite/granite-switch-4.1-3b-preview
https://huggingface.co/ibm-granite/granite-switch-4.1-3b-preview
```
Target hardware and software:
```text
Metis SBC
16 GB board RAM
4 GB AIPU
Python 3.12
Voyager SDK 1.6.0
```
I tried to compile the embedding model with `axcompile`, but it fails during quantization.
Command:
```bash
axcompile \
  --input granite_embedding_metis_work/granite_embedding_311m_static.onnx \
  --config granite_embedding_metis_work/granite_embedding_metis_4gb.json \
  --output granite_embedding_metis_work/axcompile_out \
  --overwrite \
  --dataset-len 2 \
  --log-level DEBUG \
  --quantize-only
```
The ONNX export succeeds.
Current ONNX details:
```text
ONNX opset: 17
Input: inputs_embeds [1, 128, 768]
Output: embeddings [1, 768]
```
Calibration runs to completion, and the failure occurs immediately afterwards.
Error:
```text
Calibrating... | 100% | 11.05s/it | 2it |
RuntimeError: External op model_layers_dot_0_attn_squeeze_const_input1 found in the model (<class 'qtoolsv2.intermediate_representation.operators.constant.Constant'> op). QTools may have issues quantizing this model.
```
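Going by its name, the Constant in the error looks like input 1 of a Squeeze node in the first attention layer (from opset 13 onward, Squeeze takes its axes as a second input rather than as an attribute), but that is only my guess. Below is a minimal sketch for listing such Constant-to-Squeeze edges in the exported graph. The helper name `constants_feeding` is illustrative, and the function only assumes node objects with `name`, `op_type`, `input`, and `output` fields, as ONNX graph nodes have.

```python
def constants_feeding(nodes, consumer_op="Squeeze"):
    """Return (constant_name, consumer_name, input_index) for each Constant
    node whose output is consumed by a node of type consumer_op."""
    # Map every Constant output tensor name to the Constant node's name.
    const_outputs = {}
    for node in nodes:
        if node.op_type == "Constant":
            for out in node.output:
                const_outputs[out] = node.name

    # Scan consumers and record which of their inputs come from a Constant.
    hits = []
    for node in nodes:
        if node.op_type != consumer_op:
            continue
        for index, tensor in enumerate(node.input):
            if tensor in const_outputs:
                hits.append((const_outputs[tensor], node.name, index))
    return hits
```

With the `onnx` package installed, `constants_feeding(onnx.load(path).graph.node)` would list the pairs for the exported file. Node names in the ONNX file may not match the names that the quantizer reports.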
ONNX operator summary:
```text
Constant: 789
Mul: 226
Add: 154
MatMul: 134
Slice: 132
Transpose: 68
Squeeze: 66
Cast: 52
Concat: 46
Reshape: 44
Neg: 44
LayerNormalization: 44
Div: 44
Gather: 23
Split: 22
Softmax: 22
Shape: 22
Erf: 22
Unsqueeze: 4
ConstantOfShape: 2
Equal: 2
Where: 2
Expand: 2
Cos: 2
Sin: 2
```
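For reference, a summary like the one above can be produced with a short script along these lines. This is a minimal sketch rather than my exact script: it assumes the `onnx` Python package is installed, and `count_ops` and `print_summary` are illustrative names.

```python
from collections import Counter


def count_ops(nodes):
    """Count graph nodes per op_type, most frequent first."""
    counts = Counter(node.op_type for node in nodes)
    return counts.most_common()


def print_summary(path):
    """Load an ONNX file and print one 'OpType: count' line per operator."""
    import onnx  # imported lazily so count_ops works without onnx installed

    model = onnx.load(path)
    # Only the top-level graph is counted; nodes inside subgraphs
    # (for example If/Loop bodies) are not visited.
    for op_type, n in count_ops(model.graph.node):
        print(f"{op_type}: {n}")
```

Calling `print_summary("granite_embedding_metis_work/granite_embedding_311m_static.onnx")` prints the counts for the file passed to `axcompile`.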
I tried several changes already:
```text
Moved tokenizer and token embedding lookup to CPU.
Changed the ONNX input to inputs_embeds [1, 128, 768].
Used a fixed attention mask.
Moved final L2 normalization to CPU.
Tried FP32 export instead of FP16.
Used dataset-len 2 because dataset-len 1 fails.
```
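To make the CPU/AIPU split concrete, the final L2 normalization now runs on the host, roughly as follows. This is a simplified sketch in plain Python that covers only the normalization of one 768-element output vector. `l2_normalize` and `eps` are illustrative names, not taken from my export script.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vector))
    scale = 1.0 / max(norm, eps)
    return [x * scale for x in vector]
```

The compiled graph then only has to produce the unnormalized `embeddings [1, 768]` output, so the normalization itself no longer has to be quantized.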
My questions are:
Can Voyager SDK 1.6.0 compile Transformer embedding models like this through the generic `axcompile` ONNX path?
Or do IBM Granite models need a precompiled Metis package, similar to the LLM flow with `precompiled_url`?
I can provide these files if useful:
```text
metadata.json
cli_args.json
conf.json
compilation_log.txt
compilation_report.json
ONNX operator summary
the export script
```
I would really appreciate guidance on this.
Thank you,
Peter
