I’m doing some testing with the PCI-Express-based AI card.
After some initial hiccups with motherboard compatibility, which required me to swap the slots of the GPU and the AI card and then expose the card to an Ubuntu 22.04 Docker container, it’s working fine.
I’m impressed by the card’s performance compared to running the traffic-identification benchmark problems with `--pipe torch` on a CUDA-based backend.
I want to do inference with not only custom weights but also custom nets, and see how it performs. My wish is to have a sample that takes numpy float32 arrays as input and output and runs inference on the card, given a model from an ONNX file.
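Concretely, the input/output contract I’m after looks like this. `run_on_card` is hypothetical (not an actual SDK function); it is stubbed with a CPU stand-in here so the snippet is self-contained:

```python
import numpy as np

def run_on_card(x: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper around the card runtime.

    Stubbed with a CPU stand-in; the real version would hand the float32
    array to the compiled model on the card and return its float32 output.
    """
    assert x.dtype == np.float32 and x.shape == (1, 3)
    # Stand-in for the trained f(x1, x2, x3): just sum the inputs.
    return x.sum(axis=1, keepdims=True).astype(np.float32)

x = np.asarray([[0.1, 0.2, 0.3]], dtype=np.float32)  # one sample, three features
y = run_on_card(x)
print(y.shape, y.dtype)  # (1, 1) float32
```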
To keep it simple, I made a PyTorch net, [3, 64] → ReLU → [64, 64] → ReLU → [64, 1], and trained it
on a mathematical function f(x1, x2, x3) = y.
This toy net was exported to ONNX.
From there I wanted to test inference on the AI card before moving to more complex things.
I could not see how my intended use case maps onto `deploy.py` and the required entries in the model/pipeline YAML file.
So instead I tried to use the compiler directly, which gave an error:
…
07:58:42 [INFO] Running LowerFrontend...
07:58:42 [ERROR] Failed passes: ['axelera.PadChannelsToPword', 'LowerFrontend']
07:58:42 [INFO] TVM pass trace information stored in: /drive/build/compiled_model
07:58:42 [ERROR] Lowering failed. Failed pass: axelera.PadChannelsToPword <- LowerFrontend
…
With a small change to the default compilation config, I got a successful compilation and thought I now had something I should be able to load onto the card:
compile --generate-config --output config
sed -i 's/quantize_and_lower/quantize_only/' config/default_conf.json
compile --input toy-model/toy3d_model.onnx --input-shape 1,3 --overwrite --output build --config config/default_conf.json
...
07:59:58 [INFO] Checking ONNX model compatibility with the constraints of opset 17.
Calibrating... ✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨ | 100% | 469.31it/s | 100it |
07:59:59 [INFO] Exporting '' using GraphExporterV2.
07:59:59 [INFO] Quantization finished.
07:59:59 [INFO] Quantization took: 1.42 seconds.
07:59:59 [INFO] Export quantized model manifest to JSON file: /drive/build/quantized_model_manifest.json
07:59:59 [INFO] Quantization only was requested. Skipping lowering.
07:59:59 [INFO] Done.
Next I wanted to load the compiled model from Python using axelera.runtime.
Which JSON file should I use for loading? I tried different ones, but loading the model
does not work; I get the error:
ValueError: AXR_ERROR_VALUE_ERROR: Failed to load model from /drive/build/quantized_model/quantized_model.json
Error: Version not found in model
Is there some example in the SDK that shows something similar to what I’m trying to do, which I may have missed? Any hints or guidance would be appreciated.

