I’m doing some testing with the PCI-Express based AI card. 

After some initial hiccups due to motherboard compatibility, which required me to swap the slots of the GPU and the AI card and then expose the card to an Ubuntu 22.04 Docker container, it’s working fine.

 

I’m impressed by the performance of the card compared to running the traffic-identification benchmark problems with `--pipe torch` and a CUDA-based backend.

 

I want to do inference not only with custom weights but also with custom nets, and see how it performs. My wish is to have a sample that takes NumPy float32 arrays as input and output and does inference on the card, given a model from an ONNX file.

 

To keep it simple, I made a PyTorch net [3, 64] → ReLU → [64, 64] → ReLU → [64, 1] and trained it on a mathematical function f(x1, x2, x3) = y.

This toy net was exported to ONNX.
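For reference, the toy model was built roughly like this. The target function and training loop below are illustrative stand-ins (not necessarily exactly what I used); the layer sizes, input shape and opset match what is described above and in the compiler log further down.

```python
import torch
import torch.nn as nn

# Toy net: [3, 64] -> ReLU -> [64, 64] -> ReLU -> [64, 1]
model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Stand-in target function f(x1, x2, x3) = y.
def f(x):
    return (x[:, 0] * x[:, 1] + torch.sin(x[:, 2])).unsqueeze(1)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    x = torch.rand(256, 3)
    loss = nn.functional.mse_loss(model(x), f(x))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Export with a fixed (1, 3) input to match --input-shape 1,3;
# opset 17 matches what the compiler log below reports checking against.
torch.onnx.export(
    model,
    torch.rand(1, 3),
    "toy-model/toy3d_model.onnx",
    input_names=["x"],
    output_names=["y"],
    opset_version=17,
)
```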

 

From there I wanted to test inference on the AI card before moving to more complex things.

I could not see how my intended use case fits with `deploy.py` and the required entries in the model/pipeline YAML file.

So instead I tried to use the compiler directly, which gave an error:

…


07:58:42 [INFO] Running LowerFrontend...
07:58:42 [ERROR] Failed passes: ['axelera.PadChannelsToPword', 'LowerFrontend']
07:58:42 [INFO] TVM pass trace information stored in: /drive/build/compiled_model
07:58:42 [ERROR] Lowering failed. Failed pass: axelera.PadChannelsToPword <- LowerFrontend

…

 

With a small change to the default compilation config I got a successful compilation, so I thought I now had something I should be able to load onto the card.


compile --generate-config --output config

sed -i 's/quantize_and_lower/quantize_only/' config/default_conf.json

compile --input toy-model/toy3d_model.onnx --input-shape 1,3 --overwrite --output build --config config/default_conf.json


...

07:59:58 [INFO] Checking ONNX model compatibility with the constraints of opset 17.
Calibrating... ✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨ | 100% | 469.31it/s | 100it |
07:59:59 [INFO] Exporting '' using GraphExporterV2.
07:59:59 [INFO] Quantization finished.
07:59:59 [INFO] Export quantized model manifest to JSON file: /drive/build/quantized_model_manifest.json
07:59:59 [INFO] Quantization only was requested. Skipping lowering.
07:59:59 [INFO] Done.


Next I wanted to load the compiled model using Python and `axelera.runtime`.

 

Which JSON file should I use for loading? I tried different ones, but loading the model does not work and I get this error:


ValueError: AXR_ERROR_VALUE_ERROR: Failed to load model from /drive/build/quantized_model/quantized_model.json
Error: Version not found in model


 

Is there some example in the SDK that shows something similar to what I’m trying to do, which I may have missed? Any hints or guidance would be appreciated.

Follow-up:

 

`quantize_only` is not enough as the compile mode setting; lowering is needed.
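In other words, keep the generated config at its default mode (or change it back) and recompile with the same flags as before:

sed -i 's/quantize_only/quantize_and_lower/' config/default_conf.json

compile --input toy-model/toy3d_model.onnx --input-shape 1,3 --overwrite --output build --config config/default_conf.json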

After a successful `quantize_and_lower` compile, `build/compiled_model/model.json` is the file that should be used with the runtime context and `load_model`.
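For completeness, the loading step now looks roughly like this. Beyond the context object and `load_model` mentioned above, everything else here (the import alias, constructing the context as `Context()`, and the commented-out instance/run step for feeding in a float32 NumPy array) is my assumption, so check the `axelera.runtime` reference for the actual API:

```python
from axelera import runtime as axr  # assumption: this import alias

# Load the lowered model produced by the quantize_and_lower compile.
ctx = axr.Context()  # assumption: context constructed like this
model = ctx.load_model("/drive/build/compiled_model/model.json")

# Next step, inference on a float32 NumPy input -- sketch only; these
# names are guesses and not verified against the SDK reference:
# import numpy as np
# instance = ctx.load_model_instance(model)
# x = np.random.rand(1, 3).astype(np.float32)
# y = instance.run(x)
```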

