I’m doing some testing with the PCI-Express based AI card.
After some initial hiccups due to motherboard compatibility, which required me to swap the positions of the GPU and the AI card and then expose the card to an Ubuntu 22.04 Docker container, it’s working fine.
I’m impressed by the card’s performance compared to using `--pipe torch` with a CUDA-based backend for the traffic-identification benchmark problems.
I want to do inference not only with custom weights but also with custom nets, and see how it performs. My wish is to have a sample that uses numpy float32 arrays as input and output and does inference on the card given a model from an ONNX file.
To keep it simple, I made a PyTorch net [3, 64] → ReLU → [64, 64] → ReLU → [64, 1] and trained it
on a mathematical function f(x1, x2, x3) = y.
This toy net was exported to ONNX.
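For context, the toy model is nothing more than the following sketch (layer sizes as described above; the training loop is omitted, and the input/output names and dummy batch are just what I happened to use, the opset matches the 17 reported by the compiler below):

```python
import torch
import torch.nn as nn

# Toy MLP: 3 inputs -> 64 -> 64 -> 1 output, ReLU activations
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

model = ToyNet()
# ... training on f(x1, x2, x3) = y omitted ...

# Export with a fixed batch size of 1, matching the --input-shape 1,3 used later
dummy = torch.randn(1, 3, dtype=torch.float32)
torch.onnx.export(model, dummy, "toy-model/toy3d_model.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=17)
```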
From there I wanted to test inference on the AI card before moving to more complex things.
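For reference, this is the kind of numpy float32-in / float32-out call I would like to reproduce against the card. On the CPU it works with onnxruntime (used here purely as a sanity check of the exported model, not as part of the Axelera flow):

```python
import numpy as np
import onnxruntime as ort

# CPU reference run of the exported toy model; this is the float32-in /
# float32-out interface I want to reproduce on the AI card.
sess = ort.InferenceSession("toy-model/toy3d_model.onnx",
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

x = np.random.rand(1, 3).astype(np.float32)
(y,) = sess.run(None, {input_name: x})
print(y.shape, y.dtype)   # expected: (1, 1) float32
```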
I could not see how my intended use case fits with `deploy.py` and the required entries in the model/pipeline YAML file.
So instead I tried to use the compiler directly, which gave an error:
…
07:58:42 [INFO] Running LowerFrontend...
07:58:42 [ERROR] Failed passes: ['axelera.PadChannelsToPword', 'LowerFrontend']
07:58:42 [INFO] TVM pass trace information stored in: /drive/build/compiled_model
07:58:42 [ERROR] Lowering failed. Failed pass: axelera.PadChannelsToPword <- LowerFrontend
…
With a small change to the default compilation config I got a successful compilation, thinking that I now have something I should be able to load onto the card:
compile --generate-config --output config
sed -i 's/quantize_and_lower/quantize_only/' config/default_conf.json
compile --input toy-model/toy3d_model.onnx --input-shape 1,3 --overwrite --output build --config config/default_conf.json
...
07:59:58 [INFO] Checking ONNX model compatibility with the constraints of opset 17.
Calibrating... | 100% | 469.31it/s | 100it |
07:59:59 [INFO] Exporting '' using GraphExporterV2.
07:59:59 [INFO] Quantization finished.
07:59:59
07:59:59 [INFO] Quantization only was requested. Skipping lowering.
07:59:59 [INFO] Done.
Next I wanted to load the compiled model using Python and `axelera.runtime`.
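My loading attempt boils down to roughly the sketch below. The class and method names for `axelera.runtime` are written from memory, so please treat them as assumptions about the API rather than a verified snippet:

```python
from axelera import runtime as axr  # module path as I understand it

# NOTE: the calls below are my best recollection of the runtime API and may not
# match the real class/method names exactly.
ctx = axr.Context()
model = ctx.load_model("/drive/build/quantized_model/quantized_model.json")
# ^ this is the call that fails with AXR_ERROR_VALUE_ERROR / "Version not found in model"

# The end goal: feed a (1, 3) float32 numpy array and get a (1, 1) float32 array
# back from the card, analogous to the onnxruntime reference above.
```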
Which JSON file should I use for loading? I tried different ones, but loading the model
does not work and I get this error:
ValueError: AXR_ERROR_VALUE_ERROR: Failed to load model from /drive/build/quantized_model/quantized_model.json
Error: Version not found in model
Is there some example in the SDK that shows something similar to what I’m trying to do and that I’ve missed? Any hints or guidance would be appreciated.