
 

SDK v1.3:

14:32:45 [INFO] Dump used CLI arguments to: /repo/build/cli_args.json
14:32:45 [INFO] Dump used compiler configuration to: /repo/build/conf.json
14:32:45 [INFO] Input model has static input shape(s): ((1, 1, 2, 2),). Use it for quantization.
14:32:45 [INFO] Data layout of the input model: NCHW
14:32:45 [INFO] Using dataset of size 100 for calibration.
14:32:45 [INFO] In case of compilation failures, turn on 'save_error_artifact' and share the archive with Axelera AI.
14:32:45 [INFO] Quantizing '' using QToolsV2.
14:32:46 [INFO] ONNX model validation can be turned off by setting 'validate_operators' to 'False'.
14:32:46 [INFO] Checking ONNX model compatibility with the constraints of opset 17.
Calibrating... ✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨ | 100% | 390.54it/s | 100it |
14:32:46 [INFO] Exporting '' using GraphExporterV2.
14:32:47 [INFO] Quantization finished.
14:32:47 [INFO] Quantization took: 1.57 seconds.
14:32:47 [INFO] Export quantized model manifest to JSON file: /repo/build/quantized_model_manifest.json
14:32:47 [INFO] Lower input model to target device...
14:32:47 [INFO] In case of compilation failures, turn on 'save_error_artifact' and share the archive with Axelera AI.
14:32:47 [INFO] Lowering '' to target 'device' in 'multiprocess' mode for 1 AIPU core(s) using 100.0% of available AIPU resources.
14:32:47 [INFO] Running LowerFrontend...
14:32:48 [INFO] Running FrontendToMidend...
14:32:48 [INFO] Running LowerMidend...
14:32:48 [INFO] Running MidendToTIR...
14:32:49 [INFO] Running LowerTIR...
14:32:51 [INFO] LowerTIR succeeded to fit buffers into memory after iteration 0/4. Pool usage: {L1: alloc:1,058,944B avail:4,194,304B over:0B util:25.25%, L2: alloc:1,075,200B avail:32,309,248B over:0B util:3.33%, DDR: alloc:192B avail:1,040,187,392B over:0B util:0.00%} Overflowing buffer IDs: set()
14:32:51 [INFO] Running TirToAtex...
14:32:51 [INFO] Running LowerATEX...
14:32:51 [INFO] Running AtexToArtifact...
14:32:51 [INFO] Lowering finished!
14:32:51 [INFO] Compilation took: 4.0 seconds.
14:32:51 [INFO] Passes report was generated and saved to: /repo/build/compiled_model/pass_benchmark_report.json
14:32:51 [INFO] Lowering finished. Export model manifest to JSON file: /repo/build/compiled_model_manifest.json
14:32:51 [INFO] Total time: 5.59 seconds.
14:32:51 [INFO] Done.

 

When using SDK v1.4, with the same ONNX file as before:

 

14:35:31 [INFO] Dump used CLI arguments to: /repo/build/cli_args.json
14:35:31 [INFO] Dump used compiler configuration to: /repo/build/conf.json
14:35:31 [INFO] Input model has static input shape(s): ((1, 1, 2, 2),). Use it for quantization.
14:35:31 [INFO] Data layout of the input model: NCHW
14:35:31 [INFO] Using dataset of size 100 for calibration.
14:35:31 [INFO] In case of compilation failures, turn on 'save_error_artifact' and share the archive with Axelera AI.
14:35:31 [INFO] Quantizing '' using QToolsV2.
14:35:31 [INFO] ONNX model validation can be turned off by setting 'validate_operators' to 'False'.
14:35:31 [INFO] Checking ONNX model compatibility with the constraints of opset 17.
Calibrating... ✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨ | 100% | 389.72it/s | 100it |
14:35:32 [INFO] Exporting '' using GraphExporterV2.
14:35:32 [INFO] Quantization finished.
14:35:32 [INFO] Quantization took: 0.8 seconds.
14:35:32 [INFO] Export quantized model manifest to JSON file: /repo/build/quantized_model_manifest.json
14:35:32 [INFO] Lower input model to target device...
14:35:32 [INFO] In case of compilation failures, turn on 'save_error_artifact' and share the archive with Axelera AI.
14:35:32 [INFO] Lowering '' to target 'device' in 'multiprocess' mode for 1 AIPU core(s) using 100.0% of available AIPU resources.
14:35:32 [INFO] Running LowerFrontend...
14:35:32 [ERROR] Failed passes: ['axelera.DenseToConv2d', 'LowerFrontend']
14:35:32 [INFO] TVM pass trace information stored in: /repo/build/compiled_model
14:35:32 [ERROR] Lowering failed. Failed pass: axelera.DenseToConv2d <- LowerFrontend
Traceback (most recent call last):
5: TVMFuncCall
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}>(tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
3: tvm::transform::Pass::operator()(tvm::IRModule) const
2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
1: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) :clone .cold]
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 82, in cfun
rv = local_pyfunc(*pyargs)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/ir/transform.py", line 229, in _pass_func
return inst.transform_module(mod, ctx)
File "<frozen compiler.pipeline.frontend>", line 118, in transform_module
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/ir/transform.py", line 160, in __call__
return _ffi_transform_api.RunPass(self, mod)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
raise get_last_ffi_error()
5: TVMFuncCall
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}>(tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
3: tvm::transform::Pass::operator()(tvm::IRModule) const
2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
1: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) :clone .cold]
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 82, in cfun
rv = local_pyfunc(*pyargs)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/ir/transform.py", line 229, in _pass_func
return inst.transform_module(mod, ctx)
File "<frozen compiler.frontend.passes.pass_rewrite_dense_to_conv2d>", line 93, in transform_module
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/relay/dataflow_pattern/__init__.py", line 914, in rewrite
return ffi.rewrite(tmp, expr, mod)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
raise get_last_ffi_error()
8: TVMFuncCall
7: _ZN3tvm7runtime13PackedFuncObj
6: tvm::runtime::TypedPackedFunc<tvm::RelayExpr (tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule)>::AssignTypedLambda<tvm::RelayExpr (*)(tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule)>(tvm::RelayExpr (*)(tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
5: tvm::relay::RewritePatterns(tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule)
4: tvm::relay::PatternRewriter::Rewrite(tvm::runtime::Array<tvm::relay::DFPatternCallback, void> const&, tvm::RelayExpr const&)
3: tvm::relay::MixedModeMutator::VisitExpr(tvm::RelayExpr const&)
2: tvm::relay::MixedModeMutator::VisitLeaf(tvm::RelayExpr const&)
1: tvm::relay::PatternRewriter::DispatchVisitExpr(tvm::RelayExpr const&)
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) :clone .cold]
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 82, in cfun
rv = local_pyfunc(*pyargs)
File "<frozen compiler.frontend.passes.pass_rewrite_dense_to_conv2d>", line 73, in callback
TVMError: AssertionError
14:35:33 [INFO] Dumped input IRModule string representations for debugging to: /repo/build/compiled_model/debug

 

 

Both invocations use the default config.

 

Is there any additional information you need to be able to look into this regression?

 

  ---

 

edit: formatting looked weird, tried to change to code blocks instead

Hello everyone!
I'm also experiencing a related issue.

I've verified and completed the board installation settings (ASPM, IOMMU, Secure Boot, etc.).

The device is recognized via lspci.

I installed the new voyager-sdk version 1.4.
I also updated the device firmware.

After that, when I try to run the sample, the following message appears and it fails.

(venv) xxxxxx@ProArtX870E-20250830:~/voyager140$ ./inference.py yolov8s-coco-onnx media/traffic1_1080p.mp4 media/traffic2_720p.mp4

INFO    : Could not exec vainfo: Command '['vainfo']' returned non-zero exit status 3.
1 warning and 1 error generated.
Build log:
<kernel>:8:14: error: expected identifier or '{'
typedef enum : int {
             ^
<kernel>:8:1: warning: typedef requires a name
typedef enum : int {
^~~~~~~

terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to create OpenCL program
Aborted (core dumped)

I'm just trying to run a sample, but I suspect there might be an inconsistency in one of the files? Is that the case?


By the way, I've confirmed the samples run fine with version 1.3.

Does anyone have any good ideas?

 

Translated with DeepL.com (free version)


Looking at the release notes, v1.4 introduced new passes (including DenseToConv2d) and ONNX opset 17 checks, which is likely why the same model works in v1.3 but fails in v1.4.

Maybe you could try re-exporting your ONNX model with opset 17, and enabling save_error_artifact so we can review the error archive? That could help us confirm.
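(The option name is exactly what the compiler log prints, save_error_artifact; where you set it depends on how you drive the compiler, so treat the line below as a rough sketch rather than exact syntax:)

"save_error_artifact": true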

For the OpenCL issue, this looks host-related rather than device-related. The release notes mention updated host environment requirements, including updated OpenCL headers and stricter driver checks. It could be worth checking VA-API and OpenCL are correctly installed and visible on your system, and that your firmware update completed successfully (axdevice --refresh should show the updated version) 👍
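If it helps, a quick way to see what OpenCL actually reports on the host is a couple of lines of Python (a minimal sketch; it assumes the pyopencl package is available in your environment, otherwise clinfo on the command line gives the same information):

import pyopencl as cl  # assumption: pyopencl is installed (pip install pyopencl)

# Lists every OpenCL platform and its devices; an error or empty output here
# points at a missing/broken ICD rather than at the Metis card itself.
for platform in cl.get_platforms():
    print(platform.name, [device.name for device in platform.get_devices()])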

Let me know how it goes!


Good morning, everyone. It's morning here. I have some progress to report since then.

First, I swapped the GPU from a GeForce GTX 1050 Ti to a Radeon RX 7600 XT.
Also, there was an update for Metis-dkms.
 

---

The following packages will be upgraded:
  metis-dkms
Upgrading: 1, New installations: 0, Removals: 0, Held back: 0.
Need to get 0 B of archives out of 426 kB.
This operation will consume an additional 3,072 B of disk space.
Continue? [Y/n] y
(Reading database ... 234064 files and directories currently installed.)...
 Preparing to extract .../metis-dkms_1.2.2_all.deb ...
Module metis-1.0.3 for kernel 6.8.0-79-generic (x86_64).
Before uninstall, this module version was ACTIVE on this kernel.

metis.ko:
 - Uninstallation
   - Deleting from: /lib/modules/6.8.0-79-generic/updates/dkms/
 - Original module
   - No original module was found for this module on this kernel.
   - Use the dkms install command to reinstall any previous module version.

depmod...
Deleting module metis-1.0.3 completely from the DKMS tree.
Expanding metis-dkms (1.2.2) over (1.0.3)...
Configuring metis-dkms (1.2.2)...
Loading new metis-1.2.2 DKMS files...
Building for 6.8.0-79-generic
Building for architecture x86_64
Building initial module for 6.8.0-79-generic
Secure Boot not enabled on this system.
Done.

metis.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-79-generic/updates/dkms/

depmod...
Group axelera already exists


---


After that, the sample worked!!!

(venv) xxxxxx@ProArtX870E-20250830:~/voyager140$ ./inference.py yolov8s-coco-onnx media/traffic1_1080p.mp4 media/traffic2_720p.mp4
INFO    : Could not exec vainfo: Command '['vainfo']' returned non-zero exit status 3.
WARNING : Failed to get OpenCL platforms : clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR
WARNING : Please check the documentation for installation instructions
Core Temp  : 38.0°C                                                                                                                                             
CPU %      : 9.8%
End-to-end : 603.9fps
Latency    : 64.2ms (min:50.7 max:119.6 σ:3.8 x̄:63.8)ms

 

I'm not sure if it's the GPU or Metis-dkms.

Thanks, everyone!

 

Also, I got an error when trying to use the version 1.3 sample in version 1.4.

Sorry for reusing it.

 

 

 

Translated with DeepL.com (free version)


Hello!

 

After that, I swapped it back to the GeForce GTX 1050 Ti, and the same error occurred.

 

It seems the GPU is the cause.

Thank you very much!!!


I export the model from PyTorch this way:

torch.onnx.export(
    model,
    dummy_input,
    args.onnx,
    export_params=True,
    opset_version=17,
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
    # dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)

so I don’t think re-exporting will help.
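(If it's useful for double-checking, the opset actually recorded in the exported file can be read back with the onnx package - a small sketch, assuming onnx is installed and model.onnx stands in for the real file name:)

import onnx

model = onnx.load("model.onnx")  # hypothetical file name; use the path passed as args.onnx above
print([(entry.domain, entry.version) for entry in model.opset_import])  # expect [('', 17)] for opset 17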

I tried to attach `build/compiled_model/20250902_093659_error_report_.tar` but the format wasn't supported. I placed the tar file and the contents of `build/compiled_model/debug` in a zip file instead.


Ah yes, you're already using opset 17, so re-exporting wouldn't help here.

I wonder if it's a change in the DenseToConv2d pass that's causing it - hitting an unexpected shape or something. Perhaps if we turn it off for now in your config.json (thanks for the files by the way - super useful), we could see if it compiles? Just to check whether that's the issue.

"rewrite_dense_to_conv2d": false

...should do it, I’m thinking.
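(And if you want to confirm the option was picked up, the compiler dumps the configuration it actually used to /repo/build/conf.json according to the log at the top of the thread, so something along these lines should show it - assuming the key appears there under the same name:)

import json

# Read back the compiler configuration dumped during the build
with open("/repo/build/conf.json") as f:
    conf = json.load(f)
print(conf.get("rewrite_dense_to_conv2d"))  # expect False after the change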


Just tried setting `rewrite_dense_to_conv2d` to `false`. It moved the failure along; towards the end there is now an invalid tensor shape error:

 

12:48:08 [INFO] Running LowerFrontend...
12:48:08 [ERROR] Failed passes: ['axelera.BroadcastScalars', 'LowerFrontend']
12:48:08 [INFO] TVM pass trace information stored in: /repo/build/compiled_model
12:48:08 [ERROR] Lowering failed. Failed pass: axelera.BroadcastScalars <- LowerFrontend
Traceback (most recent call last):
5: TVMFuncCall
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}>(tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
3: tvm::transform::Pass::operator()(tvm::IRModule) const
2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
1: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) clone .cold]
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 82, in cfun
rv = local_pyfunc(*pyargs)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/ir/transform.py", line 229, in _pass_func
return inst.transform_module(mod, ctx)
File "<frozen compiler.pipeline.frontend>", line 143, in transform_module
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/ir/transform.py", line 160, in __call__
return _ffi_transform_api.RunPass(self, mod)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
raise get_last_ffi_error()
5: TVMFuncCall
4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}>(tvm::transform::__mk_TVM9::{lambda(tvm::transform::Pass, tvm::IRModule)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
3: tvm::transform::Pass::operator()(tvm::IRModule) const
2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
1: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) clone .cold]
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 82, in cfun
rv = local_pyfunc(*pyargs)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/ir/transform.py", line 229, in _pass_func
return inst.transform_module(mod, ctx)
File "<frozen compiler.frontend.passes.pass_broadcast_scalars>", line 130, in transform_module
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/relay/dataflow_pattern/__init__.py", line 914, in rewrite
return ffi.rewrite(tmp, expr, mod)
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
raise get_last_ffi_error()
15: TVMFuncCall
14: _ZN3tvm7runtime13PackedFuncObj
13: tvm::runtime::TypedPackedFunc<tvm::RelayExpr (tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule)>::AssignTypedLambda<tvm::RelayExpr (*)(tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule)>(tvm::RelayExpr (*)(tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
12: tvm::relay::RewritePatterns(tvm::runtime::Array<tvm::relay::DFPatternCallback, void>, tvm::RelayExpr, tvm::IRModule)
11: tvm::relay::PatternRewriter::Rewrite(tvm::runtime::Array<tvm::relay::DFPatternCallback, void> const&, tvm::RelayExpr const&)
10: tvm::relay::MixedModeMutator::VisitExpr(tvm::RelayExpr const&)
9: tvm::relay::MixedModeMutator::VisitLeaf(tvm::RelayExpr const&)
8: tvm::relay::PatternRewriter::DispatchVisitExpr(tvm::RelayExpr const&)
7: _ZN3tvm5relay16MixedModeMutator17DispatchVisitExprERKNS_9RelayExp
6: tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)
5: _ZZN3tvm5relay11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlRKNS_
4: tvm::relay::ExprMutator::VisitExpr_(tvm::relay::FunctionNode const*)
3: tvm::relay::MixedModeMutator::VisitExpr(tvm::RelayExpr const&)
2: tvm::relay::MixedModeMutator::VisitLeaf(tvm::RelayExpr const&)
1: tvm::relay::PatternRewriter::DispatchVisitExpr(tvm::RelayExpr const&)
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) clone .cold]
File "/root/.cache/axelera/venvs/6d332c79/lib/python3.10/site-packages/tvm/_ffi/_ctypes/packed_func.py", line 82, in cfun
rv = local_pyfunc(*pyargs)
File "<frozen compiler.frontend.passes.pass_broadcast_scalars>", line 96, in callback
File "<frozen compiler.frontend.passes.pass_broadcast_scalars>", line 35, in _is_scalar
ValueError: Invalid tensor shape (1, 64) in BroadCastScalars

 

 


Ah, interesting that it moved things along (I was making that up as I went a little bit, so wasn't sure where it'd get us 🤣) but now we're being tripped up by BroadcastScalars.

Let me ask around the team, and see if anyone's spotted anything similar before now. Could you share a few details on the model you're using, and maybe about your host system and OS too? (Those likely aren't the issue, since it worked on 1.3, but you can never give us too much information 😄)


I'm using Ubuntu 24.04 LTS as the host OS with metis-dkms 1.0.3 from the deb package.

I'm running Ubuntu 22.04 LTS in a Docker container into which I expose both my normal GPU and the Metis card.

The Dockerfiles used when making the containers for SDK 1.3 and 1.4 were almost identical:

FROM ubuntu:22.04

ARG AX_USER
ARG AX_TOKEN
ARG AX_TAG

ENV LANG=en_US.UTF-8

RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -q -y sudo git pciutils curl lsb-release git

RUN mkdir /code && cd /code && git clone -b ${AX_TAG} --depth 1 https://github.com/axelera-ai-hub/voyager-sdk.git

# SDK v1.4.0 removed user and token
RUN cd /code/voyager-sdk/ && ./install.sh --YES --no-driver --media # --user ${AX_USER} --token ${AX_TOKEN}

RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -q -y nvidia-cuda-toolkit vim less wget

# Replace torch with cuda enabled version
RUN /code/voyager-sdk/venv/bin/pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --index-url https://download.pytorch.org/whl/cu117

And the parameters used when building the image:

docker build --build-arg AX_USER='unused' \
--build-arg AX_TOKEN='unused' \
--build-arg AX_TAG='v1.4.0' \
-t axelera:v1.4 .

I had to create the `/dev/metis0` symlink manually.

With the docker container that has SDK 1.3, I can do inference on the card so I do have a working setup.

My workday is nearing its end; I'll add more details about the network tomorrow.

 


Coolio, thanks! This is awesome. 

 

Let me ask around and get back to you ​@Linde!


The stuff in the ONNX file is from my testing with a simple net, so I could go from everything custom in PyTorch all the way to doing inference on the card.

The first net was just a few linear layers, with N_INPUT set to 3:

def build_model(hidden=64):
    return nn.Sequential(
        nn.Linear(N_INPUT, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1)
    ).to(device)

Thinking that it's better to get this to work before looking at more complex things.

I did not get this through the compiler. I tried different settings for telling the compiler the input size: batch size 1, then my 3 elements.

At one point I got an error about only 1d or 4d being supported.

So then I thought ok, 4d it is. Batch size 1, channels 1, my 3 elements can become 2x2, and N_INPUT is set to 4 instead. So a Conv2d layer first, but just as a pass-through.

At some point I also skipped dynamic_axes during onnx export.

 


class Toy3DFunc(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()

        self.ugly = nn.Conv2d(in_channels=1, out_channels=N_INPUT, kernel_size=(2,2),
                              stride=1, padding=0, bias=False)

        # code omitted where conv2d is set up. zero weights, fix some to 1
        # requires_grad -> false

        self.ugly.to(device)

        self.net = nn.Sequential(
            nn.Linear(N_INPUT, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1)
        ).to(device)

    def forward(self, x):
        y = self.ugly(x)
        z = y.view(y.shape[0], -1)

        return self.net(z)

I tried quite a few different ways to reshape the data to get it into the sequential net. Everything that worked in PyTorch and exported fine to ONNX was rejected by the compiler.

 

The snippet above is what I finally got the 1.3 compiler to accept. Not code that I'm proud of…
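(For reference, one of the rejected reshaping variants looked roughly like this - reconstructed from memory, so a sketch rather than my exact code:)

def forward(self, x):
    y = self.ugly(x)
    z = torch.flatten(y, 1)  # also gives shape (1, N_INPUT); needs `import torch` at the top
    return self.net(z)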

But not long after, I got my weights onto the card and could play around with inference, and then continue with testing `--images somedir --transform pre_process.py` when compiling to change the range of input data from [0,1] and see the effects on predicted values.


Sorry, one other thing that’d be useful for the team to check ​@Linde - could you share the ONNX model file as well, please? You can always send it me via DM if it’s not something you want to share publicly.


DM sent.


Hello!

 

After that, I swapped it back to the GeForce GTX 1050 Ti, and the same error occurred.

 

It seems the GPU is the cause.

Thank you very much!!!

The problem is actually that the Nvidia OpenCL compiler is a little stricter than others and disallows extensions that others allow, which is how this slipped through QA: the kernel code is not strictly compliant OpenCL C, so you see the failure. It is fairly straightforward to patch the source so the code works on the 1050.
In the file operators/axstreamer/src/AxOpenCl.cpp, change the line at 365 from:

typedef enum : int {

to
 

typedef enum output_format {

and then run `make operators`, and you should be able to use your 1050.

