
I’ve used the yolov5s.pt file provided by the official yolov5 repository and converted it to ONNX with this command:

 

python3 export.py --weights yolov5s.pt --imgsz 640 --batch-size 1 --include onnx --opset 17

 

I then tried to compile it with this command:

 

compile -i /home/axelera/Vision.AxeleraTesting/data/yolov5s.onnx -o /home/axelera/Vision.AxeleraTesting/data/compile --overwrite

I got this output:

09:49:45 [INFO] Dump used CLI arguments to: /home/vintecc/axelera/Vision.AxeleraTesting/data/compile/cli_args.json
09:49:45 [INFO] Dump used compiler configuration to: /home/vintecc/axelera/Vision.AxeleraTesting/data/compile/conf.json
09:49:45 [INFO] Input model has static input shape(s): ((1, 3, 640, 640),). Use it for quantization.
09:49:45 [INFO] Data layout of the input model: NCHW
09:49:45 [INFO] Using dataset of size 100 for calibration.
09:49:45 [INFO] In case of compilation failures, turn on 'save_error_artifact' and share the archive with Axelera AI.
09:49:45 [INFO] Quantizing '' using QToolsV2.
09:49:45 [INFO] ONNX model validation can be turned off by setting 'validate_operators' to 'False'.
09:49:49 [INFO] Checking ONNX model compatibility with the constraints of opset 17.
Calibrating... ✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨ | 100% | 8.22it/s | 100it |
09:50:04 [INFO] Exporting '' using GraphExporterV2.
/home/vintecc/.cache/axelera/venvs/3252ae77/lib/python3.10/site-packages/torch/nn/functional.py:3734: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/vintecc/.cache/axelera/venvs/3252ae77/lib/python3.10/site-packages/torch/jit/_trace.py:976: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
module._c._create_method_from_trace(
09:50:12 [INFO] Quantization finished.
09:50:12 [INFO] Quantization took: 27.38 seconds.
09:50:12 [INFO] Export quantized model manifest to JSON file: /home/vintecc/axelera/Vision.AxeleraTesting/data/compile/quantized_model_manifest.json
09:50:12 [INFO] Lower input model to target device...
09:50:12 [INFO] In case of compilation failures, turn on 'save_error_artifact' and share the archive with Axelera AI.
09:50:12 [INFO] Lowering '' to target 'device' in 'multiprocess' mode for 1 AIPU core(s) using 100.0% of available AIPU resources.
09:50:12 [INFO] Running LowerFrontend...
09:50:31 [INFO] Running FrontendToMidend...
09:50:33 [INFO] Running LowerMidend...
09:50:37 [INFO] Running MidendToTIR...
09:50:58 [INFO] Running LowerTIR...
09:51:54 [INFO] LowerTIR succeeded to fit buffers into memory after iteration 0/4.
Pool usage: {L1: alloc:3,978,752B avail:4,194,304B over:0B util:94.86%, L2: alloc:23,074,304B avail:32,309,248B over:0B util:71.42%, DDR: alloc:3,840,256B avail:1,040,187,392B over:0B util:0.37%}
Overflowing buffer IDs: set()
09:51:55 [INFO] Running TirToAtex...
09:51:57 [INFO] Running LowerATEX...
09:52:01 [INFO] Running AtexToArtifact...
09:52:02 [INFO] Lowering finished!
09:52:02 [INFO] Compilation took: 109.53 seconds.
09:52:02 [INFO] Passes report was generated and saved to: /home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model/pass_benchmark_report.json
09:52:02 [INFO] Lowering finished. Export model manifest to JSON file: /home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model_manifest.json
09:52:02 [INFO] Total time: 137.26 seconds.
09:52:02 [INFO] Done.

Looking at the output, the compiler finds the correct input shape (1, 3, 640, 640). However, when I open the model with the runtime API I get different input/output shapes. This piece of code:

 

input_infos, output_infos = model.inputs(), model.outputs()

 

Gives me this:

Input:

[TensorInfo(shape=(1, 644, 656, 4), dtype=<class 'numpy.int8'>, name='var_input_ifd0', padding=[(0, 0), (0, 0), (0, 0), (0, 0)], scale=1.0, zero_point=0)]

Output:

[TensorInfo(shape=(1, 40, 40, 256), dtype=<class 'numpy.int8'>, name='output_mmio_var', padding=[(0, 0), (0, 0), (0, 0), (0, 0)], scale=1.0, zero_point=0), TensorInfo(shape=(1, 20, 20, 256), dtype=<class 'numpy.int8'>, name='output_mmio_var_1', padding=[(0, 0), (0, 0), (0, 0), (0, 0)], scale=1.0, zero_point=0), TensorInfo(shape=(1, 80, 80, 256), dtype=<class 'numpy.int8'>, name='output_mmio_var_2', padding=[(0, 0), (0, 0), (0, 0), (0, 0)], scale=1.0, zero_point=0)]

 

The input should be (1, 3, 640, 640) and the output should be (1, 25200, 85).

 

Could you advise me on how I should use the compiled model?

Hi @Gilles!

Hmm, I wonder if the compiled_model_manifest.json that the log mentions can tell us which tensors the runtime is expecting and returning, in its "inputs" and "outputs" sections? If the input tensor listed there doesn’t match the expected shape (1, 3, 640, 640), it’s possible the wrong tensor is being exposed.
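Something like this would dump the whole file (just a sketch; the path is taken from your compile log, and the structure of the file is a guess until we see it):

import json

with open("/home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model_manifest.json") as f:
    manifest = json.load(f)

print(json.dumps(manifest, indent=2))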

Let me know if that helps, and how it goes, and we can work from there! 👍


{
  "quantized_model_file": "/home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model/quantized_model.json",
  "quantize_params": [[0.003913902677595615, -128]],
  "dequantize_params": [
    [0.003781032981351018, -128],
    [0.0038066483102738857, -128],
    [0.003769318340346217, -128]
  ],
  "input_names": ["x"],
  "input_shapes": [[1, 644, 656, 4]],
  "input_dtypes": ["int8"],
  "output_shapes": [[1, 40, 40, 256], [1, 20, 20, 256], [1, 80, 80, 256]],
  "output_dtypes": ["int8", "int8", "int8"],
  "model_lib_file": "/home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model/model.json",
  "model_params_file": "/home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model/params.bin",
  "n_padded_ch_inputs": [[0, 0, 2, 2, 2, 14, 0, 1]],
  "n_padded_ch_outputs": [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1]
  ],
  "input_tensor_layout": "NHWC",
  "preprocess_graph": null,
  "postprocess_graph": "/home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model/postprocess_graph.onnx",
  "manifest_version": "1.0"
}

Hey @Spanner, thanks for the reply! This is the compiled_model_manifest.json content. The input name doesn’t seem correct (it should be “images”), but I can’t see where the “x” comes from in the model; that name doesn’t exist there, and I can’t find the listed shape anywhere either. I also don’t see the output names in the JSON.

 

However, I see that compiled_model/postprocess_graph.onnx contains the graph that transforms the odd output shapes into the expected shape.


Hi @Gilles,

Our AIPU requires inputs and returns outputs with specific dimensions. 
That’s why the compiled_model_manifest.json contains modified (padded) input_shapes and output_shapes.
The applied padding can be seen in n_padded_ch_inputs and n_padded_ch_outputs. 

Please see our axruntime example to see how the padding can be applied:
https://github.com/axelera-ai-hub/voyager-sdk/blob/eb3d9d6e2fe83b2aba3fc3014f8a05ae69f167ee/examples/axruntime/axruntime_example.py#L70
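For illustration only, here is a minimal numpy sketch of what that could look like for this particular manifest. It assumes n_padded_ch_inputs = [0, 0, 2, 2, 2, 14, 0, 1] is read as (before, after) pairs for the N, H, W and C axes of the NHWC input, and that the float input is expected in [0, 1] (the scale of roughly 1/255 with zero point -128 suggests that); the linked axruntime example is the authoritative reference.

import numpy as np

# Values taken from compiled_model_manifest.json above
SCALE, ZERO_POINT = 0.003913902677595615, -128        # "quantize_params"
PAD = ((0, 0), (2, 2), (2, 14), (0, 1))               # "n_padded_ch_inputs" as (before, after) per N, H, W, C (assumption)

def prepare_input(image_nhwc):
    """image_nhwc: float32 array of shape (1, 640, 640, 3), values in [0, 1] (assumption)."""
    # Quantize to int8 with the manifest's scale and zero point.
    q = np.clip(np.round(image_nhwc / SCALE) + ZERO_POINT, -128, 127).astype(np.int8)
    # Pad H, W and C so the tensor matches the compiled input shape.
    return np.pad(q, PAD, constant_values=ZERO_POINT)  # padding value is an assumption

x = prepare_input(np.zeros((1, 640, 640, 3), dtype=np.float32))
print(x.shape)  # -> (1, 644, 656, 4), matching the runtime's reported input shape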


Hey @Jonas!

 

Thanks for the reply! For the input this seems like the solution! However, for the output I get 3 different tensors instead of one:

  • (1, 80, 80, 256)
  • (1, 40, 40, 256)
  • (1, 20, 20, 256)

Instead of (1, 25200, 85). I see that postprocess_graph.onnx contains the conversion, but can this be added to the model, or do we need to do this ourselves?


Hi @Gilles,

Our compiler cuts off the last part of the model that does not run on our AIPU.
This is indeed the postprocess_graph.onnx, which you still need to run yourself.
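For completeness, a rough sketch of that last step with onnxruntime might look like the following. Everything here is an assumption to verify against the actual graph: the input names, ordering and layout expected by postprocess_graph.onnx, whether dequantization is already baked into it, and the guess that each output's single padded channel (255 real channels = 3 anchors × 85 values, padded to 256) sits at the end.

import numpy as np
import onnxruntime as ort

# "dequantize_params" from the manifest, in the order the runtime returns the outputs
DEQUANT = [(0.003781032981351018, -128),
           (0.0038066483102738857, -128),
           (0.003769318340346217, -128)]

def depad_and_dequantize(outputs_int8):
    """outputs_int8: the three int8 tensors from the runtime,
    e.g. shapes (1, 40, 40, 256), (1, 20, 20, 256), (1, 80, 80, 256)."""
    feats = []
    for q, (scale, zp) in zip(outputs_int8, DEQUANT):
        f = (q.astype(np.float32) - zp) * scale
        # n_padded_ch_outputs is [0, 0, 0, 0, 0, 0, 0, 1] per output:
        # assume one padded channel at the end, so keep the first 255.
        feats.append(f[..., :255])
    return feats

def run_postprocess(outputs_int8):
    sess = ort.InferenceSession(
        "/home/vintecc/axelera/Vision.AxeleraTesting/data/compile/compiled_model/postprocess_graph.onnx")
    feats = depad_and_dequantize(outputs_int8)
    # Input names / ordering / layout (NHWC vs NCHW) must be checked against
    # sess.get_inputs(); matching them by shape is the safest way to pair them up.
    feed = {inp.name: feats[i] for i, inp in enumerate(sess.get_inputs())}
    return sess.run(None, feed)  # expected to yield the (1, 25200, 85) tensor

For reference, (1, 25200, 85) is the standard YOLOv5 head output for a 640×640 COCO model: 3 anchors × (80×80 + 40×40 + 20×20) = 25200 predictions, each with 4 box values, 1 objectness score and 80 class scores.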

