Thanks for sharing this @npi - I’ll ask around the team, see what we can see 
Might be worth quickly bumping it up to the latest v1.4 either way, in the meantime?
https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.4/docs/tutorials/firmware_flash_update.md
Hi @Spanner
Thanks for the suggestion, however the installation failed at the end with:
[138/140] Install axelera-runtime-1.4.0
[138/140] E: Unable to locate package axelera-runtime-1.4.0
[138/140] E: Couldn't find any package by glob 'axelera-runtime-1.4.0'
[138/140] E: Couldn't find any package by regex 'axelera-runtime-1.4.0'
ERROR: Failed to install axelera-runtime-1.4.0
I had to roll back to v1.3.3.
Hello @npi
The torch-aipu flow is not optimised for performance: it uses PyTorch for the pre- and post-processing operators, and only a single core of the Axelera AIPU is used for inference. It is also not pipelined at all, so latency and throughput are poor. It is intended for comparing the accuracy of the model against a pure PyTorch pipeline, so that you can measure the impact of quantisation on accuracy. Since your Aetina does not have a GPU, it will be using a CPU-based PyTorch backend.
Consequently the 1.5 fps is not very surprising.
For the yolov5s-v7 pipeline you are using the GStreamer pipe (--pipe=gst). This is pipelined and utilises all 4 cores, but it is still limited by the performance of the slowest element in the pipeline, which in this case is
inference-task0:libtransform_resize_cl_0 4,090 244.4
This is the element that resizes and letterboxes the 720p input frame to 640x640 for the YOLO model. The gap between this 244 fps and your 214 fps is partly caused by using `--show-stats`, which has a non-zero impact on all hosts, and more so on the Aetina's Rockchip; additionally there is other overhead in the pipeline management - we are always working on closing this gap.
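To make the arithmetic explicit: the "Effective FPS" column in the stats output is just the reciprocal of an element's average per-frame processing time, which is why the slowest element caps the whole pipeline. A minimal sketch (not SDK code):

```python
def effective_fps(time_us: float) -> float:
    """Convert an element's average per-frame time (microseconds) to FPS."""
    return 1_000_000 / time_us

# The resize/letterbox element above takes 4,090 us per frame:
print(f"{effective_fps(4090):.1f}")  # 244.5 - the pipeline's throughput ceiling
```

No single downstream element can push frames faster than this, so improving the resize element is what raises the ceiling.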
What performance do you get for unet_fcn_512-cityscapes using the gst pipe?
Regarding the installation issue with 1.4, I have asked someone more familiar with the installer to investigate the likely cause. I suggest we do pursue this because, aside from various performance improvements and bug fixes, 1.4 also makes it easier to use images from a directory and makes it possible to feed images from a Python generator, as shown in this example:
https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.4/examples/data_source.py
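Purely to illustrate the generator pattern that example demonstrates - the actual SDK entry points are in the linked data_source.py, and nothing below is Voyager API:

```python
# Illustrative only: a Python generator of the kind v1.4 can consume as an
# image source (see the linked data_source.py for the real SDK entry points).
from pathlib import Path


def image_paths(directory: str):
    """Yield image file paths one at a time from a directory, in sorted order."""
    for path in sorted(Path(directory).glob("*.jpg")):
        yield path
```

Because it is a generator, frames are produced lazily, so you can also yield images from a camera, a network stream, or synthetic data without loading everything up front.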
Hello @SamP ,
Thanks for your quick and detailed answer.
I ran unet with the gst pipe and got 7.8 fps - much better, but still lower than expected.
./inference.py unet_fcn_512-cityscapes dataset --pipe gst --no-display --show-stats
=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
axinplace-addstreamid0 159 6,270.2
axtransform-colorconvert0 160,019 6.2
inference-task0:libtransform_resize_cl_0 2,403 416.1
inference-task0:libtransform_padding_0 1,509 662.7
inference-task0:inference 95,637 10.5
inference-task0:Inference latency 2,455,298 n/a
inference-task0:libtransform_paddingdequantize_0 17,239 58.0
inference-task0:libdecode_semantic_seg_0 14,789 67.6
inference-task0:Postprocessing latency 123,824 n/a
inference-task0:Total latency 3,157,814 n/a
=========================================================================
End-to-end average measurement 7.8
=========================================================================
I also got this warning during inference:
WARNING : New inference data is ready, but the InferencedStream is not being processed fast enough (backlog=10)
INFO : InferencedStream is being processed quickly enough again (backlog=1)
With --enable-hardware-codec I get 9.7 fps
=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
axinplace-addstreamid0 150 6,660.3
axtransform-colorconvert0 116,649 8.6
inference-task0:libtransform_resize_cl_0 1,818 549.8
inference-task0:libtransform_padding_0 1,244 803.6
inference-task0:inference 99,256 10.1
inference-task0:Inference latency 1,679,149 n/a
inference-task0:libtransform_paddingdequantize_0 15,439 64.8
inference-task0:libdecode_semantic_seg_0 12,416 80.5
inference-task0:Postprocessing latency 19,759 n/a
inference-task0:Total latency 1,875,842 n/a
=========================================================================
End-to-end average measurement 9.7
=========================================================================
And no warning this time.
Adding --disable-opencl I get 10.5 fps
=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
axinplace-addstreamid0 169 5,895.2
videoconvert0 9,219 108.5
capsfilter0 238 4,186.0
inference-task0:libtransform_resize_0 6,576 152.1
inference-task0:libtransform_totensor_0 333 2,996.9
inference-task0:libinplace_normalize_0 20,924 47.8
inference-task0:libtransform_padding_0 1,089 917.5
inference-task0:inference 82,273 12.2
inference-task0:Inference latency 1,743,865 n/a
inference-task0:libtransform_paddingdequantize_0 15,887 62.9
inference-task0:libdecode_semantic_seg_0 13,069 76.5
inference-task0:Postprocessing latency 62,064 n/a
inference-task0:Total latency 2,079,652 n/a
=========================================================================
End-to-end average measurement 10.5
=========================================================================
But the warning came back.
Disabling OpenGL and/or VAAPI neither improved nor degraded the fps.
Is there anything else I can tune to reach 18 fps?
Thanks