Question

Aetina board with M2 Metis: low fps and high latency running unet_fcn_512-cityscapes

  • September 7, 2025
  • 15 replies
  • 379 views

Hello,

I’m running inference with the model unet_fcn_512-cityscapes using the torch-aipu pipe on the Aetina eval board. It runs at 1.8 fps system with 500 ms of latency, although the device fps shows 11.5 fps capability. The docs also mention it should reach 18 fps. I originally thought it was an issue with time wasted loading and decoding PNG images from the SD card, so I put them in shared memory, but the results are identical.
I also tested yolov5s-v7-coco, which should reach 805 fps, but I can only achieve 214 fps. Here is the output of:
 

AXELERA_USE_CL_DOUBLE_BUFFER=0 ./inference.py yolov5s-v7-coco media/traffic3_720p.mp4 --show-stats --no-display

INFO : Deploying model yolov5s-v7-coco for 4 cores. This may take a while...
|████████████████████████████████████████| 12:41.1
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
========================================================================
Element Time(𝜇s) Effective FPS
========================================================================
qtdemux0 319 3,126.4
h264parse0 3,094 323.2
capsfilter0 259 3,851.4
mppvideodec0 9,563 104.6
decodebin-link0 91 10,922.0
axtransform-colorconvert0 3,404 293.8
inference-task0:libtransform_resize_cl_0 4,090 244.4
inference-task0:libtransform_padding_0 1,816 550.5
inference-task0:inference 4,405 227.0
inference-task0:Inference latency 94,835 n/a
inference-task0:libdecode_yolov5_0 991 1,008.3
inference-task0:libinplace_nms_0 130 7,679.8
inference-task0:Postprocessing latency 952 n/a
inference-task0:Total latency 110,383 n/a
========================================================================
End-to-end average measurement 214.0
========================================================================

Is there anything I can tune to reduce the latency and increase the fps?

Voyager SDK release v1.3.3, Ubuntu 22.04

Thanks in advance

15 replies

Spanner
Axelera Team
  • Axelera Team
  • September 8, 2025

Thanks for sharing this ​@npi - I’ll ask around the team, see what we can see 👍

Might be worth quickly bumping it up to the latest v1.4 either way, in the meantime?

https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.4/docs/tutorials/firmware_flash_update.md


  • Author
  • Ensign
  • September 8, 2025

Hi ​@Spanner 
Thanks for the suggestion, however the installation failed at the end with:

[138/140] Install axelera-runtime-1.4.0
[138/140] E: Unable to locate package axelera-runtime-1.4.0
[138/140] E: Couldn't find any package by glob 'axelera-runtime-1.4.0'
[138/140] E: Couldn't find any package by regex 'axelera-runtime-1.4.0'
ERROR: Failed to install axelera-runtime-1.4.0

I had to rollback to v1.3.3


  • Axelera Team
  • September 9, 2025

Hello ​@npi 

The torch-aipu flow is not optimised for performance: it uses PyTorch for pre- and post-processing operators, and only a single core of the Axelera AIPU is used for inference. It is also not pipelined at all, so latency and throughput are poor. It is intended for comparing the accuracy of the model against a pure PyTorch pipeline, so that you can measure the impact of quantisation on accuracy. Since your Aetina does not have a GPU, it will be using a CPU-based PyTorch backend.

Consequently the 1.8 fps is not very surprising.

For the yolov5s-v7 pipeline you are using the GStreamer pipe (--pipe=gst). This is pipelined and utilises all 4 cores, but it is still limited by the performance of the slowest element in the pipeline, in this case:

inference-task0:libtransform_resize_cl_0 4,090 244.4

This is the element that resizes and letterboxes the 720p input frame to 640x640 for the YOLO model. The gap between this 244 fps and the 214 fps end-to-end figure is partly caused by `--show-stats`, which has a non-zero impact on all hosts, and more so on the Aetina Rockchip; there is also other overhead in pipeline management. We are always working on closing this gap.

What performance do you get for unet_fcn_512-cityscapes using the gst pipe?

Regarding the installation issue with 1.4, I have asked someone more familiar with the installer to investigate the likely cause. I suggest we pursue this because, aside from various performance improvements and bug fixes, 1.4 also makes it easier to use images from a directory and makes it possible to get images from a Python generator, as shown in this example:

https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.4/examples/data_source.py
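A generator source is just a plain Python generator yielding frames; something along these lines (a minimal sketch using PIL, with `stream_from_dir` as an illustrative name; see the linked data_source.py for how it is actually wired into the SDK):

```python
from pathlib import Path

from PIL import Image

def stream_from_dir(directory):
    # Yield decoded RGB frames one at a time; a generator like this can
    # replace a video or dataset source (see data_source.py for the hookup).
    for path in sorted(Path(directory).glob("*.png")):
        yield Image.open(path).convert("RGB")
```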

 


  • Author
  • Ensign
  • September 9, 2025

Hello ​@SamP ,

Thanks for your quick and detailed answer.

I ran unet with the gst pipe and got 7.8 fps: much better, but still lower than expected.

./inference.py  unet_fcn_512-cityscapes dataset --pipe gst --no-display --show-stats

=========================================================================                                                                                                                       
Element Time(𝜇s) Effective FPS
=========================================================================
axinplace-addstreamid0 159 6,270.2
axtransform-colorconvert0 160,019 6.2
inference-task0:libtransform_resize_cl_0 2,403 416.1
inference-task0:libtransform_padding_0 1,509 662.7
inference-task0:inference 95,637 10.5
inference-task0:Inference latency 2,455,298 n/a
inference-task0:libtransform_paddingdequantize_0 17,239 58.0
inference-task0:libdecode_semantic_seg_0 14,789 67.6
inference-task0:Postprocessing latency 123,824 n/a
inference-task0:Total latency 3,157,814 n/a
=========================================================================
End-to-end average measurement 7.8
=========================================================================

I also got this warning during inference:
 

WARNING : New inference data is ready, but the InferencedStream is not being processed fast enough (backlog=10)                                                                                 
INFO : InferencedStream is being processed quickly enough again (backlog=1)

With --enable-hardware-codec I get 9.7 fps

=========================================================================                                                                                                                       
Element Time(𝜇s) Effective FPS
=========================================================================
axinplace-addstreamid0 150 6,660.3
axtransform-colorconvert0 116,649 8.6
inference-task0:libtransform_resize_cl_0 1,818 549.8
inference-task0:libtransform_padding_0 1,244 803.6
inference-task0:inference 99,256 10.1
inference-task0:Inference latency 1,679,149 n/a
inference-task0:libtransform_paddingdequantize_0 15,439 64.8
inference-task0:libdecode_semantic_seg_0 12,416 80.5
inference-task0:Postprocessing latency 19,759 n/a
inference-task0:Total latency 1,875,842 n/a
=========================================================================
End-to-end average measurement 9.7
=========================================================================

And no warning

Adding --disable-opencl I get 10.5fps

=========================================================================                                                                                                                       
Element Time(𝜇s) Effective FPS
=========================================================================
axinplace-addstreamid0 169 5,895.2
videoconvert0 9,219 108.5
capsfilter0 238 4,186.0
inference-task0:libtransform_resize_0 6,576 152.1
inference-task0:libtransform_totensor_0 333 2,996.9
inference-task0:libinplace_normalize_0 20,924 47.8
inference-task0:libtransform_padding_0 1,089 917.5
inference-task0:inference 82,273 12.2
inference-task0:Inference latency 1,743,865 n/a
inference-task0:libtransform_paddingdequantize_0 15,887 62.9
inference-task0:libdecode_semantic_seg_0 13,069 76.5
inference-task0:Postprocessing latency 62,064 n/a
inference-task0:Total latency 2,079,652 n/a
=========================================================================
End-to-end average measurement 10.5
=========================================================================

But the warning came back.

Disabling OpenGL and/or VA-API neither improved nor degraded the fps.

Is there anything else I can tune to reach 18 fps?

Thanks


  • Author
  • Ensign
  • September 17, 2025

Hello,

Could anyone help with the high latency? It is crucial for me to reduce it to a few ms. Is it due to the GStreamer pipeline being used?


Spanner
Axelera Team
  • Axelera Team
  • September 17, 2025

Hi ​@npi! Can I check - are you wanting to optimise for higher FPS (throughput) or lower latency? These can kinda work against each other...
 
Even at 10fps the minimum possible latency is 100ms (1000ms ÷ 10fps). So even at 18fps we're looking at 55ms.
 
That 1-3 second latency includes a fair amount of pipeline buffering on top of the actual frame processing time. The GStreamer pipeline contributes to this too, but it's not the only factor.
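To put numbers on that floor (plain Python, just the arithmetic):

```python
# The latency floor at a given frame rate is one full frame period:
# a frame cannot complete faster than the sustained rate allows.
def min_latency_ms(fps: float) -> float:
    return 1000.0 / fps

print(min_latency_ms(10))            # 100.0 ms at 10 fps
print(round(min_latency_ms(18), 1))  # 55.6 ms at 18 fps
```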
 
To help us move in the right direction, can you share:

  1. What's the ideal use case? Do you need real-time control or just smooth playback?
  2. Are you willing to trade FPS for lower latency?
  3. What latency would actually be acceptable for your application?

  • Author
  • Ensign
  • September 17, 2025

Hi ​@Spanner,

Thanks for the explanation of the latency numbers from the stats.

Yes, I can trade fps for lower latency, and I also don’t need the GStreamer pipeline.

The use case is inference on still raw images that are not necessarily coming at a regular fps, reacting quickly (real-time) to the output of the inference. So inference time + latency must be as small as possible.

With SDK 1.4, as ​@SamP mentioned, I could try image inference. This may help.
Will let you know.


Spanner
Axelera Team
  • Axelera Team
  • September 17, 2025

Sounds good, do keep us posted! For still image inference without GStreamer, you should see better latency, yeah. The pipeline overhead was likely adding a delay.

With SDK 1.4's image inference capabilities, you'll bypass all the video pipeline complexity and get much closer to the raw inference time we’re looking for 👍 Let me know how it goes!


  • Axelera Team
  • September 17, 2025

Latency is something that we are addressing at the moment; until relatively recently we were focussed mostly on throughput.

In 1.4 of the SDK there is a new mode that can be enabled with the env var AXELERA_LOW_LATENCY=1, which will generally enable all options that we know favour latency over throughput. You can see docs on the env vars if you run AXELERA_HELP=1 ./inference.py

The way we currently utilise the multiple cores in Metis means that the more cores are involved, the worse the latency when the incoming frame rate is low (e.g. on a USB/RTSP source). It is therefore best to use as few cores as possible. However, it is also best not to let any of the queues in the pipeline fill up, as that also hurts latency, so using fewer cores than required is suboptimal too. So you need to establish the optimal number of Metis cores. I use axrunmodel to do this, as it takes the preprocessing out of the equation, allowing you to focus on just the pure inference performance:

.../framework$ axrunmodel build/unet_fcn_512-cityscapes/unet_fcn_512-cityscapes/1/model.json --aipu-cores=2 -d0
build/unet_fcn_512-cityscapes/unet_fcn_512-cityscapes/1/model.json ... dev:21.9 host:21.8 system:21.8fps PASS
.../framework$ axrunmodel build/unet_fcn_512-cityscapes/unet_fcn_512-cityscapes/1/model.json --aipu-cores=3 -d0
build/unet_fcn_512-cityscapes/unet_fcn_512-cityscapes/1/model.json ... dev:29.8 host:29.8 system:29.8fps PASS
.../framework$ axrunmodel build/unet_fcn_512-cityscapes/unet_fcn_512-cityscapes/1/model.json --aipu-cores=4 -d0
build/unet_fcn_512-cityscapes/unet_fcn_512-cityscapes/1/model.json ... dev:32.0 host:31.9 system:32.0fps PASS

I use -d0 to ensure I only use one of the PCIe devices, and I establish that for my 24 fps source, 3 Metis cores is optimal. Your numbers will vary for the M.2.
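If it helps, the selection rule I’m describing can be sketched in a few lines of Python (the fps values are the ones from my axrunmodel runs above; treat this as an illustration, not an SDK API):

```python
# Pick the fewest AIPU cores whose measured throughput covers the source fps,
# since extra cores add pipeline-fill latency without helping a low-rate source.
measured = {2: 21.9, 3: 29.8, 4: 32.0}  # cores -> dev fps from axrunmodel

def optimal_cores(source_fps, measured):
    for cores in sorted(measured):
        if measured[cores] >= source_fps:
            return cores
    return max(measured)  # fall back to all cores if none keeps up

print(optimal_cores(24.0, measured))  # 3 for a 24 fps source
```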

Thus I can then run like this:

AXELERA_LOW_LATENCY=1 ./inference.py unet_fcn_512-cityscapes rtsp://127.0.0.1:8554/0  --show-latency --no-display -d0 --aipu-cores=3
INFO : Selected device metis-0:1:0 (deselected 1 other device)
INFO : Enabling low latency mode for inference, at the cost of performance
Detecting... : | | 1339/0 [00:55<00:00, 24.20frames/s]INFO : Interrupting the inference stream
Core Temp : 47.0°C
CPU % : 1.1%
End-to-end : 24.0fps
Latency : 130.2ms (min:106.6 max:260.0 σ:8.2 x̄:133.7)ms

The latency here is still higher than I would like; this is something we will improve in the next release.

Oh, one other point: when measuring performance it’s generally best not to use a dataset, as parsing JPEGs from disk in our dataset adapter is typically slower than decoding a video source. You can use fakevideo or an mp4 file instead.
 

 


  • Author
  • Ensign
  • September 17, 2025

Ok, here we are:

I have implemented an image-inference Python script and limited the throughput to between 1 and 5 fps. I have also preloaded the raw PIL images in memory, already resized to 512x512, and commented out the ‘resize’ preprocessing in the YAML to limit any preprocessing overhead. I also wanted to pre-normalize the images, but when commenting out the ‘normalize’ section in the YAML file I got a strange error:

ERROR   : PermuteChannels is not implemented for gst pipeline

So I kept it.
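For reference, my preloading step is essentially this (a sketch; the directory path and PNG glob are illustrative):

```python
from pathlib import Path

from PIL import Image

# Decode and resize every image once, up front, so nothing but the
# CPU->AIPU transfer, normalize, inference and postprocessing remains
# on the per-frame path.
def preload(directory, size=(512, 512)):
    return [Image.open(p).convert("RGB").resize(size)
            for p in sorted(Path(directory).glob("*.png"))]
```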

Here are the stats at 1 fps:

=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
decodebin-link0 506 1,973.2
axtransform-colorconvert-cl0 1,012,609 1.0
inference-task0:libtransform_resize_cl_0 4,450 224.7
inference-task0:libtransform_padding_0 2,947 339.3
inference-task0:inference 1,007,822 1.0
inference-task0:Inference latency 4,038,389 n/a
inference-task0:libtransform_paddingdequantize_0 34,140 29.3
inference-task0:libdecode_semantic_seg_0 30,763 32.5
inference-task0:Postprocessing latency 31,090 n/a
inference-task0:Total latency 5,117,034 n/a
=========================================================================
End-to-end average measurement 1.0
=========================================================================
Core Temp : 44.0°C
CPU % : 8.4%
End-to-end : 1.0fps
Latency : 0.0ms (min:inf max:-inf σ:0.0 x̄:0.0)ms

So my understanding is that inference-task0:inference = 1,007,822 is actually not the time to infer the image but the time between two inferences. Same for axtransform-colorconvert-cl0 = 1,012,609. For the others it seems to be the real time taken to perform the task in question. Am I right?

Now speaking about latencies: there is inference-task0:Inference latency = 4,038,389, whose meaning I don’t understand. And finally at the bottom there is Latency: 0.0ms (I love this one 😉), but it does not seem to correlate with the other latencies, and I’m not sure how reliable the value is because, as you can see below, it jumps up to 2109.8ms at 3 fps and drops down to 0 again at 4 fps.

I have repeated the experiments several times at each fps and the figures are every time very close.

Below are the results for 2,3,4,5 fps:

2 FPS

=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
decodebin-link0 458 2,180.2
axtransform-colorconvert-cl0 511,596 2.0
inference-task0:libtransform_resize_cl_0 3,864 258.8
inference-task0:libtransform_padding_0 2,673 374.0
inference-task0:inference 504,555 2.0
inference-task0:Inference latency 2,035,085 n/a
inference-task0:libtransform_paddingdequantize_0 34,771 28.8
inference-task0:libdecode_semantic_seg_0 30,436 32.9
inference-task0:Postprocessing latency 30,767 n/a
inference-task0:Total latency 2,613,026 n/a
=========================================================================
End-to-end average measurement 2.0
=========================================================================
Core Temp : 45.0°C
CPU % : 9.0%
End-to-end : 2.0fps
Latency : 0.0ms (min:inf max:-inf σ:0.0 x̄:0.0)ms

3 FPS

=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
decodebin-link0 384 2,601.7
axtransform-colorconvert-cl0 339,694 2.9
inference-task0:libtransform_resize_cl_0 2,933 340.9
inference-task0:libtransform_padding_0 2,399 416.8
inference-task0:inference 335,063 3.0
inference-task0:Inference latency 1,359,281 n/a
inference-task0:libtransform_paddingdequantize_0 35,002 28.6
inference-task0:libdecode_semantic_seg_0 30,929 32.3
inference-task0:Postprocessing latency 31,252 n/a
inference-task0:Total latency 1,768,174 n/a
=========================================================================
End-to-end average measurement 3.0
=========================================================================
Core Temp : 45.0°C
CPU % : 10.0%
End-to-end : 3.0fps
Latency : 2109.8ms (min:2048.0 max:2130.2 σ:21.7 x̄:2104.3)ms

4 FPS

=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
decodebin-link0 326 3,059.9
axtransform-colorconvert-cl0 255,848 3.9
inference-task0:libtransform_resize_cl_0 2,148 465.5
inference-task0:libtransform_padding_0 1,993 501.7
inference-task0:inference 247,873 4.0
inference-task0:Inference latency 1,017,325 n/a
inference-task0:libtransform_paddingdequantize_0 34,442 29.0
inference-task0:libdecode_semantic_seg_0 29,487 33.9
inference-task0:Postprocessing latency 30,163 n/a
inference-task0:Total latency 1,340,274 n/a
=========================================================================
End-to-end average measurement 4.0
=========================================================================
Core Temp : 46.0°C
CPU % : 12.5%
End-to-end : 4.0fps
Latency : 0.0ms (min:inf max:-inf σ:0.0 x̄:0.0)ms

5 FPS

=========================================================================
Element Time(𝜇s) Effective FPS
=========================================================================
decodebin-link0 322 3,104.8
axtransform-colorconvert-cl0 207,562 4.8
inference-task0:libtransform_normalize_cl_0 2,283 437.9
inference-task0:libtransform_padding_0 2,084 479.8
inference-task0:inference 201,393 5.0
inference-task0:Inference latency 826,500 n/a
inference-task0:libtransform_paddingdequantize_0 33,833 29.6
inference-task0:libdecode_semantic_seg_0 27,540 36.3
inference-task0:Postprocessing latency 31,434 n/a
inference-task0:Total latency 1,106,963 n/a
=========================================================================
End-to-end average measurement 4.9
=========================================================================
Core Temp : 47.0°C
CPU % : 13.6%
End-to-end : 4.9fps
Latency : 1313.9ms (min:1250.0 max:1356.8 σ:20.8 x̄:1313.2)ms

One last question: how can I make the pipeline avoid the ‘axtransform-colorconvert-cl0’ element?

 

Sorry for the long answer.

Hoping to get some help understanding these timings.

Best


  • Author
  • Ensign
  • September 18, 2025

Hi ​@SamP ,

Thanks very much for the tips on reducing latency, and for your work on that topic, which is very important for my use case.

I did not get your message before I sent mine (just above), although yours seems to have been sent before mine. Anyway.

Using axrunmodel on my Metis M.2 I figured out that aipu-cores=4 was better than fewer:

1 aipu-core: dev:11.3 host:11.1 system:11.0fps PASS
2 aipu-core: dev:15.5 host:15.1 system:15.0fps PASS
3 aipu-core: dev:18.4 host:18.2 system:18.2fps PASS
4 aipu-core: dev:18.5 host:18.3 system:18.5fps PASS

So I repeated the image-inference experiments with AXELERA_LOW_LATENCY=1 and -d0 --aipu-cores 4.

Here are the latencies I got. (BTW, the 0ms latency I was getting was because I did not wait long enough: not enough samples were processed to have meaningful stats on latency.)
1 FPS

Inferred 1000 images
CPU % : 4.3%
End-to-end : 1.0fps
Latency : 1098.9ms (min:1042.9 max:1138.2 σ:16.5 x̄:1099.2)ms

2 FPS

Inferred 1000 images
CPU % : 6.5%
End-to-end : 2.0fps
Latency : 598.6ms (min:542.5 max:645.0 σ:17.2 x̄:597.9)ms

3 FPS

Inferred 1000 images
CPU % : 8.4%
End-to-end : 3.0fps
Latency : 428.6ms (min:360.5 max:469.1 σ:16.5 x̄:427.4)ms

4 FPS

Inferred 1000 images
CPU % : 9.9%
End-to-end : 3.9fps
Latency : 341.5ms (min:290.2 max:380.0 σ:16.1 x̄:340.6)ms

5 FPS

Inferred 1000 images
CPU % : 11.0%
End-to-end : 4.9fps
Latency : 282.6ms (min:231.0 max:399.4 σ:18.0 x̄:283.8)ms

If I understand properly, the Latency value is the in-out time for a frame to be processed, so if I remove the inference latency, which is 1/fps, I get a delayed output of ~100ms. Is that correct?
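To check that reading, subtracting one frame period (1000/fps ms) from each measured latency leaves a roughly constant residual (plain Python over the figures above):

```python
# Measured end-to-end latency (ms) at each injected frame rate, from the runs above.
latency_ms = {1: 1098.9, 2: 598.6, 3: 428.6, 4: 341.5, 5: 282.6}

for fps, lat in latency_ms.items():
    residual = lat - 1000.0 / fps   # strip one frame period
    print(fps, round(residual, 1))  # residual stays in the ~80-100 ms band
```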


  • Axelera Team
  • September 18, 2025

> BTW the 0ms latency I was getting was because I did not wait enough (not enough samples processed to have meaningful stats on latency)

Yes, we ignore the first 200 frames because GStreamer does pre-rolling and buffer filling that make the first 200 frames pretty meaningless. Then we use a rolling 10,000-frame buffer for the stats.

> If I understand properly the Latency values is the in-out time for a frame to be processed

The measured latency is from the moment the frame leaves the video decoder until it arrives at the main inference loop (for frame_result in stream). We skip the decoder because it’s hard for us to measure and is very source-format dependent (e.g. number of key frames etc.).

The reason you see the latency decrease as the fps increases is that our current executor effectively requires aipu-cores-1 frames to enter the inference element before the first frame is emitted. So that time is reduced if the frame rate is increased, or if the number of aipu-cores is reduced. I would suggest that the M.2 is bandwidth-limited between 3 and 4 cores on that model, so you may want to consider using only 3 cores if latency is your priority, since the 4th core only brings a 0.3 fps throughput increase.

The same is also true of OpenCL elements, where we use a similar algorithm to swap buffers. This is why the latency in your experiments above is `1,000,000` µs at 1 fps. If you set the env variable

AXELERA_USE_CL_DOUBLE_BUFFER=0

then we disable OpenCL buffering and I think you will see that the numbers make a lot more sense. Another factor with OpenCL double buffering is that the measured latencies are usually attributed to the wrong element: more often than not it is the buffer transfer from host<->gpu that is the biggest bottleneck, and it is the element AFTER the offending element that actually takes that hit. TL;DR: disable CL double buffering when measuring, or use AXELERA_LOW_LATENCY=1, which also disables OpenCL double buffering. Note that in the next release we use a better approach to pipelining OpenCL computation.
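As a toy model of the buffering effect (illustrative only, not how the executor is actually implemented): each double-buffered stage holds a frame until its successor arrives, so at low input rates every such stage adds roughly one frame period:

```python
# Each double-buffered stage (an OpenCL element, an executor slot) delays a
# frame by one frame period while it waits for the next frame to displace it.
def fill_latency_ms(fps, buffered_stages):
    return buffered_stages * 1000.0 / fps

# One buffered CL element at 1 fps adds ~1 s, which is why colorconvert
# showed ~1,000,000 us with double buffering enabled.
print(fill_latency_ms(1.0, 1))  # 1000.0
```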

> so if I remove the inference latency that is 1 / fps I got a delayed output of ~100ms. Is that correct

Not quite sure I understand the question. The inference latency still contributes to your delay. Or do you mean extra on top of the inference latency?

 

 


  • Author
  • Ensign
  • September 18, 2025

Hi ​@SamP ,

Thanks again, this is really helping me understand how you are measuring the latency.

I have also tried with aipu-cores 1, 2 and 3, and it actually makes no difference in latency.

I’m using AXELERA_LOW_LATENCY=1, thus, as you mention, I don’t also need to set AXELERA_USE_CL_DOUBLE_BUFFER=0.


As my images are pre-decoded and stored in RAM as PIL images, and are also already resized, the processing time is limited to only: the CPU->AIPU memory transfer + normalization + inference + post-processing.

I have also measured the time between when I ‘yield’ the frame in my image_pusher loop and when it arrives at the main inference loop (for frame_result in stream), and I find the same latency, so indeed the fps is part of the game:

For 5 FPS I now have:

Inferred 1000 images
========================================================================
Element Time(𝜇s) Effective FPS
========================================================================
decodebin-link0 390 2,557.9
axtransform-colorconvert-cl0 4,331 230.9
inference-task0:libtransform_normalize_cl_0 4,704 212.6
inference-task0:libtransform_padding_0 2,474 404.1
inference-task0:inference 199,443 5.0
inference-task0:Inference latency 206,586 n/a
inference-task0:libtransform_paddingdequantize_0 34,068 29.4
inference-task0:libdecode_semantic_seg_0 28,820 34.7
inference-task0:Postprocessing latency 30,488 n/a
inference-task0:Total latency 277,964 n/a
========================================================================
End-to-end average measurement 5.0
========================================================================
Core Temp : 48.0°C
CPU % : 12.0%
End-to-end : 5.0fps
Latency : 285.3ms (min:228.8 max:378.6 σ:19.5 x̄:284.6)ms
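The yield-to-arrival measurement I mention is just bookkeeping with a monotonic clock (generic Python; `timed_source` and `on_result` are illustrative names, not SDK API):

```python
import time

sent = {}  # frame id -> monotonic timestamp recorded at yield time

def timed_source(frames):
    # Wraps the image pusher: stamp each frame as it is yielded.
    for i, frame in enumerate(frames):
        sent[i] = time.monotonic()
        yield i, frame

def on_result(frame_id):
    # Call this when the matching result reaches the main inference loop.
    return (time.monotonic() - sent.pop(frame_id)) * 1000.0  # latency in ms
```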

But 285ms of latency is still a very, very high value for my use case. I’m glad you’re working on this for the next release; anything I can test in the meantime would be welcome.

 

I also see inference-task0:libdecode_semantic_seg_0 = ~30ms; is that the ArgMax?

 

I’m still scratching my head over why the inference latency depends on the fps.

    our current executor effectively requires aipu-cores-1 frames to enter the inference element before the first frame is emitted

So if I understand correctly, there is a kind of FIFO buffer of size aipu-cores-1 that is clocked at the pace of the incoming fps, hence the need to fill the FIFO before the first frame goes out to be inferred. Even if the size of the FIFO is one, we still need to wait for the next frame to push the first frame out. Is there a way to bypass this FIFO and push the frame directly to the model to be inferred?

Or, as the size of the FIFO is reduced to 1 frame, could I push the good frame followed immediately by a black frame, and then, if I could reset the FIFO to remove the black frame, go on like this frame after frame? The hypothesis here is that the model is independently clocked (not clocked by the fps), which I presume is the case.
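To make the flush idea concrete, here is a toy simulation of my mental model (not the actual executor): a 1-slot FIFO only emits a frame when a newer one displaces it, so a dummy frame pushed right behind the real one flushes it out immediately:

```python
from collections import deque

def emitted_frames(inputs, fifo_size=1):
    # A frame leaves the FIFO only when a newer frame displaces it.
    fifo, out = deque(), []
    for frame in inputs:
        fifo.append(frame)
        if len(fifo) > fifo_size:
            out.append(fifo.popleft())
    return out

print(emitted_frames(["real"]))           # [] - the real frame is stuck
print(emitted_frames(["real", "black"]))  # ['real'] - the dummy flushes it
```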

 


  • Author
  • Ensign
  • September 25, 2025

@SamP gentle ping


Spanner
Axelera Team
  • Axelera Team
  • October 10, 2025

Hi ​@npi ! Sorry again for the delay. I got some feedback from the team. It looks like the latency you're seeing is an inherent limitation of the current AxInferenceNet implementation, as it requires buffering frames before processing, which particularly impacts low FPS scenarios.

There’s an improved low latency mode in the works that directly addresses this issue, and early testing shows significant improvements - though there isn’t a delivery date on this as yet.

In the meantime, you could bypass AxInferenceNet and use the lower-level axruntime API directly. This would eliminate the buffering overhead but requires writing custom code. There are examples in the SDK's examples directory if you wanted to explore this approach.

Otherwise, waiting for the update might be the most practical option. 👍