Reproducing the Model Zoo benchmarks

Question

Hello,

I am having issues reproducing the Axelera Model Zoo benchmarks on an Axelera Metis M.2 module.

My issues seem to be related to pre/post processing, which is why I started with a classification model.

I was hoping to use “fakevideo” as input to bypass the pre-processing, but that seems to be fixed at 30 FPS.

Another unresolved issue is that OpenCL is not working on my AMD PC.

For ResNet-50 v1.5, here is what I am getting …

(venv) voyager-sdk$ ./inference.py resnet50-imagenet data/coco --disable-opencl --no-display --show-stats
=============================================================================
Element                                               Time(𝜇s)  Effective FPS
=============================================================================
axinplace-addstreamid0                                      16       61,066.3
vaapipostproc0                                           1,927          518.9
videoconvert0                                               17       56,486.4
axinplace0                                                   7      130,629.7
inference-task0:libtransform_resizeratiocropexcess_0       149        6,677.7
inference-task0:libtransform_totensor_0                      7      142,241.8
inference-task0:libinplace_normalize_0                      16       59,300.1
inference-task0:libtransform_padding_0                      20       47,901.4
inference-task0:inference                                2,884          346.7
inference-task0:Inference latency                       42,056            n/a
inference-task0:libtransform_paddingdequantize_0             5      184,454.6
inference-task0:libdecode_classification_0                   8      116,059.4
inference-task0:Postprocessing latency                     726            n/a
inference-task0:Total latency                           45,652            n/a
=============================================================================
End-to-end average measurement                                          351.1
=============================================================================
Core Temp : 37.0°C
CPU % : 5.5%
End-to-end : 351.1fps
Latency : 42.8ms (min:9.1 max:57.3 σ:3.9 x̄:42.7)ms
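
As a reading aid for the table above: the "Effective FPS" column appears to be roughly the reciprocal of the per-element time (1,000,000 / time in 𝜇s), with small deviations that are consistent with the time column being rounded. That relationship is inferred from the printed numbers, not taken from the SDK documentation. The minimal Python sketch below, using three rows copied from the table, recomputes the rate and picks the slowest element; the function name `effective_fps` is only illustrative.

```python
# Per-element times in microseconds, copied from the stats table above.
element_time_us = {
    "vaapipostproc0": 1927,
    "inference-task0:libtransform_resizeratiocropexcess_0": 149,
    "inference-task0:inference": 2884,
}


def effective_fps(time_us: float) -> float:
    """Frames per second if one frame takes `time_us` microseconds.

    Assumption: this is how the Effective FPS column is derived.
    """
    return 1_000_000 / time_us


# The element with the largest per-frame time bounds the pipeline rate.
bottleneck = max(element_time_us, key=element_time_us.get)

for name, time_us in element_time_us.items():
    print(f"{name}: {effective_fps(time_us):,.1f} FPS")
print("slowest element:", bottleneck)
```

In this run the slowest listed element is the inference step itself, and its rate (about 347 FPS) is close to the measured end-to-end figure of 351.1 FPS.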

Spanner · Accepted Answer

Ah, nice work on ramping it up anyway, @AlbertaBeef!

In terms of the differences with the benchmarks, a few things come to mind. One could be the host system: I think the benchmarks were run on an Intel i9. I'm not sure about the performance difference, but it is an architectural difference, I guess.

I also think the benchmark numbers are measured with multiple input streams running in parallel to spread the load across all four cores. This might have some interesting directions to test in terms of multiple inputs: https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.5/docs/reference/inference.md

Let me know how it goes!
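
To make the multiple-input suggestion concrete, here is a small Python sketch that assembles an `./inference.py` command line with the same source repeated once per stream. It only uses the model name, media path, and flags that appear in commands elsewhere in this thread; the helper `build_inference_command` is hypothetical, and whether `--aipu-cores` should equal the number of streams is an assumption, not something the thread confirms.

```python
import shlex


def build_inference_command(model: str, source: str, streams: int, cores: int) -> list[str]:
    """Build an argument list that passes `source` once per input stream."""
    cmd = ["./inference.py", model]
    cmd += [source] * streams  # one positional input per stream
    cmd += ["--no-display", "--show-stats", "--aipu-cores", str(cores)]
    return cmd


cmd = build_inference_command(
    model="resnet50-imagenet",
    source="media/Fabrizio_talk.mp4",
    streams=4,
    cores=4,
)

# Print the command for inspection; run it from the voyager-sdk directory,
# for example with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```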

AlbertaBeef · Answer

Thank you for the response @Spanner.

When using directories of images, I am not able to successfully run a session with multiple input streams. I keep getting errors:

(venv) abbeefai@AlbertaBeefAI:/media/abbeefai/TheExpanse/shared_with_docker/voyager-sdk$ ./inference.py resnet50-imagenet data/coco data/coco data/coco data/coco --no-display --show-stats --aipu-cores 4
Detecting...               : 4%|▍    | 811/20000 [00:01<00:19, 998.95frames/s]
(python:1574310): GStreamer-CRITICAL **: 12:11:49.968: gst_caps_is_fixed: assertion 'GST_IS_CAPS (caps)' failed
(python:1574310): GStreamer-CRITICAL **: 12:11:49.968: gst_caps_is_equal_fixed: assertion 'gst_caps_is_fixed (caps2)' failed
Detecting...               : 12%|▉   | 2411/20000 [00:02<00:10, 1739.13frames/s]
Segmentation fault (core dumped)

Despite the crash, the previous session shows that we are beyond my previous limit of 714 FPS, indicating 1739 FPS after 2411 frames …

When using videos, however, I am able to surpass the public benchmark!

(venv) abbeefai@AlbertaBeefAI:/media/abbeefai/TheExpanse/shared_with_docker/voyager-sdk$ ./inference.py resnet50-imagenet media/Fabrizio_talk.mp4 media/Fabrizio_talk.mp4 media/Fabrizio_talk.mp4 media/Fabrizio_talk.mp4 --no-display --show-stats --aipu-cores 4
=============================================================================
Element                                               Time(𝜇s)  Effective FPS
=============================================================================
qtdemux1                                                    13       74,212.0
h264parse2                                                  48       20,464.6
capsfilter2                                                 21       47,384.4
qtdemux3                                                    13       75,689.6
h264parse0                                                  50       19,867.1
capsfilter1                                                 20       48,458.1
qtdemux0                                                    13       75,734.3
h264parse1                                                  49       20,152.8
capsfilter0                                                 19       50,028.4
qtdemux2                                                    13       75,003.3
h264parse3                                                  50       19,898.3
capsfilter3                                                 20       49,813.4
decodebin-link0                                             18       53,708.6
decodebin-link3                                             19       50,700.8
axtransform-colorconvert-cl0                                45       21,833.6
axtransform-colorconvert-cl3                                44       22,448.8
decodebin-link1                                             19       51,757.4
axtransform-colorconvert-cl1                                44       22,468.7
decodebin-link2                                             20       49,301.3
axtransform-colorconvert-cl2                                44       22,583.0
inference-task0:libtransform_centrecropextra_0               0    1,017,982.8
inference-task0:libtransform_resize_cl_0                    10       99,635.5
inference-task0:libtransform_padding_0                      47       21,263.7
inference-task0:inference                                  471        2,119.8
inference-task0:Inference latency                        9,183            n/a
inference-task0:libtransform_paddingdequantize_0             4      231,910.3
inference-task0:libdecode_classification_0                   5      184,210.4
inference-task0:Postprocessing latency                     169            n/a
inference-task0:Total latency                           14,194            n/a
=============================================================================
End-to-end average measurement                                        2,054.7
=============================================================================
Core Temp : 40.0°C
CPU %   : 15.7%
End-to-end : 2054.7fps
Latency  : 22.8ms (min:17.0 max:30.0 σ:1.2 x̄:22.6)ms

This is running on an AMD Ryzen AI MAX+ 395 PC.

Thank you @Spanner for suggesting to try multiple input sources :)
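
For a rough sense of scale, the two end-to-end figures reported in this thread can be compared with a few lines of Python. This is only arithmetic on the posted numbers: the two runs differ in more than stream count (image directory versus video input, and OpenCL disabled in the first run), so the ratio should not be read as the effect of any single change. The variable names are illustrative.

```python
# End-to-end throughput figures as reported in the two --show-stats outputs.
single_stream_fps = 351.1   # one data/coco input, --disable-opencl
four_stream_fps = 2054.7    # four video inputs, --aipu-cores 4
num_streams = 4

# Overall ratio between the two runs, and the average rate per input stream.
speedup = four_stream_fps / single_stream_fps
per_stream_fps = four_stream_fps / num_streams

print(f"overall ratio: {speedup:.2f}x")
print(f"average per stream: {per_stream_fps:.1f} FPS")
```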
