Solved

Reproducing the Model Zoo benchmarks

  • May 6, 2026
  • 4 replies
  • 198 views

AlbertaBeef
Ensign

Hello, 

I am having issues reproducing the Axelera Model Zoo benchmarks on an Axelera Metis M.2 module.

My issues seem to be related to pre/post processing, which is why I started with a classification model.

I was hoping to use “fakevideo” as input, to bypass the pre-processing, but that seems to be fixed at 30fps.

Another unresolved issue is that OpenCL is not working on my AMD PC.

For ResNet-50 v1.5, here is what I am getting … 

(venv) voyager-sdk$ ./inference.py resnet50-imagenet data/coco --disable-opencl --no-display --show-stats
========================================================================           
Element                                         Time(𝜇s)   Effective FPS
========================================================================
axinplace-addstreamid0                                16        61,066.3
vaapipostproc0                                     1,927           518.9
videoconvert0                                         17        56,486.4
axinplace0                                             7       130,629.7
inference-task0:libtransform_resizeratiocropexcess_0
                                                     149         6,677.7
inference-task0:libtransform_totensor_0                7       142,241.8
inference-task0:libinplace_normalize_0                16        59,300.1
inference-task0:libtransform_padding_0                20        47,901.4
inference-task0:inference                          2,884           346.7
inference-task0:Inference latency                 42,056             n/a
inference-task0:libtransform_paddingdequantize_0
                                                       5       184,454.6
inference-task0:libdecode_classification_0             8       116,059.4
inference-task0:Postprocessing latency               726             n/a
inference-task0:Total latency                     45,652             n/a
========================================================================
End-to-end average measurement                                     351.1
========================================================================
Core Temp  : 37.0°C
CPU %      : 5.5%
End-to-end : 351.1fps
Latency    : 42.8ms (min:9.1 max:57.3 σ:3.9 x̄:42.7)ms
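
Editor's note: in the table above, the Effective FPS column looks like roughly 1,000,000 divided by the per-element time in microseconds (for example 1e6 / 2,884 ≈ 346.7), although the displayed times are rounded, so the match is loose for very small values. Under that assumption, the following minimal sketch parses an excerpt of the table and picks out the slowest element. The `parse_stats` helper is hypothetical and not part of the Voyager SDK.

```python
import re

# Excerpt of the --show-stats table above: element, time in microseconds, effective FPS.
STATS = """\
vaapipostproc0                                     1,927           518.9
videoconvert0                                         17        56,486.4
inference-task0:libtransform_resizeratiocropexcess_0
                                                     149         6,677.7
inference-task0:inference                          2,884           346.7
inference-task0:Inference latency                 42,056             n/a
inference-task0:libdecode_classification_0             8       116,059.4
"""

ROW = re.compile(r"^(?P<name>.*?)\s+(?P<us>[\d,]+)\s+(?P<fps>[\d,.]+|n/a)$")


def parse_stats(text):
    """Return {element: (time_us, fps)} for rows that report an effective FPS."""
    rows = {}
    pending = None  # long element name whose numbers wrapped onto the next line
    for line in text.splitlines():
        match = ROW.match(line.rstrip())
        if match is None:
            pending = line.strip() or None
            continue
        name = match.group("name").strip() or pending
        pending = None
        if name is None or match.group("fps") == "n/a":
            continue  # latency rows carry no FPS value
        time_us = int(match.group("us").replace(",", ""))
        fps = float(match.group("fps").replace(",", ""))
        rows[name] = (time_us, fps)
    return rows


rows = parse_stats(STATS)
slowest = max(rows, key=lambda name: rows[name][0])
time_us, fps = rows[slowest]
print(f"slowest element: {slowest} ({time_us} us, {fps} FPS reported)")
print(f"1e6 / time_us = {1e6 / time_us:.1f} FPS")
```

In this run the slowest element is `inference-task0:inference`, and its effective FPS is close to the 351.1 fps end-to-end figure, which suggests (but does not prove) that this stage sets the overall throughput here.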

 

 

4 replies

AlbertaBeef
Ensign
  • Author
  • Ensign
  • May 7, 2026

After posting my question, I found the solution provided by ​@Steven Hunsche in another post:


This got me past the OpenCL issue on my AMD Ryzen AI MAX+ 395 PC.
This effectively doubled the throughput achieved (from 350 FPS to 716 FPS).

Still a ways from the 1756 FPS benchmark for ResNet-50 on M.2 Metis.

Here is where I am currently at:

(venv) abbeefai@AlbertaBeefAI:/media/abbeefai/TheExpanse/shared_with_docker/voyager-sdk$ ./inference.py resnet50-imagenet data/coco --enable-opencl --no-display --show-stats
========================================================================                                    
Element                                         Time(𝜇s)   Effective FPS
========================================================================
axinplace-addstreamid0                                10        94,033.6
axtransform-colorconvert-cl0                         323         3,092.8
inference-task0:libtransform_centrecropextra_0
                                                       0     1,357,590.2
inference-task0:libtransform_resize_cl_0              81        12,201.0
inference-task0:libtransform_padding_0                39        25,571.2
inference-task0:inference                          1,415           706.5
inference-task0:Inference latency                 22,083             n/a
inference-task0:libtransform_paddingdequantize_0
                                                       4       238,789.7
inference-task0:libdecode_classification_0             6       160,551.4
inference-task0:Postprocessing latency               614             n/a
inference-task0:Total latency                     25,152             n/a
========================================================================
End-to-end average measurement                                     716.8
========================================================================
Core Temp  : 37.0°C
CPU %      : 5.0%
End-to-end : 716.8fps
Latency    : 19.6ms (min:7.0 max:27.4 σ:1.9 x̄:19.7)ms
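
Editor's note: the gain from enabling OpenCL and the remaining gap to the published figure can be checked with a few lines of arithmetic. This sketch uses only the end-to-end numbers quoted in this thread; the variable names are illustrative.

```python
# End-to-end FPS figures quoted in this thread.
fps_no_opencl = 351.1   # --disable-opencl run
fps_opencl = 716.8      # --enable-opencl run
fps_benchmark = 1756.0  # ResNet-50 benchmark for M.2 Metis, as quoted above

speedup = fps_opencl / fps_no_opencl
fraction_of_benchmark = fps_opencl / fps_benchmark

print(f"OpenCL speedup: {speedup:.2f}x")
print(f"fraction of benchmark: {fraction_of_benchmark:.1%}")
```

That works out to a speedup of about 2.04x, with the single-stream OpenCL run reaching about 41% of the benchmark figure.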

 


Spanner
Axelera Team
  • Axelera Team
  • Answer
  • May 7, 2026

Ah, nice work on ramping it up anyway, ​@AlbertaBeef !

As for the difference from the published benchmarks, one thing that comes to mind is the host system. I think the benchmarks were run on an Intel i9; I'm not sure how much that affects performance, but it is an architectural difference.

I also think the benchmark numbers are measured with multiple input streams running in parallel, which spreads the load across all four cores. The inference reference may suggest some directions to test with multiple inputs: https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.5/docs/reference/inference.md

Let me know how it goes!
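
Editor's note: in the commands shown in this thread, running multiple input streams amounts to repeating the source on the `inference.py` command line. A minimal sketch of a helper that builds such an invocation follows; it uses only flags that appear in the commands in this thread, and `build_inference_cmd` is a hypothetical name, not part of the SDK.

```python
import shlex


def build_inference_cmd(network, source, streams, extra_flags=()):
    """Repeat one input source `streams` times on the inference.py command line."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return ["./inference.py", network, *[source] * streams, *extra_flags]


cmd = build_inference_cmd(
    "resnet50-imagenet",
    "media/Fabrizio_talk.mp4",
    streams=4,
    extra_flags=("--no-display", "--show-stats", "--aipu-cores", "4"),
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` from the SDK directory, which makes it easy to sweep the stream count and compare the end-to-end FPS for each setting.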


AlbertaBeef
Ensign
  • Author
  • Ensign
  • May 7, 2026

Thank you for the response ​@Spanner .

When using directories of images, I am not able to successfully run a session with multiple input streams.  I keep getting errors:

(venv) abbeefai@AlbertaBeefAI:/media/abbeefai/TheExpanse/shared_with_docker/voyager-sdk$ ./inference.py resnet50-imagenet data/coco data/coco data/coco data/coco --no-display --show-stats --aipu-cores 4
Detecting...                               :   4%|▍         | 811/20000 [00:01<00:19, 998.95frames/s]
(python:1574310): GStreamer-CRITICAL **: 12:11:49.968: gst_caps_is_fixed: assertion 'GST_IS_CAPS (caps)' failed

(python:1574310): GStreamer-CRITICAL **: 12:11:49.968: gst_caps_is_equal_fixed: assertion 'gst_caps_is_fixed (caps2)' failed
Detecting...                               :  12%|▉       | 2411/20000 [00:02<00:10, 1739.13frames/s]Segmentation fault (core dumped)

 

Despite the crash, the session above shows throughput beyond my previous limit of 716 FPS: the progress bar reports 1739 frames/s after 2411 frames … 
 

When using videos, however, I am able to surpass the public benchmark!

(venv) abbeefai@AlbertaBeefAI:/media/abbeefai/TheExpanse/shared_with_docker/voyager-sdk$ ./inference.py resnet50-imagenet media/Fabrizio_talk.mp4 media/Fabrizio_talk.mp4 media/Fabrizio_talk.mp4 media/Fabrizio_talk.mp4 --no-display --show-stats --aipu-cores 4
========================================================================           
Element                                         Time(𝜇s)   Effective FPS
========================================================================
qtdemux1                                              13        74,212.0
h264parse2                                            48        20,464.6
capsfilter2                                           21        47,384.4
qtdemux3                                              13        75,689.6
h264parse0                                            50        19,867.1
capsfilter1                                           20        48,458.1
qtdemux0                                              13        75,734.3
h264parse1                                            49        20,152.8
capsfilter0                                           19        50,028.4
qtdemux2                                              13        75,003.3
h264parse3                                            50        19,898.3
capsfilter3                                           20        49,813.4
decodebin-link0                                       18        53,708.6
decodebin-link3                                       19        50,700.8
axtransform-colorconvert-cl0                          45        21,833.6
axtransform-colorconvert-cl3                          44        22,448.8
decodebin-link1                                       19        51,757.4
axtransform-colorconvert-cl1                          44        22,468.7
decodebin-link2                                       20        49,301.3
axtransform-colorconvert-cl2                          44        22,583.0
inference-task0:libtransform_centrecropextra_0
                                                       0     1,017,982.8
inference-task0:libtransform_resize_cl_0              10        99,635.5
inference-task0:libtransform_padding_0                47        21,263.7
inference-task0:inference                            471         2,119.8
inference-task0:Inference latency                  9,183             n/a
inference-task0:libtransform_paddingdequantize_0
                                                       4       231,910.3
inference-task0:libdecode_classification_0             5       184,210.4
inference-task0:Postprocessing latency               169             n/a
inference-task0:Total latency                     14,194             n/a
========================================================================
End-to-end average measurement                                   2,054.7
========================================================================
Core Temp  : 40.0°C
CPU %      : 15.7%
End-to-end : 2054.7fps
Latency    : 22.8ms (min:17.0 max:30.0 σ:1.2 x̄:22.6)ms
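
Editor's note: a quick comparison of the four-stream result against the numbers quoted earlier in this thread. This is only arithmetic on the reported end-to-end figures; note that the single-stream run used a directory of images while this run used video, so the scaling ratio is indicative rather than a like-for-like comparison.

```python
# End-to-end FPS figures quoted in this thread.
fps_four_streams = 2054.7   # four video inputs, --aipu-cores 4
fps_single_stream = 716.8   # single input with OpenCL enabled
fps_benchmark = 1756.0      # ResNet-50 benchmark for M.2 Metis, as quoted above
streams = 4

vs_benchmark = fps_four_streams / fps_benchmark
vs_single = fps_four_streams / fps_single_stream
per_stream = fps_four_streams / streams

print(f"vs benchmark: {vs_benchmark:.2f}x")
print(f"vs single stream: {vs_single:.2f}x")
print(f"per stream: {per_stream:.1f} FPS")
```

The four-stream run comes out at about 1.17x the benchmark figure and about 2.87x the single-stream run, or roughly 514 FPS per stream.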


This is running on an AMD Ryzen AI MAX+ 395 PC.
 

Thank you ​@Spanner  for suggesting to try multiple input sources :)

 


Spanner
Axelera Team
  • Axelera Team
  • May 8, 2026

Awesome work ​@AlbertaBeef ! And it’s so cool to see someone beat our own benchmarks 😆 I hope more people are able to do the same!