Question

Regarding Performance

  • May 3, 2026

I created a custom model based on yolo26n, 'yolo26n-aya'.
The performance results for two input sources (256x256 bitmap images and an mp4 video) are shown below.

(a) images

    (venv) ubuntu@antelao-3588:~/voyager-sdk$ AXELERA_USE_CL_DOUBLE_BUFFER=0 ./inference.py yolo26n-aya ./data/aya100/images --show-stats --no-display
    ========================================================================
    Element                                          Time(µs)   Effective FPS
    ========================================================================
    axinplace-addstreamid0                               141         7,076.4
    inference-task0:libtransform_resize_cl_0             371         2,692.7
    inference-task0:libtransform_padding_0               844         1,184.0
    inference-task0:inference                          2,961           337.7
    inference-task0:Inference latency                208,874             n/a
    inference-task0:libdecode_yolov8_0                   653         1,531.2
    inference-task0:Postprocessing latency            25,097             n/a
    inference-task0:Total latency                    308,970             n/a
    ========================================================================
    End-to-end average measurement                                       0.0
    ========================================================================
 

(b) mp4

    (venv) ubuntu@antelao-3588:~/voyager-sdk$ AXELERA_USE_CL_DOUBLE_BUFFER=0 ./inference.py yolo26n-aya ./media/traffic3_720p.mp4 --show-stats --no-display
 

    ========================================================================
    Element                                          Time(µs)   Effective FPS
    ========================================================================
    qtdemux0                                             117         8,546.7
    h264parse0                                           646         1,546.7
    capsfilter0                                          153         6,516.0
    mppvideodec0                                       6,368           157.0
    decodebin-link0                                      115         8,636.4
    inference-task0:libtransform_resize_cl_0             542         1,844.6
    inference-task0:libtransform_padding_0               778         1,285.0
    inference-task0:inference                            550         1,816.0
    inference-task0:Inference latency                 24,772             n/a
    inference-task0:libdecode_yolov8_0                   581         1,719.9
    inference-task0:Postprocessing latency             2,782             n/a
    inference-task0:Total latency                     35,448             n/a
    ========================================================================
    End-to-end average measurement                                     674.6
    ========================================================================

I have two questions:
(1) Why are the values of inference-task0:inference different between the two runs?
(2) My goal is to measure the end-to-end time from the input (image) buffer in main memory to the output (result) buffer. Which value should I look at?

Thank you in advance for your assistance.

1 reply

Spanner
Axelera Team
  • May 6, 2026

Hi @kmiura,

From the docs, it seems the Time(µs) column in --show-stats is the inter-frame period at that stage (1 / Effective FPS), not the actual time the AIPU spends on one frame. With the mp4, the pipeline streams continuously and inference reports ~550 µs because that's how often a frame leaves that stage. With an image folder, the source is bounded and the pipeline “drains” rather than streams, so the rate is lower (~2,961 µs). The actual per-frame work the AIPU does would be about the same in both runs, I think?
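A quick way to check the "Time(µs) is 1 / Effective FPS" reading is to recompute the period from the FPS column. This is only a sketch using a few rows copied from the two tables above; the helper name `period_us` is made up for illustration and is not part of the SDK. The recomputed values land within a microsecond or two of the reported ones, which is consistent with the column being a period rather than a measured compute time.

```python
# Rows copied from the two --show-stats tables in the question:
# (label, reported Time in µs, reported Effective FPS).
ROWS = [
    ("images: inference", 2961, 337.7),
    ("images: libtransform_padding_0", 844, 1184.0),
    ("mp4: inference", 550, 1816.0),
    ("mp4: mppvideodec0", 6368, 157.0),
]


def period_us(fps: float) -> float:
    """Inter-frame period in microseconds for a given frame rate (hypothetical helper)."""
    return 1_000_000 / fps


for label, reported_us, fps in ROWS:
    # Small differences are expected because both columns are rounded in the table.
    print(f"{label:32s} reported={reported_us:>6} µs   1/FPS={period_us(fps):8.1f} µs")
```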

For your end-to-end measurement, the line you want is inference-task0:Total latency. That's the wall-clock per-frame latency from input through to output (35,448 µs for the mp4, 308,970 µs for the images). Note that End-to-end average measurement is reported in the Effective FPS column, not microseconds, so 674.6 there means 674.6 FPS.
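To see why the per-frame latency (35,448 µs) is so much larger than the per-frame period implied by 674.6 FPS, it can help to estimate how many frames are in the pipeline at once using Little's law (items in flight = throughput × latency). This is a rough sketch, assuming the pipeline is in steady state and that Total latency and the end-to-end FPS cover roughly the same span of the pipeline, which I have not confirmed. The function name is hypothetical. It only applies to the mp4 run, since the image run reports 0.0 FPS end-to-end.

```python
def frames_in_flight(latency_us: float, throughput_fps: float) -> float:
    """Little's law: average number of frames inside the pipeline at once."""
    latency_s = latency_us / 1_000_000
    return throughput_fps * latency_s


# Values from the mp4 run in the question.
total_latency_us = 35_448   # inference-task0:Total latency
end_to_end_fps = 674.6      # End-to-end average measurement

print(f"per-frame latency : {total_latency_us / 1000:.1f} ms")
print(f"per-frame period  : {1_000_000 / end_to_end_fps:.0f} µs")
print(f"frames in flight  : {frames_in_flight(total_latency_us, end_to_end_fps):.1f}")
```

So if the goal is "how long does one frame take from input buffer to output buffer", the latency figure is the relevant one; the FPS figure tells you how often results come out, not how long each one took.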

This might have a bit more info on the subject (or explain it better than I did, anyway 😆): 

 


  • Author
  • Cadet
  • May 8, 2026

Thank you in advance