Hi again ​@Spanner  😊

I'm running some more tests and I want to use the inference.py script to detect people in an RTSP stream.

I’m using the command “./inference.py yolov8s-coco-onnx ‘rtsp….’ --pipe gst --frame-rate 0 --rtsp-latency 0 --show-stream-timing”.

With the ‘--show-stream-timing’ option I get a latency of almost 2000 ms.

I'd like to know whether this latency is just from the stream or whether it also includes the inference time, and what I can do to get the stream in real time.

 

Thank you!

Hi ​@sara!

Hmm, yeah, I think that number (2 seconds) is end-to-end, so it includes everything from network speed to buffering, decoding on the host, and inference/post-processing time.

I wonder if we can do some tests to figure out exactly where the bottleneck is? Like, maybe (temporarily) replacing the RTSP feed with a local video file? That might give us a clue as to whether the latency is in the feed or in the processing.


(venv) ~/voyager-sdk$ ./inference.py yolov8s-coco-onnx media/traffic1_480p.mp4 --show-stream-timing --show-stats
=========================================================================                                                                                                                                          
Element                                          Time(𝜇s)   Effective FPS
=========================================================================
qtdemux0                                              122         8,133.9
h264parse0                                            236         4,228.0
capsfilter0                                            80        12,427.9
decodebin-link0                                       103         9,639.1
axtransform-colorconvert0                          54,198            18.5
inference-task0:libtransform_resize_cl_0            1,384           722.3
inference-task0:libtransform_padding_0              2,239           446.5
inference-task0:inference                          50,148            19.9
inference-task0:Inference latency               1,207,065             n/a
inference-task0:libdecode_yolov8_0                  3,422           292.2
inference-task0:libinplace_nms_0                       27        35,793.8
inference-task0:Postprocessing latency              6,842             n/a
inference-task0:Total latency                   1,497,217             n/a
=========================================================================
End-to-end average measurement                                       18.0
=========================================================================
(venv) ~/voyager-sdk$ ./inference.py yolov8s-coco-onnx media/traffic1_480p.mp4 --show-stream-timing 
INFO    : Core Temp : 39.0°C                                                                                                                                                                                       
INFO    : CPU % : 14.3%
INFO    : End-to-end : 20.8fps
INFO    : Jitter : 87.0ms
INFO    : Latency : 1867.4ms


Hmm, your FPS seems incredibly low for yolov8s as well. What kind of host are you using? And which Metis product? Are you getting any warnings?

If I run yolov8s on a single M.2 device and an x86 i5, I get around 390 FPS.


Hi ​@sara ,

If I remember correctly you were using a Raspberry Pi 5, correct?

If so, please see the following info from https://support.axelera.ai/hc/en-us/articles/26362016484114-Bring-up-Voyager-SDK-in-Raspberry-Pi-5 : 

If you see that the performance is lower than expected, this is something that can be solved.

The solution is setting AXELERA_USE_DOUBLE_BUFFER=0 at the beginning of the inference.py command. For example:

AXELERA_USE_DOUBLE_BUFFER=0 ./inference.py yolov8s-coco-onnx ./media/traffic1_480p.mp4 

For reference, by setting AXELERA_USE_DOUBLE_BUFFER=0, yolov8s-coco-onnx improved from 5.2 FPS end-to-end (which is a bug) to 80.3 FPS end-to-end, which is expected.

 

Note that this is an existing bug we are investigating. However, with the workaround I shared, performance is barely affected.
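Applied to the RTSP command you shared earlier, it would look something like this (the RTSP URL below is just a placeholder for yours):

AXELERA_USE_DOUBLE_BUFFER=0 ./inference.py yolov8s-coco-onnx 'rtsp://<camera-ip>/stream' --pipe gst --frame-rate 0 --rtsp-latency 0 --show-stream-timing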

 

Please let us know if that solves the issue.

 

I also recommend checking, just in case, the section “Note about MVM utilisation limit” in https://support.axelera.ai/hc/en-us/articles/26362016484114-Bring-up-Voyager-SDK-in-Raspberry-Pi-5.

 

Best,

Victor


Hi ​@sara, not precisely what you're asking about, but this latency can be reduced significantly if you want. The 2 s latency is added “artificially” to allow some buffering for a smoother stream. If you are in a controlled environment (a private, dedicated network), 40 ms should be enough to keep the stream smooth.

 

Check the RTSP Latency section in the Voyager reference:

https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.3/docs/tutorials/application.md#rtsp-latency

 

Internally, this configures the “latency” property in the rtspsrc GStreamer element.
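For instance, a bare test pipeline with a 40 ms jitter buffer would look roughly like this (the URL is a placeholder for your camera's):

gst-launch-1.0 rtspsrc location='rtsp://<camera-ip>/stream' latency=40 ! decodebin ! videoconvert ! autovideosink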

 

My two cents:

- Michael

https://ridgerun.ai


Hi! 😊
Yes, I had already tested it on a Raspberry Pi, but I'm not using it anymore. I've been doing a lot of tests to see what features there are, and another question came up! Sorry 😅

I think using GStreamer to visualize the stream looks good and runs in real time, and I've tried the Axelera plugins, but I'd like to know how I can draw the inference results on the image.

For example, in this pipeline you give:

GST_PLUGIN_PATH=`pwd`/operators/lib GST_DEBUG=4 gst-launch-1.0 \
  filesrc location=media/traffic3_480p.mp4 ! \
  decodebin force-sw-decoders=true caps="video/x-raw(ANY)" expose-all-streams=false ! \
  axinplace lib=libinplace_addstreamid.so mode=meta options="stream_id:0" ! \
  axtransform lib=libtransform_colorconvert.so options=format:rgba ! \
  queue max-size-buffers=4 max-size-time=0 max-size-bytes=0 ! \
  axinferencenet \
    model=build/yolov7-coco/yolov7-coco/1/model.json \
    double_buffer=true \
    dmabuf_inputs=true \
    dmabuf_outputs=true \
    num_children=4 \
    preprocess0_lib=libtransform_resize_cl.so \
    preprocess0_options="width:640;height:640;padding:114;letterbox:1;scale_up:1;to_tensor:1;mean:0.,0.,0.;std:1.,1.,1.;quant_scale:0.003921568859368563;quant_zeropoint:-128.0" \
    preprocess1_lib=libtransform_padding.so \
    preprocess1_options="padding:0,0,1,1,1,15,0,0;fill:0" \
    preprocess1_batch=1 \
    postprocess0_lib=libdecode_yolov5.so \
    postprocess0_options="meta_key:detections;anchors:1.5,2.0,2.375,4.5,5.0,3.5,2.25,4.6875,4.75,3.4375,4.5,9.125,4.4375,3.4375,6.0,7.59375,14.34375,12.53125;classes:80;confidence_threshold:0.25;scales:0.003937006928026676,0.003936995752155781,0.003936977591365576;zero_points:-128,-128,-128;topk:30000;multiclass:0;sigmoid_in_postprocess:0;transpose:1;classlabels_file:ax_datasets/labels/coco.names;model_width:640;model_height:640;scale_up:1;letterbox:1" \
    postprocess0_mode=read \
    postprocess1_lib=libinplace_nms.so \
    postprocess1_options="meta_key:detections;max_boxes:300;nms_threshold:0.45;class_agnostic:0;location:CPU" ! \
  videoconvert ! \
  x264enc ! \
  mp4mux ! \
  filesink location=output_video.mp4
  

 

How do I use the libinplace_draw.so plugin to draw the bounding boxes? Is that what it's for?


Hi ​@sara,
 
In v1.3 we can modify the GStreamer pipeline [1] by adding libinplace_draw.so right after the axinferencenet plugin. To display the result in a window, append the following:

! axinplace lib="${AXELERA_FRAMEWORK}/operators/lib/libinplace_draw.so" mode="write" \
! videoconvert \
! ximagesink

Or, to write the result to an .mp4 file, append the following:

! axinplace lib="${AXELERA_FRAMEWORK}/operators/lib/libinplace_draw.so" mode="write" \
! videoconvert \
! x264enc \
! mp4mux \
! filesink location=output_video.mp4

Hope this helps!
Feel free to let us know if you have any more questions, comments, or suggestions!
---
[1] https://community.axelera.ai/voyager-sdk-2/raspberry-pi-5-metis-inference-with-gstreamer-pipeline-221?postid=684#post684

 


It worked! :) 

And for an RTSP stream? Is it something like this? I’m getting an error.

GST_PLUGIN_PATH=`pwd`/operators/lib GST_DEBUG=4 gst-launch-1.0 \
rtspsrc location='rtsp:/...' latency=200 ! \
decodebin ! \
axinferencenet \
    model=build/yolov8s-coco-onnx/yolov8s-coco-onnx/1/model.json \
    double_buffer=true \
    dmabuf_inputs=true \
    dmabuf_outputs=true \
    num_children=4 \
    preprocess0_lib=libtransform_resize_cl.so \
    preprocess0_options="width:640;height:640;padding:114;letterbox:1;scale_up:1;to_tensor:1;mean:0.,0.,0.;std:1.,1.,1.;quant_scale:0.003921568859368563;quant_zeropoint:-128.0" \
    preprocess1_lib=libtransform_padding.so \
    preprocess1_options="padding:0,0,1,1,1,15,0,0;fill:0" \
    preprocess1_batch=1 \
    postprocess0_lib=libdecode_yolov8.so \
    postprocess0_options="meta_key:detections;anchors:1.5,2.0,2.375,4.5,5.0,3.5,2.25,4.6875,4.75,3.4375,4.5,9.125,4.4375,3.4375,6.0,7.59375,14.34375,12.53125;classes:80;confidence_threshold:0.25;scales:0.003937006928026676,0.003936995752155781,0.003936977591365576;zero_points:-128,-128,-128;topk:30000;multiclass:0;sigmoid_in_postprocess:0;transpose:1;classlabels_file:ax_datasets/labels/coco.names;model_width:640;model_height:640;scale_up:1;letterbox:1" \
    postprocess0_mode=read \
    postprocess1_lib=libinplace_nms.so \
    postprocess1_options="meta_key:detections;max_boxes:300;nms_threshold:0.45;class_agnostic:0;location:CPU" ! \
axinplace lib=libinplace_draw.so mode=write ! \
videoconvert ! \
x264enc ! \
mp4mux ! \
autovideosink


Hi ​@sara,

I think that between decodebin and axinferencenet we need some other plugins to convert to a format that is compatible with the latter. Since inference.py supports reading from RTSP links out of the box, perhaps you could regenerate the low-level GStreamer YAML file first, i.e. run inference.py with an RTSP link as the source, then go to build/<model>/logs/gst_pipeline.yaml and use that as a reference to convert the pipeline to gst-launch-1.0 syntax.
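For example, something along these lines (the RTSP URL is a placeholder, and the exact output path may differ on your setup):

./inference.py yolov8s-coco-onnx 'rtsp://<camera-ip>/stream' --pipe gst
cat build/yolov8s-coco-onnx/logs/gst_pipeline.yaml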

Please do let us know if you need further help converting the gst_pipeline.yaml file.
Many thanks!


Hi ​@Habib 😊

I managed to solve it with this pipeline:

GST_PLUGIN_PATH=`pwd`/operators/lib GST_DEBUG=2 gst-launch-1.0 \
rtspsrc location='rtsp://….' latency=200  ! \
capsfilter name=rtspcapsfilter0  caps="application/x-rtp, media=video" ! \
decodebin force-sw-decoders=false caps="video/x-raw(ANY)" expose-all-streams=false ! \
videoconvert ! \
axinplace lib=libinplace_addstreamid.so mode=meta options="stream_id:0" ! \
axtransform lib=libtransform_colorconvert.so options=format:rgba ! \
queue max-size-buffers=4 max-size-time=0 max-size-bytes=0 ! \
axinferencenet \
    model=build/yolov7-coco/yolov7-coco/1/model.json \
    double_buffer=true \
    dmabuf_inputs=true \
    dmabuf_outputs=true \
    num_children=4 \
    preprocess0_lib=libtransform_resize_cl.so \
    preprocess0_options="width:640;height:640;padding:114;letterbox:1;scale_up:1;to_tensor:1;mean:0.,0.,0.;std:1.,1.,1.;quant_scale:0.003921568859368563;quant_zeropoint:-128.0" \
    preprocess1_lib=libtransform_padding.so \
    preprocess1_options="padding:0,0,1,1,1,15,0,0;fill:0" \
    preprocess1_batch=1 \
    postprocess0_lib=libdecode_yolov5.so \
    postprocess0_options="meta_key:detections;anchors:1.5,2.0,2.375,4.5,5.0,3.5,2.25,4.6875,4.75,3.4375,4.5,9.125,4.4375,3.4375,6.0,7.59375,14.34375,12.53125;classes:80;confidence_threshold:0.25;scales:0.003937006928026676,0.003936995752155781,0.003936977591365576;zero_points:-128,-128,-128;topk:30000;multiclass:0;sigmoid_in_postprocess:0;transpose:1;classlabels_file:ax_datasets/labels/coco.names;model_width:640;model_height:640;scale_up:1;letterbox:1" \
    postprocess0_mode=read \
    postprocess1_lib=libinplace_nms.so \
    postprocess1_options="meta_key:detections;max_boxes:300;nms_threshold:0.45;class_agnostic:0;location:CPU" ! \
axinplace lib=libinplace_draw.so mode=write ! \
videoconvert ! \
ximagesink

But I'm losing a lot of frames and it's slow :( I've tried increasing the latency and increasing max-size-buffers, but it's still the same. Do you have any suggestions, or is there something I could be doing wrong?

I think it's due to the inference, because the stream alone works perfectly:

GST_PLUGIN_PATH=`pwd`/operators/lib GST_DEBUG=2 gst-launch-1.0 \
rtspsrc location='rtsp://….' latency=200  ! \
capsfilter name=rtspcapsfilter0  caps="application/x-rtp, media=video" ! \
decodebin force-sw-decoders=false caps="video/x-raw(ANY)" expose-all-streams=false ! \
videoconvert ! \
axinplace lib=libinplace_addstreamid.so mode=meta options="stream_id:0" ! \
axtransform lib=libtransform_colorconvert.so options=format:rgba ! \
queue max-size-buffers=4 max-size-time=0 max-size-bytes=0 ! \
axinplace lib=libinplace_draw.so mode=write ! \
videoconvert ! \
ximagesink


Many thanks, ​@sara, for taking the time to post your solution here! This helps us and the community users who are facing the same issue.

To answer your question:
Have you tried ximagesink with sync=false:

! videoconvert           \
! ximagesink sync=false

For reference, this pipeline [1] was working for me:

Feel free to let us know more about your project and don’t hesitate to get back to us for more questions, or comments.

Thanks again!  

---

[1]
 

GST_PLUGIN_PATH=${AXELERA_FRAMEWORK}/operators/lib \
gst-launch-1.0 \
rtspsrc \
location=rtsp://127.0.0.1:9550/test \
latency=0 \
! application/x-rtp,media=video \
! decodebin \
force-sw-decoders=true \
expose-all-streams=false \
! videoconvert \
! video/x-raw,format=RGBA \
! axinplace \
lib="${AXELERA_FRAMEWORK}/operators/lib/libinplace_addstreamid.so" \
mode="meta" \
options="stream_id:0" \
! axtransform \
lib="${AXELERA_FRAMEWORK}/operators/lib/libtransform_colorconvert.so" \
options="format:rgb" \
! queue \
max-size-buffers=4 \
max-size-time=0 \
max-size-bytes=0 \
! axinferencenet \
name="inference-task0" \
model="/home/ubuntu/spap_v0/v1/software-platform/host/application/framework/build/yolov8s-coco-onnx/yolov8s-coco-onnx/1/model.json" \
devices="metis-0:1:0" \
double_buffer=false \
dmabuf_inputs=true \
dmabuf_outputs=false \
num_children=1 \
preprocess0_lib="${AXELERA_FRAMEWORK}/operators/lib/libtransform_resize_cl.so" \
preprocess0_options="width:640;height:640;padding:114;letterbox:1;scale_up:1;to_tensor:1;mean:0.,0.,0.;std:1.,1.,1.;quant_scale:0.003921568859368563;quant_zeropoint:-128.0" \
preprocess1_lib="${AXELERA_FRAMEWORK}/operators/lib/libtransform_padding.so" \
preprocess1_options="padding:0,0,1,1,1,15,0,0;fill:0" \
preprocess1_batch=1 \
postprocess0_lib="${AXELERA_FRAMEWORK}/operators/lib/libdecode_yolov8.so" \
postprocess0_options="meta_key:detections;classes:80;confidence_threshold:0.25;scales:0.07061304897069931,0.0678478255867958,0.06895139068365097,0.10098495334386826,0.15165244042873383,0.16900309920310974;padding:0,0,0,0,0,0,0,0|0,0,0,0,0,0,0,0|0,0,0,0,0,0,0,0|0,0,0,0,0,0,0,48|0,0,0,0,0,0,0,48|0,0,0,0,0,0,0,48;zero_points:-68,-58,-44,134,106,110;topk:30000;multiclass:0;classlabels_file:/tmp/tmp8vzqe8_f;model_width:640;model_height:640;scale_up:1;letterbox:1" \
postprocess0_mode="read" \
postprocess1_lib="${AXELERA_FRAMEWORK}/operators/lib/libinplace_nms.so" \
postprocess1_options="meta_key:detections;max_boxes:300;nms_threshold:0.45;class_agnostic:1;location:CPU" \
! axinplace \
lib="${AXELERA_FRAMEWORK}/operators/lib/libinplace_draw.so" \
mode="write" \
! videoconvert \
! ximagesink sync=false

 

