
Hello,

 

I am trying to run a YOLO model on 8 camera streams in parallel on the Metis PCIe card; 20-30 fps per camera stream is all I need. With the 548 fps end-to-end stated for YOLOv8s at https://axelera.ai/metis-aipu-benchmarks, it should be possible to reach ~68 fps per camera.

 

As a small test I wrote a Python script which creates an inference stream with 8 videos as input:

from axelera.app import config, display, inf_tracers
from axelera.app.stream import create_inference_stream

def run(window, stream):
    for frame_result in stream:
        window.show(frame_result.image, frame_result.meta, frame_result.stream_id)

        # report the measured end-to-end FPS while frames are being processed
        fps = stream.get_all_metrics()['end_to_end_fps']
        print(fps.value)

def main():
    tracers = inf_tracers.create_tracers('core_temp', 'end_to_end_fps', 'cpu_usage')
    stream = create_inference_stream(
        network="yolov5s-v7-coco",
        sources=[
            str(config.env.framework / "media/traffic1_1080p.mp4"),
            str(config.env.framework / "media/traffic1_1080p.mp4"),
            str(config.env.framework / "media/traffic1_1080p.mp4"),
            str(config.env.framework / "media/traffic1_1080p.mp4"),
            str(config.env.framework / "media/traffic1_1080p.mp4"),
            str(config.env.framework / "media/traffic1_1080p.mp4"),
            str(config.env.framework / "media/traffic1_1080p.mp4"),
            str(config.env.framework / "media/traffic1_1080p.mp4"),
        ],
        tracers=tracers,
    )
    print(stream.sources)

    with display.App(visible=True) as app:
        wnd = app.create_window("App", (900, 600))
        app.start_thread(run, (wnd, stream), name='InferenceThread')
        app.run(interval=1 / 10)
    stream.stop()


if __name__ == "__main__":
    main()

The tracers report an end-to-end rate of ~80 fps, similar to what I see when I run it with only one video as input. But the output shown in the window is very choppy; I would guess less than 10 fps per camera.

I am probably using the stream interface of the Python API incorrectly. Can someone please help me with this? What is the recommended way to run inference on multiple camera streams (or videos, as a test) in parallel?

Thanks
Maximilian

Hi @maximiliankir, welcome on board! Love this project.

I’ll need to check with some of the team, but I wonder if a multi-stream approach would help, rather than eight sources on a single inference stream?
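To sketch what I mean (untested on my side, and reusing only the create_inference_stream and display.App calls from your own script; I am not yet certain the SDK supports several concurrent streams on one device, so treat this purely as a sketch):

from axelera.app import config, display
from axelera.app.stream import create_inference_stream

# one video per stream, eight independent streams (same test clip repeated)
SOURCES = [str(config.env.framework / "media/traffic1_1080p.mp4")] * 8

def run(window, stream, pane):
    for frame_result in stream:
        # 'pane' is used instead of frame_result.stream_id so each stream gets its own tile
        window.show(frame_result.image, frame_result.meta, pane)

def main():
    # assumption: several inference streams can coexist on the same Metis device
    streams = [
        create_inference_stream(network="yolov5s-v7-coco", sources=[src])
        for src in SOURCES
    ]
    with display.App(visible=True) as app:
        wnd = app.create_window("App", (900, 600))
        for idx, stream in enumerate(streams):
            app.start_thread(run, (wnd, stream, idx), name=f'InferenceThread{idx}')
        app.run(interval=1 / 10)
    for stream in streams:
        stream.stop()

if __name__ == "__main__":
    main()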

In the meantime, I'll share some resources that may help with looking into this approach.

But I’ll also come back to you with some thoughts from the Axelera crew!


Hi @Spanner,

Thanks for discussing this issue with the team. Ultimately, the best solution for me would be a complete pipeline via GStreamer; I receive the camera frames through a GStreamer interface anyway. Can you please point me to an example of how to connect these GStreamer inputs to a YOLO inference pipeline? Maybe this would also help achieve the required performance of 8 streams at 30 fps each.


@maximiliankir 
Can you please let us know the host machine you are using? 
Thanks!


@maximiliankir 

Adding multiple sources should be fairly easy with inference.py, and you are right that each stream in an 8-parallel-stream setup should be able to run at >= 60 FPS if we can reach ~548 FPS with a single stream using YOLOv8s. That said, inference.py uses GStreamer under the hood, so it does not matter whether we set up an 8-stream pipeline directly in GStreamer or indirectly via inference.py; the performance should (ideally) be the same. With inference.py, we can set up 8 parallel streams (each running at 60 FPS [1]) and verify an end-to-end FPS of ~548 as follows:

> ./inference.py                    \
--pipe="gst" \
--aipu-cores=4 \
--disable-vaapi \
--disable-opencl \
--frame-rate 0 \
--frames 5000 \
--show-stats \
--no-display \
yolov8s-coco-onnx \
rtsp://127.0.0.1:8551/test \
rtsp://127.0.0.1:8551/test \
rtsp://127.0.0.1:8551/test \
rtsp://127.0.0.1:8551/test \
rtsp://127.0.0.1:8551/test \
rtsp://127.0.0.1:8551/test \
rtsp://127.0.0.1:8551/test \
rtsp://127.0.0.1:8551/test | grep -e "Element" -e "End-to-end"

The above should show “End-to-End” FPS reachable by Metis as follows:

Element                                         Latency(us)   Effective FPS
End-to-end average measurement 552.5
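That works out to roughly 552.5 / 8 ≈ 69 FPS per stream, comfortably above the 20-30 FPS per camera you are targeting.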

Hope this helps with your use case. If you still require custom integration of axinferencenet with custom GStreamer pipelines, please take a look at [2].

Please feel free to reach out if you have any more questions, comments, or suggestions!
Thanks!

---
[1] In the above, we run an RTSP-based stream at 60 FPS via an RTSP server, which can be set up as follows:

> wget https://github.com/GStreamer/gst-rtsp-server/raw/refs/tags/1.14.5/examples/test-launch.c
> sudo apt-get install libgstrtspserver-1.0-dev libgstreamer1.0-dev
> gcc test-launch.c -o test-launch $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-rtsp-server-1.0)
> ./test-launch -p 8551 "\
filesrc location=<path-to>/media/output.mp4 \
! qtdemux \
! h264parse \
! avdec_h264 \
! videorate ! video/x-raw,framerate=60/1 \
! videoscale ! video/x-raw,width=640,height=640 \
! queue \
! videoconvert \
! x264enc \
! h264parse \
! rtph264pay name=pay0 pt=96"
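
If you want to double-check that the server really is delivering frames before pointing inference.py at it, a few lines of Python can probe it (just an optional sanity check on my side, not part of the SDK; it assumes opencv-python with FFmpeg support is installed):

import time
import cv2

# read from the local test RTSP server for a few seconds and estimate its frame rate
cap = cv2.VideoCapture("rtsp://127.0.0.1:8551/test")
frames, start = 0, time.time()
while time.time() - start < 5.0:
    ok, _ = cap.read()
    if not ok:
        break
    frames += 1
cap.release()
elapsed = time.time() - start
print(f"received {frames} frames in {elapsed:.1f}s (~{frames / elapsed:.1f} FPS)")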

[2] https://community.axelera.ai/voyager-sdk-2/raspberry-pi-5-metis-inference-with-gstreamer-pipeline-221?postid=684#post684


Thank you very much for the detailed answer. I will try to run parallel inference via inference.py as soon as possible. I will also try running it from RTSP streams instead of video files directly, to see if this already gives a performance improvement.

 

To answer the question about the host machine: 

  • I tried it on an x86 machine with an AMD Ryzen 9 3900X and 64 GB RAM. I don't think this should limit the performance.
  • Our target platform will eventually have an Ampere Altra ARM CPU with 128 cores. I also tested on this machine, but the performance is even worse, probably due to its very limited single-core performance.

I have another question: what does the host FPS metric shown by inference.py mean? It reports something over 800 FPS, but what does that represent?

 

I am now out of the office for two weeks, but I will try your suggestions as soon as I am back and come back to you with feedback.

