
I tend to learn best by getting my hands dirty first and reading the manual later. This "fail a lot" approach helps me build tangible context and really absorb what I'm reading. It might not work for everyone, but it's how I've planned my development process.

Moving from my PoC to a prototype, I learned from the helpful staff at Axelera that their Voyager SDK would handle much of the work I was doing with OpenCV. This was a great relief; it meant I wouldn't have to worry about how my hardware was being utilized.

To build the prototype, I decoupled my app into a front-end and a back-end, which is a classic web app way of thinking. Since I'm working with limited compute resources on the embedded device, I have to be smart about what the hardware does. The front-end, built with React.js, handles all the visual processing on the client's web browser. The back-end is where the heavy lifting happens, with the AI part of the app running on the server that hosts the Axelera Metis hardware and Voyager SDK.

I connected the front-end and back-end with a JSON API (a rough sketch of the back-end side follows the list), with endpoints for things like:

  • /api/status for health monitoring.

  • /api/video/stream for the live video feed.

  • /api/video/keypoints for the AI model's output on pose features.
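To make that concrete, here's roughly the shape of the back-end side. Flask is just an assumption for the sketch, and get_latest_keypoints() is a placeholder for however the results actually come out of the Voyager pipeline; the payload fields are illustrative:

```python
# Rough sketch of the back-end endpoints, assuming Flask.
# get_latest_keypoints() is a placeholder for however the inference results
# are pulled from the Voyager pipeline running on the Metis card.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/status")
def status():
    # Health check the front-end can poll.
    return jsonify({"status": "ok"})

@app.route("/api/video/keypoints")
def keypoints():
    points = get_latest_keypoints()  # placeholder, e.g. [[x, y, score], ...]
    return jsonify({"keypoints": points})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```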

This approach isn't revolutionary, but I was nervous about how it would work with an unknown SDK. The results so far have been promising:

  • Data transfer for the pose-tracking keypoints works well.

  • Integrating the new SDK was different from OpenCV but manageable, because I was using the same YOLO model.

  • The main problem I'm facing is with the video feed. The performance isn't good enough, and while it's not essential for the app to function, a good video feed is really useful for debugging during development.

So, my next step is to improve the video feed's performance and maybe delve into GStreamer, which I haven't used before - any tips or advice are very welcome!
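For reference, the kind of approach I'm considering (completely untested on my end; the pipeline string is an assumption and it needs an OpenCV build with GStreamer support) would be to push frames into a GStreamer pipeline through OpenCV's GStreamer backend:

```python
# Untested sketch: hand numpy frames to a GStreamer pipeline through OpenCV.
# Requires OpenCV built with GStreamer support; pipeline elements are assumptions.
import cv2

width, height, fps = 1280, 720, 30
pipeline = (
    "appsrc ! videoconvert ! x264enc tune=zerolatency speed-preset=ultrafast "
    "! rtph264pay ! udpsink host=127.0.0.1 port=5000"
)
writer = cv2.VideoWriter(pipeline, cv2.CAP_GSTREAMER, 0, float(fps), (width, height), True)

# frame would be the BGR numpy array pulled from the back-end:
# writer.write(frame)
```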

Hi @llrds

Can you elaborate on the video feed aspect? Do you want to see the original video stream or a video stream with pose keypoints drawn?


Awesome work, and love how you’ve taken an established approach to this, @llrds! From my very limited perspective, that sounds like a rock-solid foundation to build your project upon!

(By the way, just between us, if you need someone to hit you with the GStreamer knowledge, @Habib is a GST ninja! 😀)


@shashi.chilappagari Ideally it would be the feed with the keypoints drawn on it. When I use the method to get a numpy array from Voyager, it’s just the plain video feed, but when I use the normal video feed to display in a window on the desktop, the keypoints are drawn. Is there an option I’m missing?
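(If there isn't a built-in option, my fallback would be overlaying the keypoints on the numpy array myself; a rough sketch below, assuming the results come back as (x, y) pairs per detected person:)

```python
# Fallback sketch: draw pose keypoints onto the raw frame with OpenCV.
# Assumes keypoints arrive as [(x, y), ...] per detected person.
import cv2

def draw_keypoints(frame, people):
    for keypoints in people:
        for x, y in keypoints:
            cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
    return frame
```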

Also, I switched to a websocket and I think the performance is good enough; latency is typically 1-2 ms, but the quality is reduced a lot with OpenCV (on the numpy array) to get there.
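For context, the kind of encode step I mean (simplified; the actual websocket handling is omitted and the quality value is just illustrative):

```python
# JPEG-encode a frame for the websocket; lower quality = smaller payload and
# lower latency, but a visibly worse image. The quality value is illustrative.
import cv2

def encode_frame(frame, quality=40):
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return buf.tobytes() if ok else None
```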


Hi @llrds

If you are willing to give DeGirum PySDK a try, we can guide you through various options: how to get the raw video, video with annotations, the results array, etc.


Thanks, I saw that Saad had some success with that, so it's certainly interesting. When I get a chance to give it a go, I'll ping you.

