I tend to learn best by getting my hands dirty first and reading the manual later. This "fail a lot" approach helps me build tangible context and really absorb what I'm reading. It might not work for everyone, but it's how I've planned my development process.
Moving from my PoC to a prototype, I learned from the helpful staff at Axelera that their Voyager SDK would handle much of the work I was doing with OpenCV. This was a great relief; it meant I wouldn't have to worry about how my hardware was being utilized.
To build the prototype, I decoupled my app into a front-end and a back-end, which is a classic web app way of thinking. Since I'm working with limited compute resources on the embedded device, I have to be smart about what the hardware does. The front-end, built with React.js, handles all the visual processing in the client's web browser. The back-end is where the heavy lifting happens, with the AI part of the app running on the server that hosts the Axelera Metis hardware and Voyager SDK.
I connected the front-end and back-end with a JSON API, with endpoints for things like:
- `/api/status` for health monitoring.
- `/api/video/stream` for the live video feed.
- `/api/video/keypoints` for the AI model's output on pose features.
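To make the split concrete, here's a minimal sketch of what the back-end side of that API could look like. Flask and the `run_pose_inference()` helper are purely illustrative assumptions here, not the actual framework or SDK calls I'm using; the point is just the shape of the endpoints.

```python
# Illustrative back-end sketch: a small Flask app exposing the JSON API.
# Flask is an assumption for this example; run_pose_inference() is a
# stand-in for whatever actually drives the model on the Metis hardware.
from flask import Flask, jsonify

app = Flask(__name__)

def run_pose_inference():
    # Placeholder: in the real app this would return the latest keypoints
    # produced by the YOLO pose model running on the accelerator.
    return [{"person": 0, "keypoints": [[0.42, 0.31, 0.97]]}]

@app.route("/api/status")
def status():
    # Lightweight health check so the front-end can poll the back-end.
    return jsonify({"status": "ok"})

@app.route("/api/video/keypoints")
def keypoints():
    # Return the most recent pose keypoints as JSON for the React front-end.
    return jsonify({"detections": run_pose_inference()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

The nice part of this shape is that the React front-end only ever talks JSON to these routes, so the browser stays completely decoupled from whatever runs on the Metis side.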
This approach isn't revolutionary, but I was nervous about how it would work with an unknown SDK. The results so far have been promising:
- The data transfer for pose tracking works well; keypoints are a tiny payload to push over the JSON API (see the serialization sketch after this list).
- Integrating the new SDK was different from OpenCV but manageable, because I was using the same YOLO model.
- The main problem I'm facing is the video feed. Its performance isn't good enough yet, and while a live feed isn't essential for the app to function, it's really useful for debugging during development.
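For the keypoints endpoint, the payload really is just a few numbers per detected person. Here's a rough sketch of the kind of conversion I mean, assuming the model output is a list of (x, y, confidence) arrays per person; the structure mirrors typical YOLO pose output rather than any specific SDK object.

```python
# Illustrative sketch: turn raw keypoint arrays into a JSON payload.
# The input format is an assumption modelled on typical YOLO pose output.
import json
import numpy as np

def keypoints_to_payload(keypoints_per_person):
    """Convert per-person (num_keypoints, 3) arrays of (x, y, confidence)
    into a JSON-serializable dict for the /api/video/keypoints endpoint."""
    detections = []
    for person_id, kps in enumerate(keypoints_per_person):
        detections.append({
            "person": person_id,
            "keypoints": np.asarray(kps, dtype=float).round(3).tolist(),
        })
    return {"detections": detections}

# Example: one detected person with three keypoints.
sample = [np.array([[120.4, 88.2, 0.91],
                    [130.1, 95.7, 0.88],
                    [118.9, 140.3, 0.74]])]
print(json.dumps(keypoints_to_payload(sample)))
```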
So, my next step is to improve the video feed's performance, and maybe to delve into GStreamer, which I haven't used before. Any tips or advice would be very welcome!
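For reference, this is roughly what I understand a minimal GStreamer capture path to look like when pulled into Python via OpenCV. It's untested on my setup, it assumes OpenCV was built with GStreamer support, and `/dev/video0` is just a placeholder device.

```python
# Untested sketch: grab frames through a GStreamer pipeline with OpenCV.
# Requires an OpenCV build with GStreamer support; /dev/video0 is a placeholder.
import cv2

pipeline = (
    "v4l2src device=/dev/video0 ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink drop=true max-buffers=1"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError("Could not open GStreamer pipeline")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # In the real app the frame would be encoded and pushed to
    # /api/video/stream; here it's just shown locally for debugging.
    cv2.imshow("debug feed", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```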