Hello Axelera Community,
I’ve been working on a project that I’m excited to share: a real-time face recognition home surveillance and attendance system using the Voyager SDK. My goal was to create a reliable system that could identify individuals from an RTSP stream and display their names accurately on-screen.
I’m using a multi-model pipeline, starting with RetinaFace-Resnet50 for detection and then leveraging the FaceNet-LFW model for recognition, implemented through the face-recognition and face-recognition-with-vote components. While I've made progress, I’ve also encountered some challenges, particularly around database management and recognition accuracy. I wanted to share my approach and ask for advice from the community.
My Current Approach
The system architecture involves two main steps:
- Database Creation: I use the ./inference.py face-recognition faces/person/folder command to generate embeddings from images in my dataset. This creates a famous_embeddings.json file, which acts as my face database. My image folder structure follows a clear hierarchy:

  faces/
  ├── person_A/
  │   ├── person_A1.jpg
  │   └── person_A2.jpg
  └── person_B/
      ├── person_B1.jpg
      └── person_B2.jpg

  The system loads these 512-dimensional embeddings from the JSON file for recognition.
- Recognition Pipeline: I run the live recognition using ./inference.py face-recognition-with-vote RTSP_URL. The system is designed to display the recognized name in a window, providing a real-time view of the system's accuracy.
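To make the matching step concrete, here is a minimal sketch of how I picture a query embedding being compared against the database. It is not the SDK's implementation: the gallery layout (name mapped to a list of embeddings), the function names, and the 0.5 threshold are all my own assumptions, and toy 3-dimensional vectors stand in for the real 512-dimensional ones.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def best_match(query, gallery, threshold=0.5):
    """Return (name, score) of the closest gallery entry, or (None, score) below threshold.

    gallery: dict mapping person name -> list of embeddings (hypothetical layout,
             not necessarily the schema of famous_embeddings.json).
    threshold: illustrative value only, not an SDK default.
    """
    best_name, best_score = None, -1.0
    for name, embeddings in gallery.items():
        for emb in embeddings:
            score = cosine_similarity(query, emb)
            if score > best_score:
                best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # treat as "unknown"
    return best_name, best_score

# Toy 3-dimensional vectors stand in for 512-dimensional embeddings.
gallery = {
    "person_A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "person_B": [[0.0, 1.0, 0.0]],
}
print(best_match([0.95, 0.05, 0.0], gallery))  # close to person_A
print(best_match([0.0, 0.0, 1.0], gallery))    # orthogonal to everyone, so no match
```

If the SDK already does something equivalent internally, I would mainly like to know which similarity measure and threshold it uses.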
The Challenge: Accuracy and Database Management
I've run into a few issues that I'm hoping to get some community input on:
- Accuracy: The recognition is not as robust as I had hoped. It sometimes struggles to correctly identify individuals or fails to match them entirely.
- Image Loading and Database: I'm not entirely sure if my method of creating the famous_embeddings.json file is optimal. Is it as simple as pointing the script to a folder of images? How does the script handle multiple images per person to create a robust template?
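By "robust template" I mean something like the following sketch: normalize each enrollment embedding, average them per person, and re-normalize the mean. This is only what I would do by hand if the script stores embeddings individually. The function names and the input layout are hypothetical, and I do not know whether the Voyager SDK does anything similar.

```python
import math

def mean_template(embeddings):
    """Average L2-normalized embeddings and re-normalize the result to unit length."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    total = [0.0] * dim
    for emb in embeddings:
        if len(emb) != dim:
            raise ValueError("all embeddings must have the same dimension")
        norm = math.sqrt(sum(x * x for x in emb))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero vector")
        # Normalize first so that no single image dominates the average.
        for i, x in enumerate(emb):
            total[i] += x / norm
    total_norm = math.sqrt(sum(x * x for x in total))
    if total_norm == 0.0:
        raise ValueError("embeddings cancel out; cannot build a template")
    return [x / total_norm for x in total]

def build_templates(per_person):
    """Map each person name to one averaged template (hypothetical input layout)."""
    return {name: mean_template(embs) for name, embs in per_person.items()}

# Toy 2-dimensional vectors stand in for 512-dimensional embeddings.
templates = build_templates({
    "person_A": [[1.0, 0.0], [0.0, 1.0]],
    "person_B": [[0.0, 2.0]],
})
print(templates)
```

Averaging keeps the database small (one vector per person), while storing every embedding keeps more of the pose and lighting variation, so I would be interested in which trade-off works better in practice.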
Seeking Community Advice for a Better Attendance System
I have a few specific questions for those with more experience:
- Database Creation Best Practices: What is the most effective way to generate embeddings for a person? Does the inference.py script automatically average embeddings from multiple images, or are they stored individually? For a production-grade attendance system, it's generally recommended to use multiple images per person to create a more representative template. I'm curious whether the Voyager SDK's built-in tools support this averaging, or selection of the best enrollment images, automatically.
- Optimizing the Recognition Pipeline: I understand that real-world conditions like poor lighting, motion blur, and varied angles can severely impact accuracy. Have you found success with any specific techniques in the Voyager SDK? I’m thinking about things like:
  - Pose Quality Buffers: Filtering out low-quality frames before sending them to the recognition model to improve accuracy and efficiency.
  - Batch Processing: Combining recognition outputs over time (e.g., using a "vote" or Bayesian update) to build a more confident identification.
- Alternative Methods: Are there any other built-in or community-recommended methods for building an attendance system? Should I consider integrating other recognition models from the Voyager Model Zoo (e.g., ArcFace, which is known for its performance in recognition tasks)?
- Threshold Tuning: Is there a configurable "match threshold" for the face-recognition-with-vote model? Adjusting this threshold can be key to reducing false positives. I’d be interested in hearing what thresholds other developers are using for similar projects.
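To illustrate the quality filtering and voting ideas from the list above, here is a rough sketch of the behavior I am after. It is not how face-recognition-with-vote works internally (I have not checked). The class name, the quality score in [0, 1], and all default values are hypothetical.

```python
from collections import Counter, deque

class TrackVoter:
    """Majority vote over the last `window` accepted predictions for one tracked face."""

    def __init__(self, window=15, min_votes=8, min_quality=0.6):
        self.history = deque(maxlen=window)  # oldest entries drop off automatically
        self.min_votes = min_votes
        self.min_quality = min_quality

    def update(self, name, quality):
        """Record one frame's prediction and return the voted name, or None if undecided.

        name: per-frame match result, or None when nothing passed the match threshold.
        quality: hypothetical frame score in [0, 1], e.g. derived from detector
                 confidence or a blur measure.
        """
        # Quality gate: low-quality frames never reach the vote.
        if quality >= self.min_quality:
            self.history.append(name)
        # Count only real names; "no match" frames still occupy a slot in the window.
        counts = Counter(n for n in self.history if n is not None)
        if not counts:
            return None
        winner, votes = counts.most_common(1)[0]
        return winner if votes >= self.min_votes else None

voter = TrackVoter(window=5, min_votes=3, min_quality=0.5)
frames = [("person_A", 0.9), ("person_A", 0.2), ("person_B", 0.8),
          ("person_A", 0.9), ("person_A", 0.7)]
for name, quality in frames:
    print(voter.update(name, quality))
```

One voter per tracked face would be needed, so this assumes some form of tracking ID is available from the pipeline.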
I believe the Voyager SDK has immense potential for these kinds of edge AI applications, and I'd love to get it right. Any insights, code snippets, or pointers to relevant documentation would be highly appreciated.
Thanks for your help!

