Question

How to Properly Load and Manage Face Embeddings for Recognition on Voyager?

  • June 23, 2026
  • 2 replies
  • 32 views

Hello Axelera Community,

I’ve been working on a project that I’m excited to share: a real-time face recognition home surveillance and attendance system using the Voyager SDK. My goal was to create a reliable system that could identify individuals from an RTSP stream and display their names accurately on-screen.

I’m using a multi-model pipeline, starting with RetinaFace-Resnet50 for detection and then leveraging the FaceNet-LFW model for recognition, implemented through the face-recognition and face-recognition-with-vote components. While I've made progress, I’ve also encountered some challenges, particularly around database management and recognition accuracy. I wanted to share my approach and ask for advice from the community.

My Current Approach

The system architecture involves two main steps:

  1. Database Creation: I use the ./inference.py face-recognition faces/person/folder command to generate embeddings from images in my dataset. This creates a famous_embeddings.json file, which acts as my face database. My image folder structure follows a clear hierarchy:

    faces/
    ├── person_A/
    │   ├── person_A1.jpg
    │   └── person_A2.jpg
    └── person_B/
        ├── person_B1.jpg
        └── person_B2.jpg

    The system loads these 512-dimensional embeddings from the JSON file for recognition.

  2. Recognition Pipeline: I run the live recognition using ./inference.py face-recognition-with-vote RTSP_URL. The system is designed to display the recognized name in a window, providing a real-time view of the system's accuracy.

The Challenge: Accuracy and Database Management

I've run into a few issues that I'm hoping to get some community input on:

  • Accuracy: The recognition is not as robust as I had hoped. It sometimes struggles to correctly identify individuals or fails to match them entirely.

  • Image Loading and Database: I'm not entirely sure if my method of creating the famous_embeddings.json file is optimal. Is it as simple as pointing the script to a folder of images? How does the script handle multiple images per person to create a robust template?
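To make the multi-image question concrete, here is a minimal sketch of what per-person averaging could look like. It is not taken from the Voyager SDK and says nothing about what inference.py actually does: the `build_templates` helper, the name-to-list-of-vectors input layout, and the toy 3-dimensional vectors are all assumptions for illustration (real FaceNet embeddings here would be 512-dimensional).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_templates(embeddings_by_person):
    """Average several embeddings per person into one unit-length template.

    embeddings_by_person is a hypothetical layout: {name: [vector, vector, ...]}.
    It is NOT the documented format of famous_embeddings.json.
    """
    templates = {}
    for name, vectors in embeddings_by_person.items():
        if not vectors:
            continue  # skip people with no enrolment images
        # Normalize first so no single image dominates the mean by magnitude.
        unit = [l2_normalize(v) for v in vectors]
        dim = len(unit[0])
        mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
        templates[name] = l2_normalize(mean)
    return templates


# Toy 3-D vectors standing in for real embeddings (made-up values).
enrolled = {
    "person_A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "person_B": [[0.0, 0.0, 2.0]],
}
templates = build_templates(enrolled)
for name, template in templates.items():
    print(name, [round(x, 4) for x in template])
```

Averaging like this smooths out per-image noise, but one poor enrolment photo still shifts the mean, so keeping the individual embeddings and matching against the closest one is a reasonable alternative to compare.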

Seeking Community Advice for a Better Attendance System

I have a few specific questions for those with more experience:

  1. Database Creation Best Practices: What is the most effective way to generate embeddings for a person? Does the inference.py script automatically average embeddings from multiple images, or are they stored individually? For a production-grade attendance system, it's generally recommended to use multiple images per person to create a more representative template. I'm curious if the Voyager SDK's built-in tools support this averaging or selection of the best enrollment images automatically.

  2. Optimizing the Recognition Pipeline: I understand that real-world conditions like poor lighting, motion blur, and varied angles can severely impact accuracy. Have you found success with any specific techniques in the Voyager SDK? I’m thinking about things like:

    • Pose Quality Buffers: Filtering out low-quality frames before sending them to the recognition model to improve accuracy and efficiency.

    • Batch Processing: Combining recognition outputs over time (e.g., using a "vote" or Bayesian update) to build a more confident identification.

  3. Alternative Methods: Are there any other built-in or community-recommended methods for building an attendance system? Should I consider integrating other models from the Voyager Model Zoo, like different backbones (e.g., ArcFace, which is known for its performance in recognition tasks)?

  4. Threshold Tuning: Is there a configurable "match threshold" for the face-recognition-with-vote model? Adjusting this threshold can be key to reducing false positives. I’d be interested in hearing what thresholds other developers are using for similar projects.
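As a rough illustration of how the match threshold and the voting idea from the questions above could fit together, here is a small standalone sketch. It is an assumption about the general technique, not a description of how the face-recognition-with-vote component works internally: `best_match`, `VoteBuffer`, the `window` and `min_votes` parameters, and the 0.8 threshold are hypothetical names and values.

```python
import math
from collections import Counter, deque


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(embedding, templates, threshold):
    """Return (name, score) of the closest template, or (None, score) below threshold."""
    best_name, best_score = None, -1.0
    for name, template in templates.items():
        score = cosine_similarity(embedding, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # treat as "unknown" rather than guessing
    return best_name, best_score


class VoteBuffer:
    """Majority vote over the last `window` per-frame results for one tracked face."""

    def __init__(self, window=10, min_votes=6):
        self.votes = deque(maxlen=window)
        self.min_votes = min_votes

    def update(self, name):
        """Record one frame's result (a name or None) and return the current verdict."""
        self.votes.append(name)
        counts = Counter(v for v in self.votes if v is not None)
        if not counts:
            return None
        winner, n = counts.most_common(1)[0]
        return winner if n >= self.min_votes else None


# Toy 2-D templates and per-frame embeddings (made-up values).
templates = {"person_A": [1.0, 0.0], "person_B": [0.0, 1.0]}
frames = [[0.9, 0.1], [0.8, 0.3], [0.5, 0.5], [0.95, 0.05]]

buffer = VoteBuffer(window=4, min_votes=3)
results = []
for embedding in frames:
    name, _score = best_match(embedding, templates, threshold=0.8)
    results.append(buffer.update(name))
print(results)
```

In a real pipeline there would be one buffer per tracked face, so that votes from different people in the same frame are not mixed. A name is only shown once enough recent frames agree, which trades a short delay for fewer flickering or wrong labels.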

I believe the Voyager SDK has immense potential for these kinds of edge AI applications, and I'd love to get it right. Any insights, code snippets, or pointers to relevant documentation would be highly appreciated.

Thanks for your help!

2 replies

  • Author
  • Cadet
  • June 23, 2026

Hi @Spanner


Spanner
Axelera Team
  • June 23, 2026

Hi @Nanthu! Firstly, apologies, but the community falsely bumped your post into moderation - I’ve put it live now, but if it happens again, feel free to DM me. All posts should go live immediately as we don’t moderate (other than an overzealous automated spam filter, which is what caught us out here!), so if they don’t, just let me know 👍

Wow, there’s so much useful stuff here! This is something I would actually use myself, at home! I have a bunch of RTSP cameras around the house, and being able to recognise people would be awesome - especially if I hooked it up to Home Assistant for automations and notifications! How cool would that be?!

Hmm, excellent questions too. As I understand it, models like FaceNet respond well to good quality enrolment images. A few clean, well-lit, roughly front-facing shots per person tend to beat a big pile of mixed-quality ones, I think, because a bad enrolment image can drag the stored template in the wrong direction. And I think you’re right on the matching threshold, although I don’t really know of a reliable way to nail this other than live testing and repeatedly tweaking! 😅 A lot of legwork, but once you get it spot on, this could really move the accuracy needle.
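One way to make that tweaking a bit less trial-and-error is to log similarity scores from test footage and sweep candidate thresholds offline. The sketch below is only an illustration of that idea: the `error_rates` and `sweep` helpers are hypothetical, and the scores are made-up placeholders rather than measurements from any real system. "Genuine" scores come from comparisons where the person really is the enrolled one; "impostor" scores come from comparisons against someone else.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_accept_rate, false_reject_rate) at one threshold.

    A comparison counts as a match when its score is >= threshold.
    """
    false_rejects = sum(score < threshold for score in genuine)
    false_accepts = sum(score >= threshold for score in impostor)
    return false_accepts / len(impostor), false_rejects / len(genuine)


def sweep(genuine, impostor, thresholds):
    """Evaluate every candidate threshold and return (threshold, far, frr) rows."""
    return [(t, *error_rates(genuine, impostor, t)) for t in thresholds]


# Placeholder scores for illustration only; replace with scores logged
# from your own recordings.
genuine_scores = [0.82, 0.75, 0.91, 0.66]
impostor_scores = [0.31, 0.58, 0.72, 0.44]

rows = sweep(genuine_scores, impostor_scores, [0.5, 0.6, 0.7, 0.8])
for threshold, far, frr in rows:
    print(f"threshold={threshold:.2f}  false accepts={far:.2f}  false rejects={frr:.2f}")
```

For an attendance system a false accept (marking the wrong person present) is usually worse than a false reject, so it may make sense to pick the lowest threshold that keeps false accepts near zero and let voting over several frames recover the missed matches.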

I think ArcFace is pretty good on angled faces, as you say. Although given what you’ve already got up and running, I’d be tempted to persevere as it is rather than swapping models too soon, and test if good enrolment and a finely-tuned threshold get there instead. It’s probably the easiest thing to test, anyway, before any more substantial changes.

Let’s ask ​@Habib about creating the database of images - he’s a total ninja with this stuff!

This is going to be an awesome project, Nanthu - keep us updated every step of the way. Hugely useful and a fantastic way to supercharge an existing CCTV system. Loving it!