Hi everyone,
After working on a serious project for a few weeks, I wanted to build something fun with Metis and Voyager.
A tiny tribute for The Big Bang Theory and Sheldon Cooper fans.
Built a real-time Rock Paper Scissors Lizard Spock game that runs entirely on-device using the Axelera Metis AIPU (214 TOPS).
How it works:
- YOLOv10n trained on the HaGRID dataset (1M+ hand gesture images, 34 gesture classes) detects and classifies hand gestures directly from a live camera feed
- Model is quantized and compiled to run on the Metis AIPU — no GPU, no cloud, just the edge accelerator
- The game maps detected gestures to moves: fist=Rock, open palm=Paper, peace sign=Scissors, grip=Lizard, vulcan/4 fingers=Spock
- You play against Sheldon Cooper (complete with Bazinga quotes and reaction images)
The flow:
- Show a gesture to the camera and hold it steady (0.5s)
- 3-2-1 countdown
- Your gesture is locked, Sheldon picks randomly
- Result screen with win logic, flavor text, and Sheldon's reaction
Tech stack:
- Axelera Metis M2 AIPU on RK3588 board
- Voyager SDK (GStreamer pipeline + AIPU inference)
- YOLOv10n (HaGRID gestures) — exported to ONNX, quantized with per_tensor_min_max, compiled for Metis
- OpenCV for display overlay
- POV / RTSP camera input
- Python, fully local, no round trips to the cloud
What I learned:
- YOLOv10 has attention ops (MatMul/Softmax) that need per_tensor_min_max quantization to compile on Metis
- Tried hand keypoint models first (YOLO11n-pose-hands, 21 keypoints) — keypoints worked, but gesture classification from the POV camera angle was unreliable. Geometry-based pose detection fails when perspective flattens finger distances
- Switching to a gesture classification model (HaGRID) that directly outputs gesture labels was the right call — much more robust than trying to infer gestures from keypoint geometry
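The foreshortening problem is easy to see with toy numbers: a finger tilted away from the image plane projects at roughly cos(tilt) of its true length, so an extended finger seen at a steep POV angle can appear shorter than a half-curled finger seen head-on. The lengths and angles below are illustrative, not measured:

```python
import math

def projected_length(true_length: float, tilt_deg: float) -> float:
    """Apparent length of a segment tilted away from the image plane."""
    return true_length * math.cos(math.radians(tilt_deg))

# Toy numbers: an extended finger (length 1.0) at a steep POV angle
# vs. a half-curled finger (0.55) seen nearly head-on.
extended_at_60 = projected_length(1.0, 60)   # 0.5
curled_at_10 = projected_length(0.55, 10)    # ~0.54

# The extended finger projects SHORTER than the curled one, so any
# threshold on apparent keypoint distances misclassifies the gesture.
print(extended_at_60 < curled_at_10)  # True
```

A classifier trained directly on gesture images sidesteps this, since it learns appearance under the angles present in the training data rather than relying on metric finger geometry.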

