This short video showcases an early experiment using the Voyager SDK and Metis AIPU to run Fast Segment Anything (FastSAM). Under the hood, a YOLOv8-seg model generates segmentation masks, and CLIP links the text prompt to the matching masks in the video stream.
What’s exciting is the flexibility: no retraining on specific classes is needed. Just change the text prompt and it finds the object. This is still a test setup, but I thought it might spark a few ideas and show what’s becoming possible on the edge with Metis. Keen to hear what you think, or how you’d use this!
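For anyone curious about the prompt-to-mask step: the core idea is simply cosine similarity between a CLIP text embedding and CLIP image embeddings of each mask crop. Here's a minimal NumPy sketch of that matching logic; the embeddings below are random stand-ins (in the real pipeline they come from CLIP's image and text encoders), and `best_mask_for_prompt` is just an illustrative helper name, not part of the Voyager SDK.

```python
import numpy as np

def best_mask_for_prompt(mask_embs, text_emb):
    """Pick the mask whose embedding best matches the text prompt.

    mask_embs: (N, D) array, one CLIP image embedding per mask crop
    text_emb:  (D,) CLIP text embedding of the prompt
    Returns (index of best mask, cosine similarities for all masks).
    """
    # L2-normalize, then a dot product gives cosine similarity
    m = mask_embs / np.linalg.norm(mask_embs, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sims = m @ t
    return int(np.argmax(sims)), sims

# Toy data: 5 fake "mask embeddings"; make mask 2 the obvious match
# by deriving the "text embedding" from it plus a little noise.
rng = np.random.default_rng(0)
mask_embs = rng.normal(size=(5, 512))
text_emb = mask_embs[2] + 0.01 * rng.normal(size=512)

idx, sims = best_mask_for_prompt(mask_embs, text_emb)
print(idx)  # → 2
```

Swapping the prompt just swaps `text_emb`, which is why no retraining is needed: the class-agnostic masks stay the same, and only this cheap similarity step changes.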