Dear all,
I am happy to say that I just wrapped up my See and Hear project. I spent the last couple of weeks mainly on finding a solution that works without beamforming, because the ReSpeaker microphone array I had planned to use proved unreliable.
In the end I decided on a simple setup: three lavalier microphones attached to a USB sound card, which was then controlled from the script running on Metis. This turned out to be quite straightforward and gives good results, especially when there are only two speakers. In the future it would be a lot of fun to try running more advanced audio algorithms on Metis, as they should be quite performant there.
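For readers curious how one might attribute speech to a speaker with separate lavalier mics instead of beamforming, here is a minimal sketch (not the actual project code; the function name and the per-block RMS comparison are my own illustration): the channel with the highest short-term energy is treated as the active speaker.

```python
import numpy as np

def active_channel(frame: np.ndarray) -> int:
    """frame: (samples, channels) block captured from the USB sound card.

    Returns the index of the loudest channel, i.e. the mic closest
    to whoever is currently speaking.
    """
    rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2, axis=0))
    return int(np.argmax(rms))

# Synthetic example: channel 1 carries a loud 440 Hz tone,
# the other two channels contain only faint noise.
t = np.linspace(0.0, 0.1, 1600, endpoint=False)  # 0.1 s at 16 kHz
frame = np.stack(
    [0.01 * np.random.randn(1600),
     0.5 * np.sin(2 * np.pi * 440 * t),
     0.01 * np.random.randn(1600)],
    axis=1,
)
print(active_channel(frame))  # -> 1
```

With one mic per speaker this simple energy comparison already separates two speakers well, which matches the experience above.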
One major topic for audio is of course latency, which is hard to reduce significantly on a Linux system. But there are other audio applications that are less latency-critical, such as scene classification or ASR.
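To give a rough sense of why buffer sizes matter here, a quick back-of-the-envelope calculation (illustrative only, not taken from the project): each capture period adds a delay of frames divided by sample rate, so shrinking the buffer is the main lever, at the cost of more frequent wakeups.

```python
def buffer_latency_ms(frames: int, sample_rate: int) -> float:
    """Delay contributed by one capture period, in milliseconds."""
    return 1000.0 * frames / sample_rate

# A typical 1024-frame period at 48 kHz already costs ~21 ms ...
print(buffer_latency_ms(1024, 48000))  # -> 21.33...
# ... while a 128-frame period brings that down to ~2.7 ms.
print(buffer_latency_ms(128, 48000))   # -> 2.66...
```

On a stock Linux system the achievable minimum is bounded by scheduling jitter and the sound stack, which is why latency stays hard to push down much further.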
Feel free to watch my live demo:
or watch the video that describes the whole system:
All the code and a writeup can be found in the GitHub repository: https://github.com/skroed/see_and_hear/
I would also be happy to receive feedback or to help with setting this up elsewhere.
I would also like to thank Axelera for making this happen and for being so generous with time and resources. It was really fun playing with Metis, and I congratulate you on having designed such a flexible system and the powerful Voyager SDK.
I wish everyone else a nice last week of building.
Best,
Sebastian