Dear all,
I am happy to say that I just wrapped up my See and Hear project. I spent the last couple of weeks mainly on finding a solution that works without beamforming, because the ReSpeaker microphone array I had planned to use proved unreliable.
In the end I decided on a simple setup: three lavalier microphones attached to a USB sound card, which was then controlled from the script running on Metis. This turned out to be quite straightforward and gives good results, especially when there are only two speakers. In the future it would be a lot of fun to try running more advanced audio algorithms on Metis, as they should be quite performant there.
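For readers curious how one might attribute speech to a speaker with separate lavalier mics instead of beamforming, here is a minimal sketch (not the actual project code; the function name and the per-block RMS comparison are my own illustration): the channel with the highest short-term energy is treated as the active speaker.

```python
import numpy as np

def active_channel(frame: np.ndarray) -> int:
    """frame: (samples, channels) block captured from the USB sound card.

    Returns the index of the loudest channel, i.e. the mic closest
    to whoever is currently speaking.
    """
    rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2, axis=0))
    return int(np.argmax(rms))

# Synthetic example: channel 1 carries a loud 440 Hz tone,
# the other two channels contain only faint noise.
t = np.linspace(0.0, 0.1, 1600, endpoint=False)  # 0.1 s at 16 kHz
frame = np.stack(
    [0.01 * np.random.randn(1600),
     0.5 * np.sin(2 * np.pi * 440 * t),
     0.01 * np.random.randn(1600)],
    axis=1,
)
print(active_channel(frame))  # -> 1
```

With one mic per speaker this simple energy comparison already separates two speakers well, which matches the experience above.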
One major topic for audio is of course latency, which is hard to reduce significantly on a Linux system. But there are other audio applications that are less latency-critical, such as scene classification or ASR.
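To give a rough sense of why buffer sizes matter here, a quick back-of-the-envelope calculation (illustrative only, not taken from the project): each capture period adds a delay of frames divided by sample rate, so shrinking the buffer is the main lever, at the cost of more frequent wakeups.

```python
def buffer_latency_ms(frames: int, sample_rate: int) -> float:
    """Delay contributed by one capture period, in milliseconds."""
    return 1000.0 * frames / sample_rate

# A typical 1024-frame period at 48 kHz already costs ~21 ms ...
print(buffer_latency_ms(1024, 48000))  # -> 21.33...
# ... while a 128-frame period brings that down to ~2.7 ms.
print(buffer_latency_ms(128, 48000))   # -> 2.66...
```

On a stock Linux system the achievable minimum is bounded by scheduling jitter and the sound stack, which is why latency stays hard to push down much further.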
Feel free to watch my live demo:
or watch the video that describes the whole system:
All the code and a writeup can be found in the GitHub repository: https://github.com/skroed/see_and_hear/
I would also be happy to receive feedback or to help with setting this up elsewhere.
I would also like to thank Axelera for making this happen and for being so generous with time and resources. It was really fun playing with Metis, and I congratulate you on having designed such a flexible system and the powerful Voyager SDK.
I wish everyone else a nice last week of building.
Best,
Sebastian