speech to text - Stardance

1h 1m 20s logged

Devlog 01

started my speech-to-text project today

got microphone input working with SoundDevice and set it up so it constantly listens in small chunks. i’m using RMS volume to figure out whether I’m speaking or not, and then storing only the audio that actually contains speech
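
for anyone curious, the RMS check can be sketched roughly like this. this is not the actual project code: the names `rms`, `is_speech` and `SPEECH_THRESHOLD` are made up for illustration, the threshold value is a guess that would need tuning per microphone, and it assumes each chunk is a flat list of float samples between -1.0 and 1.0.

```python
import math

# Assumed value, not from the project: tune it for your mic and room noise.
SPEECH_THRESHOLD = 0.02


def rms(chunk):
    """Root-mean-square volume of one chunk of float samples."""
    if len(chunk) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def is_speech(chunk, threshold=SPEECH_THRESHOLD):
    """Treat a chunk as speech when its RMS volume is above the threshold."""
    return rms(chunk) > threshold
```

a chunk that passes `is_speech` gets kept, anything quieter gets dropped, which is how only the speech ends up stored.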

i also added silence detection so when I stop talking for about half a second, it assumes I’ve finished a sentence and sends the audio off to Faster-Whisper for transcription. transcription runs in a separate thread so the microphone can keep listening while Whisper does its thing
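
a minimal sketch of that end-of-sentence logic, under some assumptions: chunks are 0.1 seconds long (so half a second is 5 silent chunks in a row), and `on_utterance` is a stand-in for whatever function calls Faster-Whisper. `Segmenter` and `feed` are hypothetical names, not the ones in the real code.

```python
import threading

CHUNK_SECONDS = 0.1    # assumed chunk length
SILENCE_SECONDS = 0.5  # "about half a second"
SILENCE_CHUNKS = round(SILENCE_SECONDS / CHUNK_SECONDS)


class Segmenter:
    """Collects speech chunks and hands them off once silence lasts long enough."""

    def __init__(self, on_utterance, silence_chunks=SILENCE_CHUNKS):
        self.on_utterance = on_utterance  # stand-in for the transcription call
        self.silence_chunks = silence_chunks
        self.buffer = []     # speech chunks of the current sentence
        self.silent_run = 0  # consecutive silent chunks since the last speech

    def feed(self, chunk, speaking):
        """Process one chunk; returns the started thread when a sentence ends."""
        if speaking:
            self.buffer.append(chunk)
            self.silent_run = 0
            return None
        if not self.buffer:
            return None  # silence before any speech, nothing to send
        self.silent_run += 1
        if self.silent_run < self.silence_chunks:
            return None  # short pause, keep waiting
        # Sentence finished: swap out the buffer and transcribe in the background
        # so the caller can keep feeding microphone chunks.
        audio, self.buffer, self.silent_run = self.buffer, [], 0
        worker = threading.Thread(target=self.on_utterance, args=(audio,), daemon=True)
        worker.start()
        return worker
```

the microphone loop would call `feed` once per chunk and never block on the transcription itself.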

had to mess around with locks and a few state variables to stop duplicate transcriptions from happening, but it’s working pretty well now. so far it can listen, detect when I’m speaking, wait for me to finish, and then print the transcribed text automatically.
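
one way the duplicate problem can be handled (a sketch, not necessarily what the project does) is to take the buffered audio and clear it in a single locked step, so two threads can never grab the same sentence. `UtteranceBuffer` and `take` are hypothetical names.

```python
import threading


class UtteranceBuffer:
    """Thread-safe audio buffer that can only be handed off once per sentence."""

    def __init__(self):
        self._lock = threading.Lock()
        self._chunks = []

    def append(self, chunk):
        with self._lock:
            self._chunks.append(chunk)

    def take(self):
        """Return the buffered chunks and clear them atomically, or None if empty."""
        with self._lock:
            if not self._chunks:
                return None
            chunks, self._chunks = self._chunks, []
            return chunks
```

whoever calls `take` first gets the audio, everyone else gets `None` and skips transcribing.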

still needs some tweaking though. the silence detection isn’t perfect and I’m creating a lot of threads right now, which probably isn’t the best approach long term
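
a possible fix for the thread problem, sketched under assumptions: one long-lived worker thread that pulls audio from a `queue.Queue`, instead of a new thread per sentence. `start_worker`, `transcribe` and `on_text` are placeholder names, and `transcribe` stands in for the Faster-Whisper call.

```python
import queue
import threading


def start_worker(transcribe, on_text):
    """Start one background thread that transcribes queued audio in order."""
    jobs = queue.Queue()

    def run():
        while True:
            audio = jobs.get()
            try:
                if audio is None:  # None is the shutdown signal
                    return
                on_text(transcribe(audio))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return jobs, thread
```

the microphone side would just do `jobs.put(audio)` when a sentence ends and `jobs.put(None)` on exit. a side effect is that transcriptions come out in the order they were spoken, since there is only one worker.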

start of the code is in the photo, i’m at like 102 lines