speech to text
- 1 Devlogs
- 1 Total hours
speech to text program for subtitling using faster-whisper and a custom-built pipeline
Devlog 01
started my speech-to-text project today
got microphone input working with SoundDevice and set it up so it constantly listens in small chunks. it uses RMS volume to figure out whether I’m speaking or not, and then stores only the audio that actually contains speech
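for reference, the RMS check could look roughly like this. it's a minimal sketch, not the project's actual code: the names `rms`, `is_speech` and `SPEECH_THRESHOLD` and the 0.02 threshold are assumptions for illustration, and a plain list of floats stands in for the audio chunk the input stream would deliver.

```python
import math

# Assumed threshold for float samples in [-1.0, 1.0]; needs tuning per microphone.
SPEECH_THRESHOLD = 0.02


def rms(samples):
    """Return the root-mean-square level of one chunk of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples, threshold=SPEECH_THRESHOLD):
    """Treat a chunk as speech when its RMS level reaches the threshold."""
    return rms(samples) >= threshold
```

a fixed threshold is the simplest option. it will misfire in a noisy room, so the value probably has to be tuned by hand for each setup.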
i also added silence detection so when I stop talking for about half a second, it assumes I’ve finished a sentence and sends the audio off to Faster-Whisper for transcription. transcription runs in a separate thread so the microphone can keep listening while Whisper does its thing
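the "wait about half a second of silence, then send it off" logic can be sketched as a small state machine. this is an illustration under assumptions, not the real code: the class name `UtteranceSegmenter` is made up, and the 0.1 second chunk length is a guess (only the half-second pause comes from the description above).

```python
class UtteranceSegmenter:
    """Collect speech chunks and release them after a long enough pause."""

    def __init__(self, chunk_seconds=0.1, silence_seconds=0.5):
        # Number of consecutive silent chunks that counts as "finished talking".
        self.silence_limit = max(1, round(silence_seconds / chunk_seconds))
        self.buffer = []
        self.silent_chunks = 0

    def feed(self, chunk, speaking):
        """Add one chunk; return the finished utterance (a list of chunks) or None."""
        if speaking:
            self.buffer.append(chunk)  # only chunks with speech are stored
            self.silent_chunks = 0
            return None
        if not self.buffer:
            return None  # silence before any speech: nothing to flush
        self.silent_chunks += 1
        if self.silent_chunks < self.silence_limit:
            return None  # short pause, the speaker may continue
        utterance, self.buffer = self.buffer, []
        self.silent_chunks = 0
        return utterance
```

each chunk from the microphone goes through `feed` together with the result of the volume check. whenever it returns a list, that list is the audio to hand to the transcription thread.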
had to mess around with locks and a few state variables to stop duplicate transcriptions from happening, but it’s working pretty well now. right now it can listen, detect when I’m speaking, wait for me to finish, and then print the transcribed text automatically.
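one way a lock plus a state variable can stop the same audio from being transcribed twice is sketched below. it's a guess at the general shape, not the actual implementation: `TranscriptionGate` and its `_busy` flag are hypothetical names, and `transcribe` is any callable (in the real program it would wrap the Faster-Whisper call).

```python
import threading


class TranscriptionGate:
    """Allow only one transcription at a time, guarded by a lock and a busy flag."""

    def __init__(self, transcribe):
        self._transcribe = transcribe  # callable taking the audio to transcribe
        self._lock = threading.Lock()
        self._busy = False

    def submit(self, audio):
        """Start a background transcription; return None if one is already running."""
        with self._lock:
            if self._busy:
                return None
            self._busy = True
        thread = threading.Thread(target=self._run, args=(audio,), daemon=True)
        thread.start()
        return thread

    def _run(self, audio):
        try:
            self._transcribe(audio)
        finally:
            # Always clear the flag, even if transcription raised an error.
            with self._lock:
                self._busy = False
```

the trade-off here is that audio submitted while a transcription is still running gets dropped instead of queued, so it only fits if losing an occasional sentence is acceptable.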
still needs some tweaking though. the silence detection isn’t perfect and I’m creating a lot of threads right now, which probably isn’t the best approach long term
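a common alternative to spawning a thread per sentence is one long-lived worker thread fed by a queue. the sketch below shows that pattern using only the standard library. `start_worker`, `transcribe` and `on_text` are hypothetical names, and this is not something the project does yet.

```python
import queue
import threading


def start_worker(transcribe, on_text):
    """Start one worker thread that transcribes queued audio in order."""
    jobs = queue.Queue()

    def loop():
        while True:
            audio = jobs.get()
            try:
                if audio is None:  # sentinel value: stop the worker
                    break
                on_text(transcribe(audio))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return jobs, thread
```

with this setup the microphone side only calls `jobs.put(audio)` for each finished sentence and `jobs.put(None)` on shutdown. utterances are transcribed one at a time in the order they were spoken, so there is just one extra thread no matter how much gets said.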
start of the code is in the photo, i’m at like 102 lines