Alright, I’ve started researching the available tools and sketching out the code.
I won’t bore you with every decision, but to keep things modular I’ve settled on a wrapper library called sherpa-onnx, which appears to handle both STT and TTS.
For the first prototype I’ll use Whisper tiny for STT and Piper for TTS.
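
To give a rough idea of what I mean by modular, here is a minimal sketch of the shape I have in mind. None of it is the wrapper library’s real API: `SpeechToText`, `TextToSpeech`, `VoicePipeline`, `FakeSTT` and `SilentTTS` are hypothetical names I made up for illustration, and the two fake engines only stand in for Whisper tiny and Piper so the example runs without any models.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface: anything that turns audio samples into text."""

    def transcribe(self, samples: list[float], sample_rate: int) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical interface: anything that turns text into (samples, sample_rate)."""

    def synthesize(self, text: str) -> tuple[list[float], int]: ...


@dataclass
class VoicePipeline:
    """Glue code that only knows about the two interfaces, not the engines."""

    stt: SpeechToText
    tts: TextToSpeech

    def respond(
        self,
        samples: list[float],
        sample_rate: int,
        reply_fn: Callable[[str], str],
    ) -> tuple[list[float], int]:
        heard = self.stt.transcribe(samples, sample_rate)  # audio -> text
        reply = reply_fn(heard)                            # text -> reply text
        return self.tts.synthesize(reply)                  # reply text -> audio


@dataclass
class FakeSTT:
    """Stand-in for a real STT engine; always 'hears' the same text."""

    canned_text: str

    def transcribe(self, samples: list[float], sample_rate: int) -> str:
        return self.canned_text


@dataclass
class SilentTTS:
    """Stand-in for a real TTS engine; emits silence, 100 samples per character."""

    sample_rate: int = 16000

    def synthesize(self, text: str) -> tuple[list[float], int]:
        return [0.0] * (100 * len(text)), self.sample_rate


if __name__ == "__main__":
    pipeline = VoicePipeline(stt=FakeSTT("hello"), tts=SilentTTS())
    audio, rate = pipeline.respond([0.0] * 160, 16000, str.upper)
    print(len(audio), rate)
```

The point of the split is that swapping Whisper tiny or Piper for something else later should only mean writing a new class with the same method, while the pipeline code stays untouched.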