You are browsing as a guest. Sign up (or log in) to start making projects!

chuku-N0

@chuku-N0

Joined June 1st, 2026

  • 1Devlogs
  • 2Projects
  • 0Ships
  • 0Votes
Open comments for this post

21m 29s logged

Spent some quality time building out a completely local, voice-activated desktop assistant. I was not able to log all of my time which is around 2 weeks of work because stardust didn’t start. The goal was speed, screen awareness, and an interface that doesn’t just sit there but actually feels alive.

What’s working:

Dual-Model Brains: Hooked up Ollama to dynamically switch models depending on the task. It defaults to a fast text model (mistral) for snappy voice conversations but instantly pivots to a vision model (llama3.2-vision) when triggered to inspect the screen.

Eyes on the Desktop: Integrated real-time screenshot capturing using PyAutoGUI. If I ask “Jarvis, look at this,” it grabs the frame, pipes it to the vision model, and tells me what I’m looking at completely offline.

Reactive Audio GUI: Built a CustomTkinter dark-themed interface featuring a central arc engine. It doesn’t just loop an animation; it parses live microphone input streams via sounddevice to pulse and flash dynamically based on my voice volume.

Contextual YouTube & Notes Engine: Taught it to search YouTube, read the top three results via voice synthesis, and wait for contextual follow-ups (“open the second one”). It can also handle local Python file analysis and dynamic note-taking routines through a custom chunk-reading mechanism.

Barge-In Interrupts: Implemented background thread monitoring so I can cut the AI off mid-speech or mid-thought with a quick phrase if it gets too chatty.

N.B: if you have any improvements in mind I could add plz feel free

Spent some quality time building out a completely local, voice-activated desktop assistant. I was not able to log all of my time which is around 2 weeks of work because stardust didn’t start. The goal was speed, screen awareness, and an interface that doesn’t just sit there but actually feels alive.

What’s working:

Dual-Model Brains: Hooked up Ollama to dynamically switch models depending on the task. It defaults to a fast text model (mistral) for snappy voice conversations but instantly pivots to a vision model (llama3.2-vision) when triggered to inspect the screen.

Eyes on the Desktop: Integrated real-time screenshot capturing using PyAutoGUI. If I ask “Jarvis, look at this,” it grabs the frame, pipes it to the vision model, and tells me what I’m looking at completely offline.

Reactive Audio GUI: Built a CustomTkinter dark-themed interface featuring a central arc engine. It doesn’t just loop an animation; it parses live microphone input streams via sounddevice to pulse and flash dynamically based on my voice volume.

Contextual YouTube & Notes Engine: Taught it to search YouTube, read the top three results via voice synthesis, and wait for contextual follow-ups (“open the second one”). It can also handle local Python file analysis and dynamic note-taking routines through a custom chunk-reading mechanism.

Barge-In Interrupts: Implemented background thread monitoring so I can cut the AI off mid-speech or mid-thought with a quick phrase if it gets too chatty.

N.B: if you have any improvements in mind I could add plz feel free

Replying to @chuku-N0

0
2

Followers

Loading…