JJ

  • 1 Devlogs
  • 0 Total hours

It's an AI that acts like Jarvis: it can open websites, search for things on Google, and even write, edit, and delete notes.

21m 29s logged

Spent some quality time building out a completely local, voice-activated desktop assistant. I wasn't able to log all of my time, which comes to around 2 weeks of work, because stardust didn't start. The goals were speed, screen awareness, and an interface that doesn't just sit there but actually feels alive.

What’s working:

Dual-Model Brains: Hooked up Ollama to dynamically switch models depending on the task. It defaults to a fast text model (mistral) for snappy voice conversations but instantly pivots to a vision model (llama3.2-vision) when triggered to inspect the screen.

Eyes on the Desktop: Integrated on-demand screenshot capture using PyAutoGUI. If I ask “Jarvis, look at this,” it grabs the frame, pipes it to the vision model, and tells me what I’m looking at, completely offline.
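One way the capture step could be sketched, assuming the frame is sent to the model as a base64-encoded PNG. The helper names `encode_png` and `grab_screen_b64` are hypothetical; `pyautogui.screenshot()` returns a Pillow image, which is what `encode_png` expects.

```python
import base64
import io


def encode_png(image) -> str:
    """Serialize a Pillow-style image (anything with .save) to base64 PNG text."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def grab_screen_b64() -> str:
    """Capture the full screen and return it as a base64 PNG string."""
    import pyautogui  # imported lazily so encode_png works without a display

    return encode_png(pyautogui.screenshot())
```

Encoding in memory avoids writing a temporary file for every "look at this" request.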

Reactive Audio GUI: Built a CustomTkinter dark-themed interface featuring a central arc engine. It doesn’t just loop an animation; it reads the live microphone input stream via sounddevice and pulses and flashes based on my voice volume.
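A minimal sketch of the volume-to-pulse idea, assuming the animation is driven by the RMS level of each audio block. The `PulseLevel` class and its `attack`, `decay`, and `gain` parameters are illustrative assumptions, not the project's code. With sounddevice, `update` would be called from the `InputStream` callback with one channel of the incoming block.

```python
import math


def rms(samples) -> float:
    """Root-mean-square amplitude of one block of float samples."""
    samples = list(samples)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class PulseLevel:
    """Smoothed 0..1 level: rises quickly on speech, falls slowly in silence."""

    def __init__(self, attack: float = 0.6, decay: float = 0.15, gain: float = 8.0):
        self.attack = attack
        self.decay = decay
        self.gain = gain
        self.level = 0.0

    def update(self, samples) -> float:
        target = min(1.0, rms(samples) * self.gain)
        rate = self.attack if target > self.level else self.decay
        self.level += (target - self.level) * rate
        return self.level
```

The GUI side can then poll `level` on a timer and scale the arc's radius or brightness by it, which keeps drawing on the main thread and out of the audio callback.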

Contextual YouTube & Notes Engine: Taught it to search YouTube, read the top three results via voice synthesis, and wait for contextual follow-ups (“open the second one”). It can also handle local Python file analysis and dynamic note-taking routines through a custom chunk-reading mechanism.

Barge-In Interrupts: Implemented background thread monitoring so I can cut the AI off mid-speech or mid-thought with a quick phrase if it gets too chatty.
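A simplified sketch of the barge-in pattern, assuming speech is produced sentence by sentence and a `threading.Event` signals the interrupt. The function names and stop words are illustrative; `speak` stands in for whatever text-to-speech call the assistant uses.

```python
import threading


def speak_interruptible(sentences, speak, stop_event: threading.Event) -> int:
    """Speak sentences in order, stopping as soon as stop_event is set.

    Returns the number of sentences actually spoken.
    """
    spoken = 0
    for sentence in sentences:
        if stop_event.is_set():
            break
        speak(sentence)
        spoken += 1
    return spoken


def watch_for_stop(heard_phrases, stop_event, stop_words=("stop", "quiet")):
    """Set stop_event when a recognized phrase contains a stop word.

    heard_phrases is any iterable of transcribed phrases, for example a
    generator fed by the speech recognizer in a background thread.
    """
    for phrase in heard_phrases:
        if any(word in phrase.lower().split() for word in stop_words):
            stop_event.set()
            return
```

In a live setup, `watch_for_stop` would run in a daemon thread alongside the speaking loop. This version only interrupts between sentences; cutting off mid-sentence depends on the speech engine offering its own stop call.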

N.B.: If you have any improvements in mind that I could add, please feel free to suggest them.
