NellowTCS

@NellowTCS

Joined May 31st, 2026

  • 26 Devlogs
  • 4 Projects
  • 1 Ships
  • 8 Votes

1h 14m 10s logged

making it sound less terrible (two commits, one mission)

these two commits are about one thing: the output was robotic and buzzy and I was tired of it. every change here is about making the synthesizer sound more like a voice and less like a modem.


the renderer got smarter

cross-note co-articulation. renderNote() now returns { chunk, finalFormants }. the stream passes the previous note’s final formants into the next note’s renderer, and the first phoneme interpolates from those formants instead of jumping cold. gaps between notes reset the chain. notes that follow each other seamlessly now blend their formants across the boundary.
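a rough sketch of the chaining idea, not the actual UTAU.js code. the types, `renderNoteSketch`, and `renderChain` are made-up stand-ins, and the "gap" test (next note starts after the previous one ends) is an assumption about how gaps get detected:

```typescript
// Hypothetical shapes; the real UTAU.js types may differ.
type Formants = [number, number, number]; // F1, F2, F3 in Hz

interface NoteLike {
  startTick: number;
  endTick: number;
  targetFormants: Formants;
}

interface RenderResult {
  chunk: Float32Array;
  finalFormants: Formants;
  startFormants: Formants; // exposed only so the sketch can be inspected
}

// Stand-in for renderNote(): it only reports where the first phoneme would start from.
function renderNoteSketch(note: NoteLike, prev: Formants | null): RenderResult {
  const startFormants = prev ?? note.targetFormants; // cold start when there is no previous note
  return { chunk: new Float32Array(0), finalFormants: note.targetFormants, startFormants };
}

function renderChain(notes: NoteLike[]): Formants[] {
  const starts: Formants[] = [];
  let prev: Formants | null = null;
  let prevEnd: number | null = null;
  for (const note of notes) {
    // A gap between notes resets the chain (assumed gap test).
    if (prevEnd !== null && note.startTick > prevEnd) prev = null;
    const result = renderNoteSketch(note, prev);
    starts.push(result.startFormants);
    prev = result.finalFormants;
    prevEnd = note.endTick;
  }
  return starts;
}
```

back-to-back notes start from the previous note's formants, while a note after a rest starts from its own target.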

per-phoneme envelopes. the old global 5ms attack / 10ms release is gone. replaced with getPhonemeEnvelopeSamples() which gives each phoneme type its own envelope: plosives get 2ms attack / 15ms decay (sharp burst), consonants and vowels get 5ms / 3ms. every phoneme segment fades independently.
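roughly what a per-type envelope lookup could look like. this is a sketch under assumptions: `envelopeSamplesFor` and `applyEnvelope` are hypothetical names (the real function is getPhonemeEnvelopeSamples(), whose signature isn't shown here), the fades are linear, and the plosive "decay" is treated as the release time:

```typescript
type PhonemeKind = "plosive" | "consonant" | "vowel";

interface EnvelopeSamples {
  attack: number;
  release: number;
}

// Millisecond values come from the devlog; everything else is illustrative.
function envelopeSamplesFor(kind: PhonemeKind, sampleRate: number): EnvelopeSamples {
  const ms = kind === "plosive" ? { attack: 2, release: 15 } : { attack: 5, release: 3 };
  return {
    attack: Math.round((ms.attack * sampleRate) / 1000),
    release: Math.round((ms.release * sampleRate) / 1000),
  };
}

// Fades one phoneme segment in and out independently of its neighbours.
function applyEnvelope(segment: Float32Array, env: EnvelopeSamples): Float32Array {
  const out = new Float32Array(segment.length);
  // Keep the two fades from overlapping on very short segments.
  const attack = Math.min(env.attack, Math.floor(segment.length / 2));
  const release = Math.min(env.release, Math.floor(segment.length / 2));
  for (let i = 0; i < segment.length; i++) {
    let gain = 1;
    if (i < attack) gain = i / attack;
    const fromEnd = segment.length - 1 - i;
    if (fromEnd < release) gain = Math.min(gain, fromEnd / release);
    out[i] = segment[i] * gain;
  }
  return out;
}
```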

diphthong formant sweeping. PhonemeDef got an endFormants field. if a diphthong has both formants and endFormants, the renderer sweeps between them over the phoneme duration. AY now actually glides from /aa/ to /ih/. EY glides from /eh/ to /ih/. OW glides from /oh/ to /uh/. they sound like diphthongs now instead of static vowels.
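the sweep itself is just interpolation over the phoneme's progress. a minimal sketch, assuming a linear sweep (the real curve isn't stated) and a made-up `PhonemeDefSketch` type; the formant numbers in the usage note are placeholders, not the project's tables:

```typescript
type Formants = [number, number, number]; // F1, F2, F3 in Hz

interface PhonemeDefSketch {
  formants: Formants;
  endFormants?: Formants; // only diphthongs carry this
}

// progress runs from 0 (phoneme start) to 1 (phoneme end).
function formantsAt(def: PhonemeDefSketch, progress: number): Formants {
  const end = def.endFormants;
  if (!end) return def.formants; // static vowel: nothing to sweep
  const t = Math.min(1, Math.max(0, progress));
  return [
    def.formants[0] + (end[0] - def.formants[0]) * t,
    def.formants[1] + (end[1] - def.formants[1]) * t,
    def.formants[2] + (end[2] - def.formants[2]) * t,
  ];
}
```

with `{ formants: [700, 1200, 2600], endFormants: [400, 2000, 2600] }`, halfway through the phoneme F1 sits at 550 Hz and F2 at 1600 Hz.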

vibratoOverride. was defined on Note but never read. now it is. per-note vibrato control works.


everything got retuned

glottal source. added shimmer (per-cycle amplitude variation driven by jitter, so the volume wobbles slightly like a real voice). aspiration noise is now high-pass filtered (subtract a lowpass from the raw noise) so it’s airy instead of muddy. aspiration gain bumped from 0.1 to 0.15.
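the "subtract a lowpass from the raw noise" trick in isolation. a sketch, assuming a one-pole lowpass; `highPassBySubtraction` and the `alpha` smoothing coefficient are names picked for illustration, and the real filter may be tuned differently:

```typescript
// alpha is the one-pole smoothing coefficient (0..1); larger means a higher cutoff.
function highPassBySubtraction(input: Float32Array, alpha: number): Float32Array {
  const out = new Float32Array(input.length);
  let low = 0; // running lowpass state
  for (let i = 0; i < input.length; i++) {
    low += alpha * (input[i] - low); // one-pole lowpass of the input
    out[i] = input[i] - low; // what is left over is the high-frequency part
  }
  return out;
}
```

a constant (DC) input decays to zero at the output, while a signal that flips sign every sample passes almost untouched.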

plosive bursts. noise envelope for plosives changed from symmetric fade to a fast 12ms exponential decay. “pa” now sounds like a burst instead of a pop.
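for reference, an exponential burst envelope can be this small. sketch only: `burstEnvelope` is a hypothetical name, and reading "12ms" as the decay time constant (rather than, say, the time to reach silence) is an assumption:

```typescript
// Gain starts at 1 and falls by a factor of e every decayMs milliseconds.
function burstEnvelope(numSamples: number, sampleRate: number, decayMs = 12): Float32Array {
  const tau = (decayMs / 1000) * sampleRate; // time constant in samples
  const env = new Float32Array(numSamples);
  for (let i = 0; i < numSamples; i++) {
    env[i] = Math.exp(-i / tau);
  }
  return env;
}
```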

formant data for everything. Z, ZH, V, DH, Y, W, HH, JH in English all got formant targets. same for z, h, y, w, j in Japanese. consonants that were previously just noise bursts now resonate through the vocal tract. the difference is huge.

vowel bandwidths tightened. defaults went from 80/100/120 to 70/90/130 Hz. narrower bandwidths = sharper resonant peaks = more vowel-like quality.

voice presets retuned. male voice: lower open quotient (0.4), lower speed quotient (0.65), higher tenseness (0.65), less aspiration (0.05). sounds less breathy, more chest voice. female: formant scale 1.18. gender slider in scaleVoice now affects speed quotient and has a gender-dependent tenseness base.

pitch accent. Japanese got resolveAccents() implementing heiban pattern (low first mora, high rest). the stream groups consecutive notes into phrases, calls resolveAccents, and applies the offsets as constant pitch shifts. it’s basic but it makes Japanese phrases have some melodic contour beyond what the score provides.
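a sketch of the heiban shape as per-mora offsets. this is not resolveAccents() itself: `heibanOffsets`, `applyOffset`, the -2 semitone drop, and leaving single-mora phrases flat are all assumptions made for the example:

```typescript
// One offset (in semitones) per mora: first mora low, the rest at the written pitch.
function heibanOffsets(moraCount: number, lowOffsetSemitones = -2): number[] {
  return Array.from({ length: moraCount }, (_, i) =>
    i === 0 && moraCount > 1 ? lowOffsetSemitones : 0,
  );
}

// A constant pitch shift applied to a note's base frequency.
function applyOffset(baseF0: number, semitones: number): number {
  return baseF0 * Math.pow(2, semitones / 12);
}
```

a four-mora phrase gets the offsets -2, 0, 0, 0, so only the first note is pulled down.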


four TODO items checked off in one go: co-articulation, phoneme envelopes, diphthong sweeping, vibratoOverride. pitch accent too.

it still doesn’t sound human. but it’s starting to sound like it’s trying. and that’s a big step from where it was.


if you can identify the song in the editor image, good job, you’re cool :D


1h 4m 59s logged

CI arc (three commits, one story)

three commits that are really one story: getting CI from “permanently red” to green.


the setup

“wrote” (aka copied and modified) six GitHub Actions workflows in one go:

ci.yml: lint + test + build on push/PR. matrix tests Node 20 and 22. builds the library first, then typechecks the demo. (docs build is commented out because docs don’t exist yet. they will. eventually.)

test.yml: dedicated test runner. same Node 20/22 matrix. runs jest in the Build workspace.

release-npm.yml: publishes to npm on GitHub release. strips private/scripts/devDependencies from package.json, copies README and LICENSE into Build/, publishes with --provenance. has a workflow_dispatch with a dry-run option so I can test without actually publishing.

security-audit.yml: runs npm audit --audit-level=high on all three workspaces (root, Build, Demo). daily cron plus on push when package files change.

static.yml: reworked the GitHub Pages deployment. it was pointed at Build/dist (wrong, that’s the library output). now it builds the library, builds the demo, and deploys Demo/dist as the pages root. docs will go in pages-root/docs/ when they exist.

single-file.yml: was still referencing Web-Template (the old template name). fixed to point at Demo, output file is now UTAUjsEditor.html.

also integrated Updato (my own auto-updater library) into the demo. on load it checks the current build hash against the latest commit on main and shows an update notification if there’s a newer version. the build hash gets injected at build time via __BUILD_HASH__ in the vite config.

cleaned up the TODO: removed all the completed checkboxes (they were cluttering the file), added detail to the remaining items.


the fixes

CI was 0/3 passing. then 1/8 passing. then 2/8. then eventually 8/8. the classic experience.

the single-file and updato workflows needed the library built before the demo (workspace dependency). added npm ci at root level and a “Build Library” step before the demo build. also added vite-plugin-singlefile and cross-env for the build:single script.

second fix commit added ts-node and unrun as dev deps because the ESM config loading was unhappy without them.

three commits to go from red to green. could be worse honestly but whatever


37m 42s logged

the prettier commit (and a tiny bugfix)

two commits. one has 19 lines of actual code. the other touched every single file in the project.


the bugfix

the piano roll’s resize handle wasn’t working for already-selected notes. you could resize on first click, but if you clicked a note to select it and THEN tried to drag the right edge, it would move the note instead of resizing. added a resize zone check that fires before the drag-to-move handler when a note is already selected. 19 lines.
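the ordering fix boils down to testing the resize zone before falling through to "move". a hypothetical sketch: `NoteRect`, `hitTest`, and the 6px zone width are invented for illustration, not taken from the editor code:

```typescript
interface NoteRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

type PointerAction = "resize" | "move" | "none";

const RESIZE_ZONE_PX = 6; // assumed width of the grab zone at the right edge

function hitTest(note: NoteRect, px: number, py: number): PointerAction {
  const inside =
    px >= note.x && px <= note.x + note.width && py >= note.y && py <= note.y + note.height;
  if (!inside) return "none";
  // Check the resize zone first so the move handler never swallows an edge drag.
  if (px >= note.x + note.width - RESIZE_ZONE_PX) return "resize";
  return "move";
}
```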


the formatting pass

ran prettier on the entire codebase. every file. the diff is enormous and the actual logic changes are: zero.

added .prettierrc (semicolons, double quotes, trailing commas, 140 char width, svelte plugin), .prettierignore, and eslint.config.ts. bumped eslint to 10.5 and typescript-eslint to 8.61. added jiti for ESM config loading.

the one real improvement buried in here: replaced the Function type in ufdata.ts with a proper ParseFn type alias. eslint was right to yell at me for using bare Function. everything else is semicolons and line breaks.
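why the alias matters, in miniature. the actual ParseFn signature in ufdata.ts isn't shown in this post, so the one below (bytes in, typed result out) is purely an assumption, as are `parsers` and `runParser`:

```typescript
// A bare `Function` accepts any arguments and returns `any`; an alias pins both down.
type ParseFn<T = unknown> = (bytes: Uint8Array) => T;

// Hypothetical registry keyed by a format name.
const parsers: Record<string, ParseFn<number>> = {
  byteLength: (bytes) => bytes.length,
};

function runParser(name: string, bytes: Uint8Array): number {
  const parse = parsers[name];
  if (!parse) throw new Error(`no parser registered for ${name}`);
  return parse(bytes); // the compiler now checks the argument and the return type
}
```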

the codebase has a consistent style now. that’s the whole commit. sometimes you just gotta.


29m 54s logged

open any vocal synth file ever made

so you know how the TODO said “MIDI file import” as one little checkbox? I may have slightly exceeded scope on that one.

UTAU.js can now import UST, USTX, VPR, VSQX, VSQ, SVP, MIDI, MusicXML, PPSF, S5P, TSSLN, CCS, DV, and UFData files. that’s UTAU, OpenUTAU, Vocaloid, Synthesizer V, Piapro Studio, CeVIO, DeepVocal, and standard MIDI. basically every vocal synth format that exists.

this is thanks to utaformatix-ts by sevenc-nanashi, which is a universal parser for vocal synth project files. it converts everything into a common UfData format. I wrote a 145-line adapter (ufdata.ts) that converts UfData into UTAU.js Scores. lazy-loaded so the parser only gets pulled in when you actually import a file.


pitch bends

this was the hard part. vocal synth files have pitch curves: track-level arrays of tick-value pairs that describe how the pitch deviates from the written note. the importer splits the track-level curve into per-note pitch bends, handling absolute-to-relative conversion, null value filtering, start/end padding, and extrapolation from preceding points.
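a simplified sketch of the per-note split, not the importer in ufdata.ts. all names here are hypothetical, and where the real code extrapolates from preceding points this version just holds the last preceding value:

```typescript
interface CurvePoint { tick: number; value: number | null; }
interface NoteSpan { tickOn: number; tickOff: number; key: number; } // key = MIDI note number
interface BendPoint { tick: number; semitones: number; } // tick is relative to the note start

function splitPitchCurve(curve: CurvePoint[], note: NoteSpan, isAbsolute: boolean): BendPoint[] {
  // Absolute curves store the sounding pitch; relative ones already store the deviation.
  const toRelative = (value: number) => (isAbsolute ? value - note.key : value);
  // Null filtering: drop points that carry no pitch value.
  const valid = curve.filter((p): p is { tick: number; value: number } => p.value !== null);
  const bends: BendPoint[] = valid
    .filter((p) => p.tick >= note.tickOn && p.tick <= note.tickOff)
    .map((p) => ({ tick: p.tick - note.tickOn, semitones: toRelative(p.value) }));

  // Start padding: if no point sits on the note start, hold the last earlier value (or 0).
  if (bends.length === 0 || bends[0].tick > 0) {
    const before = valid.filter((p) => p.tick < note.tickOn).pop();
    bends.unshift({ tick: 0, semitones: before ? toRelative(before.value) : 0 });
  }
  // End padding: extend the last value to the end of the note.
  const last = bends[bends.length - 1];
  const length = note.tickOff - note.tickOn;
  if (last.tick < length) bends.push({ tick: length, semitones: last.semitones });
  return bends;
}
```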

the renderer now reads note.pitchBend and interpolates it per-sample alongside vibrato. the math: f0 = baseF0 * 2^((bendSemitones * 100 + vibratoCents) / 1200). pitch bends and vibrato stack correctly in cents space.
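the formula above as a function, to show why adding in cents works. `instantaneousF0` is a name chosen here; only the expression comes from the devlog:

```typescript
// Bend is in semitones, vibrato in cents; both are summed in cents before exponentiation.
function instantaneousF0(baseF0: number, bendSemitones: number, vibratoCents: number): number {
  const totalCents = bendSemitones * 100 + vibratoCents;
  return baseF0 * Math.pow(2, totalCents / 1200);
}
```

a +1 semitone bend and a -100 cent vibrato swing cancel exactly, and 1200 cents from either source is one octave.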


piano roll overhaul

the piano roll was hardcoded to 3 octaves (ik bad, but will fix soon) and a fixed width. now it covers the full MIDI range (0-127) with virtual scrolling. wheel scrolls vertically, shift+wheel or trackpad scrolls horizontally. viewport culling so only visible notes and key labels get drawn. auto-scrolls to center on notes when you import a file.
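viewport culling is basically an overlap test before drawing. a sketch with invented types (`RollNote`, `Viewport`) and an assumed tick/key coordinate system; the real piano roll may cull in pixel space instead:

```typescript
interface RollNote { startTick: number; durationTicks: number; key: number; } // key: MIDI 0-127
interface Viewport { tickStart: number; tickEnd: number; keyLow: number; keyHigh: number; }

// Returns only the notes that intersect the visible window, so the draw loop skips the rest.
function visibleNotes(notes: RollNote[], view: Viewport): RollNote[] {
  return notes.filter((note) => {
    const noteEnd = note.startTick + note.durationTicks;
    const overlapsTime = noteEnd > view.tickStart && note.startTick < view.tickEnd;
    const overlapsPitch = note.key >= view.keyLow && note.key <= view.keyHigh;
    return overlapsTime && overlapsPitch;
  });
}
```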

also: pitch curves render as yellow lines overlaid on note blocks. you can see the imported pitch data right there on the piano roll.


other stuff

added await setTimeout(0) in the streaming loop so the UI doesn’t freeze during long scores. the demo has an “Open” button that accepts all 14 supported file extensions. TODO got updated with a lot of checkboxes ticked.
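a note on the yield: in a browser, setTimeout returns a timer id, so awaiting it directly doesn't wait for the timer. the usual pattern wraps it in a Promise. whether the commit does exactly this isn't stated, so treat this as a sketch; `yieldToEventLoop` and `streamWithYields` are hypothetical names:

```typescript
// Resolves on a later macrotask, giving the UI a chance to paint and handle input.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

// Stand-in for a streaming loop: hand out one item, then let the event loop breathe.
async function* streamWithYields<T>(items: T[]): AsyncGenerator<T> {
  for (const item of items) {
    yield item;
    await yieldToEventLoop();
  }
}
```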

also: import tests covering note mapping, tempo conversion, pitch splitting (absolute, relative, null filtering, extrapolation, out-of-range skipping, pitch:false opt-out), and edge cases.


46m 35s logged

uhhh tests (that was fast huh)

the commit message says it all. 961 lines added. 10 test files. plus a WAV encoder, a bunch of bug fixes, and the entire player got an upgrade. in one sitting.


tests

jest config, ts-jest, ESM mode. ten test files covering everything that exists:

  • envelope.test.ts: attack/release shape, mixBuffers offset and gain, edge cases
  • filter.test.ts: FormantFilter resonator, FormantCascade
  • noise.test.ts: NoiseSource output
  • oscillator.test.ts: LFGlottalSource waveform shape
  • wav.test.ts: encodeWav RIFF header
  • english.test.ts: lexicon hits and fallback
  • japanese.test.ts: hiragana, romaji, special cases
  • renderer.test.ts: renderNote output shape, sample count, non-zero output
  • stream.test.ts: streamScore chunk boundaries, mixChunks
  • voices/index.test.ts: buildVoice, scaleVoice parameter ranges

writing tests found bugs. writing tests always finds bugs.


the bugs tests found

OW was missing. the English phoneme table had every ARPABET vowel except OW. “HELLO” ends with OW. the demo word was literally broken and I didn’t notice because the fallback produced silence instead of crashing. added it. F1=470 F2=1000 F3=2400.

anti-resonator formula was wrong. the pole radius was hardcoded to 0.99 with a random 0.95 frequency offset. now it derives the pole from bw * 1.5 like a real anti-resonator should. nasals sound less terrible.

jitter was per-sample. it was recalculating a random f0 every single sample, which made the pitch wobble chaotically instead of naturally. moved it to recalculate once per glottal cycle. much more realistic.
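per-cycle jitter in sketch form: a new f0 is drawn only when the phase wraps. `renderJitteredPhase` and its return shape are made up for this example, and the uniform jitter distribution is an assumption:

```typescript
interface PhaseResult {
  phase: Float32Array; // glottal phase in [0, 1) per sample
  draws: number; // how many times a jittered f0 was picked
}

function renderJitteredPhase(
  numSamples: number,
  f0: number,
  sampleRate: number,
  jitter: number, // relative deviation, e.g. 0.01 for +/-1%
  random: () => number = Math.random,
): PhaseResult {
  const pickCycleF0 = () => f0 * (1 + jitter * (2 * random() - 1));
  const phase = new Float32Array(numSamples);
  let cycleF0 = pickCycleF0();
  let draws = 1;
  let current = 0;
  for (let i = 0; i < numSamples; i++) {
    phase[i] = current;
    current += cycleF0 / sampleRate;
    if (current >= 1) {
      current -= 1;
      cycleF0 = pickCycleF0(); // new pitch only at a cycle boundary
      draws++;
    }
  }
  return { phase, draws };
}
```

at 100 Hz and 48 kHz, a tenth of a second of audio draws a new f0 about ten times instead of 4800 times.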

noise fadeout could go negative. phDur - segPos - 1 can be negative at the boundary. clamped to 0.

gain normalization could divide by zero. clamped overallPeak to 1e-6 and gain to max 100.
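the clamp in isolation. the 1e-6 floor and the max gain of 100 are from the devlog; `normalizationGain` and the 0.9 target level are assumptions for the sketch:

```typescript
// Returns the gain that brings `peak` to `target`, without blowing up on silence.
function normalizationGain(peak: number, target = 0.9): number {
  const safePeak = Math.max(peak, 1e-6); // avoids dividing by zero on silent output
  return Math.min(target / safePeak, 100); // caps the boost for near-silent output
}
```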

buildVoice spread order was wrong. ...overrides was before the sub-objects, so the glottal/formant/vibrato defaults always overwrote user values. flipped the order.
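one plausible reading of the spread-order bug, shown side by side. the types and both functions are illustrative; the real buildVoice has more sub-objects and its fix may be shaped differently:

```typescript
interface Glottal { openQuotient: number; aspiration: number; }
interface VoiceSketch { name: string; formantScale: number; glottal: Glottal; }
type VoiceOverrides = Partial<Omit<VoiceSketch, "glottal">> & { glottal?: Partial<Glottal> };

const DEFAULT_VOICE: VoiceSketch = {
  name: "default",
  formantScale: 1.0,
  glottal: { openQuotient: 0.5, aspiration: 0.1 },
};

// Buggy order: the sub-object default comes after the overrides and wins.
function buildVoiceBuggy(overrides: VoiceOverrides): VoiceSketch {
  return { ...DEFAULT_VOICE, ...overrides, glottal: { ...DEFAULT_VOICE.glottal } };
}

// Fixed order: defaults first, user values last, merged per sub-object.
function buildVoiceFixed(overrides: VoiceOverrides): VoiceSketch {
  return {
    ...DEFAULT_VOICE,
    ...overrides,
    glottal: { ...DEFAULT_VOICE.glottal, ...overrides.glottal },
  };
}
```

in a spread, whichever source comes last wins, so defaults have to go first.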


WAV encoder

(i’ve written manual WAV encoders before, I just copy-pasted that, it’s not that bad tbh)

62 lines. encodeWav() takes AudioChunks and writes a proper RIFF/WAVE file. 16-bit PCM, little-endian, handles mono and stereo. float-to-int16 conversion with clamping. exported from the barrel file.
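for anyone who hasn't written one: a minimal 16-bit PCM WAV writer looks roughly like this. it is a sketch, not the project's encodeWav(): it takes plain per-channel Float32Arrays instead of AudioChunks, and the name `encodeWavSketch` is made up:

```typescript
function encodeWavSketch(channels: Float32Array[], sampleRate: number): Uint8Array {
  const numChannels = channels.length;
  const numFrames = numChannels > 0 ? channels[0].length : 0;
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = numFrames * numChannels * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * numChannels * bytesPerSample, true); // byte rate
  view.setUint16(32, numChannels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Interleave channels frame by frame, clamping floats into the int16 range.
  let offset = 44;
  for (let frame = 0; frame < numFrames; frame++) {
    for (let ch = 0; ch < numChannels; ch++) {
      const s = Math.max(-1, Math.min(1, channels[ch][frame]));
      view.setInt16(offset, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
      offset += bytesPerSample;
    }
  }
  return new Uint8Array(view.buffer);
}
```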


player upgrades

setVolume() API. volume parameter on play(). progress events actually fire now (the type existed but was never emitted). stop does a 50ms gain ramp to zero before closing the AudioContext so it doesn’t click. scheduling uses Math.max(ctx.currentTime + 0.01, ...) to prevent scheduling in the past if rendering falls behind.
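the scheduling guard as a pure function, so it can be reasoned about without an AudioContext. `scheduleTime` and the `playbackStart` parameter are hypothetical; only the Math.max(ctx.currentTime + 0.01, ...) shape comes from the devlog:

```typescript
// now stands in for ctx.currentTime; all times are in seconds.
function scheduleTime(
  playbackStart: number,
  startSample: number,
  sampleRate: number,
  now: number,
  safetyMargin = 0.01,
): number {
  const ideal = playbackStart + startSample / sampleRate;
  // If rendering fell behind, `ideal` is already in the past, so nudge it just ahead of now.
  return Math.max(now + safetyMargin, ideal);
}
```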


stream tempo handling

streamScore() was using a single currentTempo for the whole note. now it has tempoAt() for point lookups and noteSampleDuration() that integrates across tempo changes within a note. a note that spans a tempo change gets the right duration now.
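what "integrates across tempo changes" means, as a sketch. the function names match the devlog but the signatures, the `TempoEvent` shape, the 120 bpm fallback, and the `ppq` (ticks per quarter note) parameter are assumptions:

```typescript
interface TempoEvent { tick: number; bpm: number; } // assumed sorted by tick

// Point lookup: the tempo in effect at a given tick.
function tempoAt(tempos: TempoEvent[], tick: number): number {
  let bpm = 120; // assumed fallback when no tempo event precedes the tick
  for (const t of tempos) {
    if (t.tick <= tick) bpm = t.bpm;
  }
  return bpm;
}

// Sums the note's duration segment by segment, one segment per tempo region it crosses.
function noteSampleDuration(
  tempos: TempoEvent[],
  startTick: number,
  endTick: number,
  ppq: number,
  sampleRate: number,
): number {
  let seconds = 0;
  let cursor = startTick;
  while (cursor < endTick) {
    const bpm = tempoAt(tempos, cursor);
    const next = tempos.find((t) => t.tick > cursor);
    const segmentEnd = Math.min(endTick, next ? next.tick : endTick);
    seconds += ((segmentEnd - cursor) * 60) / (bpm * ppq);
    cursor = segmentEnd;
  }
  return Math.round(seconds * sampleRate);
}
```

a two-beat note that starts at 120 bpm and crosses into 60 bpm halfway lasts 0.5 s + 1 s, not 1 s.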


also added a cascade reset between phonemes in the renderer (was carrying filter state across phoneme boundaries causing ringing), added a missing phoneme console.warn so you can actually debug G2P failures, and fixed the Japanese romaji parser to skip spaces instead of treating them as unknown consonants.

the TODO list is getting shorter. slowly but surely.
…and i forgot to update it oops


1h 48m 19s logged

it makes sound now

okay so. I may have blacked out and written an entire synthesizer in one commit. 210 lines of renderer, 65 lines of streaming, a full Svelte demo app with a piano roll, and a dozen DSP fixes. this is flo-era hyperfocus energy except I can HEAR it this time. (well I could hear flo, it’s an audio format, so like, duh, but yk what i mean loll)


the renderer

renderNote() takes a Note + VoiceConfig + LanguageModule and produces actual audio. per-sample processing: vibrato with attack ramp, 30ms smoothstep formant interpolation between phonemes, glottal pulse through the cascade, shaped noise for consonants with per-segment fade in/out, smoothstep attack/release envelope, peak normalization. consonants get their default duration, vowels split the remaining time. if consonants would eat more than 40% of the note they get compressed.
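the duration split with the 40% cap, sketched. `allocateDurations` and `PhonemeSlot` are hypothetical names, and scaling all consonants by the same factor is an assumption about how the compression works:

```typescript
interface PhonemeSlot { isVowel: boolean; defaultMs: number; }

// Consonants keep their default length unless together they exceed the cap;
// vowels share whatever time is left.
function allocateDurations(noteMs: number, phonemes: PhonemeSlot[], maxConsonantShare = 0.4): number[] {
  const consonantMs = phonemes
    .filter((p) => !p.isVowel)
    .reduce((sum, p) => sum + p.defaultMs, 0);
  const vowelCount = phonemes.filter((p) => p.isVowel).length;
  const budget = noteMs * maxConsonantShare;
  const scale = consonantMs > budget ? budget / consonantMs : 1; // compress only when over budget
  const vowelMs = vowelCount > 0 ? (noteMs - consonantMs * scale) / vowelCount : 0;
  return phonemes.map((p) => (p.isVowel ? vowelMs : p.defaultMs * scale));
}
```

on a 100ms note, a 60ms consonant gets squeezed to 40ms and the vowel takes the other 60ms.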

every filter, oscillator, and cascade got refactored from batch to per-sample. slower but I can morph formants sample-by-sample for smooth transitions.


DSP fixes (there were several)

the LF oscillator’s open phase was inverted. added a DC blocking filter because the pulse was making everything drift. added jitter for natural-sounding pitch variation.

the resonator gain formula was wrong (b0 = 1 - r*r should be b0 = 1 - B - C). FormantCascade now runs anti-resonators before resonators (correct order for nasals). added setPassthrough() for unused filter slots.
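for context, the corrected gain is the standard two-pole resonator normalization (as in Klatt-style formant synthesizers), where the input gain 1 - B - C makes the filter pass DC at unity. a sketch with made-up names (`ResonatorSketch`); the project's FormantFilter may be structured differently:

```typescript
class ResonatorSketch {
  private a = 1;
  private b = 0;
  private c = 0;
  private y1 = 0;
  private y2 = 0;

  setFormant(freqHz: number, bandwidthHz: number, sampleRate: number): void {
    const T = 1 / sampleRate;
    this.c = -Math.exp(-2 * Math.PI * bandwidthHz * T);
    this.b = 2 * Math.exp(-Math.PI * bandwidthHz * T) * Math.cos(2 * Math.PI * freqHz * T);
    this.a = 1 - this.b - this.c; // unity gain at DC
  }

  // y[n] = A*x[n] + B*y[n-1] + C*y[n-2]
  process(x: number): number {
    const y = this.a * x + this.b * this.y1 + this.c * this.y2;
    this.y2 = this.y1;
    this.y1 = y;
    return y;
  }
}
```

feeding it a constant input settles at that same constant, which is an easy sanity check for the gain term.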

female formant scale went from 0.88 to 1.15. I had it backwards. scaling down shrinks the tract and sounds childlike. scaling up is what you want.


streaming + demo

streamScore() is an async generator. walks the score note by note, handles tempo changes, yields chunks. player starts playing before the score finishes rendering.

the demo is a full Svelte 5 app. canvas piano roll (click to create, drag to move/resize, delete to remove), voice panel with easy sliders + expandable advanced params, transport bar, language switcher. Japanese demo says “ka na ta shi i ne”. English says “HELLO WORLD THIS IS A TEST”.

press play and it synthesizes through Web Audio in real time. from math.


everything else

Japanese plosives got formant data (were noise-only). added “l” as an r-alias for loanwords. barrel file exports the full public API. wrote a comprehensive TODO.md because the list of things that aren’t done is very long.

it sounds terrible. robotic and buzzy and the consonants are more like clicks. but it’s SOUND. generated from MATH. in a BROWSER. Peterson and Barney would be proud. (or horrified.)


22m 55s logged

voices, a player, and oh look a demo app

three things in one commit because I couldn’t decide which to work on so I did all of them.


voice presets

two built-in voices: Male and Female.

the male voice has a lower open quotient (0.45, vocal folds close faster), lower aspiration (0.08, less breathy), and formant scale of 1.0. the female voice has higher open quotient (0.55), more aspiration (0.12), and formant scale of 0.88 which shifts all the formant frequencies up to simulate a shorter vocal tract.

vibrato differs too. male is 5.5 Hz rate with 30 cents depth. female is 6.0 Hz with 40 cents. these are rough averages from the singing voice literature. real vibrato varies wildly between singers but you have to start somewhere.

buildVoice() lets you construct a custom voice with partial overrides. scaleVoice() is the fun one. it takes a voice config and five intuitive sliders: gender (-1 to +1), breathiness, tension, brightness, vibratoAmount. the gender parameter interpolates formant scale between 0.88 and 1.0. breathiness maps to open quotient and aspiration. tension maps to glottal tenseness. brightness maps to formant bandwidth (narrower bandwidths = brighter, more resonant sound). vibrato amount scales depth.

the idea is you start with a preset and then tweak it with human-readable parameters instead of raw acoustic values. “make this voice breathier” is easier to think about than “increase open quotient to 0.55 and aspiration to 0.12”.

registry pattern same as languages. getVoice("male"), registerVoice("my-voice", config).
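the registry pattern in its smallest form. getVoice and registerVoice are the names from above, but the config shape and the behaviour on an unknown name (throwing) are assumptions:

```typescript
interface VoiceConfigSketch { formantScale: number; }

const voiceRegistry = new Map<string, VoiceConfigSketch>();

function registerVoice(name: string, config: VoiceConfigSketch): void {
  voiceRegistry.set(name, config);
}

function getVoice(name: string): VoiceConfigSketch {
  const voice = voiceRegistry.get(name);
  if (!voice) throw new Error(`unknown voice: ${name}`); // assumed failure mode
  return voice;
}
```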


stream player

this is the thing that actually makes sound come out of your speakers.

StreamPlayer takes an AsyncGenerator<AudioChunk> and schedules the audio through the Web Audio API. each chunk gets turned into an AudioBuffer, wired through a GainNode (fixed at 0.8 for now), and scheduled at its exact start time using ctx.currentTime + chunk.startSample / sampleRate. the generator can yield chunks as fast or as slow as it wants. the player just keeps scheduling them.

async generator as the interface is the key design decision. the synthesizer can stream audio chunk by chunk as it renders each phoneme, and the player starts playing before the whole score is done. no waiting for the full render. just start.

pause suspends the AudioContext. resume resumes it. stop aborts the generator via AbortController and closes the context. event system emits stateChange, progress, done, error. on() returns an unsubscribe function. clean lifecycle.
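the on()-returns-unsubscribe pattern, sketched as a tiny typed emitter. the event names come from the post, but the payload types and the Emitter class itself are assumptions, not the actual StreamPlayer internals.

```typescript
// Assumed payload types for the four events named in the post.
type PlayerEvents = {
  stateChange: "idle" | "playing" | "paused" | "stopped";
  progress: number;
  done: void;
  error: Error;
};

class Emitter<E> {
  // `any` keeps the internal storage simple; on() and emit() stay fully typed.
  private listeners = new Map<keyof E, Set<(payload: any) => void>>();

  // Register a listener and return a function that removes it again.
  on<K extends keyof E>(event: K, fn: (payload: E[K]) => void): () => void {
    const set = this.listeners.get(event) ?? new Set<(payload: any) => void>();
    this.listeners.set(event, set);
    set.add(fn);
    return () => {
      set.delete(fn);
    };
  }

  emit<K extends keyof E>(event: K, payload: E[K]): void {
    this.listeners.get(event)?.forEach((fn) => fn(payload));
  }
}

const events = new Emitter<PlayerEvents>();
const seenProgress: number[] = [];
const unsubscribe = events.on("progress", (p) => {
  seenProgress.push(p);
});
events.emit("progress", 0.25); // delivered
unsubscribe();
events.emit("progress", 0.5); // ignored, the listener is gone
```

returning the unsubscribe function means callers never need to keep a reference to the handler just to remove it later.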


the demo app

svelte 5 + vite. the Demo/ workspace finally has code in it. just scaffolding for now, no UI components yet. but the package.json is wired up: utaujs as a workspace dependency so it pulls from the Build/ output, @sveltejs/vite-plugin-svelte, vite 8.

I picked svelte because it’s the lightest framework that still gives me reactivity and components without a virtual DOM. for a music app where audio timing matters, I don’t want React’s reconciliation cycle anywhere near my render loop. svelte compiles to vanilla JS. no runtime overhead. (also I just like svelte.)


the engine is almost wirable end to end now. language module produces phonemes, voice config provides the acoustic parameters, the DSP layer renders audio chunks, the stream player schedules them through Web Audio. the only missing piece is the actual synthesizer that takes a Score + Voice + Language and yields AudioChunks. that’s next.

getting close to hearing actual sound.


40m 43s logged

two languages walk into a synthesizer…

so I said “next step is a basic ARPABET phoneme dictionary” and then I just… did both English AND Japanese in one sitting. because apparently my brain doesn’t know how to do things incrementally.


english

every phoneme in ARPABET, with real formant data from real papers. I’m going to be responsible and cite my sources (gasp):

vowel formants are Peterson & Barney 1952 male speaker means. the classic dataset. 76 speakers, 10 monophthongal vowels. /IY/ is F1=270 F2=2290 F3=3010. /AA/ is F1=730 F2=1090 F3=2440. these numbers are from a 74-year-old paper and they’re still the standard reference. wild.

consonant noise centres follow Jongman et al. 2000 for fricatives (sibilant spectral peaks), Stevens 1998 for plosive burst loci, and Fujimura 1962 for nasal formants/antiformants. I feel like an actual phonetician typing these citations. I am not an actual phonetician.

the full set: 10 vowels, 4 diphthongs, 6 plosives, 9 fricatives, 3 nasals, 4 approximants, 2 affricates. every consonant has its noise shaping config. every nasal has antiformant data. every vowel has 5 formant targets (F1 through F5).

there’s also a grapheme-to-phoneme dictionary with like 100 common English words. “HELLO” -> [“HH”, “EH”, “L”, “OW”]. “BEAUTIFUL” -> [“B”, “Y”, “UW”, “T”, “IH”, “F”, “UH”, “L”]. it’s a smol subset but it covers the words you’d actually want a singing synthesizer to say. sun, moon, star, dream, love, forever, together. very anime opening core vocabulary. I should expand it eventually but it’s fine for testing.

the fallback for unknown words is fun: first it checks if the input is already ARPABET symbols separated by spaces/underscores. if not, it falls back to a dead simple single-character mapping where each letter gets one phoneme. it’s terrible but it won’t crash.
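the lookup order above, as a minimal sketch. the two dictionary entries are the ones quoted in this post. the symbol set is the standard 39-phoneme CMU inventory (which may not match the engine's exact inventory), and the letter table is a made-up placeholder, not the real mapping.

```typescript
// Two entries from the post; the real dictionary has about 100.
const DICT: Record<string, string[]> = {
  HELLO: ["HH", "EH", "L", "OW"],
  BEAUTIFUL: ["B", "Y", "UW", "T", "IH", "F", "UH", "L"],
};

// Standard CMU ARPABET symbols (stress digits omitted).
const ARPABET = new Set([
  "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
  "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY", "P",
  "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]);

// Hypothetical one-letter-one-phoneme table for the last-resort fallback.
const LETTER_TO_PHONEME: Record<string, string> = {
  A: "AE", B: "B", C: "K", D: "D", E: "EH", F: "F", G: "G", H: "HH", I: "IH",
  J: "JH", K: "K", L: "L", M: "M", N: "N", O: "OW", P: "P", Q: "K", R: "R",
  S: "S", T: "T", U: "UH", V: "V", W: "W", X: "S", Y: "Y", Z: "Z",
};

function lyricToPhonemes(lyric: string): string[] {
  const word = lyric.trim().toUpperCase();

  // 1. Dictionary hit.
  const hit = DICT[word];
  if (hit) return [...hit];

  // 2. Input that is already ARPABET, separated by spaces or underscores.
  const tokens = word.split(/[\s_]+/).filter((t) => t.length > 0);
  if (tokens.length > 0 && tokens.every((t) => ARPABET.has(t))) return tokens;

  // 3. Last resort: one phoneme per letter, skipping anything unmapped.
  const out: string[] = [];
  for (const ch of word) {
    const p = LETTER_TO_PHONEME[ch];
    if (p) out.push(p);
  }
  return out;
}
```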


japanese

honestly? mapping Japanese to phonemes is SO much easier than English. kana are basically a syllabary. each character maps to exactly one consonant-vowel pair (or just a vowel). no ambiguity. no “through” being pronounced nothing like it looks. English is a disaster and Japanese is a joy.

(I still don’t know Japanese. sighhh. but the internet is very helpful.)

vowel formants are from Yazawa & Kondo 2019, specifically the short-vowel midpoint averages for male speakers from their ICPhS paper. that Japanese vowel formant displacement paper I was reading earlier today. /a/ F1=687 F2=1283, /i/ F1=301 F2=2154, etc. F3 values come from Kitamura et al. 2009, the ATR MRI vocal tract study. different paper, different research group, but the F3 data fills a gap that Yazawa didn’t cover in detail.

the lyric parser handles: raw romaji (“ka”, “shi”, “tsu”), hiragana (あ, き, しゃ), and compound kana (きゃ, しゅ, ちょ). hiragana gets converted to romaji via a lookup table, then romaji gets split into consonant-vowel pairs via romajiToPhonemes(). special cases for し -> “shi”, ち -> “chi”, つ -> “tsu”, ふ -> “fu”. word-final ん becomes the moraic nasal N. geminate consonants (double letters) get handled. it’s not perfect but it covers standard Hepburn romanisation.
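a simplified sketch of that kana to romaji to phoneme pipeline. romajiToPhonemes is the name from the post, but this body is a guess at the approach, not the real implementation. the phoneme symbols (including "cl" for the geminate closure and "N" for the moraic nasal) and the tiny kana table are assumptions, and it treats any ん/n not followed by a vowel or y as moraic.

```typescript
// Tiny illustrative kana table; the real lookup covers all of hiragana.
const KANA: Record<string, string> = {
  "あ": "a", "き": "ki", "し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu", "ん": "n",
  "きゃ": "kya", "しゅ": "shu", "ちょ": "cho",
};

// Convert kana to romaji, trying two-character compounds before single kana.
function kanaToRomaji(text: string): string {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const pair = text.slice(i, i + 2);
    const pairHit = KANA[pair];
    if (pair.length === 2 && pairHit !== undefined) {
      out += pairHit;
      i += 2;
      continue;
    }
    const ch = text.charAt(i);
    out += KANA[ch] ?? ch; // unknown characters (including romaji) pass through
    i += 1;
  }
  return out;
}

const VOWELS = new Set(["a", "i", "u", "e", "o"]);
const MULTI_LETTER_ONSETS = ["sh", "ch", "ts"]; // Hepburn digraphs

function romajiToPhonemes(romaji: string): string[] {
  const s = romaji.toLowerCase();
  const out: string[] = [];
  let i = 0;
  while (i < s.length) {
    const c = s.charAt(i);
    const next = s.charAt(i + 1); // "" past the end of the string
    if (!/[a-z]/.test(c)) { i += 1; continue; } // skip spaces and punctuation
    if (VOWELS.has(c)) { out.push(c); i += 1; continue; }
    // n not followed by a vowel or y is the moraic nasal
    if (c === "n" && !VOWELS.has(next) && next !== "y") { out.push("N"); i += 1; continue; }
    // doubled consonant: emit a closure, the second copy is handled next loop
    if (c === next) { out.push("cl"); i += 1; continue; }
    const two = s.slice(i, i + 2);
    if (MULTI_LETTER_ONSETS.includes(two)) { out.push(two); i += 2; continue; }
    out.push(c); // plain consonant, or the y glide in "kya"
    i += 1;
  }
  return out;
}
```

things this sketch deliberately ignores: long vowels, the "tch" spelling of a geminate ch, and katakana.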


the registry

a Map of language IDs to modules. getLanguage("en") or getLanguage("jp"). registerLanguage() for future additions. both “jp” and “ja” point to Japanese because people use both and I’m not going to pick a side.


the LanguageModule interface from types.ts is earning its keep already. both languages implement the same lyricToPhonemes() contract. the synthesizer won’t know or care which language is active. plug in English, plug in Japanese, plug in anything. the architecture handles it.
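roughly what that registry plus shared contract could look like. the LanguageModule shape here is trimmed down to the one method mentioned above, the stub modules are placeholders, and whether getLanguage() throws or returns undefined for an unknown ID is a guess.

```typescript
// Trimmed-down guess at the interface; the real one in types.ts has more on it.
interface LanguageModule {
  id: string;
  lyricToPhonemes(lyric: string): string[];
}

const languages = new Map<string, LanguageModule>();

function registerLanguage(id: string, mod: LanguageModule): void {
  languages.set(id, mod);
}

// Assumption: unknown IDs are an error rather than a silent undefined.
function getLanguage(id: string): LanguageModule {
  const mod = languages.get(id);
  if (!mod) throw new Error(`unknown language: ${id}`);
  return mod;
}

// Placeholder modules, just enough to show the contract.
const english: LanguageModule = {
  id: "en",
  lyricToPhonemes: (lyric) => lyric.toUpperCase().split(/[\s_]+/).filter((t) => t.length > 0),
};
const japanese: LanguageModule = {
  id: "ja",
  lyricToPhonemes: (lyric) => lyric.toLowerCase().split(""),
};

registerLanguage("en", english);
registerLanguage("ja", japanese);
registerLanguage("jp", japanese); // alias: both IDs point at the same module

// The caller only ever sees the shared contract.
function phonemesFor(languageId: string, lyric: string): string[] {
  return getLanguage(languageId).lyricToPhonemes(lyric);
}
```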

I love this.


57m 19s logged

new project. yes, another one.

(penumbra is on pause btw)
so uh. hi. I’m starting a new thing.

penumbra is on pause. not abandoned, just… paused. the backend is solid, the architecture is clean, and the UI needs to catch up but I need to do the HTML mockup thing first and I’m not in the headspace for that right now. it’ll come back. I promise.

in the meantime: UTAU.js.

the idea: a browser-based singing synthesizer inspired by UTAU. but here’s the thing. it’s not sample-based like the original. it’s parametric. formant synthesis. generate the entire voice from math. no voicebank files, no platform dependencies, no 200MB sample libraries. just DSP and physics and the human vocal tract modeled in TypeScript.

why TypeScript and not Rust this time? because.

I scaffolded the repo from my Web-Template and then immediately ripped out everything that made it a web template.
I replaced it with a proper library setup: tsdown for bundling (ESM + CJS + type declarations), npm workspaces (Build for the engine, Demo for a future demo app), TypeScript strict mode.

renamed to utaujs. added jest, eslint, prettier, typescript-eslint. the foundation is there.

also brought in @nisoku/satori as a dependency because I’ll want observability later and I might as well wire it up now. and like, it’s such a great observability library like smh my head, why wouldn’t I use it

the research rabbit hole (aka I read way too many papers)

okay so today was one of those days where hackatime probably says like less than an hour but reality says 4-5.

I spent about half of today reading. papers. reference tables. phonetics Wikipedia articles. I have 20 browser tabs open right now (thank goodness for tab groups) and they’re all about formants and audio and human vocal behaviors.

here’s the reading list:

  • Peterson & Barney 1952 (a classic formant frequency dataset, 76 speakers, 10 American English vowels, F0 through F3)
  • the Stanford CCRMA formant table (Peterson’s data averaged by gender and age group, the numbers everyone cites)
  • ARPABET (the phonetic notation system CMU uses, 39 phonemes for General American English)
  • CMU Pronouncing Dictionary (134,000+ words mapped to ARPABET, sheesh)
  • a paper on Japanese vowel formant displacement (short vs long vowels have different formant targets, which matters a lot)
  • Kitamura et al on vocal tract transfer functions from MRI-derived solid models (they literally 3D printed vocal tracts and measured the acoustic response)

the MRI one is wild. they took volumetric MRI scans of people saying Japanese vowels, built physical 3D models of their vocal tracts via stereolithography, and then measured the frequency response by pumping sound through the models.


but I also wrote code

I remember a LOT of this from working on flo.

types.ts is the big one. 91 lines of the full type stuff

oscillator.ts is the glottal source. an LF (Liljencrants-Fant) model. this is the buzzing sound your vocal folds make before your throat and mouth shape it into speech. lots of math that i don’t want to talk about.

filter.ts is the formant cascade. second-order IIR resonator (biquad) that can operate as either a resonator or anti-resonator.
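for the curious, one common way to build this kind of switchable filter is the Klatt-style two-pole resonator, where the anti-resonator is just the inverse filter. this sketch assumes that formulation; the actual filter.ts may compute its coefficients differently, and the class and method names are made up.

```typescript
// Klatt-style second-order section (an assumption about the formulation).
// Resonator:      y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
// Anti-resonator: y[n] = a'*x[n] + b'*x[n-1] + c'*x[n-2], the inverse filter.
class FormantSection {
  private readonly sampleRate: number;
  private readonly anti: boolean;
  private a = 1;
  private b = 0;
  private c = 0;
  private h1 = 0; // previous output (resonator) or previous input (anti)
  private h2 = 0; // the one before that

  constructor(sampleRate: number, anti = false) {
    this.sampleRate = sampleRate;
    this.anti = anti;
  }

  // Set the centre frequency and bandwidth, both in Hz.
  tune(freqHz: number, bandwidthHz: number): void {
    const T = 1 / this.sampleRate;
    const r = Math.exp(-Math.PI * bandwidthHz * T);
    const c = -r * r;
    const b = 2 * r * Math.cos(2 * Math.PI * freqHz * T);
    const a = 1 - b - c; // normalises the gain at 0 Hz to 1
    if (this.anti) {
      this.a = 1 / a;
      this.b = -b / a;
      this.c = -c / a;
    } else {
      this.a = a;
      this.b = b;
      this.c = c;
    }
  }

  // Process one sample.
  process(x: number): number {
    const y = this.a * x + this.b * this.h1 + this.c * this.h2;
    this.h2 = this.h1;
    this.h1 = this.anti ? x : y; // feedback for poles, feedforward for zeros
    return y;
  }
}

// A cascade is just sections applied one after another.
function cascade(sections: FormantSection[], x: number): number {
  return sections.reduce((signal, section) => section.process(signal), x);
}
```

usage would be something like one section per formant, e.g. tune(730, 80) for an F1 around 730 Hz (the 80 Hz bandwidth is only an example value), with anti-resonators reserved for nasals.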

noise.ts is simple. white noise generator plus a function to shape it through formant resonators. this is how you get consonants like “s” and “sh”. they’re just filtered noise.

envelope.ts is attack/release shaping with smoothstep curves plus a buffer mixing utility.
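a small sketch of smoothstep attack/release shaping and buffer mixing. the function names and signatures are placeholders, and multiplying the attack and release gains together is one reasonable choice, not necessarily what envelope.ts does.

```typescript
// Classic smoothstep: 0 at t<=0, 1 at t>=1, with zero slope at both ends.
function smoothstep(t: number): number {
  const x = Math.min(1, Math.max(0, t));
  return x * x * (3 - 2 * x);
}

// Fade a buffer in over attackSamples and out over releaseSamples, in place.
function applyEnvelope(buf: Float32Array, attackSamples: number, releaseSamples: number): void {
  const n = buf.length;
  for (let i = 0; i < n; i++) {
    const fadeIn = attackSamples > 0 ? smoothstep(i / attackSamples) : 1;
    const fadeOut = releaseSamples > 0 ? smoothstep((n - 1 - i) / releaseSamples) : 1;
    buf[i] = (buf[i] ?? 0) * fadeIn * fadeOut;
  }
}

// Add src into dest starting at offset, dropping anything that would overflow.
function mixInto(dest: Float32Array, src: Float32Array, offset: number, gain = 1): void {
  for (let i = 0; i < src.length; i++) {
    const j = offset + i;
    if (j < 0 || j >= dest.length) continue;
    dest[j] = (dest[j] ?? 0) + (src[i] ?? 0) * gain;
  }
}
```

the point of smoothstep over a linear ramp is that the gain curve has no corner at the start or end of the fade, which helps avoid clicks.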


Fun!


2h 40m 58s logged

the polish commit (that didn’t polish enough)

okay so this commit looks massive and impressive on paper. I touched basically everything. and the code changes are real. but I need to be honest: the app still looks and feels terrible.

like, genuinely bad. the UI is broken in ways that make me want to close the window and pretend I never opened it.

but the changes are real so let me document them anyway.


layout tuning

repulsion went from 1000 to 8000, attraction from 0.01 to 0.004, ideal_length from 50 to 220. notes were clumping into an unreadable blob. the physics are better now even if you can’t tell because the rendering is fighting itself.


theme overhaul

flattened the entire theme system into one struct with semantic names. every hardcoded color replaced with CSS vars. dark and light themes. toggle in settings. this part actually works correctly and I’m proud of it even though nobody can see it because the layout is broken.


TipTap editor

replaced the textarea with TipTap via JS eval. StarterKit with bold/italic/strike/code/headings/lists. toolbar with format buttons. sends HTML back via dioxus.send(). this is genuinely nice when it works. which is actually most of the time wow.


manual linking, note list, auto-linking on save, context menu, keyboard shortcuts, zoom controls, empty state

all implemented. all technically functional. all look like they were designed by someone who has never seen a GUI before. (me. I’m that someone. i am not good at RSX, which is funny because it SHOULD just be a different form of HTML. you would think.)


what I learned

dioxus-desktop’s webview is not a browser with devtools. iterating on UI by recompiling a Rust binary every time you move a div 2 pixels is actual torture. I should have designed the UI in plain HTML first, gotten it looking right in a real browser with real devtools, and THEN ported it to rsx.

I did not do that. I have been doing it backwards this entire time. smh my silly head.

next step: HTML mockup first, then port. the backend is solid. the architecture is clean. the UI just needs to catch up.

still broken but less broken than before. progress? maybe?

onwards (to HTML mockups) :}


1h 11m 21s logged

making it actually work (kind of)

so remember how last commit I said “everything is broken”? I spent tonight fixing the worst of it. it’s not done but it’s a lot less broken.


the big fix: drag + layout fighting

the core problem was that when you drag a note, the layout engine keeps running and overwriting your drag position every 16ms. the note jitters back and forth between where you’re dragging it and where the physics wants it.

fix: new SetNodePosition event. when you drag a note, the UI publishes it through the event bus. the layout worker drains these events at the top of each cycle and calls engine.set_position() before stepping. so the layout engine knows where you put the card and computes forces from there instead of fighting you.

also: while any note is being dragged (tracked by dragged_set), the layout event loop skips updating positions entirely. other notes stay still instead of jittering from stale force calculations. once you release, everything syncs up again.


mouse events moved to root div

pan/drag/mouseup handlers were on the canvas element, which meant releasing the mouse over a note card didn’t fire mouseup on the canvas. dragging would “stick.” moved all the move/up/leave handlers to the root div so they fire regardless of what’s under the cursor.


canvas renderer simplified

removed the node drawing code from WebCanvasRenderer entirely. nodes were being drawn twice: once on the canvas in Rust/web-sys, once as DOM cards in Dioxus. the canvas now only draws the dot grid and bezier edges. the DOM handles cards. no more double rendering.

the inline JS RAF loop was also replaced with a standalone canvas-draw.js that gets called explicitly via window.__penumbra_draw() whenever render state changes. no more requestAnimationFrame spinning at 60fps when nothing changed.


spring-animated card positions

cards now spring-animate to their new positions instead of teleporting. each AnimatedCard tracks target x/y and animates via dioxus-motion springs when the target changes. so when the layout engine repositions notes, they glide smoothly. the spring from the ideas issue, basically.


sidebar panels are real now

the floating sidebar buttons actually do things now. four panels:

Search: reactive search input that queries the hybrid search engine as you type. results show title, preview, and similarity score. click a result and the camera drifts to that note.

Tags: shows all tags sorted by count. click a tag to filter the graph view to only notes with that tag. click again to clear the filter. below the tag list, shows the filtered notes.

Pins: lists all pinned notes. click to pan to them.

Settings: just shows note count and link count for now. placeholder.


context menu on right-click

right-click a note card and you get a context menu with “Open in editor,” “Pin to canvas” / “Unpin,” and “Delete note.” the pin toggle calls a new toggle_pin method on AppState that flips the meta flag, persists, and publishes an event.


note cards render markdown

NoteCard preview now runs through markdown_to_html() and uses dangerous_inner_html. (XSS? yes) the editor got a preview toggle button that switches between the textarea and rendered HTML.


real embeddings on init

switched from SimpleEmbedder to CandleEmbedder::load() at startup. downloads the real Snowflake model from HuggingFace on first launch. falls back to SimpleEmbedder if the download fails.


position persistence

the layout event loop now debounce-saves positions to storage every 2 seconds. on next launch, saved positions are fed into the layout engine before the first step so notes start where you left them instead of random positions.


other stuff done as well but i’m eepyyyy


3h 24m 54s logged

the Dioxus commit (it’s broken but it exists)

okay so.

it does not work properly. I want to be upfront about that. it compiles, it launches, things appear on screen, but the interaction is janky, the canvas rendering fights with the DOM cards, and the layout worker doesn’t sync positions correctly yet. it’s a WIP commit and I’m committing it anyway because there’s genuinely a lot of infrastructure here and I don’t want to lose it.


three new crates

penumbra-theme: dark and light themes defined as Rust structs. colors, radii, glass blur config. to_css_vars() generates the full CSS custom properties string so the theme can be injected into the DOM at runtime. purple accent because penumbra means shadow and shadows are purple, obviously.

penumbra-canvas: a GraphCanvasRenderer trait with a WebCanvasRenderer that draws to an HTML canvas via web-sys. dot grid background, bezier curve edges between nodes, rounded-rect cards with titles. plus a NullCanvasRenderer for when there’s no canvas available. the RenderState struct holds the camera, nodes, edges, and selection state.

penumbra-thread: cross-platform threading. std::thread on native, wasm_thread on WASM. one #[cfg] block (the only one in the project, and it’s in a platform abstraction crate where it belongs). Worker struct with atomic cancellation flag. spawn_worker() and spawn_detached() helpers.
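
the native half of that is roughly this shape. it’s a sketch under assumptions: the signatures are guessed, and the WASM side (wasm_thread behind the `#[cfg]`) is left out entirely:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Handle to a background thread that can be asked to stop.
pub struct Worker {
    cancel: Arc<AtomicBool>,
    handle: JoinHandle<()>,
}

impl Worker {
    /// Sets the cancellation flag; the worker body decides when to check it.
    pub fn cancel(&self) {
        self.cancel.store(true, Ordering::Relaxed);
    }

    /// Blocks until the worker thread has exited.
    pub fn join(self) {
        let _ = self.handle.join();
    }
}

/// Spawns `body` on a new thread and hands it the shared cancellation flag.
pub fn spawn_worker<F>(body: F) -> Worker
where
    F: FnOnce(Arc<AtomicBool>) + Send + 'static,
{
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    let handle = thread::spawn(move || body(flag));
    Worker { cancel, handle }
}
```

cancellation is cooperative here: the body has to poll the flag in its loop, nothing gets killed from outside.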


the app

Build/ui/penumbra-app/ is a full Dioxus desktop app. the component count looks scary but most of it is from the dioxus-components library (badge, button, card, dialog, dropdown menu, separator, sheet, sidebar, skeleton, tabs, tooltip). I styled them but didn’t write them. the custom ones are:

NoteCard: the little frosted-glass card that represents a note on the canvas. title, preview, positioned absolutely in world coordinates.

GraphCards: renders all notes as AnimatedCards inside a CSS-transformed container. each card gets a spring animation on mount via dioxus-motion so it scales in from zero. cards are draggable.

FloatingSidebar: the vertical icon bar on the left. grid, search, pin, tag, settings. just state toggles for now, no panels wired up yet.

TopBar: centered pill bar with the app name and search area.

Fab: “New note” button in the bottom right.

NoteEditor: full-screen editor view with title, body, and tags fields. auto-saves on back.


the bridge

bridge/mod.rs connects the UI to the backend. load_graph() and load_positions() pull from storage. restore_state() inserts everything into the graph and index. create_layout_engine() builds the GPU layout engine with all current nodes. start_layout_worker() spawns a background thread that steps the layout at 60fps, syncs node additions/removals, and publishes position updates through the event bus. sleeps longer when the layout has converged.


the interaction model

pan: mousedown on canvas starts tracking, mousemove applies delta to camera, mouseup stops.

zoom: wheel events on canvas adjust zoom level around the cursor position.
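
zooming "around the cursor" means the world point under the cursor stays under the cursor after the zoom changes. a minimal sketch of that math, assuming a camera where `screen = world * zoom + offset` (the `Camera` fields and the 0.1 to 4.0 zoom clamp are assumptions, not the app’s real values):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Camera {
    pub offset_x: f64,
    pub offset_y: f64,
    pub zoom: f64,
}

impl Camera {
    pub fn world_to_screen(&self, wx: f64, wy: f64) -> (f64, f64) {
        (wx * self.zoom + self.offset_x, wy * self.zoom + self.offset_y)
    }

    pub fn screen_to_world(&self, sx: f64, sy: f64) -> (f64, f64) {
        ((sx - self.offset_x) / self.zoom, (sy - self.offset_y) / self.zoom)
    }

    /// Multiplies the zoom by `factor` while keeping the world point
    /// under the cursor fixed on screen.
    pub fn zoom_at(&mut self, cursor_x: f64, cursor_y: f64, factor: f64) {
        // World point currently under the cursor.
        let (wx, wy) = self.screen_to_world(cursor_x, cursor_y);
        self.zoom = (self.zoom * factor).clamp(0.1, 4.0);
        // Solve cursor = world * zoom + offset for the new offset.
        self.offset_x = cursor_x - wx * self.zoom;
        self.offset_y = cursor_y - wy * self.zoom;
    }
}
```

a wheel handler would turn the scroll delta into a `factor` slightly above or below 1.0 and call `zoom_at` with the mouse position.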

drag: mousedown on a note card captures the offset, mousemove in drag mode updates the note’s position directly.

note creation: click fab -> create empty note in graph -> wait for layout engine to assign position -> camera drifts to the new note (spring animation) -> switch to editor view. this is the flow from the ideas issue. it doesn’t work smoothly yet but the state machine is there.


what’s broken

everything, kind of. the canvas renderer and the DOM cards are two separate rendering paths that don’t coordinate well. the layout worker publishes positions but the signals don’t always pick them up in time. the camera drift animation triggers but sometimes snaps instead of drifting. the editor saves but doesn’t trigger re-embedding. the sidebar buttons toggle state but nothing happens.

it’s a prototype. sigh.

40m 29s logged

sync conflict detection + wiremock tests

two changes in one commit: the sync worker now rejects stale pushes, and the Rust sync client has a real test suite.


conflict detection

if the client sends a snapshotId with its push, the worker checks it against the current server snapshot. if they don’t match, someone else pushed in between, and you get a 409 back with the current server snapshot ID and a message to pull first.

the snapshotId used to be passed through from the client or generated fresh. now it’s always generated server-side. the client’s snapshotId is only used for the conflict check, never as the new snapshot. small change, but it means two clients can’t accidentally create the same snapshot ID.
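
the worker itself is JS, but the decision logic is small enough to sketch in Rust. everything here (`PushOutcome`, `check_push`, the `snap-N` ID format) is hypothetical and only illustrates the rule: the client’s ID is compared, never adopted.

```rust
#[derive(Debug, PartialEq)]
pub enum PushOutcome {
    /// Push accepted; the new snapshot ID always comes from the server.
    Accepted { new_snapshot_id: String },
    /// Stale push; maps to an HTTP 409 carrying the current server snapshot.
    Conflict { status: u16, current_snapshot_id: String },
}

/// `client_snapshot` is what the client last saw (if it sent anything),
/// `current` is the server's snapshot, and `next_id` stands in for
/// whatever the server uses to mint a fresh ID.
pub fn check_push(client_snapshot: Option<&str>, current: &str, next_id: u64) -> PushOutcome {
    match client_snapshot {
        // Someone else pushed in between: reject and tell the client to pull.
        Some(id) if id != current => PushOutcome::Conflict {
            status: 409,
            current_snapshot_id: current.to_string(),
        },
        // No ID sent, or it matches: accept with a server-generated ID.
        _ => PushOutcome::Accepted {
            new_snapshot_id: format!("snap-{}", next_id),
        },
    }
}
```

note that a push without any `snapshotId` skips the check in this sketch, which matches the "if the client sends a snapshotId" wording above.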


last_sync is now a Mutex

WorkerSyncProvider.last_sync went from Option<DateTime> to Mutex<Option<DateTime>>. it gets updated whenever a push or a pull succeeds. this matters because SyncProvider is behind an Arc in the real app, and last_sync needs interior mutability. also added #[serde(rename_all = "camelCase")] on SyncSnapshot so the JSON fields match what the worker actually sends.
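
why the Mutex is needed, in miniature. this is a stand-in, not the real provider: `SyncProviderSketch` is a made-up name, `std::time::SystemTime` replaces the DateTime type so the example has no dependencies, and the network call is faked.

```rust
use std::sync::Mutex;
use std::time::SystemTime;

/// Stand-in for a sync provider that is shared behind an `Arc`,
/// so every method only gets `&self`.
pub struct SyncProviderSketch {
    last_sync: Mutex<Option<SystemTime>>,
}

impl SyncProviderSketch {
    pub fn new() -> Self {
        Self { last_sync: Mutex::new(None) }
    }

    /// Reads the timestamp of the last successful sync, if any.
    pub fn last_sync(&self) -> Option<SystemTime> {
        *self.last_sync.lock().unwrap()
    }

    /// Pretend push: the real one would do HTTP and only record the
    /// timestamp when the request succeeds.
    pub fn push(&self) -> Result<(), String> {
        // ... network call would go here ...
        *self.last_sync.lock().unwrap() = Some(SystemTime::now());
        Ok(())
    }
}
```

with a plain `Option` field, `push(&self)` could not write the timestamp at all; the Mutex is what lets a shared reference mutate it.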


wiremock tests

238 lines of new tests using wiremock to mock the HTTP worker. each test spins up a local mock server, registers response expectations, and exercises the WorkerSyncProvider against it. no real network, no real worker, fully deterministic.

covers: connect (200 and 500), push (returns snapshot, forwards snapshot ID, handles 500, handles 409 conflict), pull (returns notes/embeddings/positions, sends since query param), status (parses all fields), last_sync (none initially, updated after push).

the with_rt() helper builds a single-threaded tokio runtime with IO enabled for each test since wiremock needs async + network.


the TODO

“Cloud sync” is checked off. one item left: Dioxus UI.

aaaaaaaaa

55m 41s logged

making Candle actually WASM-compatible

so remember the “no #[cfg] in cross-cutting interfaces” rule? the CandleEmbedder was breaking it. not with #[cfg] exactly, but with hf-hub, which uses filesystem APIs (dirs, mmap) that don’t exist in WASM. the model loading was desktop-only and I was pretending that was fine.

it was not fine. I fixed it.


hf-hub is gone

replaced with reqwest. instead of hf-hub’s sync filesystem API that downloads to a local cache directory, the embedder now does raw HTTP GETs to huggingface.co/{model}/resolve/main/{file}. three downloads: model.safetensors, config.json, tokenizer.json. the bytes stay in memory, no filesystem touch.
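
the URL side of that is tiny. a sketch, with `hf_resolve_url` and `download_urls` as made-up helper names; the actual GETs (reqwest) are left out so this stays dependency-free:

```rust
/// The three files fetched per model, as listed above.
pub const MODEL_FILES: [&str; 3] = ["model.safetensors", "config.json", "tokenizer.json"];

/// Builds the "resolve" URL for one file of a HuggingFace model repo.
pub fn hf_resolve_url(model: &str, file: &str) -> String {
    format!("https://huggingface.co/{}/resolve/main/{}", model, file)
}

/// All download URLs for a model, in fetch order.
pub fn download_urls(model: &str) -> Vec<String> {
    MODEL_FILES.iter().map(|f| hf_resolve_url(model, f)).collect()
}
```

each URL gets one GET, and the response bytes go straight into the loader without touching disk.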

reqwest is platform-conditional in Cargo.toml: rustls on native (no OpenSSL dependency), bare defaults on wasm32 (uses browser fetch under the hood). plus getrandom with the wasm_js feature so random number generation works in WASM.


two feature flags instead of one

candle now just gives you the model and tokenizer types. no download capability, no reqwest. you construct a CandleEmbedder from bytes you already have.

candle-load adds reqwest and the CandleEmbedder::load() method that downloads from HuggingFace. this is the one that pulls in the network stack.

the split matters because the WASM build might want to load model bytes from OPFS or a bundled asset instead of downloading every time. the download path is opt-in.


mmap is gone too

from_mmaped_safetensors (with its unsafe block) became from_buffered_safetensors. loads from a byte vec instead of memory-mapping a file path. works everywhere. no unsafe. the vocab_size is now read from config.json instead of hardcoded to 30522.

Tokenizer::from_file became Tokenizer::from_bytes. same pattern.


error handling cleanup

every candle operation was using ? directly, which only works if PenumbraError implements From<candle_core::Error>. it doesn’t, and it shouldn’t, because candle errors are an implementation detail. added an e_msg helper that wraps any Display into PenumbraError::Embedding, and switched every candle call to .map_err(e_msg)?. verbose but correct.
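
the pattern looks roughly like this. `PenumbraError` is cut down to one variant, and a `str::parse` call stands in for a candle operation so the sketch compiles without candle:

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum PenumbraError {
    Embedding(String),
}

/// Wraps anything printable into the embedding error variant,
/// so no `From<ForeignError>` impl is needed.
pub fn e_msg<E: fmt::Display>(err: E) -> PenumbraError {
    PenumbraError::Embedding(err.to_string())
}

/// Stand-in for a fallible library call with its own error type.
fn parse_dim(s: &str) -> Result<usize, std::num::ParseIntError> {
    s.parse()
}

/// Call site: translate the foreign error at the boundary, then use `?`.
pub fn embedding_dim(s: &str) -> Result<usize, PenumbraError> {
    let dim = parse_dim(s).map_err(e_msg)?;
    Ok(dim)
}
```

the foreign error type never shows up in the public signature, which is the whole point.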

ArcticEmbedXS::new and forward now return candle’s own error type instead of PenumbraError. the boundary between “candle stuff” and “penumbra stuff” is cleaner. the CandleEmbedder wrapper handles the translation.


tests

New candle tests behind #[cfg(feature = "candle")]. the trick: building a synthetic safetensors file and a minimal WordLevel tokenizer entirely in memory. no model download needed for the test suite.

test_safetensors() generates fake embedding weights and encoder weights with deterministic values, writes the safetensors header manually (length prefix + JSON metadata + raw f32 bytes; much better than pulling it in), and hands it to VarBuilder. test_tokenizer_bytes() builds a 10-word vocabulary with a Whitespace pre-tokenizer.
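
for reference, the safetensors layout is an 8-byte little-endian header length, then a JSON header, then the raw tensor bytes. here’s a sketch that writes a single F32 tensor by hand. `build_safetensors` is a made-up name, not the real test helper, and it assumes the tensor name needs no JSON escaping:

```rust
/// Builds a minimal safetensors buffer holding one F32 tensor.
pub fn build_safetensors(name: &str, shape: &[usize], data: &[f32]) -> Vec<u8> {
    // The shape must describe exactly as many elements as we were given.
    assert_eq!(shape.iter().product::<usize>(), data.len());
    let byte_len = data.len() * 4;

    // JSON header: tensor name -> dtype, shape, and byte range in the data section.
    let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
    let header = format!(
        "{{\"{}\":{{\"dtype\":\"F32\",\"shape\":[{}],\"data_offsets\":[0,{}]}}}}",
        name,
        dims.join(","),
        byte_len
    );

    let mut out = Vec::with_capacity(8 + header.len() + byte_len);
    // 1. length prefix: header size as a little-endian u64
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    // 2. JSON metadata
    out.extend_from_slice(header.as_bytes());
    // 3. raw little-endian f32 bytes
    for v in data {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

a multi-tensor version would add more entries to the header and give each one its own `data_offsets` range.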

tests cover: forward pass output shape, L2 normalization, non-zero output, embedder dimensions, embed_text roundtrip. plus one candle-load gated test that actually downloads the real model from HuggingFace (only runs when you explicitly pass --features candle-load).


the “no #[cfg] in cross-cutting interfaces” rule now holds for real. the entire embed pipeline compiles on wasm32 without conditional compilation. reqwest handles the platform difference internally. candle handles the compute. the embedder trait doesn’t know or care.


Quick note, sorry about there mostly being terminal or VSCode images lol, there’s no UI to demo rn…

55m 56s logged

the sync backend exists now

so the TODO said “Cloud sync” and I was like “how hard can it be”

it was not hard actually. which is suspicious. but it works.

new directory: Backend/sync-worker/. it’s a Cloudflare Worker backed by R2 for storage. the whole sync API is three JS files and a wrangler config.


the API

  • POST /sync/push - batch upload notes, embeddings, and positions
  • POST /sync/pull - batch download everything (or since a snapshot)
  • GET /sync/status - storage stats (note count, bytes used, 512MB limit)
  • POST /sync/clear - nuke everything
  • plus individual CRUD routes for single notes, embeddings, and positions

notes are JSON. embeddings are raw f32 bytes (384 * 4 = 1,536 bytes per note). positions are JSON. everything lives in R2 under a clean key scheme: notes/{id}.json, embeddings/{id}.bin, positions/{id}.json.
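
from the Rust client’s side, the embedding format and key scheme could look like this. it’s a sketch: the helper names are made up, and little-endian byte order is an assumption (it’s what Float32Array uses on practically every platform):

```rust
/// R2 key for a note's embedding, following the key scheme above.
pub fn embedding_key(id: &str) -> String {
    format!("embeddings/{}.bin", id)
}

/// Packs an embedding into raw bytes, 4 bytes per f32 (little-endian assumed).
pub fn encode_embedding(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Unpacks raw bytes back into f32s; returns None if the length
/// is not a multiple of 4.
pub fn decode_embedding(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

a 384-dimensional embedding encodes to exactly 1,536 bytes, matching the number above.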


the storage layer

storage.js is the R2 abstraction. getJson, putJson, getBinary, putBinary, del, listKeys. plus a manifest system that tracks note count, last modified time, and the current snapshot ID. snapshots are versioned so the client can eventually do “pull only what changed since X” (right now it just dumps everything, but the plumbing is there).


push and pull

push accepts a JSON body with notes, embeddings, and positions objects. writes them all to R2, updates the manifest, creates a snapshot record, returns the snapshot ID.

pull lists all note keys, fetches each note plus its embedding and position, and returns the whole bundle. embeddings get round-tripped through Float32Array so the binary format is preserved correctly.

CORS is wide open (*) because this is a personal sync worker, not a public API. the client will be the Penumbra app running in the browser or on desktop.
I’ll probably change this tho


the generated types file

wrangler generated a 14,000-line TypeScript declaration file for the Cloudflare runtime. I committed it because it’s how wrangler works and I don’t want to fight it. it defines the R2Bucket binding and every other Workers API type. it’s big. it’s fine.


this is the first non-Rust code in the repo. feels weird. but the sync layer is inherently a server-side thing and Cloudflare Workers with R2 is genuinely the simplest way to get object storage with an HTTP API. no server to manage, no database to provision, just a bucket and some routing.


Also, had to kick out usearch since it wasn’t WASM compatible. now everything but Candle is WASM compatible!

21m 13s logged

auto-linker + TODO overhaul

new crate: penumbra-auto-link. 158 lines. this is the thing that makes the canvas feel fun.

the pipeline is simple: embed the note, search the vector index for neighbours, create implicit links for anything above the similarity threshold. process_note() takes a Note, embeds its text via the EmbeddingProvider, inserts the embedding into the index (so it’s immediately discoverable by future saves), searches for the top-k closest neighbours, filters by score, checks for duplicates, and creates implicit links in the graph. explicit links are never touched. I don’t mess with your links. (why does that sound funny lol)

configurable: top_k (default 10), min_score (default 0.75), max_links per pass (default 5). the AutoLinker holds Arc references to the embedder, index, graph, and event bus. every new link fires a LinkAdded event so the UI can animate the card drifting to its new neighbours.

the duplicate check matters. between the search returning results and the link being created, another thread could have already linked the same pair. so it locks the graph, checks if the link exists, and only creates it if it doesn’t. two process_note() calls on the same note produce links on the first call and an empty vec on the second.
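
the filtering part of the pipeline, boiled down. this is a sketch, not the crate’s code: `select_links` is a made-up name, note IDs are plain `u32`s, candidates are assumed to arrive sorted by descending score, and a `&mut HashSet` stands in for the locked graph:

```rust
use std::collections::HashSet;

pub struct AutoLinkConfig {
    pub top_k: usize,
    pub min_score: f32,
    pub max_links: usize,
}

impl Default for AutoLinkConfig {
    fn default() -> Self {
        Self { top_k: 10, min_score: 0.75, max_links: 5 }
    }
}

/// Picks which implicit links to create for `note_id` and records them
/// in `existing`. Returns only the links created by this call.
pub fn select_links(
    note_id: u32,
    candidates: &[(u32, f32)],
    existing: &mut HashSet<(u32, u32)>,
    cfg: &AutoLinkConfig,
) -> Vec<(u32, u32)> {
    let mut created = Vec::new();
    for &(other, score) in candidates.iter().take(cfg.top_k) {
        if created.len() >= cfg.max_links {
            break;
        }
        // Skip self-links and anything under the similarity threshold.
        if other == note_id || score < cfg.min_score {
            continue;
        }
        // Normalise the pair so (a, b) and (b, a) count as the same link.
        let pair = (note_id.min(other), note_id.max(other));
        // `insert` returns false if the link already exists: the duplicate check.
        if existing.insert(pair) {
            created.push(pair);
        }
    }
    created
}
```

running it twice with the same inputs gives links the first time and an empty vec the second, same as the behaviour described above.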

Some tests in auto_link_matrix.rs: creates implicit links, skips self-links, obeys score threshold, respects max_links cap, handles empty index (no candidates), no duplicate links on re-process, empty body doesn’t panic, top_k=0 returns empty.


also cleaned up some stray em-dashes and arrow symbols in comments across the codebase. vscode keeps auto-correcting -- to em-dashes and -> to the arrow symbol and I keep not noticing until I read the diff. idk if it’s a bug or a feature but it’s annoying.


the TODO got a real overhaul. the old flat checklist is gone. everything that’s done is removed. what’s left is organized into sections: Dioxus UI plans (spring animations, camera lerp, pinned stars, cached positions), WASM worker threading, storage and sync (Google Drive, GitHub repo, offline mode, merge strategy), and future stuff (encrypted vault). it reads like a roadmap now instead of a grocery list.

eleven crates. too much? no.
modularity is never “too much,” rather it’s better to be more modular than less. (unless you have something insane like a file per function)

21m 59s logged

GPU layout engine + collision avoidance

ripped out the entire hand-rolled ForceAtlas2 + Barnes-Hut implementation and replaced it with vibe-graph-layout-gpu. (OMG i do not like that name, you could have called it anything. LITERALLY ANYTHING else)

195 lines of quadtree code: deleted. RIP. the custom ForceAccumulator, NodeState, adaptive speed calculation: all gone. replaced by a wgpu-backed force-directed layout that runs on the GPU. (no it’s not that intensive, only if you have like a million notes, and even then, we be smart and only calculate the stuff for notes you can see (animation and gamedev tip #1) rather than everything!)

the engine now wraps GpuLayout with lazy initialization. the GPU doesn’t spin up until the first step() call, so construction stays synchronous. if the graph changes (node added, removed, links updated), it sets a dirty flag and re-inits on the next step. pinned nodes get their positions restored after each GPU step since the GPU doesn’t know about pinning.
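
the lazy-init + dirty-flag + pin-restore dance, sketched with a fake backend. `FakeGpu` just nudges every node so the example runs without wgpu; the struct and field names here are assumptions, not the real engine:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Node {
    pub x: f32,
    pub y: f32,
    pub pinned: bool,
}

/// Stand-in for the GPU backend. It knows nothing about pinning.
struct FakeGpu {
    node_count: usize,
}

impl FakeGpu {
    fn step(&mut self, nodes: &mut [Node]) {
        debug_assert_eq!(self.node_count, nodes.len());
        for n in nodes {
            n.x += 1.0;
            n.y += 1.0;
        }
    }
}

pub struct LayoutEngine {
    nodes: Vec<Node>,
    gpu: Option<FakeGpu>, // None until the first step()
    dirty: bool,
    pub init_count: usize, // only here so the sketch can show re-inits
}

impl LayoutEngine {
    /// Construction does no backend work at all.
    pub fn new(nodes: Vec<Node>) -> Self {
        Self { nodes, gpu: None, dirty: true, init_count: 0 }
    }

    pub fn add_node(&mut self, n: Node) {
        self.nodes.push(n);
        self.dirty = true; // graph changed: re-init on the next step
    }

    pub fn nodes(&self) -> &[Node] {
        &self.nodes
    }

    pub fn step(&mut self) {
        if self.dirty || self.gpu.is_none() {
            self.gpu = Some(FakeGpu { node_count: self.nodes.len() });
            self.init_count += 1;
            self.dirty = false;
        }
        // Remember pinned nodes, let the backend move everything, then restore.
        let pinned: Vec<(usize, Node)> = self
            .nodes
            .iter()
            .copied()
            .enumerate()
            .filter(|(_, n)| n.pinned)
            .collect();
        if let Some(gpu) = self.gpu.as_mut() {
            gpu.step(&mut self.nodes);
        }
        for (i, n) in pinned {
            self.nodes[i] = n;
        }
    }
}
```

steady-state steps reuse the backend; only a graph change pays the init cost again.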

step_neighborhood() is now just step(). with GPU acceleration the full graph is always computed, so neighborhood-only updates don’t make sense anymore. the test got rewritten to match: step_moves_unpinned_nodes instead of the old neighborhood-specific assertions.


collision avoidance

the GPU handles forces and attraction but it doesn’t know about card sizes. nodes are points to it. so after the GPU step runs, there’s now a CPU-side collision resolution pass that pushes apart overlapping bounding boxes.

each node can have a Bounds (width, height) set by the UI. after forces are applied and pinned nodes are restored, resolve_collisions() runs up to 5 iterative passes. each pass checks every overlapping pair, computes a center-to-center separation vector, and pushes both nodes apart by half the overlap distance (clamped to 20px per pass). pinned nodes never move from collision push.
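
a minimal sketch of that pass. assumptions to flag: `Body` is a made-up struct that merges position and bounds, "overlap distance" is taken as the smaller of the two axis overlaps, and coincident centers are simply skipped here since they need their own fallback:

```rust
#[derive(Clone, Copy, Debug)]
pub struct Body {
    pub x: f32, // center
    pub y: f32,
    pub w: f32, // bounding box size
    pub h: f32,
    pub pinned: bool,
}

const MAX_PASSES: usize = 5;
const MAX_PUSH: f32 = 20.0; // px per node per pass

pub fn resolve_collisions(bodies: &mut [Body]) {
    for _ in 0..MAX_PASSES {
        let mut moved = false;
        for i in 0..bodies.len() {
            for j in (i + 1)..bodies.len() {
                let (a, b) = (bodies[i], bodies[j]);
                let (dx, dy) = (b.x - a.x, b.y - a.y);
                // Boxes overlap only if they overlap on both axes.
                let overlap_x = (a.w + b.w) / 2.0 - dx.abs();
                let overlap_y = (a.h + b.h) / 2.0 - dy.abs();
                if overlap_x <= 0.0 || overlap_y <= 0.0 {
                    continue;
                }
                let dist = (dx * dx + dy * dy).sqrt();
                if dist == 0.0 {
                    continue; // coincident centers: no direction to push along
                }
                // Each node moves half the overlap, clamped per pass.
                let push = (overlap_x.min(overlap_y) / 2.0).min(MAX_PUSH);
                let (ux, uy) = (dx / dist, dy / dist);
                if !a.pinned {
                    bodies[i].x -= ux * push;
                    bodies[i].y -= uy * push;
                }
                if !b.pinned {
                    bodies[j].x += ux * push;
                    bodies[j].y += uy * push;
                }
                moved = true;
            }
        }
        if !moved {
            break; // nothing overlapped this pass, done early
        }
    }
}
```

the per-pass clamp is why it takes several passes: big overlaps get worked off 20px at a time instead of in one jump.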

the fallback for coincident centers is fun: when two nodes land on the exact same pixel, there’s no center-to-center vector to push along. so it derives a deterministic angle from the node indices using big primes as hash seeds. deterministic because the same pair should always separate in the same direction. not random because randomness in layout makes things jitter.
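
one way that fallback could look. the two constants are placeholder primes picked for this sketch (the actual seeds aren’t given above), and `fallback_direction` is a made-up name:

```rust
/// Deterministic unit vector for a pair of node indices, used when two
/// centers coincide and there is no real direction to push along.
pub fn fallback_direction(i: u32, j: u32) -> (f32, f32) {
    // Order the pair so (i, j) and (j, i) hash the same way.
    let (lo, hi) = (i.min(j), i.max(j));
    // Placeholder primes; any large primes would do for mixing.
    let hash = lo.wrapping_mul(1_000_000_007) ^ hi.wrapping_mul(998_244_353);
    // Map the hash to a whole-degree angle and convert to a direction.
    let angle = (hash % 360) as f32 * std::f32::consts::PI / 180.0;
    (angle.cos(), angle.sin())
}
```

same pair in, same direction out, every frame, so the two cards separate cleanly instead of jittering.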

three new tests: overlapping nodes get separated, nodes without bounds are skipped (no crash), pinned nodes stay put during collision resolution.

the Cargo.lock diff (image) is… yeah. wgpu pulls in the entire graphics stack. metal, vulkan (ash), naga (oooh fun fact! this is the Hindi name for a bunch of semi-divine half-snake beings!), glow, spirv (pov you try to type spin but absolutely FAIL). the dependency tree roughly tripled. but the layout runs on the GPU now so I’m not complaining.

also deleted the test_events.rs example from penumbra-markdown that I forgot to clean up last commit. oops.

and also made the LayoutEngine WASM-compatible by making it async. WHOOPS

20m 14s logged

penumbra-markdown: the parser arc

so I was supposed to work on other stuff today.

I did not work on other stuff today.

instead I wrote an entire markdown pipeline from scratch. parser, AST, HTML renderer, plain text extractor, streaming layer for live editing, and 639 lines of tests. because apparently “notes app needs notes parsing” was more urgent than “notes app needs to save notes.”

No I did not do this in 15 minutes. I spent much more time on it. I’m just, like I said, one commit behind (no longer!!!)


the custom syntax

the whole point of this crate is two custom inline types that make Penumbra’s markdown different from regular markdown:

NoteEmbed: [[some-note-id]]. double-bracket references to other notes on the canvas. the parser tracks bracket depth so nested brackets don’t break things.

TagRef: #tag-name. inline tags. the parser distinguishes #tag from # heading by checking if there’s a space after the hash and whether the preceding character is a word boundary.

these get parsed as structured AST nodes, not raw text. when the HTML renderer hits them, they become custom elements: <pe-embed data-ref="id"> and <pe-tag data-name="name">. the Dioxus UI will intercept these and make them interactive (click to pan, click to filter).
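a minimal sketch of that rendering step, assuming a cut-down `Inline` enum and a hypothetical `render_inline()` helper (neither name is from the crate). the element and attribute names are the ones above; emitting them as empty elements with a closing tag is my assumption, and so is the attribute escaping.

```rust
/// The two custom inline nodes (a cut-down stand-in for the real AST).
pub enum Inline {
    NoteEmbed(String),
    TagRef(String),
}

/// Escape the characters that could break out of a double-quoted attribute.
fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Render one custom node as a custom element (assumed to be empty;
/// the UI layer is expected to fill it in).
pub fn render_inline(node: &Inline) -> String {
    match node {
        Inline::NoteEmbed(id) => {
            format!("<pe-embed data-ref=\"{}\"></pe-embed>", escape_attr(id))
        }
        Inline::TagRef(name) => {
            format!("<pe-tag data-name=\"{}\"></pe-tag>", escape_attr(name))
        }
    }
}
```

escaping matters here because note ids and tag names come from user text, so a stray quote should not be able to close the attribute early.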

I know they’re technically also Obsidian things, but smh my head, I did it too


the parser

612 lines wrapping pulldown-cmark. every text chunk goes through split_text_for_custom() for embed/tag extraction before hitting the inline stack. uses a frame stack with saved positions for nesting (strong, emphasis, links, etc). tables required a state machine. tight list items needed special flush handling. the usual “markdown is simple until it isn’t” experience.
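roughly what the embed/tag extraction could look like on a single text chunk. this is a sketch, not the real `split_text_for_custom()`: it works on a plain `&str`, does not touch pulldown-cmark, and the `Inline` enum, `split_text()` and the helper names are made up for illustration. the tag rule here (a `#` at a word boundary followed by at least one tag character) is my reading of the description above.

```rust
#[derive(Debug, PartialEq)]
pub enum Inline {
    Text(String),
    NoteEmbed(String),
    TagRef(String),
}

fn is_tag_char(c: char) -> bool {
    c.is_alphanumeric() || c == '-' || c == '_'
}

/// Move any buffered plain text into the output.
fn flush(text: &mut String, out: &mut Vec<Inline>) {
    if !text.is_empty() {
        out.push(Inline::Text(std::mem::take(text)));
    }
}

/// Index of the first `]` of the closing `]]`, tracking nested brackets.
fn find_embed_end(chars: &[char], start: usize) -> Option<usize> {
    let mut depth = 0usize;
    let mut i = start;
    while i < chars.len() {
        match chars[i] {
            '[' => depth += 1,
            ']' if depth > 0 => depth -= 1,
            ']' => return if chars.get(i + 1) == Some(&']') { Some(i) } else { None },
            _ => {}
        }
        i += 1;
    }
    None
}

pub fn split_text(input: &str) -> Vec<Inline> {
    let chars: Vec<char> = input.chars().collect();
    let mut out = Vec::new();
    let mut text = String::new();
    let mut i = 0;
    while i < chars.len() {
        // [[note-id]]: only taken if a matching `]]` exists.
        if chars[i] == '[' && chars.get(i + 1) == Some(&'[') {
            if let Some(end) = find_embed_end(&chars, i + 2) {
                flush(&mut text, &mut out);
                out.push(Inline::NoteEmbed(chars[i + 2..end].iter().collect()));
                i = end + 2;
                continue;
            }
        }
        // #tag: needs a word boundary before the hash and a tag char after it.
        if chars[i] == '#' {
            let at_boundary = i == 0 || !is_tag_char(chars[i - 1]);
            let mut j = i + 1;
            while j < chars.len() && is_tag_char(chars[j]) {
                j += 1;
            }
            if at_boundary && j > i + 1 {
                flush(&mut text, &mut out);
                out.push(Inline::TagRef(chars[i + 1..j].iter().collect()));
                i = j;
                continue;
            }
        }
        text.push(chars[i]);
        i += 1;
    }
    flush(&mut text, &mut out);
    out
}
```

with this rule `C#` stays plain text (the hash follows a word character) and `# heading` stays plain text (a space follows the hash).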

The image shows my excellent idea for inline stuff. Remember how println!() works with vars? Okay, no, this is a pretty common thing now that I think about it… sigh. It’s the thought that counts


streaming

wraps mdstream for live editing. instead of re-parsing the whole document on every keystroke, it tracks which blocks changed and only re-parses those. append(), finalize(), snapshot(), reset(). the common case (typing at the end) is fast.
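a toy version of the "typing at the end is fast" idea, std only. this does not use mdstream and does not reflect its API; `BlockStream` is a hypothetical type that just borrows the four method names above. blocks are split on blank lines (my assumption, and it would break fenced code containing blank lines), finished blocks are committed once, and only the trailing block is revisited on each append.

```rust
/// Hypothetical append-only block tracker.
#[derive(Default)]
pub struct BlockStream {
    committed: Vec<String>, // finished blocks, never looked at again
    pending: String,        // trailing block, still being typed
}

impl BlockStream {
    /// Add new text; any block closed by a blank line gets committed.
    pub fn append(&mut self, chunk: &str) {
        self.pending.push_str(chunk);
        while let Some(pos) = self.pending.find("\n\n") {
            let block = self.pending[..pos].to_string();
            self.pending = self.pending[pos + 2..].to_string();
            if !block.trim().is_empty() {
                self.committed.push(block);
            }
        }
    }

    /// Commit whatever is left in the trailing block.
    pub fn finalize(&mut self) {
        if !self.pending.trim().is_empty() {
            self.committed.push(std::mem::take(&mut self.pending));
        }
        self.pending.clear();
    }

    /// Current view: committed blocks plus the in-progress one.
    pub fn snapshot(&self) -> Vec<String> {
        let mut blocks = self.committed.clone();
        if !self.pending.trim().is_empty() {
            blocks.push(self.pending.clone());
        }
        blocks
    }

    /// Drop everything and start over.
    pub fn reset(&mut self) {
        self.committed.clear();
        self.pending.clear();
    }
}
```

a real streaming layer also has to handle edits in the middle of the document; this sketch only covers the append case.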


the quadtree fix

also fixed a subtle bug in Barnes-Hut. the force calculation was clamping distance AFTER the sqrt but using unclamped dist_sq in the same formula. moved the .max(1.0) to dist_sq so both values stay consistent. the old version could produce weirdly large forces for very close nodes.
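the shape of the bug, as a sketch. the inverse-square formula and the names `repulsion_before()`, `repulsion_after()` and `k` are assumptions for illustration (the actual force expression isn't shown here); the point is only where the `.max(1.0)` sits.

```rust
/// Before: the distance is clamped, but the magnitude still divides by
/// the raw squared distance, so very close nodes get a huge force.
pub fn repulsion_before(dx: f32, dy: f32, k: f32) -> (f32, f32) {
    let dist_sq = dx * dx + dy * dy;
    let dist = dist_sq.sqrt().max(1.0); // clamped
    let magnitude = k / dist_sq; // NOT clamped
    (dx / dist * magnitude, dy / dist * magnitude)
}

/// After: clamp the squared distance first, then derive the distance
/// from it, so both values agree.
pub fn repulsion_after(dx: f32, dy: f32, k: f32) -> (f32, f32) {
    let dist_sq = (dx * dx + dy * dy).max(1.0);
    let dist = dist_sq.sqrt();
    let magnitude = k / dist_sq;
    (dx / dist * magnitude, dy / dist * magnitude)
}
```

for two nodes 0.01 apart with `k = 1.0`, the first version divides by 0.0001 while the second divides by 1.0, which is the "weirdly large forces" effect.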

Yeah this is why i’m ripping it out, and replacing it


tests and sanitization

boring stuff, just writing the same thing with some tiny changes a lot of times


ten crates now. TEN CRATES??!?!!?

1h 10m 17s logged

Oh wow I locked in

so remember yesterday when I said “today was all architecture, all decisions, all research”?

yeah.

today I wrote the architecture. all of it. 27 files. 3,795 lines added (1,900 ish is just Cargo.lock btw). one commit.

I sat down to “maybe scaffold the embed crate” and then I just… didn’t stop. the hyperfixation hit and I looked up and every single crate had real code in it. not stubs. not //! penumbra-whatever. actual implementations with actual tests.

let me try to explain what happened.


three embedders walked into a crate

(no the title is NOT a punchline to a joke, however much it sounds like one)

penumbra-embed went from a one-line comment to three full embedding providers:

CandleEmbedder is the real one. it loads Snowflake-Arctic-Embed-XS from HuggingFace Hub, tokenizes input, runs it through a BERT-style forward pass (word embeddings, mean pooling, linear encoder), and L2-normalizes the output into a 384-dim vector. the model weights come from safetensors via candle_nn::VarBuilder::from_mmaped_safetensors. the whole Candle pipeline is behind a feature flag so you don’t pull in the ML universe if you don’t need it.
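the last two steps of that pipeline (mean pooling, then L2 normalization) are easy to show without Candle. this is a plain-`Vec` sketch with hypothetical helper names, not the tensor code in the crate.

```rust
/// Average per-token vectors into one vector (empty input gives an empty vector).
pub fn mean_pool(token_vecs: &[Vec<f32>]) -> Vec<f32> {
    let Some(first) = token_vecs.first() else {
        return Vec::new();
    };
    let mut out = vec![0.0f32; first.len()];
    for v in token_vecs {
        for (o, x) in out.iter_mut().zip(v) {
            *o += *x;
        }
    }
    let n = token_vecs.len() as f32;
    for o in &mut out {
        *o /= n;
    }
    out
}

/// Scale a vector to unit length in place (a zero vector is left alone).
pub fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

once every embedding has unit length, cosine similarity is just a dot product, which keeps the index side simple.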

SimpleEmbedder is the clever one. it slides character trigrams across the input, hashes each one, and accumulates them into buckets across a fixed-dimension vector. then L2-normalizes. it’s deterministic, fast, needs zero ML deps, and produces embeddings where similar text actually lands in similar regions of the vector space. not semantically meaningful in the way a transformer is, but way better than random for development and testing. (You know? Well actually probably you don’t know, most people didn’t spend all of yesterday doing a bunch of NLP and ML research lollll)
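a minimal sketch of the trigram-hashing idea. the function names, the lowercasing step and the FNV-1a-style hash over code points are my assumptions; the crate may hash and bucket differently.

```rust
/// Hash character trigrams into `dims` buckets, then L2-normalize.
pub fn embed_trigrams(text: &str, dims: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dims];
    if dims == 0 {
        return v;
    }
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    for window in chars.windows(3) {
        // FNV-1a-style hash, fixed constants so results are deterministic.
        let mut h: u64 = 0xcbf29ce484222325;
        for &c in window {
            h ^= c as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
        v[(h % dims as u64) as usize] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Dot product; equals cosine similarity for unit-length inputs.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

texts that share most of their trigrams land in mostly the same buckets, so their dot product is high; unrelated texts only overlap through hash collisions.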

NullEmbedder is the lazy one. returns a zero vector. exists for when you need the trait satisfied but don’t care about the output. sometimes you just need a stub and that’s fine. (aka a lot of tests, DO NOT USE THIS anywhere else… or maybeee hehe /j)

all three implement EmbeddingProvider from core. swap them at construction time. the rest of the system doesn’t know or care which one is running.
(i love doing that pattern. it’s just the best. Thanks, Saikuro)
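the swap-at-construction pattern, sketched. the real `EmbeddingProvider` trait lives in core and its signature isn't shown here (it may well be async), so the two methods below are assumptions, and `NoteIndexer` is a made-up consumer.

```rust
/// Assumed shape of the trait: synchronous, text in, vector out.
pub trait EmbeddingProvider {
    fn dims(&self) -> usize;
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// The stub: always a zero vector of the right length.
pub struct NullEmbedder {
    pub dims: usize,
}

impl EmbeddingProvider for NullEmbedder {
    fn dims(&self) -> usize {
        self.dims
    }
    fn embed(&self, _text: &str) -> Vec<f32> {
        vec![0.0; self.dims]
    }
}

/// Stand-in for "the rest of the system": it only sees the trait object.
pub struct NoteIndexer {
    provider: Box<dyn EmbeddingProvider>,
}

impl NoteIndexer {
    pub fn new(provider: Box<dyn EmbeddingProvider>) -> Self {
        Self { provider }
    }
    pub fn embed_note(&self, body: &str) -> Vec<f32> {
        self.provider.embed(body)
    }
}
```

swapping in a different provider means changing only the argument to `NoteIndexer::new`; nothing downstream needs to know.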


the vector index

penumbra-index wraps USearch with a clean VectorIndex trait: insert, remove, search, len, is_empty. the USearchIndex backend handles all the HNSW configuration. SearchHit carries a NoteId and a score.

the trait exists so I can mock the index in tests without spinning up the real HNSW structure every time. (learned that lesson from Honzo’s test suite.)
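a sketch of what such a mock could look like. the five method names come from the trait described above; the exact signatures, the `u64`-backed `NoteId` and the brute-force `MockIndex` are assumptions for illustration, and nothing here calls USearch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NoteId(pub u64); // stand-in; the real id type may differ

#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub id: NoteId,
    pub score: f32,
}

pub trait VectorIndex {
    fn insert(&mut self, id: NoteId, vector: Vec<f32>);
    fn remove(&mut self, id: NoteId) -> bool;
    fn search(&self, query: &[f32], k: usize) -> Vec<SearchHit>;
    fn len(&self) -> usize;
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

/// Brute-force index for tests: scores every vector by dot product.
#[derive(Default)]
pub struct MockIndex {
    items: Vec<(NoteId, Vec<f32>)>,
}

impl VectorIndex for MockIndex {
    fn insert(&mut self, id: NoteId, vector: Vec<f32>) {
        self.items.retain(|(existing, _)| *existing != id); // replace on re-insert
        self.items.push((id, vector));
    }
    fn remove(&mut self, id: NoteId) -> bool {
        let before = self.items.len();
        self.items.retain(|(existing, _)| *existing != id);
        self.items.len() != before
    }
    fn search(&self, query: &[f32], k: usize) -> Vec<SearchHit> {
        let mut hits: Vec<SearchHit> = self
            .items
            .iter()
            .map(|(id, v)| SearchHit {
                id: *id,
                score: v.iter().zip(query).map(|(a, b)| a * b).sum(),
            })
            .collect();
        hits.sort_by(|a, b| b.score.total_cmp(&a.score)); // best first
        hits.truncate(k);
        hits
    }
    fn len(&self) -> usize {
        self.items.len()
    }
}
```

brute force is exact, so tests written against the mock check the calling code's logic rather than HNSW's approximate recall.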


the graph got a cleanup

penumbra-graph didn’t get new features but it got a cargo fmt.

NoteId lost its redundant to_string() method because… it already derives Display. why did I write that. past me, explain yourself. what were you planning.
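why the method was redundant: anything that implements `Display` gets `to_string()` for free through the standard library's blanket `ToString` impl. a tiny sketch with a hand-written impl and a `u64` payload (both assumptions; a derive macro would generate an equivalent impl):

```rust
use std::fmt;

pub struct NoteId(pub u64); // stand-in payload; the real id type may differ

impl fmt::Display for NoteId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// No inherent `to_string()` needed: `ToString` is implemented for every `Display` type.
pub fn label(id: &NoteId) -> String {
    id.to_string()
}
```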


core got an Index error variant

PenumbraError::Index(String). three lines. because the index crate needed its own error type and I wasn’t going to abuse Embedding for it.
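roughly what that looks like, written out by hand with std only. core uses thiserror, which would generate the `Display` and `Error` impls from attributes; the message strings, the `String` payload on `Embedding` and the `check_dims()` helper are all assumptions for illustration.

```rust
use std::fmt;

#[derive(Debug)]
pub enum PenumbraError {
    Embedding(String),
    Index(String), // the new variant
}

impl fmt::Display for PenumbraError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PenumbraError::Embedding(msg) => write!(f, "embedding error: {msg}"),
            PenumbraError::Index(msg) => write!(f, "index error: {msg}"),
        }
    }
}

impl std::error::Error for PenumbraError {}

/// Hypothetical index-side check that reports through the new variant.
pub fn check_dims(expected: usize, got: usize) -> Result<(), PenumbraError> {
    if expected == got {
        Ok(())
    } else {
        Err(PenumbraError::Index(format!(
            "expected {expected} dims, got {got}"
        )))
    }
}
```

a separate variant means callers can match on where a failure came from instead of parsing message strings.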


the stuff I did but still isn’t fully done

the layout engine with Barnes-Hut quadtree (i tried a simpler manual implementation, but it’s icky, i’m switching to a different crate), the hybrid search engine (i should probably switch this to a usearch method or different crate idk), and the storage layer (c’est mostly bon) all got real implementations in this commit too. plus a full test suite: core_matrix.rs, embed_matrix.rs, events_matrix.rs, graph_matrix.rs, index_matrix.rs, search_matrix.rs, storage_matrix.rs. seven test files. the naming convention carries over from Honzo because it works and I’m not fixing what isn’t broken.


You might notice that I’m running one commit behind lol.
Welllll I am, yeah
Ooops
Yeahhhh devlogging after every commit is the best way to do it, but sometimes you just wanna code

47m 50s logged

The Architecture Day (aka 0.4h on Hackatime but like 10h in reality)

So uh.

Hi.

New project.

Today was one of those days where I didn’t really code. Like, Hackatime says 0.4 hours. ZERO POINT FOUR. That’s nothing.

But here’s the thing: I spent basically the entire day thinking. Researching. Drawing arrows between boxes in my head. Googling “is candle-core wasm compat” and “best wasm compatible tokio alternative, smol or futures” and going down rabbit holes about OPFS and #[cfg] gates and whether petgraph is pure Rust (it is, thank god).

And I think I came out the other side with something actually good?


the architecture

so Penumbra is a spatial notes app. notes live on a canvas. related notes pull toward each other. unrelated notes drift apart. the whole thing is powered by embeddings and force-directed layout and it should feel like a living map of your thoughts.

and the ENTIRE thing needs to work on web AND desktop without platform #[cfg] spaghetti. that’s the rule. no #[cfg(target_arch = "wasm32")] in any cross-cutting interface. if I have to write #[cfg] I’ve already lost.

here’s what I landed on:

penumbra-core        # domain types + traits. zero deps beyond serde/uuid/thiserror
penumbra-events      # async-channel event bus
penumbra-graph       # petgraph-based graph store
penumbra-layout      # ForceAtlas2 + Barnes-Hut
penumbra-embed       # Candle embedding provider
penumbra-index       # USearch vector index
penumbra-search      # hybrid search (vector + text + tags + temporal)
penumbra-storage     # OPFS-backed persistence
penumbra-sync        # cloud sync abstraction

nine crates. for a notes app. am I overengineering this? maybe. do I care? absolutely not. Saikuro taught me that clean crate boundaries make everything easier later. I’d rather have nine small crates with clear jobs than one megacrate that does everything and hates me. (learned never to entangle your code from HTMLPlayer v2 (yeah that still exists, but i kinda need to press the big Archive this Repo button, I’m making a better new one))


the dependency flow

this is the part I’m actually proud of:

everything depends on core. core depends on nothing interesting. the graph feeds into layout. embed feeds into index feeds into search. storage and sync are siblings. events is the bridge to the UI.

no cycles. no tangled imports. one-directional flow. I drew this like five times on paper before I was happy with it. (I should really learn to use Mermaid for everything but my paper and pen are faster for brainstorming, sue me)
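the "no cycles" claim is checkable. a small sketch that encodes the flow above as a dependency map and tries to produce a build order. reading "feeds into" as "is a dependency of" is my assumption, the edge list is only what's stated above, and `crate_deps()` and `build_order()` are hypothetical helpers.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// crate name -> crates it depends on (assumed from the description above).
pub fn crate_deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("penumbra-core", vec![]),
        ("penumbra-events", vec!["penumbra-core"]),
        ("penumbra-graph", vec!["penumbra-core"]),
        ("penumbra-layout", vec!["penumbra-core", "penumbra-graph"]),
        ("penumbra-embed", vec!["penumbra-core"]),
        ("penumbra-index", vec!["penumbra-core", "penumbra-embed"]),
        ("penumbra-search", vec!["penumbra-core", "penumbra-index"]),
        ("penumbra-storage", vec!["penumbra-core"]),
        ("penumbra-sync", vec!["penumbra-core"]),
    ])
}

/// Returns a valid build order, or None if there is a cycle
/// (or a dependency that is missing from the map).
pub fn build_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut order = Vec::new();
    while done.len() < deps.len() {
        // Everything whose dependencies are all already emitted is ready.
        let mut ready = Vec::new();
        for (name, ds) in deps {
            if !done.contains(name) && ds.iter().all(|d| done.contains(d)) {
                ready.push(*name);
            }
        }
        if ready.is_empty() {
            return None; // nothing can make progress
        }
        for name in ready {
            done.insert(name);
            order.push(name);
        }
    }
    Some(order)
}
```

if an import ever sneaks in that points back up the chain, `build_order()` returns `None` instead of an order.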


the WASM thing

okay so this was the big research rabbit hole. the whole point of Penumbra is true cross-platform. not “it works on desktop and we have a janky web version.” ACTUAL cross-platform. same code. same behavior. same everything.

the trick is picking libraries that just… work on both targets without needing platform gates:

  • serde/serde_json: universal. obviously.
  • uuid with the js feature: uses js-sys getRandomValues on wasm32. no #[cfg].
  • chrono with wasmbind: uses js-sys Date on wasm32. no #[cfg].
  • futures + async-trait: universal. no runtime dependency.
  • petgraph: pure Rust. no system deps. just works.
  • opfs: this crate is kind of magic. tokio::fs on native, browser OPFS on wasm. one API. the whole reason I don’t need a StorageProvider trait.
  • candle: wasm32 with SIMD backend. Snowflake-Arctic-Embed-XS quantized to GGUF.
  • usearch: wasm32 compatible. HNSW index with add/search/remove.
  • async-channel: universal. no runtime coupling.

I am unreasonably happy about this list.
