RUMI Devlog #8 🍓
Current status: RUMI's smarter, my codebase survived, and my sanity is questionable 💔
A huge part of this session wasn’t actually spent improving discovery quality.
It was spent improving everything around RUMI.
As RUMI’s discovery pipeline keeps getting bigger, debugging has become a nightmare. Reading raw JSON reports, searching through logs, and manually inspecting hundreds of entities were all getting painful.
So I built a full visualization and monitoring system for RUMI.
The dashboard can now visualize:
discovery reports
theory competitions
adversarial testing results
predictions
refinement stages
scoring breakdowns
knowledge graphs
paper collections
hypothesis memory
all from a single interface.
Knowledge Graph Visualization
One thing I’ve wanted for a while was being able to actually SEE what RUMI was discovering.
Not read it.
See it.
So I built an interactive graph explorer using vis.js.
Every entity type gets its own visual identity.
Nodes glow based on type.
Edges scale with confidence.
Hovering shows metadata, paper counts, and relationships.
The graph uses vis.js's ForceAtlas2-based physics solver with overlap avoidance to make large discovery graphs easier to navigate.
Watching hundreds of concepts connect together is honestly way more useful than reading raw graph dumps.
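For anyone curious how that maps onto vis.js, here is a rough sketch of how the graph payload could be built on the Python side and handed to the browser as JSON. The entity fields (`id`, `name`, `type`, `paper_count`), the relation fields, the color table, and `build_graph` itself are all made up for illustration; RUMI's real schema isn't shown in this post. The option names (`forceAtlas2Based`, `avoidOverlap`, edge `value` with `scaling`, node `shadow`, `title` tooltips) are standard vis-network options.

```python
import json

# Hypothetical per-type colors; the real entity types are not listed in the post.
TYPE_COLORS = {"concept": "#ff6b9d", "mechanism": "#6bc5ff", "paper": "#ffd56b"}
DEFAULT_COLOR = "#cccccc"


def build_graph(entities, relations):
    """Turn entity/relation dicts into a vis-network style payload."""
    nodes = []
    for e in entities:
        color = TYPE_COLORS.get(e["type"], DEFAULT_COLOR)
        nodes.append({
            "id": e["id"],
            "label": e["name"],
            "group": e["type"],
            "color": color,
            # A colored shadow gives the "glow" effect per entity type.
            "shadow": {"enabled": True, "color": color, "size": 15},
            # "title" is what vis-network shows in the hover tooltip.
            "title": f"{e['type']} | {e.get('paper_count', 0)} papers",
        })

    edges = [
        {
            "from": r["source"],
            "to": r["target"],
            # vis-network scales edge width from "value" using edges.scaling.
            "value": r["confidence"],
            "title": f"confidence {r['confidence']:.2f}",
        }
        for r in relations
    ]

    options = {
        "edges": {"scaling": {"min": 1, "max": 8}},
        "physics": {
            "solver": "forceAtlas2Based",
            # avoidOverlap ranges from 0 to 1; 0.6 is an arbitrary starting point.
            "forceAtlas2Based": {"avoidOverlap": 0.6},
        },
    }
    return {"nodes": nodes, "edges": edges, "options": options}


if __name__ == "__main__":
    demo = build_graph(
        [{"id": 1, "name": "A", "type": "concept", "paper_count": 2},
         {"id": 2, "name": "B", "type": "mechanism"}],
        [{"source": 1, "target": 2, "confidence": 0.8}],
    )
    print(json.dumps(demo, indent=2))
```

On the browser side, the `nodes` and `edges` lists would go into `vis.DataSet` objects and `options` would be passed straight to `new vis.Network(...)`.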
🥀
Discovery Report Explorer
RUMI reports have become massive.
Some are hundreds of kilobytes.
Some are over a thousand lines.
Reading those manually sucks.
So I built dedicated views for:
theory competition
adversarial evaluation
predictions
scoring
derivations
peer review
refinement stages
which makes analyzing runs significantly easier.
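As a rough idea of what sits behind those views, here is a minimal sketch that splits one big report into per-view summaries. The section keys are just the view names above turned into snake_case, and `summarize_report` is a hypothetical helper; the actual RUMI report schema isn't shown here, so treat every key as an assumption.

```python
import json

# Assumed top-level keys, one per view listed above (not the confirmed schema).
SECTIONS = [
    "theory_competition",
    "adversarial_evaluation",
    "predictions",
    "scoring",
    "derivations",
    "peer_review",
    "refinement_stages",
]


def summarize_report(report):
    """Return a small per-section summary: presence, item count, serialized size."""
    summary = {}
    for key in SECTIONS:
        value = report.get(key)
        if value is None:
            summary[key] = {"present": False, "items": 0, "bytes": 0}
            continue
        # Lists and dicts report their length; anything else counts as one item.
        items = len(value) if isinstance(value, (list, dict)) else 1
        size = len(json.dumps(value).encode("utf-8"))
        summary[key] = {"present": True, "items": items, "bytes": size}
    return summary


if __name__ == "__main__":
    demo_report = {"predictions": [{"text": "p1"}, {"text": "p2"}], "scoring": {"total": 0.5}}
    for section, info in summarize_report(demo_report).items():
        print(section, info)
```

A summary like this is enough to decide which tabs to render and which sections are heavy enough to lazy-load.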
Provider Stack Surgery 💔
While testing the new recurrent architecture, I discovered a huge bottleneck.
Xiaomi MiMo.
It was slow.
Constantly rate limiting.
Burning through provider chains.
And starving downstream phases.
This turned out to be the reason molecule generation kept randomly failing.
So MiMo got removed from the routing stack.
After testing 17 API keys across all providers:
13 healthy
4 problematic
Current routing stack includes:
NVIDIA DeepSeek
Kimi K2.6
Cerebras GPT-OSS-120B
Groq
Gemini
Fireworks
Still chasing down some timeout and rate-limit issues, but overall the stack is way healthier now.
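The general idea behind keeping one bad provider from starving everything downstream can be sketched as a priority-ordered fallback chain with a cooldown. This is not RUMI's actual router: `ProviderRouter`, `RateLimited`, and the 30 second cooldown are hypothetical names and values for illustration, and each provider is just a plain callable standing in for a real API client.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a provider call raises when it gets rate limited."""


class ProviderRouter:
    """Try providers in priority order, benching any that time out or rate limit."""

    def __init__(self, providers, cooldown_s=30.0, clock=time.monotonic):
        self.providers = list(providers)  # (name, callable) pairs, best first
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.blocked_until = {}  # provider name -> time it may be retried

    def complete(self, prompt):
        errors = []
        for name, call in self.providers:
            # Skip providers that are still cooling down after a recent failure.
            if self.clock() < self.blocked_until.get(name, float("-inf")):
                continue
            try:
                return name, call(prompt)
            except (RateLimited, TimeoutError) as exc:
                # Bench this provider so later phases don't keep waiting on it.
                self.blocked_until[name] = self.clock() + self.cooldown_s
                errors.append(f"{name}: {exc!r}")
        raise RuntimeError("no provider available: " + "; ".join(errors))


if __name__ == "__main__":
    def flaky(prompt):
        raise RateLimited("429")

    def steady(prompt):
        return f"echo: {prompt}"

    router = ProviderRouter([("flaky", flaky), ("steady", steady)])
    print(router.complete("hello"))
```

The cooldown is the important part: without it, every request pays the slow provider's timeout before falling through to the next one in the chain.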
Discovery Testing
Used the upgraded architecture to stress test RUMI on several difficult domains.
One of the most interesting runs was Molecular Glue Drug Discovery.
Current run stats:
16 hidden variables
15 mechanisms per loop
7 predictions
multiple adversarial survivors
recurrent refinement loops enabled
The architecture is definitely producing stronger outputs than before.
meow meow meow