Devlog by @Sabrina

@Sabrina on LLM Text Compression · about 2 months ago

25m 43s logged

I fixed the varint4 reading and writing, but as it stands right now it increases the size of the bit mask. I think I might've found a way to do without the bit mask, though.
The vocab is encoded in UTF-8, so for all my characters (a-z, A-Z, 0-9, and a space) only the lower 7 bits are used; the 8th bit is ALWAYS 0.
Because of this, a character's byte will NEVER start with a 1, meaning I can check whether the next bit in the compressed data is a 1, and if so, read an index for the LLM’s prediction! If it's a 0, the next 8 bits are just a literal character.
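
Here's a rough sketch of how that flag-bit idea could work in Python. Everything named here is made up for illustration: the fixed 4-bit index width (`INDEX_BITS`), the `BitWriter`/`BitReader` helpers, and the `("char", ...)` / `("pred", ...)` token tuples. The real format would presumably use varint4 for the index instead of a fixed width.

```python
INDEX_BITS = 4  # hypothetical fixed index width; the real format may use varint4


class BitWriter:
    def __init__(self):
        self.bits = []

    def write(self, value, width):
        # Append `width` bits of `value`, most significant bit first.
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)


class BitReader:
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def peek(self):
        # Look at the next bit without consuming it.
        return self.bits[self.pos]

    def read(self, width):
        # Read `width` bits, most significant bit first.
        value = 0
        for _ in range(width):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value

    def done(self):
        return self.pos >= len(self.bits)


def encode(tokens):
    # tokens: ("char", "a") for a literal, ("pred", 3) for a prediction index.
    writer = BitWriter()
    for kind, value in tokens:
        if kind == "char":
            code = ord(value)
            assert code < 128, "only 7-bit characters keep the top bit at 0"
            writer.write(code, 8)  # the leading 0 doubles as the flag bit
        else:
            assert 0 <= value < (1 << INDEX_BITS)
            writer.write(1, 1)  # flag bit: a prediction index follows
            writer.write(value, INDEX_BITS)
    return writer.bits


def decode(bits):
    reader = BitReader(bits)
    tokens = []
    while not reader.done():
        if reader.peek() == 1:
            reader.read(1)  # consume the flag bit
            tokens.append(("pred", reader.read(INDEX_BITS)))
        else:
            # The 0 is the top bit of the character's own byte.
            tokens.append(("char", chr(reader.read(8))))
    return tokens
```

With this layout a literal character still costs 8 bits and a prediction costs 1 + `INDEX_BITS` bits, with no separate bit mask. The sketch keeps the bits in a plain list, so it skips the question of padding the final byte, which a real byte stream would need to handle.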