Devlog by @Sabrina

@Sabrina on LLM Text Compression · about 2 months ago

3h 0m 38s logged

I've been trying to get the LLM to compress better. I tried brute-forcing a bunch of seeds, and that only kind of worked.
I tried a few other leads too, but no cigar.

Now I'm planning on having my predict-next-character step return the top 5 characters, check whether the correct character is in there, and if so use entropy/Huffman coding to write which one it is. The lower a character is on the list, the less probable it is, so the longer its code.

Example for 5 characters:
11 -> 1st result
10 -> 2nd
01 -> 3rd
001 -> 4th
000 -> 5th
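
Here is a rough sketch of how the lookup for the 5-character table could work, assuming the model's top-5 list is already available as a plain Python list. The names `CODE_5`, `encode_rank`, and `decode_rank` are made up for illustration, and the sketch does not say what to write when the correct character is missing from the list.

```python
# Code table from the 5-character example: index 0 is the 1st result.
CODE_5 = ["11", "10", "01", "001", "000"]


def encode_rank(top5, actual):
    """Return the bit string for `actual`, or None if it is not in `top5`."""
    if actual not in top5:
        return None  # a miss needs some other fallback, not covered here
    return CODE_5[top5.index(actual)]


def decode_rank(bits, pos):
    """Read one code starting at bits[pos]; return (rank, new_pos)."""
    for rank, code in enumerate(CODE_5):
        if bits.startswith(code, pos):
            return rank, pos + len(code)
    raise ValueError("no code matches at this position")


# Hypothetical model output, best guess first.
top5 = ["e", "a", "o", "i", "u"]
bits = encode_rank(top5, "o") + encode_rank(top5, "i")  # 3rd, then 4th
```

Because no code in the table is the start of another code, the decoder can walk the bit string left to right without any separators between characters.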

If I return 4 characters, then my tree looks like this:
0 -> 1st
10 -> 2nd
110 -> 3rd
1110 -> 4th

The less probable characters get longer codes!
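
Both tables can be sanity-checked before wiring them in: no code may be the start of another code, otherwise the decoder cannot tell where one ends. The sketch below checks that, and also adds up 2^(-length) over the codes (the Kraft sum), which shows how much of the code space a table uses. `is_prefix_free` and `kraft_sum` are illustrative helper names, not part of the project.

```python
from fractions import Fraction

CODE_4 = ["0", "10", "110", "1110"]
CODE_5 = ["11", "10", "01", "001", "000"]


def is_prefix_free(codes):
    """True if no code is a prefix of a different code in the table."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def kraft_sum(codes):
    """Sum of 2**-len(code); exactly 1 means every bit pattern is in use."""
    return sum(Fraction(1, 2 ** len(code)) for code in codes)


print(is_prefix_free(CODE_4), kraft_sum(CODE_4))
print(is_prefix_free(CODE_5), kraft_sum(CODE_5))
```

The 4-character table sums to 15/16, so the pattern `1111` is never used. That leaves room either to shorten the 4th code to `111` or to keep `1111` free for something else, such as marking a character that was not in the list (that second use is only a suggestion here, not part of the plan above). The 5-character table sums to exactly 1, so it has no spare pattern.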
This does add the complexity of throwing off the bit count, since the codes no longer line up with byte boundaries, but I can either just pop off the extra bits or pad the current byte to 8 bits before I encode a character.
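
The padding option could look something like the sketch below. `BitWriter` is a hypothetical helper, and packing bits most-significant-bit first with zero padding is an assumption about the layout, not something the project has settled on.

```python
class BitWriter:
    """Collects individual bits and packs them into bytes, MSB first."""

    def __init__(self):
        self.out = bytearray()
        self.current = 0  # bits collected for the byte in progress
        self.count = 0    # how many bits are in `current`

    def write(self, bits):
        """Append a bit string such as "110"."""
        for b in bits:
            self.current = (self.current << 1) | int(b == "1")
            self.count += 1
            if self.count == 8:
                self.out.append(self.current)
                self.current, self.count = 0, 0

    def pad_to_byte(self):
        """Fill the byte in progress with zero bits; return how many were added."""
        if self.count == 0:
            return 0
        padding = 8 - self.count
        self.out.append(self.current << padding)
        self.current, self.count = 0, 0
        return padding


writer = BitWriter()
writer.write("01")   # 3rd result in the 5-character table
writer.write("001")  # 4th result
padding = writer.pad_to_byte()
```

One thing to watch with zero padding: in the 5-character table `000` is a valid code, so the decoder needs some way to know where the real data stops, for example a stored padding count or character count.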