Devlog by @Sabrina

@Sabrina on LLM Text Compression · about 2 months ago

58m 14s logged

I got compression and decompression w/ the top 4 predictions working!! I encoded the indexes and then just slapped them into the bit stream. I had to pad the end of the bit stream with 0s to align it to a byte, but that's fine. I had to make a bit reader class to read it back and change up my decompressor code to read a byte and keep reading until we finish reading our index, but it was quite simple.
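A rough sketch of what that bit packing could look like, not the actual project code. It assumes each top-4 index fits in 2 bits and is written most significant bit first; the `BitWriter` and `BitReader` names and the sample `indexes` are made up for illustration.

```python
class BitWriter:
    """Collects bits and pads the tail with 0s to a whole byte (hypothetical helper)."""

    def __init__(self):
        self.bits = []

    def write(self, value, width):
        # Append `width` bits of `value`, most significant bit first.
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self):
        # Pad with 0s so the stream length is a multiple of 8.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


class BitReader:
    """Reads bits back out of a byte string (hypothetical helper)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits, not bytes

    def read(self, width):
        value = 0
        for _ in range(width):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Assumed example: five top-4 indexes, 2 bits each = 10 bits, padded to 2 bytes.
indexes = [0, 3, 1, 2, 2]
writer = BitWriter()
for idx in indexes:
    writer.write(idx, 2)
packed = writer.to_bytes()

reader = BitReader(packed)
decoded = [reader.read(2) for _ in range(len(indexes))]
```

One thing to watch with the 0 padding: the padded bits look exactly like index 0, so the decoder needs some other way to know how many indexes to read (here it is just `len(indexes)`).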

Now I've got my message being compressed to about 66 bytes, much better than the ~100 bytes from earlier with just raw prediction.
For reference the input text is 110 bytes.
The Huffman encoding website says it compresses the same text down to 441 bits, or ~56 bytes.

Right now I'm working on run-length encoding to compress the LLM prediction bitmask down. For my example message it is 16 bytes.
But since it is a bitmask there are lots of 0s and 1s in a row, so RLE should work wonders here. Hopefully I can reduce those 16 bytes down to 6 and then I'd be tied with the Huffman compression!
That would be cool... it means shrinking the mask to about 40% of its size (a ~60% reduction in bytes), so not impossible but we'll see how it goes.
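For reference, a minimal sketch of run-length encoding a bitmask, not the final implementation. It assumes the mask is a plain list of 0/1 values and stores alternating run lengths that always start with a run of 0s (so a mask that starts with 1 gets a leading run of length 0). The function names and the sample `mask` are made up, and how the run lengths get packed into bytes is left open.

```python
def rle_encode(bits):
    """Return run lengths, alternating 0-runs and 1-runs, starting with a 0-run."""
    runs = []
    current = 0  # the bit value of the run being counted
    count = 0
    for bit in bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current = bit
            count = 1
    runs.append(count)
    return runs


def rle_decode(runs):
    """Rebuild the bit list from alternating run lengths (first run is 0s)."""
    bits = []
    current = 0
    for count in runs:
        bits.extend([current] * count)
        current ^= 1  # flip between 0 and 1 for the next run
    return bits


# Assumed example mask: 1 = the LLM prediction was used, 0 = it was not.
mask = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
runs = rle_encode(mask)
restored = rle_decode(runs)
```

Whether this actually beats the raw 16 bytes depends on how many runs there are and how many bits each run length takes, so short runs could end up costing more than they save.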