I have been working on getting the LLM to compress more. I tried brute-forcing a bunch of seeds, and that only kind of worked.
I tried a few other leads, but no cigar.
Now I'm planning to have my predict-next-character step return the top 5 characters and check whether the correct character is among them. If it is, we use entropy/Huffman coding to write which one it is. The lower on the list, the less probable the character.
Example for 5 characters:
11 -> 1st result
10 -> 2nd
01 -> 3rd
001 -> 4th
000 -> 5th
If I return 4 characters, then my tree looks like this:
0 -> 1st
10 -> 2nd
110 -> 3rd
1110 -> 4th
The less probable characters have longer codes!
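Here is a rough sketch, in Python, of how the two tables above could be used. The function names are made up for illustration, ranks are 0-based (0 = 1st result), and bits are kept as a plain string of '0'/'1' characters just to keep it readable. It also checks that no code is a prefix of another, which is what lets the decoder read the stream without separators.

```python
# Code tables copied from the lists above; index 0 is the model's top guess.
CODES_5 = ["11", "10", "01", "001", "000"]
CODES_4 = ["0", "10", "110", "1110"]


def is_prefix_free(codes):
    """True if no code is a prefix of a different entry, so decoding is unambiguous."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(codes)
        for j, b in enumerate(codes)
    )


def encode_ranks(ranks, codes):
    """Concatenate the code for each rank (0 = top guess) into one bit string."""
    return "".join(codes[r] for r in ranks)


def decode_ranks(bits, codes):
    """Read bits left to right, emitting a rank whenever a full code matches."""
    lookup = {code: rank for rank, code in enumerate(codes)}
    ranks, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            ranks.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return ranks


if __name__ == "__main__":
    ranks = [0, 3, 1, 4]
    bits = encode_ranks(ranks, CODES_5)
    print(bits, decode_ranks(bits, CODES_5) == ranks)
```

This sketch only covers the case where the correct character is in the returned list.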
This does add the complexity of throwing off the bit count, since the codes no longer line up with byte boundaries. I can either just pop off bits, or I can pad the current byte to 8 bits before I encode a character.
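A minimal sketch of the padding option, in Python. The BitWriter class and its method names are hypothetical, and packing most-significant-bit first with zero padding is an assumption, not something decided yet.

```python
class BitWriter:
    """Collects bits and packs them into bytes, most significant bit first."""

    def __init__(self):
        self.bits = []

    def write(self, code):
        """Append a code given as a string of '0'/'1' characters."""
        self.bits.extend(int(c) for c in code)

    def pad_to_byte(self):
        """Add zero bits up to the next multiple of 8; return how many were added."""
        padding = -len(self.bits) % 8
        self.bits.extend([0] * padding)
        return padding

    def to_bytes(self):
        """Pad the final byte and pack everything into a bytes object."""
        self.pad_to_byte()
        out = bytearray()
        for i in range(0, len(self.bits), 8):
            byte = 0
            for bit in self.bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


if __name__ == "__main__":
    writer = BitWriter()
    writer.write("11")   # 1st result in the 5-character table
    writer.write("001")  # 4th result in the 5-character table
    print(writer.to_bytes().hex())
```

One thing to watch with zero padding: in the 4-character table, 0 is itself a valid code, so the decoder would read the padding as extra 1st results unless it also knows how many symbols (or how many padding bits) to expect.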