bpe.c: a Byte-Pair Encoding tokenizer for training large language models on huge datasets. I don't know C yet, so most of the code comes from AI :D I hope to learn C by rewriting it and making changes, fixes, etc.
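
For anyone new to BPE, here is a rough sketch of what one training step looks like: find the most frequent adjacent pair of token ids, then replace that pair with a new id. This is not the code from this repo, just an illustration. The function names (`count_pair`, `most_frequent_pair`, `merge_pair`) are made up for the example, and the pair search is a naive O(n^2) scan that would be far too slow for huge datasets.

```c
#include <assert.h>
#include <stddef.h>

/* Count how often the adjacent pair (a, b) occurs in ids[0..n).
 * Overlapping occurrences are counted, so {a, a, a} contains (a, a) twice. */
size_t count_pair(const int *ids, size_t n, int a, int b) {
    size_t count = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (ids[i] == a && ids[i + 1] == b) {
            count++;
        }
    }
    return count;
}

/* Find the most frequent adjacent pair in ids[0..n).
 * Writes the pair to *out_a and *out_b and returns its count.
 * Returns 0 (and leaves the outputs untouched) if n < 2.
 * On ties, the pair that appears first in the sequence wins. */
size_t most_frequent_pair(const int *ids, size_t n, int *out_a, int *out_b) {
    size_t best = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        size_t c = count_pair(ids, n, ids[i], ids[i + 1]);
        if (c > best) {
            best = c;
            *out_a = ids[i];
            *out_b = ids[i + 1];
        }
    }
    return best;
}

/* Replace every non-overlapping occurrence of (a, b) with new_id,
 * scanning left to right and rewriting ids in place.
 * Returns the new length of the sequence. */
size_t merge_pair(int *ids, size_t n, int a, int b, int new_id) {
    assert(ids != NULL);
    size_t w = 0; /* write position, never ahead of the read position */
    size_t i = 0; /* read position */
    while (i < n) {
        if (i + 1 < n && ids[i] == a && ids[i + 1] == b) {
            ids[w++] = new_id;
            i += 2;
        } else {
            ids[w++] = ids[i++];
        }
    }
    return w;
}
```

Starting from the raw bytes of a text (ids 0..255), a training loop would call `most_frequent_pair`, then `merge_pair` with the next free id (256, 257, ...), and repeat until the vocabulary reaches the size you want. For the bytes of "aaabdaaabac", the first step picks the pair (97, 97), i.e. "aa".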