nanochat
Overview
Tokenizer
rustbpe: ligthtweight BPE tokenizer in Rust
tiktoken: fast BPE tokenizer in Rust with Python bindings by OpenAI
import tiktoken
enc = tiktoken.get_encoding("o200k_base")
assert enc.decode(enc.encode("hello world")) == "hello world"
# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-4o")
minbpe: both training and inference in inefficient Python