Ktoken is a BPE tokenizer designed for seamless integration with OpenAI's models.
Install Ktoken by adding the dependency to your `build.gradle` file:
```groovy
repositories {
    mavenCentral()
}

dependencies {
    implementation "com.aallam.ktoken:ktoken:0.4.0"
}
```
Then, get a tokenizer for an encoding or an OpenAI model, and encode or decode text:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)
// or, for a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4")

val tokens = tokenizer.encode("hello world")
val text = tokenizer.decode(listOf(15339, 1917))
```
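A typical use is counting how many tokens a prompt will consume before sending it to the API. A minimal sketch, assuming the package name follows the artifact coordinates and that `encode` returns the list of token ids (the `countTokens` helper is our own, not part of the library):

```kotlin
import com.aallam.ktoken.Tokenizer // package assumed from the artifact coordinates

// Hypothetical helper: count the tokens a prompt consumes for a given model.
suspend fun countTokens(text: String, model: String = "gpt-4"): Int {
    val tokenizer = Tokenizer.of(model = model)
    return tokenizer.encode(text).size
}
```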
Ktoken operates in two modes: Local (default for JVM) and Remote (default for JS/Native).
Use `LocalPbeLoader` to load encodings from local files:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.SYSTEM))
// or, for a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4", loader = LocalPbeLoader(FileSystem.SYSTEM))
```
The JVM artifact bundles the encoding files. Use `FileSystem.RESOURCES` to load them:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.RESOURCES))
```

Note: this is the default behavior for the JVM target.
For the remote mode:

- Add an engine: include one of Ktor's client engines in your dependencies (see the Gradle snippet after this list).
- Use `RemoteBpeLoader` to load encodings from a remote source:

```kotlin
val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = RemoteBpeLoader())
// or, for a specific model in the OpenAI API:
val tokenizer = Tokenizer.of(model = "gpt-4", loader = RemoteBpeLoader())
```
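For example, on the JVM you might add the OkHttp engine, the same artifact used in the BOM snippet below (any Ktor client engine works; the version here is illustrative):

```groovy
dependencies {
    // Ktor client engine used by RemoteBpeLoader for HTTP requests; OkHttp is one JVM option
    runtimeOnly "io.ktor:ktor-client-okhttp:2.3.5"
}
```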
Alternatively, you can use `ktoken-bom` by adding the following dependency to your `build.gradle` file:
```groovy
dependencies {
    // Import the ktoken BOM
    implementation platform('com.aallam.ktoken:ktoken-bom:0.4.0')

    // Define dependencies without versions
    implementation 'com.aallam.ktoken:ktoken'
    runtimeOnly 'io.ktor:ktor-client-okhttp'
}
```
For multiplatform projects, add the `ktoken` dependency to `commonMain`, and select an engine for each target, as sketched below.
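A minimal sketch of such a setup, assuming a Kotlin Multiplatform project with JVM and JS targets (the target set and engine choices are illustrative; pick whichever Ktor engine matches each of your targets):

```groovy
kotlin {
    jvm()
    js { nodejs() }

    sourceSets {
        commonMain {
            dependencies {
                // ktoken in common code
                implementation "com.aallam.ktoken:ktoken:0.4.0"
            }
        }
        jvmMain {
            dependencies {
                // Ktor engine for the JVM target (needed for remote loading)
                implementation "io.ktor:ktor-client-okhttp:2.3.5"
            }
        }
        jsMain {
            dependencies {
                // Ktor engine for the JS target
                implementation "io.ktor:ktor-client-js:2.3.5"
            }
        }
    }
}
```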
Ktoken is open-source software and distributed under the MIT license. This project is not affiliated with nor endorsed by OpenAI.