Kotlin multiplatform BPE tokenizer library for OpenAI models

aallam/ktoken

Ktoken

Ktoken is a BPE tokenizer designed for seamless integration with OpenAI's models.

📦 Setup

Install Ktoken by adding the dependency to your build.gradle file:

repositories {
    mavenCentral()
}

dependencies {
    implementation "com.aallam.ktoken:ktoken:0.4.0"
}

⚡️ Getting Started

val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)
// Alternatively, for a specific model in the OpenAI API:
// val tokenizer = Tokenizer.of(model = "gpt-4")

val tokens = tokenizer.encode("hello world")
val text = tokenizer.decode(listOf(15339, 1917))
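A complete program built from the snippet above might look like the following. This is a minimal sketch under a few assumptions: the import paths `com.aallam.ktoken.Tokenizer` and `com.aallam.ktoken.Encoding` are inferred from the artifact coordinates, `Tokenizer.of` is treated as a suspending call (hence `runBlocking`, which requires `kotlinx-coroutines-core` on the classpath), and `roundTrip` is a hypothetical helper introduced only for illustration.

```kotlin
import com.aallam.ktoken.Encoding   // assumed package, inferred from the Maven group ID
import com.aallam.ktoken.Tokenizer  // assumed package, inferred from the Maven group ID
import kotlinx.coroutines.runBlocking

// Hypothetical helper: encodes the text into token IDs, then decodes them back.
// Marked suspend on the assumption that Tokenizer.of may suspend while loading the encoding.
suspend fun roundTrip(text: String): String {
    val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)
    val tokens: List<Int> = tokenizer.encode(text)
    println("${tokens.size} tokens: $tokens")
    return tokenizer.decode(tokens)
}

fun main() = runBlocking {
    // Decoding the encoded tokens should reproduce the input text.
    println(roundTrip("hello world"))
}
```

On the JVM this should work without extra configuration, since the JVM artifact bundles the encoding files; on other targets a loader and a Ktor engine would also be needed.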

⚙️ Usage Modes

Ktoken operates in two modes: Local (default for JVM) and Remote (default for JS/Native).

📍 Local Mode

Use LocalPbeLoader to load encodings from local files:

val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.SYSTEM))
// Alternatively, for a specific model in the OpenAI API:
// val tokenizer = Tokenizer.of(model = "gpt-4", loader = LocalPbeLoader(FileSystem.SYSTEM))

JVM Specifics:

Artifacts for JVM include encoding files. Use FileSystem.RESOURCES to load them:

val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.RESOURCES))

Note: this is the default behavior for JVM.

🌐 Remote Mode

  1. Add Engine: Include one of Ktor's engines in your dependencies.
  2. Use RemoteBpeLoader: Load encodings from remote sources:

val tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = RemoteBpeLoader())
// Alternatively, for a specific model in the OpenAI API:
// val tokenizer = Tokenizer.of(model = "gpt-4", loader = RemoteBpeLoader())

📋 BOM Usage

Alternatively, use ktoken-bom to keep artifact versions aligned by adding the following to your build.gradle file:

dependencies {
    // Import the ktoken BOM
    implementation platform('com.aallam.ktoken:ktoken-bom:0.4.0')

    // Define dependencies without versions
    implementation 'com.aallam.ktoken:ktoken'
    runtimeOnly 'io.ktor:ktor-client-okhttp'
}

🔀 Multiplatform Projects

For multiplatform projects, add the ktoken dependency to commonMain, and select an engine for each target.
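A multiplatform setup along these lines could be sketched in `build.gradle.kts` as follows. The targets shown, the Kotlin plugin version, and the Ktor version are illustrative assumptions, not values taken from this project; pick the versions that match your build. The Ktor engine artifacts (`ktor-client-okhttp`, `ktor-client-js`, `ktor-client-darwin`) are standard Ktor client engines.

```kotlin
// build.gradle.kts (illustrative sketch; versions and targets are assumptions)
plugins {
    kotlin("multiplatform") version "1.9.22" // assumed Kotlin version
}

repositories {
    mavenCentral()
}

kotlin {
    jvm()
    js(IR) { browser() }
    iosArm64()

    val ktorVersion = "2.3.7" // assumed Ktor version

    sourceSets {
        // Shared code depends on ktoken only.
        val commonMain by getting {
            dependencies {
                implementation("com.aallam.ktoken:ktoken:0.4.0")
            }
        }
        // Each target then selects its own Ktor engine.
        val jvmMain by getting {
            dependencies {
                implementation("io.ktor:ktor-client-okhttp:$ktorVersion")
            }
        }
        val jsMain by getting {
            dependencies {
                implementation("io.ktor:ktor-client-js:$ktorVersion")
            }
        }
        val iosArm64Main by getting {
            dependencies {
                implementation("io.ktor:ktor-client-darwin:$ktorVersion")
            }
        }
    }
}
```

Since the JVM target defaults to local mode, its engine is only needed there if RemoteBpeLoader is used.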

📄 License

Ktoken is open-source software distributed under the MIT license. This project is not affiliated with or endorsed by OpenAI.