-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
180 lines (176 loc) · 9.94 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale-1.0">
<title>Tiktokenizer.js</title>
<link rel="stylesheet" href="style.css">
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet"
integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.3.0/font/bootstrap-icons.css">
</head>
<body>
<main>
<div class="container mt-4">
<div class="row justify-content-between align-items-center">
<div class="col-4 px-0">
<h1>Tiktokenizer.js</h1>
</div>
<div class="col-4 px-2">
<select class="form-select" name="selector" id="model-selector">
<optgroup label="Tokenizers">
<option selected label="bpe" value="bpe"></option>
<!-- <option label="wpe" value="wpe"></option> -->
</optgroup>
<optgroup label="Models">
<option label="gpt-2 (bpe)" value="gpt-2"></option>
<!-- <option label="bert" value="bert"></option> -->
</optgroup>
</select>
</div>
</div>
<div class="row">
<div class="col left-col">
<!-- <div class="row mt-2 messages justify-content-between align-items-center">
<div class="col-3 px-0">
<select class="form-select" name="speaker" id="speaker">
<option label="User" value="User"></option>
<option selected label="System" value="Systen"></option>
<option label="Assistant" value="Assistant"></option>
<option label="Custom" value="Custom"></option>
</select>
</div>
<div class="col-8 px-2">
<textarea class="form-control resize-none" type="text" name="message" rows="1"
cols="45">You are a helpful assistant</textarea>
</div>
<div class="col-1 px-0 btn-div">
<button type="button" class="btn btn-close"></button>
</div>
</div>
<div class="row mt-2 messages justify-content-between align-items-center">
<div class="col-3 px-0">
<select class="form-select" name="speaker" id="speaker">
<option selected label="User" value="User"></option>
<option label="System" value="Systen"></option>
<option label="Assistant" value="Assistant"></option>
<option label="Custom" value="Custom"></option>
</select>
</div>
<div class="col-8 px-2">
<textarea class="form-control resize-none" type="text" name="message" rows="1"
cols="36"></textarea>
</div>
<div class="col-1 px-0 btn-div">
<button type="button" class="btn btn-close"></button>
</div>
</div>
<div class="row mt-4">
<button class="btn btn-dark">Add Message</button>
</div> -->
<div class="row mt-2">
<textarea class="form-control font-monospace p-3" type="text" name="message" id="message"
rows="12" cols="50" placeholder="Start typing here ..."></textarea>
</div>
</div>
<div class="col ps-3 right-col">
<div class="row mt-2 px-2 justify-content-between align-items-center">
<div class="col px-0 pe-0">
<div class="card">
<div class="card-body">
<h5 class="card-title">Token Count</h5>
<p class="card-text" id="tokenCount"></p>
</div>
</div>
</div>
<!-- <div class="col-6 px-0 ps-2">
<div class="card">
<div class="card-body">
<h5 class="card-title">Price per prompt</h5>
<p class="card-text" id="tokenPrice"></p>
</div>
</div>
</div> -->
</div>
<div class="row mt-3 px-2 color-color">
<div class="col px-0 card-container fixed">
<div class="card px-0 h-100">
<div class="card-body">
<div class="card-text font-monospace break-all">
<pre class="break-all unvcenter" id="output"></pre>
</div>
</div>
</div>
</div>
</div>
<div class="row mt-3 px-2 tokens">
<div class="col px-0 card-container">
<div class="card px-0 h-100">
<div class="card-body">
<div class="card-text font-monospace break-all" id="tokenIds">
<span>
</span>
</div>
</div>
</div>
</div>
</div>
<!-- <div class="row mt-3 px-2">
<div class="col px-0">
<input class="form-check-input" type="checkbox" name="whitespace" id="whitespace"
value="yes">
<label for="whitespace">Show whitespace</label>
</div>
</div> -->
</div>
</div>
<br>
<div class="col-12 px-0">
<p class="text-muted">
A custom tokenizer visualizer written in pure JavaScript that mirrors the functionality of <a
class="font-weight-bold text-dark" href=" https://platform.openai.com/tokenizer" target="_blank">OpenAI's
GPT-2/GPT-3
Byte
Pair Encoding (BPE) tokenizer </a> to showcase how text is tokenized
into subword
units. The <span class="font-weight-bold text-dark">encoder.json</span> and <span
class="font-weight-bold text-dark">vocab.bpe</span> files provided by OpenAI are used here so the tokens IDs
are
exactly matches the official BPE (GPT-2/GPT-3) representation. Currently, the dropdown on top is just a
placeholder
to add more schemes in future. Try different characters including ASCII, emojis, and non-English languages!
<br>
Did you notice any difference from the <a
class="font-weight-bold text-dark" href=" https://platform.openai.com/tokenizer" target="_blank">official tokenizer</a> 😎 Check out the <a class="font-weight-bold text-dark"
href="https://github.com/sinjoysaha/tiktokenizer-js">GitHub Repo</a> for this
project. See more projects <a class="font-weight-bold text-dark" href="https://sinjoysaha.github.io">here</a>.
</p>
</div>
<footer class="row mt-4 justify-content-between align-items-center">
<div class="col-md-4 px-0 built-by align-items-center">
<span class="text-muted">Inspired by </span><span><a class="font-weight-bold text-dark"
href="https://tiktokenizer.vercel.app/?model=gpt2" target="_blank">Tiktokenizer</a></span><span class="text-muted">.</span>
<span class="text-muted">Built by </span><span><a class="font-weight-bold text-dark"
href="https://sinjoysaha.github.io">Sinjoy Saha</a></span><span class="text-muted">.</span>
</div>
<ul class="nav col-md-4 px-0 justify-content-end list-unstyled d-flex">
<li class=""><a class="btn" href="https://github.com/sinjoysaha" target="_blank"><i class="bi bi-github"></i></a></li>
<li class=""><a class="btn" href="https://linkedin.com/in/sinjoysaha" target="_blank"><i class="bi bi-linkedin"></i></a></li>
<li class=""><a class="btn" href="https://twitter.com/sinjoysaha" target="_blank"><i class="bi bi-twitter"></i></a></li>
</ul>
</footer>
</div>
</main>
<script src="app.js"></script>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-TDS38QPM4D"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() { dataLayer.push(arguments); }
gtag('js', new Date());
gtag('config', 'G-TDS38QPM4D');
</script>
<!-- End of Google tag (gtag.js) -->
</body>
</html>