The default/internal tokenizer: GPT2Tokenizer vs tiktoken #3875
kerlion started this conversation in Suggestion
Replies: 0 comments
Some models do not expose tokenizer functionality, so Dify falls back to an internal/default tokenizer, GPT2Tokenizer, to encode text and count tokens.
I would like to ask: what are the benefits of using GPT2Tokenizer? Why not use tiktoken instead, which is faster and more general? Tiktoken supports not only GPT-2 but also text-embedding-ada-002, gpt-3.5-turbo, gpt-4, and other models.
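For illustration, a tiktoken-based token counter might look like the sketch below. The function name and the whitespace fallback are hypothetical, not Dify's actual internals; the fallback just keeps the helper usable when tiktoken or its cached vocabulary files are unavailable.

```python
def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Return an (approximate) token count for `text` under `model`.

    Hypothetical sketch: tries tiktoken first, then falls back to a
    rough whitespace estimate if tiktoken cannot be used.
    """
    try:
        import tiktoken
        try:
            # Pick the encoding registered for this model name.
            enc = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model: fall back to a general-purpose encoding.
            enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:
        # tiktoken missing or its vocabulary not downloadable:
        # crude lower-bound estimate based on whitespace splitting.
        return max(1, len(text.split()))
```

A caller would use it the same way regardless of which path is taken, e.g. `count_tokens("Hello, world!")`, which is the kind of drop-in interface a default tokenizer backend needs.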