Which parts are missing to support doing advanced stuff like this chunking strategy? #392

Answered by giladgd
Madd0g asked this question in Q&A

Thanks for sharing; this is brilliant!

I recommend reading this documentation to get more background on how generation works. In essence, for a given sequence of tokens, the model generates a probability for each token in the vocabulary to be the next token. Generating a response to a prompt is an iterative process, but it's technically possible to generate the probabilities for all sequence lengths in parallel, which enables a massive speedup.

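To make that distinction concrete, here's a rough TypeScript sketch. It is not node-llama-cpp's actual API; `evaluateAllPositions` is a hypothetical stand-in for a single model forward pass that returns a next-token distribution for every position. It only illustrates why scoring a known sequence is much cheaper than generating it token by token:

```ts
type Token = number;

// Hypothetical: one forward pass over `tokens` that returns, for every
// position i, the model's probability distribution over the vocabulary
// for the token at position i + 1. Not part of node-llama-cpp's API.
type EvaluateAllPositions = (tokens: Token[]) => Promise<Float32Array[]>;

// Iterative generation: every new token requires another forward pass,
// so producing N tokens costs N passes.
async function generateIteratively(
    evaluate: EvaluateAllPositions,
    prompt: Token[],
    sampleNext: (probs: Float32Array) => Token,
    maxTokens: number
): Promise<Token[]> {
    const tokens = [...prompt];
    for (let i = 0; i < maxTokens; i++) {
        const perPosition = await evaluate(tokens);
        const nextTokenProbs = perPosition[perPosition.length - 1];
        tokens.push(sampleNext(nextTokenProbs));
    }
    return tokens;
}

// Scoring a known sequence (e.g. a candidate chunk) needs only a single
// pass: sum the log-probability the model assigned to the token that
// actually follows each position, using all positions from that one pass.
async function scoreSequence(
    evaluate: EvaluateAllPositions,
    tokens: Token[]
): Promise<number> {
    const perPosition = await evaluate(tokens);
    let totalLogProb = 0;
    for (let i = 0; i < tokens.length - 1; i++)
        totalLogProb += Math.log(perPosition[i][tokens[i + 1]]);
    return totalLogProb;
}
```

The second function is the parallel case described above: all per-position distributions come out of the same forward pass, so evaluating a whole sequence costs one pass instead of one pass per token.
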
I think it's worth having some API for these kinds of use cases in node-llama-cpp, but I don't think it's a good idea to expose the full logprobs and other related low-level APIs for this, as it would affect the perf…
