-
I came across ZeroEntropy-AI/llama-chunk: a new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B. I'm wondering what's missing from node-llama-cpp to support these kinds of advanced use cases. Also, I'm not an LLM expert, so I don't fully understand this sentence:
What does "simply pass" mean? Are logprobs generated for inputs? I thought they were only generated for outputs. Sorry for the noob questions.
-
Thanks for sharing; this is brilliant!
I recommend reading this documentation to get more background on how generation works. In essence, for a given sequence of tokens, the model generates a probability for each token in the vocabulary to be the next token. Generating a response to a prompt is an iterative process, but it's technically possible to generate probabilities for all sequence lengths in parallel, which would enable a massive speedup in performance.
I think it's worth having some API for these kinds of use cases in node-llama-cpp, but I don't think it's a good idea to expose the entire logprobs and other related low-level APIs for this, as it would affect the performance. I'll look into implementing some API for this chunking strategy.
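To make the idea above concrete, here is a minimal sketch of how per-position logprobs could drive chunking, assuming we already have, for every position in the tokenized document, the log-probability the model assigns to some designated "split" token at that position. This is an illustration of the general technique, not the exact algorithm llama-chunk uses, and the type names, helper names, and thresholds are hypothetical, not part of node-llama-cpp:

```ts
// Sketch: pick chunk boundaries from per-position log-probabilities of a
// hypothetical "split" token. All names and thresholds are illustrative.

type PositionLogprob = {
    tokenIndex: number;   // position in the tokenized document
    splitLogprob: number; // log P(split token | tokens[0..tokenIndex])
};

/**
 * Returns token indices to split at: local maxima of the split-token
 * log-probability that are above `minLogprob` and at least `minGap`
 * tokens apart, so chunks don't become too small.
 */
function pickSplitPoints(
    logprobs: PositionLogprob[],
    {minLogprob = Math.log(0.1), minGap = 64}: {minLogprob?: number, minGap?: number} = {}
): number[] {
    const splits: number[] = [];

    for (let i = 1; i < logprobs.length - 1; i++) {
        const {tokenIndex, splitLogprob} = logprobs[i];

        const isLocalMax = splitLogprob >= logprobs[i - 1].splitLogprob &&
            splitLogprob >= logprobs[i + 1].splitLogprob;
        const farEnoughFromLastSplit = splits.length === 0 ||
            tokenIndex - splits[splits.length - 1] >= minGap;

        if (isLocalMax && splitLogprob >= minLogprob && farEnoughFromLastSplit)
            splits.push(tokenIndex);
    }

    return splits;
}

// Usage: slice the tokenized document into chunks at the chosen indices
function splitTokensIntoChunks<T>(tokens: T[], splitPoints: number[]): T[][] {
    const chunks: T[][] = [];
    let start = 0;

    for (const splitPoint of [...splitPoints, tokens.length]) {
        if (splitPoint > start)
            chunks.push(tokens.slice(start, splitPoint));

        start = splitPoint;
    }

    return chunks;
}
```

The missing piece is where the `splitLogprob` values come from: that would require node-llama-cpp to expose the probabilities computed while evaluating the prompt tokens themselves, not just the ones it generates, which is exactly the API gap discussed above.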
-
First of all, thanks a lot to @Madd0g for sharing llama-chunk! I found it super helpful and inspiring. I'm working on a text-processing project, and the method you mentioned gave me some great ideas. I'd like to share my use case here, which is somewhat related and might be relevant to this discussion. I'm curious whether the maintainer has any thoughts or suggestions on it.