Pipeline creates Chunks with duplicate ids when executed multiple times #221
Comments
Hi @risafj, indeed, this behavior is quite annoying; we'll take a closer look. In the meantime, you can control this prefix by setting it in a LexicalGraphConfig. So your code will look like this:
from neo4j_graphrag.experimental.components.types import LexicalGraphConfig
config = LexicalGraphConfig(
    id_prefix="myPrefix",
)
await pipe.run(data={
    # ...
    "extractor": {
        # ...
        "lexical_graph_config": config,
    }
})
Let me know if you need more assistance. |
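When the pipeline runs in a loop, each document needs its own prefix. The following is a minimal sketch of one way to derive it. `document_id_prefix` is a hypothetical helper, not part of neo4j_graphrag, and the hashing scheme and example paths are assumptions chosen for illustration. Each resulting string would be passed as `id_prefix` to `LexicalGraphConfig`, as in the snippet above.

```python
import hashlib
from pathlib import PurePosixPath


def document_id_prefix(document_path: str) -> str:
    """Hypothetical helper: build a stable, document-specific prefix.

    The prefix combines the file stem with a short hash of the full path,
    so two files with the same name in different folders still differ.
    """
    digest = hashlib.sha256(document_path.encode("utf-8")).hexdigest()[:12]
    stem = PurePosixPath(document_path).stem
    return f"{stem}-{digest}"


# Example paths are made up for illustration.
paths = ["docs/a.pdf", "docs/b.pdf", "other/a.pdf"]
prefixes = [document_id_prefix(p) for p in paths]

for path, prefix in zip(paths, prefixes):
    # In the real loop, `prefix` would go into LexicalGraphConfig(id_prefix=prefix).
    print(path, "->", prefix)
```

Hashing the path keeps the prefix the same across re-runs of the same document. A random value such as `uuid.uuid4().hex` would also be unique, but it would change on every run.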
Are you using a custom entity and relation extractor? |
Hi @stellasia, thank you so much for the quick turnaround and helpful response! Your solution worked perfectly!
No, I'm using the one defined in this library:
from neo4j_graphrag.experimental.components.entity_relation_extractor import (
    LLMEntityRelationExtractor, OnError)
extractor = LLMEntityRelationExtractor(
    llm=llm,
    on_error=OnError.RAISE,
    prompt_template=custom_prompt,
) |
Thank you for raising the issue and the information, we will investigate this shortly. |
When I run the Pipeline() in a loop with multiple documents, a Chunk node with an id property of ":1" and an index of 1 is created for each run. This causes problems, since the ids are no longer unique.
For example, when the lexical graph gets created, a Chunk node with an id of ":1" has a NEXT_NODE relation to every Chunk node that has an id of ":2".
After running the pipeline with 4 documents, it looks like this:
The same issue is occurring with FROM_CHUNK, where an entity that's supposed to have a relation like
(n:Entity)-[:FROM_CHUNK]->(c:Chunk {id: ":1", index: "1"})
actually has that relation to all documents' chunks with an index of 1.
Is there any workaround for this?
I'm guessing this issue would be solved if I could somehow pass a document-specific id_prefix so each chunk gets a unique id?
neo4j-graphrag-python/src/neo4j_graphrag/experimental/components/lexical_graph.py
Lines 78 to 79 in bc6dd9c
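To illustrate the guess above, here is a small simulation in plain Python that does not call the library. It assumes chunk ids have the form `<id_prefix>:<index>`, which matches the ":1" and ":2" ids reported here but is not confirmed from the library source. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def chunk_ids(num_chunks: int, id_prefix: str = "") -> list[str]:
    # Assumed id scheme: "<prefix>:<index>"; an empty prefix yields ":0", ":1", ...
    return [f"{id_prefix}:{index}" for index in range(num_chunks)]


def duplicated_ids(runs: list[list[str]]) -> dict[str, int]:
    # Count how often each id appears across all pipeline runs
    # and keep only the ids that appear more than once.
    counts = Counter(chunk_id for run in runs for chunk_id in run)
    return {chunk_id: n for chunk_id, n in counts.items() if n > 1}


# Four documents with three chunks each, all using the same empty prefix.
same_prefix = [chunk_ids(3) for _ in range(4)]

# The same four documents, each with its own (made-up) prefix.
distinct_prefix = [chunk_ids(3, id_prefix=f"doc{n}") for n in range(4)]

print(duplicated_ids(same_prefix))      # every id is shared by all four runs
print(duplicated_ids(distinct_prefix))  # no id is shared
```

If relationships are created by matching on the id property, any id shared across runs would link chunks from different documents, which is consistent with the NEXT_NODE and FROM_CHUNK behavior described above.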
Additional info:
I use v1.2.0.
I have a standard pipeline setup that has these components.