jsonld: Do not merge nodes with different invalid URIs #3011

Open: progval wants to merge 1 commit into RDFLib:main from progval:invalid-uris

@progval (Contributor) commented Dec 16, 2024

Summary of changes

When parsing JSON-LD with invalid URIs in the @id, the generalized_rdf: True option allows parsing these nodes as blank nodes instead of outright rejecting the document.

However, all nodes with invalid URIs were mapped to the same blank node, resulting in incorrect data. For example, without this patch, the new test fails with:

AssertionError: Expected:
@prefix schema: <https://schema.org/> .

<https://example.org/root-object> schema:author [ schema:familyName "Doe" ;
            schema:givenName "Jane" ;
            schema:name "Jane Doe" ],
        [ schema:familyName "Doe" ;
            schema:givenName "John" ;
            schema:name "John Doe" ] .

Got:
@prefix schema: <https://schema.org/> .

<https://example.org/root-object> schema:author <> .

<> schema:familyName "Doe" ;
    schema:givenName "Jane",
        "John" ;
    schema:name "Jane Doe",
        "John Doe" .
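
To make the difference between the two outputs concrete, here is a minimal standalone sketch of the idea, not the code in this patch. It assumes a hypothetical `NodeMapper` helper and a simplified validity check based on a fixed set of forbidden characters. It also assumes that repeated occurrences of the same invalid `@id` should share one blank node, which the patch may or may not do.

```python
import itertools

# Assumption: a simplified notion of "invalid URI" (illustrative only),
# namely any string containing one of these characters.
INVALID_CHARS = set('<>" {}|\\^`')


def is_valid_uri(uri: str) -> bool:
    """Return True if the string contains none of the forbidden characters."""
    return not any(ch in INVALID_CHARS for ch in uri)


class NodeMapper:
    """Hypothetical helper: maps @id values to node labels.

    Valid URIs become URI references; each distinct invalid URI gets its
    own blank node label instead of all of them collapsing into one node.
    """

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._bnodes: dict[str, str] = {}

    def node_for(self, node_id: str) -> str:
        if is_valid_uri(node_id):
            return f"<{node_id}>"
        # Reuse the label when the same invalid @id is seen again,
        # allocate a fresh one otherwise.
        if node_id not in self._bnodes:
            self._bnodes[node_id] = f"_:b{next(self._counter)}"
        return self._bnodes[node_id]


mapper = NodeMapper()
root = mapper.node_for("https://example.org/root-object")
jane = mapper.node_for("https://example.org/jane doe")  # space makes it invalid
john = mapper.node_for("https://example.org/john doe")  # space makes it invalid
```

With this mapping, `jane` and `john` end up as two separate blank nodes, so the two authors keep their own `schema:givenName` and `schema:name` values, as in the "Expected" output above.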

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Checked that all tests and type checking passes.
  • If the change has a potential impact on users of this project:
    • Added or updated tests that fail without the change.
    • Updated relevant documentation to avoid inaccuracies.
    • Considered adding additional documentation. -> should this be documented in generalized_rdf's description? It's not clear to me what the spec says should happen to invalid URIs here
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

@coveralls
Coverage Status

coverage: 90.279% (+0.003%) from 90.276%
when pulling 65cd9da on progval:invalid-uris
into 228f3a1 on RDFLib:main.
