Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot cope with ISO8601-compliant BCE dates #2210

Open
wxwilcke opened this issue Jan 19, 2023 · 5 comments
Open

Cannot cope with ISO8601-compliant BCE dates #2210

wxwilcke opened this issue Jan 19, 2023 · 5 comments
Labels
core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` enhancement New feature or request warnings Related to warnings.

Comments

@wxwilcke
Copy link

wxwilcke commented Jan 19, 2023

Literal values which represent dates or years before BCE (ie negative years) are no longer supported due to limitations of the newly added isodate dependency. This is a regression with the previous behaviour.

I assume that the project is using the isodate package, yet that package explicitly mentions that it does not support negative values:

[...] it is not possible to convert all possible ISO 8601 dates/times. For instance, dates before 0001-01-01 are not allowed [...]

Example:

  import rdflib
  year = rdflib.Literal("-0001", datatype=rdflib.namespace.XSD.gYear)
  
  Failed to convert Literal lexical form to value. Datatype=http://www.w3.org/2001/XMLSchema#gYear, Converter=<function parse_date at 0x7f28d3830d30>
  Traceback (most recent call last):
    File "/data/projects/mrgcn/python310venv/lib/python3.10/site-packages/rdflib/term.py", line 2084, in _castLexicalToPython
      return conv_func(lexical)  # type: ignore[arg-type]
    File "/data/projects/mrgcn/python310venv/lib/python3.10/site-packages/isodate/isodates.py", line 203, in parse_date
      raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
  isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: '-0001

Excepted behaviour:

I expect all values that comply with the Seven-Property-Model or the ISO 8601 standard to be supported. This includes all positive years from 0001 to 9999 and all negative years from -0001 to -9999, and all dates that specify those years.

Possible solution:

Since RDF enjoys a large adoption in historical domains I think it is unacceptable to exclude dates before BCE. Yet I understand that you want to wrap dates to make processing them by users more convenient. I therefore suggest to add a parameter to_python=[T|F] to the parser to enable users to make this decision for themselves whether they want to convert their literal values to internal Python constructs. Libraries like NumPy already do something similar. This also solves the loss of accuracy with numbers such as "001", which get converted to just "1".

@aucampia aucampia added bug Something isn't working core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` labels Mar 19, 2023
@aucampia
Copy link
Member

Hi @wxwilcke, this does seem like a bug, I think it should only generate a warning, not an exception. An exception resulting from parsing negative dates seems like a violation of https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal to me.

@aucampia aucampia self-assigned this Mar 19, 2023
@aucampia
Copy link
Member

aucampia commented Mar 20, 2023

@wxwilcke what you are seeing is a warning being logged, not an error. The literal is constructed, it just does not have literal value associated with it, so it will only have a lexical form.

This behaviour was introduced with #1944, and is in line with the W3C spec:

https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

Otherwise, the literal is ill-typed and no literal value can be associated with the literal. Such a case produces a semantic inconsistency but is not syntactically ill-formed. Implementations must accept ill-typed literals and produce RDF graphs from them. Implementations may produce warnings when encountering ill-typed literals

Some options available are:

As this is not a bug but compliant behaviour, I will close this issue, but I will leave it open for some feedback first.

@aucampia aucampia added marked for closing The issue or PR will be closed soon if no further feedback is provided. works as intended Indicates that the reported behavior is what is intended. and removed bug Something isn't working critical labels Mar 20, 2023
@aucampia aucampia removed their assignment Mar 20, 2023
@wxwilcke
Copy link
Author

@aucampia I agree that printing a warning and continuing on ill-typed values is compliant behaviour and normally not an issue. Here, however, it is unclear that this is merely a warning, since all that the user sees is the error raised by the isodate parser. You even failed to notice it at first. Likewise, it is unclear whether the literal is added or omitted. The only way a user could know for sure is by manually checking. It would therefore be very helpful if this would become more evident from the printed message.

Irrespective, negative values are within the value space of the gYear datatype and should not treated as ill-typed. Of course, your workaround avoids the warning, but it still shouldn't have been raised in the first place.

@aucampia aucampia added enhancement New feature or request and removed marked for closing The issue or PR will be closed soon if no further feedback is provided. works as intended Indicates that the reported behavior is what is intended. labels Mar 25, 2023
@aucampia
Copy link
Member

@aucampia I agree that printing a warning and continuing on ill-typed values is compliant behaviour and normally not an issue. Here, however, it is unclear that this is merely a warning, since all that the user sees is the error raised by the isodate parser. You even failed to notice it at first. Likewise, it is unclear whether the literal is added or omitted. The only way a user could know for sure is by manually checking. It would therefore be very helpful if this would become more evident from the printed message.

I agree the behaviour is not ideal, we will look into what we can do to make what is happening clearer.

Irrespective, negative values are within the value space of the gYear datatype and should not treated as ill-typed. Of course, your workaround avoids the warning, but it still shouldn't have been raised in the first place.

Having re-read that section of the spec I think you are right, in this case it should not be ill-typed, but there is very little we can do in the short term to really alleviate it as far as I can see, we would need to swap out isodate (which we anyway should) for something better, but this will only really be acceptable in RDFLib 7. Happy for pull requests to do this though (for info on public interface management, see this)

@PaulBenn-UnlikelyAI
Copy link

For anyone who comes across this in the future, @aucampia mentioned above that we can:

Use features from python's logging subsystem to ignore the warnings

In practice, here's what that looks like:

import logging
import rdflib

class SuppressYearBeforeCommonEraWarning(logging.Filter):
    def filter(self, record):
        message = record.getMessage()
        if record.exc_info is None:
            return True
        _error_type, error_payload, _error_traceback = record.exc_info
        should_suppress = message.startswith("Failed to convert Literal lexical form to value") and "Invalid isoformat string: '-" in str(error_payload)
        return not should_suppress

# Suppress rdflib warnings caused by converting BCE date-times
rdflib_logger = logging.getLogger("rdflib.term")
rdflib_logger.addFilter(SuppressYearBeforeCommonEraWarning())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` enhancement New feature or request warnings Related to warnings.
Projects
None yet
Development

No branches or pull requests

3 participants