-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stricter lark grammar for notebook documents #143
base: main
Are you sure you want to change the base?
Conversation
i'm worried about merging this one when it does land. i think i'd feel better about the support if i ran the parser through some hypothesis schema jawn. |
this parser is super strict at the notebook and cell levels. after that the parser is free parse, but it will only pick up the source parts at the moment. i'd like to harden it up with more tests. some questions i have are:
|
If not architecturally impossible, a common pattern when making something stricter would be to build, test, and ship both parsers, and initially make the strict one opt-in, then the default, then, maybe, deprecate the old one. The parsers could be registered with In addition to clear docs on how that works in e.g. with Notebook():
import a_somehow_bad_notebook
# UserWarning:
# a_somehow_bad_notebook failed strict parsing, on line {line}, column {line}
#
# {lark error}
#
# The loose parser was used, but {features} will not be available. Silence this warning with:
#
# with Notebook(parser='loose'):
#
# see https://github.com/deathbeds/importnb/issues/{number} Then offer: with Notebook(parser="loose"):
import a_somehow_bad_notebook And allow opting-in to hard fails: with Notebook(parser="strict"):
import a_somehow_bad_notebook
# ImportnbParseError: a_somehow_bad_notebook.ipynb failed to parse on line {line}, column {line}
# {lark error} Where
Yes, But the above is irrelevant until the existing tests all run cleanly, ideally without doing anything to disk, |
yea i love this idea. i think i can manage it in this PR. we'll have to build two different standalone parsers to ship for a few months then sunset the more permissive one. ultimately, i think this should work out cause both the notebook and cell levels define |
Yeah, this doesn't seem like a huge hit to avoid breaking things, and getting new features.
Right, as long as there is clear messaging from the as-installed software that this is going to happen, and a timeline/trigger (e.g. first release that supports
Sure, as long as not losing features for the 99% use case of So if some use case requires going wild, it could subclass the machinery (even interactively), by overloading the right level of the grammar/decoder/parser/transformer/whatever (these need some If the anticipated needs are very predictable, having a more constrained API A more formalized system could package and register it by normal means, which should be kept if added.
Changes like that happen slowly, but not worth losing features. We could add an |
this pull request modifies the json parser to use explicit properties from the notebook format to constrain the parser.
the top-level notebook keys and cell keys are known ahead of time because of the schema that notebooks conform to. these changes hard code the notebook and cell properties into the parser.
this change will NOT work for arbitrary data formats, only those conforming to subsets of the specified schema. otherwise the parser throws errors.
this change streamlines the transformation logic.