Error attempting to process file with ENTITY #398
Hey, all. I love this library! It has a simple, usable API and is very easy to learn. I am having one problem importing a file that uses an entity.

I'm in the process of developing a Python library to convert NIST metaschema specifications to Python classes. The core schema is here: https://github.com/usnistgov/metaschema/tree/main/schema/xml

I am attempting to process a specification for OSCAL: https://github.com/usnistgov/OSCAL/tree/main/src/metaschema

To pick one example of a file that is giving me problems, we can choose the catalog. I have downloaded all of the schema and XML files and I am processing them locally. I've extracted the relevant portions below:

...
<!DOCTYPE METASCHEMA [
<!ENTITY allowed-values-control-group-property-name SYSTEM "shared-constraints/allowed-values-control-group-property-name.ent">
]>
...
<allowed-values target="prop[has-oscal-namespace('http://csrc.nist.gov/ns/oscal')]/@name">
&allowed-values-control-group-property-name;
</allowed-values>
...

The content of the entity file is as follows:

<enum xmlns="http://csrc.nist.gov/ns/oscal/metaschema/1.0" value="label">A human-readable label for the parent context, which may be rendered in place of the actual identifier for some use cases.</enum>
<enum xmlns="http://csrc.nist.gov/ns/oscal/metaschema/1.0" value="sort-id">An alternative identifier, whose value is easily sortable among other such values in the document.</enum>
<enum xmlns="http://csrc.nist.gov/ns/oscal/metaschema/1.0" value="alt-identifier">An alternate or aliased identifier for the parent context.</enum>

With the following code:

import xmlschema

metaschema_schema = xmlschema.XMLSchema(
    source="/workspaces/metaschema-python/metaschema/schema/xml/metaschema.xsd",
    defuse="never",
)
print(
    str(
        metaschema_schema.is_valid(
            "/workspaces/metaschema-python/OSCAL/src/metaschema/oscal_catalog_metaschema.xml"
        )
    )
)
metaschema_schema.validate(
    source="/workspaces/metaschema-python/OSCAL/src/metaschema/oscal_catalog_metaschema.xml",
    path="/workspaces/metaschema-python/OSCAL/src/metaschema/",
)

I receive an error:

Traceback (most recent call last):
  File "/workspaces/metaschema-python/metaschema-python/metaschema_python/test_xmlschema.py", line 10, in <module>
    metaschema_schema.is_valid(
  File "/home/vscode/.pyenv/versions/3.9.19/envs/metaschema-python/lib/python3.9/site-packages/xmlschema/validators/schemas.py", line 1718, in is_valid
    error = next(self.iter_errors(source, path, schema_path, use_defaults,
  File "/home/vscode/.pyenv/versions/3.9.19/envs/metaschema-python/lib/python3.9/site-packages/xmlschema/validators/schemas.py", line 1742, in iter_errors
    resource = XMLResource(source, defuse=self.defuse, timeout=self.timeout)
  File "/home/vscode/.pyenv/versions/3.9.19/envs/metaschema-python/lib/python3.9/site-packages/xmlschema/resources.py", line 258, in __init__
    self.parse(source, lazy)
  File "/home/vscode/.pyenv/versions/3.9.19/envs/metaschema-python/lib/python3.9/site-packages/xmlschema/resources.py", line 614, in parse
    self._parse_resource(resource, url, lazy)
  File "/home/vscode/.pyenv/versions/3.9.19/envs/metaschema-python/lib/python3.9/site-packages/xmlschema/resources.py", line 578, in _parse_resource
    self._parse(resource)
  File "/home/vscode/.pyenv/versions/3.9.19/envs/metaschema-python/lib/python3.9/site-packages/xmlschema/resources.py", line 548, in _parse
    for event, node in ElementTree.iterparse(resource, events):
  File "/home/vscode/.pyenv/versions/3.9.19/lib/python3.9/xml/etree/ElementTree.py", line 1253, in iterator
    yield from pullparser.read_events()
  File "/home/vscode/.pyenv/versions/3.9.19/lib/python3.9/xml/etree/ElementTree.py", line 1324, in read_events
    raise event
  File "/home/vscode/.pyenv/versions/3.9.19/lib/python3.9/xml/etree/ElementTree.py", line 1296, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: undefined entity &allowed-values-control-group-property-name;: line 145, column 12

Line 145, column 12 is the reference to the XML entity in the extract above (I've extracted just the relevant items). I have tried changing the value of defuse in the original XMLSchema initialization, but the only other message I receive is that entities are forbidden (if I set it to "always"). Do you see any problem with the way I am initializing the XMLSchema object?
Replies: 2 comments
Hi,

There is no way to do this from a path/URL, because ElementTree does not parse external entities. With lxml.etree this is also disabled by default:

>>> import lxml.etree as etree
>>> etree.parse('oscal_catalog_metaschema.xml')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/lxml/etree.pyx", line 3570, in lxml.etree.parse
File "src/lxml/parser.pxi", line 1952, in lxml.etree._parseDocument
File "src/lxml/parser.pxi", line 1978, in lxml.etree._parseDocumentFromURL
File "src/lxml/parser.pxi", line 1881, in lxml.etree._parseDocFromFile
File "src/lxml/parser.pxi", line 1200, in lxml.etree._BaseParser._parseDocFromFile
File "src/lxml/parser.pxi", line 633, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 743, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 672, in lxml.etree._raiseParseError
File "oscal_catalog_metaschema.xml", line 145
lxml.etree.XMLSyntaxError: Entity 'allowed-values-control-group-property-name' not defined, line 145, column 57

but you can enable it with a custom parser instance:

>>> parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
>>> etree.parse('oscal_catalog_metaschema.xml', parser=parser)
<lxml.etree._ElementTree object at 0x7fb8d190bfc0>

Loading external entities is disabled for well-known security reasons. Building a schema instance by providing the sources as parsed trees is possible, but it can be problematic if there are includes/redefines/overrides.
Okay. I had it working in lxml using the parser as you described, but your API is so much better that I was hoping to drop the dependency on lxml and use xmlschema exclusively. Thanks for the answer!
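
For anyone who lands here wanting to avoid lxml: one possible workaround, offered only as a rough sketch, is to expand the external entities textually before the standard-library parser sees the document. The helper names below (`inline_external_entities`, `parse_with_entities`) are hypothetical and are not part of xmlschema or ElementTree. The sketch assumes trusted local files, UTF-8 encoding, `SYSTEM` entities declared in the internal DTD subset, and entity files that contain plain markup with no text declaration of their own.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Matches <!ENTITY name SYSTEM "path"> declarations in the internal DTD subset.
_ENTITY_DECL = re.compile(r"""<!ENTITY\s+([\w.\-]+)\s+SYSTEM\s+(["'])(.*?)\2\s*>""")
# Matches a DOCTYPE declaration that carries an internal subset: <!DOCTYPE x [ ... ]>
_DOCTYPE = re.compile(r"<!DOCTYPE\s+[^\[>]*\[.*?\]\s*>", re.DOTALL)


def inline_external_entities(xml_path):
    """Return the document text with external entity references expanded.

    Hypothetical helper for trusted local files only: it reads every file
    named in a SYSTEM entity declaration, relative to the document's folder.
    """
    xml_path = Path(xml_path)
    text = xml_path.read_text(encoding="utf-8")
    for name, _quote, system_id in _ENTITY_DECL.findall(text):
        replacement = (xml_path.parent / system_id).read_text(encoding="utf-8")
        text = text.replace(f"&{name};", replacement)
    # Drop the DOCTYPE so no entity declarations are left for the parser.
    return _DOCTYPE.sub("", text, count=1)


def parse_with_entities(xml_path):
    """Parse the expanded text with the standard-library ElementTree."""
    return ET.fromstring(inline_external_entities(xml_path))
```

The element returned by `parse_with_entities` could then, in principle, be passed to `validate()` or `is_valid()` in place of the file path, since xmlschema accepts already-parsed elements as sources. I have not tried this against the OSCAL files, and the security concerns mentioned above still apply to any file the helper reads.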