Is there a way to consume XML elements lazily? #1
I've been using Eximia and have been very pleased with its performance and simplicity.

However, I'd like to use Eximia to operate on large documents in a memory-constrained environment (AWS Lambda). The parsing seems to eagerly process all of the XML input, which consumes a lot of memory and places a hard limit on the size of input that can be processed. For example, if I load a 29 MiB input document, my Lambda reports a memory usage of 780 MiB.

Would it be possible to have an option to consume the stream of XML tokens lazily, say via a lazy seq?
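For concreteness, the kind of interface being asked about might look something like the sketch below, which wraps the JDK's StAX pull parser in a lazy seq. This is a hypothetical illustration of the request, not an existing Eximia option; `token-seq` is an invented name.

```clojure
;; Hypothetical sketch of "consume the stream of XML tokens lazily":
;; wrap the JDK StAX pull parser in a lazy seq of event codes, so tokens
;; are only pulled from the input as the seq is walked. Not an Eximia API.
(import '[javax.xml.stream XMLInputFactory XMLStreamReader])

(defn token-seq
  "Returns a lazy seq of StAX event codes read from `r`."
  [^XMLStreamReader r]
  (lazy-seq
    (when (.hasNext r)
      (cons (.next r) (token-seq r)))))

;; Usage: only as much input is read as the consumer demands.
(def reader
  (.createXMLStreamReader (XMLInputFactory/newFactory)
                          (java.io.FileInputStream. "big.xml")))
(take 10 (token-seq reader))
```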
The performance and (implementation) simplicity stem largely from not supporting laziness. Obviously lazy parsing is possible, but I am not confident I could do it with substantially less overhead than data.xml. Honestly, I would just use data.xml if lazy parsing is a must. Unfortunately the libraries are not 100% compatible, so I have to admit it would be easier to be able to just toggle an option.
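For reference, a minimal sketch of the data.xml route mentioned above: `clojure.data.xml/parse` builds the element tree lazily, so iterating over the children without retaining the root keeps memory roughly proportional to one subtree. The file name and `process-child` are stand-ins for the caller's own input and logic.

```clojure
;; Lazy parsing with clojure.data.xml: :content is realized on demand,
;; so this holds roughly one <child> subtree at a time -- provided the
;; head of the seq (e.g. a binding to the root element) isn't retained.
(require '[clojure.data.xml :as dxml]
         '[clojure.java.io :as io])

(defn process-child [child]   ; stand-in for real per-element work
  (println (:tag child)))

(with-open [rdr (io/reader "big.xml")]
  (doseq [child (:content (dxml/parse rdr))]
    (process-child child)))
```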
Thanks for looking at this. I take your point regarding going fully lazy. There might be a suitable balance between pure laziness and eagerness. My use case, which I believe is quite a common one, is to process a document that has repeated child nodes under a parent:

```xml
<parent>
  <child />
  <child />
  <!-- ... lots of child nodes ... -->
  <child />
</parent>
```

If there's a way to specify a path that denotes the child node, then a sequence of eagerly processed child nodes might be enough to strike a balance between laziness and performance.
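Sketching what that middle ground could look like, independent of either library: stream with the JDK's StAX pull parser until a start tag matches the requested name, then parse just that subtree eagerly into a plain map. All names here are illustrative, not an existing Eximia API.

```clojure
;; Lazy between elements, eager within one: skip to each matching start
;; tag, then materialize only that subtree. Illustrative sketch only.
(import '[javax.xml.stream XMLInputFactory XMLStreamReader XMLStreamConstants])

(defn- parse-subtree
  "Eagerly parses the element the reader is positioned at into
  {:tag ... :attrs ... :content ...}, with text nodes as strings."
  [^XMLStreamReader r]
  (let [tag   (keyword (.getLocalName r))
        attrs (into {}
                    (map (fn [i] [(keyword (.getAttributeLocalName r i))
                                  (.getAttributeValue r i)]))
                    (range (.getAttributeCount r)))]
    (loop [content []]
      (let [event (.next r)]
        (condp = event
          XMLStreamConstants/START_ELEMENT (recur (conj content (parse-subtree r)))
          XMLStreamConstants/CHARACTERS    (recur (conj content (.getText r)))
          XMLStreamConstants/END_ELEMENT   {:tag tag :attrs attrs :content content}
          (recur content))))))

(defn element-seq
  "Lazy seq of eagerly parsed elements named `tag-name` from `input`."
  [tag-name ^java.io.InputStream input]
  (let [^XMLStreamReader r (.createXMLStreamReader (XMLInputFactory/newFactory) input)]
    ((fn step []
       (lazy-seq
         (loop []
           (when (.hasNext r)
             (if (and (= XMLStreamConstants/START_ELEMENT (.next r))
                      (= tag-name (.getLocalName r)))
               (cons (parse-subtree r) (step))
               (recur)))))))))

;; Usage: memory stays bounded by one <child> subtree at a time.
;; `handle` is a hypothetical per-element callback.
;; (doseq [c (element-seq "child" (java.io.FileInputStream. "big.xml"))]
;;   (handle c))
```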
I have a half-baked XML parser combinator library, Esco. That approach should enable your example and much more, with even less memory usage. But then I thought it is probably better to just make the 90% use case more efficient, and released Eximia instead. I am still thinking about memory reduction and parse-time transformations for both XML and JSON. There doesn't seem to be a whole lot of demand, but maybe people just don't know what they are missing 🤷
Great, I'll have a look at Esco. And thank you again for looking into this - it's very much appreciated.