Is there a way to consume XML elements lazily? #1

Open
lowecg opened this issue Jun 5, 2021 · 4 comments

Comments

lowecg commented Jun 5, 2021

I've been using Eximia and have been very pleased with its performance and simplicity.

However, I'd like to use Eximia to operate on large documents in a memory-constrained environment (AWS Lambda).

The parser seems to process all of the XML input eagerly, which consumes a lot of memory and places a hard limit on the size of input that can be processed. For example, if I load a 29MiB input document, my Lambda reports a memory usage of 780MiB.

Would it be possible to have an option to consume the stream of XML tokens lazily, say via a lazy seq?
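
For illustration only, a token-level lazy stream might look like the minimal sketch below. It uses Python's standard-library `xml.etree.ElementTree.iterparse`, not anything in Eximia, and the name `xml_tokens` is hypothetical. A Python generator plays the role a lazy seq would play in Clojure.

```python
import io
import xml.etree.ElementTree as ET


def xml_tokens(source):
    """Yield (event, tag) pairs one at a time as the parser advances."""
    for event, elem in ET.iterparse(source, events=("start", "end")):
        yield event, elem.tag
        if event == "end":
            # Drop the element's children, text and attributes once it is complete.
            elem.clear()


doc = io.BytesIO(b"<parent><child/><child/></parent>")
for event, tag in xml_tokens(doc):
    print(event, tag)
```

Because `xml_tokens` is a generator, a consumer that stops early never requests later tokens. Note that `iterparse` still reads its input in chunks and parent elements keep the emptied child shells, so this reduces memory use but does not make it constant.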

Owner

nilern commented Jun 7, 2021

The performance and (implementation) simplicity stem largely from not supporting laziness. Lazy parsing is obviously possible, but I am not confident I could do it with substantially less overhead than data.xml. Honestly, I would just use data.xml if lazy parsing is a must. Unfortunately, the libraries are not 100% compatible, so I have to admit it would be easier to just toggle an option.

Author

lowecg commented Jun 8, 2021

Thanks for looking at this.

I take your point regarding going fully lazy.

There might be a suitable balance between pure laziness and eagerness.

My use case, which I believe is quite common, is to process a document that has repeated child nodes under a parent:

<parent>
  <child />
  <child />
  <!-- ... lots of child nodes ... -->
  <child />
</parent>

If there's a way to specify a path that denotes the child node, then a sequence of eagerly processed child nodes might be enough to strike a balance between laziness and performance.
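
As a rough sketch of that idea (not Eximia's API): the parser streams through the document, but each element at a given path is built eagerly and handed to the caller one at a time. The example uses Python's standard-library `xml.etree.ElementTree.iterparse`; the function name `iter_elements_at` and its `path` parameter are hypothetical, and the path is assumed to be a list of tags starting at the root.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements_at(source, path):
    """Yield each fully built element whose tag path from the root equals `path`."""
    stack = []  # currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        # On "end", elem is complete and sits on top of the stack.
        matched = [e.tag for e in stack] == path
        stack.pop()
        if matched:
            if stack:
                # Detach from the parent so children do not pile up under it.
                stack[-1].remove(elem)
            yield elem


doc = io.BytesIO(b'<parent><child id="1"/><child id="2"/></parent>')
for child in iter_elements_at(doc, ["parent", "child"]):
    print(child.get("id"))
```

Each yielded child is a complete subtree, so the caller keeps the convenience of eager parsing per child, while memory is bounded by the largest child (plus the parser's read buffer) as long as the caller does not hold on to earlier children.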

Owner

nilern commented Jun 8, 2021

I have a half-baked XML parser combinator library, Esco. That approach should enable your example and much more with even less memory usage.

But then I thought it was probably better to just make the 90% use case more efficient, and released Eximia instead.

I am still thinking about memory reduction and parse-time transformations for both XML and JSON. There doesn't seem to be a whole lot of demand but maybe people just don't know what they are missing 🤷

Author

lowecg commented Jun 8, 2021

Great, I'll have a look at Esco.

And thank you again for looking into this - it's very much appreciated.
