Is there a way to consume XML elements lazily? #1

lowecg · 2021-06-05T06:35:14Z

I've been using Eximia and have been very pleased with its performance and simplicity.

However, I'd like to use Eximia to operate on large documents in a memory-constrained environment (AWS Lambda).

Parsing seems to process all of the XML input eagerly, which consumes a lot of memory and places a hard limit on the size of input that can be processed. For example, when I load a 29MiB input document, my Lambda reports a memory usage of 780MiB.

Would it be possible to have an option to consume the stream of XML tokens lazily, say via a lazy seq?

nilern · 2021-06-07T07:35:24Z

The performance and (implementation) simplicity stem largely from not supporting laziness. Obviously lazy parsing is possible, but I am not confident I could do it with substantially less overhead than data.xml. Honestly, I would just use data.xml if lazy parsing is a must. Unfortunately, the libraries are not 100% compatible, so I have to admit it would be easier to just toggle an option.

lowecg · 2021-06-08T10:49:10Z

Thanks for looking at this.

I take your point regarding going fully lazy.

There might be a suitable balance between pure laziness and eagerness.

My use case, which I believe is quite common, is to process a document that has repeated child nodes under a parent:

<parent>
  <child />
  <child />
  <!-- ... lots of child nodes ... -->
  <child />
</parent>

If there's a way to specify a path that denotes the child node, then a sequence of eagerly processed child nodes might be enough to strike a balance between laziness and performance.
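
To make that middle ground concrete, here is a rough sketch of the idea: stream the document, but hand back each direct child of the root as a fully parsed element. It is written in Python with the standard library's `xml.etree.ElementTree.iterparse` purely for illustration. It is not Eximia's API, and the `iter_children` helper and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_children(source, child_tag):
    """Yield each direct child of the root named child_tag, fully parsed."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # consume the root's "start" event
    depth = 1  # we are now inside the root element
    for event, elem in events:
        if event == "start":
            depth += 1
            continue
        # event == "end": an element at this depth is now complete
        if depth == 2:
            if elem.tag == child_tag:
                yield elem  # one eagerly built child subtree
            root.clear()  # drop finished children so memory stays bounded
        depth -= 1


# Hypothetical sample input, shaped like the <parent>/<child> example above.
sample = io.BytesIO(b"<parent><child id='1'/><child id='2'/></parent>")
for child in iter_children(sample, "child"):
    print(child.get("id"))  # prints 1, then 2
```

Only one child subtree needs to be held in memory at a time, so peak usage would depend on the largest child rather than on the whole document. A path-based selector, as suggested above, would generalise the fixed depth check used here.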

nilern · 2021-06-08T12:03:51Z

I have a half-baked XML parser combinator library. That approach should enable your example and much more with even less memory usage.

But then I thought it was probably better to just make the 90% use case more efficient, and released Eximia instead.

I am still thinking about memory reduction and parse-time transformations for both XML and JSON. There doesn't seem to be a whole lot of demand, but maybe people just don't know what they are missing 🤷

lowecg · 2021-06-08T12:50:28Z

Great, I'll have a look at Esco.

And thank you again for looking into this - it's very much appreciated.
