Make FeatureIterator more robust #207

Open · 0 comments

michaelkirk (Member) opened this issue on Sep 2, 2022:

FeatureIterator is intended to be an efficient way to iterate through individual Features in a FeatureCollection without ever having to parse the entire FeatureCollection into memory at once.

To do so, it seeks through the input stream until it finds the features array, and then starts parsing the individual features one by one.
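
Conceptually, the current seek behaves like this (a simplified sketch for illustration, not the actual implementation):

```rust
use std::io::{self, Read};

/// Advance the reader until the first `[` byte, assuming it opens the
/// `features` array.
fn seek_to_first_bracket<R: Read>(reader: &mut R) -> io::Result<()> {
    let mut byte = [0u8; 1];
    loop {
        reader.read_exact(&mut byte)?;
        if byte[0] == b'[' {
            return Ok(());
        }
    }
}
```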

Currently, however, the way we seek to the start of the `features` array is by looking for the first occurrence of `[`. This would fail with a document like:

{ 
  "type": "FeatureCollection",
  "bbox": [100.0, 0.0, 105.0, 1.0],
  "features": [
    ...
  ] 
}

or more complicated ones like this:

{
  "type": "FeatureCollection",
  "some_foreign_member": { "a": [1], "b": { "foo": [2, 3] } },
  "features": [
    ...
  ]
}

We should update `FeatureIterator` to be more robust in how it parses a `FeatureCollection`.
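
One possible direction (a rough sketch, not a drop-in patch): scan with just enough JSON awareness to skip over strings and nested containers, and only accept the `[` that opens a top-level `"features"` member. This handles both examples above, though it is still not a full parser:

```rust
use std::io::{self, Read};

/// Seek to the `[` that opens the root object's "features" array,
/// ignoring brackets inside strings and nested values.
fn seek_to_features_array<R: Read>(reader: &mut R) -> io::Result<()> {
    let mut depth = 0u32; // nesting depth of {} / [] containers
    let mut in_string = false; // inside a JSON string?
    let mut escaped = false; // previous byte was a backslash?
    let mut current = Vec::new(); // bytes of the string being read
    let mut last_string: Option<Vec<u8>> = None; // most recently completed string
    let mut byte = [0u8; 1];

    loop {
        reader.read_exact(&mut byte)?;
        let b = byte[0];
        if in_string {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
                last_string = Some(std::mem::take(&mut current));
            } else {
                current.push(b);
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => depth = depth.saturating_sub(1),
            b'[' => {
                // The array we want opens at depth 1 (inside the root
                // object) immediately after the string "features".
                // Tracking only the last completed string is good enough
                // for the examples above, though a real fix should also
                // distinguish keys from string values.
                if depth == 1 && last_string.as_deref() == Some(&b"features"[..]) {
                    return Ok(());
                }
                depth += 1;
            }
            b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
}
```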

bors bot added a commit that referenced this issue Sep 6, 2022
206: stream reading features r=frewsxcv a=michaelkirk

- [x] I agree to follow the project's [code of conduct](https://github.com/georust/geo/blob/master/CODE_OF_CONDUCT.md).
- [x] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.
---

This greatly reduces the peak memory usage needed to parse a GeoJSON FeatureCollection by utilizing the FeatureIterator's trick of seeking to the `"features": [` array without actually parsing the FeatureCollection. Note there are [some robustness issues](#207) with this approach, some of which can be improved, but overall I think it's worth the trade-off.
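
For context, usage might look roughly like this (a hypothetical sketch; the `FeatureIterator::new` constructor, the input file name, and the error handling are assumptions for illustration, not necessarily the crate's exact API):

```rust
use std::fs::File;
use std::io::BufReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical input file for illustration.
    let reader = BufReader::new(File::open("large_collection.geojson")?);
    // Each iteration yields one Feature at a time, so peak memory stays
    // proportional to the largest single feature, not the whole collection.
    for feature in geojson::FeatureIterator::new(reader) {
        let feature = feature?;
        println!("{:?}", feature.properties);
    }
    Ok(())
}
```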

There is a small (0-1%) CPU regression in our deserialization bench due to this change, but the peak memory profile of the `stream_reader_writer` example decreased by 90%, from 2.22 MiB to 237 KiB.

Before:
![Memory profile before (peak 2.22 MiB)](https://user-images.githubusercontent.com/217057/188239657-ade76677-f3e2-4750-b71d-bed0effbe215.png)

After:
![Memory profile after (peak 237 KiB)](https://user-images.githubusercontent.com/217057/188239645-5274e0ff-71b2-4da6-8ac9-1f72d288c2bb.png)

Notice that overall memory allocations stayed the same (see #88 (comment)), but the peak usage is much lower since we never hold the entire data structure in memory at once.


Co-authored-by: Michael Kirk <michael.code@endoftheworl.de>