Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(geoarrow-types): Add Arrow-implementation-agnostic minimal geoarrow type implementation #48

Closed
wants to merge 15 commits into from

Conversation

paleolimbot
Copy link
Contributor

No description provided.

Comment on lines 15 to 16
#: Unknown or uninitialized encoding
UNKNOWN = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: depending on how your docs site works, I've grown used to putting these comments in docstrings below the assignment. This means that they'll show up in the docs site and in an IDE: https://geoarrow.org/geoarrow-rs/python/latest/api/core/enums/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works with Sphinx but docstrings sound way better!

geoarrow-types/src/geoarrow/types/constants.py Outdated Show resolved Hide resolved
Comment on lines 113 to 114
#: Unknown or uninitialized edge type
UNKNOWN = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the edges keyword is optional in the metadata right? Is it better to have an UNKNOWN enum variant or to leave out the EdgeType, as in Optional[EdgeType]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

from typing import Union, Mapping


class Crs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd tend to suggest a Protocol here, because the pyproj.CRS doesn't need to subclass from this Crs object.

Comment on lines 68 to 69
def with_encoding(self, encoding: Encoding) -> "SerializedType":
raise NotImplementedError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't return Self?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to look up when Self shows up but I think it is a more recent version of Python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's >=3.11 (at least in the standard lib...maybe available elsewhere?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use typing_extensions for pre 3.11

Copy link
Member

@kylebarron kylebarron May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But regardless of whether you use Self, my point here was that this type hint on the GeoArrowType class does not return a GeoArrowType, but rather a SerializedType. Given that GEOARROW is a variant of the Encoding enum, it seems weird that GeoArrowType.with_encoding(Encoding.GEOARROW) would return a SerializedType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great point!

#: Unknown or uninitialized coordinate type
UNKNOWN = 0
#: Coordinate type composed of separate arrays for each dimension (i.e., a struct)
SEPARATE = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SEPARATE = 1
SEPARATED = 1

To be consistent with interleaved?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a part that would belong in geoarrow.pyarrow? (which can then depend on geoarrow.types for all enums)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see arguments for both and I am happy to put it in either place. In geoarrow.pyarrow there's an opportunity for integrating the Array and Scalar subclasses that require some compute capabilities; however, I haven't found that those integrations add much value (and in theory we could inject them if it turns out that they do). Having them here gives them an opportunity to get a clean and polished version of them sooner (since the scope is smaller).

Encoding.LARGE_WKB: pa.large_binary(),
}

_NATIVE_STORAGE_TYPES = _generate_storage_types()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally expose those storage type definitions a bit more publicly. Although I assume one can of course do something like PointType().storage_type ?

For example in geopandas, the main thing I currently need are those storage type definitions (I don't know if it would be worth it for geopandas to depend on this package if it is just for that, though, but it's probably a good example use case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! (I ran out of time yesterday evening to finish this thought 🙂 )

WKB = 1
#: Well-known binary encoding
LARGE_WKB = 2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider that as different encodings? Like the "geoarrow" encoding can also be backed by multiple physical types, I would consider just a "WKB" as the encoding, and binary or large_binary two possible physical storage types.

Of course, it probably depends on how someone would use this whether the one or the other makes more sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point...we need some way to parameterize WKB with 64-bit offsets (but probably more consistent with the geoarrow "encoding" to punt the parameters to another enum or field).

@paleolimbot
Copy link
Contributor Author

Closing since these changes were implemented in #50 and #51!

@paleolimbot paleolimbot deleted the pyarrow-type-rewrite branch June 24, 2024 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants