-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(geoarrow-types): Add Arrow-implementation-agnostic minimal geoarrow type implementation #48
Conversation
#: Unknown or uninitialized encoding | ||
UNKNOWN = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: depending on how your docs site works, I've grown used to putting these comments in docstrings below the assignment. This means that they'll show up in the docs site and in an IDE: https://geoarrow.org/geoarrow-rs/python/latest/api/core/enums/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works with Sphinx but docstrings sound way better!
#: Unknown or uninitialized edge type | ||
UNKNOWN = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the edges
keyword is optional in the metadata right? Is it better to have an UNKNOWN
enum variant or to leave out the EdgeType
, as in Optional[EdgeType]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
from typing import Union, Mapping | ||
|
||
|
||
class Crs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd tend to suggest a Protocol here, because the pyproj.CRS
doesn't need to subclass from this Crs
object.
def with_encoding(self, encoding: Encoding) -> "SerializedType": | ||
raise NotImplementedError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't return Self
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to look up when Self
shows up but I think it is a more recent version of Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's >=3.11 (at least in the standard lib...maybe available elsewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use typing_extensions
for pre 3.11
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But regardless of whether you use Self
, my point here was that this type hint on the GeoArrowType
class does not return a GeoArrowType
, but rather a SerializedType
. Given that GEOARROW
is a variant of the Encoding
enum, it seems weird that GeoArrowType.with_encoding(Encoding.GEOARROW)
would return a SerializedType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great point!
#: Unknown or uninitialized coordinate type | ||
UNKNOWN = 0 | ||
#: Coordinate type composed of separate arrays for each dimension (i.e., a struct) | ||
SEPARATE = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SEPARATE = 1 | |
SEPARATED = 1 |
To be consistent with interleaved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a part that would belong in geoarrow.pyarrow
? (which can then depend on geoarrow.types
for all enums)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see arguments for both and I am happy to put it in either place. In geoarrow.pyarrow
there's an opportunity for integrating the Array
and Scalar
subclasses that require some compute capabilities; however, I haven't found that those integrations add much value (and in theory we could inject them if it turns out that they do). Having them here gives them an opportunity to get a clean and polished version of them sooner (since the scope is smaller).
Encoding.LARGE_WKB: pa.large_binary(), | ||
} | ||
|
||
_NATIVE_STORAGE_TYPES = _generate_storage_types() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally expose those storage type definitions a bit more publicly. Although I assume one can of course do something like PointType().storage_type
?
For example in geopandas, the main thing I currently need are those storage type definitions (I don't know if it would be worth it for geopandas to depend on this package if it is just for that, though, but it's probably a good example use case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! (I ran out of time yesterday evening to finish this thought 🙂 )
WKB = 1 | ||
#: Well-known binary encoding | ||
LARGE_WKB = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider that as different encodings? Like the "geoarrow" encoding can also be backed by multiple physical types, I would consider just a "WKB" as the encoding, and binary or large_binary two possible physical storage types.
Of course, it probably depends on how someone would use this whether the one or the other makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point...we need some way to parameterize WKB with 64-bit offsets (but probably more consistent with the geoarrow "encoding" to punt the parameters to another enum or field).
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
56ed162
to
c815d40
Compare
No description provided.