Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(geoarrow-types): pyarrow geometry extension type implementation #51

Merged
merged 14 commits into from
Jun 19, 2024

Conversation

paleolimbot
Copy link
Contributor

@paleolimbot paleolimbot commented Jun 13, 2024

import geoarrow.types as gt

extension_type = gt.wkb(crs=gt.OGC_CRS84).to_pyarrow()
extension_type.storage_type
#> DataType(binary)

extension_type.crs
#> ProjJsonCrs(OGC:CRS84)

extension_type.to_pandas_dtype()
#> extension<geoarrow.wkb<WkbType>>[pyarrow]

extension_type = gt.point(dimensions="xyz", coord_type="interleaved").to_pyarrow()
extension_type.storage_type
#> FixedSizeListType(fixed_size_list<xyz: double not null>[3])

@paleolimbot paleolimbot marked this pull request as ready for review June 13, 2024 19:47
Comment on lines +165 to +169
UNSPECIFIED = -1
"""Unspecified dimensions"""

UNKNOWN = 0
"""Unknown or mixed dimensions"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% clear on the difference between unspecified and unknown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is that it's like GEOMETRY (i.e., potentially mixed but we don't know) but for dimensions. I added it here because I realized that gt.wkb().to_pyarrow().dimensions was returning UNSPECIFIED, which is not really an accurate description of what's happening.

It could be argued that UNSPECIFIED should just be None, and it could be implemented that way (but makes the typing slightly more awkward). In the case of crs, None is a perfectly acceptable value, so there at least needs to be an UNSPECIFIED sentinel for that one (such that coalesce(None, crs.UNSPECIFIED -> None).

The intention is that users don't run into UNSPECIFIED all that often (they just generate it by implicitly not adding a type constraint).

import geoarrow.types as gt

gt.TypeSpec.coalesce(gt.GeometryType.UNSPECIFIED, gt.GeometryType.POINT)
#> TypeSpec(GeometryType.POINT)
gt.TypeSpec.coalesce(gt.GeometryType.GEOMETRY, gt.GeometryType.POINT)
#> TypeSpec(GeometryType.GEOMETRY)

gt.TypeSpec.coalesce(gt.Dimensions.UNSPECIFIED, gt.Dimensions.XYZ)
#> TypeSpec(Dimensions.XYZ)
gt.TypeSpec.coalesce(gt.Dimensions.UNKNOWN, gt.Dimensions.XYZ)
#> TypeSpec(Dimensions.UNKNOWN)

gt.TypeSpec.common(gt.Dimensions.XYM, gt.Dimensions.XYZ)
#> TypeSpec(Dimensions.XYZM)

gt.TypeSpec.common(gt.Dimensions.UNSPECIFIED, gt.Dimensions.XYZ)
#> TypeSpec(Dimensions.XYZ)

gt.TypeSpec.common(gt.Dimensions.UNKNOWN, gt.Dimensions.XYZ)
#> TypeSpec(Dimensions.UNKNOWN)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, I do like the ideas of coalesce and common for types. I had so much pain working with type casting in geoarrow-rs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stole the idea from the R vctrs package! https://vctrs.r-lib.org/reference/internal-faq-ptype2-identity.html

@paleolimbot paleolimbot merged commit 08a62e6 into geoarrow:main Jun 19, 2024
8 checks passed
@paleolimbot paleolimbot deleted the types-pyarrow branch June 19, 2024 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants