Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use generic types for config objects. #1325

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mgilson
Copy link

@mgilson mgilson commented Aug 30, 2023

This is made to be consistent with typeshed best practices:

avoid invariant collection types (list, dict) in argument positions, in favor of covariant types like Mapping or Sequence

Specifically, we have a linter in house that complains about mutable defaults on class members:

class MySchema(pandera.SchemaModel):
    a: ...
    b: ...

    class Config:
        unique: List[str] = ["a", "b"]  # <- violation

I can't see any reason that unique must be a list rather than a tuple (why would we ever need to mutate this object?). I haven't looked hard enough to determine if I think that there is a "danger" if someone goes and MySchema.Config.unique.append("c") or accidentally mutates this via some oddball path of references at some other place in the code.

This is made to be consistent with typeshed best practices:

> avoid invariant collection types (list, dict) in argument positions, in favor of covariant types like Mapping or Sequence

Signed-off-by: Matt Gilson <mgilson@lat.ai>
@mgilson mgilson force-pushed the generic-input-types branch from bb39c56 to 9e38577 Compare August 30, 2023 15:25
@joaoe
Copy link

joaoe commented Sep 1, 2023

If anyone would write such a thing

MySchema.Config.unique.append("c")

they could as easily write

MySchema.Config.unique += ("c",)

Hardly a usecase we should be concerned about. :)

@mgilson
Copy link
Author

mgilson commented Sep 1, 2023

I think that the bigger question is whether we want to prevent someone from using immutable objects in the Config objects (and whether there is a strong reason that we have to have a list in the interface rather than a tuple or some other type of sequence)

Hardly a usecase we should be concerned about. :)

Yes, I mostly agree. IMHO, the primary reason to make this change is to provide more flexibility for an end user. I certainly agree that if someone wants to do something degenerate here, they can and if they are determined, there's probably no good way to stop them.

Having said that, I still think that (in the general case) using immutable type annotations can help users to avoid accidental bugs in circumstances like these.

def non_pure_function(x: MutableSequence[Any]) -> int:
    random.shuffle(x)
    return 7

n = non_pure_function(MySchema.Config.unique)
...

I sort of doubt that there is a use-case for doing something like this on these Config objects, but it still seems like a good best-practice/habit and -- in our case -- we have a linter that is enforcing this best-practice which is really why I care. I don't want to have to tell my linter to ignore this best practice on every one of these.

1if it doesn't, we should use Collection instead of Sequence ;)

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Patch coverage: 50.00% and no project coverage change.

Comparison is base (850dcf8) 76.48% compared to head (9e38577) 76.48%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1325   +/-   ##
=======================================
  Coverage   76.48%   76.48%           
=======================================
  Files          91       91           
  Lines        6676     6676           
=======================================
  Hits         5106     5106           
  Misses       1570     1570           
Files Changed Coverage Δ
pandera/api/pyspark/container.py 0.00% <0.00%> (ø)
pandera/api/pyspark/model_config.py 0.00% <0.00%> (ø)
pandera/api/pandas/container.py 97.77% <100.00%> (ø)
pandera/api/pandas/model_config.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cosmicBboy
Copy link
Collaborator

thanks for the contribution @mgilson, please see the contributing guide to see how to run pre-commit and local tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants