Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust): Add min periods for correlation (#15458) #16987

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

fkemeth
Copy link

@fkemeth fkemeth commented Jun 16, 2024

Add min_periods support to correlation (pearson, spearman and cov). See #15458.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature rust Related to Rust Polars labels Jun 16, 2024
Copy link

codecov bot commented Jun 16, 2024

Codecov Report

Attention: Patch coverage is 90.69767% with 8 lines in your changes missing coverage. Please review.

Project coverage is 80.73%. Comparing base (d13efa5) to head (8c83322).
Report is 4 commits behind head on main.

Files Patch % Lines
...s/polars-plan/src/dsl/function_expr/correlation.rs 81.25% 6 Missing ⚠️
crates/polars-ops/src/chunked_array/cov.rs 91.30% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16987      +/-   ##
==========================================
- Coverage   80.74%   80.73%   -0.01%     
==========================================
  Files        1475     1475              
  Lines      193389   193427      +38     
  Branches     2760     2760              
==========================================
+ Hits       156146   156164      +18     
- Misses      36733    36753      +20     
  Partials      510      510              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fkemeth fkemeth changed the title feat(rust): min periods for correlation (#15458) feat(rust): Add min periods for correlation (#15458) Jun 16, 2024
@ritchie46 ritchie46 self-assigned this Jun 23, 2024
@ritchie46
Copy link
Member

Thank you @fkemeth, I think this looks pretty good. However, we should not return NaN if below min-periods. We use None on other places where we have a min-periods argument.

@fkemeth
Copy link
Author

fkemeth commented Jul 3, 2024

@ritchie46 thank you for the comment! I updated the PR accordingly. The functions return Null now if min_periods>n.
This entails that, afaik, the return type of cov and online_pearson_corr must be Option<f64> now, instead of f64.

I also updated the unit tests, which now check for

    assert not out1["pearson"][0]
    assert not out1["spearman"][0]

if min_periods is larger than the common overlap between the two vectors.

Can you have a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants