Reduce Docstring: Update with note on using dictionary outputs #107

Merged
merged 3 commits into from
Jun 10, 2024
23 changes: 9 additions & 14 deletions src/nested_pandas/nestedframe/core.py
@@ -362,7 +362,8 @@ def reduce(self, func, *args, **kwargs) -> NestedFrame: # type: ignore[override
----------
func : callable
Function to apply to each nested dataframe. The first arguments to `func` should be which
- columns to apply the function to.
+ columns to apply the function to. See the Notes for recommendations
+ on writing func outputs.
args : positional arguments
Positional arguments to pass to the function, the first *args should be the names of the
columns to apply the function to.
@@ -376,22 +377,16 @@ def reduce(self, func, *args, **kwargs) -> NestedFrame: # type: ignore[override

Notes
-----
- The recommend return value of func should be a `pd.Series` where the indices are the names of the
- output columns in the dataframe returned by `reduce`. Note however that in cases where func
- returns a single value there may be a performance benefit to returning the scalar value
- rather than a `pd.Series`.
+ By default, `reduce` will produce a `NestedFrame` with enumerated
+ column names for each returned value of the function. For more useful
+ naming, it's recommended to have `func` return a dictionary where each
+ key is an output column of the dataframe returned by `reduce`.

Example User Function:
- ```
- import pandas as pd

- def my_sum(col1, col2):
-     return pd.Series(
-         [sum(col1), sum(col2)],
-         index=["sum_col1", "sum_col2"],
-     )

- ```
+ >>> def my_sum(col1, col2):
+ >>>     '''reduce will return a NestedFrame with two columns'''
+ >>>     return {"sum_col1": sum(col1), "sum_col2": sum(col2)}

"""
# Parse through the initial args to determine the columns to apply the function to
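The naming behavior described in the updated Notes can be illustrated with a small pure-Python sketch. This is not the `nested_pandas` implementation: `reduce_rows` is a hypothetical stand-in that only mimics the documented behavior, and the integer keys used for the "enumerated" column names are an assumption about what those default names look like.

```python
from typing import Any, Callable, Dict, List, Sequence


def reduce_rows(
    rows: Sequence[Dict[str, list]], func: Callable[..., Any], *columns: str
) -> List[Dict[Any, Any]]:
    """Hypothetical stand-in for `reduce`: call func on the named columns of each row."""
    results = []
    for row in rows:
        out = func(*(row[name] for name in columns))
        if isinstance(out, dict):
            # Dictionary keys become the output column names.
            results.append(dict(out))
        elif isinstance(out, (tuple, list)):
            # No names were supplied, so fall back to enumerated columns (assumed 0, 1, ...).
            results.append(dict(enumerate(out)))
        else:
            # A single scalar gets a single enumerated column.
            results.append({0: out})
    return results


def my_sum(col1, col2):
    """Returns a dictionary, so the output columns are named."""
    return {"sum_col1": sum(col1), "sum_col2": sum(col2)}


def my_sum_unnamed(col1, col2):
    """Returns a tuple, so the output columns are only enumerated."""
    return sum(col1), sum(col2)


rows = [{"a": [1, 2], "b": [3, 4]}, {"a": [5], "b": [6]}]
named = reduce_rows(rows, my_sum, "a", "b")
unnamed = reduce_rows(rows, my_sum_unnamed, "a", "b")
```

In this sketch `named` carries the keys `sum_col1` and `sum_col2`, while `unnamed` carries only positional keys, which is the readability difference the docstring change is recommending around.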