Add/st dataframe pagenation #77

singjc · 2024-08-20T17:16:33Z

User description

This adds a method for rendering a Streamlit dataframe with pagination, so that small chunks of the dataframe can be viewed at a time. It also resolves #76.

PR Type

enhancement

Description

Added a new method display_large_dataframe in src/common.py to render large DataFrames with pagination and row selection.
Implemented a helper function to calculate DataFrame memory usage in megabytes.
Replaced the existing st.dataframe call in src/view.py with the new display_large_dataframe method to enhance the view_spectrum functionality.
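
The pagination described above comes down to slicing the DataFrame by a page index. Below is a minimal sketch of that arithmetic, assuming zero-based page indices. `page_count` and `chunk_bounds` are hypothetical names, not necessarily those used in src/common.py.

```python
import math


def page_count(n_rows: int, chunk_size: int) -> int:
    """Number of pages needed to show n_rows rows, chunk_size rows per page."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty table still gets one (empty) page so the UI has something to show.
    return max(1, math.ceil(n_rows / chunk_size))


def chunk_bounds(n_rows: int, chunk_size: int, chunk_index: int) -> tuple[int, int]:
    """Return the half-open row range [start, end) for a zero-based page index."""
    # Clamp the index so a stale page number cannot point past the last page.
    chunk_index = min(max(chunk_index, 0), page_count(n_rows, chunk_size) - 1)
    start = chunk_index * chunk_size
    end = min(start + chunk_size, n_rows)  # the last page may be shorter
    return start, end
```

With a pandas DataFrame, the visible chunk would then be `df.iloc[start:end]`. For 2,500 rows and a chunk size of 1,000, `page_count` gives 3 and `chunk_bounds(2500, 1000, 2)` gives `(2000, 2500)`.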

Changes walkthrough 📝

Relevant files

Enhancement

common.py: `Add paginated DataFrame display and memory usage calculation`
src/common.py (+78/-0)
Added a function `display_large_dataframe` for paginated DataFrame display.
Implemented memory usage calculation for DataFrames.
Introduced pagination controls and row selection for DataFrames.

view.py: `Integrate paginated DataFrame display in view_spectrum`
src/view.py (+2/-16)
Replaced `st.dataframe` call with `display_large_dataframe`.
Integrated new paginated DataFrame display in `view_spectrum`.
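
The memory usage calculation listed above can be sketched with pandas' own accounting. This assumes the helper takes a pandas DataFrame and reports mebibytes (1 MiB = 1024² bytes). The unit convention and the name `dataframe_mem_usage_mb` are assumptions rather than the PR's code.

```python
def dataframe_mem_usage_mb(df) -> float:
    """Return the memory footprint of a pandas DataFrame in MiB.

    `df` is expected to be a pandas DataFrame (or anything exposing the same
    `memory_usage(deep=...)` method).
    """
    # deep=True also counts the contents of object columns (e.g. strings),
    # not just the pointers that reference them.
    total_bytes = df.memory_usage(deep=True).sum()
    return float(total_bytes) / (1024 ** 2)
```

`deep=True` makes pandas inspect object columns instead of counting only their pointers, so the figure is closer to the real footprint at the cost of a slower scan.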

codiumai-pr-agent-pro · 2024-08-20T17:16:58Z

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review
Performance Concern: The `display_large_dataframe` function recalculates the current chunk on every rerun, which may be inefficient for large DataFrames.
Code Smell: The `get_dataframe_mem_useage` function name has a typo. It should be `get_dataframe_mem_usage`.
Potential Bug: The `rows` variable is used without checking if `event` is None, which could lead to an error if no row is selected.

codiumai-pr-agent-pro · 2024-08-20T17:17:24Z

PR Code Suggestions ✨

Category: Best practice · Score: 9
Suggestion: Avoid using mutable default arguments in function definitions
The `chunk_sizes` parameter in the function signature uses a mutable default argument (a list), which can lead to unexpected behavior. It's better to use `None` as the default and then set the list inside the function.
src/common.py [254-255]

-def display_large_dataframe(df,
-                            chunk_sizes: list[int] = [100, 1_000, 10_000]):
+def display_large_dataframe(df, chunk_sizes: list[int] | None = None):
+    if chunk_sizes is None:
+        chunk_sizes = [100, 1_000, 10_000]

Why: The suggestion correctly identifies a best practice issue by recommending the use of `None` instead of a mutable default argument, which can prevent unexpected behavior.

Category: Performance · Score: 8
Suggestion: Implement more efficient pagination for large datasets to reduce memory usage
Consider using a more efficient method for pagination. Instead of loading the entire DataFrame into memory and then slicing it, you could use SQL-like operations to fetch only the required chunk from the data source. This would significantly reduce memory usage for very large datasets.
src/common.py [277-280]

-def get_current_chunk(df, chunk_size, chunk_index):
+def get_current_chunk(data_source, chunk_size, chunk_index):
     start = chunk_index * chunk_size
-    end = min(start + chunk_size, len(df))  # Ensure end does not exceed dataframe length
-    return df.iloc[start:end], start, end
+    end = start + chunk_size
+    return data_source.fetch_rows(start, end), start, min(end, data_source.total_rows)

Why: The suggestion addresses a performance issue by proposing a more efficient method for pagination, which is crucial for handling large datasets and reducing memory usage.

Category: Possible issue · Score: 8
Suggestion: Add a check for None before accessing properties of a returned object
The `event` variable is dereferenced without checking that it is set. If `display_large_dataframe` returns `None`, accessing `event.selection` raises an AttributeError. Consider adding a check to ensure `event` is not None before accessing its properties.
src/view.py [225-226]

 event = display_large_dataframe(df)
-rows = event.selection.rows
+rows = event.selection.rows if event is not None else []

Why: The suggestion addresses a potential bug by ensuring that the `event` variable is checked for `None` before accessing its properties, preventing possible runtime errors.

Category: Maintainability · Score: 7
Suggestion: Correct function name spelling and use a more descriptive name
The function `get_dataframe_mem_useage` has a typo in its name. It should be `get_dataframe_mem_usage`. Also, consider using a more descriptive name like `calculate_dataframe_memory_usage_mb`.
src/common.py [418]

-def get_dataframe_mem_useage(df):
+def calculate_dataframe_memory_usage_mb(df):

Why: The suggestion improves code maintainability by correcting a typo and providing a more descriptive function name, enhancing code readability.
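
The best-practice suggestion relies on how Python evaluates default arguments: once, when the function is defined, so a list default is shared by every call that omits the argument. Below is a small illustration with hypothetical function names (`add_chunk_size_shared`, `add_chunk_size_safe`) that are unrelated to the PR code itself.

```python
from typing import Optional


def add_chunk_size_shared(size: int, chunk_sizes: list[int] = []) -> list[int]:
    # The default list is created once and reused by every call that omits it.
    chunk_sizes.append(size)
    return chunk_sizes


def add_chunk_size_safe(size: int, chunk_sizes: Optional[list[int]] = None) -> list[int]:
    # A fresh list is created on each call when no list is passed in.
    if chunk_sizes is None:
        chunk_sizes = []
    chunk_sizes.append(size)
    return chunk_sizes


first = add_chunk_size_shared(100)
second = add_chunk_size_shared(1_000)
print(second)           # [100, 1000]: state leaked from the first call
print(first is second)  # True: both names point at the same default list

print(add_chunk_size_safe(100))    # [100]
print(add_chunk_size_safe(1_000))  # [1000]
```

A shared default only causes trouble if the function mutates it. Whether `display_large_dataframe` ever does so is not shown in this thread, but the `None` pattern removes the risk either way.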

t0mdavid-m · 2024-08-21T14:45:46Z

Thanks a lot for the addition. I will review this PR early next week as I am OOO for the rest of this week.

t0mdavid-m

Thanks for the addition! I just have some minor comments. After those are addressed the PR would be good to merge from my side.

t0mdavid-m · 2024-08-29T10:47:43Z

src/common.py

+
+    event = st.dataframe(
+        current_chunk_df,
+        column_order=[


This should be more flexible, as the data displayed may vary. I think adding column_order as a function parameter would work best. Is there a reason why selection_mode, on_select, use_container_width, and hide_index have to be set to these specific values? If not, I would suggest adding those as parameters as well.

Anything specific to a particular workflow/dataset should not be located in common.py, as it is intended to contain functions of general use. As a side note, common modules should be moved into a separate directory to make this clearer to developers.

singjc · 2024-08-29

That's a good point. These values are set specifically because I came across this issue when using the pyopenms workflow, where selecting a row picks the spectrum to plot, but this should be more flexible for other use cases.

We could pass kwargs through display_large_dataframe to st.dataframe, since that is probably the only call that would need to change based on particular needs. I opted for this option.
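
The kwargs approach discussed here can be sketched as follows. The option names are the `st.dataframe` parameters mentioned in the review, but the default values, the `display_chunk` name, and the `render` parameter are assumptions made so the example runs without Streamlit. They are not the merged implementation.

```python
from typing import Any, Callable

# Option names come from the review thread; the values are assumed for illustration.
DEFAULT_DATAFRAME_KWARGS: dict[str, Any] = {
    "selection_mode": "single-row",
    "on_select": "rerun",
    "use_container_width": True,
    "hide_index": True,
}


def display_chunk(chunk: Any, render: Callable[..., Any], **kwargs: Any) -> Any:
    """Draw one chunk, forwarding caller-supplied options to the renderer.

    `render` is a hypothetical injection point; in a Streamlit app it would
    be `st.dataframe`.
    """
    options = {**DEFAULT_DATAFRAME_KWARGS, **kwargs}  # caller values win
    return render(chunk, **options)


def fake_render(data: Any, **options: Any) -> dict[str, Any]:
    # Stand-in renderer that simply reports the options it received.
    return options


# "mz" and "intensity" are placeholder column names.
opts = display_chunk([[1, 2]], fake_render, hide_index=False,
                     column_order=["mz", "intensity"])
print(opts["hide_index"])  # False: the caller's override replaced the default
```

Merging into a new dict leaves the module-level defaults untouched, so one workflow's overrides cannot leak into another's call.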

t0mdavid-m · 2024-08-29T10:54:27Z

Merging with main should fix the circular import issue with the linter.

t0mdavid-m · 2024-08-30T12:49:45Z

Thanks for making the changes. I like the solution with kwargs. From my side this PR would be ready to merge.

singjc added 3 commits August 20, 2024 13:09

add: dataframe pagenation render

0a8bc3b

remove: old st.dataframe call

f3dad89

remove: edits from tk_dialog PR

ba99afc

codiumai-pr-agent-pro bot added the enhancement (New feature or request) label Aug 20, 2024

singjc requested a review from axelwalter August 20, 2024 17:16

codiumai-pr-agent-pro bot added the Review effort [1-5]: 3 label Aug 20, 2024

singjc requested a review from t0mdavid-m August 20, 2024 17:17

singjc linked an issue Aug 20, 2024 that may be closed by this pull request

Large Data Exceeds View Size Limit #76

Closed

t0mdavid-m requested changes Aug 29, 2024

singjc and others added 3 commits August 29, 2024 16:06

add: on_change callback to reset indexing page

fcc2be2

update: display_large_dataframe for flexible st.dataframe params

d73ce80

Merge branch 'main' into add/st_dataframe_pagenation

085ba0f

t0mdavid-m merged commit 261502a into OpenMS:main Aug 30, 2024
4 of 5 checks passed
