Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update: upload_widget #75

Merged
merged 30 commits into from
Aug 28, 2024
Merged

update: upload_widget #75

merged 30 commits into from
Aug 28, 2024

Conversation

singjc
Copy link
Contributor

@singjc singjc commented Aug 13, 2024

User description

I have been playing around with the streamlit-template to create a web-app for Sage. There were a few things I wanted to change with the file upload widget, more so specifically when running a web-app with local. If running a web-app with local, when uploading files, I personally prefer creating symbolic links to the original files (if the purpose is to just read data), to avoid making multiple copies. This I think is probably useful if there are a lot of files or large files that you want to process. I added an option to the file upload widget to create symlinks if requested (default) instead, otherwise make copies when running an app locally.

  • Allow for file_types to be a list of str if user wants to upload more than one file type with the same upload widget
  • Allow for uploading a directory, i.e. uploading a bruker .d file
  • Allow for local symlink creation instead of copying/duplicating files

Let me know what you think.


PR Type

Enhancement, Bug fix


Description

  • Enhanced the upload widget to support multiple file types by changing file_type to file_types.
  • Added an option to create symlinks instead of copying files when running the app locally.
  • Improved the handling of file uploads and local directory copying.
  • Fixed formatting issues and removed debug print statements.

Changes walkthrough 📝

Relevant files
Enhancement
Workflow.py
Support multiple file types in upload widget                         

src/Workflow.py

  • Changed file_type to file_types to support multiple file types.
  • +1/-1     
    StreamlitUI.py
    Enhance upload widget with multiple file types and symlink option

    src/workflow/StreamlitUI.py

  • Added support for multiple file types in the upload widget.
  • Added option to create symlinks instead of copying files.
  • Improved handling of file uploads and local directory copying.
  • Fixed formatting and removed debug print statements.
  • +52/-28 

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    - Allow for file_types to be a list of str if user wants to upload more than one file type with the same upload widget
    - Allow for uploading a directory, i.e. uploading a bruker .d file
    - Allow for local symlink creation instead of copying/duplicating files
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Error Handling
    The implementation of symlink creation does not handle potential exceptions that might occur during symlink creation, such as permissions issues or filesystem limitations. It's recommended to add error handling around the os.symlink call to ensure robustness.

    File Type Validation
    The current implementation assumes that the file type is always provided in a format that includes the extension dot (e.g., '.txt'). This might not always be the case, and the code should handle scenarios where the file type is provided without a dot.

    Copy link

    codiumai-pr-agent-pro bot commented Aug 13, 2024

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Best practice
    Improve the safety of directory removal operations

    Replace the direct use of shutil.rmtree with a safer approach that checks if the
    directory exists before attempting to remove it. This prevents potential exceptions
    if the directory does not exist.

    src/workflow/StreamlitUI.py [61]

    -shutil.rmtree(files_dir)
    +if files_dir.exists():
    +    shutil.rmtree(files_dir)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This suggestion adds a check to ensure the directory exists before attempting to remove it, which prevents potential exceptions and improves the robustness of the code.

    9
    Add error handling for symlink creation

    Add error handling for the os.symlink operation to catch any potential exceptions
    that may occur if the symlink creation fails, such as permissions issues or existing
    symlinks.

    src/workflow/StreamlitUI.py [135]

    -os.symlink(f, symlink_target)
    +try:
    +    os.symlink(f, symlink_target)
    +except OSError as e:
    +    st.error(f"Failed to create symlink: {e}")
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Adding error handling for the os.symlink operation improves the robustness of the code by catching potential exceptions, such as permissions issues or existing symlinks.

    8
    Enhance path handling robustness

    Replace the string concatenation in the Path constructor with a more robust path
    joining using Path.joinpath to ensure correct path handling across different
    operating systems.

    src/workflow/StreamlitUI.py [110]

    -local_dir = Path(local_dir).expanduser()
    +local_dir = Path().expanduser().joinpath(local_dir)
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Using Path.joinpath ensures correct path handling across different operating systems, which is a good practice, though the current implementation is not necessarily incorrect.

    7
    Enhancement
    Make file type checking case-insensitive

    Modify the condition to check for file types in the uploaded files to ensure that
    the file type checking is case-insensitive, which is more user-friendly and robust.

    src/workflow/StreamlitUI.py [93]

    -if f.name.endswith(ft) for ft in file_types
    +if f.name.lower().endswith(ft.lower()) for ft in file_types
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Making file type checking case-insensitive is a user-friendly enhancement that ensures the application handles file types more robustly, regardless of case variations.

    8

    @timosachsenberg
    Copy link
    Contributor

    Great idea. An optional symlink feature was in my internal wishlist ;)
    Currently on vacation and back in about two weeks. If you want a timely feedback please tag some of the web app guys (though a lot are OOO)

    Copy link
    Member

    @t0mdavid-m t0mdavid-m left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Thank for contributing.

    I am also strongly in favour of allowing the use of Symlinks for local deployments as this potentially significantly reduces the amount of storage needed.

    However, there are also advantages to separating the WebApp file system from the "real" file system. Most notably, this ensures that files are not altered, renamed, or moved during their life cycle within the WebApp (which can be quite long depending on the WebApp). This definitely does not cancel out the advantage of more efficient use of storage but should be accounted for by developers and thus results in higher complexity. Hence, I would argue to not enable it by default.

    Unfortunately, the symlink approach does not work on unaltered windows. Maybe we could instead store the path to the file? This should work on all platforms.

    singjc and others added 2 commits August 13, 2024 13:22
    Apply suggested change, change symlink default to False.
    
    Co-authored-by: Tom David Müller <57191390+t0mdavid-m@users.noreply.github.com>
    Co-authored-by: Tom David Müller <57191390+t0mdavid-m@users.noreply.github.com>
    @singjc
    Copy link
    Contributor Author

    singjc commented Aug 13, 2024

    Thank for contributing.

    I am also strongly in favour of allowing the use of Symlinks for local deployments as this potentially significantly reduces the amount of storage needed.

    However, there are also advantages to separating the WebApp file system from the "real" file system. Most notably, this ensures that files are not altered, renamed, or moved during their life cycle within the WebApp (which can be quite long depending on the WebApp). This definitely does not cancel out the advantage of more efficient use of storage but should be accounted for by developers and thus results in higher complexity. Hence, I would argue to not enable it by default.

    Unfortunately, the symlink approach does not work on unaltered windows. Maybe we could instead store the path to the file? This should work on all platforms.

    It's a good point for keeping WebApp's separate from the real working directory to avoid unwanted actions being performed on the actual file system. Storing the file path would probably be the best approach for all OS's.

    Should we add another advanced flag, or use the symlink flag to determine whether the real local file system files should be used?

    Use os.link to create a link to a file in windows or use mklink to link a directory. Not sure if this actually works yet, need to check on a win machine
    src/workflow/StreamlitUI.py Outdated Show resolved Hide resolved
    - fix: if not using symlink and copying directory, use copytree and allow for existing directory
    src/workflow/StreamlitUI.py Outdated Show resolved Hide resolved
    src/workflow/StreamlitUI.py Outdated Show resolved Hide resolved
    @t0mdavid-m
    Copy link
    Member

    It's a good point for keeping WebApp's separate from the real working directory to avoid unwanted actions being performed on the actual file system. Storing the file path would probably be the best approach for all OS's.

    Should we add another advanced flag, or use the symlink flag to determine whether the real local file system files should be used?

    I think the symlink flag should be sufficient. Developers that use that flag will have to use caution and check if the file still exists. We could also check for that in FileManager.get_files() and remove non-existing files from the file list. Also StreamlitUI.select_input_file() should be adjusted accordingly.

    @t0mdavid-m
    Copy link
    Member

    t0mdavid-m commented Aug 14, 2024

    Thanks for the productive discussion. You are right, I totally missed that in my original review:

    • The solution should work for both single file uploads and folder uploads
    • The current implementation for selecting a folder for upload is not user friendly as the user has to specify a file path

    Using tkinter for local deployments thus sounds like a great solution!

    - added a tkinter based directory selector
    - removed symlink/hardlink option
    src/workflow/StreamlitUI.py Outdated Show resolved Hide resolved
    @singjc singjc requested a review from t0mdavid-m August 20, 2024 03:33
    Copy link
    Member

    @t0mdavid-m t0mdavid-m left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Thanks a lot for making the changes! I think this PR is almost good to go. I just noticed a few minor details when checking it out:

    1. The "Symlink Checkbox" should also be available for single file uploads. We could just replace the upload widget with a big button prompting the TKinter file selection if the "Symlink Checkbox" is unchecked.
    2. The Current MS data files list in the upload section should include "symlinked" files
    3. The "Copy MS data files from folder button" should be changed to "Add MS data files from folder button" or similar
    4. Only files that still exist should be made available as input (see code suggestion)

    @singjc
    Copy link
    Contributor Author

    singjc commented Aug 22, 2024

    Thanks a lot for making the changes! I think this PR is almost good to go. I just noticed a few minor details when checking it out:

    1. The "Symlink Checkbox" should also be available for single file uploads. We could just replace the upload widget with a big button prompting the TKinter file selection if the "Symlink Checkbox" is unchecked.
    

    Done. I added a big button that calls a tkinter file selection (allows multiple file selection) if the "Symlink Checkbox" is unchecked.

    image

    2. The Current MS data files list in the upload section should include "symlinked" files
    

    Done. Not sure which way is the best to display, but for now I made it to display as (local) file.mzML

    i.e.

    image

    the alternative would be to just display the whole full filepath name, though it may look messier?

    i.e.

    image

    or we could just display the filename and not indicate whether it's local or not 🤷‍♂️

    3. The "_Copy_ MS data files from folder button" should be changed to "_Add_ MS data files from folder button"  or similar
    

    Done.

    4. Only files that still exist should be made available as input (see code suggestion)
    

    Done.

    @t0mdavid-m
    Copy link
    Member

    Thanks for making the changes!

    I like the version were files are displayed with the (local) tag.

    I changed one info-box which mistakenly told the user that their files were copied and made some minor adjustments to satisfy the linter.

    From my side the PR would be ready to merge!

    @t0mdavid-m
    Copy link
    Member

    The failing CI pipeline will be addressed in a separate PR and is unrelated to the changes in this PR.

    @t0mdavid-m t0mdavid-m merged commit b40f0c3 into OpenMS:main Aug 28, 2024
    5 of 6 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    3 participants