Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Progress bar for repack and pack_all_loose #171

Merged
merged 5 commits into from
Sep 19, 2024
Merged

Conversation

khsrali
Copy link
Contributor

@khsrali khsrali commented May 21, 2024

Suggested by @giovannipizzi:
Currently, verdi storage maintain in aiida-core might take a long time to proceed, and many users might not be patient enough, and they may cancel the progress just very close to the end. It's nice to provide an estimated time so they will have an idea of how long this would take.
To implement it, I followed these steps:

  1. verdi storage maintain calls here in the disk object store backend
  2. which would possibly call either of these three methods: a) pack_all_loose b) repack c) clean_storage
  3. I made a small container (300MB, ~600 files) and ran cProfile to look for the time-consuming parts and inserted "update" in the relevant positions:
    • pack_all_loose: the iterations over loose_objects
    • repack: mostly the iteration, but also safe_flush_to_disk which calls os.fsync, there is no easy way to predict time consumed on that, so I just ignore it, which means the maintenance will still go on a bit even after the bar reaches 100%.
    • clean_storage: was very quick for the mentioned size of container—just a few milliseconds—so I trusted the scalability and skipped adding a bar for this method.

Once this is approved, I can make another PR on aiida-core to pass a progress bar to these.

@khsrali
Copy link
Contributor Author

khsrali commented Jul 19, 2024

Note: test fails because setup-python@v1 is deprecated. A solution is provided in PR #172

@khsrali khsrali requested a review from unkcpz July 19, 2024 06:36
@khsrali
Copy link
Contributor Author

khsrali commented Jul 25, 2024

@unkcpz if you have time, this PR could benefit your review :)
Let me know in case you don't plan to test it

@unkcpz
Copy link
Member

unkcpz commented Jul 25, 2024

Thanks @khsrali, any reason why not call tqdm and show progress bar how many loose files are packed?

@khsrali
Copy link
Contributor Author

khsrali commented Jul 30, 2024

Thanks @khsrali, any reason why not call tqdm and show progress bar how many loose files are packed?

Thanks @unkcpz , actually I initially developed this with tqdm then I realized the callback option was designed to accept any sort of progress bar as input.
For example, AiiDA internally uses tqdm with a custom format & heading, which it can pass to disk-objectstore in various places where they are being updated.

So this PR respects the original design.

@unkcpz
Copy link
Member

unkcpz commented Aug 8, 2024

Callback pattern is fine. How I test the change? Can you open a subsequent PR in aiida-core that use this changes so we can test it together to run maintain command?

@khsrali
Copy link
Contributor Author

khsrali commented Aug 9, 2024

I tested with instantiating disk-objectstore alone, not through aiida.
But, sure, I'll open a PR on aiida-core since we have to do that anyways.

@khsrali
Copy link
Contributor Author

khsrali commented Sep 3, 2024

alright @unkcpz, the test PR added in aiida-core please go ahead and test it.

Copy link
Contributor

@agoscinski agoscinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have at least one test in test in test_container.py that passes a callback function to see if the code runs. Does not need to any logical check, just some dummy callback that does not crash. It is already tested in aiida-core, but it is hard to understand this after the PR is merged

@@ -1319,6 +1320,7 @@ def pack_all_loose( # pylint: disable=too-many-locals,too-many-branches,too-man
compress: Union[bool, CompressMode] = CompressMode.NO,
validate_objects: bool = True,
do_fsync: bool = True,
callback: Optional[Callable] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
callback: Optional[Callable] = None,
callback: Optional[Callable[[str, Any], None]] = None,

Please add some doc that describes how the callback looks like or at least refers to aiida-core aiida.common.progress_reporter.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why, when I put your line mypy send useless errors:

disk_objectstore/container.py:1408: error: Unexpected keyword argument "action"  [call-arg]
disk_objectstore/container.py:1408: error: Unexpected keyword argument "value"  [call-arg]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marking the arguments with mypy_extensions should work. I only load it during TYPE_CHECKING as mypy is only a dev depedency

from collections.abc import Callable
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
    from mypy_extensions import Arg

def foo(callback: Callable[[Arg(str, "action"), Arg(Any, "value")], None]):
    callback(action="hello", value=None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
callback: Optional[Callable] = None,
callback: Optional[Callable[Arg(str, "action"), Arg(Any, "value")], None]] = None,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the remaining import somewhere on the top

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agoscinski , it was really easy to make it work, took only 1 second :)

},
)
# I wish this would show as MB, GB. In tqdm it's easy:
# just pass unit='B' and unit_scale=1024. But would callback accept **keywords?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you pass these arguments to your other PR in aiida-core in the function set_progress_bar_tqdm(..., unit=..., unit_scale=...)? You even make 'B', 'MB', 'GB' a user arg. You could also determine a reasonable value from the total size, but then you need to pass set_progress_bar_tqdm as local function in the callback as the total size is only known through the callback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be even easier, since it is a partial function so one should be able to pass it to progress_reporter. You might need a check if the the progress_reporter is actually tqdm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agoscinski , this will be a bit tricky, because in that case we have to change the total value of the bar from items to byte.
And some of the thing has to be done in aiida-core, I feel like this makes disk-objectstore less and less transferable.

Copy link
Contributor

@agoscinski agoscinski Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be done fully in aiida-core in the create_callback function, but I can give it a go after this PR, if the progress bar is really not so readable. Maybe you are right and I just don't see the complexity for now.

Copy link
Contributor Author

@khsrali khsrali Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good suggestion @agoscinski . However that requires some change in aiida-core I opened an issue for future references : aiidateam/aiida-core#6564

I think for now, we can accept the way it is.

disk_objectstore/container.py Outdated Show resolved Hide resolved
@khsrali
Copy link
Contributor Author

khsrali commented Sep 17, 2024

Thanks for your review @agoscinski , tests are added.

@khsrali khsrali merged commit 737f9c7 into aiidateam:main Sep 19, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants