
Refactor ArtifactManager and restart TaskManager #529

Merged: 4 commits merged on Apr 18, 2023


gabegma (Contributor) commented on Apr 3, 2023

Fixes #521

Description:

  • ArtifactManager is now a true singleton (one instance per process). Three are created: one per worker and one in the main thread.
    • As a result, fewer DatasetSplitManagers (DMs) are created, and different tasks share the same DM.
  • The client is now restarted once the startup tasks finish. This frees the memory and creates new ArtifactManagers, so the pipeline and the sentence encoder no longer stay in memory after startup.
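The per-process singleton described above can be sketched as follows. This is a minimal illustration, not Azimuth's actual code: the names Config, DatasetSplitManager, ArtifactManager.instance() and get_dataset_split_manager(), and the use of (project_hash, split) as the cache key, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Config:
    # Hypothetical stand-in for AzimuthConfig; only the fields used here.
    project_hash: str
    split: str


class DatasetSplitManager:
    # Hypothetical stand-in: the real DM loads and caches dataset artifacts.
    def __init__(self, config: Config) -> None:
        self.config = config


class ArtifactManager:
    """Sketch of a per-process singleton that caches one DM per key."""

    _instance: Optional["ArtifactManager"] = None

    def __init__(self) -> None:
        self._dms: Dict[Tuple[str, str], DatasetSplitManager] = {}

    @classmethod
    def instance(cls) -> "ArtifactManager":
        # Created lazily on first use. Python objects are not shared between
        # processes, so the main process and each worker hold their own instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def get_dataset_split_manager(self, config: Config) -> DatasetSplitManager:
        # Every task in this process asking for the same key gets the same DM.
        key = (config.project_hash, config.split)
        if key not in self._dms:
            self._dms[key] = DatasetSplitManager(config)
        return self._dms[key]


cfg = Config(project_hash="abc123", split="eval")
am = ArtifactManager.instance()
print(am.get_dataset_split_manager(cfg) is am.get_dataset_split_manager(cfg))  # True
```

Because the instance lives in a class attribute, it exists once per Python process; restarting a worker process discards it together with everything it cached, which is why a client restart frees that memory.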

Checklist:

You should check all boxes before the PR is ready. If a box does not apply, check it to acknowledge it.

  • ISSUE NUMBER. You linked the issue number (e.g., Resolve #XXX).
  • PRE-COMMIT. You ran pre-commit on all commits, or else you
    ran pre-commit run --all-files at the end.
  • USER CHANGES. If the changes impact our users, you added them to CHANGELOG.md
    and the documentation.
  • DEV CHANGES.
    • Update the documentation if this PR changes how to develop or launch the app.
    • Update the README files and our wiki for any big design decisions, if relevant.
    • Add unit tests, docstrings, typing and comments for complex sections.

gabegma changed the title from "Refactor artifact manager and restart TaskManager" to "Refactor ArtifactManager and restart TaskManager" on Apr 3, 2023
gabegma force-pushed the ggm/refactor-artifact-manager branch from 6fd26a1 to 0fc4e4a on April 3, 2023 19:58
gabegma changed the base branch from main to ggm/refactor-dm on April 3, 2023 20:23
gabegma mentioned this pull request on Apr 3, 2023
@@ -297,6 +296,32 @@ def create_app() -> FastAPI:
return app


def load_dataset_split_managers_from_config(
gabegma (Contributor Author):

I had to move this function to avoid circular imports.
One significant change here is that the main thread now also uses an ArtifactManager; I was worried that multiple DMs would be created otherwise.

dm: DatasetSplitManager,
results: List[List[PerturbedUtteranceResult]],
pipeline_index: int,
config: AzimuthConfig,
gabegma (Contributor Author):

This was causing an error in a test. We shouldn't rely on the DatasetSplitManager's config, because the DM only depends on the project_hash: a shared DM may have been created from a different config that has the same project_hash.
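A minimal sketch of this failure mode, assuming DMs are cached by project_hash alone: a second config that shares the hash but differs in another setting gets back the DM built for the first config, so dm.config is stale. Config, threshold, get_dm() and threshold_for() are hypothetical names for illustration, not Azimuth's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Config:
    # Hypothetical stand-in for AzimuthConfig.
    project_hash: str  # assumed to identify the project/dataset only
    threshold: float  # hypothetical setting assumed NOT to affect the hash


class DatasetSplitManager:
    def __init__(self, config: Config) -> None:
        self.config = config  # the config seen when this DM was first created


_dms: Dict[str, DatasetSplitManager] = {}


def get_dm(config: Config) -> DatasetSplitManager:
    # Cached by project_hash alone: a later config with the same hash but
    # different settings gets the DM created for the earlier config.
    if config.project_hash not in _dms:
        _dms[config.project_hash] = DatasetSplitManager(config)
    return _dms[config.project_hash]


def threshold_for(dm: DatasetSplitManager, config: Config) -> float:
    # dm would only be used for data access; settings come from the
    # explicitly passed config, never from dm.config.
    return config.threshold


old_cfg = Config(project_hash="abc123", threshold=0.5)
new_cfg = Config(project_hash="abc123", threshold=0.9)
get_dm(old_cfg)
dm = get_dm(new_cfg)
print(dm.config.threshold)  # 0.5: stale, inherited from old_cfg
print(threshold_for(dm, new_cfg))  # 0.9: the config actually in use
```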

@@ -63,29 +63,6 @@ def get_module_data(simple_text_config):
)


def test_clearing_cache(tiny_text_config):
gabegma (Contributor Author), Apr 3, 2023:

I'm not sure we need a new test to replace this one, but it no longer looked relevant. Replacing task_manager.clear_worker_cache() with task_manager.restart() would work, but the test would then take a long time (~50 seconds), and I'm not sure it tests anything relevant.

Contributor:

I don't really understand the assertion any([m and d for m, d in cached.values()]). I'll leave it up to you! Saving 50 seconds does sound tempting if the test is useless.
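For reference, the semantics of that expression can be shown on made-up data. This assumes cached maps each worker to a (model_cached, dm_cached) pair of booleans; the real shape of cached in the removed test may differ.

```python
from typing import Dict, Tuple

# Hypothetical shape: worker name -> (model cached?, DM cached?)
cached: Dict[str, Tuple[bool, bool]] = {
    "worker-1": (True, False),
    "worker-2": (False, True),
}

# `m and d` is truthy only when the SAME worker has both cached;
# any(...) is True as soon as at least one worker does.
print(any([m and d for m, d in cached.values()]))  # False

cached["worker-2"] = (True, True)
print(any([m and d for m, d in cached.values()]))  # True
```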

gabegma marked this pull request as ready for review on April 3, 2023 21:01
gabegma self-assigned this on Apr 4, 2023
gabegma force-pushed the ggm/refactor-dm branch 2 times, most recently from 4402ef6 to 5f81471 on April 5, 2023 19:12
gabegma force-pushed the ggm/refactor-artifact-manager branch from 0fc4e4a to a3f8412 on April 5, 2023 19:13
gabegma force-pushed the ggm/refactor-artifact-manager branch from a3f8412 to 228f0ac on April 7, 2023 21:58
gabegma force-pushed the ggm/refactor-dm branch 2 times, most recently from a93803f to 4d33219 on April 9, 2023 01:41
gabegma force-pushed the ggm/refactor-artifact-manager branch from 228f0ac to 4095a73 on April 9, 2023 01:44
gabegma force-pushed the ggm/refactor-artifact-manager branch from 4095a73 to 3ecfa72 on April 13, 2023 20:58
Base automatically changed from ggm/refactor-dm to main on April 18, 2023 03:05
gabegma force-pushed the ggm/refactor-artifact-manager branch from 3ecfa72 to c5fbc3c on April 18, 2023 03:06
JosephMarinier (Contributor) left a comment:

LGTM! I don't understand all the details, but I guess we'll confirm that everything works by using it!

gabegma merged commit 9ba2f9b into main on Apr 18, 2023
gabegma deleted the ggm/refactor-artifact-manager branch on April 18, 2023 21:13
gabegma added a commit that referenced this pull request on Apr 19, 2023:
* Add logging to help with config update debugging

* Restart client to free memory

* Refactor artifact manager to real singleton and remove clear_cache

* Adapt based on comments
Issue closed by this pull request: Tasks often get lost because a worker dies.