Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split read_params_from_cmdline() for job and main script and use argparse #104

Merged
merged 8 commits into from
Jun 4, 2024

Conversation

luator
Copy link
Member

@luator luator commented May 22, 2024

  • Use argparse to define the arguments passed to the scripts. This gives us automatic --help/-h arguments and makes the code easier to maintain.
  • Split read_params_from_cmdline() for job and main script.
    read_params_from_cmdline() was previously used by both the cluster_utils main script (e.g. grid_search.py) and the job scripts. They have different requirements and thus the function was pretty convoluted. With this change now
    • read_params_from_cmdline() is only responsible only for the job scripts. It is refactored and uses argparse internally for hopefully better readable code and --help-support.
    • read_main_script_params_with_smart_settings() is the counter-part for the grid_search/hp_optimization scripts.

BREAKING 1: read_params_from_cmdline(): The job script now expects named arguments instead of positional ones. This makes the optionality of server information and distinction between settings file vs dictionary string much easier to implement.
This affects manual calls of the job script and may break non-python job scripts which operate on the arguments.

BREAKING 2: Support for custom hooks in read_params_from_cmdline() has been removed as it doesn't seem to be relevant here. I'm not sure, though, if this is okay or if some people depend on it in the job script. If in doubt, I'll add it back.

Closes #103

How I tested

  • running examles locally
  • unit tests
  • running example on Slurm

Copy link
Collaborator

@mseitzer mseitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a great change. This should also fix #19.

My concern is the change of the CLI for user scripts, which I would like not to break if there is no real need to. So my vote would be for adding argument parsing for the automatic arguments, --server-connection-info, --parameter-dict, and leaving the manual arguments as is (see also my comment).

Ultimately, I would like to refactor the user-facing API such that the user has control over how the commandline is parsed. Parsing settings and communicating with the server is entangled in cluster utils (in read_params_from_commandline), because historically people liked to have a package that does both things at the same time, but I don't think that this is necessary or good design. I would like to separate the two things by 1) having something like a function register_with_server(comm_info) that the user can call after manually parsing the commandline, 2) offer an API method that offers pre-setup commandline parsing for convenience, 3) keep read_params_from_commandline for the time being, but deprecate it.

In the redesign, it should also become more clear of how to get rid of smart settings (or make it optional). Ultimately, I'm questioning whether we need to prescribe a settings library on the user side at all.

But I'm not sure what the best design is now, so I don't want to include it in this PR.

cluster_utils/settings.py Show resolved Hide resolved
cluster_utils/settings.py Show resolved Hide resolved
cluster_utils/settings.py Show resolved Hide resolved
cluster_utils/settings.py Outdated Show resolved Hide resolved
cluster_utils/settings.py Outdated Show resolved Hide resolved
Use argparse to define the arguments passed to this script.  This gives
us automatic `--help/-h` arguments and makes it easier to extend in the
future.

So far, the arguments are still forwarded to `read_params_from_cmdline`
but it is a first step to getting rid of that function in the
user-called scripts.

Also move the related code to helper functions, so there is less code
duplication between grid_search and hp_optimization.

Related issue: #103
`read_params_from_cmdline()` was previously used by both the
cluster_utils main script (e.g. grid_search.py) and the job scripts.
They have different requirements and thus the function was pretty
convoluted.  With this change now

- `read_params_from_cmdline()` is only responsible for the job scripts.
  It is refactored and uses argparse internally for hopefully better
  readable code and `--help`-support.
- `read_main_script_params_with_smart_settings()` is the counter-part
  for the grid_search/hp_optimization scripts.

BREAKING: `read_params_from_cmdline()`: The job script now expects
named arguments for server communication info and for distinction
between file path or dictionary for the parameters (see CHANGELOG).
This makes the optionality of server information and distinction between
settings file vs dictionary string much easier to implement.
This does not affect manual calls of the job script if a parameter file
is used but does change how cluster_utils runs them, so especially
non-python job scripts which operate on the arguments likely need to be
updated.

BREAKING: Support for custom hooks has been removed as it doesn't seem
to be relevant here.
Instead of passing a dictionary with the job ID and server IP/port, use
two separate arguments

    --job_id <id> --cluster-utils-server <ip>:<port>

This is both more user friendly (in case one ever wants to pass these
values manually) and makes the parsing easier.
Copy link
Collaborator

@mseitzer mseitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left minor comments.

cluster_utils/grid_search.py Outdated Show resolved Hide resolved
cluster_utils/hp_optimization.py Outdated Show resolved Hide resolved
cluster_utils/settings.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
luator and others added 3 commits June 4, 2024 13:26
Co-authored-by: mseitzer <16725193+mseitzer@users.noreply.github.com>
@luator luator merged commit 2aedfd0 into master Jun 4, 2024
6 checks passed
@luator luator deleted the fkloss/use_argparse branch June 4, 2024 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use argparse for reading arguments of hp_optimization/grid_search
2 participants