-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split read_params_from_cmdline() for job and main script and use argparse #104
Conversation
266dda7
to
8430bdb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a great change. This should also fix #19.
My concern is the change of the CLI for user scripts, which I would like not to break if there is no real need to. So my vote would be for adding argument parsing for the automatic arguments, --server-connection-info
, --parameter-dict
, and leaving the manual arguments as is (see also my comment).
Ultimately, I would like to refactor the user-facing API such that the user has control over how the commandline is parsed. Parsing settings and communicating with the server is entangled in cluster utils (in read_params_from_commandline
), because historically people liked to have a package that does both things at the same time, but I don't think that this is necessary or good design. I would like to separate the two things by 1) having something like a function register_with_server(comm_info)
that the user can call after manually parsing the commandline, 2) offer an API method that offers pre-setup commandline parsing for convenience, 3) keep read_params_from_commandline
for the time being, but deprecate it.
In the redesign, it should also become more clear of how to get rid of smart settings (or make it optional). Ultimately, I'm questioning whether we need to prescribe a settings library on the user side at all.
But I'm not sure what the best design is now, so I don't want to include it in this PR.
Use argparse to define the arguments passed to this script. This gives us automatic `--help/-h` arguments and makes it easier to extend in the future. So far, the arguments are still forwarded to `read_params_from_cmdline` but it is a first step to getting rid of that function in the user-called scripts. Also move the related code to helper functions, so there is less code duplication between grid_search and hp_optimization. Related issue: #103
`read_params_from_cmdline()` was previously used by both the cluster_utils main script (e.g. grid_search.py) and the job scripts. They have different requirements and thus the function was pretty convoluted. With this change now - `read_params_from_cmdline()` is only responsible for the job scripts. It is refactored and uses argparse internally for hopefully better readable code and `--help`-support. - `read_main_script_params_with_smart_settings()` is the counter-part for the grid_search/hp_optimization scripts. BREAKING: `read_params_from_cmdline()`: The job script now expects named arguments for server communication info and for distinction between file path or dictionary for the parameters (see CHANGELOG). This makes the optionality of server information and distinction between settings file vs dictionary string much easier to implement. This does not affect manual calls of the job script if a parameter file is used but does change how cluster_utils runs them, so especially non-python job scripts which operate on the arguments likely need to be updated. BREAKING: Support for custom hooks has been removed as it doesn't seem to be relevant here.
Instead of passing a dictionary with the job ID and server IP/port, use two separate arguments --job_id <id> --cluster-utils-server <ip>:<port> This is both more user friendly (in case one ever wants to pass these values manually) and makes the parsing easier.
69886a2
to
51b6cb4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, left minor comments.
Co-authored-by: mseitzer <16725193+mseitzer@users.noreply.github.com>
--help/-h
arguments and makes the code easier to maintain.read_params_from_cmdline()
for job and main script.read_params_from_cmdline()
was previously used by both the cluster_utils main script (e.g. grid_search.py) and the job scripts. They have different requirements and thus the function was pretty convoluted. With this change nowread_params_from_cmdline()
is only responsible only for the job scripts. It is refactored and uses argparse internally for hopefully better readable code and--help
-support.read_main_script_params_with_smart_settings()
is the counter-part for the grid_search/hp_optimization scripts.BREAKING 1:
read_params_from_cmdline()
: The job script now expects named arguments instead of positional ones. This makes the optionality of server information and distinction between settings file vs dictionary string much easier to implement.This affects manual calls of the job script and may break non-python job scripts which operate on the arguments.
BREAKING 2: Support for custom hooks in
read_params_from_cmdline()
has been removed as it doesn't seem to be relevant here. I'm not sure, though, if this is okay or if some people depend on it in the job script. If in doubt, I'll add it back.Closes #103
How I tested