-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BC] Add an interactive only mode for env.py #377
Conversation
given by combine_tfa_policies_lib.get_input_signature() and action spec given by combine_tfa_policies_lib.get_action_spec() The combiner policy uses a new timestep spec feature "model_selector" to select the requested policy at the current state. The feature is computed as a md5 hash from the respective policies names.
…iler-opt into policy_combiner
TimeStep. The patch also gives the option to keep the temporary working_dir by setting keep_temps in compilation_runner.py to a directory where all temporary working_dirs will be saved.
compiles only using iclang instead of both clang and iclang.
clang_path, module, task_type, interactive=False) as clang: | ||
yield iclang, clang | ||
if interactive_only: | ||
yield iclang, iclang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we return iclang here, i.e. not a tuple, and detect that on the receiving end? or (iclang, None)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relative reward is computed, that is, the compute_relative_rewards function of env.py always expects two scores. I found this to be the cleanest change. Returning only iclang and None will end up being identical but would require multiple changes and I didn't want to break anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - can you update the documentation string explaining this, too? (i.e. it's not quite "interactive_only: If set to true only iclang is yielded
" - looking at the docstring, the return isn't quite well specified actually, what the generator generates is what's interesting - could you basically make the doc "right"?)
compiler_opt/rl/env_test.py
Outdated
|
||
step = test_env.step(np.array([1], dtype=np.int64)) | ||
self.assertEqual(step.step_type, env.StepType.LAST) | ||
self.assertEqual(step.reward, {'default': 0.}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the only way we know it's interactive by seeing the reward be 0? Can we test that the 2 iclangs are the same, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can directly check if test_env._iclang and test_env._clang are the same object.
Changed the unit test for MLGOEnvironment to check if the clang sessions are equal and respect the interactive_only variable.
…piler-opt into behavior_cloning
compiler_opt/rl/env.py
Outdated
|
||
Returns: | ||
The generator for InteractiveClang objects. | ||
A tuple of generators created with clang_session. First argument of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the return is a generator of tuples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nit in the comment
Added interactive only mode for env.py which compiles only using iclang instead of both clang and iclang.