Commit

Cleanup.
MatthewGerber committed Jun 25, 2024
1 parent 30e734d commit 185f2c6
Showing 11 changed files with 13 additions and 13 deletions.
10 changes: 5 additions & 5 deletions docs/ch_Feature_Extractors.md
@@ -10,7 +10,7 @@ A feature extractor for the gridworld. This extractor, being based on the `State
A feature extractor for the gridworld. This extractor does not interact feature values with actions. Its primary use
is in state-value estimation (e.g., for the baseline of policy gradient methods).
```
-### [rlai.core.environments.gymnasium.CartpoleFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L915)
+### [rlai.core.environments.gymnasium.CartpoleFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L917)
```
A feature extractor for the Gym cartpole environment. This extractor, being based on the
`StateActionInteractionFeatureExtractor`, directly extracts the fully interacted state-action feature matrix. It
@@ -19,20 +19,20 @@ A feature extractor for the Gym cartpole environment. This extractor, being base
separate intercept term being present for each state segment and action combination. The function approximator
should not add its own intercept term.
```
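The cartpole extractor's docstring describes a fully interacted state-action feature matrix with a separate intercept per action. Below is a minimal sketch of that idea in plain Python. It is not the rlai implementation: it ignores the state segmentation mentioned in the docstring, and the function name `interact_state_action` is hypothetical. Each action gets its own block holding an intercept and a copy of the state features, and all other blocks are zero. This is why the function approximator should not add an intercept of its own.

```python
from typing import List, Sequence


def interact_state_action(state: Sequence[float], action: int, num_actions: int) -> List[float]:
    """
    Hypothetical sketch: build a fully interacted state-action feature vector.

    The vector has one block per action. The block for `action` holds an intercept (1.0) followed by the state
    features, and every other block is zero, so each action gets its own intercept and coefficients.
    """
    if not 0 <= action < num_actions:
        raise ValueError(f'action must be in [0, {num_actions})')

    block = [1.0] + [float(x) for x in state]  # per-action intercept followed by the state features
    features = [0.0] * (len(block) * num_actions)
    start = action * len(block)
    features[start:start + len(block)] = block
    return features


print(interact_state_action([0.5, -2.0], action=1, num_actions=2))
# [0.0, 0.0, 0.0, 1.0, 0.5, -2.0]
```

A linear model over this vector learns one intercept and one coefficient set per action. That is equivalent to fitting a separate linear model for each action.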
-### [rlai.core.environments.gymnasium.ContinuousLunarLanderFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1456)
+### [rlai.core.environments.gymnasium.ContinuousLunarLanderFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1458)
```
Feature extractor for the continuous lunar lander environment.
```
-### [rlai.core.environments.gymnasium.ContinuousMountainCarFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1232)
+### [rlai.core.environments.gymnasium.ContinuousMountainCarFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1234)
```
Feature extractor for the continuous mountain car environment.
```
-### [rlai.core.environments.gymnasium.ScaledFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L701)
+### [rlai.core.environments.gymnasium.ScaledFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L703)
```
A feature extractor for continuous Gym environments. Extracts a scaled (standardized) version of the Gym state
observation.
```
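The `ScaledFeatureExtractor` docstring only says that observations are standardized. A common way to standardize a stream of observations whose statistics are not known in advance is to keep running per-dimension means and variances (Welford's algorithm) and return z-scores. The sketch below assumes that approach. The class name `RunningStandardizer` is hypothetical, and rlai may compute its scaling differently.

```python
import math
from typing import List, Sequence


class RunningStandardizer:
    """Hypothetical sketch: standardize observations using running per-dimension mean and variance (Welford)."""

    def __init__(self, num_dims: int):
        self.n = 0
        self.mean = [0.0] * num_dims
        self.m2 = [0.0] * num_dims  # running sum of squared deviations from the mean

    def update(self, obs: Sequence[float]) -> None:
        self.n += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def transform(self, obs: Sequence[float]) -> List[float]:
        scaled = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.n if self.n > 0 else 0.0  # population variance
            std = math.sqrt(var)
            # Leave a dimension centered but unscaled while it has no spread yet.
            scaled.append((x - self.mean[i]) / std if std > 0 else x - self.mean[i])
        return scaled


scaler = RunningStandardizer(num_dims=2)
for obs in ([1.0, 10.0], [3.0, 10.0]):
    scaler.update(obs)
print(scaler.transform([3.0, 10.0]))  # [1.0, 0.0]
```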
-### [rlai.core.environments.gymnasium.SignedCodingFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L783)
+### [rlai.core.environments.gymnasium.SignedCodingFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L785)
```
Signed-coding feature extractor. Forms a category from the conjunction of all state-feature signs and then places
the continuous feature vector into its associated category. Works for all continuous-valued state spaces in Gym.
2 changes: 1 addition & 1 deletion docs/cli_guide.md
@@ -135,6 +135,6 @@ agent for the Gym cartpole (inverted pendulum) environment.
rlai train --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 1.0 --environment rlai.core.environments.gymnasium.Gym --T 1000 --gym-id CartPole-v1 --render-every-nth-episode 5000 --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode Q_LEARNING --num-improvements 100 --num-episodes-per-improvement 50 --epsilon 0.01 --q-S-A rlai.gpi.state_action_value.tabular.TabularStateActionValueEstimator --continuous-state-discretization-resolution 0.1 --make-final-policy-greedy True --num-improvements-per-plot 100 --save-agent-path ~/Desktop/cartpole_agent.pickle
```
A video should be rendered at the start of training, and a plot will be rendered at the end similar to the following.
-![cartpole](cli-cartpole.png)
+![cartpole](images/cli-cartpole.png)
Details of training plots like this one are provided in the Case Studies
(e.g., [cartpole](case_studies/inverted_pendulum.md)).
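This commit repoints the documentation's image links from `docs/` to `docs/images/`. A check like the following can confirm that every relative image link in a Markdown file still resolves to an existing file after such a move. It is a hypothetical helper, not part of rlai. It assumes inline `![alt](path)` syntax without titles and skips remote URLs.

```python
import os
import re
from typing import List

# Inline Markdown image syntax: ![alt text](target). Titles and reference-style images are not handled.
IMAGE_LINK = re.compile(r'!\[[^\]]*\]\(([^)\s]+)\)')


def broken_image_links(md_path: str) -> List[str]:
    """Return the relative image targets in `md_path` that do not exist on disk (hypothetical helper)."""
    with open(md_path, encoding='utf-8') as f:
        text = f.read()

    md_dir = os.path.dirname(os.path.abspath(md_path))
    broken = []
    for target in IMAGE_LINK.findall(text):
        if '://' in target:
            continue  # skip remote images such as https://...
        if not os.path.exists(os.path.join(md_dir, target)):
            broken.append(target)
    return broken
```

For example, `broken_image_links('docs/cli_guide.md')` would return an empty list as long as `docs/images/cli-cartpole.png` exists.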
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
6 changes: 3 additions & 3 deletions docs/jupyterlab_guide.md
@@ -6,17 +6,17 @@
A companion JupyterLab notebook is provided to ease the use of RLAI. The goal of the interface is to assist with the
composition, execution, and real-time inspection of RLAI commands. The primary composer interface is shown below:

-![jupyterlab](jupyterlab-composer.png)
+![jupyterlab](images/jupyterlab-composer.png)

The notebook provides controls for starting, pausing, and resuming the execution of RLAI commands. All plots are
interactive and support zooming, panning, and axis rescaling. An example is shown below:

-![jupyterlab-running](jupyterlab-running.png)
+![jupyterlab-running](images/jupyterlab-running.png)

Certain state-action value function estimators (e.g., the scikit-learn stochastic gradient descent model) support
diagnostic plots. An example is shown below:

-![jupyterlab-diag](jupyterlab-diag.png)
+![jupyterlab-diag](images/jupyterlab-diag.png)

For single-click access to the notebook, please click below:

6 changes: 3 additions & 3 deletions docs/model_diagnostics_and_interpretation.md
@@ -18,7 +18,7 @@ shown below (see [continuous mountain car](./case_studies/mountain_car_continuou
* Top right: Action values.
* Bottom left: State-value estimate, which is used as a baseline in the REINFORCE policy gradient algorithm.
* Bottom right: Shape parameters `a` and `b` for the beta PDF.
-*
+
# Model Coefficient Plots
Consider the gridworld of Example 4.1 solved with temporal-difference q-learning and stochastic gradient descent based
on the four features extracted by
@@ -30,7 +30,7 @@ rlai train --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 1 --e
```
The above command should generate plots such as the following:

-![gridworld-sgd-plot](gridworld_sgd.png)
+![gridworld-sgd-plot](images/gridworld_sgd.png)

As indicated by the title, this figure shows boxplots of model coefficients (y-values) over time (x-values) for each
feature (row) and action (column). The coefficients quantify the relationships among the feature-action pairs and the
@@ -52,7 +52,7 @@ The [JupyterLab interface](jupyterlab_guide.md) provides detailed instrumentatio
shown below. This information can be useful when diagnosing convergence and stability issues in state-action value
function approximation.

-![sgd-instrumentation](jupyterlab-diag.png)
+![sgd-instrumentation](images/jupyterlab-diag.png)

The left plot above shows per-iteration averages of return (green), model loss (red), and step size (blue). The right
plot shows the same variables for a single iteration, so that each time step is visible. The JupyterLab interface allows
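The instrumentation text distinguishes per-iteration averages of return, model loss, and step size from the per-time-step values within a single iteration. Below is a minimal sketch of that aggregation. It assumes each recorded step carries an iteration index plus the three values. The record layout and the name `per_iteration_means` are hypothetical and do not reflect the JupyterLab interface's actual data structures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-step record: (iteration, return, model loss, step size).
StepRecord = Tuple[int, float, float, float]


def per_iteration_means(records: Iterable[StepRecord]) -> Dict[int, Tuple[float, float, float]]:
    """Average return, loss, and step size over all time steps within each iteration."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for iteration, ret, loss, step_size in records:
        s = sums[iteration]
        s[0] += ret
        s[1] += loss
        s[2] += step_size
        counts[iteration] += 1

    return {
        it: (s[0] / counts[it], s[1] / counts[it], s[2] / counts[it])
        for it, s in sorted(sums.items())
    }


steps = [(0, 1.0, 0.5, 0.125), (0, 3.0, 0.25, 0.125), (1, 4.0, 0.25, 0.0625)]
print(per_iteration_means(steps))
# {0: (2.0, 0.375, 0.125), 1: (4.0, 0.25, 0.0625)}
```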
Binary file modified src/rlai/figures/Epsilon-greedy, nonstationary bandit.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion src/rlai/meta/__init__.py → src/rlai/meta.py
@@ -124,7 +124,7 @@ def main():
# noinspection PyTypeChecker
summarize(rlai, chapter_page_descriptions)

-docs_dir = f'{os.path.dirname(__file__)}/../../../docs/'
+docs_dir = f'{os.path.dirname(__file__)}/../../docs/'
meta_md_path = f'{docs_dir}links_to_code.md'

ch_num_name = {
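The one-line change in `meta.py` follows from renaming `src/rlai/meta/__init__.py` to `src/rlai/meta.py`. The rename puts `__file__` one directory closer to the repository root, so reaching `docs/` takes one fewer `..`. The sketch below checks that the old and new expressions resolve to the same directory. It uses a hypothetical checkout root `/repo` and `posixpath` so the result does not depend on the host OS. The helper `docs_dir_for` is illustrative: it composes the path the way `meta.py` does, then normalizes it for comparison.

```python
import posixpath


def docs_dir_for(module_file: str, up: str) -> str:
    """Compose dirname(__file__) + relative hop + 'docs/' as meta.py does, then normalize for comparison."""
    return posixpath.normpath(f'{posixpath.dirname(module_file)}/{up}docs/')


# Hypothetical checkout root /repo; only the relative layout matters.
old = docs_dir_for('/repo/src/rlai/meta/__init__.py', '../../../')
new = docs_dir_for('/repo/src/rlai/meta.py', '../../')
stale = docs_dir_for('/repo/src/rlai/meta.py', '../../../')  # old hop after the move: one level too high

print(old, new, stale)  # /repo/docs /repo/docs /docs
```

Without the edit, the moved module would look for `docs/` one level above the checkout.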