diff --git a/docs/ch_Feature_Extractors.md b/docs/ch_Feature_Extractors.md
index 2a05dfc4..d8c89831 100644
--- a/docs/ch_Feature_Extractors.md
+++ b/docs/ch_Feature_Extractors.md
@@ -10,7 +10,7 @@ A feature extractor for the gridworld. This extractor, being based on the `State
 A feature extractor for the gridworld. This extractor does not interact feature values with actions. Its primary use is in state-value estimation (e.g., for the baseline of policy gradient methods).
 ```
-### [rlai.core.environments.gymnasium.CartpoleFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L915)
+### [rlai.core.environments.gymnasium.CartpoleFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L917)
 ```
 A feature extractor for the Gym cartpole environment. This extractor, being based on the `StateActionInteractionFeatureExtractor`, directly extracts the fully interacted state-action feature matrix. It
@@ -19,20 +19,20 @@ A feature extractor for the Gym cartpole environment. This extractor, being base
 separate intercept term being present for each state segment and action combination. The function approximator should not add its own intercept term.
 ```
-### [rlai.core.environments.gymnasium.ContinuousLunarLanderFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1456)
+### [rlai.core.environments.gymnasium.ContinuousLunarLanderFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1458)
 ```
 Feature extractor for the continuous lunar lander environment.
 ```
-### [rlai.core.environments.gymnasium.ContinuousMountainCarFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1232)
+### [rlai.core.environments.gymnasium.ContinuousMountainCarFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L1234)
 ```
 Feature extractor for the continuous mountain car environment.
 ```
-### [rlai.core.environments.gymnasium.ScaledFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L701)
+### [rlai.core.environments.gymnasium.ScaledFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L703)
 ```
 A feature extractor for continuous Gym environments. Extracts a scaled (standardized) version of the Gym state observation.
 ```
-### [rlai.core.environments.gymnasium.SignedCodingFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L783)
+### [rlai.core.environments.gymnasium.SignedCodingFeatureExtractor](https://github.com/MatthewGerber/rlai/tree/master/src/rlai/core/environments/gymnasium.py#L785)
 ```
 Signed-coding feature extractor. Forms a category from the conjunction of all state-feature signs and then places the continuous feature vector into its associated category. Works for all continuous-valued state spaces in Gym.
diff --git a/docs/cli_guide.md b/docs/cli_guide.md
index 6d4d9416..85fb08ec 100644
--- a/docs/cli_guide.md
+++ b/docs/cli_guide.md
@@ -135,6 +135,6 @@ agent for the Gym cartpole (inverted pendulum) environment.
 rlai train --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 1.0 --environment rlai.core.environments.gymnasium.Gym --T 1000 --gym-id CartPole-v1 --render-every-nth-episode 5000 --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode Q_LEARNING --num-improvements 100 --num-episodes-per-improvement 50 --epsilon 0.01 --q-S-A rlai.gpi.state_action_value.tabular.TabularStateActionValueEstimator --continuous-state-discretization-resolution 0.1 --make-final-policy-greedy True --num-improvements-per-plot 100 --save-agent-path ~/Desktop/cartpole_agent.pickle
 ```
 A video should be rendered at the start of training, and a plot will be rendered at the end similar to the following.
-![cartpole](cli-cartpole.png)
+![cartpole](images/cli-cartpole.png)
 Details of training plots like this one are provided in the Case Studies (e.g., [cartpole](case_studies/inverted_pendulum.md)).
\ No newline at end of file
diff --git a/docs/cli-cartpole.png b/docs/images/cli-cartpole.png
similarity index 100%
rename from docs/cli-cartpole.png
rename to docs/images/cli-cartpole.png
diff --git a/docs/gridworld_sgd.png b/docs/images/gridworld_sgd.png
similarity index 100%
rename from docs/gridworld_sgd.png
rename to docs/images/gridworld_sgd.png
diff --git a/docs/jupyterlab-composer.png b/docs/images/jupyterlab-composer.png
similarity index 100%
rename from docs/jupyterlab-composer.png
rename to docs/images/jupyterlab-composer.png
diff --git a/docs/jupyterlab-diag.png b/docs/images/jupyterlab-diag.png
similarity index 100%
rename from docs/jupyterlab-diag.png
rename to docs/images/jupyterlab-diag.png
diff --git a/docs/jupyterlab-running.png b/docs/images/jupyterlab-running.png
similarity index 100%
rename from docs/jupyterlab-running.png
rename to docs/images/jupyterlab-running.png
diff --git a/docs/jupyterlab_guide.md b/docs/jupyterlab_guide.md
index 6deb70df..f0061e66 100644
--- a/docs/jupyterlab_guide.md
+++ b/docs/jupyterlab_guide.md
@@ -6,17 +6,17 @@ A
 companion JupyterLab notebook is provided to ease the use of RLAI. The goal of the interface is to assist with the composition, execution, and real-time inspection of RLAI commands. The primary composer interface is show below:
-![jupyterlab](jupyterlab-composer.png)
+![jupyterlab](images/jupyterlab-composer.png)
 The notebook provides controls for starting, pausing, and resuming the execution of RLAI commands. All plots are interactive and support zooming, panning, and axis rescaling. An example is shown below:
-![jupyterlab-running](jupyterlab-running.png)
+![jupyterlab-running](images/jupyterlab-running.png)
 Certain state-active value function estimators (e.g., the scikit-learn stochastic gradient descent model) support diagnostic plots. An example is shown below:
-![jupyterlab-diag](jupyterlab-diag.png)
+![jupyterlab-diag](images/jupyterlab-diag.png)
 For single-click access to the notebook, please click below:
diff --git a/docs/model_diagnostics_and_interpretation.md b/docs/model_diagnostics_and_interpretation.md
index 163135e3..4a522a12 100644
--- a/docs/model_diagnostics_and_interpretation.md
+++ b/docs/model_diagnostics_and_interpretation.md
@@ -18,7 +18,7 @@ shown below (see [continuous mountain car](./case_studies/mountain_car_continuou
 * Top right: Action values.
 * Bottom left: State-value estimate, which is used as a baseline in the REINFORCE policy gradient algorithm.
 * Bottom right: Shape parameters `a` and `b` for the beta PDF.
-*
+
 # Model Coefficient Plots
 Consider the gridworld of Example 4.1 solved with temporal-difference q-learning and stochastic gradient descent based on the four features extracted by
@@ -30,7 +30,7 @@ rlai train --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 1 --e
 ```
 The above command should generate plots such as the following:
-![gridworld-sgd-plot](gridworld_sgd.png)
+![gridworld-sgd-plot](images/gridworld_sgd.png)
 As indicated by the title, this figure shows boxplots of model coefficients (y-values) over time (x-values) for each feature (row) and action (column). The coefficients quantify the relationships among the feature-action pairs and the
@@ -52,7 +52,7 @@ The [JupyterLab interface](jupyterlab_guide.md) provides detailed instrumentatio
 shown below. This information can be useful when diagnosing convergence and stability issues in state-action value function approximation.
-![sgd-instrumentation](jupyterlab-diag.png)
+![sgd-instrumentation](images/jupyterlab-diag.png)
 The left plot above shows per-iteration averages of return (green), model loss (red), and step size (blue). The right plot shows the same variables for a single iteration, so that each time step is visible.
 The JupyterLab interface allows
diff --git a/src/rlai/figures/Epsilon-greedy, nonstationary bandit.pdf b/src/rlai/figures/Epsilon-greedy, nonstationary bandit.pdf
index 8246bd88..52f608ad 100644
Binary files a/src/rlai/figures/Epsilon-greedy, nonstationary bandit.pdf and b/src/rlai/figures/Epsilon-greedy, nonstationary bandit.pdf differ
diff --git a/src/rlai/meta/__init__.py b/src/rlai/meta.py
similarity index 99%
rename from src/rlai/meta/__init__.py
rename to src/rlai/meta.py
index a28726bf..532731bd 100644
--- a/src/rlai/meta/__init__.py
+++ b/src/rlai/meta.py
@@ -124,7 +124,7 @@ def main():
     # noinspection PyTypeChecker
     summarize(rlai, chapter_page_descriptions)
-    docs_dir = f'{os.path.dirname(__file__)}/../../../docs/'
+    docs_dir = f'{os.path.dirname(__file__)}/../../docs/'
     meta_md_path = f'{docs_dir}links_to_code.md'
     ch_num_name = {