
Commit

* Wrap up refactoring.
* Fix refit-scaler flag, which was inverted (the opposite of correct).
* Scale down the intercept to equalize it with the other features and permit an appropriate step size.
MatthewGerber committed Dec 28, 2023
1 parent a369927 commit 908fac5
Showing 17 changed files with 66 additions and 81 deletions.
64 changes: 15 additions & 49 deletions docs/raspberry_pi.md
@@ -2,59 +2,25 @@
* Content
{:toc}

## Introduction
The Raspberry Pi is an appealing platform for mobile computation in general and for RLAI in particular. The latest model
as of October 2021 has a 64-bit quad-core CPU with 8 GB of RAM. Add-on hardware provides a wide range of sensing and
actuation capabilities, and the entire ecosystem is quite affordable.
# Operating System
The Raspberry Pi is an appealing platform for mobile computation in general and for RLAI in particular. Add-on hardware
provides a wide range of sensing and actuation capabilities, and the entire ecosystem is quite affordable.

## Operating System
At present, the official Raspberry Pi OS is a 32-bit version of Debian running on the 64-bit ARM CPU. Thus, the OS
presents a 32-bit CPU to all software running on it. It is possible to install most RLAI dependencies, either directly
from the package repositories or by building them from source. A few, particularly JAX, are neither available in the
repositories nor straightforward to build from source for the ARM CPU. There is an open issue for this
[here](https://github.com/google/jax/issues/1161), and it indicates that support for 32-bit Raspberry Pi is not likely
to appear soon. I experimented with Ubuntu Desktop 21.04 64-bit, which installs and runs without issues on the Raspberry
Pi 4 Model B; however, the desktop interface is sluggish, and since this is not an LTS version it is not possible to use
the Deadsnakes repository to install Python 3.7 and 3.8 (the Ubuntu default is Python 3.9). The Raspberry Pi Imager does
not provide any other Ubuntu Desktop versions. Ultimately, I settled on Ubuntu Server 20.04 64-bit, which is a much
slimmer OS that also installs and runs without issues. It defaults to Python 3.8 and works fine with lighter desktop
environments like XFCE. The installation is more complicated than for Ubuntu Desktop, but it is entirely feasible.
Detailed instructions are provided below.
1. Install and start the [Raspberry Pi Imager](https://www.raspberrypi.com/software/).
2. Install the default 64-bit Raspberry Pi OS.
3. `sudo apt update`
4. `sudo apt upgrade`
5. Reboot the system.

### Image the Raspberry Pi SD Card
1. Install and start the Raspberry Pi Imager.
2. Select Ubuntu Server 20.04 64-bit within the Raspberry Pi Imager, then write the OS to the SD card.
3. Insert the SD card into the Raspberry Pi and boot.
# Install Required Packages and XFCE Desktop Environment
1. `sudo apt install gfortran python3-dev libblas-dev liblapack-dev build-essential swig python-pygame git virtualenv xvfb ffmpeg`
2. `sudo systemctl reboot`

### Configure Wireless Internet
1. `sudo nano /etc/wpa_supplicant.conf` (edit as follows, replacing values as indicated):
```
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant
update_config=1
network={
    ssid="Your Wi-Fi SSID"
    scan_ssid=1
    psk="Your Wi-Fi Password"
    key_mgmt=WPA-PSK
}
```
2. Enable the wireless interface: `sudo wpa_supplicant -Dnl80211 -B -i wlan0 -c/etc/wpa_supplicant.conf`
3. Obtain wireless address: `sudo dhclient -v`
# Python Integrated Development Environment (IDE)
There are several suitable Python IDEs. PyCharm is a very good one and is free for personal use.

The above should be sufficient to get your Raspberry Pi connected to Wi-Fi. Note that subsequent installation of the
XFCE Desktop Environment (below) will cause the wireless networking settings to be managed by NetworkManager, which
stores connection information in `/etc/NetworkManager/system-connections`.

### Upgrade OS
1. `sudo apt update`
2. `sudo apt upgrade`
3. `sudo systemctl reboot`

### Install Required Packages and XFCE Desktop Environment
1. `sudo apt install gfortran python3-dev libblas-dev liblapack-dev build-essential swig python-pygame git virtualenv qt5-default xvfb ffmpeg`
2. `sudo apt install xubuntu-desktop`
3. `sudo systemctl reboot`
1. Download [here](https://www.jetbrains.com/pycharm/download).
2. Extract and install:
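   The download is a `.tar.gz` archive. The exact installation commands are not shown in this diff; a typical (illustrative) sequence is to extract the archive with `tar -xzf pycharm-*.tar.gz`, move the resulting directory somewhere permanent such as `/opt`, and launch `bin/pycharm.sh` from within it.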

## Install RLAI

2 changes: 1 addition & 1 deletion run_configurations/CartPole run FA.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/agent_in_environment.py" />
<option name="PARAMETERS" value="--agent ~/Desktop/cartpole_agent.pickle --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --plot-environment --render-every-nth-episode 1 --n-runs 1 --T 10000 --plot" />
<option name="PARAMETERS" value="--agent ../../../trained_agents/cartpole/parametric/cartpole_agent.pickle --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --plot-environment --render-every-nth-episode 1 --n-runs 1 --T 10000 --plot" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
2 changes: 1 addition & 1 deletion run_configurations/CartPole train FA.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/trainer.py" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 0.95 --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --T 1000 --render-every-nth-episode 100 --video-directory ~/Desktop/cartpole_videos --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode SARSA --num-improvements 5000 --num-episodes-per-improvement 1 --epsilon 0.1 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 100 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --loss squared_error --sgd-alpha 0.0 --learning-rate constant --eta0 0.01 --feature-extractor rlai.core.environments.gymnasium.CartpoleFeatureExtractor --make-final-policy-greedy True --num-improvements-per-plot 100 --num-improvements-per-checkpoint 1000 --checkpoint-path ~/Desktop/cartpole_checkpoint/cartpole_checkpoint.pickle --save-agent-path ~/Desktop/cartpole_agent.pickle --log INFO" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 0.95 --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --T 1000 --render-every-nth-episode 100 --video-directory ~/Desktop/cartpole_videos --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode SARSA --num-improvements 5000 --num-episodes-per-improvement 1 --epsilon 0.05 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 100 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --loss squared_error --sgd-alpha 0.0 --learning-rate constant --eta0 0.001 --feature-extractor rlai.core.environments.gymnasium.CartpoleFeatureExtractor --make-final-policy-greedy True --num-improvements-per-plot 100 --num-improvements-per-checkpoint 1000 --checkpoint-path ~/Desktop/cartpole_checkpoint/cartpole_checkpoint.pickle --save-agent-path ~/Desktop/cartpole_agent.pickle --log INFO" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
2 changes: 1 addition & 1 deletion run_configurations/Gridworld SGD.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/trainer.py" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 1.0 --environment rlai.core.environments.gridworld.Gridworld --id example_4_1 --T 25 --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode Q_LEARNING --num-improvements 50 --num-episodes-per-improvement 50 --num-improvements-per-plot 10 --epsilon 0.05 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 10 --plot-model-bins 5 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --verbose 1 --feature-extractor rlai.core.environments.gridworld.GridworldFeatureExtractor" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 0.99 --environment rlai.core.environments.gridworld.Gridworld --id example_4_1 --T 25 --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode Q_LEARNING --num-improvements 50 --num-episodes-per-improvement 50 --num-improvements-per-plot 10 --epsilon 0.05 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 10 --plot-model-bins 5 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --verbose 1 --feature-extractor rlai.core.environments.gridworld.GridworldFeatureExtractor" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
2 changes: 1 addition & 1 deletion run_configurations/MountainCar continuous run FA.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/agent_in_environment.py" />
<option name="PARAMETERS" value="--agent ~/Desktop/mountaincar_continuous_agent.pickle --environment rlai.core.environments.gymnasium.Gym --steps-per-second 35 --gym-id MountainCarContinuous-v0 --render-every-nth-episode 1 --n-runs 10 --T 2000 --plot --plot-environment" />
<option name="PARAMETERS" value="--agent ../../../trained_agents/mountaincar_continuous/mountaincar_continuous_agent.pickle --environment rlai.core.environments.gymnasium.Gym --steps-per-second 35 --gym-id MountainCarContinuous-v0 --render-every-nth-episode 1 --n-runs 10 --T 2000 --plot --plot-environment" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
24 changes: 16 additions & 8 deletions src/rlai/core/environments/gymnasium.py
@@ -899,7 +899,7 @@ def get_reward(
-(
np.abs([
observation[0], # position
observation[2] * 5.0, # angle: equalize with the position's scale
observation[2] * 7.5, # angle: equalize with the position's scale
]).sum()
)
),
@@ -1003,7 +1003,7 @@ def extract(

# extract and scale features for each state vector
state_feature_matrix = self.feature_scaler.scale_features(state_matrix, refit_scaler)
intercept_state_feature_matrix = np.ones(shape=np.add(state_feature_matrix.shape, (0, 1)))
intercept_state_feature_matrix = 0.01 * np.ones(shape=np.add(state_feature_matrix.shape, (0, 1)))
intercept_state_feature_matrix[:, 1:] = state_feature_matrix
state_feature_matrix = intercept_state_feature_matrix

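The intercept rescaling in the hunk above can be viewed in isolation with the following minimal NumPy sketch. The feature values are made up for illustration; only the intercept-column construction mirrors the diff. A constant column of 0.01 is much closer in magnitude to standardized state features than a column of 1.0, so a single SGD step size (`eta0`) moves the intercept weight at a rate comparable to the other weights.

```python
import numpy as np

# A toy batch of already-scaled state features (roughly zero mean, unit variance).
state_feature_matrix = np.array([
    [0.3, -1.2],
    [-0.7, 0.4],
    [1.1, 0.9],
])

# Prepend a constant intercept column. Using 0.01 rather than 1.0 keeps the
# intercept's magnitude comparable to the scaled features, so one learning rate
# serves the intercept and the feature weights equally well.
intercept_state_feature_matrix = 0.01 * np.ones(
    shape=np.add(state_feature_matrix.shape, (0, 1))
)
intercept_state_feature_matrix[:, 1:] = state_feature_matrix
state_feature_matrix = intercept_state_feature_matrix

# Each row now begins with the scaled intercept:
# [0.01, 0.3, -1.2], [0.01, -0.7, 0.4], [0.01, 1.1, 0.9]
print(state_feature_matrix)
```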
@@ -1076,9 +1076,16 @@ class ContinuousMountainCarCustomizer(ContinuousActionGymCustomizer):
Continuous mountain car customizer.
"""

# x-position of the lowest point (trough)
TROUGH_X_POS = -0.5

# x-position of the goal
GOAL_X_POS = 0.45

# number of reward increments to provide on the upward slope when moving forward
NUM_REWARD_INCREMENTS = 20

# maximum fuel use per simulation step (unitless)
MAX_FUEL_USE_PER_STEP = 2.0 / 300.0

def __init__(
@@ -1091,7 +1098,7 @@ def __init__(
super().__init__()

self.fuel_level: Optional[float] = None
self.reward_increments: Optional[Dict] = None
self.reward_x_positions: Optional[List[float]] = None

def get_action_dimension_names(
self,
@@ -1136,7 +1143,7 @@ def get_reset_observation(

self.fuel_level = 1.0

self.reward_increments = np.linspace(
self.reward_x_positions = np.linspace(
start=self.TROUGH_X_POS,
stop=self.GOAL_X_POS,
num=self.NUM_REWARD_INCREMENTS
@@ -1203,9 +1210,10 @@ def get_reward(
custom_terminated = terminated
position, velocity = observation[0:2]

if len(self.reward_increments) > 0 and position >= self.reward_increments[0]:
custom_reward = (self.reward_increments[0] - self.TROUGH_X_POS) * self.fuel_level
self.reward_increments = self.reward_increments[1:]
# provide the next incremental reward if any exist, then remove from availability.
if len(self.reward_x_positions) > 0 and position >= self.reward_x_positions[0]:
custom_reward = (self.reward_x_positions[0] - self.TROUGH_X_POS) * self.fuel_level
self.reward_x_positions = self.reward_x_positions[1:]
else:
custom_reward = 0.0

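The incremental-reward logic above can be exercised on its own with the following sketch. The constants are copied from the diff, the fuel level is fixed at 1.0 for simplicity, and the `IncrementalRewarder` class is a hypothetical stand-in rather than the repository's customizer.

```python
import numpy as np

class IncrementalRewarder:
    """Standalone sketch of the incremental-reward scheme (not the repository's class)."""

    TROUGH_X_POS = -0.5         # x-position of the lowest point (trough)
    GOAL_X_POS = 0.45           # x-position of the goal
    NUM_REWARD_INCREMENTS = 20  # number of reward increments on the upward slope

    def __init__(self, fuel_level: float = 1.0):
        self.fuel_level = fuel_level

        # x-positions at which incremental rewards become available, trough to goal.
        self.reward_x_positions = np.linspace(
            start=self.TROUGH_X_POS,
            stop=self.GOAL_X_POS,
            num=self.NUM_REWARD_INCREMENTS
        )

    def get_reward(self, position: float) -> float:

        # Pay out the next increment once the car reaches its threshold. Each threshold
        # pays at most once, and the amount grows with distance beyond the trough.
        if len(self.reward_x_positions) > 0 and position >= self.reward_x_positions[0]:
            reward = (self.reward_x_positions[0] - self.TROUGH_X_POS) * self.fuel_level
            self.reward_x_positions = self.reward_x_positions[1:]
            return reward

        return 0.0

rewarder = IncrementalRewarder()
for x in [-0.6, -0.5, -0.3, 0.0, 0.45]:
    print(f"x={x:+.2f}  reward={rewarder.get_reward(x):.3f}")
```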
@@ -1248,7 +1256,7 @@ def extract(
:return: State-feature vector.
"""

state_feature_vector = np.append([1.0], super().extract(state, refit_scaler))
state_feature_vector = np.append([0.01], super().extract(state, refit_scaler))
state_category_feature_vector = self.state_category_interacter.interact(
np.array([state.observation]),
np.array([state_feature_vector])
@@ -335,6 +335,7 @@ def improve_policy(
# if we have pending experience, then fit the model and reset the data.
if self.experience_pending:

# extract features and fit the scaler while doing so
state_action_feature_matrix = self.extract_features(
self.experience_states,
self.experience_actions,
@@ -406,9 +407,9 @@ def extract_features(
:param states: States.
:param actions: Actions.
:param refit_scaler: Whether to refit the feature scaler before scaling the extracted features. This is
only appropriate in settings where nonstationarity is desired (e.g., during training). During evaluation, the
scaler should remain fixed, which means this should be False.
:param refit_scaler: Whether to refit the feature scaler before scaling the extracted features. This is only
appropriate in settings where nonstationarity is desired (e.g., during training). During evaluation, the scaler
should remain fixed, which means this should be False.
:return: State-feature numpy.ndarray.
"""

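The `refit_scaler` convention documented above can be illustrated with a short, standalone sketch. It uses scikit-learn's `StandardScaler` purely as a stand-in; the repository's feature scaler may differ, and the names below (`scale_features`, `train_batch`, `eval_batch`) are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

def scale_features(feature_matrix: np.ndarray, refit_scaler: bool) -> np.ndarray:
    # Training (refit_scaler=True): update the scaler's running statistics so it
    # tracks the nonstationary feature distribution, then scale.
    # Evaluation (refit_scaler=False): leave the statistics fixed and only scale.
    if refit_scaler:
        scaler.partial_fit(feature_matrix)
    return scaler.transform(feature_matrix)

rng = np.random.default_rng(12345)

# During training, refit on each batch of experience.
train_batch = rng.normal(loc=2.0, scale=3.0, size=(32, 4))
scaled_train = scale_features(train_batch, refit_scaler=True)

# During evaluation, the scaler stays fixed.
eval_batch = rng.normal(loc=2.0, scale=3.0, size=(8, 4))
scaled_eval = scale_features(eval_batch, refit_scaler=False)
```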
8 changes: 4 additions & 4 deletions src/rlai/policy_gradient/policies/continuous_action.py
@@ -224,7 +224,7 @@ def __commit_updates__(

# extract state-feature matrix
state_feature_matrix = np.array([
self.feature_extractor.extract(s, False)
self.feature_extractor.extract(s, True)
for s in self.update_batch_s
])

@@ -387,7 +387,7 @@ def __getitem__(

self.set_action(state)

state_feature_vector = self.feature_extractor.extract(state, True)
state_feature_vector = self.feature_extractor.extract(state, False)

# add intercept if the extractor doesn't extract one
if not self.feature_extractor.extracts_intercept():
@@ -542,7 +542,7 @@ def __commit_updates__(

# extract state-feature matrix: one row per update and one column per state dimension.
state_feature_matrix = np.array([
self.feature_extractor.extract(s, False)
self.feature_extractor.extract(s, True)
for s in self.update_batch_s
])

@@ -774,7 +774,7 @@ def __getitem__(

self.set_action(state)

state_feature_vector = self.feature_extractor.extract(state, True)
state_feature_vector = self.feature_extractor.extract(state, False)

# add intercept if the extractor doesn't extract one
if not self.feature_extractor.extracts_intercept():

