
Commit

* Wrap up refactoring.
* Fix refit-scaler flag, which was inverted (the opposite of correct).
* Scale down the intercept to equalize it with the other features and permit an appropriate step size.
MatthewGerber committed Dec 28, 2023
1 parent a369927 commit 908fac5
Showing 17 changed files with 66 additions and 81 deletions.
64 changes: 15 additions & 49 deletions docs/raspberry_pi.md
@@ -2,59 +2,25 @@
* Content
{:toc}

## Introduction
The Raspberry Pi is an appealing platform for mobile computation in general and for RLAI in particular. The latest model
as of October 2021 has a 64-bit quad-core CPU with 8 GB of RAM. Add-on hardware provides a wide range of sensing and
actuation capabilities, and the entire ecosystem is quite affordable.
# Operating System
The Raspberry Pi is an appealing platform for mobile computation in general and for RLAI in particular. Add-on hardware
provides a wide range of sensing and actuation capabilities, and the entire ecosystem is quite affordable.

## Operating System
At present, the official Raspberry Pi OS is a 32-bit version of Debian running on the 64-bit ARM CPU. Thus, the OS
presents a 32-bit CPU to all software running on it. It is possible to install most RLAI dependencies, either directly
from the package repositories or by building them from source. A few, particularly JAX, are neither available in the
repositories nor straightforward to build from source for the ARM CPU. There is an open issue for this
[here](https://github.com/google/jax/issues/1161), and it indicates that support for 32-bit Raspberry Pi is not likely
to appear soon. I experimented with Ubuntu Desktop 21.04 64-bit, which installs and runs without issues on the Raspberry
Pi 4 Model B; however, the desktop interface is sluggish, and since this is not an LTS version it is not possible to use
the Deadsnakes repository to install Python 3.7 and 3.8 (the Ubuntu default is Python 3.9). The Raspberry Pi Imager does
not provide any other Ubuntu Desktop versions. Ultimately, I settled on Ubuntu Server 20.04 64-bit, which is a much
slimmer OS that also installs and runs without issues. It defaults to Python 3.8 and works fine with lighter desktop
environments like XFCE. The installation is more complicated than for Ubuntu Desktop, but it is entirely feasible.
Detailed instructions are provided below.
1. Install and start the [Raspberry Pi Imager](https://www.raspberrypi.com/software/).
2. Install the default 64-bit Raspberry Pi OS.
3. `sudo apt update`
4. `sudo apt upgrade`
5. Reboot the system.

### Image the Raspberry Pi SD Card
1. Install and start the Raspberry Pi Imager.
2. Select Ubuntu Server 20.04 64-bit within the Raspberry Pi Imager, then write the OS to the SD card.
3. Insert the SD card into the Raspberry Pi and boot.
# Install Required Packages and XFCE Desktop Environment
1. `sudo apt install gfortran python3-dev libblas-dev liblapack-dev build-essential swig python-pygame git virtualenv xvfb ffmpeg`
2. `sudo systemctl reboot`

### Configure Wireless Internet
1. `sudo nano /etc/wpa_supplicant.conf` (edit as follows, replacing values as indicated):
```
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant
update_config=1
network={
    ssid="Your Wi-Fi SSID"
    scan_ssid=1
    psk="Your Wi-Fi Password"
    key_mgmt=WPA-PSK
}
```
2. Enable the wireless interface: `sudo wpa_supplicant -Dnl80211 -B -i wlan0 -c/etc/wpa_supplicant.conf`
3. Obtain wireless address: `sudo dhclient -v`
# Python Integrated Development Environment (IDE)
There are several suitable Python IDEs. PyCharm is a very good one and is free for personal use.

The above should be sufficient to get your Raspberry Pi connected to Wi-Fi. Note that subsequent installation of the
XFCE Desktop Environment (below) will cause the wireless networking settings to be managed by NetworkManager, which
stores connection information in `/etc/NetworkManager/system-connections`.

### Upgrade OS
1. `sudo apt update`
2. `sudo apt upgrade`
3. `sudo systemctl reboot`

### Install Required Packages and XFCE Desktop Environment
1. `sudo apt install gfortran python3-dev libblas-dev liblapack-dev build-essential swig python-pygame git virtualenv qt5-default xvfb ffmpeg`
2. `sudo apt install xubuntu-desktop`
3. `sudo systemctl reboot`
1. Download [here](https://www.jetbrains.com/pycharm/download).
2. Extract and install:
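   The download is a `.tar.gz` archive. The exact installation commands are not shown in this diff; a typical (illustrative) sequence is to extract the archive with `tar -xzf pycharm-*.tar.gz`, move the resulting directory somewhere permanent such as `/opt`, and launch `bin/pycharm.sh` from within it.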

## Install RLAI

2 changes: 1 addition & 1 deletion run_configurations/CartPole run FA.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/agent_in_environment.py" />
<option name="PARAMETERS" value="--agent ~/Desktop/cartpole_agent.pickle --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --plot-environment --render-every-nth-episode 1 --n-runs 1 --T 10000 --plot" />
<option name="PARAMETERS" value="--agent ../../../trained_agents/cartpole/parametric/cartpole_agent.pickle --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --plot-environment --render-every-nth-episode 1 --n-runs 1 --T 10000 --plot" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
2 changes: 1 addition & 1 deletion run_configurations/CartPole train FA.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/trainer.py" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 0.95 --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --T 1000 --render-every-nth-episode 100 --video-directory ~/Desktop/cartpole_videos --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode SARSA --num-improvements 5000 --num-episodes-per-improvement 1 --epsilon 0.1 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 100 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --loss squared_error --sgd-alpha 0.0 --learning-rate constant --eta0 0.01 --feature-extractor rlai.core.environments.gymnasium.CartpoleFeatureExtractor --make-final-policy-greedy True --num-improvements-per-plot 100 --num-improvements-per-checkpoint 1000 --checkpoint-path ~/Desktop/cartpole_checkpoint/cartpole_checkpoint.pickle --save-agent-path ~/Desktop/cartpole_agent.pickle --log INFO" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 0.95 --environment rlai.core.environments.gymnasium.Gym --gym-id CartPole-v1 --T 1000 --render-every-nth-episode 100 --video-directory ~/Desktop/cartpole_videos --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode SARSA --num-improvements 5000 --num-episodes-per-improvement 1 --epsilon 0.05 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 100 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --loss squared_error --sgd-alpha 0.0 --learning-rate constant --eta0 0.001 --feature-extractor rlai.core.environments.gymnasium.CartpoleFeatureExtractor --make-final-policy-greedy True --num-improvements-per-plot 100 --num-improvements-per-checkpoint 1000 --checkpoint-path ~/Desktop/cartpole_checkpoint/cartpole_checkpoint.pickle --save-agent-path ~/Desktop/cartpole_agent.pickle --log INFO" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
2 changes: 1 addition & 1 deletion run_configurations/Gridworld SGD.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/trainer.py" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 1.0 --environment rlai.core.environments.gridworld.Gridworld --id example_4_1 --T 25 --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode Q_LEARNING --num-improvements 50 --num-episodes-per-improvement 50 --num-improvements-per-plot 10 --epsilon 0.05 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 10 --plot-model-bins 5 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --verbose 1 --feature-extractor rlai.core.environments.gridworld.GridworldFeatureExtractor" />
<option name="PARAMETERS" value="--random-seed 12345 --agent rlai.gpi.state_action_value.ActionValueMdpAgent --gamma 0.99 --environment rlai.core.environments.gridworld.Gridworld --id example_4_1 --T 25 --train-function rlai.gpi.temporal_difference.iteration.iterate_value_q_pi --mode Q_LEARNING --num-improvements 50 --num-episodes-per-improvement 50 --num-improvements-per-plot 10 --epsilon 0.05 --q-S-A rlai.gpi.state_action_value.function_approximation.ApproximateStateActionValueEstimator --plot-model --plot-model-per-improvements 10 --plot-model-bins 5 --function-approximation-model rlai.gpi.state_action_value.function_approximation.models.sklearn.SKLearnSGD --verbose 1 --feature-extractor rlai.core.environments.gridworld.GridworldFeatureExtractor" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
2 changes: 1 addition & 1 deletion run_configurations/MountainCar continuous run FA.run.xml
@@ -14,7 +14,7 @@
<option name="ADD_SOURCE_ROOTS" value="true" />
<EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
<option name="SCRIPT_NAME" value="$PROJECT_DIR$/src/rlai/runners/agent_in_environment.py" />
<option name="PARAMETERS" value="--agent ~/Desktop/mountaincar_continuous_agent.pickle --environment rlai.core.environments.gymnasium.Gym --steps-per-second 35 --gym-id MountainCarContinuous-v0 --render-every-nth-episode 1 --n-runs 10 --T 2000 --plot --plot-environment" />
<option name="PARAMETERS" value="--agent ../../../trained_agents/mountaincar_continuous/mountaincar_continuous_agent.pickle --environment rlai.core.environments.gymnasium.Gym --steps-per-second 35 --gym-id MountainCarContinuous-v0 --render-every-nth-episode 1 --n-runs 10 --T 2000 --plot --plot-environment" />
<option name="SHOW_COMMAND_LINE" value="false" />
<option name="EMULATE_TERMINAL" value="false" />
<option name="MODULE_MODE" value="false" />
24 changes: 16 additions & 8 deletions src/rlai/core/environments/gymnasium.py
@@ -899,7 +899,7 @@ def get_reward(
-(
np.abs([
observation[0], # position
observation[2] * 5.0, # angle: equalize with the position's scale
observation[2] * 7.5, # angle: equalize with the position's scale
]).sum()
)
),
@@ -1003,7 +1003,7 @@ def extract(

# extract and scale features for each state vector
state_feature_matrix = self.feature_scaler.scale_features(state_matrix, refit_scaler)
intercept_state_feature_matrix = np.ones(shape=np.add(state_feature_matrix.shape, (0, 1)))
intercept_state_feature_matrix = 0.01 * np.ones(shape=np.add(state_feature_matrix.shape, (0, 1)))
intercept_state_feature_matrix[:, 1:] = state_feature_matrix
state_feature_matrix = intercept_state_feature_matrix

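The intercept rescaling in the hunk above can be viewed in isolation with the following minimal NumPy sketch. The feature values are made up for illustration; only the intercept-column construction mirrors the diff. A constant column of 0.01 is much closer in magnitude to standardized state features than a column of 1.0, so a single SGD step size (`eta0`) moves the intercept weight at a rate comparable to the other weights.

```python
import numpy as np

# A toy batch of already-scaled state features (roughly zero mean, unit variance).
state_feature_matrix = np.array([
    [0.3, -1.2],
    [-0.7, 0.4],
    [1.1, 0.9],
])

# Prepend a constant intercept column. Using 0.01 rather than 1.0 keeps the
# intercept's magnitude comparable to the scaled features, so one learning rate
# serves the intercept and the feature weights equally well.
intercept_state_feature_matrix = 0.01 * np.ones(
    shape=np.add(state_feature_matrix.shape, (0, 1))
)
intercept_state_feature_matrix[:, 1:] = state_feature_matrix
state_feature_matrix = intercept_state_feature_matrix

# Each row now begins with the scaled intercept:
# [0.01, 0.3, -1.2], [0.01, -0.7, 0.4], [0.01, 1.1, 0.9]
print(state_feature_matrix)
```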
@@ -1076,9 +1076,16 @@ class ContinuousMountainCarCustomizer(ContinuousActionGymCustomizer):
Continuous mountain car customizer.
"""

# x-position of the lowest point (trough)
TROUGH_X_POS = -0.5

# x-position of the goal
GOAL_X_POS = 0.45

# number of reward increments to provide on the upward slope when moving forward
NUM_REWARD_INCREMENTS = 20

# maximum fuel use per simulation step (unitless)
MAX_FUEL_USE_PER_STEP = 2.0 / 300.0

def __init__(
@@ -1091,7 +1098,7 @@ def __init__(
super().__init__()

self.fuel_level: Optional[float] = None
self.reward_increments: Optional[Dict] = None
self.reward_x_positions: Optional[List[float]] = None

def get_action_dimension_names(
self,
@@ -1136,7 +1143,7 @@ def get_reset_observation(

self.fuel_level = 1.0

self.reward_increments = np.linspace(
self.reward_x_positions = np.linspace(
start=self.TROUGH_X_POS,
stop=self.GOAL_X_POS,
num=self.NUM_REWARD_INCREMENTS
@@ -1203,9 +1210,10 @@ def get_reward(
custom_terminated = terminated
position, velocity = observation[0:2]

if len(self.reward_increments) > 0 and position >= self.reward_increments[0]:
custom_reward = (self.reward_increments[0] - self.TROUGH_X_POS) * self.fuel_level
self.reward_increments = self.reward_increments[1:]
# provide the next incremental reward if any exist, then remove from availability.
if len(self.reward_x_positions) > 0 and position >= self.reward_x_positions[0]:
custom_reward = (self.reward_x_positions[0] - self.TROUGH_X_POS) * self.fuel_level
self.reward_x_positions = self.reward_x_positions[1:]
else:
custom_reward = 0.0

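The incremental-reward logic above can be exercised on its own with the following sketch. The constants are copied from the diff, the fuel level is fixed at 1.0 for simplicity, and the `IncrementalRewarder` class is a hypothetical stand-in rather than the repository's customizer.

```python
import numpy as np

class IncrementalRewarder:
    """Standalone sketch of the incremental-reward scheme (not the repository's class)."""

    TROUGH_X_POS = -0.5         # x-position of the lowest point (trough)
    GOAL_X_POS = 0.45           # x-position of the goal
    NUM_REWARD_INCREMENTS = 20  # number of reward increments on the upward slope

    def __init__(self, fuel_level: float = 1.0):
        self.fuel_level = fuel_level

        # x-positions at which incremental rewards become available, trough to goal.
        self.reward_x_positions = np.linspace(
            start=self.TROUGH_X_POS,
            stop=self.GOAL_X_POS,
            num=self.NUM_REWARD_INCREMENTS
        )

    def get_reward(self, position: float) -> float:

        # Pay out the next increment once the car reaches its threshold. Each threshold
        # pays at most once, and the amount grows with distance beyond the trough.
        if len(self.reward_x_positions) > 0 and position >= self.reward_x_positions[0]:
            reward = (self.reward_x_positions[0] - self.TROUGH_X_POS) * self.fuel_level
            self.reward_x_positions = self.reward_x_positions[1:]
            return reward

        return 0.0

rewarder = IncrementalRewarder()
for x in [-0.6, -0.5, -0.3, 0.0, 0.45]:
    print(f"x={x:+.2f}  reward={rewarder.get_reward(x):.3f}")
```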
@@ -1248,7 +1256,7 @@ def extract(
:return: State-feature vector.
"""

state_feature_vector = np.append([1.0], super().extract(state, refit_scaler))
state_feature_vector = np.append([0.01], super().extract(state, refit_scaler))
state_category_feature_vector = self.state_category_interacter.interact(
np.array([state.observation]),
np.array([state_feature_vector])
@@ -335,6 +335,7 @@ def improve_policy(
# if we have pending experience, then fit the model and reset the data.
if self.experience_pending:

# extract features and fit the scaler while doing so
state_action_feature_matrix = self.extract_features(
self.experience_states,
self.experience_actions,
@@ -406,9 +407,9 @@ def extract_features(
:param states: States.
:param actions: Actions.
:param refit_scaler: Whether to refit the feature scaler before scaling the extracted features. This is
only appropriate in settings where nonstationarity is desired (e.g., during training). During evaluation, the
scaler should remain fixed, which means this should be False.
:param refit_scaler: Whether to refit the feature scaler before scaling the extracted features. This is only
appropriate in settings where nonstationarity is desired (e.g., during training). During evaluation, the scaler
should remain fixed, which means this should be False.
:return: State-feature numpy.ndarray.
"""

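The `refit_scaler` convention documented above can be illustrated with a short, standalone sketch. It uses scikit-learn's `StandardScaler` purely as a stand-in; the repository's feature scaler may differ, and the names below (`scale_features`, `train_batch`, `eval_batch`) are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

def scale_features(feature_matrix: np.ndarray, refit_scaler: bool) -> np.ndarray:
    # Training (refit_scaler=True): update the scaler's running statistics so it
    # tracks the nonstationary feature distribution, then scale.
    # Evaluation (refit_scaler=False): leave the statistics fixed and only scale.
    if refit_scaler:
        scaler.partial_fit(feature_matrix)
    return scaler.transform(feature_matrix)

rng = np.random.default_rng(12345)

# During training, refit on each batch of experience.
train_batch = rng.normal(loc=2.0, scale=3.0, size=(32, 4))
scaled_train = scale_features(train_batch, refit_scaler=True)

# During evaluation, the scaler stays fixed.
eval_batch = rng.normal(loc=2.0, scale=3.0, size=(8, 4))
scaled_eval = scale_features(eval_batch, refit_scaler=False)
```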
8 changes: 4 additions & 4 deletions src/rlai/policy_gradient/policies/continuous_action.py
@@ -224,7 +224,7 @@ def __commit_updates__(

# extract state-feature matrix
state_feature_matrix = np.array([
self.feature_extractor.extract(s, False)
self.feature_extractor.extract(s, True)
for s in self.update_batch_s
])

@@ -387,7 +387,7 @@ def __getitem__(

self.set_action(state)

state_feature_vector = self.feature_extractor.extract(state, True)
state_feature_vector = self.feature_extractor.extract(state, False)

# add intercept if the extractor doesn't extract one
if not self.feature_extractor.extracts_intercept():
@@ -542,7 +542,7 @@ def __commit_updates__(

# extract state-feature matrix: one row per update and one column per state dimension.
state_feature_matrix = np.array([
self.feature_extractor.extract(s, False)
self.feature_extractor.extract(s, True)
for s in self.update_batch_s
])

@@ -774,7 +774,7 @@ def __getitem__(

self.set_action(state)

state_feature_vector = self.feature_extractor.extract(state, True)
state_feature_vector = self.feature_extractor.extract(state, False)

# add intercept if the extractor doesn't extract one
if not self.feature_extractor.extracts_intercept():

