In our project, we train an agent to
traverse a frozen lake without falling into the water. The
agent learns by trial and error, adjusting the actions it takes
based on the rewards it has received in the past.
We use the Q-learning algorithm. This algorithm maintains a
table, the Q-table, which maps every state-action pair to a
value. The agent learns which actions to take based on the
values in this table.
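The table update described above can be sketched as follows. This is a minimal, illustrative example of the tabular Q-learning update rule, assuming NumPy; the grid size, hyperparameter values, and the `q_update` helper are placeholders, not the project's actual code.

```python
import numpy as np

# Hypothetical sizes for a 4x4 FrozenLake grid: 16 states, 4 actions.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))  # the Q-table: one value per state-action pair

alpha, gamma = 0.1, 0.99  # learning rate and discount factor (example values)

def q_update(state, action, reward, next_state):
    """One tabular Q-learning update toward the bootstrapped target."""
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])

# Example: a single transition that earned reward 1.0.
q_update(state=0, action=2, reward=1.0, next_state=1)
```

After this update, `Q[0, 2]` moves a fraction `alpha` of the way toward the target, which is how the table values gradually converge as transitions accumulate.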
- How does the behavior of the agent differ between high and low values of the exploration rate (ε)?
- Does the discount factor (γ) have a noticeable impact on the score achieved by the agent?
- Does the learning rate (α) have a noticeable impact on the score achieved by the agent?
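To make the ε question above concrete, here is a minimal sketch of ε-greedy action selection, the standard way ε trades off exploration against exploitation. It assumes NumPy; the seed and Q-values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def epsilon_greedy(q_row, epsilon):
    """With probability epsilon pick a random action (explore),
    otherwise pick the highest-valued action (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

# Example Q-values for one state: action 1 looks best so far.
q_row = np.array([0.0, 0.5, 0.1, 0.0])
action = epsilon_greedy(q_row, epsilon=0.1)  # mostly exploits, sometimes explores
```

With a high ε the agent samples actions almost uniformly, discovering the lake layout but scoring poorly; with a low ε it commits to the current Q-table, which pays off only once the table is accurate.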