Skip to content

Latest commit

 

History

History
6 lines (4 loc) · 273 Bytes

File metadata and controls

6 lines (4 loc) · 273 Bytes

Suppose that the performance measure is concerned with just the first $T$ time steps of the environment and ignores everything thereafter. Show that a rational agent’s action may depend not just on the state of the environment but also on the time step it has reached.