Lesson 1: Introduction to Markov Decision Processes

Lesson 2: Goal of Reinforcement Learning

Lesson 3: Continuing Tasks

Weekly Reading

For this week, read Chapter 3.3 (pages 47-56) of Reinforcement Learning: An Introduction by Sutton and Barto.

Example


An example of an MDP is a self-driving car. The state is the full set of sensor readings the car receives at each time step: LIDAR, cameras, the amount of fuel left, the current wheel angle, the current velocity, and the GPS location. The actions could be accelerate, decelerate, turn wheels left, and turn wheels right. The reward could be -1 at every time step, which encourages the agent to reach the goal as quickly as possible, and a very large penalty such as -1 billion if the car crashes or breaks the law, so that the agent learns to avoid those outcomes.
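
To make the pieces of this MDP concrete, here is a minimal Python sketch of the states, actions, and rewards described above. All names (CarState, Action, reward, episode_return) are hypothetical and chosen only for illustration. The state is heavily simplified: the LIDAR and camera readings are left out, and two boolean flags stand in for "crashed" and "broke the law", which a real system would have to detect somehow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Tuple


class Action(Enum):
    """The four actions named in the example."""
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    TURN_LEFT = "turn wheels left"
    TURN_RIGHT = "turn wheels right"


@dataclass(frozen=True)
class CarState:
    """Simplified, hypothetical state: a subset of the sensor readings."""
    fuel_left: float                  # fraction of fuel remaining
    wheel_angle: float                # current wheel angle
    velocity: float                   # current velocity
    gps_location: Tuple[float, float]
    crashed: bool = False             # assumed flag, not a real sensor
    broke_law: bool = False           # assumed flag, not a real sensor


STEP_REWARD = -1.0                    # cost of each time step
FAILURE_REWARD = -1_000_000_000.0     # penalty for crashing or breaking the law


def reward(next_state: CarState) -> float:
    """Reward received on arriving in next_state."""
    if next_state.crashed or next_state.broke_law:
        return FAILURE_REWARD
    return STEP_REWARD


def episode_return(states: Iterable[CarState]) -> float:
    """Undiscounted sum of rewards over the states visited in one episode."""
    return sum(reward(s) for s in states)


# Usage: a short safe trip versus a trip that ends in a crash.
safe = CarState(fuel_left=0.8, wheel_angle=0.0, velocity=12.0,
                gps_location=(0.0, 0.0))
crash = CarState(fuel_left=0.8, wheel_angle=0.3, velocity=30.0,
                 gps_location=(0.0, 0.1), crashed=True)

print(episode_return([safe, safe, safe]))  # -3.0
print(episode_return([safe, crash]))       # -1000000001.0
```

The two printed returns show why the reward is shaped this way: among safe trips, fewer time steps means a higher return, while any trip that ends in a crash scores far worse than even a very long safe one.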