Lesson 1: Introduction to Markov Decision Processes

Understand Markov Decision Processes, or MDPs
Describe how the dynamics of an MDP are defined
Understand the graphical representation of a Markov Decision Process
Explain how many diverse processes can be written in terms of the MDP framework

Lesson 2: Goal of Reinforcement Learning

Describe how rewards relate to the goal of an agent
Understand episodes and identify episodic tasks

Lesson 3: Continuing Tasks

Formulate returns for continuing tasks using discounting
Describe how returns at successive time steps are related to each other
Understand when to formalize a task as episodic or continuing

Weekly Reading

For this week, read Sections 3.1-3.3 (pages 47-56) in Reinforcement Learning: An Introduction.

Example

An example of an MDP is a self-driving car. The state would be all of the sensor readings the car receives at each time step: LIDAR, cameras, remaining fuel, current wheel angle, current velocity, and GPS location. The actions could be accelerate, decelerate, turn the wheels left, and turn the wheels right. The reward could be -1 at every time step, so that the agent is encouraged to reach the goal as quickly as possible, and -1 billion if it crashes or breaks the law, so that it learns to avoid doing so.
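
The reward scheme in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Outcome class, the reward and discounted_return functions, and the two sample trips are hypothetical names and data made up for this sketch, not part of the course material. The return is accumulated backwards using the recursive relation G_t = R_{t+1} + gamma * G_{t+1}, with gamma = 1 (no discounting) as the default because a single trip is an episodic task.

```python
from dataclasses import dataclass

STEP_REWARD = -1.0       # cost of each time step, as in the example
FAILURE_REWARD = -1e9    # the "-1 billion" penalty from the example


@dataclass
class Outcome:
    """Hypothetical summary of what happened during one time step."""
    crashed: bool = False
    broke_law: bool = False


def reward(outcome: Outcome) -> float:
    """Reward for one time step under the example's scheme."""
    if outcome.crashed or outcome.broke_law:
        return FAILURE_REWARD
    return STEP_REWARD


def discounted_return(rewards, gamma=1.0):
    """Return from the first step, built backwards: G_t = R_{t+1} + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Assumed sample episodes: a safe 10-step trip and a 4-step trip ending in a crash.
safe_trip = [reward(Outcome()) for _ in range(10)]
crash_trip = [reward(Outcome()) for _ in range(3)] + [reward(Outcome(crashed=True))]

print(discounted_return(safe_trip))   # -10.0
print(discounted_return(crash_trip))  # -1000000003.0
```

Even though the crash trip is shorter, its return is far lower than that of the safe trip, which is why an agent maximizing return would prefer to drive longer rather than crash.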