Module 3 Learning Objectives
By the end of this lesson, you should have achieved the following learning objectives:
Lesson 1: Policies and Value Functions
- Recognize that a policy is a distribution over actions for each possible state
- Describe the similarities and differences between stochastic and deterministic policies
- Identify the characteristics of a well-defined policy
- Generate examples of valid policies for a given MDP
- Describe the roles of state-value and action-value functions in reinforcement learning
- Describe the relationship between value functions and policies
- Create examples of valid value functions for a given MDP
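The objectives above can be made concrete with a small sketch. The MDP below (states "s0"/"s1", actions "stay"/"go") and all its numbers are illustrative assumptions, not examples from the lesson; the sketch shows a stochastic policy, a deterministic policy as a special case, a well-definedness check, and a value function as a mapping from states to expected returns.

```python
# Hypothetical two-state MDP: states "s0", "s1"; actions "stay", "go".
# All names and numbers are illustrative, not from the lesson.

# A stochastic policy: for each state, a distribution over actions.
stochastic_policy = {
    "s0": {"stay": 0.7, "go": 0.3},
    "s1": {"stay": 0.5, "go": 0.5},
}

# A deterministic policy is the special case where, in every state,
# a single action has probability 1.
deterministic_policy = {
    "s0": {"stay": 1.0, "go": 0.0},
    "s1": {"stay": 0.0, "go": 1.0},
}

def is_well_defined(policy):
    """A well-defined policy assigns a valid probability distribution
    over actions to every state: non-negative entries summing to 1."""
    return all(
        all(prob >= 0.0 for prob in dist.values())
        and abs(sum(dist.values()) - 1.0) < 1e-9
        for dist in policy.values()
    )

# A state-value function maps each state to the expected return when
# following a given policy from that state (illustrative values).
v = {"s0": 1.2, "s1": -0.4}
```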
Lesson 2: Bellman Equations
- Derive the Bellman equation for state-value functions
- Derive the Bellman equation for action-value functions
- Understand how Bellman equations relate current and future values
- Use the Bellman equations to compute value functions
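As a sketch of the last objective, the Bellman equation for state values,
v_pi(s) = sum_a pi(a|s) sum_{s'} p(s', r | s, a) [r + gamma * v_pi(s')],
can be applied repeatedly to compute v_pi (iterative policy evaluation). The two-state MDP, its dynamics, and the equiprobable policy below are illustrative assumptions.

```python
# Iterative policy evaluation on a hypothetical two-state MDP.
# All names and numbers are illustrative.

gamma = 0.9

# Dynamics: p[(s, a)] is a list of (next_state, probability, reward).
p = {
    ("s0", "stay"): [("s0", 1.0, 0.0)],
    ("s0", "go"):   [("s1", 1.0, 1.0)],
    ("s1", "stay"): [("s1", 1.0, 2.0)],
    ("s1", "go"):   [("s0", 1.0, 0.0)],
}

# Equiprobable policy: pi[s][a] = probability of taking a in s.
pi = {
    "s0": {"stay": 0.5, "go": 0.5},
    "s1": {"stay": 0.5, "go": 0.5},
}

# Repeatedly apply the Bellman equation as an update; with gamma < 1
# this converges to the unique fixed point v_pi.
v = {"s0": 0.0, "s1": 0.0}
for _ in range(1000):
    v = {
        s: sum(
            pi[s][a] * sum(prob * (r + gamma * v[s2])
                           for s2, prob, r in p[(s, a)])
            for a in pi[s]
        )
        for s in v
    }
```

At the fixed point, each state's value equals one Bellman backup of itself, which is exactly what the Bellman equation asserts.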
Lesson 3: Optimality (Optimal Policies & Value Functions)
- Define an optimal policy
- Explain what it means for one policy to be at least as good as another, and why an optimal policy is at least as good as every other policy in every state
- Identify an optimal policy for given MDPs
- Derive the Bellman optimality equation for state-value functions
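The Bellman optimality equation for state values,
v*(s) = max_a sum_{s'} p(s', r | s, a) [r + gamma * v*(s')],
replaces the policy-weighted average with a max over actions. A sketch of using it as an iterative update (value iteration) on a hypothetical two-state MDP, with a greedy policy read off at the end; all names and numbers are illustrative.

```python
# Value iteration via the Bellman optimality equation on a
# hypothetical two-state MDP. All names and numbers are illustrative.

gamma = 0.9

# Dynamics: p[(s, a)] is a list of (next_state, probability, reward).
p = {
    ("s0", "stay"): [("s0", 1.0, 0.0)],
    ("s0", "go"):   [("s1", 1.0, 1.0)],
    ("s1", "stay"): [("s1", 1.0, 2.0)],
    ("s1", "go"):   [("s0", 1.0, 0.0)],
}
actions = ["stay", "go"]

def backup(v, s, a):
    """Expected one-step return of taking a in s, then following v."""
    return sum(prob * (r + gamma * v[s2]) for s2, prob, r in p[(s, a)])

# Repeated max-backups converge to v* when gamma < 1.
v = {"s0": 0.0, "s1": 0.0}
for _ in range(1000):
    v = {s: max(backup(v, s, a) for a in actions) for s in v}

# An optimal (deterministic) policy is greedy with respect to v*.
pi_star = {s: max(actions, key=lambda a: backup(v, s, a)) for s in v}
```

The greedy policy is optimal here because its action achieves the max in the Bellman optimality equation in every state.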