By the end of this module, you should have achieved the following learning objectives:
Lesson 1: Policy Evaluation (Prediction)
- Understand the distinction between policy evaluation and control
- Explain the setting in which dynamic programming can be applied, as well as its limitations
- Outline the iterative policy evaluation algorithm for estimating state values under a given policy
- Apply iterative policy evaluation to compute value functions
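As a concrete illustration of this lesson, iterative policy evaluation could be sketched as below. The 4-state chain MDP, its deterministic dynamics, and the reward of -1 per step are invented for this example (they are not from the course); the backup itself follows the standard expected update V(s) ← Σ_a π(a|s)[r + γV(s')].

```python
# Hypothetical 4-state chain MDP used only for illustration:
# states 0..3, state 3 terminal; actions 0 (left) and 1 (right);
# every transition out of a non-terminal state yields reward -1.
N_STATES, TERMINAL, GAMMA = 4, 3, 1.0

def step(s, a):
    """Deterministic dynamics: return (next_state, reward)."""
    if s == TERMINAL:
        return s, 0.0
    s2 = max(0, s - 1) if a == 0 else min(TERMINAL, s + 1)
    return s2, -1.0

def iterative_policy_evaluation(policy, theta=1e-9):
    """Sweep the states, backing up V(s) = sum_a pi(a|s) * (r + gamma * V(s')),
    until the largest change in a sweep falls below theta."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            v_new = 0.0
            for a, pi_sa in enumerate(policy[s]):
                s2, r = step(s, a)
                v_new += pi_sa * (r + GAMMA * V[s2])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place (asynchronous-style) update
        if delta < theta:
            return V

# Evaluate the equiprobable-random policy on this chain.
random_policy = [[0.5, 0.5] for _ in range(N_STATES)]
V = iterative_policy_evaluation(random_policy)
```

Note that the updates here are done in place, so later backups within a sweep already use values updated earlier in the same sweep; a two-array version that uses only the previous sweep's values also converges.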
Lesson 2: Policy Iteration (Control)
- Understand the policy improvement theorem
- Use a value function for a policy to produce a better policy for a given MDP
- Outline the policy iteration algorithm for finding the optimal policy
- Understand “the dance of policy and value”
- Apply policy iteration to compute optimal policies and optimal value functions
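The "dance" of evaluation and improvement from this lesson might be sketched as follows, again on an invented 4-state chain MDP (states, dynamics, and rewards are illustrative assumptions, not course material). Improvement makes the policy greedy with respect to the current value function, which the policy improvement theorem guarantees is no worse than the old policy:

```python
# Hypothetical 4-state chain MDP: states 0..3, state 3 terminal,
# actions 0 (left) / 1 (right), reward -1 per step until terminal.
N_STATES, TERMINAL, GAMMA = 4, 3, 1.0

def step(s, a):
    """Deterministic dynamics: return (next_state, reward)."""
    if s == TERMINAL:
        return s, 0.0
    return (max(0, s - 1), -1.0) if a == 0 else (min(TERMINAL, s + 1), -1.0)

def evaluate(policy, theta=1e-9):
    """Prediction step: iterative policy evaluation of the current policy."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            v_new = 0.0
            for a, pi_sa in enumerate(policy[s]):
                s2, r = step(s, a)
                v_new += pi_sa * (r + GAMMA * V[s2])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

def improve(V):
    """Improvement step: make the policy greedy with respect to V."""
    policy = []
    for s in range(N_STATES):
        q = []
        for a in range(2):
            s2, r = step(s, a)
            q.append(r + GAMMA * V[s2])
        best = q.index(max(q))
        policy.append([1.0 if a == best else 0.0 for a in range(2)])
    return policy

def policy_iteration():
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [[0.5, 0.5] for _ in range(N_STATES)]
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:  # stable policy => optimal
            return policy, V
        policy = new_policy

pi_star, V_star = policy_iteration()
```

On this chain the dance converges after a couple of rounds to the policy that always moves right, with state values equal to the negated distance to the terminal state.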
Lesson 3: Generalized Policy Iteration
- Understand the framework of generalized policy iteration
- Outline value iteration, an important example of generalized policy iteration
- Understand the distinction between synchronous and asynchronous dynamic programming methods
- Describe brute-force search as an alternative method for searching for an optimal policy
- Describe Monte Carlo as an alternative method for learning a value function
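Value iteration, the example of generalized policy iteration named above, can be sketched on the same kind of invented chain MDP (again an illustrative assumption, not course material). Each sweep fuses a truncated evaluation with greedification by backing up V(s) ← max_a [r + γV(s')] directly:

```python
# Hypothetical 4-state chain MDP: states 0..3, state 3 terminal,
# actions 0 (left) / 1 (right), reward -1 per step until terminal.
N_STATES, TERMINAL, GAMMA = 4, 3, 1.0

def step(s, a):
    """Deterministic dynamics: return (next_state, reward)."""
    if s == TERMINAL:
        return s, 0.0
    return (max(0, s - 1), -1.0) if a == 0 else (min(TERMINAL, s + 1), -1.0)

def value_iteration(theta=1e-9):
    """Each sweep applies V(s) <- max_a [r + gamma * V(s')], combining a
    single evaluation backup with policy improvement in one update."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            q = []
            for a in range(2):
                s2, r = step(s, a)
                q.append(r + GAMMA * V[s2])
            v_new = max(q)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

V_opt = value_iteration()

# Recover a greedy policy from the converged values.
greedy = [max(range(2), key=lambda a: step(s, a)[1] + GAMMA * V_opt[step(s, a)[0]])
          for s in range(N_STATES)]
```

Because states are updated in place and in an arbitrary order rather than from a frozen copy, this sketch also hints at the asynchronous dynamic programming methods mentioned above; unlike brute-force search over all deterministic policies, value iteration never enumerates policies explicitly.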