Lecture 6 | Value Function Approximation

Large-Scale Reinforcement Learning

Reinforcement learning can also be applied to large problems.
Backgammon: $10^{20}$ states
Computer Go: $10^{170}$ states
Helicopter: continuous state space // a lookup table cannot be used.
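
To make the scale concrete, here is a rough back-of-the-envelope sketch of how much memory a lookup table would need for the state counts above. The 8 bytes per entry (one float64 per state) is an assumption for illustration, not something stated in the lecture.

```python
# Assumption: each table entry stores a single float64 value (8 bytes).
BYTES_PER_VALUE = 8

def table_bytes(num_states: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Memory needed to store one value per state in a lookup table."""
    return num_states * bytes_per_value

backgammon_bytes = table_bytes(10**20)
go_bytes = table_bytes(10**170)

# 1 exabyte = 10^18 bytes
print(f"Backgammon table: {backgammon_bytes / 1e18:.0f} exabytes")
print(f"Computer Go table: {go_bytes:.1e} bytes")
```

Even the Backgammon table would take hundreds of exabytes, and a continuous state space such as the helicopter has no finite table at all.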

Value Function Approximation

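So far, the value function has been represented as a lookup table.
For large MDPs this breaks down: there are too many states to hold in memory, and learning the value of each state individually is too slow.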
Solution for large MDPs
- $\hat{v}(s, w) \approx v_\pi(s)$
- $\hat{q}(s, a, w) \approx q_\pi(s, a)$
- $\hat{v}$ and $\hat{q}$ are function approximators for $v_\pi$ and $q_\pi$, respectively.
- The parameter $w$ is updated using MC or TD learning.
- Generalize from seen states to unseen states.
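
The points above can be sketched with a linear approximator $\hat{v}(s, w) = x(s)^\top w$ whose parameter $w$ is updated by TD(0). This is a minimal illustration, not the lecture's own code: the 5-state random walk, the feature function `features`, the step size, and the episode count are all assumptions chosen for the example. Because every state shares the same two parameters, an update made in one state also changes the estimates of the others, which is the generalization the notes refer to.

```python
import numpy as np

N_STATES = 5   # non-terminal states 0..4 of a random walk (illustrative task)
GAMMA = 1.0    # undiscounted episodic task

def features(s: int) -> np.ndarray:
    """Hypothetical feature vector x(s): a bias term plus the normalized position."""
    return np.array([1.0, s / (N_STATES - 1)])

def v_hat(s: int, w: np.ndarray) -> float:
    """Linear approximator: v_hat(s, w) = x(s) . w"""
    return float(features(s) @ w)

def td0_update(w, s, reward, s_next, alpha):
    """One TD(0) step on w; s_next is None when the episode terminates."""
    target = reward + (0.0 if s_next is None else GAMMA * v_hat(s_next, w))
    td_error = target - v_hat(s, w)
    # For a linear approximator, the gradient of v_hat with respect to w is x(s).
    return w + alpha * td_error * features(s)

def run(num_episodes=10_000, alpha=0.005, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(2)
    for _ in range(num_episodes):
        s = N_STATES // 2                      # start in the middle state
        while s is not None:
            nxt = s + (1 if rng.random() < 0.5 else -1)
            if nxt < 0:                        # left exit: reward 0
                reward, s_next = 0.0, None
            elif nxt >= N_STATES:              # right exit: reward 1
                reward, s_next = 1.0, None
            else:
                reward, s_next = 0.0, nxt
            w = td0_update(w, s, reward, s_next, alpha)
            s = s_next
    return w

w = run()
estimates = [v_hat(s, w) for s in range(N_STATES)]
# True values of this random walk under the uniform random policy.
true_values = [(s + 1) / (N_STATES + 1) for s in range(N_STATES)]
print("estimates  :", np.round(estimates, 2))
print("true values:", np.round(true_values, 2))
```

Only two numbers are learned instead of one per state. In this task the true values happen to be linear in the state index, so the two features can represent them; with other features or tasks the approximation would generally be less accurate.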

Which Function Approximator?

Differentiable:
- Linear combinations of features
- Neural network
Decision tree
Nearest neighbor
Fourier / wavelet bases
In addition, the training method must be suitable for non-stationary, non-iid data.