Reward

Reward Hypothesis

State transition probability

Expected reward for a state-action pair
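
The two items above are commonly linked by r(s, a) = sum over s' of p(s' | s, a) * R(s, a, s'). The following is a minimal sketch of that relationship for a tabular problem, assuming rewards depend on the transition (s, a, s'). The two-state MDP, its numbers, and the names `P`, `R`, and `expected_reward` are hypothetical and chosen only for illustration.

```python
# Hypothetical two-state MDP; all numbers are made up for illustration.
# P[(s, a)] maps each next state s' to the probability p(s' | s, a).
P = {
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "go"): {"s0": 1.0},
}

# R[(s, a, s')] is the reward received on that transition (an assumption;
# some formulations define the reward on (s, a) or on s' alone).
R = {
    ("s0", "go", "s0"): 0.0,
    ("s0", "go", "s1"): 5.0,
    ("s1", "go", "s0"): -1.0,
}


def expected_reward(s, a):
    """Compute r(s, a) = sum over s' of p(s' | s, a) * R(s, a, s')."""
    return sum(p * R[(s, a, s_next)] for s_next, p in P[(s, a)].items())
```

Calling `expected_reward("s0", "go")` weights each transition reward by its probability, so frequent transitions dominate the result.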

Return
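
As a rough sketch, assuming the return is the discounted sum of future rewards, G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..., it can be computed from a finite list of rewards as shown here. The outline does not say whether a discount factor is used; `gamma` and the function name `discounted_return` are illustrative assumptions, and setting `gamma=1.0` gives the plain undiscounted sum.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum over k of gamma**k * rewards[k] for a finite reward list.

    Accumulates from the last reward backward using G = r + gamma * G,
    which avoids computing powers of gamma explicitly.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `discounted_return([1, 1, 1], gamma=0.5)` weights the three rewards by 1, 0.5, and 0.25.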