Intelligent Systems Lecture Notes

23 November 2011 • Markov Decision Processes


Outline

Agent’s World

Agent’s Actions

Agent’s Utility

Agent’s Problem

Markov Decision Processes

MDP Illustrated

[Figure: MDP schematic]

MDP Example

Reward History

Valuing Policy

Vπ() and Qπ()

Optimal Policy Values

Optimal Policies

Value-Iteration Algorithm

Computing Vπ() Without Qπ()

Improving π

Policy-Iteration Algorithm

Summary

References