Intelligent Systems Lecture Notes

25 November 2011 • Reinforcement Learning


Outline

Markov Decision Processes

Reinforcement Learning

A Warm-Up Problem

Passive Learning

Estimating Vπ()

Example

Observations

Improving DUE

Estimating Pr()

Finding Vπ()

ADP Algorithm

Further Improvements

Mean Problems

Computing Averages

The Temporal-Difference Error

Learning Rate

Learning-Rate Example

The Q-Learning Algorithm

Summary

References