Catalogue of Artificial Intelligence Techniques


Temporal Difference Methods

Keywords: AHC algorithm, Q-learning, bucket brigade algorithm, games, learning

Categories: Learning

Author(s): Jeremy Wyatt

The temporal credit assignment problem is the problem of determining how much credit particular actions or events deserve for the outcome of a process, as judged by some measure of success. If, for example, I win a game of chess, how do I determine which moves in the course of the game contributed to my win? Normally we want to reward good moves (make them more likely) and punish bad ones (make them less likely). Temporal difference methods are a class of learning algorithms which attempt to solve the temporal credit assignment problem. They work by propagating reinforcements (measures of success) generated at any point in a process backwards over chains of state-action pairs. By doing this the learner adjusts its estimates of how successful particular actions chosen in particular states will be in the long term. The AHC algorithm, Q-learning and bucket brigade algorithms are the most recent additions to the field of temporal difference learning. Temporal difference methods are also related to Dynamic Programming.
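As an illustration of how a reinforcement received at the end of a process can be passed back over a chain of state-action pairs, the following is a minimal sketch of tabular Q-learning, one of the methods named in the keywords. It is not taken from this entry: the five-state corridor task, the step function, and the parameter values (alpha, gamma, epsilon, the number of episodes) are all assumptions chosen only to keep the example small.

```python
import random

# Hypothetical toy task (an assumption, not from the entry): states 0..4 on a
# line, the learner starts in state 0, and state 4 is the rewarded goal.
N_STATES = 5
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """Deterministic dynamics: move one cell; reward 1 only on reaching the goal."""
    if action == RIGHT:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # q[s][a] estimates the long-term success of choosing action a in state s.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Temporal difference update: move the estimate for (state, action)
            # towards the reward plus the discounted estimate for the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(v, 3) for v in row])
```

In this sketch the only reinforcement is given on the final move, yet over repeated episodes the estimates for earlier moves to the right rise as well, each discounted by gamma per step away from the goal. This is the sense in which credit is assigned to actions taken earlier in the chain.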

