Catalogue of Artificial Intelligence Techniques
Temporal Difference Methods
Keywords: AHC algorithm, Q-learning, bucket brigade algorithm, games, learning
Author(s): Jeremy Wyatt
The temporal credit assignment problem is the problem of determining how much credit particular actions or events deserve for the outcome of a process, as judged by some measure of success. If, for example, I win a game of chess, how do I determine which moves in the course of the game contributed to my win? Normally we want to reward good moves (make them more likely) and punish bad ones (make them less likely). Temporal difference methods are a class of learning algorithms that attempt to solve the temporal credit assignment problem. They work by propagating reinforcements (measures of success) generated at any point in a process backwards over chains of state-action pairs. By doing this the learner adjusts its estimates of how successful particular actions chosen in particular states will be in the long term. The AHC algorithm, Q-learning and bucket brigade algorithms are the most recent additions to the field of temporal difference learning. Temporal difference methods are also related to Dynamic Programming.
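The way reinforcement is backed up over chains of state-action pairs can be illustrated with a minimal sketch of tabular Q-learning, one of the methods named in the keywords. Everything specific here is an assumption made for illustration and does not come from this entry: the five-state chain environment, the hypothetical `step` function, the reward of 1 for reaching the final state, and the values of the learning rate `ALPHA` and discount factor `GAMMA`. Actions are chosen uniformly at random during learning, which Q-learning permits because it estimates the value of the greedy policy regardless of how actions are selected.

```python
import random

N_STATES = 5            # states 0..4; state 4 is terminal (the "win")
LEFT, RIGHT = 0, 1      # the two available actions
ALPHA, GAMMA = 0.5, 0.9 # assumed learning rate and discount factor


def step(state, action):
    """Hypothetical environment: walk along a chain, reward 1 on reaching the end."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, seed=0):
    """Learn state-action value estimates from randomly explored episodes."""
    rng = random.Random(seed)
    # q[state][action] estimates the long-term success of that action in that state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice((LEFT, RIGHT))
            next_state, reward = step(state, action)
            # Temporal difference update: move the current estimate towards the
            # reward plus the discounted best estimate for the next state.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy action in each non-terminal state after learning.
greedy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

In this sketch only the final move is ever rewarded directly, yet the estimates for moving right in earlier states should rise as well, each discounted by `GAMMA` per step of distance from the goal. This is the sense in which credit for the outcome is passed back to the earlier moves that led to it.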