Catalogue of Artificial Intelligence Techniques
Aliases: gradient ascent, gradient method, method of steepest descent, steepest descent
Keywords: differential equation, gradient, maximum, minimum, search
Categories: Neural Networks
Author(s): Christin Wolf
The method of gradient descent, also called the gradient method, is an optimisation algorithm used in numerical optimisation and in training neural networks to find the nearest local minimum of a function, i.e. a point where the gradient satisfies ∇F(x) = 0. Because for complex functions in higher dimensions this equation often cannot be solved directly, an iterative method is used.
If the real-valued function F(x) is defined and differentiable in a neighborhood of a point x_0, we can compute the gradient ∇F(x_0) there. The gradient has the useful property that it points in the direction in which F increases fastest (and the negative gradient in the direction in which F decreases fastest), so we can search along the direction of steepest descent towards the minimum. Starting from x_0, we repeatedly move downhill from x_n along the negative gradient direction, computing the iterates by the formula:

x_{n+1} = x_n - γ ∇F(x_n),   where γ > 0 is the step size.
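As a concrete illustration, the iteration can be sketched in Python for a one-dimensional function; the test function, starting point, step size and tolerance below are made-up example values, not part of the catalogue entry:

```python
def gradient_descent(grad, x0, step_size=0.1, tol=1e-8, max_iter=1000):
    """Iterate x_{n+1} = x_n - step_size * grad(x_n) until the gradient is small."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:  # gradient (close to) zero: stop
            break
        x = x - step_size * g
    return x

# Minimise F(x) = (x - 2)^2, whose gradient is 2(x - 2); the minimum is at x = 2.
x_min = gradient_descent(lambda x: 2 * (x - 2), x0=0.0)
```

With this fixed step size the iterates contract geometrically towards x = 2.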
We iterate until the gradient is zero (in practice, until it is sufficiently small). To obtain gradient ascent, which moves towards a local maximum, one replaces the minus sign in the update with a plus sign.
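A minimal sketch of the ascent variant, with the sign of the update flipped (the quadratic test function and step size are illustrative assumptions):

```python
def gradient_ascent(grad, x0, step_size=0.1, n_iter=200):
    """Iterate x_{n+1} = x_n + step_size * grad(x_n), moving uphill."""
    x = x0
    for _ in range(n_iter):
        x = x + step_size * grad(x)  # plus sign: follow the gradient upwards
    return x

# Maximise F(x) = -(x - 3)^2, whose gradient is -2(x - 3); the maximum is at x = 3.
x_max = gradient_ascent(lambda x: -2 * (x - 3), x0=0.0)
```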
Gradient descent does not succeed in every case. For example, the iterates can end up oscillating, or the running time can increase enormously if we reach a plateau, where the gradient may even become zero. Such oscillation and slow progress typically occur when the curvature differs strongly between directions. Also, whether we reach the global minimum depends on the starting point: not all points x where the gradient is zero are minima (they can also be saddle points, local maxima, or local minima that are not global). Finally, if the gradient near the global minimum is very large, we can overshoot and leave the optimal minimum, because the step taken is the product of the gradient and the step size.
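The oscillation failure mode can be demonstrated on F(x) = x², whose gradient is 2x: the update maps x to (1 - 2γ)x, so γ = 1 flips the sign forever while a smaller γ converges (the step sizes below are illustrative):

```python
def iterate(x0, step_size, n):
    """Run n gradient-descent steps on F(x) = x^2 and return all iterates."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] - step_size * 2 * xs[-1])  # gradient of x^2 is 2x
    return xs

osc = iterate(1.0, 1.0, 4)   # step too large: iterates just flip sign
ok = iterate(1.0, 0.25, 4)   # iterates are halved each step and converge
```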
We can modify gradient descent so that the step size is allowed to change in every iteration. Finding the optimal step size at each step can be time-consuming; conversely, a fixed step size can yield poor results. Methods based on Newton's method and inversion of the Hessian using conjugate gradient techniques are often a better alternative. A more powerful algorithm is the BFGS method, which at every step computes a matrix by which the gradient vector is multiplied to obtain a "better" direction. Modifications used in training neural networks include Weight Decay, Quickpropagation and rule.
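One common way to let the step size change per iteration, given here as a sketch rather than as the catalogue's own method, is a backtracking (Armijo) line search that shrinks a trial step until the function value decreases sufficiently; the test function and the constants alpha0, beta and c are assumptions for illustration:

```python
def backtracking_descent(f, grad, x, alpha0=1.0, beta=0.5, c=1e-4, n_iter=100):
    """Gradient descent whose step size is re-chosen at every iteration
    by backtracking until the Armijo sufficient-decrease condition holds."""
    for _ in range(n_iter):
        g = grad(x)
        alpha = alpha0
        # Shrink alpha until f(x - alpha*g) <= f(x) - c * alpha * g^2
        while f(x - alpha * g) > f(x) - c * alpha * g * g:
            alpha *= beta
        x = x - alpha * g
    return x

# Minimise F(x) = (x - 5)^2 from x = 0; the minimum is at x = 5.
x_star = backtracking_descent(lambda x: (x - 5) ** 2,
                              lambda x: 2 * (x - 5),
                              x=0.0)
```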