# Catalogue of Artificial Intelligence Techniques

## Gradient descent

**Aliases:**
gradient ascent, gradient method, method of steepest descent, steepest descent

**Keywords:**
differential equation, gradient, maximum, minimum, search

### Categories: Neural Networks

Author(s): **Christin Wolf**


The method of gradient descent, also called the gradient method, is an optimisation algorithm used in numerical optimisation and neural networks for finding the nearest local minimum of a function, i.e. a point x where ∇F(x) = 0. Because for complex functions in higher dimensions this equation often cannot be solved directly, an iterative method is used.

Starting from a point x_0, if the real-valued function F(x) is defined and differentiable in a neighborhood of that point, we compute the gradient ∇F(x_0). The gradient has the useful property that it points in the direction in which F increases fastest (and the negative gradient in the direction in which F decreases fastest), so we can search along the direction of steepest descent towards the minimum. As many times as needed, we move from x_n to x_{n+1} along the line extending from x_n in the direction of the negative gradient. We compute x_{n+1} iteratively by the formula:

x_{n+1} = x_n - γ ∇F(x_n)

where γ > 0 is the step size.

We iterate until the gradient is (close to) zero. To obtain gradient ascent, which moves towards a local maximum, one replaces the minus sign with a plus sign.
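The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original entry: the function F(x) = x^4 - 3x^3 + 2, the starting point x_0 = 6, and the step size γ = 0.01 are chosen only for the example.

```python
# Minimal gradient descent on F(x) = x^4 - 3x^3 + 2,
# whose derivative is F'(x) = 4x^3 - 9x^2.
# The nearest local minimum from x0 = 6 lies at x = 9/4 = 2.25.

def grad_F(x):
    return 4 * x**3 - 9 * x**2

x = 6.0        # starting point x_0
gamma = 0.01   # step size

for _ in range(10000):
    step = gamma * grad_F(x)
    x = x - step              # x_{n+1} = x_n - gamma * grad F(x_n)
    if abs(step) < 1e-10:     # stop once the gradient is (almost) zero
        break

print(round(x, 4))  # converges to 2.25
```

Replacing `x - step` with `x + step` turns the sketch into gradient ascent towards a local maximum.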

**Problems**

Gradient descent does not succeed in every case. For example, the iterates can end up oscillating (this typically happens when the curvature differs greatly between directions), or the running time can increase enormously if the search reaches a plateau; on a plateau the gradient can even become zero without a minimum having been reached. Whether we reach the global minimum also depends on the starting point, since not all points x where the gradient is zero are minima (they can also be saddle points, merely local minima, and so on). Finally, if the gradient near the global minimum is very large, we can overshoot and leave the optimal minimum, because the update is the product of gradient and step size.

**Step size**

We can modify gradient descent so that the step size is allowed to change in every iteration. Finding the optimal step size per step can be time-consuming; conversely, using a fixed step size can yield poor results. Methods based on Newton's method, and inversion of the Hessian using conjugate gradient techniques, are often a better alternative. A more powerful algorithm is given by the BFGS method, which on every step computes a matrix by which the gradient vector is multiplied in order to move in a "better" direction. Modified methods used to train neural networks include Weight Decay and Quickpropagation.
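One common way to vary the step size per iteration is a backtracking (Armijo) line search: start from a trial step and halve it until a sufficient-decrease condition holds. The sketch below is an illustrative assumption, not a method named in the entry; the constants `c` and `beta` and the test function are chosen for the example.

```python
# Gradient descent with a backtracking (Armijo) line search, 1-D case.
# The trial step t is halved until
#   F(x - t*g) <= F(x) - c * t * g^2
# holds, guaranteeing a sufficient decrease in F on every iteration.

def backtracking_step(F, grad_F, x, t=1.0, c=1e-4, beta=0.5):
    g = grad_F(x)
    while F(x - t * g) > F(x) - c * t * g * g:
        t *= beta  # shrink the step until sufficient decrease holds
    return x - t * g

# Usage on F(x) = x^4 - 3x^3 + 2 (local minimum at x = 2.25):
F = lambda x: x**4 - 3 * x**3 + 2
dF = lambda x: 4 * x**3 - 9 * x**2

x = 6.0
for _ in range(100):
    x = backtracking_step(F, dF, x)
print(round(x, 4))
```

Quasi-Newton methods such as BFGS go further: instead of only scaling the gradient, they multiply it by an approximation of the inverse Hessian, changing the search direction as well as its length.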

