Catalogue of Artificial Intelligence Techniques



k-Nearest Neighbour and kD-Trees

Keywords: kd, nearest, neighbour, pattern, search, space, tree

Categories: Pattern Recognition and Image Processing


Author(s): Ian Brechin

The k-Nearest Neighbour algorithm is a pattern recognition method whereby objects are classified according to the classes of their k nearest neighbours in the training set.

The training examples are represented as points in a vector space whose dimension equals the number of features recorded for the objects. The nearest neighbours are the k training points with the smallest straight-line (Euclidean) distances to the object being classified in this multidimensional space. The most common class amongst the k nearest neighbours is then assigned to the object.
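
A minimal Python sketch of this procedure may make it concrete; the data, function names, and the choice of Python are illustrative, not part of the original entry:

    import math
    from collections import Counter

    def euclidean(a, b):
        # Straight-line (Euclidean) distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_classify(query, training_set, k):
        # training_set is a list of (vector, label) pairs; the query is
        # classified by majority vote among its k nearest training points.
        neighbours = sorted(training_set, key=lambda p: euclidean(query, p[0]))[:k]
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]

    # Toy data: two features, two classes.
    training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
                ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]
    print(knn_classify((1.1, 1.0), training, k=3))   # prints "A"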

The main threat to the accuracy of the k-Nearest Neighbour algorithm is noise in the training set. Choosing a larger value of k reduces the influence of mislabelled or exceptional training examples, but at the same time broadens the neighbourhood and so yields a coarser, more general classification of objects.
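
One common way to balance this trade-off, not covered by the original entry, is to choose k empirically, for example by leave-one-out cross-validation. A minimal sketch, reusing knn_classify and the toy training data from above:

    def loo_accuracy(training_set, k):
        # Leave-one-out accuracy: classify each training point using
        # all the other points and count how often its label is recovered.
        correct = 0
        for i, (vec, label) in enumerate(training_set):
            rest = training_set[:i] + training_set[i + 1:]
            if knn_classify(vec, rest, k) == label:
                correct += 1
        return correct / len(training_set)

    # Odd values of k avoid tied votes in two-class problems.
    best_k = max(range(1, 10, 2), key=lambda k: loo_accuracy(training, k))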

The most obvious method of finding the k nearest neighbours is to compare the vector of the object with every sample in the training set, as the sketch above does. The cost of this brute-force method, however, grows linearly with the size of the training set, and thus other algorithms have been devised to find the k nearest neighbours more efficiently.

Constructing a kD-Tree of the training set allows for a search algorithm with expected logarithmic rather than linear complexity, greatly increasing the efficiency of the k-nearest neighbour search. A kD-Tree is formed by recursively subdividing the vector space of the training set into hyperrectangles, with each node in the tree representing one of these subspaces. The search algorithm determines which leaf node contains the vector point of the object to be classified, and then classifies it according to the training points within that hyperrectangle. Whilst this leaf-only search is not guaranteed to give the exact nearest neighbours (an exact search must also backtrack into neighbouring hyperrectangles), it is usually a good approximation.
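
Continuing the sketch above (it reuses euclidean, Counter, and training), a minimal kD-Tree with the leaf-only approximate search just described; the median split and leaf size are simplifying assumptions, not a definitive implementation:

    class Node:
        # Internal nodes store a splitting axis and value; leaf nodes
        # store the training points that fall in their hyperrectangle.
        def __init__(self, axis=None, split=None, left=None, right=None, points=None):
            self.axis, self.split = axis, split
            self.left, self.right = left, right
            self.points = points

    def build_kd_tree(points, depth=0, leaf_size=2):
        # Recursively halve the training points at the median, cycling
        # through the dimensions; leaf_size is an illustrative choice.
        if len(points) <= leaf_size:
            return Node(points=points)
        axis = depth % len(points[0][0])
        points = sorted(points, key=lambda p: p[0][axis])
        mid = len(points) // 2
        return Node(axis=axis, split=points[mid][0][axis],
                    left=build_kd_tree(points[:mid], depth + 1, leaf_size),
                    right=build_kd_tree(points[mid:], depth + 1, leaf_size))

    def approx_knn_classify(tree, query, k):
        # Descend to the leaf whose hyperrectangle contains the query,
        # then vote among the points stored there. Approximate: no
        # backtracking into neighbouring hyperrectangles is performed.
        node = tree
        while node.points is None:
            node = node.left if query[node.axis] < node.split else node.right
        neighbours = sorted(node.points, key=lambda p: euclidean(query, p[0]))[:k]
        return Counter(label for _, label in neighbours).most_common(1)[0][0]

    tree = build_kd_tree(training)
    print(approx_knn_classify(tree, (1.1, 1.0), k=2))   # prints "A"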

