Catalogue of Artificial Intelligence Techniques
Aliases: Dimension Reduction
Keywords: construction, extraction, feature, learning, mining, selection, statistics
Categories: Knowledge Representation
Author(s): Steven Holmes
Dimensionality reduction is a technique for reducing the number of features (random variables) associated with a dataset. It is utilised in many problem domains, including data mining, data classification and pattern recognition. Reducing the number of features is advantageous as it reduces the time and space required for analysis and mitigates the "curse of dimensionality" problem, in which the number of measurements needed to estimate a probability distribution grows exponentially with the number of features. This latter property is particularly crucial for machine learning, where often only a few samples are available, each with a large number of features.
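The exponential growth can be made concrete with a small illustration (not from the catalogue entry): if each feature is discretised into 10 bins, a histogram estimate of the joint distribution needs 10^d cells, so the samples required to populate it grow exponentially with the number of features d.

```python
# Cells needed to estimate a joint distribution by histogram,
# assuming 10 bins per feature (an illustrative choice).
for d in (1, 2, 5, 10):
    cells = 10 ** d
    print(f"{d:2d} features -> {cells:,} cells")
```

With only 10 features, over ten billion cells must be populated, far more than the sample counts typical in machine learning.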
Effective dimensionality reduction retains the properties of the original dataset that are of interest. A bank, for instance, might wish to find out which customers may be receptive to mortgage offers. In this case, given the full customer records, it would be useful to reduce them to a feature vector that indicates who is likely to buy a house soon. Similarly, in OCR it may be useful to reduce the (likely large) feature vector for an input to those features that best discriminate between letters.
Feature selection is an approach that tries to select the most useful subset of features from the original feature vector. Techniques for this include "filters", which rank and select individual features, and "subset selection" methods, which try to select groups of complementary features (thereby avoiding the pitfalls of evaluating features in isolation).
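A minimal sketch of a filter, assuming a NumPy setting with a data matrix X and a target y (the function name and scoring choice are illustrative, not prescribed by the catalogue): each feature is scored independently by its absolute Pearson correlation with the target, and the features are returned best first.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Filter-style feature ranking (illustrative sketch).

    Scores each feature in isolation by the absolute Pearson
    correlation between that feature's column and the target y,
    then returns feature indices sorted best-first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre each feature
    yc = y - y.mean()                # centre the target
    # |corr| between every column of X and y, computed in one pass.
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(scores)[::-1]  # feature indices, highest score first

# Selecting the k best features is then a slice of the ranking:
# keep = rank_features_by_correlation(X, y)[:k]; X_reduced = X[:, keep]
```

Because each feature is scored in isolation, such a filter can miss complementary feature groups, which is exactly the pitfall that subset selection methods address.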
Feature construction (or feature extraction) tries to construct a new feature vector from the original. Two common, simple, general techniques for this are PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis), both of which apply a linear map from the original feature vector to the new one.
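The linear map in PCA can be sketched directly with NumPy (an illustrative implementation, assuming a data matrix X of shape n x d): the new features are projections onto the top-k eigenvectors of the covariance matrix, i.e. the directions of maximal variance.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (illustrative sketch).

    The returned matrix is X (centred) multiplied by a linear map W
    whose columns are the k eigenvectors of the covariance matrix
    with the largest eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre the data
    cov = np.cov(Xc, rowvar=False)          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :k]             # top-k directions of maximal variance
    return Xc @ W                           # n x k reduced feature vectors
```

Unlike PCA, LDA is supervised: it chooses the linear map that best separates known classes rather than the one that preserves the most variance.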
Both feature selection and feature construction can benefit dramatically when domain-specific knowledge is incorporated into the selection or construction process.
- Guyon, I. and Elisseeff, A., An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, Special Issue on Variable and Feature Selection (2003), 1157-1182.
- Fodor, I. K., A Survey of Dimension Reduction Techniques, Lawrence Livermore National Laboratory Technical Report UCRL-ID-148494 (2002).