Catalogue of Artificial Intelligence Techniques
Aliases: ID3 Approach
Keywords: classification, nonnumeric, pattern, recognition
Categories: Pattern Recognition and Image Processing
Author(s): Joe Bain
This algorithm is a procedure for generating efficient discrimination trees for elements that have non-numeric values or attributes. The algorithm works on a set of training data, which has been sorted into classes, and creates a tree which divides the data based on common attributes. The tree can then be used to classify real-world data of the same variety. It is best used on very large sets of data where each element is made up of a long list of attributes. There are more sophisticated techniques, such as the Procedure M, but these are not so efficient.
The algorithm works by dividing the given data based on each of the different attributes the elements have and determines the increase or decrease in mixed-class branches after each divide. The divide which produces the most sorted set of branches (or least entropy) in terms of class is finalised and the algorithm moves onto the second stage to repeat the procedure on each of the branches. This continues until each lowest branch contains only elements of one class.
The main advantages of the ID3 algorithm are that it is easily implemented, being quite a simple process, and its running time increases only linearly with the complexity of the problem. The main disadvantage is that the tree cannot be updated when new data is classified incorrectly, instead a new tree must be generated.
- Yoh-Han Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, Massachusetts, 1989, pp.85-93.