# Catalogue of Artificial Intelligence Techniques

## CYK Algorithm

**Aliases:**
CKY Algorithm

**Keywords:**
algorithm, bottom-up, chart, chomsky, context-free, dynamic, form, grammar, language, natural, non-terminal, normal, parse, terminal

### Categories: Natural Language

Author(s): **Aris Valtazanos**

The Cocke-Younger-Kasami (CYK) Algorithm is a bottom-up dynamic programming algorithm that, given a string s and a Context-Free Grammar G, determines whether s can be generated by G. As it is a bottom-up algorithm, it begins with the most fundamental units of the given string (its terminal symbols) and tries to build trees that will eventually reach the grammar’s non-terminal symbols. The simplest version of the CYK Algorithm parses words of grammars that are in Chomsky Normal Form (i.e. grammars whose production rules have either two non-terminals or one terminal on their right hand side), even though it can be extended to parse any given grammar.

A simple way to understand how the CYK Algorithm works is to picture it as an NxN chart, where N is the length of the given string. The string is numbered from 0 to N so that each letter lies between two numbers (e.g. the string “abab” would be numbered as follows: 0 a 1 b 2 a 3 b 4) and is then placed below the chart. Each row of the chart is numbered from 0 to N-1 and each column from 1 to N. Then, for each substring X[i,j] of length 1 to N, where i and j are the numbers bounding the substring, if there is a production rule in the grammar such that A -> X[i,j], then A is added in the (i,j)th position of the chart. For example, if the given grammar has a start symbol S and contains the rules: {S -> AB | SS, A -> a, B -> b} and the given string is “abab”, the final chart will be the following:

If, as in the above example, the (0, N)th entry contains the grammar’s start symbol, then the string belongs to the grammar. Additionally, by observing the chart we can also obtain a parse of the string, which in this case is: __S__ -> __S__S -> __A__BS -> a__B__S -> ab__S__ -> ab__A__B -> aba__B__ -> abab. For a string of length n, the CYK algorithm can build the chart in O(n³) time.

### References:

- Jurafsky, Daniel and James H. Martin,
*Speech and Language Processing*, An introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Prentice Hall, Upper Saddle River, New Jersey 07458, 2000, pp.361, 392, 455, 474. - Dexter C. Kozen,
*Automata and Computability*(David Gries, Fred B. Schneidered.), Springer, 1997, pp.191-195. - Webber, Bonnie (modified by Frank Keller),
*Informatics 2A lecture notes*, Lecture 16: Chart Parsing - The CYK Algorithm, University of Edinburgh - School of Informatics, 2006, pp.11-26 (http://www.inf.ed.ac.uk/teaching/courses/inf2a/slides/inf2a_l16_slides.pdf).

### Comments:

No comments.