Catalogue of Artificial Intelligence Techniques


Jump to: Top | Entry | References | Comments

View Maths as: Images | MathML

CYK Algorithm

Aliases: CKY Algorithm

Keywords: algorithm, bottom-up, chart, chomsky, context-free, dynamic, form, grammar, language, natural, non-terminal, normal, parse, terminal

Categories: Natural Language

Author(s): Aris Valtazanos

The Cocke-Younger-Kasami (CYK) Algorithm is a bottom-up dynamic programming algorithm that, given a string s and a Context-Free Grammar G, determines whether s can be generated by G. As it is a bottom-up algorithm, it begins with the most fundamental units of the given string (its terminal symbols) and tries to build trees that will eventually reach the grammarís non-terminal symbols. The simplest version of the CYK Algorithm parses words of grammars that are in Chomsky Normal Form (i.e. grammars whose production rules have either two non-terminals or one terminal on their right hand side), even though it can be extended to parse any given grammar.

A simple way to understand how the CYK Algorithm works is to picture it as an NxN chart, where N is the length of the given string. The string is numbered from 0 to N so that each letter lies between two numbers (e.g. the string ďababĒ would be numbered as follows: 0 a 1 b 2 a 3 b 4) and is then placed below the chart. Each row of the chart is numbered from 0 to N-1 and each column from 1 to N. Then, for each substring X[i,j] of length 1 to N, where i and j are the numbers bounding the substring, if there is a production rule in the grammar such that A -> X[i,j], then A is added in the (i,j)th position of the chart. For example, if the given grammar has a start symbol S and contains the rules: {S -> AB | SS, A -> a, B -> b} and the given string is ďababĒ, the final chart will be the following:

If, as in the above example, the (0, N)th entry contains the grammarís start symbol, then the string belongs to the grammar. Additionally, by observing the chart we can also obtain a parse of the string, which in this case is: S -> SS -> ABS -> aBS -> abS -> abAB -> abaB -> abab. For a string of length n, the CYK algorithm can build the chart in O(n≥) time.



Add Comment

No comments.