| This discussion is meant to aid understanding of how the form of the PS-rules and transformations of a GT grammar are related to finite state transducers, which comprise the model under which many computational linguists are working. The facts and the form of the presentation are taken from James Allen (1994, pp. 46, 70-72). A thorough study of how to use finite state transducers in describing morphology in a rigorous fashion suitable for computational linguistics has recently appeared (Beesley & Karttunen, 2003). |
Generative capacity.It is possible to compare the formal grammars that use rewrite rules by measuring their generative capacity. This capacity is a range of symbol systems that the grammar is capable of describing. Mathematicians defined this measure to characterize the formal languages that describe a natural language, such as English, but unless a natural language is defined precisely its own generative capacity cannot be measured. The unstable state of a living language and the limits of human capacities make it difficult if not impossible to measure a natural languages generative capacity with precision. If we accept the existence of ineffable concepts, however, it seems that they would mark the limits of the generative capacity of a natural language. |
A grammar for an ordered sequences of symbols.Consider a formal language, L1, whose "words" consist of the symbols a, b, c, and d. Suppose that this language allows any sequence of these letters in alphabetical order. For example, the sequences: abd, ad, bcd, b, and abcd, would all be legal sentences of L1. A grammar to describe this language may consist of PS-rules the right-hand side of every one of which would consist of one terminal symbol possibly followed by one non-terminal symbol. Such a grammar is called a regular grammar. For L1 the grammar would consist of the following PS-rules: |
S a + S1 |
S b + S2 |
S c + S3 |
S d |
S 1 b + S2 |
S 1 c + S3 |
S 1 d | |
S 2 c + S3 |
S 2 d | ||
S 3 d |
| Thus it takes 10 rules to describe 24 possible sentences. True, natural languages have words that are ordered, but they are not limited just to sentences whose words are in a simple ranked order. Moreover each word does not appear just once in a sentence. |
A grammar for repeated sequences of symbols.Consider another language, L2, whose only sentences are sequences of as followed by an equal number of bs that is, ab, aabb, aaabbb, and so on. It is impossible to write a regular grammar to generate L2, exactly. The grammar that generates L2 is a simple context-free grammar (CFG): |
S a + b |
S a + S + b |
| Adding the ability to describe repeated sequences generalizes the regular grammar and simplifies it. But here again we cannot characterize natural languages by CFGs. We seem to have succeeded in being able to characterize isolated phrases in English, but have hardly brought it very much closer to describing English in all its complexity. |
A grammar for ranked repeated sequences.Here are some purely artificial languages that cannot be generated by a CFG. One language might consist of a sequence of as followed by the same number of bs followed by the same number of cs that is abc, aabbcc, aaabbbccc, and so on. Similarly, no context-free grammar can generate the language that consists of any sequence of letters repeated in the same order twice, such as abab, abcabc, acdabacdab, and so on. A more general grammar can generate such sequences, however. One way to expand the generative capacity is to make use of a context-sensitive grammar, which consists of rules of the form: |
![]() |
| where A is a symbol, alpha and beta are (possibly empty) sequences of symbols, and psi is a non-empty sequence of symbols. We will have occasion to use this kind of rewrite rules not to describe phrase structures but to describe segment (or morpheme) structure. |
The Chomsky hierarchy.Even more general than the context-sensitive grammars are the type 0 grammars, which place no restrictions on their rewrite rules. This allows transformations of every imaginable kind to be written in the format of a rewrite rule. Work in formal language theory began with Chomsky (1957). Since the languages generated by regular grammars are a subset of those generated by context-free grammars, which in turn are a subset of those generated by context-sensitive grammars, which in turn are a subset of those generated by type 0 languages, they form a hierarchy of languages. This heirarchy is called the Chomsky hierarchy. |
Generative capacity of transition networks.It is possible to classify transition network systems by the types of languages they can describe. In fact, there are direct correspondences between various network systems and the rewrite rule systems, hence also correspondences to the GT grammar. |
Finite state machines.For instance, simple transition networks, i.e., finite state machines (FSM) with no beginning arcs are expressively equivalent to regular grammars that is, it is possible to describe any language that you can describe using a simple transition network by means of a regular grammar, and vice versa. An FSM, for L1 above, describable with a regular grammar, is |
![]() |
Recursive transition networks.Recursive transition networks (RTN), on the other hand are expressively equivalent to context-free grammars. Thus we can convert an RTN into a CFG and vice versa. A recursive transition network for L2, the language consisting of a number of as followed by an equal number of bs is |
![]() |
Finite state models for morphological processing.Although in simple examples and small systems you can list all the words allowed by the system, large vocabulary systems face a serious problem in representing the lexicon. Not only are there a large number of words, but each word may combine with affixes to produce additional related words. One way to address this problem is to preprocess the input sentence into a sequence of morphemes. A word may consist of a single morpheme, but often a word consists of a root form plus an affix. For instance, the word eaten consists of the root form eat and the suffix -en, which indicates the past participle form. Without any preprocessing, a lexicon would have to list all the forms of eat, including eats, eating, ate, and eaten. With preprocessing, there would be one morpheme eat that may combine with suffixes such as -ing, -s, and -en, and one entry for the irregular form ate. Thus the lexicon would only need to store two entries (eat and ate) rather than four. Likewise the word happiest breaks down into the root form happy and the suffix -est, and thus does not need a separate entry in the lexicon. Of course, not all forms are allowed; for example, the word seed cannot be decomposed into a root form se (or see) and a suffix -ed. The lexicon would have to encode what forms are allowed with each root. |
![]() |
Finite state transducers.One of the most popular models is based on finite state transducers (FSTs), which are like finite state machines except that they produce an output given an input. In this way they are precisely equivalent to the transformations of a GT grammar. An arc in an FST may be labeled with a pair of symbols. For example, an arc labeled i:y could only be followed if the current input is the letter i and the output is the letter y. This is the condition on a transformation. FSTs can be used to represent the lexicon in a concise way. They transform the surface form of words into a sequence of morphemes. The figure shows a simple FST that defines the forms of the word happy and its derived forms. It transforms the word happier into the sequence happy+er and happiest into the sequence happy+est. |
A finite state transducer as mapping a transformation.Arcs labeled by a single letter have that letter as both the input and the output. Nodes that are double circles indicate success states, that is, acceptable words. Consider processing the input word happier starting from state 1. The upper network accepts the first four letters, happ, and copies them to the output. From state 5 you could accept a y and have a complete word, or you could jump to state 6 to consider affixes. (The dashed link, indicating a jump, is not formally necessary but is useful for showing the break between the processing of the root form and the processing of the suffix.) For the word happier, you must jump to state 6. The next letter must be an i, which is transformed into a y. This is followed by a transition that uses no input (the empty symbol ) and outputs a plus sign. From state 8 the input must be an e, and the output is also e. This must be followed by an r to get to state 10, which is encoded as a double circle indicating a possible end of word. This is a success state for the FST one which properly would have an ending arc proceeding from it. Thus this FST accepts the appropriate forms and outputs the desired sequence of morphemes. |
![]() |
The lexicon as a finite state transducer.The entire lexicon can be encoded as an FST that encodes all the legal input words and transforms them into morphemic sequences. The FSTs for the different suffixes need only be defined once, and all root forms that allow that suffix can point to the same node. Words that share a common prefix (such as torch, toss, and to) also can share the same nodes, greatly reducing the size of the network. The FST in Figure 2 accepts the following words, which all start with t: tie (state 4), ties (10), trap (7), traps (10), try (11), tries (15), to (16), torch (19), torches (15), toss (21), and tosses (15). In addition, it outputs the appropriate sequence of morphemes. |
| Note that you may pass through acceptable states (those with an ending arc attached) along the way when processing a word. For instance, with the input toss you would pass through state 16, indicating that to is a word. However, this analysis is not useful because if to was accepted then the letters ss would not be accounted for. |
| Using such an FST, an input sentence can be processed into a sequence of morphemes. Occasionally, a word will be ambiguous and have multiple different decompositions into morphemes. |