6.863 // Bo Kim (kaede11@mit.edu) // Spring 2005

Laboratory 1a -- Word Parsing

Question 1

(In order to minimize confusion, I will capitalize the LR (lexical-representation) entities and italicize the SR (surface-representation) entities.)

 

When I have the system generate from the underlying (lexical) string REFER+ING, the result I actually get is refering instead of referring.  Rule 15 (gemination) specifies that the lexical string for re0ferr0ed is RE`FER0+ED rather than REFER0+ED, so the repair is to include the ` marker in the input I provide.  The ` precedes the stressed syllable of a word.  When I provide RE`FER+ING as the input, the correct result, referring, is generated.  A screenshot of the correct generation is shown below:

q1screenshot.gif
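
To make the role of the stress marker concrete, the following is a small Python sketch of a toy version of the gemination correspondence.  It is my own illustration rather than englex's actual Rule 15, and it assumes the lexical and surface strings are already aligned symbol by symbol in the lab's notation (` for stress, 0 for an empty slot, + for a morpheme boundary):

    # Toy check of the gemination correspondence (illustrative only, not
    # the actual englex automaton): a lexical 0 slot may surface as a copy
    # of the preceding surface consonant, but only if a stress mark has
    # been seen earlier in the form.
    def gemination_ok(lexical, surface):
        if len(lexical) != len(surface):
            return False
        seen_stress = False
        for i, (lex, srf) in enumerate(zip(lexical, surface)):
            if lex == "`":
                seen_stress = True
            elif lex == "0" and srf != "0":
                if not (seen_stress and i > 0 and srf == surface[i - 1]):
                    return False
        return True

    print(gemination_ok("RE`FER0+ING", "re0ferr0ing"))  # True:  referring
    print(gemination_ok("REFER0+ING", "referr0ing"))    # False: no stress mark, so no doubled r

Without the ` the doubled r is not licensed, which is exactly why the plain REFER+ING input comes out as refering.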

Question 2

When I have the system recognize dogs, the result is `DOG+S [Noun(dog) +PL].  In order to have dogs also recognized as a verb, I edited the Alternations/Lexicon file, englishpyk.lex, as follows:  under V_ROOT_NO_PREF:, I added the line `dog  V_Root4  Verb(dog), as shown below:

q2screenshot1.gif
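
For reference, the edit amounts to adding a single entry under the verb-root section of englishpyk.lex.  Only the section header and the new line are reproduced below; the existing entries under V_ROOT_NO_PREF: are left unchanged, and the column spacing here is illustrative:

    V_ROOT_NO_PREF:
        `dog   V_Root4   Verb(dog)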

After the edit, dogs is also recognized as a verb.  A screenshot of the expanded recognition is shown below:

q2screenshot2.gif

Question 3

From examining the trace for recognizing flier, I cannot agree with the quote from one of Kimmo's original authors that the recognizer "only makes a single left-to-right pass through the string as it homes in on its target in the lexicon."  The diagram below shows part of the trace for recognizing flier (moving across from left to right means that the previous selection is accepted, i.e. not forbidden by any of the automata; the state labels are provided only for ease of discussion):

q3diagram1.gif

Notice that, although the eventual correct input-output pair is the one highlighted in green, Kimmo accepts the incorrect input-output pair (highlighted in yellow) in State B and moves on to State C to work on the next pair to the right.  Since the incorrect pair must be corrected later before recognition can succeed, Kimmo in fact returns leftward to the part of the string made up of the incorrect pair.  This behavior clearly contradicts the original author's quote.

In order to characterize the kind of situation that has arisen here, let us look at the recognition of slipped, which also steps back to the left after proceeding with an incorrect pair:

q3diagram2.gif

Next, let us compare happier's recognition, which steps back, with happily's recognition, which does not:

q3diagram3.gif

q3diagram4.gif

From the above, we can see that happier's recognition needs to step back from accepting the same incorrect pair as flier's recognition, highlighted in yellow.  On the other hand, when happily's recognition encounters an incorrect pair (highlighted in red), it rejects it right away and proceeds to the end without stepping back.

The kind of situation presented here can thus be characterized as depending on whether Kimmo encounters pairs that are incorrect but not impossible, such as 0:E and 0:+ (as in recognizing flier, slipped, and happier).  Kimmo knows that a pair such as 0:L (as in recognizing happily) is never possible in English and can therefore reject it immediately, but it cannot do the same for pairs such as 0:E and 0:+ that are possible in English; it must make the mistake first and then step back to the left to correct it.
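
The stepping-back behavior can be made concrete with a schematic depth-first recognizer, sketched in Python below.  This is not Kimmo's actual implementation: the feasible-pair table and the two-word lexicon are hand-picked fragments, and pairs are written lexical:surface here, so the 0:E and 0:L pairs above appear as E:0 and L:0.  Because E:0 and +:0 are in the table, the search can accept one of them and later have to undo it; L:0 is absent, so it is never hypothesized at all.

    # Schematic backtracking recognizer (an illustration only, not Kimmo's
    # control structure).  Pairs are (lexical, surface); a surface "0"
    # means the lexical symbol has no surface realisation.
    FEASIBLE = [("Y", "i"),   # y realised as i before a suffix (flier, happier)
                ("+", "0"),   # morpheme boundary, never realised on the surface
                ("E", "0")]   # lexically present E with no surface trace
    FEASIBLE += [(c.upper(), c) for c in "abcdefghijklmnopqrstuvwxyz"]
    # ("L", "0") is deliberately absent: an "impossible" pair is never tried.

    LEXICON = {"FLY+ER", "HAPPY+LY"}
    PREFIXES = {w[:i] for w in LEXICON for i in range(len(w) + 1)}

    def recognize(surface, lexical="", trace=None):
        """Depth-first search for a lexical string matching the surface form."""
        if trace is None:
            trace = []
        if not surface:
            return lexical if lexical in LEXICON else None
        for lex, srf in FEASIBLE:
            if lexical + lex not in PREFIXES:
                continue                   # the lexicon rules this pair out at once
            if srf == surface[0]:          # the pair consumes one surface character
                result = recognize(surface[1:], lexical + lex, trace)
            elif srf == "0":               # the pair posits a surface-empty lexical symbol
                result = recognize(surface, lexical + lex, trace)
            else:
                continue
            if result:
                return result
            trace.append("step back over %s:%s" % (lex, srf))   # accepted, then undone
        return None

    steps = []
    print(recognize("flier", trace=steps), steps)     # FLY+ER ['step back over E:0']
    steps = []
    print(recognize("happily", trace=steps), steps)   # HAPPY+LY []

In this toy setting, flier forces exactly one step back over the feasible-but-wrong E:0 pair, while happily finishes without any, mirroring the traces above.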

Now, let us observe generation.  In general, it can be claimed that lexical forms are always regular (the plural form of FOX is FOX+S, as systematic as math), whereas surface forms may be irregular (the plural of fox is not just an s appended to fox).  So during recognition there are far fewer lexical outputs to search through to match the surface input than there are surface outputs that can initially seem correct for the lexical input during generation, moving from left to right through a word.  Thus the stepping-back situation occurs much more frequently in generation, even for words that did not need to step back during recognition.  For instance, although happily's recognition contained no steps back, its generation needs to step back after initially accepting and moving forward from the state h:H a:A p:P p:P y:Y.  Therefore, generation is similar to recognition in that stepping back occurs, yet different in how often it occurs and in exactly which words trigger it.

Question 4

I.  Testing the rules one by one

Generating from the lexical form kaNpat, we get kampat by enforcing just Rule 1.  Generating from kampat, we get kammat by enforcing just Rule 2.

II.  Ordering the rules one after another

Running Rule 2 and then Rule 1, the lexical form kaNpat generates kampat, as expected.  Running Rule 1 and then Rule 2, kaNpat also generates kampat, although if the ordering were actually respected, kammat should be generated.  But since Kimmo is a two-level model, rule ordering is not available, which causes the deviation from the expected result.
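
To spell out what ordered application would give, the two rules can be read as plain string rewrites outside of Kimmo.  This is a sketch of the sequential, generative reading only, taking Rule 1 to be "N becomes m before p" and Rule 2 to be "p becomes m after m", which matches the behavior observed in part I:

    # Sequential (ordered) reading of the two rules as plain string rewrites.
    def rule1(s):
        return s.replace("Np", "mp")   # Rule 1: N -> m before p

    def rule2(s):
        return s.replace("mp", "mm")   # Rule 2: p -> m after m

    print(rule2(rule1("kaNpat")))   # Rule 1 then Rule 2: kammat
    print(rule1(rule2("kaNpat")))   # Rule 2 then Rule 1: kampat

Under a genuine ordering, Rule 1 feeding Rule 2 would thus yield kammat, which is exactly the result that Kimmo, lacking rule ordering, does not produce.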

III.  Rewriting the automata to allow simultaneous application of the rules

In rewriting the automata, I chose to alter each automaton individually instead of combining the two into a single automaton.  The altered automata are shown below in the screenshot of the main Kimmo interface, and the following sequence of shots shows the test output at each stage of the trace:

q4screenshot1.gif (altered automata in the main Kimmo interface); q4screenshot2a.gif through q4screenshot2m.gif (test output at each stage of the trace)
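
What the rewritten automata must accomplish can be stated independently of their exact state tables: both constraints hold in parallel over a single lexical/surface correspondence, rather than one rule rewriting the output of the other.  The Python sketch below is only a schematic rendering of that idea, not the automata in the screenshots above; it takes Rule 1 to require N:m exactly when a lexical p follows and Rule 2 to require p:m exactly when a surface m precedes:

    # Parallel (two-level) check of both rules over one aligned pair --
    # a schematic illustration of simultaneous application.
    def two_level_ok(lexical, surface):
        if len(lexical) != len(surface):
            return False
        for i, (lex, srf) in enumerate(zip(lexical, surface)):
            next_lex = lexical[i + 1] if i + 1 < len(lexical) else ""
            prev_srf = surface[i - 1] if i > 0 else ""
            if lex == "N":
                # Rule 1: N is realised as m exactly when a lexical p follows
                if (srf == "m") != (next_lex == "p") or srf not in ("n", "m"):
                    return False
            elif lex == "p":
                # Rule 2: p is realised as m exactly when a surface m precedes
                if (srf == "m") != (prev_srf == "m") or srf not in ("p", "m"):
                    return False
            elif lex != srf:           # every other symbol is an identity pair
                return False
        return True

    print(two_level_ok("kaNpat", "kammat"))  # True:  both rules hold at once
    print(two_level_ok("kaNpat", "kampat"))  # False: violates Rule 2
    print(two_level_ok("kaNpat", "kanpat"))  # False: violates Rule 1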

(1)  A notable characteristic of my design choice of altering each automaton individually is that each automaton stays simple.  Even if 4 to 5 interacting rules had to be handled, each automaton would still be limited to a small number of states, since each one would merely be allowing the others to apply.  It does this by not letting the triggers of the other rules (the input-output pairs that initially move those rules out of State 1) be absorbed into its own @:@ wildcard.  Thus each additional interaction requires additional pairs to be specified in every interacting rule, which may in turn sacrifice overall topological efficiency, since no complex state connections are built within any single automaton.

(2)  For a language that can be handled by a limited number of automata, such as English, my design choice makes what logically happens in the language easy to illustrate and follow, so it is relatively transparent.  Yet the separate rules that govern a natural language may not be arranged in such a bureaucratic manner (only a couple of states in each automaton, with numerous State 1s).  Rather, it seems more likely that natural language is governed by fewer automata, or even a single one, that controls rule interaction.

Bo Sung Kim © 2005 kaede11