6.863 // Bo Kim (kaede11@mit.edu) // Spring 2005

Laboratory 1a -- Word Parsing

Question 1

(In order to minimize confusion, I will capitalize the LR (lexical-representation) entities and italicize the SR (surface-representation) entities.)

 

When I have the system generate from the underlying (lexical) string REFER+ING, the result I actually get is refering instead of referring.  Rule 15 (gemination) specifies that the lexical string for re0ferr0ed is RE`FER0+ED rather than REFER0+ED, so the repair is to include the ` marker in the input I provide.  The ` precedes the stressed syllable of a word.  When I provide RE`FER+ING as the input, the correct result, referring, is generated.  A screenshot of the correct generation is shown below:

q1screenshot.gif
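
To make the role of the stress marker concrete, the following is a small Python sketch of a toy version of the gemination correspondence.  It is my own illustration rather than englex's actual Rule 15, and it assumes the lexical and surface strings are already aligned symbol by symbol in the lab's notation (` for stress, 0 for an empty slot, + for a morpheme boundary):

    # Toy check of the gemination correspondence (illustrative only, not
    # the actual englex automaton): a lexical 0 slot may surface as a copy
    # of the preceding surface consonant, but only if a stress mark has
    # been seen earlier in the form.
    def gemination_ok(lexical, surface):
        if len(lexical) != len(surface):
            return False
        seen_stress = False
        for i, (lex, srf) in enumerate(zip(lexical, surface)):
            if lex == "`":
                seen_stress = True
            elif lex == "0" and srf != "0":
                if not (seen_stress and i > 0 and srf == surface[i - 1]):
                    return False
        return True

    print(gemination_ok("RE`FER0+ING", "re0ferr0ing"))  # True:  referring
    print(gemination_ok("REFER0+ING", "referr0ing"))    # False: no stress mark, so no doubled r

Without the ` the doubled r is not licensed, which is exactly why the plain REFER+ING input comes out as refering.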

Question 2

When I have the system recognize dogs, the result is `DOG+S [Noun(dog) +PL].  In order to have dogs also recognized as a verb, I edited the Alternations/Lexicon file, englishpyk.lex, as follows:  under V_ROOT_NO_PREF:, I added the line `dog  V_Root4  Verb(dog), as shown below:

q2screenshot1.gif
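
For reference, the edit amounts to adding a single entry under the verb-root section of englishpyk.lex.  Only the section header and the new line are reproduced below; the existing entries under V_ROOT_NO_PREF: are left unchanged, and the column spacing here is illustrative:

    V_ROOT_NO_PREF:
        `dog   V_Root4   Verb(dog)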

After the edit, dogs is also recognized as a verb.  A screenshot of the expanded recognition is shown below:

q2screenshot2.gif

Question 3

From examining the trace for recognizing flier, I cannot agree with the quote from one of Kimmo's original authors that the recognizer "only makes a single left-to-right pass through the string as it homes in on its target in the lexicon."  The diagram below shows part of the trace for recognizing flier (moving across from left to right means that the previous selection is accepted, i.e. not forbidden by any of the automata; the state labels are provided only for ease of discussion):

q3diagram1.gif

Notice that, although the eventual correct input-output pair is the one highlighted in green, Kimmo accepts the incorrect input-output pair (highlighted in yellow) in State B and moves on to State C to work on the next pair to the right.  Since the incorrect pair must be corrected later before recognition can succeed, Kimmo in fact returns leftward to the part of the string made up of the incorrect pair.  This behavior clearly contradicts the original author's quote.

In order to characterize the kind of situation that has arisen here, let us look at the recognition of slipped, which also steps back to the left after proceeding with an incorrect pair:

q3diagram2.gif

Next, let us compare happier's recognition, which steps back, with happily's recognition, which does not:

q3diagram3.gif

q3diagram4.gif

From the above, we can see that happier's recognition needs to step back from accepting the same incorrect pair as flier's recognition, highlighted in yellow.  On the other hand, when happily's recognition encounters an incorrect pair (highlighted in red), it rejects it right away and proceeds to the end without stepping back.

The kind of situation presented here can thus be characterized as depending on whether Kimmo encounters pairs that are incorrect but not impossible, such as 0:E and 0:+ (as in recognizing flier, slipped, and happier).  Kimmo knows that a pair such as 0:L (as in recognizing happily) is never possible in English and can therefore reject it immediately, but it cannot do the same for pairs such as 0:E and 0:+ that are possible in English; it must make the mistake first and then step back to the left to correct it.
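
The stepping-back behavior can be made concrete with a schematic depth-first recognizer, sketched in Python below.  This is not Kimmo's actual implementation: the feasible-pair table and the two-word lexicon are hand-picked fragments, and pairs are written lexical:surface here, so the 0:E and 0:L pairs above appear as E:0 and L:0.  Because E:0 and +:0 are in the table, the search can accept one of them and later have to undo it; L:0 is absent, so it is never hypothesized at all.

    # Schematic backtracking recognizer (an illustration only, not Kimmo's
    # control structure).  Pairs are (lexical, surface); a surface "0"
    # means the lexical symbol has no surface realisation.
    FEASIBLE = [("Y", "i"),   # y realised as i before a suffix (flier, happier)
                ("+", "0"),   # morpheme boundary, never realised on the surface
                ("E", "0")]   # lexically present E with no surface trace
    FEASIBLE += [(c.upper(), c) for c in "abcdefghijklmnopqrstuvwxyz"]
    # ("L", "0") is deliberately absent: an "impossible" pair is never tried.

    LEXICON = {"FLY+ER", "HAPPY+LY"}
    PREFIXES = {w[:i] for w in LEXICON for i in range(len(w) + 1)}

    def recognize(surface, lexical="", trace=None):
        """Depth-first search for a lexical string matching the surface form."""
        if trace is None:
            trace = []
        if not surface:
            return lexical if lexical in LEXICON else None
        for lex, srf in FEASIBLE:
            if lexical + lex not in PREFIXES:
                continue                   # the lexicon rules this pair out at once
            if srf == surface[0]:          # the pair consumes one surface character
                result = recognize(surface[1:], lexical + lex, trace)
            elif srf == "0":               # the pair posits a surface-empty lexical symbol
                result = recognize(surface, lexical + lex, trace)
            else:
                continue
            if result:
                return result
            trace.append("step back over %s:%s" % (lex, srf))   # accepted, then undone
        return None

    steps = []
    print(recognize("flier", trace=steps), steps)     # FLY+ER ['step back over E:0']
    steps = []
    print(recognize("happily", trace=steps), steps)   # HAPPY+LY []

In this toy setting, flier forces exactly one step back over the feasible-but-wrong E:0 pair, while happily finishes without any, mirroring the traces above.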

Now, let us observe generation.  In general, it can be claimed that lexical forms are always regular (the plural form of FOX is FOX+S, as systematic as math), whereas surface forms may be irregular (the plural of fox is not just an s appended to fox).  So during recognition there are far fewer lexical outputs to search through to match the surface input than there are surface outputs that can initially seem correct for the lexical input during generation, moving from left to right through a word.  Thus the stepping-back situation occurs much more frequently in generation, even for words that did not need to step back during recognition.  For instance, although happily's recognition contained no steps back, its generation needs to step back after initially accepting and moving forward from the state h:H a:A p:P p:P y:Y.  Therefore, generation is similar to recognition in that stepping back occurs, yet different in how often it occurs and in exactly which words trigger it.

Question 4

I.  Testing the rules one by one

Generating from the lexical form kaNpat, we get kampat by enforcing just Rule 1.  Generating from kampat, we get kammat by enforcing just Rule 2.

II.  Ordering the rules one after another

Running Rule 2 and then Rule 1, the lexical form kaNpat generates kampat, as expected.  Running Rule 1 and then Rule 2, kaNpat also generates kampat, although if the ordering were actually respected, kammat should be generated.  But since Kimmo is a two-level model, rule ordering is not available, which causes the deviation from the expected result.
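
To spell out what ordered application would give, the two rules can be read as plain string rewrites outside of Kimmo.  This is a sketch of the sequential, generative reading only, taking Rule 1 to be "N becomes m before p" and Rule 2 to be "p becomes m after m", which matches the behavior observed in part I:

    # Sequential (ordered) reading of the two rules as plain string rewrites.
    def rule1(s):
        return s.replace("Np", "mp")   # Rule 1: N -> m before p

    def rule2(s):
        return s.replace("mp", "mm")   # Rule 2: p -> m after m

    print(rule2(rule1("kaNpat")))   # Rule 1 then Rule 2: kammat
    print(rule1(rule2("kaNpat")))   # Rule 2 then Rule 1: kampat

Under a genuine ordering, Rule 1 feeding Rule 2 would thus yield kammat, which is exactly the result that Kimmo, lacking rule ordering, does not produce.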

III.  Rewriting the automata to allow simultaneous application of the rules

In rewriting the automata, I chose to alter each automaton individually instead of combining the two into a single automaton.  The altered automata are shown below in the screenshot of the main Kimmo interface, and the following sequence of shots shows the test output at each stage of the trace:

q4screenshot1.gif (altered automata in the main Kimmo interface); q4screenshot2a.gif through q4screenshot2m.gif (test output at each stage of the trace)
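
What the rewritten automata must accomplish can be stated independently of their exact state tables: both constraints hold in parallel over a single lexical/surface correspondence, rather than one rule rewriting the output of the other.  The Python sketch below is only a schematic rendering of that idea, not the automata in the screenshots above; it takes Rule 1 to require N:m exactly when a lexical p follows and Rule 2 to require p:m exactly when a surface m precedes:

    # Parallel (two-level) check of both rules over one aligned pair --
    # a schematic illustration of simultaneous application.
    def two_level_ok(lexical, surface):
        if len(lexical) != len(surface):
            return False
        for i, (lex, srf) in enumerate(zip(lexical, surface)):
            next_lex = lexical[i + 1] if i + 1 < len(lexical) else ""
            prev_srf = surface[i - 1] if i > 0 else ""
            if lex == "N":
                # Rule 1: N is realised as m exactly when a lexical p follows
                if (srf == "m") != (next_lex == "p") or srf not in ("n", "m"):
                    return False
            elif lex == "p":
                # Rule 2: p is realised as m exactly when a surface m precedes
                if (srf == "m") != (prev_srf == "m") or srf not in ("p", "m"):
                    return False
            elif lex != srf:           # every other symbol is an identity pair
                return False
        return True

    print(two_level_ok("kaNpat", "kammat"))  # True:  both rules hold at once
    print(two_level_ok("kaNpat", "kampat"))  # False: violates Rule 2
    print(two_level_ok("kaNpat", "kanpat"))  # False: violates Rule 1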

(1)  A notable characteristic of my design choice of altering each automaton individually is that each automaton stays simple.  Even if 4 to 5 interacting rules had to be handled, each automaton would still be limited to a small number of states, since each one would merely be allowing the others to apply.  It does this by not letting the triggers of the other rules (the input-output pairs that initially move those rules out of State 1) be absorbed into its own @:@ wildcard.  Thus each additional interaction requires additional pairs to be specified in every interacting rule, which may in turn sacrifice overall topological efficiency, since no complex state connections are built within any single automaton.

(2)  For a language that can be handled by a limited number of automata, such as English, my design choice makes what logically happens in the language easy to illustrate and follow, so it is relatively transparent.  Yet the separate rules that govern a natural language may not be arranged in such a bureaucratic manner (only a couple of states in each automaton, with numerous State 1s).  Rather, it seems more likely that natural language is governed by fewer automata, or even a single one, that controls rule interaction.

Bo Sung Kim © 2005 kaede11