6.863 // Bo Kim (kaede11@mit.edu) // Spring 2005

Laboratory 1b -- Spanish Word Parsing

I.  How my system operates

To carry out morphological analysis for a subset of Spanish, I used the provided englishpyk.rul file as my primary reference.

A.  Adding lexical characters

I initially added three lexical (underlying) characters -- C for a possible c softening, J for a possible g softening, and Z for a possible z insertion -- as described in Section 1.1.1 of the Lab 1b handout.  In the end, J was the only added lexical character actually used in the morphological analyses.

B.  Defining subsets

I define six subsets of characters -- V for vowels, BACK for back vowels, FRONT for front vowels, LOW for low vowels, HIGH for high vowels, and CONSONANTS for consonants -- as described in Section 1.1.2 of the Lab 1b handout.  The vowel subsets include the accented vowels that exist in Spanish (á, é, í, ó, and ü), and the consonant subset likewise contains the n with tilde (ñ).  In my rule automata, I use only the BACK, FRONT, and CONSONANTS subsets to specify the characters that affect the mutations and pluralization.
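
As an illustration only, the three subsets that the rules actually use can be pictured as Python sets. The membership below is an assumption made for this sketch (the accented members follow the list given above); it is not copied from spanishrulBo.rul.

```python
# Hypothetical membership; the real subsets in spanishrulBo.rul may differ.
FRONT = set("eiéí")                          # front vowels
BACK = set("aouáóü")                         # back vowels
V = FRONT | BACK                             # all vowels
CONSONANTS = set("bcdfghjklmnñpqrstvwxyz")   # includes n with tilde

# Basic sanity checks on the partition.
print(FRONT.isdisjoint(BACK))        # front and back vowels do not overlap
print(V.isdisjoint(CONSONANTS))      # no character is both vowel and consonant
```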

C.  Rule automata

The g-j mutation rule (RULE 3 in my spanishrulBo.rul file, linked in Section II of this report) ensures that the lexical character J surfaces as j before back vowels and as g otherwise.  The rule uses the defined BACK subset together with the added lexical character J.  The following is the graphical form of the implemented finite state automaton:

rul3actual.jpg
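
The effect of RULE 3 can also be sketched as a plain Python function that maps a lexical form to its surface form. This is not the automaton itself and is not taken from spanishrulBo.rul; the function name `surface_gj` and the membership of BACK are assumptions made for this sketch.

```python
BACK = set("aouáóü")  # assumed membership of the BACK subset

def surface_gj(lexical: str) -> str:
    """Realize lexical J as j before a back vowel, g otherwise; drop '+'."""
    out = []
    for i, ch in enumerate(lexical):
        if ch == "+":
            continue  # the morpheme boundary has no surface character here
        if ch == "J":
            # Look past any morpheme boundary to the next real character.
            rest = lexical[i + 1:].replace("+", "")
            out.append("j" if rest[:1] in BACK else "g")
        else:
            out.append(ch)
    return "".join(out)

for form in ["coJ+er", "coJ+o", "coJ+a", "lleg+o"]:
    print(form, "->", surface_gj(form))
```

The lexical forms used here are the ones that appear in the batch log in Section III; a plain lexical g, as in lleg+o, is left untouched.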

The z-c mutation rule (RULE 4) ensures that the consonant z becomes a c before front vowels, but remains a z otherwise.  The rule uses the defined FRONT subset.  As with RULE 3 above, the following is the graphical form of the implemented finite state automaton:

rul4actual.jpg

The pluralization rule (RULE 5) ensures that pluralizing (adding s to) a noun that ends in a consonant causes an e to appear on the surface before the s.  The rule uses the defined CONSONANTS subset.  The following is the graphical form of the implemented finite state automaton (please note that CONSONANTS:CONSONANTS is abbreviated as CONS:CONS):

rul5actual.jpg
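
A minimal Python sketch of what RULE 5 does to a lexical noun form follows. It treats the rule in isolation (it ignores RULE 4, so it does not cover l^piz+s), and the function name `surface_plural` and the membership of CONSONANTS are assumptions made for this sketch.

```python
CONSONANTS = set("bcdfghjklmnñpqrstvwxyz")  # assumed membership

def surface_plural(lexical: str) -> str:
    """Surface form of root or root+s, inserting e after a final consonant."""
    root, sep, suffix = lexical.partition("+")
    if not sep:
        return lexical  # singular: no boundary, nothing to change
    if suffix == "s" and root[-1:] in CONSONANTS:
        return root + "e" + suffix  # the boundary surfaces as e
    return root + suffix            # the boundary surfaces as nothing

for form in ["ciudad", "ciudad+s", "bota+s"]:
    print(form, "->", surface_plural(form))
```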

D.  Lexicon automaton

The lexicon automaton (contained in my spanishlexBo.lex file, for which the link is provided in Section II of this report) has nine distinct states and seven transition arc descriptions, assembled as shown in the figure below:

lexfsagraphic1.gif

The NUMBER description specifies whether the noun being handled is singular or plural.  The V_SUFFIX1, V_SUFFIX2, and V_SUFFIX3 descriptions specify the conjugation suffixes for ar, er, and ir verbs, respectively.  The N_ROOT description provides the root of the noun being handled and its English meaning, and the V_ROOT description provides the same information for a verb.  The END description indicates the arc that the state transition follows at the end of an analysis.
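
To make the role of these descriptions concrete, here is a simplified Python sketch of a lexicon lookup over lexical forms. It collapses the nine states into dictionary lookups and is not the contents of spanishlexBo.lex: the dictionary layout, the `analyze` function, and the small selection of entries (taken from the batch log in Section III) are assumptions made for this sketch.

```python
# Roots with their English glosses; verbs also carry a conjugation class.
N_ROOT = {"ciudad": "Noun(city)", "bota": "Noun(boot)"}
V_ROOT = {"cruz": ("Verb(cross)", 1), "coJ": ("Verb(catch,seize,grab)", 2)}

# Suffix tables (only a few entries shown): suffix -> feature labels.
NUMBER = {"": [".SG"], "+s": ["+PL"]}
V_SUFFIX = {
    1: {"+ar": [".INF"], "+o": ["+PresInd1pSG"],
        "+e": ["+PresSubj1pSG", "+PresSubj3pSG"]},
    2: {"+er": [".INF"], "+o": ["+PresInd1pSG"],
        "+a": ["+PresSubj1pSG", "+PresSubj3pSG"]},
}

def analyze(lexical: str) -> list:
    """Return every gloss the toy lexicon licenses for a lexical form."""
    root, sep, rest = lexical.partition("+")
    suffix = sep + rest
    glosses = []
    if root in N_ROOT:
        for feat in NUMBER.get(suffix, []):
            glosses.append(f"[{N_ROOT[root]} {feat}]")
    if root in V_ROOT:
        gloss, verb_class = V_ROOT[root]
        for feat in V_SUFFIX[verb_class].get(suffix, []):
            glosses.append(f"[{gloss} {feat}]")
    return glosses

print(analyze("coJ+a"))
print(analyze("ciudad+s"))
```

In this layout, adding a noun or verb amounts to adding one entry to `N_ROOT` or `V_ROOT`, with the conjugation class recorded next to each verb root.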

E.  Issue during development

The only major issue I faced during development was having to revise my initial formulation of the z-c mutation rule (RULE 4) because of its interaction with the pluralization rule (RULE 5).  Initially, RULE 4 did not contain the arcs labeled +:e in its graphical form shown above.  Because the rules run simultaneously, RULE 4 must itself recognize the e that RULE 5 inserts when pluralizing a noun whose root ends in z; otherwise the change from z to c is not licensed before the inserted es (as in l^piz+s -> l^pices).  Adding the +:e arcs resolved this.
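
The interaction can be sketched in Python by treating each rule as a check over a sequence of lexical:surface pairs and accepting a pairing only if every rule accepts it. This is one plausible reading of the issue rather than a transcription of the automata: the helper names, the `allow_plus_e` switch standing in for the added +:e arcs, and the subset membership are all assumptions made for this sketch. A surface 0 marks a boundary that surfaces as nothing.

```python
FRONT = set("eiéí")                          # assumed membership
CONSONANTS = set("bcdfghjklmnñpqrstvwxyz")   # assumed membership

def pair_up(lexical, surface):
    """Align equal-length strings into (lexical, surface) pairs."""
    assert len(lexical) == len(surface)
    return list(zip(lexical, surface))

def zc_ok(pairs, allow_plus_e):
    """RULE 4 sketch: lexical z surfaces as c exactly before a front vowel."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "z":
            continue
        nxt = pairs[i + 1:i + 3]
        if nxt[:1] == [("+", "0")]:
            # Null boundary: look at the surface character after it.
            front = len(nxt) > 1 and nxt[1][1] in FRONT
        elif nxt[:1] == [("+", "e")]:
            # Inserted plural e: only seen if the rule has the +:e arcs.
            front = allow_plus_e
        else:
            front = bool(nxt) and nxt[0][1] in FRONT
        if (surf == "c") != front:
            return False
    return True

def plural_ok(pairs):
    """RULE 5 sketch: + surfaces as e exactly between a consonant and final s."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "+":
            continue
        after_consonant = i > 0 and pairs[i - 1][0] in CONSONANTS
        before_final_s = pairs[i + 1:] == [("s", "s")]
        if (surf == "e") != (after_consonant and before_final_s):
            return False
    return True

def accepts(lexical, surface, allow_plus_e=True):
    """Both rules must accept the same pairing (they run in parallel)."""
    pairs = pair_up(lexical, surface)
    return zc_ok(pairs, allow_plus_e) and plural_ok(pairs)

print(accepts("l^piz+s", "l^pices", allow_plus_e=False))
print(accepts("l^piz+s", "l^pices", allow_plus_e=True))
```

Under these assumptions, the pairing for l^pices is rejected without the +:e context and accepted with it, while l^pizes is accepted only by the unrevised version.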

F.  System extendibility

To add more nouns and verbs to the system (without extending the morphological processes it handles), one only needs to list the roots of the additional nouns and verbs under the N_ROOT and V_ROOT transition arc descriptions, respectively, in the lexicon file spanishlexBo.lex, specifying the type of each verb: ar (1), er (2), or ir (3).  My system is thus highly extendible to more nouns and verbs.  Adding entries could be made even easier with 1) a method for automatically determining verb type and 2) a companion function (either within the system or separate from it) that outputs the root of any noun or verb to be added and then automatically inserts that root under the appropriate transition arc description in the lexicon file.

II.  Pointers to my automata files

http://web.mit.edu/~kaede11/Public/spanishrulBo.rul

http://web.mit.edu/~kaede11/Public/spanishlexBo.lex

III.  Log of batch run on spanish.rec

----- Tue, 15 Feb 2005 05:37 AM -----
;; Good examples (47)
coger -> coJ+er        [Verb(catch,seize,grab) .INF]
cojo -> coJ+o        [Verb(catch,seize,grab) +PresInd1pSG]
coges -> coJ+es        [Verb(catch,seize,grab) +PresInd2pSG]
coge -> coJ+e        [Verb(catch,seize,grab) +PresInd3pSG]
cogemos -> coJ+emos        [Verb(catch,seize,grab) +PresInd1pPL]
cogen -> coJ+en        [Verb(catch,seize,grab) +PresInd3pPL]
coja -> coJ+a        [Verb(catch,seize,grab) +PresSubj1pSG]
        coJ+a        [Verb(catch,seize,grab) +PresSubj3pSG]
llegar -> lleg+ar        [Verb(arrive) .INF]
llego -> lleg+o        [Verb(arrive) +PresInd1pSG]
llegan -> lleg+an        [Verb(arrive) +PresInd3pPL]
pagar -> pag+ar        [Verb(pay) .INF]
pago -> pag+o        [Verb(pay) +PresInd1pSG]
pagan -> pag+an        [Verb(pay) +PresInd3pPL]
cruzar -> cruz+ar        [Verb(cross) .INF]
cruzo -> cruz+o        [Verb(cross) +PresInd1pSG]
cruzas -> cruz+as        [Verb(cross) +PresInd2pSG]
cruza -> cruz+a        [Verb(cross) +PresInd3pSG]
cruzamos -> cruz+amos        [Verb(cross) +PresInd1pPL]
cruzan -> cruz+an        [Verb(cross) +PresInd3pPL]
cruce -> cruz+e        [Verb(cross) +PresSubj1pSG]
         cruz+e        [Verb(cross) +PresSubj3pSG]
l^piz -> l^piz        [Noun(pencil) .SG]
l^pices -> l^piz+s        [Noun(pencil) +PL]
ciudad -> ciudad        [Noun(city) .SG]
ciudades -> ciudad+s        [Noun(city) +PL]
bota -> bota        [Noun(boot) .SG]
botas -> bota+s        [Noun(boot) +PL]
cojas -> coJ+as        [Verb(catch,seize,grab) +PresSubj2pSG]
cojamos -> coJ+amos        [Verb(catch,seize,grab) +PresSubj1pPL]
cojan -> coJ+an        [Verb(catch,seize,grab) +PresSubj3pPL]
conozcas -> conozc+as        [Verb(know) +PresSubj2pSG]
conozcamos -> conozc+amos        [Verb(know) +PresSubj1pPL]
conozcan -> conozc+an        [Verb(know) +PresSubj3pPL]
parezcas -> parezc+as        [Verb(seem) +PresSubj2pSG]
parezcamos -> parezc+amos        [Verb(seem) +PresSubj1pPL]
parezcan -> parezc+an        [Verb(seem) +PresSubj3pPL]
venzas -> venz+as        [Verb(conquer,defeat) +PresSubj2pSG]
venzamos -> venz+amos        [Verb(conquer,defeat) +PresSubj1pPL]
venzan -> venz+an        [Verb(conquer,defeat) +PresSubj3pPL]
cuezas -> cuez+as        [Verb(cook,bake) +PresSubj2pSG]
cuezamos -> cuez+amos        [Verb(cook,bake) +PresSubj1pPL]
cuezan -> cuez+an        [Verb(cook,bake) +PresSubj3pPL]
ejerzas -> ejerz+as        [verb(exercise,practice) +PresSubj2pSG]
ejerzamos -> ejerz+amos        [verb(exercise,practice) +PresSubj1pPL]
ejerzan -> ejerz+an        [verb(exercise,practice) +PresSubj3pPL]
cruces -> cruz+es        [Verb(cross) +PresSubj2pSG]
crucemos -> cruz+emos        [Verb(cross) +PresSubj1pPL]
crucen -> cruz+en        [Verb(cross) +PresSubj3pPL]
;;;
;;; Bad Examples  (13)
;;;
llejo ->
lleja ->
cogo ->
coga ->
cruco ->
cruca ->
crucan ->
cruze ->
l^pizes ->
ciudads ->
l^pizs ->
l^pics ->
botaes ->

Bo Sung Kim © 2005 kaede11