LatAnalysis (IN PROGRESS)


  • Parse Whitaker's Words output ✅

  • Build graphs of possible sentences

  • Build training model and perform Latin->English substitution (POSTPONED)

Parsing Whitaker's Words output

  • Uses subprocess to run the bin/words.exe command-line application

  • The output is generally straightforward to parse. Each entry has three parts:

    • Forms (Person, Number, Case, etc)

    • Dictionary Entry [TAGS]

    • Definition

  • Create a WordAnalysis object with all information

    • There is a child class Word that contains relevant information for different parts of speech
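The parsing step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `WordAnalysis` fields, the helper names `run_words` and `parse_entry`, the assumption that the executable accepts the word as a command-line argument, and the assumption that the tag block is five capital letters in square brackets are all guesses. The sample text is only shaped like the three parts listed above; real output may differ.

```python
import re
import subprocess
from dataclasses import dataclass, field

WORDS_PATH = "bin/words.exe"  # path taken from the notes above

# Assumption: a dictionary-entry line carries a bracketed tag block like [XXXBO].
TAGS = re.compile(r"\[([A-Z]{5})\]")


@dataclass
class WordAnalysis:
    """Hypothetical container; the project's real class may look different."""
    forms: list = field(default_factory=list)
    entry: str = ""
    tags: str = ""
    definition: str = ""


def run_words(word):
    """Run WORDS on one word and return its raw text output.

    Assumes the executable takes the word as a command-line argument.
    """
    result = subprocess.run([WORDS_PATH, word], capture_output=True, text=True)
    return result.stdout


def parse_entry(text):
    """Split one entry into forms, dictionary entry with tags, and definition."""
    analysis = WordAnalysis()
    seen_entry = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TAGS.search(line)
        if match and not seen_entry:
            # The first line with a tag block is the dictionary entry.
            analysis.entry = line[:match.start()].strip()
            analysis.tags = match.group(1)
            seen_entry = True
        elif seen_entry:
            # Everything after the dictionary entry is definition text.
            analysis.definition = (analysis.definition + " " + line).strip()
        else:
            # Lines before the dictionary entry describe inflected forms.
            analysis.forms.append(line.split())
    return analysis


# Illustrative sample only, used so the sketch runs without the executable.
SAMPLE = """\
amic.us              N      2 1 NOM S M
amicus, amici  N (2nd) M   [XXXBO]
friend, ally, disciple; loved one; patron;
"""

parsed = parse_entry(SAMPLE)
print(parsed.forms[0])
print(parsed.entry, "|", parsed.tags)
print(parsed.definition)
```

Keeping `run_words` separate from `parse_entry` means the parser can be exercised on saved text without launching the executable.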


Figure: WORDS interactive session

Sentence Graphs

  • A large graph whose nodes are subgraphs, each holding one candidate sentence structure. Within a subgraph, a word that modifies another word is connected to it by a directed edge. The larger graph is simply a way of keeping track of the subgraphs.

  • Finally, the leaves of the large graph are collected and only the unique graphs are kept. Duplicates usually arise because appending the words in a different order produces the same result.
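The two-level structure described above might look something like the sketch below. Everything here is hypothetical: the names `SentenceGraph`, `SearchNode`, `append`, `expand` and `leaves`, the choice to point each edge from the modifier to the modified word, and the hard-coded `ATTACH` table (which treats the subject and the object as attaching to the verb) are assumptions made for illustration, not the project's actual rules.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SentenceGraph:
    """One subgraph: words as nodes, (modifier, modified) pairs as edges."""
    nodes: frozenset = frozenset()
    edges: frozenset = frozenset()


@dataclass
class SearchNode:
    """One node of the large graph; it only tracks a subgraph and its children."""
    graph: SentenceGraph
    children: list = field(default_factory=list)


# Hypothetical attachments for "amicus canem salutat": modifier -> modified word.
ATTACH = {"amicus": "salutat", "canem": "salutat"}


def append(graph, word):
    """Return a new subgraph with `word` added and every edge that is now possible."""
    nodes = graph.nodes | {word}
    edges = {(m, h) for m, h in ATTACH.items() if m in nodes and h in nodes}
    return SentenceGraph(nodes, frozenset(edges))


def expand(node, remaining):
    """Grow the large graph by trying every remaining word as the next append."""
    for word in remaining:
        child = SearchNode(append(node.graph, word))
        node.children.append(child)
        expand(child, [w for w in remaining if w != word])


def leaves(node):
    """Collect the subgraphs stored at the leaves of the large graph."""
    if not node.children:
        return [node.graph]
    return [g for child in node.children for g in leaves(child)]


root = SearchNode(SentenceGraph())
expand(root, ["amicus", "canem", "salutat"])
all_leaves = leaves(root)

# Six append orders reach six leaves, but every leaf holds the same subgraph.
print(len(all_leaves), all(g == all_leaves[0] for g in all_leaves))
```

With only one reading per word, every append order ends in the same subgraph, which is the source of the duplicates mentioned above.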

Figure: Sentence graph for "amicus canem salutat" (the friend greets the dog)

The next step was adding support for all parts of speech and removing duplicate sentences. Parts of speech were handled on a case-by-case basis, and duplicate sentences were detected by comparing node and edge values.
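One way to compare graphs by node and edge values is to reduce each graph to an order-independent signature. The sketch below assumes a graph is a plain dict with a list of `(word, part_of_speech)` nodes and a list of `(modifier, modified)` edges; that representation and the names `signature` and `unique_graphs` are illustrative, not the project's own.

```python
def signature(graph):
    """Order-independent fingerprint built from node and edge values."""
    return (frozenset(graph["nodes"]), frozenset(graph["edges"]))


def unique_graphs(graphs):
    """Keep the first graph seen for each distinct signature."""
    seen = set()
    kept = []
    for graph in graphs:
        key = signature(graph)
        if key not in seen:
            seen.add(key)
            kept.append(graph)
    return kept


# a and b hold the same sentence, built by appending words in a different order.
a = {"nodes": [("amicus", "N"), ("canem", "N"), ("salutat", "V")],
     "edges": [("amicus", "salutat"), ("canem", "salutat")]}
b = {"nodes": [("salutat", "V"), ("canem", "N"), ("amicus", "N")],
     "edges": [("canem", "salutat"), ("amicus", "salutat")]}
# c differs: "canem" is left unattached, so it is not a duplicate.
c = {"nodes": [("amicus", "N"), ("canem", "N"), ("salutat", "V")],
     "edges": [("amicus", "salutat")]}

result = unique_graphs([a, b, c])
print(len(result))  # 2
```

Because the signature ignores insertion order, two graphs that differ only in the order the words were appended collapse into one.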

All parts of speech are now handled. The job now is to write out the rules of Latin. Ablative absolutes, substantive adjectives, and prepositional phrases have already been implemented. I am not looking to definitively solve the Latin language; I am only trying to show the versatility of my model. Next up are more complex grammatical constructions such as indirect speech, support for verbs made up of multiple words, etc.

The directed graph is getting a little wild (this is only 3 words, and each blob is a possible sentence). I hope to do more pruning (i.e., get rid of all duplicates early). The final pool contains 12 possible sentences.

How could there be 12 sentences from 3 words, you ask? The sentence in this case is "errare est humanum". First, "errare" could be a syncopated verb form as well as an infinitive; this is the left half of the tree, where the program can't find a place to put another verb. Second, "est" also has an archaic meaning of "eat", which is added as a possibility (see the two larger branches on the right side). Finally, there are 7 possibilities for "humanum": 3 as a noun (nominative, vocative, and accusative) and 4 as an adjective (nominative, vocative, and accusative, with the accusative counted twice because it can be masculine or neuter). This is the jumble of nodes at the bottom right.
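The count of 7 for "humanum" can be written out as a small enumeration. The tuple layout and the case and gender labels are illustrative only, and the adjective list reflects my reading of the sentence above: the accusative is the case that appears in both masculine and neuter.

```python
# Noun readings: one per case named above.
NOUN_CASES = ["NOM", "VOC", "ACC"]

# Adjective readings as (case, gender): the accusative appears for both genders.
ADJ_READINGS = [("NOM", "N"), ("VOC", "N"), ("ACC", "N"), ("ACC", "M")]

readings = [("N", case, None) for case in NOUN_CASES]
readings += [("ADJ", case, gender) for case, gender in ADJ_READINGS]

for reading in readings:
    print(reading)
print(len(readings))  # 7
```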