Info

All the following materials are referred from Speech and Language Processing1-a very good NLP book from Stanford University, and CS224N Lecture 42

Basics

Glossary

  1. Dependency grammars: is one of the grammar formalisms families. The syntactic structure of a sentence is represented by a directed graph, where the nodes are the words of the sentence and the edges are the dependencies between them.

    example of dependency parse

  2. Root: the head of the entire sentence

  3. Head & Dependent: Relations between words are illustrated by directed, labeled arcs. The arc is directed from the head to the dependent. The head plays the role of the central dependent organizing word, and the dependent as a kind of modifier.

Dependency Relations

Two advantages

  1. The head-dependency structure is a good representation of the syntactic structure of a sentence and meanwhile a good proxy for the semantic relations between predicates and their arguments.
  2. The dependency grammars can deal with a relatively free word order well.

Grammatical relation

Linguists have developed taxonomies of relations that go well beyond the familiar notions of subject and object.

The Universal Dependencies (UD) project3 is an open community effort to annotate dependencies and other aspects of grammar across more than 100 languages, provides an inventory of 37 dependency relations.

Basically, the grammatical relations are classified into two types:

  1. Clausal relations that describe syntactic roles with respect to a predicate (often a verb).
  2. Modifier relations that categorize the ways that words can modify their heads. Consider the former example in this page, the clausal relations NSUBJ and OBJ identify the subject and object of the predicate cancel, While the NMOD, DET, and CASE relations denote modifiers of the nouns flights and Houston.

Dependency formulism

  • A dependency structure can be represented as a directed graph , consisting of a set of vertices V, and a set of ordered pairs of vertices A, which we’ll call arcs. The set of arcs, A, captures the headdependent and grammatical function relationships between the elements in V.
  • Among the more frequent restrictions are that the structures must be connected, have a designated root node, and be acyclic or planar. And a dependency tree is a directed graph.

Constraints for a dependency tree

  1. There is a single designated root node that has no incoming arcs.
  2. With the exception of the root node, each vertex has exactly one incoming arc.
  3. There is a unique path from the root node to each vertex in V.

Projectivity

  • An arc from a head to a dependent is said to be projective projective if there is a path from the head to every word that lies between the head and the dependent in the sentence. A dependency tree is then said to be projective if all the arcs that make it up are projective.
  • A dependency tree is projective if it can be drawn with no crossing edges.

Two issues with Projectivity:

  1. It will be incorrect when non-projective examples are encountered. Because the most widely used English dependency treebanks were automatically derived from phrasestructure treebanks generated in such a fashion which will always be projective.
  2. There are computational limitations to the most widely used families of parsing algorithms. Since the transition-based approaches can only produce projective trees, hence any sentences with non-projective structures will contain some errors.

The UD Dependency Treebanks is useful. Using UD, we can get the dependency tree and generate training data for parsing.

Transition-Based Dependency Parsing

Transition-based parsing architecture draws on shift-reduce parsing, a paradigm originally developed for analyzing programming languages. In transition-based parsing we’ll have a stack on which we build the parse, a buffer of tokens to be parsed, and a parser which takes actions on the parse via a predictor called an oracle, as illustrated in the following figure.

  1. Stack: is a list of words or tokens that have been shifted and are awaiting further processing.
  2. Buffer: is a list of words or tokens that have not yet been shifted into the stack.
  3. Parser(oracle): is a predictor that takes a configuration and returns a transition operator.

Transition-based parsing

Arc standard 4

  • In arc standard parsing the transition operators only assert relations between elements at the top of the stack, and once an element has been assigned its head it is removed from the stack and is not available for further processing.
  • This approach is a straightforward greedy algorithm—the oracle provides a single choice at each step and the parser proceeds with that choice.
  • The arc standard approach is quite effective and is simple to implement.

configuration: is the current state of the parse which includes the stack, an input buffer of words or tokens, and a set of relations representing a dependency tree.

Parsing: means making a sequence of transitions through the space of possible configurations.

Transition Operators for Arc Standard

  1. Left-Arc: Assert a head-dependent relation between the word at the top of the stack and the second word; remove the second word from the stack.
  2. Right-Arc: Assert a head-dependent relation between the second word on the stack and the word at the top; remove the top word from the stack.
  3. Shift: Remove the word from the front of the input buffer and push it onto the stack.

The process ends when all the words in the sentence have been consumed and the ROOT node is the only element remaining on the stack.

Parsing Procedure for Arc Standard

  1. Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration,
  2. Otherwise, choose RIGHTARC if (1) it produces a correct head-dependent relation given the reference parse and (2) all of the dependents of the word at the top of the stack have already been assigned,
  3. Otherwise, choose SHIFT.

Attention

The restriction on selecting the RIGHTARC operator is needed to ensure that a word is not popped from the stack, and thus lost to further processing, before all its dependents have been assigned to it.

Parsing Example

“Book the flight through Houston”

StepStackWord ListPredicted ActionDependency Formed
0[root][book, the, flight, through, houston]SHIFT-
1[root, book][the, flight, through, houston]SHIFT-
2[root, book, the][flight, through, houston]SHIFT-
3[root, book, the, flight][through, houston]LEFTARCflightthe (det)
4[root, book, flight][through, houston]SHIFT-
5[root, book, flight, through][houston]SHIFT-
6[root, book, flight, through, houston][]LEFTARChoustonthrough (case)
7[root, book, flight, houston][]RIGHTARCflighthouston (nmod)
8[root, book, flight][]RIGHTARCbookflight (obj)
9[root, book][]RIGHTARCrootbook (root)
10[root][]Done-

Arc-Eager

In an arc-standard approach, dependents are removed from the stack as soon as they are assigned their heads. If flight had been assigned book as its head in Step 4, it would no longer be available to serve as the head of Houston. While this delay doesn’t cause any issues in this example, in general the longer a word has to wait to get assigned its head the more opportunities there are for something to go awry. The arc eager approach gets its name from its ability to assert rightward relations much sooner than in the arc standard approach.

Transition Operators for Arc Eager

  1. LEFTARC: Assert a head-dependent relation between the word at the front of the input buffer and the word at the top of the stack; pop the stack.
  2. RIGHTARC: Assert a head-dependent relation between the word on the top of the stack and the word at the front of the input buffer; shift the word at the front of the input buffer to the stack.
  3. SHIFT: Remove the word from the front of the input buffer and push it onto the stack.
  4. REDUCE: Pop the stack.

Take the same example sentence as before, arc eager approach can produce the same result dependency tree in a different way.

StepStackWord ListActionRelation Added
0[root][book, the, flight, through, houston]RIGHTARCroot → book
1[root, book][the, flight, through, houston]SHIFT-
2[root, book, the][flight, through, houston]LEFTARCflight ← the (det)
3[root, book][flight, through, houston]RIGHTARCbook → flight (obj)
4[root, book, flight][through, houston]SHIFT-
5[root, book, flight, through][houston]LEFTARChouston ← through (case)
6[root, book, flight][houston]RIGHTARCflight → houston (nmod)
7[root, book, flight, houston][]REDUCE-
8[root, book, flight][]REDUCE-
9[root, book][]REDUCE-
10[root][]Done-

Evaluation

Evaluation Metrics

  1. Labeled Attachment Score (LAS): The proper assignment of a word to its head along with the correct dependency relation.
  2. Unlabeled Attachment Score (UAS): The correctness of the assigned head, ignoring the dependency relation.
  • Label Accuracy Score (LS): the percentage of tokens with correct labels, ignoring where the relations are coming from.

evaluation methods

37 Dependency Relations from UD

LabelFull NameDefinitionExample
acladnominal clausefinite or non-finite clause modifying a nominal(28)
advcladverbial clauseadverbial clause modifying a predicate or modifier word(27)
advmodadverbial modifieradverb or adverbial phrase modifying a predicate or modifier word(20a)
amodadjectival modifieradjectival modifier of a nominal(12)
apposappositional modifiera nominal used to define, name, or describe the referent of a preceding nominal(15)
auxauxiliarylinks a function word expressing tense, mood, aspect, voice, or evidentiality to a predicate(16c)
casecase markinglinks a case-marking element (preposition, postposition, or clitic) to a nominal(9)
cccoordinating conjunctionlinks a coordinating conjunction to the following conjunct(23)
ccompclausal complementclausal complement of a verb or adjective without an obligatorily controlled subject(26b)
clfclassifier(numeral) classifier; a word reflecting a conceptual classification of nouns linked to a numeric modifier or determiner(11)
compoundcompoundany kind of word-level compounding (noun compound, serial verb, phrasal verb)(37)
conjconjunctlinks two elements which are conjoined(23)
copcopulalinks a function word used to connect a subject and a nonverbal predicate to the nonverbal predicate(17a)
csubjclausal subjectclausal syntactic subject of a predicate(25)
depunspecified dependencyunspecified dependency, used when a more precise relation cannot be determined-
detdeterminerdeterminer (article, demonstrative, etc.) in a nominal(10)
discoursediscourse elementdiscourse element (interjection, filler, or non-adverbial discourse marker)(20b)
dislocateddislocated elementa peripheral (initial or final) nominal in a clause that does not fill a regular role of the predicate but has roles such as topic or afterthought(22b)
explexpletivelinks a pronominal form in a core argument position but not assigned any semantic role to a predicate(22c)
fixedfixed multiword expressionlinks elements of grammaticalized expressions that behave as function words or short adverbials(39)
flatflat multiword expressionlinks elements of headless semi-fixed multiword expressions like names(40)
goeswithgoes withlinks parts of a word that are separated but should go together according to standard orthography or linguistic wordhood(44)
iobjindirect objectindirect object; nominal core argument of a verb that is not its subject or (direct) object(16c)
listlistlinks elements of comparable items interpreted as a list(46)
markmarkerlinks a function word marking a clause as subordinate to the predicate of the clause(27a)
nmodnominal modifiernominal modifier; a nominal modifying another nominal(13)
nummodnumeric modifiernumeric modifier; numeral in a nominal(10)
nsubjnominal subjectnominal subject; nominal core argument which is the syntactic subject (or pivot) of a predicate(16)
objobjectobject; the core argument nominal which is the most basic core argument that is not the subject, typically the most directly affected participant(16)
oblobliqueoblique; a nominal functioning as a non-core (oblique) modifier of a predicate(21)
orphanorphanlinks orphaned dependents of an elided predicate(43)
parataxisparataxislinks constituents placed side by side with no explicit coordination or subordination(32)
punctpunctuationpunctuation attached to the head of its clause or phrase(23b)
reparandumreparandumrepair of a (normally spoken language) disfluency(45)
rootrootroot of the sentence(16)
vocativevocativenominal directed to an addressee(22a)
xcompclausal complement with controlled subjectclausal complement of a verb or adjective with an obligatorily controlled subject(26a)

References

Footnotes

  1. Speech and Language Processing: version 3

  2. CS224N Lecture 4

  3. Universal Dependencies

  4. Arc standard approach