The markedup File

This is currently internal documentation only and is subject to change.

The markedup file provides information about heads and dependencies, and also defines the mapping to Briscoe and Carroll (B&C) style grammatical relations (GRs). GRs are output when the --printer grs option is used, which is the default option for parser output. CCG dependencies are output when the --printer deps option is used.

The file is located in the cats subdirectory of your parser model directory (models/parser/cats/markedup in the predefined models).

Full details can be found in Clark & Curran 2007 (Computational Linguistics); see in particular Section 11.1 and Appendix B.

Example Sentences

Our first example is the sentence "People know that mice have tails." Here is the parser output, showing the GRs followed by the CCG lexical categories.

(dobj have_4 tails_5)
(ncsubj have_4 mice_3 _)
(ccomp that_2 know_1 have_4)
(ncsubj know_1 People_0 _)
<c> People|NNS|N know|VBP|(S[dcl]\NP)/S[em] that|IN|S[em]/S[dcl] mice|NNS|N have|VBP|(S[dcl]\NP)/NP tails|NNS|N .|.|.

Our second example is the sentence "Peter likes to dance." Here is the parser output:

(xcomp to_2 likes_1 dance_3)
(ncsubj dance_3 Peter_0 _)
(ncsubj likes_1 Peter_0 _)
<c> Peter|NNP|N likes|VBZ|(S[dcl]\NP)/(S[to]\NP) to|TO|(S[to]\NP)/(S[b]\NP) dance|VB|S[b]\NP .|.|.
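
The output format shown in the two examples can be read back with a few lines of Python. This is only a sketch based on the lines printed above: it assumes that each token on the <c> line has the form word|POS|category and that the numeric suffix on a GR argument (likes_1) is the zero-based position of the word in the sentence. The function names are hypothetical and are not part of the parser.

```python
def parse_c_line(line):
    """Split a '<c>' line into (word, pos, category) triples."""
    prefix = "<c> "
    if not line.startswith(prefix):
        raise ValueError("not a <c> line")
    # rsplit keeps any '|' inside the word itself in the first field.
    return [tuple(item.rsplit("|", 2)) for item in line[len(prefix):].split()]


def parse_gr(line):
    """Split a GR line such as '(dobj have_4 tails_5)' into its fields."""
    return line.strip()[1:-1].split()


def split_index(token):
    """Split 'dance_3' into ('dance', 3)."""
    word, index = token.rsplit("_", 1)
    return word, int(index)


# Raw string, because the categories contain backslashes.
c_line = (r"<c> Peter|NNP|N likes|VBZ|(S[dcl]\NP)/(S[to]\NP) "
          r"to|TO|(S[to]\NP)/(S[b]\NP) dance|VB|S[b]\NP .|.|.")
tokens = parse_c_line(c_line)
relation = parse_gr("(xcomp to_2 likes_1 dance_3)")

# Check that each indexed argument points at the right word.
for field in relation[1:]:
    if field == "_":  # unfilled field, as in the ncsubj relations
        continue
    word, index = split_index(field)
    assert tokens[index][0] == word
```

Under these assumptions, tokens[1] is the triple for likes, and every argument of the xcomp relation resolves to the matching word on the <c> line.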

GR Mapping Notation

%l indicates the lexical item and %f indicates the head of the constituent that fills an argument slot (the filler).

The markedup entry for the category (S[dcl]\NP)/S[em] is:

(S[dcl]\NP)/S[em]
  2 ((S[dcl]{_}\NP{Y}<1>){_}/S[em]{Z}<2>){_}
  1 ncsubj %l %f _
  2 ccomp %f %l %c =S[em]/S[dcl]
  2 ccomp %f %l %c =S[em]/S[b]

In example 1, the lexical item know has this category. Argument slot 1 is filled by People. Therefore the line for slot 1

1 ncsubj %l %f _

will result in the output

(ncsubj know_1 People_0 _)
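
The substitution just described can be sketched in Python. This is an illustration of the notation only, not the parser's implementation; fill_template and its parameters are hypothetical names.

```python
def fill_template(template, lexical_item, filler_head):
    """Replace %l and %f in a GR template and wrap the result in parentheses."""
    fields = []
    for field in template.split():
        if field == "%l":
            fields.append(lexical_item)   # the word carrying the category
        elif field == "%f":
            fields.append(filler_head)    # head of the slot's filler
        else:
            fields.append(field)          # relation name, '_', etc.
    return "(" + " ".join(fields) + ")"


result = fill_template("ncsubj %l %f _", "know_1", "People_0")
print(result)  # (ncsubj know_1 People_0 _)
```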

Difference between %c and %k

Two special slots are represented by %c and %k. We see %c in the markedup entry above. This represents the head of the first argument of the filler. The line

2 ccomp %f %l %c =S[em]/S[dcl]

means that there is a ccomp relation between %f = head of filler = that, %l = lexical item = know, and %c = the head of the first argument of the filler. Since this relation only holds when the filler has category S[em]/S[dcl] (as indicated by the expression beginning with the equal sign), the first (and only) argument of the filler is S[dcl], and its head is have. Therefore the output will be:

(ccomp that_2 know_1 have_4)
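
The role of the category constraint can be sketched as follows. This is a minimal illustration, assuming a GR line consists of a slot number, a template, and an optional constraint introduced by an equal sign, as in the entry above. The function names and the bindings dictionary are hypothetical; in the parser, the values for %f and %c come from the derivation.

```python
def parse_gr_line(line):
    """Split a GR mapping line into (slot, template fields, constraint or None)."""
    fields = line.split()
    slot = int(fields[0])
    constraint = None
    if fields[-1].startswith("="):
        constraint = fields.pop()[1:]     # drop the leading '='
    return slot, fields[1:], constraint


def apply_gr_line(line, bindings, filler_category):
    """Return the GR string, or None if the constraint does not match."""
    _slot, template, constraint = parse_gr_line(line)
    if constraint is not None and constraint != filler_category:
        return None
    return "(" + " ".join(bindings.get(f, f) for f in template) + ")"


# Bindings for 'know' in example 1; %c is the head of the S[dcl] argument of 'that'.
bindings = {"%l": "know_1", "%f": "that_2", "%c": "have_4"}

matched = apply_gr_line("2 ccomp %f %l %c =S[em]/S[dcl]", bindings, "S[em]/S[dcl]")
skipped = apply_gr_line("2 ccomp %f %l %c =S[em]/S[b]", bindings, "S[em]/S[dcl]")
print(matched)  # (ccomp that_2 know_1 have_4)
print(skipped)  # None
```

Only the line whose constraint equals the filler's category produces a relation, which is why the entry can list two ccomp lines for the same slot.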

We see %k, on the other hand, in the markedup entry for (S[dcl]\NP)/(S[to]\NP), the category of likes in example 2:

(S[dcl]\NP)/(S[to]\NP)
  2 ((S[dcl]{_}\NP{Y}<1>){_}/(S[to]{Z}<2>\NP{Y*}){Z}){_}
  1 ncsubj %l %f _
  2 xcomp %f %l %k =(S[to]\NP)/(S[b]\NP)

%k refers to the head of the second argument of the filler. In the line

2 xcomp %f %l %k =(S[to]\NP)/(S[b]\NP)

we have %f = head of filler = to and %l = lexical item = likes. This relation only holds when the filler has category (S[to]\NP)/(S[b]\NP). This category has two arguments, and %k picks out the head of the second one, S[b]\NP, which is dance. It does not pick out the first argument, which is the NP subject of to. Therefore the output will be:

(xcomp to_2 likes_1 dance_3)
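
To make the argument numbering concrete, the sketch below lists the arguments of a category string. It assumes, following the description above, that arguments are numbered from the innermost outwards, so that NP is argument 1 and S[b]\NP is argument 2 of (S[to]\NP)/(S[b]\NP). It works on category strings only; finding the head word of each argument requires the derivation and is not shown. All function names are hypothetical.

```python
def strip_outer_parens(cat):
    """Remove one pair of parentheses if it encloses the whole category."""
    if not (cat.startswith("(") and cat.endswith(")")):
        return cat
    depth = 0
    for i, ch in enumerate(cat):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i < len(cat) - 1:
                return cat  # the first "(" closes before the end
    return cat[1:-1]


def split_top_level(cat):
    """Return (result, argument) for a complex category, or None if atomic."""
    cat = strip_outer_parens(cat)
    depth = 0
    # Scan from the right for a slash that is outside all parentheses.
    for i in range(len(cat) - 1, -1, -1):
        ch = cat[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return cat[:i], strip_outer_parens(cat[i + 1:])
    return None


def arguments(cat):
    """List a category's arguments, innermost first (argument 1 first)."""
    args = []
    parts = split_top_level(cat)
    while parts is not None:
        cat, arg = parts
        args.append(arg)
        parts = split_top_level(cat)
    args.reverse()
    return args


to_args = arguments(r"(S[to]\NP)/(S[b]\NP)")
that_args = arguments("S[em]/S[dcl]")
print(to_args)    # ['NP', 'S[b]\\NP']
print(that_args)  # ['S[dcl]']
```

With this numbering, %c corresponds to the first entry of the list and %k to the second, which matches the two examples: S[dcl] for that, and S[b]\NP for to.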