Examples

The C&C tools consist of a robust, wide-coverage CCG parser and a number of Maximum Entropy taggers. Each can be run as a separate program, or all of them can be combined in a single run.

Combined Analysis (candc)

The program candc performs pos tagging, chunking, named entity recognition and parsing all in one go. To run this combined analysis program, simply type:

% bin/candc --models models

where models is the model directory containing the models for the various components.

The program expects tokenised sentences as input, read from STDIN by default or from a file if the --input argument is given; candc does the rest. Type bin/candc --help to see some of the options available.

When the --input option is used, the default format for the input file is one sentence per line, with each sentence tokenised according to the Penn Treebank standard.
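
As an illustration of the input format, the following sketch writes a small file with one tokenised sentence per line and passes it to candc with --input. The file name /tmp/sentences.txt and the example sentences are assumptions made for this example; only the --models and --input options come from the description above. The check for bin/candc is there so the script fails gently when run outside the installation directory.

```shell
# Hypothetical file name; any path works.
input=/tmp/sentences.txt

# One sentence per line, tokens separated by single spaces.
# Penn Treebank tokenisation splits off punctuation and clitics (e.g. "did n't").
cat > "$input" <<'EOF'
Every man runs .
Mr. Smith did n't resign , she said .
EOF

# Run the combined analysis only if the binary is present in this directory.
if [ -x bin/candc ]; then
    bin/candc --models models --input "$input"
else
    echo "bin/candc not found; run this from the C&C installation directory" >&2
fi
```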

Boxer

To run Boxer (version 1.0), you first need to create some output from the candc program:

% echo "Every man runs ." | bin/candc --models models/boxer > /tmp/test.ccg

To run Boxer (subversion), you need to create some output from the candc program with an additional option:

% echo "Every man runs ." | bin/candc --models models/boxer --candc-printer boxer > /tmp/test.ccg

Now this file can be used as input to Boxer:

% bin/boxer --input /tmp/test.ccg --box true --flat true
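
The two steps can also be wrapped in a small shell function. This is only a sketch: the function name run_boxer and the temporary file naming are made up for this example, and it uses the candc options shown above for the subversion release (drop --candc-printer boxer for Boxer version 1.0).

```shell
# Hypothetical helper: parse one tokenised sentence with candc,
# then run Boxer on the resulting file.
run_boxer() {
    sentence=$1

    # Both binaries are expected relative to the current directory.
    if [ ! -x bin/candc ] || [ ! -x bin/boxer ]; then
        echo "bin/candc or bin/boxer not found" >&2
        return 1
    fi

    # Per-process temporary file, keeping the .ccg suffix used above.
    ccg="${TMPDIR:-/tmp}/boxer_input.$$.ccg"

    # Drop "--candc-printer boxer" when using Boxer version 1.0.
    if echo "$sentence" | bin/candc --models models/boxer --candc-printer boxer > "$ccg"; then
        bin/boxer --input "$ccg" --box true --flat true
        status=$?
    else
        echo "candc failed" >&2
        status=1
    fi

    rm -f "$ccg"
    return "$status"
}
```

With the function defined, run_boxer "Every man runs ." performs both steps and removes the intermediate file afterwards.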

Parser

The parser binary expects pos tagged sentences as input, and requires a parser model and supertagger model as arguments:

% bin/parser --model models/parser/ --super models/super

This program performs CCG supertagging and parsing, but not pos tagging, chunking or named entity recognition.

Taggers

To run one of the taggers, run the corresponding binary and give the relevant model directory as an argument:

% bin/pos --model models/pos

The pos tagger expects tokenised sentences as input; the chunker, named entity recogniser and CCG supertagger all expect pos tagged sentences as input.

The default models for the taggers expect tokenised input which includes brackets and quotes in their original form (rather than transformed to the Penn Treebank encoding of LRB etc., or to a representation where quotes are missing, as in CCGbank). The model for the parser, on the other hand, expects input in a format consistent with CCGbank. The default models provided with the release deal with these different requirements. See the Models page for details.