Models
The C&C tools come with several pre-trained models, available from the download page. The main models package (e.g. models-1.00.tgz) contains models trained on CCGbank 1.2 and MUC 7 gold-standard data. The models have been extended to support single and double quotes, which are not included in CCGbank, and they use the literal symbols for parentheses, braces and square brackets rather than the Penn Treebank escapes (-LRB-, -RRB-, etc.).
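If your input text already uses the Penn Treebank bracket escapes, it may help to convert them to literal symbols before passing the text to these models. The following is only a minimal sketch: the function name ptb_unescape is hypothetical and not part of the C&C tools, and it assumes the six standard Penn Treebank bracket tokens appear verbatim in the text.

```shell
# Hypothetical helper (not part of the C&C tools): rewrite Penn Treebank
# bracket escapes as the literal symbols the models expect.
# Reads text on standard input and writes the converted text to standard output.
ptb_unescape() {
  sed -e 's/-LRB-/(/g' -e 's/-RRB-/)/g' \
      -e 's/-LCB-/{/g' -e 's/-RCB-/}/g' \
      -e 's/-LSB-/[/g' -e 's/-RSB-/]/g'
}
```

For example, piping the line "He said -LRB- quietly -RRB- ." through ptb_unescape yields "He said ( quietly ) .".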
To use the tools with the standard settings, you can simply run:
% bin/candc --models models
The models package includes a number of configuration modes:
| boxer | a mode for producing the Prolog output needed for Boxer |
| questions | a mode for producing the Prolog output needed for Boxer with additional support for questions |
| noquotes | a mode without additional support for quotes |
These different modes are directories of configuration files and symbolic links within the models directory, and they can be accessed by appending the directory name to models, like this (for the questions model):
% bin/candc --models models/questions
The questions mode, for example, uses different POS tagger and supertagger models that have been trained on extra manually annotated questions, and it relaxes the parser's constraints on how categories can combine.
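Because each mode is just a subdirectory of the models directory, the path passed to --models can be built and checked by a small wrapper. This is a sketch only: models_path is a hypothetical helper, not something shipped with the tools, and it assumes the package has been unpacked so that each mode is a directory under the models directory.

```shell
# Hypothetical helper (not part of the C&C tools): print the path to pass
# to --models for a given mode, or fail if that directory does not exist.
#   $1 = models directory
#   $2 = optional mode name (e.g. boxer, questions, noquotes)
models_path() {
  base=$1
  mode=${2:-}
  if [ -z "$mode" ]; then
    path=$base            # no mode given: use the standard settings
  else
    path=$base/$mode      # a mode is a subdirectory of the models directory
  fi
  if [ ! -d "$path" ]; then
    echo "no such models directory: $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
}
```

With the function defined in the current shell, the questions mode could then be selected like this:

% bin/candc --models "$(models_path models questions)"

Checking the directory first gives a clear error message for a mistyped mode name before the parser is started.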
Other models
The download page also has some extra models trained on the Penn Treebank, whose POS tag set differs slightly from that of CCGbank 1.2. These can be used with the individual taggers, but we don't recommend using them with the parser, because it expects POS tags from CCGbank.