Step 1: Register

First, register if you haven't already done so; a username and password will be sent to you. You then have two options: to use the stable version (currently 1.0), go to Step 2(a); to use the more up-to-date development version, go to Step 2(b) instead.

Step 2(a): Download and unpack the source code or binaries

After downloading the source package (or a binary package, i.e. one whose file name includes the name of the operating system or library, e.g. candc-cygwin-1.00.tgz), you need to unpack it. The examples below use candc, but you will be using a package with a version number, e.g. candc-1.00.

If you downloaded the gzipped tar file then the command to use is:

% tar -xvzf candc.tgz

If you downloaded the bzipped tar file then the command to use is:

% tar -xvjf candc.tbz2

If you downloaded the zip file then the command to use is:

% unzip candc.zip
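The three cases above can be folded into one small helper. This is only a sketch: the function name unpack_cmd is hypothetical, and the file name candc-1.00.tgz is assumed from the naming pattern described earlier. It prints the command rather than running it, so you can check it first.

```shell
#!/bin/sh
# Hypothetical helper: print the unpack command that matches an
# archive's extension, using the three commands listed above.
unpack_cmd() {
    case "$1" in
        *.tgz)  echo "tar -xvzf $1" ;;   # gzipped tar file
        *.tbz2) echo "tar -xvjf $1" ;;   # bzipped tar file
        *.zip)  echo "unzip $1" ;;       # zip file
        *)      echo "unknown archive type: $1" >&2
                return 1 ;;
    esac
}

# Assumed file name, for illustration only.
unpack_cmd candc-1.00.tgz
```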

You should then have a directory called candc with the version number appended. Note that the binary packages contain pre-compiled versions of the C&C tools but do not include compiled versions of Boxer at this time. If you also want to use Boxer, you must unpack the source distribution and follow the instructions for building Boxer given later in Step 6. Now go to Step 3.

Step 2(b): Downloading the development version

The development version is stored in the Subversion repository. Follow the instructions given there and proceed to Step 3.

Step 3: Selecting a Makefile

If you have unpacked pre-compiled binaries, please go to Step 7. Otherwise choose the appropriate Makefile for your environment. Here are the options:

Makefile          Environment
Makefile.cygwin   for building Windows binaries with the Cygwin Linux emulation layer (no Python API support yet)
Makefile.macosx   for building Mac OS X binaries (including the Python API)
Makefile.macosxu  for building Mac OS X universal binaries (no Python API support yet)
Makefile.mingw    for building Windows binaries which don't require the Cygwin environment (no Python API support yet)
Makefile.unix     should work for all versions of Linux and Unix (except SunOS/Solaris) with g++ (including the Python API)
Makefile.sunos    should work for most versions of SunOS/Solaris with g++

The simplest and best way of choosing the correct Makefile is to add a symbolic link. For instance, if you're working on a Unix platform, this is done by:

% ln -s Makefile.unix Makefile

Alternatively, you can pass the Makefile of your choice to make with the -f option when building the tools in Step 4.
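If you would rather not pick the Makefile by hand, the choice can be sketched from the output of uname -s. The function name pick_makefile and the mapping itself are assumptions, not part of the distribution; in particular, this sketch always suggests Makefile.macosx on Mac OS X and never the universal-binary variant.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel name (as printed by `uname -s`)
# to one of the Makefiles listed above. The mapping is an assumption.
pick_makefile() {
    case "$1" in
        CYGWIN*) echo Makefile.cygwin ;;
        MINGW*)  echo Makefile.mingw ;;
        Darwin)  echo Makefile.macosx ;;
        SunOS)   echo Makefile.sunos ;;
        *)       echo Makefile.unix ;;   # Linux and other Unix systems
    esac
}

# Show the suggestion for the current machine.
pick_makefile "$(uname -s)"
```

The printed name can then be used either with ln -s as shown above or with make -f.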

Step 4: Building the C&C tools

Building the C&C tools (without the SOAP server and Boxer) simply involves running make from the candc directory:

% make
g++ -Wall -Werror -O3 -Isrc/include   -c -o src/lib/base.o src/lib/base.cc
g++ -Wall -Werror -O3 -Isrc/include   -c -o src/lib/port.o src/lib/port.cc
g++ -Wall -Werror -O3 -Isrc/include   -c -o src/lib/timer.o src/lib/timer.cc
...

This will create three binaries for each tagger and a binary for the parser/supertagger pair. For each tagger there is a training binary (e.g. train_pos), a single tagging binary (e.g. pos) and a multi-tagging binary (e.g. mpos).

You can also explicitly specify the Makefile of your choice as an argument of the make command. Here is an example:

% make -f Makefile.unix

If you are compiling for some other environment, or the build fails for some reason, please add an issue to the tracking system.

Step 5: Building the SOAP server

There is an additional build stage which creates binaries allowing you to run the tools as a SOAP server. This step is optional; you can safely skip it and go to Step 6 if you're not planning to use the SOAP server. Instructions on how to install the SOAP server are provided separately.

Step 6: Building Boxer

You need SWI Prolog to build Boxer. Once it is installed, type:

% make bin/boxer

Or you can specify the Makefile of your choice using the -f option:

% make -f Makefile.unix bin/boxer

Either way, you should get a response along the following lines:

cd src/prolog/boxer; pl -g '[boxer], qsave_program(boxer,[global=128000,local=128000,goal=start,stand_alone=true]), halt.'
%     library(lists) compiled into lists 0.00 sec, 11,436 bytes
%    alphaConversionDRT compiled into alphaConversionDRT 0.00 sec, 18,908 bytes
%   betaConversionDRT compiled into betaConversionDRT 0.01 sec, 22,172 bytes
...
%   drs2fol compiled into drs2fol 0.00 sec, 17,780 bytes
%   printDrs compiled into printDrs 0.00 sec, 29,664 bytes
%  output compiled into output 0.01 sec, 84,524 bytes
% boxer compiled 0.10 sec, 592,656 bytes
% /usr/lib/pl-5.6.22/library/quintus compiled into quintus 0.01 sec, 12,332 bytes
% /usr/lib/pl-5.6.22/library/date compiled into date 0.00 sec, 1,928 bytes
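If the build fails instead, one thing worth checking is whether the Prolog executable is on your PATH. The sketch below looks for pl, the name used in the build command above, and then for swipl, the name used by later SWI Prolog releases. The function name find_prolog is hypothetical, and the Makefile may still need adjusting if only swipl is found.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first SWI Prolog
# executable found on PATH, trying `pl` before `swipl`.
find_prolog() {
    for name in pl swipl; do
        if command -v "$name" >/dev/null 2>&1; then
            command -v "$name"
            return 0
        fi
    done
    return 1   # neither name was found
}

if prolog=$(find_prolog); then
    echo "found SWI Prolog candidate: $prolog"
else
    echo "no pl or swipl executable found on PATH"
fi
```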

Building Boxer in Cygwin/MinGW

The Cygwin version of SWI Prolog is a hybrid of the Windows and Linux versions, and as such doesn't currently support qsave_program correctly. See this post from the SWI Prolog developers for more details. If you try to build Boxer with the Cygwin SWI Prolog, you will get a strange file containing what looks like XML and binary data, which won't execute.

The solution is to install and use the Windows version of SWI Prolog from within Cygwin. Makefile.cygwin has been set up to do this, assuming the default installation location of SWI Prolog (C:\Program Files\pl):

...
PROLOG = /cygdrive/c/Program\ Files/pl/bin/plcon.exe
...

For this version of Boxer to run, you then need to add the location of the SWI Prolog DLLs to your PATH variable like this:

% export PATH=$PATH:/cygdrive/c/Program\ Files/pl/bin

Otherwise, when you try to run Boxer, nothing will happen (e.g. no help messages will be printed). The same applies when building Boxer using the MinGW makefile, except that if you run it outside of Cygwin you need to set the Windows PATH environment variable instead.
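If you put the export line in a shell startup file, the directory is appended again every time the file is read. A minimal sketch of a guard against that, assuming the default installation location mentioned above (the function name add_to_path is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is
# not already one of the colon-separated entries.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Default SWI Prolog location assumed by Makefile.cygwin.
add_to_path "/cygdrive/c/Program Files/pl/bin"
export PATH
```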

Step 7: Unpacking the statistical models

To use the tools you will need either to train new models or to use the existing models that are available from the download page. (Programs are available for retraining the taggers, e.g. train_pos, but the programs and documentation required for retraining the parser will follow in a later version.)

The models package can be unpacked anywhere, but it is often convenient to unpack it within the candc directory. The models are available as gzipped tar, bzipped tar and zip files, just like the source code, and they can be unpacked in the same way, e.g.:

% cd <path>/candc
% tar -xvzf models-1.00.tgz

The version number on the models is independent of the version number of the source code, so minor updates to the source code will not result in new models being distributed. At each major release, new models will also be released.

The extra models trained with Penn Treebank POS tags, which are also available from the download page, can be unpacked in a similar manner.

Step 8: Ready to go

The binary files in bin can be manually copied to any installation location, e.g. /usr/local/bin or your own bin directory. They can also be used directly from within the candc directory. Try some examples to check whether your installation was successful.