Step 1: Register
First, register if you haven't done so already; a username and password will be sent to you. There are now two options: if you want to use the stable version (currently 1.0), go to Step 2(a). If you want to use the more up-to-date development version, go to Step 2(b) instead.
Step 2(a): Download and unpack the source code or binaries
After downloading the source package (or a binary package, i.e. one whose file name includes the name of the operating system or library, e.g. candc-cygwin-1.00.tgz) you need to unpack it. In the examples below we use candc, but you will be using a package with a version number, e.g. candc-1.00.
If you downloaded the gzipped tar file then the command to use is:
% tar -xvzf candc.tgz
If you downloaded the bzipped tar file then the command to use is:
% tar -xvjf candc.tbz2
If you downloaded the zip file then the command to use is:
% unzip candc.zip
You should then have a directory called candc with the version number appended to the end. Note that the binary packages contain pre-compiled versions of the C&C tools, but do not include compiled versions of Boxer at this time. If you also want to use Boxer, you must unpack the source distribution and follow the instructions for building Boxer given later in Step 6. Now go to Step 3.
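The three unpack commands above differ only by archive type, so the choice can be sketched as a small shell function. This is a minimal illustration, not part of the distribution: the function name unpack_cmd is hypothetical, and it only prints the command rather than running it.

```shell
# Hypothetical helper: print the unpack command for a given archive name.
# It mirrors the three cases described above and does not run anything.
unpack_cmd() {
    case "$1" in
        *.tgz)  echo "tar -xvzf $1" ;;   # gzipped tar file
        *.tbz2) echo "tar -xvjf $1" ;;   # bzipped tar file
        *.zip)  echo "unzip $1" ;;       # zip file
        *)      echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

unpack_cmd candc-1.00.tgz   # prints: tar -xvzf candc-1.00.tgz
```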
Step 2(b): Downloading the development version
The development version is stored in the Subversion repository. Follow the instructions given there and proceed to Step 3.
Step 3: Selecting a Makefile
If you have unpacked pre-compiled binaries, please go to Step 7. Otherwise choose the appropriate Makefile for your environment. Here are the options:
| Makefile | Environment |
| --- | --- |
| Makefile.cygwin | for building Windows binaries with the Cygwin Linux emulation layer (no Python API support yet) |
| Makefile.macosx | for building Mac OS X binaries (including the Python API) |
| Makefile.macosxu | for building Mac OS X universal binaries (no Python API support yet) |
| Makefile.mingw | for building Windows binaries which don't require the Cygwin environment (no Python API support yet) |
| Makefile.unix | should work for all versions of Linux and Unix (except SunOS/Solaris) with g++ (including the Python API) |
| Makefile.sunos | should work for most versions of SunOS/Solaris with g++ |
The simplest way of choosing the correct Makefile is to add a symbolic link. For instance, if you're working on a Unix platform, this is done by:
% ln -s Makefile.unix Makefile
Alternatively, you can pass the Makefile of your choice to make with the -f option when building the tools in Step 4.
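As a rough guide, the choice of Makefile can be derived from the output of uname -s. The sketch below is an assumption about how the table maps onto common uname strings, not something shipped with the tools: the function name pick_makefile is hypothetical, the universal-binary Makefile.macosxu is deliberately left out, and unknown systems fall back to Makefile.unix.

```shell
# Hypothetical helper: map a `uname -s` string to one of the Makefiles
# listed in the table above.
pick_makefile() {
    case "$1" in
        Linux)   echo Makefile.unix ;;
        Darwin)  echo Makefile.macosx ;;
        SunOS)   echo Makefile.sunos ;;
        CYGWIN*) echo Makefile.cygwin ;;
        MINGW*)  echo Makefile.mingw ;;
        *)       echo Makefile.unix ;;   # assumption: generic Unix fallback
    esac
}

echo "suggested Makefile: $(pick_makefile "$(uname -s)")"
# To use the suggestion, run this from the candc directory:
#   ln -s "$(pick_makefile "$(uname -s)")" Makefile
```

Check the suggestion against the table before linking, since the mapping is only a heuristic.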
Step 4: Building the C&C tools
Building the C&C tools (without the SOAP server and Boxer) simply involves running make from the candc directory:
% make
g++ -Wall -Werror -O3 -Isrc/include -c -o src/lib/base.o src/lib/base.cc
g++ -Wall -Werror -O3 -Isrc/include -c -o src/lib/port.o src/lib/port.cc
g++ -Wall -Werror -O3 -Isrc/include -c -o src/lib/timer.o src/lib/timer.cc
...
This will create three binaries for each tagger and a binary for the parser/supertagger pair. For each tagger there is a training binary (e.g. train_pos), a single tagging binary (e.g. pos) and a multi-tagging binary (e.g. mpos).
You can also explicitly specify the Makefile of your choice as an argument of the make command. Here is an example:
% make -f Makefile.unix
If you are compiling for some other environment, or the build fails for some reason, please add an issue to the tracking system.
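One quick way to confirm that the build produced the tagger binaries is to check for them by name. The sketch below assumes the binaries land in the bin directory and uses only the three example names mentioned above (train_pos, pos, mpos); the function name missing_binaries is hypothetical and the list is illustrative rather than complete.

```shell
# Print the names of the example binaries that are missing from directory $1.
# Assumption: built binaries are placed in bin/ under the candc directory.
missing_binaries() {
    for name in train_pos pos mpos; do
        [ -x "$1/$name" ] || echo "$name"
    done
}

# Run from the candc directory; no output means all three were found.
missing_binaries bin
```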
Step 5: Building the SOAP server
There is an additional build stage which creates binaries allowing you to run the tools as a SOAP server. This step is optional: if you're not planning to use the SOAP server, you can safely skip it and go to Step 6. There are separate instructions on how to install the SOAP server here.
Step 6: Building Boxer
You need SWI Prolog to build Boxer. Once it is installed, type:
% make bin/boxer
Or you can specify the Makefile of your choice using the -f option:
% make -f Makefile.unix bin/boxer
Either way, you should get a response along the following lines:
cd src/prolog/boxer; pl -g '[boxer], qsave_program(boxer,[global=128000,local=128000,goal=start,stand_alone=true]), halt.'
% library(lists) compiled into lists 0.00 sec, 11,436 bytes
% alphaConversionDRT compiled into alphaConversionDRT 0.00 sec, 18,908 bytes
% betaConversionDRT compiled into betaConversionDRT 0.01 sec, 22,172 bytes
...
% drs2fol compiled into drs2fol 0.00 sec, 17,780 bytes
% printDrs compiled into printDrs 0.00 sec, 29,664 bytes
% output compiled into output 0.01 sec, 84,524 bytes
% boxer compiled 0.10 sec, 592,656 bytes
% /usr/lib/pl-5.6.22/library/quintus compiled into quintus 0.01 sec, 12,332 bytes
% /usr/lib/pl-5.6.22/library/date compiled into date 0.00 sec, 1,928 bytes
Building Boxer in Cygwin/MinGW
The Cygwin version of SWI Prolog is a hybrid of the Windows and Linux versions, and as such does not currently support qsave_program correctly. See this post from the SWI Prolog developers for more details. If you try to build Boxer with the Cygwin SWI Prolog, you will get a strange file containing what looks like XML plus binary data, which won't execute.
The solution is to install the Windows version of SWI Prolog and use it from within Cygwin. Makefile.cygwin has been set up to do this, assuming the default installation location of SWI Prolog (C:\Program Files\pl):
...
PROLOG = /cygdrive/c/Program\ Files/pl/bin/plcon.exe
...
For this version of Boxer to run, you then need to add the location of the SWI Prolog DLLs to your PATH variable like this:
% export PATH=$PATH:/cygdrive/c/Program\ Files/pl/bin
Otherwise, nothing will happen when you try to run Boxer (for example, no help message is printed). The same applies when building Boxer using the MinGW Makefile, except that if you run it outside Cygwin you need to set the Windows PATH environment variable instead.
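If the export line goes into a shell startup file, it is worth guarding against adding the same directory on every new shell. A minimal sketch of such a guard follows; the function name add_to_path is hypothetical, and the directory is the default SWI Prolog location assumed above.

```shell
# Append directory $1 to PATH only if it is not already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path "/cygdrive/c/Program Files/pl/bin"
export PATH
```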
Step 7: Unpacking the statistical models
To use the tools, you will need either to train new models or to use the existing models that are available from the download page. (Programs are available for retraining the taggers, e.g. train_pos, but the programs and documentation required for retraining the parser will follow in a later version.)
The models package can be unpacked anywhere, but it is often convenient to unpack it within the candc directory. The models are available as gzipped tar, bzipped tar and zip files, just like the source code, and they can be unpacked in the same way, e.g.:
% cd <path>/candc
% tar -xvzf models-1.00.tgz
The version number on the models is independent of the version number of the source code, so minor updates to the source code will not result in new models being distributed. At each major release, new models will also be released.
The extra models trained with Penn Treebank POS tags, which are also available from the download page, can be unpacked in a similar manner.
Step 8: Ready to go
The binary files in bin can be manually copied to any installation location, e.g. /usr/local/bin or your own bin directory. They can also be used directly from within the candc directory. Try some examples to check whether your installation was successful.
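The manual copy can be sketched as a short shell function. This is an illustration only: the function name install_binaries is hypothetical, the destination is your choice, and it copies every executable regular file it finds in the source directory.

```shell
# Copy every executable regular file from directory $1 into directory $2,
# creating $2 if it does not exist yet.
install_binaries() {
    mkdir -p "$2" || return 1
    for f in "$1"/*; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            cp "$f" "$2/" || return 1
        fi
    done
    return 0
}

# Example, run from the candc directory (destination is an assumption):
#   install_binaries bin "$HOME/bin"
```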