Biomedical Parsing
To run the biomedical version of the C&C Parser as described in Rimell and Clark (2008, 2009), you will need the POS model, the supertagger model, the markedup file, and the post-processing script.
POS Model
The POS model is trained on the full GENIA corpus. To use this model with the POS tagger:
% bin/pos --model models/pos_genia
To use this model with the candc executable:
% bin/candc --pos-model models/pos_genia
This tagger is trained on gold-standard POS tags from GENIA. The following changes were made to the GENIA data before training:
- Intra-token white space deleted
- Ambiguous tags converted to single tags (e.g. all instances of JJ|NN converted to NN)
- Two missing tags corrected
- Tags for quotes changed from `` and '' to LQU and RQU
- Tags for brackets changed from (), [], and {} to LRB and RRB
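The tag changes listed above can be sketched roughly as follows. This is only an illustration, not the script that was actually used to prepare the training data: the function and dictionary names are hypothetical, and the rule for ambiguous tags (keep the last alternative) is an assumption that merely agrees with the JJ|NN example. The two corrected missing tags are not covered, since they are not specified here.

```python
import re

# Mappings taken from the list above; the names are illustrative only.
QUOTE_TAGS = {"``": "LQU", "''": "RQU"}
BRACKET_TAGS = {
    "(": "LRB", "[": "LRB", "{": "LRB",
    ")": "RRB", "]": "RRB", "}": "RRB",
}


def normalize_token(token):
    """Delete white space that occurs inside a single token."""
    return re.sub(r"\s+", "", token)


def normalize_tag(tag):
    """Map a GENIA-style POS tag to the single-tag scheme described above."""
    if "|" in tag:
        # ASSUMPTION: keep the last alternative, which matches JJ|NN -> NN.
        # The real conversion may have chosen tags differently.
        tag = tag.split("|")[-1]
    if tag in QUOTE_TAGS:
        return QUOTE_TAGS[tag]
    # Bracket tags become LRB/RRB; every other tag is returned unchanged.
    return BRACKET_TAGS.get(tag, tag)


def normalize_pair(token, tag):
    """Apply both normalizations to one (token, tag) pair."""
    return normalize_token(token), normalize_tag(tag)
```

For example, normalize_pair("IL - 2", "JJ|NN") gives the token "IL-2" with the tag "NN" under these assumptions.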
You can also use text that has been POS-tagged with another biomedical POS tagger as input to the parsing stage of the C&C pipeline, but be aware of possible POS tag mismatches with the tag set described above.
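One way to spot such mismatches before parsing is to count the tags in your input that fall outside the tag set you expect. The sketch below assumes the input is in word|TAG form, one sentence per string, and that you supply the expected tag set yourself. Both the function name and the example tag set are hypothetical and not part of the C&C tools.

```python
from collections import Counter


def unexpected_tags(tagged_sentence, expected_tags):
    """Count tags in a word|TAG sentence that are not in expected_tags."""
    counts = Counter()
    for item in tagged_sentence.split():
        # Split on the LAST "|" so that a word containing "|" keeps it.
        word, sep, tag = item.rpartition("|")
        if not sep:
            # ASSUMPTION: an item with no "|" is treated as untagged.
            counts["<missing tag>"] += 1
        elif tag not in expected_tags:
            counts[tag] += 1
    return counts


# Hypothetical expected tag set, for illustration only.
expected = {"NN", "VBZ", "LRB", "RRB"}
report = unexpected_tags("IL-2|NN (|( binds|VBZ", expected)
```

Here the bracket token carries the tag ( rather than LRB, so it is the only entry in the report.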
Supertagger Model
The supertagger model is trained on CCGbank (Wall Street Journal) plus ten copies of 1000 MedLine sentences (the first 1000 of GENIA) that were manually tagged with CCG lexical categories.
To use this model with the parser or the candc executable:
% bin/parser --super models/pos_genia
% bin/candc --super models/pos_genia
Note that there is no separate parser model for biomedical parsing: the parser was adapted by retraining the POS tagger and supertagger only, taking advantage of the lexicalized nature of CCG parsing. However, to increase coverage, it is recommended that you use the settings (DESCRIBE).
Markedup File
It's important to have the most up-to-date markedup file since the biomedical pipeline uses a small number of CCG categories that aren't present in CCGbank and hence were not in version 1.00 of the markedup file.
Moreover, you may want the version of the markedup file that produces grammatical relations (GRs) in Stanford Dependency Format, though this isn't necessary unless your application particularly requires Stanford dependencies.
Use the markedup file with this command:
If you're using the Stanford version, it's particularly important that you use the post-processing script.
Post-Processing Script
There is one post-processing script for DepBank-style dependencies.
However, the one for Stanford dependencies is even more important.