Changes between Version 3 and Version 4 of PreProcessing
Timestamp: 06/26/09 01:28:41
PreProcessing
v3  v4
 1   1      * Create a tokenizer that works as well as the SED script.
 2       -    - Create a simple program to use Boost Regex to tokenize.
     2   +    - Use Boost Regex to tokenize.
 3   3        - Read input from a file tokenize and write out to a file.
 4   4        - Use a file of Regex to tell what each token is.
 5       -  * Investigate how UIMA can be used in the pipeline.
 6   5      * In time develop the preprocessor to hold on to representations of various forms
 7   6        - example: html, pdf, word
     7   +
     8   +  Week 1:
     9   +  * Created binaries for gcc compiler and visual studio.
    10   +  * Wrote simple regular expressions for tokenization
    11   +  * Working on developing better expression and learning the boost commands
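
The tokenizer plan above (read input, tokenize with regular expressions, write the tokens out, and take the token definitions from a file of regexes) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: it uses the standard library's std::regex in place of Boost Regex so that it builds without Boost, and the rule-file layout (one "LABEL<TAB>REGEX" per line) and the names Rule, Token, load_rules, tokenize and tokenize_stream are all hypothetical.

```cpp
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One tokenization rule: a label plus the pattern that recognizes it.
// (Hypothetical structure; the page does not specify a rule format.)
struct Rule {
    std::string label;
    std::regex pattern;
};

// A recognized token: which rule matched, and the matched text.
struct Token {
    std::string label;
    std::string text;
};

// Load rules from lines of the form "LABEL<TAB>REGEX" (assumed layout).
// Lines that are empty or contain no tab are ignored.
std::vector<Rule> load_rules(std::istream& in) {
    std::vector<Rule> rules;
    std::string line;
    while (std::getline(in, line)) {
        const auto tab = line.find('\t');
        if (line.empty() || tab == std::string::npos) continue;
        rules.push_back({line.substr(0, tab), std::regex(line.substr(tab + 1))});
    }
    return rules;
}

// Scan the text left to right. At each position the rules are tried in
// order and the first one that matches exactly at that position wins.
// Characters that no rule matches (e.g. whitespace) are skipped.
std::vector<Token> tokenize(const std::string& text, const std::vector<Rule>& rules) {
    std::vector<Token> tokens;
    auto pos = text.cbegin();
    while (pos != text.cend()) {
        bool matched = false;
        for (const Rule& rule : rules) {
            std::smatch m;
            if (std::regex_search(pos, text.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous) &&
                m.length(0) > 0) {
                tokens.push_back({rule.label, m.str(0)});
                pos += m.length(0);
                matched = true;
                break;
            }
        }
        if (!matched) ++pos;
    }
    return tokens;
}

// Read input line by line and write one "LABEL<TAB>TEXT" line per token.
void tokenize_stream(std::istream& in, std::ostream& out, const std::vector<Rule>& rules) {
    std::string line;
    while (std::getline(in, line)) {
        for (const Token& t : tokenize(line, rules)) {
            out << t.label << '\t' << t.text << '\n';
        }
    }
}
```

To work on real files, std::ifstream and std::ofstream objects can be passed where the streams are expected. Boost Regex offers similarly named types and functions (boost::regex, boost::smatch, boost::regex_search), so the same structure should carry over, though that port is not shown here.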