Changes between Version 3 and Version 4 of PreProcessing

Author:
curt (IP: 128.220.117.40)
Timestamp:
06/26/09 01:28:41 (5 months ago)

  • PreProcessing

    v3  v4
     1   1  * Create a tokenizer that works as well as the sed script.
     2        - Create a simple program to use Boost Regex to tokenize.
         2    - Use Boost Regex to tokenize.
     3   3    - Read input from a file, tokenize it, and write the result out to a file.
     4   4    - Use a file of regular expressions to tell what each token is.
     5      * Investigate how UIMA can be used in the pipeline.
     6   5  * In time, develop the preprocessor to hold on to representations of various forms
     7   6    - example: html, pdf, word
         7
         8  Week 1:
         9   * Created binaries for the gcc compiler and Visual Studio.
        10   * Wrote simple regular expressions for tokenization
        11   * Working on developing better expressions and learning the Boost commands
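
The tokenizer goals listed above (tokenize with regular expressions, read from one file and write to another, and use a file of patterns to name each token) could be sketched roughly as follows. This is only an illustration under several assumptions, not the project's code: it uses the C++11 standard `<regex>` header instead of Boost Regex, and the `Rule` and `Token` types, the function names, and the one-rule-per-line "NAME PATTERN" file layout are all hypothetical. Boost Regex offers a very similar interface under the `boost::` namespace, but the port is not shown here.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One named pattern, e.g. NUMBER -> [0-9]+ (rule names are hypothetical).
struct Rule {
    std::string name;
    std::regex pattern;
};

struct Token {
    std::string type;
    std::string text;
};

// Load rules from a stream holding one "NAME PATTERN" pair per line.
// Assumed layout: the pattern contains no whitespace; malformed lines are skipped.
std::vector<Rule> loadRules(std::istream& in) {
    std::vector<Rule> rules;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, pattern;
        if (fields >> name >> pattern) {
            rules.push_back({name, std::regex(pattern)});
        }
    }
    return rules;
}

// Scan left to right; at each position the first rule that matches wins.
// Characters that no rule matches (such as whitespace) are skipped.
std::vector<Token> tokenize(const std::string& text, const std::vector<Rule>& rules) {
    std::vector<Token> tokens;
    auto pos = text.cbegin();
    while (pos != text.cend()) {
        bool matched = false;
        for (const Rule& rule : rules) {
            std::smatch m;
            // match_continuous only accepts matches that begin exactly at pos.
            if (std::regex_search(pos, text.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous) &&
                m.length(0) > 0) {
                assert(m.position(0) == 0);
                tokens.push_back({rule.name, m.str(0)});
                pos += m.length(0);
                matched = true;
                break;
            }
        }
        if (!matched) {
            ++pos;
        }
    }
    return tokens;
}

// Read lines from `in`, tokenize each, and write "TYPE<TAB>text" lines to `out`.
void tokenizeStream(std::istream& in, std::ostream& out, const std::vector<Rule>& rules) {
    std::string line;
    while (std::getline(in, line)) {
        for (const Token& t : tokenize(line, rules)) {
            out << t.type << '\t' << t.text << '\n';
        }
    }
}
```

Because the functions take streams rather than file names, they work with `std::ifstream` and `std::ofstream` for the file-to-file case and with string streams for quick experiments. Rule order matters in this sketch: earlier rules take priority when more than one pattern matches at the same position.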