- Create a tokenizer that works as well as the SED script.
- Use Boost Regex to tokenize.
- Read input from a file, tokenize it, and write the tokens out to a file.
- Use a file of regular expressions to identify what each token is.
- In time, develop the preprocessor to hold on to representations of various document formats.
- Examples: HTML, PDF, Word.
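
A minimal sketch of the rule-file idea above, under stated assumptions. The rule-file format (one `NAME REGEX` pair per line), the names `Rule`, `load_rules`, `classify`, and `tokenize`, and the `UNKNOWN` fallback are all hypothetical and not taken from the project. The sketch uses `std::regex` as a stand-in so it compiles without Boost; `boost::regex` and `boost::regex_match` offer a very similar interface. Tokens are split on whitespace only, and a pattern may not contain spaces in this format.

```cpp
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One classification rule: a token type name and the pattern that defines it.
struct Rule {
    std::string name;
    std::regex pattern;
};

// Assumed rule-file format: one rule per line, "NAME<whitespace>REGEX".
// Lines that do not contain both fields are skipped.
std::vector<Rule> load_rules(std::istream& in) {
    std::vector<Rule> rules;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, pattern;
        if (fields >> name >> pattern) {
            rules.push_back({name, std::regex(pattern)});
        }
    }
    return rules;
}

// Returns the name of the first rule whose pattern matches the whole token.
std::string classify(const std::string& token, const std::vector<Rule>& rules) {
    for (const Rule& rule : rules) {
        if (std::regex_match(token, rule.pattern)) {
            return rule.name;
        }
    }
    return "UNKNOWN";
}

// Reads whitespace-separated tokens from `in` and writes one
// "token<TAB>type" line per token to `out`.
void tokenize(std::istream& in, std::ostream& out, const std::vector<Rule>& rules) {
    std::string token;
    while (in >> token) {
        out << token << '\t' << classify(token, rules) << '\n';
    }
}
```

Because the functions take generic streams, passing a `std::ifstream` for the rules and input and a `std::ofstream` for the output would give the file-to-file flow described in the goals.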
Week 1:
- Created binaries for the GCC compiler and Visual Studio.
- Wrote simple regular expressions for tokenization.
- Working on developing better expressions and learning the Boost commands.
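
For illustration, one possible shape for a simple tokenizing expression. The actual Week 1 expressions are not recorded in these notes, so the pattern and the function name `simple_tokens` are assumptions. It again uses `std::regex` as a stand-in; `boost::regex` with `boost::sregex_iterator` should accept the same pattern and loop.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits text into runs of letters, runs of digits, and single
// punctuation characters. Whitespace matches no alternative and is skipped.
std::vector<std::string> simple_tokens(const std::string& text) {
    static const std::regex token_re("[[:alpha:]]+|[[:digit:]]+|[[:punct:]]");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), token_re), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

With this pattern, contractions and hyphenated words are split at the punctuation mark, which is one of the cases a better expression would need to handle.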