• Create a tokenizer that works as well as the sed script.
    • Use Boost Regex to tokenize.
    • Read input from a file, tokenize it, and write the tokens out to a file.
    • Use a file of regular expressions to define what each token is.
  • In time, develop the preprocessor to handle representations of various document formats.
    • Examples: HTML, PDF, Word
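
The tokenizer goals above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It uses std::regex so that it compiles without Boost installed; Boost.Regex offers closely matching types (boost::regex, boost::smatch, boost::regex_search), so the same structure should port with little change. The rule-file layout (one "NAME PATTERN" pair per line, "#" for comments), the tab-separated output, and the names Rule, Token, load_rules, tokenize, and write_tokens are all assumptions made for the example.

```cpp
#include <istream>
#include <ostream>
#include <regex>
#include <string>
#include <vector>

// One tokenization rule: a label and the pattern that recognizes it.
struct Rule {
    std::string name;
    std::regex pattern;
};

// One recognized token: the label of the rule that matched and the matched text.
struct Token {
    std::string type;
    std::string text;
};

// Load rules from a stream, one per line, written as "NAME PATTERN".
// Blank lines, lines starting with '#', and lines without a pattern are skipped.
// (This file layout is an assumption made for illustration.)
std::vector<Rule> load_rules(std::istream& in) {
    std::vector<Rule> rules;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t name_end = line.find_first_of(" \t");
        if (line.empty() || line[0] == '#' ||
            name_end == std::string::npos || name_end == 0) {
            continue;
        }
        const std::size_t pat_start = line.find_first_not_of(" \t", name_end);
        if (pat_start == std::string::npos) continue;
        rules.push_back({line.substr(0, name_end),
                         std::regex(line.substr(pat_start))});
    }
    return rules;
}

// Scan the text left to right. At each position the first rule that matches
// a non-empty string starting exactly there wins; if no rule matches, one
// character is skipped (this is how whitespace gets dropped).
std::vector<Token> tokenize(const std::string& text,
                            const std::vector<Rule>& rules) {
    std::vector<Token> tokens;
    auto pos = text.cbegin();
    while (pos != text.cend()) {
        bool matched = false;
        for (const Rule& rule : rules) {
            std::smatch m;
            if (std::regex_search(pos, text.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous) &&
                m.length(0) > 0) {
                tokens.push_back({rule.name, m.str(0)});
                pos += m.length(0);
                matched = true;
                break;
            }
        }
        if (!matched) ++pos;
    }
    return tokens;
}

// Write one token per line as "TYPE<tab>TEXT".
void write_tokens(const std::vector<Token>& tokens, std::ostream& out) {
    for (const Token& t : tokens) {
        out << t.type << '\t' << t.text << '\n';
    }
}
```

Because the functions take generic streams, reading from and writing to files only requires passing a std::ifstream for the rules and a std::ofstream for the output. Rule order matters in this sketch: earlier lines in the rule file take priority when several patterns match at the same position.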

Week 1:

  • Created binaries for the GCC compiler and Visual Studio.
  • Wrote simple regular expressions for tokenization.
  • Working on developing better expressions and learning the Boost API.