Week 1

  • worked out where Suzy's stuff was, read honours thesis
  • rewrote scraping code; two modes, extracting from abstracts and web pages (r1045)
    • ran query extraction again (~/factoid/data/initial) for both modes
  • changed parser interface to pass sentences in (started changing it to have chart object, then decided I was wasting time) (r1041)
  • wrote a simple python parser wrapper (source:branches/jhu/factoid/src/api/nlp/simple.py)

Week 2

  • fiddled more with scraping
  • thought about improvements to graph/MST algorithm
  • began evaluation of current output based on changes made (i.e. is the produced training data any better?)
    • wrote HTML/JavaScript checking tool (diff old/new GRs using graphviz/svg)
    • log results to CGI script
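
A rough sketch of the old/new GR diff idea, for reference. This is not the actual checking tool: it assumes each GR is a (label, head, dependent) triple, and the function name and colour scheme are made up for illustration. It only builds a Graphviz DOT string and does not call graphviz itself.

```python
def gr_diff_to_dot(old_grs, new_grs, name="gr_diff"):
    """Render the difference between two GR sets as a Graphviz DOT string.

    Assumption: each GR is a (label, head, dependent) triple of plain
    strings without embedded double quotes.
    """
    old_grs, new_grs = set(old_grs), set(new_grs)
    lines = [f"digraph {name} {{"]
    for gr in sorted(old_grs | new_grs):
        label, head, dep = gr
        if gr in old_grs and gr in new_grs:
            colour = "black"  # unchanged between old and new
        elif gr in new_grs:
            colour = "green"  # only in the new output
        else:
            colour = "red"    # only in the old output
        lines.append(f'  "{head}" -> "{dep}" [label="{label}", color={colour}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example data, not taken from the real output.
old = {("ncsubj", "wrote", "Who"), ("dobj", "wrote", "Hamlet")}
new = {("ncsubj", "wrote", "Who"), ("dobj", "wrote", "play")}
dot = gr_diff_to_dot(old, new)
print(dot)
```

The resulting string can be written to a file and rendered with `dot -Tsvg` to get an SVG for the browser.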

Week 3

  • rewrote constraint extraction; now more correct (?) and handles generalisations (i.e. extracting common constraints across a category)
  • investigated tokenisation; wrote simple Quex tokeniser (buggy)
  • rewrote constraint application in preparation. Handles VAR:X generalisation.
    • bad sentences miraculously fixed by functionally equivalent (?) rewrite
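
One possible reading of "extracting common constraints across a category", sketched as a set intersection. The constraint representation (hashable tuples), the category labels and the function name are all assumptions for illustration; the real extraction code may work quite differently.

```python
def common_constraints(examples):
    """Return, per category, the constraints shared by every example.

    `examples` is an iterable of (category, constraints) pairs, where
    `constraints` is any iterable of hashable items (tuples here).
    """
    shared = {}
    for category, constraints in examples:
        constraints = frozenset(constraints)
        if category in shared:
            # Keep only constraints seen in every example so far.
            shared[category] = shared[category] & constraints
        else:
            shared[category] = constraints
    return shared


# Hypothetical categories and constraints.
examples = [
    ("who", [("ncsubj", "verb", "who"), ("det", "play", "the")]),
    ("who", [("ncsubj", "verb", "who"), ("dobj", "verb", "book")]),
    ("when", [("ncmod", "verb", "when")]),
]
shared = common_constraints(examples)
```

A strict intersection is brittle (a single odd sentence empties the set), so a frequency threshold per category may be a better fit in practice.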

Week 4

  • initial evaluation numbers (bad - not enough data?)
  • some code cleanup

TODO

  • run evals with 39604 sentences if possible, and with an even distribution across types?
    • (i.e. add 31777 to the existing 7827)
    • cf: orig, orig+all, orig+constrained, orig+constrainednochange, orig+onlychanged
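
A sketch of how the extra sentences could be spread evenly across types. The type names and per-type availability below are invented for illustration; only the 39604 and 7827 figures come from the notes above. It splits the total as evenly as possible and redistributes whatever a small type cannot supply.

```python
def even_quota(total, available):
    """Split `total` as evenly as possible across types.

    `available` maps type -> number of sentences on hand; no type is
    asked for more than it has, and the shortfall is redistributed.
    """
    quota = {t: 0 for t in available}
    remaining = total
    open_types = sorted(t for t in available if available[t] > 0)
    while remaining > 0 and open_types:
        share = max(remaining // len(open_types), 1)
        still_open = []
        for t in open_types:
            if remaining == 0:
                break
            take = min(share, available[t] - quota[t], remaining)
            quota[t] += take
            remaining -= take
            if quota[t] < available[t]:
                still_open.append(t)
        open_types = still_open
    return quota


to_add = 39604 - 7827  # sentences still needed on top of the current set
# Hypothetical availability per type.
available = {"who": 20000, "when": 5000, "where": 50000}
quota = even_quota(to_add, available)
```

If the types together hold fewer than `to_add` sentences, the quotas simply sum to whatever is available.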