Week 1
- worked out where Suzy's stuff was, read honours thesis
- rewrote scraping code; two modes, extracting from abstracts and web pages (r1045)
- ran query extraction again (~/factoid/data/initial) for both modes
- changed parser interface to pass sentences in (started changing it to have a chart object, then decided I was wasting time) (r1041)
- wrote a simple Python parser wrapper (source:branches/jhu/factoid/src/api/nlp/simple.py)
Week 2
- fiddled more with scraping
- thought about improvements to graph/MST algorithm
- began evaluation of current output based on changes made (i.e. is the training data produced any better?)
- wrote HTML/JavaScript checking tool (diff old/new GRs using Graphviz/SVG)
- logged results to a CGI script
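
Rough sketch of the old/new GR diff that the checking tool displays. Assumptions: each GR is a (label, head, dependent) tuple, and `diff_grs` and the example sentence are made up for illustration; this is not the tool's actual code, which is HTML/JavaScript.

```python
def diff_grs(old, new):
    """Compare two GR lists for one sentence.

    Returns three sets: GRs only in the old parse, GRs only in the
    new parse, and GRs present in both.
    """
    old_set, new_set = set(old), set(new)
    return old_set - new_set, new_set - old_set, old_set & new_set


# Hypothetical GRs for a single sentence, before and after a change.
old = [("ncsubj", "wrote", "Shakespeare"), ("dobj", "wrote", "play")]
new = [("ncsubj", "wrote", "Shakespeare"), ("dobj", "wrote", "Hamlet")]

removed, added, kept = diff_grs(old, new)
```

Using sets ignores GR order, so only real differences between the two parses show up.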
Week 3
- rewrote constraint extraction; now more correct (?) and handles generalisations (i.e. extracting common constraints across a category)
- investigated tokenisation; wrote simple Quex tokeniser (buggy)
- rewrote constraint application in preparation. Handles VAR:X generalisation.
- bad sentences miraculously fixed by a functionally equivalent (?) rewrite
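
One way to read "extracting common constraints across a category" is as a set intersection per category. A minimal sketch under that assumption; the (category, constraints) input shape, the constraint strings and `common_constraints` are all hypothetical, not taken from the real extraction code.

```python
from collections import defaultdict


def common_constraints(examples):
    """Map each category to the constraints shared by all its examples.

    `examples` is an iterable of (category, constraints) pairs, where
    `constraints` is any iterable of hashable constraint values.
    """
    by_category = defaultdict(list)
    for category, constraints in examples:
        by_category[category].append(set(constraints))
    # A constraint generalises only if every example in the category has it.
    return {cat: set.intersection(*sets) for cat, sets in by_category.items()}


# Made-up constraint strings, for illustration only.
examples = [
    ("who", ["A", "B"]),
    ("who", ["A", "C"]),
    ("when", ["D"]),
]
shared = common_constraints(examples)
```

A strict intersection drops anything missing from even one example; a frequency threshold would be the obvious softer alternative.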
Week 4
- initial evaluation numbers (bad - not enough data?)
- some code cleanup
TODO
- run evals with 39604 sentences if possible, and even distribution across types?
- (i.e. add 31777 to the existing 7827)
- compare: orig, orig+all, orig+constrained, orig+constrainednochange, orig+onlychanged
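
Sketch of the numbers above plus one possible way to get an even distribution across types. Assumptions: sentences are already grouped by type in a dict, and `sample_evenly` is a hypothetical helper, not existing code.

```python
import random

TARGET = 39604             # total sentences wanted for the evals
EXISTING = 7827            # sentences already in the original set
NEEDED = TARGET - EXISTING  # 31777 extra sentences to add


def sample_evenly(sentences_by_type, total, seed=0):
    """Pick about total/len(types) sentences per type.

    Types with too few sentences contribute everything they have,
    so the result can fall short of `total`.
    """
    rng = random.Random(seed)
    types = sorted(sentences_by_type)
    quota, extra = divmod(total, len(types))
    picked = {}
    for i, t in enumerate(types):
        # Spread the remainder over the first `extra` types.
        want = quota + (1 if i < extra else 0)
        pool = sentences_by_type[t]
        picked[t] = rng.sample(pool, min(want, len(pool)))
    return picked
```

If some types are short of their quota, the shortfall could be redistributed to the larger types, at the cost of a less even split.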