Factoid Bootstrapping

  • Acquire simple facts (e.g. from Wikipedia)
    • e.g. "Mozart was born on January 27, 1756"
    • short factual snippets can be reliably parsed
    • any other sentence containing all of these keywords is highly likely to have the same dependency relations
    • use them not for parsing new data (very limited domain!) but for generating new training data
  • Research questions:
    • how to identify the facts
    • soft/hard constraints?
    • how to train the parser model on partial data
    • whether it works
  • Other ideas
    • Dekang: do it in reverse (i.e. search for a simple sentence given a longer one)
    • Redundancy in Wikipedia revision logs
    • Don't modify how the parser operates: simply pass on the 'correct' sentences
  • Related?
    • DIRT paper
      • i.e. for better constraint extraction (is the constraint between two NEs likely to be unique?)
  • Yahoo API
    • key: (redacted)
    • shouldn't really be using it (or Bing): terms apparently restrict it to user-facing apps
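The core idea in the first bullet (parse a short fact reliably, then reuse its dependency relations on any longer sentence containing the same keywords) can be sketched as projecting the fact's keyword-to-keyword arcs onto the new sentence as a partial annotation. This is a minimal sketch, assuming tokenized input, heads given as 1-based indices with 0 for the root, and a hand-written UD-style parse of the Mozart fact (not the output of any particular parser). The names `keyword_arcs` and `project_arcs` are hypothetical, and skipping a sentence when a keyword is missing or repeated is an assumed policy.

```python
from typing import Dict, List, Optional, Set, Tuple


def keyword_arcs(words: List[str], heads: List[int],
                 keywords: Set[str]) -> List[Tuple[str, str]]:
    """Return (dependent, head) word pairs where both ends are keywords.

    heads[i] is the 1-based head index of words[i]; 0 marks the root.
    """
    arcs = []
    for word, head in zip(words, heads):
        if word in keywords and head > 0 and words[head - 1] in keywords:
            arcs.append((word, words[head - 1]))
    return arcs


def project_arcs(arcs: List[Tuple[str, str]],
                 sentence: List[str]) -> Optional[Dict[int, int]]:
    """Map keyword arcs onto a new sentence as partial head annotations.

    Returns {dependent_index: head_index} (1-based), or None if any keyword
    is missing or occurs more than once (assumed too ambiguous to project).
    """
    positions: Dict[str, int] = {}
    for dep, head in arcs:
        for word in (dep, head):
            if word not in positions:
                hits = [i + 1 for i, w in enumerate(sentence) if w == word]
                if len(hits) != 1:
                    return None
                positions[word] = hits[0]
    return {positions[dep]: positions[head] for dep, head in arcs}


# Hand-written UD-style parse of the short fact (illustrative only).
fact = ["Mozart", "was", "born", "on", "January", "27", ",", "1756"]
fact_heads = [3, 3, 0, 5, 3, 5, 8, 5]
keywords = {"Mozart", "born", "January", "27", "1756"}
arcs = keyword_arcs(fact, fact_heads, keywords)

longer = "Mozart , the composer , was born in Salzburg on January 27 , 1756 .".split()
print(project_arcs(arcs, longer))  # {1: 7, 11: 7, 12: 11, 14: 11}
```

The result fixes only the heads of the keyword tokens and leaves every other head open, which is the kind of partial data the research question asks how to train on. Projection also assumes the longer sentence keeps the same head words: under UD's convention for multiword names (the first token heads a `flat` name), "Wolfgang Amadeus Mozart" would be headed by "Wolfgang", so the Mozart→born arc would no longer hold.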
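The DIRT bullet raises the question of whether the relation between two named entities is unique before a projected constraint is trusted. A rough sketch in that spirit represents each relation as the dependency path between the two entities, with the endpoints replaced by slots X and Y, and measures how often the most common path occurs. This is a simplification, assuming hand-written parses in the same head format and paths made of words only (DIRT's paths also carry relation labels). `dep_path` and `path_uniqueness` are hypothetical names, and locating each entity by its first occurrence is an assumption.

```python
from collections import Counter
from typing import List, Tuple

Parse = Tuple[List[str], List[int]]  # (words, 1-based heads; 0 = root)


def ancestors(heads: List[int], i: int) -> List[int]:
    """Chain of 1-based indices from token i up to the root (inclusive)."""
    chain = [i]
    while heads[chain[-1] - 1] != 0:
        chain.append(heads[chain[-1] - 1])
    return chain


def dep_path(words: List[str], heads: List[int], x: str, y: str) -> Tuple[str, ...]:
    """Path from x up to the lowest common ancestor and down to y, slots as X/Y."""
    a, b = words.index(x) + 1, words.index(y) + 1
    up_a, up_b = ancestors(heads, a), ancestors(heads, b)
    lca = next(n for n in up_a if n in up_b)
    nodes = up_a[: up_a.index(lca) + 1] + list(reversed(up_b[: up_b.index(lca)]))
    return tuple("X" if n == a else "Y" if n == b else words[n - 1] for n in nodes)


def path_uniqueness(parses: List[Parse], x: str, y: str) -> Tuple[Tuple[str, ...], float]:
    """Most frequent X-Y path across sentences and the share of sentences using it."""
    counts = Counter(dep_path(words, heads, x, y) for words, heads in parses)
    path, n = counts.most_common(1)[0]
    return path, n / sum(counts.values())


# Hand-written parses of sentences mentioning both entities (illustrative only).
parses: List[Parse] = [
    (["Mozart", "was", "born", "in", "Salzburg"], [3, 3, 0, 5, 3]),
    (["Mozart", "left", "Salzburg", "in", "1781"], [2, 0, 2, 5, 2]),
    (["Mozart", "was", "born", "in", "Salzburg", "in", "1756"], [3, 3, 0, 5, 3, 7, 3]),
]
print(path_uniqueness(parses, "Mozart", "Salzburg"))
```

Here Mozart and Salzburg are linked by two different relations (born and left), so a fact about one of them should not be projected onto every sentence containing both names; a low share for the dominant path would flag the pair as an unreliable constraint.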