Factoid Bootstrapping
- Acquire simple facts (e.g. from Wikipedia)
- e.g. "Mozart was born on January 27, 1756"
- short factual snippets can be reliably parsed
- any other sentence containing all of these keywords is highly likely to have the same dependency relations
- use this not for parsing new data (very limited domain!), but for generating new training data
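A minimal sketch of the projection step in Python. It assumes (hypothetically) that the factoid's parse is available as (head word, dependent word, label) triples and that target sentences are already tokenized. A sentence is used only if every keyword occurs in it exactly once, which sidesteps ambiguous matches. The function name `project_arcs` and the label strings are illustrative, not taken from any existing tool.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of one factoid arc: (head word, dependent word, label).
Arc = Tuple[str, str, str]
# Partial annotation: dependent token index -> (head token index, label).
Partial = Dict[int, Tuple[int, str]]


def project_arcs(factoid_arcs: List[Arc], sentence: List[str]) -> Optional[Partial]:
    """Transfer the factoid's arcs onto a longer sentence containing the same keywords.

    Returns None if any keyword is missing from the sentence or occurs more than once,
    since the word-to-token mapping would then be undefined or ambiguous.
    """
    keywords = {word for head, dep, _ in factoid_arcs for word in (head, dep)}
    position: Dict[str, int] = {}
    for word in keywords:
        hits = [i for i, token in enumerate(sentence) if token == word]
        if len(hits) != 1:
            return None  # missing or ambiguous keyword: skip this sentence
        position[word] = hits[0]
    # Only the projected arcs are annotated; all other tokens stay unconstrained.
    return {position[dep]: (position[head], label) for head, dep, label in factoid_arcs}


if __name__ == "__main__":
    arcs = [("born", "Mozart", "nsubjpass"), ("born", "1756", "tmod")]
    sentence = "Wolfgang Amadeus Mozart , the composer , was born in Salzburg in 1756".split()
    print(project_arcs(arcs, sentence))  # {2: (8, 'nsubjpass'), 12: (8, 'tmod')}
```

The result is a partial annotation: the heads of "Mozart" and "1756" are fixed, while every other token (e.g. "Salzburg") is left for the training procedure to handle.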
- Research questions:
- how to identify the facts
- soft/hard constraints?
- how to train the parser model on partial data
- whether it works
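One way to make the soft/hard constraint question concrete, sketched under assumptions: the parser (hypothetically) exposes a k-best list of scored candidate parses, each given as head indices plus labels, and the projected arcs form a partial annotation `{dependent: (head, label)}`. A hard constraint discards any candidate that contradicts a projected arc; a soft one only subtracts a penalty per violation, so a sufficiently confident parse can still win. All names here are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Partial = Dict[int, Tuple[int, str]]  # dependent index -> (head index, label)
Parse = Tuple[List[int], List[str]]   # (heads, labels); head -1 marks the root


def violations(parse: Parse, partial: Partial) -> int:
    """Count projected arcs that the candidate parse contradicts."""
    heads, labels = parse
    return sum(
        1 for dep, (head, label) in partial.items()
        if heads[dep] != head or labels[dep] != label
    )


def constrained_score(score: float, parse: Parse, partial: Partial,
                      hard: bool, penalty: float = 1.0) -> float:
    """Hard: any violation rules the parse out. Soft: each violation costs `penalty`."""
    v = violations(parse, partial)
    if hard:
        return float("-inf") if v else score
    return score - penalty * v


def rerank(candidates: List[Tuple[float, Parse]], partial: Partial,
           hard: bool, penalty: float = 1.0) -> Optional[Parse]:
    """Return the best (non-empty) candidate list's winner, or None if all are ruled out."""
    best_score, best_parse = max(
        ((constrained_score(s, p, partial, hard, penalty), p) for s, p in candidates),
        key=lambda pair: pair[0],
    )
    return None if best_score == float("-inf") else best_parse


if __name__ == "__main__":
    partial = {0: (2, "nsubjpass")}  # tokens: "Mozart was born"; Mozart <- born
    agree = ([2, 2, -1], ["nsubjpass", "auxpass", "root"])
    clash = ([1, 2, -1], ["nsubj", "auxpass", "root"])
    candidates = [(5.0, clash), (4.5, agree)]
    print(rerank(candidates, partial, hard=True) == agree)                # True
    print(rerank(candidates, partial, hard=False, penalty=0.2) == clash)  # True
```

With a small penalty the soft variant can still prefer a higher-scoring candidate that contradicts the factoid, whereas the hard variant never does; under hard constraints `rerank` returns `None` when no candidate is consistent.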
- Other ideas
- Dekang: do it in reverse (i.e. given a longer sentence, search for a simple one)
- Redundancy in Wikipedia revision logs
- Don't modify the parser's operation: simply pass on 'correct' sentences (as training data)
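The "don't modify the parser" option could look roughly like this: run the unchanged parser over the matched sentences and keep, as new training data, only those whose output agrees with every projected arc. This is a sketch under an assumed representation (head indices with -1 for the root, plus labels); `filter_for_training` and `agrees` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

Partial = Dict[int, Tuple[int, str]]  # dependent index -> (head index, label)
Parse = Tuple[List[int], List[str]]   # (heads, labels); head -1 marks the root


def agrees(parse: Parse, partial: Partial) -> bool:
    """True if the parser's output reproduces every projected arc."""
    heads, labels = parse
    return all(heads[dep] == head and labels[dep] == label
               for dep, (head, label) in partial.items())


def filter_for_training(
    items: Iterable[Tuple[List[str], Parse, Partial]]
) -> List[Tuple[List[str], Parse]]:
    """Keep (tokens, parse) pairs whose unmodified parse matches the projected arcs.

    Sentences with no projected arcs are dropped, since they would pass vacuously.
    """
    return [(tokens, parse) for tokens, parse, partial in items
            if partial and agrees(parse, partial)]


if __name__ == "__main__":
    tokens = ["Mozart", "was", "born"]
    partial = {0: (2, "nsubjpass")}
    good = ([2, 2, -1], ["nsubjpass", "auxpass", "root"])
    bad = ([1, 2, -1], ["nsubj", "auxpass", "root"])
    kept = filter_for_training([(tokens, good, partial), (tokens, bad, partial)])
    print(len(kept))  # 1
```

The appeal of this design is that the parser and its training code stay untouched; the cost is that sentences where the parser already fails on the factoid arcs, arguably the most informative ones, are discarded rather than corrected.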
- Related?
- DIRT paper
- i.e. for better constraint extraction: is the constraint between two named entities (NEs) likely to be unique?
- Yahoo API
- key: Tjk5xiDV34GRhKNRnpwxLzATOSZXKEWuiSIhia8QYMAPsSViz0SgYR2VLN4-
- shouldn't really be using it (or Bing): usage seems to be restricted to user-facing apps or similar
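For the DIRT-related question of whether the constraint between two NEs is likely to be unique, one simple heuristic (an assumption here, not taken from the DIRT paper) is to count how often each NE pair is linked by each dependency path across a corpus and trust a pair only when one path clearly dominates. The observation format, path strings, and thresholds below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# One observation: two named entities and the dependency path linking them in a sentence.
Observation = Tuple[str, str, str]


def unique_pairs(observations: Iterable[Observation], threshold: float = 0.8,
                 min_count: int = 2) -> Set[Tuple[str, str]]:
    """Return NE pairs seen at least `min_count` times whose most frequent
    path accounts for at least `threshold` of their occurrences."""
    paths: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for ne1, ne2, path in observations:
        paths[(ne1, ne2)][path] += 1
    result = set()
    for pair, counts in paths.items():
        total = sum(counts.values())
        top = counts.most_common(1)[0][1]
        if total >= min_count and top / total >= threshold:
            result.add(pair)
    return result


if __name__ == "__main__":
    obs = [("Mozart", "Salzburg", "nsubjpass<born>prep_in")] * 3 + [
        ("Mozart", "Vienna", "nsubj<lived>prep_in"),
        ("Mozart", "Vienna", "nsubj<died>prep_in"),
    ]
    print(unique_pairs(obs))  # {('Mozart', 'Salzburg')}
```

Pairs like (Mozart, Vienna), which are linked by several different relations, would then be excluded as sources of hard constraints.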