Changes between Version 8 and Version 9 of CoordinationTask

Author:
aurelie (IP: 192.12.13.2)
Timestamp:
07/14/09 04:04:06 (4 months ago)
Comment:

--

  • CoordinationTask

    v8  v9
     6   6  A bibliography about coordination disambiguation can be found [CoordinationBibliography here].
     7   7
     8      == Corpus ==
     9      The corpus can be accessed from [http://www.cl.cam.ac.uk/~ah433/parsing_the_web/ this page]. Limited to sentences of 30 words or fewer containing 'and' or 'or'. Obtained from 5000 pages of Wikipedia data. The CCG parse is collapsed onto one line, with the original sentence afterwards. There is also a version showing coordinations only, where I have started marking the sentences with an incorrect and/or parse.
         8  == Examples of coordination ==
         9  An early corpus can be accessed from [http://www.cl.cam.ac.uk/~ah433/parsing_the_web/ this page]. Limited to sentences of 30 words or fewer containing 'and' or 'or'. Obtained from 5000 pages of Wikipedia data. The CCG parse is collapsed onto one line, with the original sentence afterwards. There is also a version showing coordinations only, where I have started marking the sentences with an incorrect and/or parse.
    10  10
    11  11  == Work so far ==
    12      A baseline implementation is on its way. The following modules are part of it. The code and examples of output can be found [http://www.cl.cam.ac.uk/~ah433/parsing_the_web/ here].
    13
    14       * integrate: The program takes the output of the part-of-speech module (possym) and the similarity module (scoreconjsim) and outputs a re-ranked list of coordinations for the sentence under consideration. The full list of features considered in this last stage is:
        12  A feature-based implementation has been produced. The features considered are:
    15  13   * similarity of coordinates according to a WordNet edge-counting measure
    16       * equality of coordinates' part of speech
    17       * distance of the coordinates to the coordination
        14   * similarity of n-grams to which the coordinates belong (based on parts of speech)
        15   * distance of the second coordinate to the coordination
        16   * distance between the two coordinates
    18  17   * original rank
    19  18
    20       * scoreconjsim: The program attributes a score to the conjunctions found by the C&C parser using the output of compwnsims.
    21
    22       * compwnsims: The program calculates similarities between each pair of nouns/verbs in the sentence using the output of wntags. Very basic implementation using a cosine function.
    23
    24       * wntags: The program tags each noun and verb in a sentence with its WordNet hypernyms. All senses are collapsed. This implements the tagging done by Agarwal and Boggess (1992) for medical texts in a basic, but more general way.
    25
    26       * possym: The program looks for symmetries in parts of speech on either side of a coordination. Both bigrams and trigrams are considered. This implements in a very basic way the search for syntactic symmetries in Agarwal and Boggess (1992).
        19  This implementation gives improvements in F-score of around three points, both on Wikipedia data and on Depbank data.
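
For readers unfamiliar with the features listed in this diff, here is a minimal sketch of how they might be computed for one candidate coordination. It is not the code linked from the page. All function names, the toy sentence and the hypernym table are hypothetical. The similarity shown is the cosine over collapsed hypernym tags described for wntags/compwnsims in v8, not the WordNet edge-counting measure named in v9, and the way the part-of-speech n-grams are aligned (the n tags ending at each coordinate) is an assumption.

```python
from collections import Counter
from math import sqrt


def cosine(bag_a, bag_b):
    """Cosine similarity between two bags (Counters) of hypernym labels."""
    dot = sum(count * bag_b[label] for label, count in bag_a.items())
    norm_a = sqrt(sum(c * c for c in bag_a.values()))
    norm_b = sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pos_ngram_similarity(pos_tags, i, j, n=2):
    """Fraction of matching tags in the POS n-grams ending at positions i and j.

    Assumption: the n-gram a coordinate "belongs to" is taken to be the n tags
    ending at that coordinate, compared right to left.
    """
    gram_i = pos_tags[max(0, i - n + 1): i + 1]
    gram_j = pos_tags[max(0, j - n + 1): j + 1]
    matches = sum(a == b for a, b in zip(reversed(gram_i), reversed(gram_j)))
    return matches / n


def coordination_features(tokens, pos_tags, hypernyms, left, conj, right, rank):
    """Feature dictionary for one candidate (left, conj, right) coordination.

    `hypernyms` is a hypothetical lookup from word to its collapsed hypernym
    labels; `rank` is the candidate's original rank in the parser output.
    """
    bag_left = Counter(hypernyms.get(tokens[left], []))
    bag_right = Counter(hypernyms.get(tokens[right], []))
    return {
        "hypernym_cosine": cosine(bag_left, bag_right),
        "pos_bigram_similarity": pos_ngram_similarity(pos_tags, left, right, n=2),
        "right_to_conj_distance": right - conj,
        "coordinate_distance": right - left,
        "original_rank": rank,
    }


# Toy example (invented data, for illustration only).
tokens = ["cats", "and", "small", "dogs", "sleep"]
pos_tags = ["NNS", "CC", "JJ", "NNS", "VBP"]
hypernyms = {
    "cats": ["feline", "carnivore", "animal"],
    "dogs": ["canine", "carnivore", "animal"],
}
features = coordination_features(tokens, pos_tags, hypernyms, left=0, conj=1, right=3, rank=1)
print(features)
```

A re-ranker would then combine these values into a single score per candidate. How the features are weighted is not stated on this page, so the sketch stops at feature extraction.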