Two main directions
- Sequence labeling
Yue is working on this. The first method tried is BE tagging, in which binary tags are assigned to each word in the input sentence to indicate whether it can begin or end a multi-word span. Begin tags and end tags are assigned separately.
- tag = 0 means that the corresponding word cannot begin or end a multi-word span
- tag = 1 means that the corresponding word can begin or end a multi-word span
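The tag definitions above can be sketched in code. This is a minimal illustration, not the parser's actual implementation: the function names `be_tags` and `prunable_cells` are hypothetical, spans are assumed to be given as inclusive (first word, last word) index pairs, and it is assumed that a cell is ruled out as soon as either its begin tag or its end tag is 0.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (first word index, last word index), inclusive


def be_tags(n_words: int, spans: Set[Span]) -> Tuple[List[int], List[int]]:
    """Derive binary begin/end tags from the constituent spans of a parse."""
    begin = [0] * n_words
    end = [0] * n_words
    for first, last in spans:
        if last > first:  # only multi-word spans set a tag
            begin[first] = 1
            end[last] = 1
    return begin, end


def prunable_cells(begin: List[int], end: List[int]) -> Set[Span]:
    """Multi-word chart cells ruled out by the tags (assumed rule:
    prune if the first word cannot begin or the last word cannot end a span)."""
    n = len(begin)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if begin[i] == 0 or end[j] == 0
    }
```

For a four-word sentence with multi-word spans (0, 3), (0, 1) and (2, 3), the begin tags are [1, 0, 1, 0], the end tags are [0, 1, 0, 1], and the cells (0, 2), (1, 2) and (1, 3) can be skipped.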
Some results.
| Model | Speed | Accuracy | Coverage |
| baseline | 24.49 | 84.89 | 98.69 |
| be 01 | 26.79 | 84.83 | 98.90 |
| en 01 | 30.83 | 84.68 | 98.90 |
| both 01 | 32.94 | 84.76 | 98.85 |
| gold oracle | 32.35 | 85.36 | 98.90 |
| self oracle | 46.81 | 84.89 | 98.69 |
Force words
| Model | Speed | Accuracy | Coverage |
| baseline | 24.48 | 83.65 | 100.00 |
| be 01 | 26.35 | 83.78 | 100.00 |
| en 01 | 31.27 | 83.62 | 100.00 |
| both 01 | 32.65 | 83.64 | 100.00 |
| gold oracle | 32.25 | 84.31 | 100.00 |
Sentences shorter than 40 words
| Model | Speed | Accuracy | Coverage |
| baseline | 39.62 | 85.09 | 99.44 |
| baseline (fw) | 39.28 | 84.83 | 100.00 |
| be 01 | 44.23 | 85.04 | 99.50 |
| be 01 (fw) | 44.00 | 84.81 | 100.00 |
| en 01 | 49.96 | 84.92 | 99.50 |
| en 01 (fw) | 49.87 | 84.69 | 100.00 |
| both 01 | 52.78 | 84.94 | 99.50 |
| both 01 (fw) | 51.22 | 84.71 | 100.00 |
| gold oracle | 51.18 | 85.55 | 99.50 |
| gold oracle (fw) | 51.21 | 85.32 | 100.00 |
| self oracle | 58.51 | 85.09 | 99.44 |
Sentences shorter than 100 words
| Model | Speed | Accuracy | Coverage |
| baseline | 26.25 | 84.89 | 98.85 |
| baseline (fw) | 26.25 | 83.98 | 100.00 |
| be 01 | 28.90 | 84.83 | 99.06 |
| be 01 (fw) | 28.87 | 84.12 | 100.00 |
| en 01 | 33.38 | 84.68 | 99.06 |
| en 01 (fw) | 33.30 | 83.95 | 100.00 |
| both 01 | 35.33 | 84.76 | 99.01 |
| both 01 (fw) | 35.23 | 83.97 | 100.00 |
| gold oracle | 34.94 | 85.36 | 99.06 |
| gold oracle (fw) | 34.95 | 84.64 | 100.00 |
| self oracle | 48.39 | 84.89 | 98.85 |
All sentences, beam=1e-5
| Model | Speed | Accuracy | Coverage |
| baseline | 25.56 | 84.05 | 99.27 |
| baseline (fw) | 25.46 | 83.36 | 100.00 |
| be 01 | 28.09 | 83.99 | 99.27 |
| be 01 (fw) | 27.94 | 83.29 | 100.00 |
| en 01 | 29.67 | 83.94 | 99.32 |
| en 01 (fw) | 30.10 | 83.32 | 100.00 |
| both 01 | 31.73 | 83.92 | 99.32 |
| both 01 (fw) | 31.55 | 83.30 | 100.00 |
| gold oracle | 31.48 | 84.50 | 99.32 |
| gold oracle (fw) | 31.47 | 83.87 | 100.00 |
| self oracle | 56.00 | 84.38 | 98.64 |
All sentences, beam=1e-4
| Model | Speed | Accuracy | Coverage |
| baseline | 28.32 | 84.07 | 99.16 |
| baseline (fw) | 28.64 | 83.34 | 100.00 |
| be 01 | 31.30 | 83.95 | 99.22 |
| be 01 (fw) | 31.38 | 83.29 | 100.00 |
| en 01 | 33.20 | 83.92 | 99.22 |
| en 01 (fw) | 33.08 | 83.27 | 100.00 |
| both 01 | 33.09 | 83.81 | 99.32 |
| both 01 (fw) | 32.92 | 83.29 | 100.00 |
| gold oracle | 34.42 | 84.52 | 99.22 |
| gold oracle (fw) | 34.27 | 83.85 | 100.00 |
| self oracle | 58.27 | 84.41 | 98.54 |
All sentences, beam=1e-1
| Model | Speed | Accuracy | Coverage |
| baseline | 38.47 | 81.36 | 93.83 |
| baseline (fw) | 38.53 | 78.05 | 100.00 |
| be 01 | 39.77 | 81.28 | 93.83 |
| be 01 (fw) | 39.53 | 77.97 | 100.00 |
| en 01 | 39.56 | 81.30 | 93.83 |
| en 01 (fw) | 39.69 | 77.97 | 100.00 |
| both 01 | 40.17 | 81.25 | 93.83 |
| both 01 (fw) | 40.37 | 77.92 | 100.00 |
| gold oracle | 40.28 | 81.45 | 93.73 |
| gold oracle (fw) | 40.06 | 78.06 | 100.00 |
| self oracle | 59.75 | 81.80 | 93.05 |
Tagging
begin tag precision: 0.961494430012 end tag precision: 0.961384351195
begin statistics
| tag | count in output | reference |
| 0 | 14692 | 14875 |
| 1 | 30730 | 30547 |
end statistics
| tag | count in output | reference |
| 0 | 30732 | 30548 |
| 1 | 14690 | 14874 |
The second method is multiple tagging. Instead of using a binary tag to indicate whether a word can start or end a span, a class is assigned to the word to indicate which spans it can start or end. The class of a word indicates the maximum length of span that it can start or end. In particular:
- class = 0 means that there is no limit on the span length
- class = 1 means that the word can only start or end a length 1 span
- class = 2 means that the word can start or end a span of up to length 2
- class = 3 means that the word can start or end a span of up to length 3
- class = 4 means that the word can start or end a span of up to length 4
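As a rough sketch of how such classes could be read off a parse: the function name `level_tags` and the inclusive span representation are assumptions for illustration, and class 0 is taken to mean that the widest span exceeds the largest explicit class.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (first word index, last word index), inclusive


def level_tags(n_words: int, spans: Set[Span],
               max_level: int = 4) -> Tuple[List[int], List[int]]:
    """Class of each word as a span start and as a span end.

    The class is the length of the widest span the word starts (or ends);
    widths above max_level are mapped to class 0 ("no limit").
    """
    widest_begin = [1] * n_words  # every word trivially starts a length-1 span
    widest_end = [1] * n_words
    for first, last in spans:
        length = last - first + 1
        widest_begin[first] = max(widest_begin[first], length)
        widest_end[last] = max(widest_end[last], length)

    def to_class(width: int) -> int:
        return width if width <= max_level else 0

    return ([to_class(w) for w in widest_begin],
            [to_class(w) for w in widest_end])
```

For a six-word sentence with spans (0, 5), (1, 2), (3, 5) and (4, 5), this gives begin classes [0, 2, 1, 3, 2, 1] and end classes [1, 1, 2, 1, 1, 0].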
Some results:
| Model | Speed | Accuracy | Coverage |
| baseline | 23.47 | 84.89 | 98.69 |
| baseline (fw) | 23.83 | 83.65 | 100.00 |
| be layer | 31.10 | 84.31 | 98.85 |
| be layer (fw) | 31.03 | 83.21 | 100.00 |
| en layer | 34.41 | 84.21 | 98.90 |
| en layer (fw) | 33.82 | 83.15 | 100.00 |
| both layer | 31.93 | 84.11 | 98.75 |
| both layer (fw) | 32.73 | 82.90 | 100.00 |
| gold oracle | 37.98 | 85.95 | 98.90 |
| gold oracle (fw) | 38.03 | 84.87 | 100.00 |
| self oracle | 59.98 | 84.89 | 98.69 |
Sentences shorter than 40 words
| Model | Speed | Accuracy | Coverage |
| baseline | 38.38 | 85.09 | 99.44 |
| baseline (fw) | 38.51 | 84.83 | 100.00 |
| be layer | 47.17 | 84.49 | 99.50 |
| be layer (fw) | 48.58 | 84.26 | 100.00 |
| en layer | 54.73 | 84.47 | 99.50 |
| en layer (fw) | 55.07 | 84.24 | 100.00 |
| both layer | 54.68 | 84.28 | 99.50 |
| both layer (fw) | 54.71 | 84.06 | 100.00 |
| gold oracle | 60.23 | 86.14 | 99.50 |
| gold oracle (fw) | 60.50 | 85.90 | 100.00 |
| self oracle | 65.71 | 85.09 | 99.44 |
Tagging
begin tag precision: 0.82750649465 end tag precision: 0.914446743869
begin statistics
| tag | count in output | reference |
| 0 | 19122 | 17041 |
| 1 | 14888 | 14875 |
| 2 | 6828 | 6737 |
| 3 | 3130 | 4122 |
| 4 | 1454 | 2647 |
end statistics
| tag | count in output | reference |
| 0 | 7180 | 7169 |
| 1 | 30990 | 30548 |
| 2 | 4200 | 4107 |
| 3 | 2327 | 2336 |
| 4 | 725 | 1262 |
Another tagset
Only three tags besides the zero tag in this set
| Model | Speed | Accuracy | Coverage |
| baseline | 24.97 | 84.89 | 98.69 |
| baseline (fw) | 24.23 | 83.65 | 100.00 |
| be layer | 29.85 | 84.49 | 98.90 |
| be layer (fw) | 29.88 | 83.45 | 100.00 |
| en layer | 35.22 | 84.39 | 98.90 |
| en layer (fw) | 35.26 | 83.33 | 100.00 |
| both layer | 33.10 | 84.35 | 98.80 |
| both layer (fw) | 33.50 | 83.19 | 100.00 |
| gold oracle | 36.25 | 85.83 | 98.90 |
| gold oracle (fw) | 35.94 | 84.76 | 100.00 |
| self oracle | 57.93 | 84.89 | 98.69 |
Another tagset
Only two tags besides the zero tag in this set
| Model | Speed | Accuracy | Coverage |
| baseline | 24.80 | 84.89 | 98.69 |
| baseline (fw) | 24.76 | 83.65 | 100.00 |
| be layer | 28.45 | 84.74 | 98.90 |
| be layer (fw) | 29.54 | 83.70 | 100.00 |
| en layer | 34.92 | 84.62 | 98.95 |
| en layer (fw) | 34.43 | 83.63 | 100.00 |
| both layer | 35.68 | 84.56 | 98.85 |
| both layer (fw) | 36.43 | 83.48 | 100.00 |
| gold oracle | 34.85 | 85.72 | 98.85 |
| gold oracle (fw) | 34.34 | 84.59 | 100.00 |
| self oracle | 53.34 | 84.89 | 98.69 |
** Pruning every iteration
The case with binary values for the begin and end constraint tags
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline | 25.15 | 84.89 | 98.69 | 0 | 6530585 |
| baseline (fw) | 25.00 | 83.65 | 100.00 | 0 | 6530585 |
| be 01 | 27.76 | 84.71 | 98.95 | 1598979 | 6722969 |
| be 01 (fw) | 27.55 | 83.72 | 100.00 | 1598979 | 6722969 |
| en 01 | 30.42 | 84.56 | 98.69 | 3732056 | 6718049 |
| en 01 (fw) | 30.48 | 83.43 | 100.00 | 3732056 | 6718049 |
| both 01 | 34.09 | 84.60 | 98.64 | 4424065 | 6676385 |
| both 01 (fw) | 34.55 | 83.42 | 100.00 | 4424065 | 6676385 |
| join 01 | 34.20 | 84.60 | 98.64 | 4423753 | 6676385 |
| join 01 (fw) | 34.35 | 83.42 | 100.00 | 4423753 | 6676385 |
| gold oracle | 34.03 | 85.67 | 98.48 | 4387771 | 6644412 |
| gold oracle (fw) | 33.99 | 84.41 | 100.00 | 4387771 | 6644412 |
| self oracle | 55.12 | 84.89 | 98.69 | 4289579 | 6530585 |
Further experiments:
Prune all levels:
Binary:
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline | 25.10 | 84.89 | 98.69 | 0 | 6530585 |
| be 01 | 27.49 | 84.71 | 98.95 | 1598979 | 6722969 |
| be 01 backoff(90) | 26.93 | 84.79 | 98.90 | 1259108 | 6635029 |
| be 01 backoff(95) | 26.67 | 84.85 | 98.85 | 1110058 | 6625149 |
| en 01 | 30.33 | 84.56 | 98.69 | 3732056 | 6718049 |
| en 01 backoff(90) | 29.16 | 84.91 | 98.85 | 3443327 | 6727488 |
| en 01 backoff(95) | 28.59 | 84.89 | 98.90 | 3298394 | 6739829 |
| both 01 | 33.90 | 84.60 | 98.64 | 4424065 | 6676385 |
| both 01 backoff(90) | 31.25 | 84.93 | 98.90 | 4082676 | 6737368 |
| both 01 backoff(95) | 30.33 | 84.91 | 98.95 | 3889253 | 6749709 |
| join 01 | 33.78 | 84.60 | 98.64 | 4423753 | 6676385 |
| gold oracle | 33.60 | 85.67 | 98.48 | 4387771 | 6644412 |
| self oracle | 55.25 | 84.89 | 98.69 | 4289579 | 6530585 |
fw
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline (fw) | 24.96 | 83.65 | 100.00 | 0 | 6530585 |
| be 01 (fw) | 27.38 | 83.72 | 100.00 | 1598979 | 6722969 |
| be 01 backoff(90) (fw) | 26.91 | 83.75 | 100.00 | 1259108 | 6635029 |
| be 01 backoff(95) (fw) | 26.82 | 83.77 | 100.00 | 1110058 | 6625149 |
| en 01 (fw) | 30.31 | 83.43 | 100.00 | 3732056 | 6718049 |
| en 01 backoff(90) (fw) | 29.16 | 83.89 | 100.00 | 3443327 | 6727488 |
| en 01 backoff(95) (fw) | 28.62 | 83.91 | 100.00 | 3298394 | 6739829 |
| both 01 (fw) | 33.85 | 83.42 | 100.00 | 4424065 | 6676385 |
| both 01 backoff(90) (fw) | 31.18 | 83.94 | 100.00 | 4082676 | 6737368 |
| both 01 backoff(95) (fw) | 30.25 | 83.96 | 100.00 | 3889253 | 6749709 |
| join 01 (fw) | 33.91 | 83.42 | 100.00 | 4423753 | 6676385 |
| gold oracle (fw) | 33.03 | 84.41 | 100.00 | 4387771 | 6644412 |
Level tag 2:
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline | 24.98 | 84.89 | 98.69 | 0 | 6530585 |
| be layer | 31.27 | 84.65 | 99.01 | 2320090 | 6747773 |
| be layer backoff(90) | 28.01 | 84.75 | 99.01 | 1714690 | 6746617 |
| be layer backoff(95) | 27.41 | 84.81 | 98.95 | 1433814 | 6667538 |
| en layer | 36.04 | 84.35 | 98.27 | 4103717 | 6676743 |
| en layer backoff(90) | 29.99 | 84.87 | 98.75 | 3761852 | 6713679 |
| en layer backoff(95) | 30.23 | 84.93 | 98.75 | 3515843 | 6699030 |
| both layer | 40.17 | 84.13 | 98.38 | 5036902 | 6710187 |
| both layer backoff(90) | 32.79 | 84.92 | 98.75 | 4500090 | 6682675 |
| both layer backoff(95) | 32.19 | 84.95 | 98.80 | 4216252 | 6708910 |
| join layer | 40.66 | 84.14 | 98.27 | 5020057 | 6705227 |
| gold oracle | 47.31 | 86.49 | 98.43 | 5476803 | 6736195 |
| self oracle | 76.07 | 84.89 | 98.69 | 5264641 | 6530585 |
fw
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline (fw) | 24.94 | 83.65 | 100.00 | 0 | 6530585 |
| be layer (fw) | 31.18 | 83.71 | 100.00 | 2320090 | 6747773 |
| be layer backoff(90) (fw) | 28.01 | 83.82 | 100.00 | 1714690 | 6746617 |
| be layer backoff(95) (fw) | 27.34 | 83.80 | 100.00 | 1433814 | 6667538 |
| en layer (fw) | 35.96 | 82.97 | 100.00 | 4103717 | 6676743 |
| en layer backoff(90) (fw) | 29.93 | 83.75 | 100.00 | 3761852 | 6713679 |
| en layer backoff(95) (fw) | 30.35 | 83.82 | 100.00 | 3515843 | 6699030 |
| both layer (fw) | 40.13 | 82.88 | 100.00 | 5036902 | 6710187 |
| both layer backoff(90) (fw) | 32.63 | 83.79 | 100.00 | 4500090 | 6682675 |
| both layer backoff(95) (fw) | 32.45 | 83.87 | 100.00 | 4216252 | 6708910 |
| join layer (fw) | 40.87 | 82.84 | 100.00 | 5020057 | 6705227 |
| gold oracle (fw) | 47.44 | 85.29 | 100.00 | 5476803 | 6736195 |
Level tag 3:
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline | 25.11 | 84.89 | 98.69 | 0 | 6530585 |
| be layer | 31.96 | 84.34 | 99.01 | 2691134 | 6747773 |
| be layer backoff(90) | 28.22 | 84.74 | 99.01 | 1769193 | 6746617 |
| be layer backoff(95) | 27.22 | 84.80 | 98.95 | 1466113 | 6736737 |
| en layer | 37.76 | 83.65 | 97.75 | 4336222 | 6671536 |
| en layer backoff(90) | 32.60 | 84.95 | 98.64 | 3897912 | 6720901 |
| en layer backoff(95) | 30.45 | 84.93 | 98.75 | 3610419 | 6682675 |
| both layer | 39.26 | 83.16 | 97.33 | 5175622 | 6527949 |
| both layer backoff(90) | 34.32 | 84.96 | 98.64 | 4637517 | 6704041 |
| both layer backoff(95) | 32.50 | 84.93 | 98.75 | 4287735 | 6682675 |
| join layer | 44.63 | 82.97 | 97.33 | 5260951 | 6634250 |
| gold oracle | 43.39 | 86.30 | 98.43 | 5247246 | 6636011 |
| self oracle | 72.70 | 84.89 | 98.69 | 5129679 | 6530585 |
fw
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline (fw) | 25.27 | 83.65 | 100.00 | 0 | 6530585 |
| be layer (fw) | 32.04 | 83.41 | 100.00 | 2691134 | 6747773 |
| be layer backoff(90) (fw) | 28.22 | 83.81 | 100.00 | 1769193 | 6746617 |
| be layer backoff(95) (fw) | 27.19 | 83.83 | 100.00 | 1466113 | 6736737 |
| en layer (fw) | 37.55 | 82.09 | 100.00 | 4336222 | 6671536 |
| en layer backoff(90) (fw) | 32.63 | 83.82 | 100.00 | 3897912 | 6720901 |
| en layer backoff(95) (fw) | 30.51 | 83.79 | 100.00 | 3610419 | 6682675 |
| both layer (fw) | 39.64 | 81.28 | 100.00 | 5175622 | 6527949 |
| both layer backoff(90) (fw) | 34.34 | 83.83 | 100.00 | 4637517 | 6704041 |
| both layer backoff(95) (fw) | 32.60 | 83.80 | 100.00 | 4287735 | 6682675 |
| join layer (fw) | 44.77 | 81.23 | 100.00 | 5260951 | 6634250 |
| gold oracle (fw) | 43.39 | 85.06 | 100.00 | 5247246 | 6636011 |
Level tag 4:
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline | 25.09 | 84.89 | 98.69 | 0 | 6530585 |
| be layer | 32.37 | 84.13 | 99.01 | 2822348 | 6717193 |
| be layer backoff(80) | 29.48 | 84.73 | 98.95 | 2165852 | 6740633 |
| be layer backoff(90) | 28.28 | 84.76 | 99.01 | 1795396 | 6746617 |
| be layer backoff(95) | 27.37 | 84.80 | 98.90 | 1436566 | 6635029 |
| en layer | 38.67 | 83.05 | 97.39 | 4536320 | 6783474 |
| en layer backoff(80) | 36.51 | 84.71 | 98.48 | 4219428 | 6731999 |
| en layer backoff(90) | 33.30 | 84.96 | 98.69 | 3954752 | 6722925 |
| en layer backoff(95) | 30.59 | 84.97 | 98.75 | 3626653 | 6682675 |
| both layer | 42.18 | 82.40 | 96.08 | 5282303 | 6513771 |
| both layer backoff(80) | 39.71 | 84.73 | 98.48 | 5048676 | 6734234 |
| both layer backoff(90) | 34.91 | 84.95 | 98.75 | 4713175 | 6732300 |
| both layer backoff(95) | 33.12 | 84.97 | 98.75 | 4298681 | 6682675 |
| join layer | 46.44 | 82.21 | 95.97 | 5510032 | 6763578 |
| join layer backoff(80) | 38.63 | 84.68 | 98.64 | 4978461 | 6754580 |
| join layer backoff(90) | 31.78 | 84.82 | 98.85 | 4321360 | 6743359 |
| join layer backoff(95) | 29.88 | 84.79 | 98.90 | 3162651 | 6665608 |
| gold oracle | 47.45 | 86.49 | 98.43 | 5476803 | 6736195 |
| self oracle | 77.35 | 84.89 | 98.69 | 5264641 | 6530585 |
fw
| Model | Speed | Accuracy | Coverage | Pruned cells | total cells |
| baseline (fw) | 25.13 | 83.65 | 100.00 | 0 | 6530585 |
| be layer (fw) | 32.29 | 83.18 | 100.00 | 2822348 | 6717193 |
| be layer backoff(80) (fw) | 29.39 | 83.77 | 100.00 | 2165852 | 6740633 |
| be layer backoff(90) (fw) | 28.32 | 83.83 | 100.00 | 1795396 | 6746617 |
| be layer backoff(95) (fw) | 27.19 | 83.75 | 100.00 | 1436566 | 6635029 |
| en layer (fw) | 38.67 | 81.40 | 100.00 | 4536320 | 6783474 |
| en layer backoff(80) (fw) | 36.45 | 83.53 | 100.00 | 4219428 | 6731999 |
| en layer backoff(90) (fw) | 33.08 | 83.86 | 100.00 | 3954752 | 6722925 |
| en layer backoff(95) (fw) | 30.43 | 83.84 | 100.00 | 3626653 | 6682675 |
| both layer (fw) | 41.72 | 79.93 | 100.00 | 5282303 | 6513771 |
| both layer backoff(80) (fw) | 39.85 | 83.57 | 100.00 | 5048676 | 6734234 |
| both layer backoff(90) (fw) | 34.60 | 83.91 | 100.00 | 4713175 | 6732300 |
| both layer backoff(95) (fw) | 32.89 | 83.84 | 100.00 | 4298681 | 6682675 |
| join layer (fw) | 46.39 | 79.85 | 100.00 | 5510032 | 6763578 |
| join layer backoff(80) (fw) | 38.48 | 83.62 | 100.00 | 4978461 | 6754580 |
| join layer backoff(90) (fw) | 31.92 | 83.81 | 100.00 | 4321360 | 6743359 |
| join layer backoff(95) (fw) | 30.24 | 83.75 | 100.00 | 3162651 | 6665608 |
| gold oracle (fw) | 46.89 | 85.29 | 100.00 | 5476803 | 6736195 |
Discussion:
With level tags, the tagging accuracy goes down, but the oracle improves.
With level tags, the difference between begin and end tags is larger. End tags are lower, probably due to right-branching.
With level tags, the joint tagger is more effective at pruning.
Generalization from the binary case to multiple tags:
The BE tags can be generalized from the binary case to multiple values. Instead of indicating whether a word can begin or end a multi-word constituent, a tag can indicate the widest constituent that the word starts or ends. Intuitively, given a sentence with N words, the number of possible tags is N. However, when N becomes large, the tagging problem could become overly complex. Therefore, we fix the tag set so that there are only T+1 tags. In this set, tags 1 to T give the width of the widest constituent that the corresponding word starts or ends in the sentence. When the widest constituent has more than T words, the corresponding tag is set to 0.
Similar to the binary case, tagging can be performed using a maximum entropy model, and the forward-backward algorithm can be used to find the probability of each tag. After tagging, the probabilities of all tags are available for each word and are then used to prune the chart.
For begin tags, let (i,j) be the chart cell starting at word i and spanning j words. Denote by Q(i,j) the probability that cell (i,j) contains the widest constituent that word i starts, and by P(i,j) the probability that cell (i,j) does not contain any constituent according to the begin tags. Q values are computed by the begin tagger. P values can be computed by P(i,j) = sum_{k=1}^{j-1} Q(i,k). When P(i,j) is larger than a threshold value p_0, the chart cell can be pruned. Because P(i,j+1) >= P(i,j) holds for all j < T+1, and P(i,j+1) = P(i,j) for all j > T, we can iterate j from 1 to T and compute P. As soon as we find a cell (i,t) that can be pruned, we record t and prune all cells (i,j) with j >= t during parsing. If no cell from (i,1) to (i,T) can be pruned, no pruning is performed.
Similarly, for end tags, denote by Q(i,j) the probability that the widest constituent that word i ends spans j words. Q values are computed by the end tagger. Since cell (i,j) ends at word i+j-1, P values can be computed by P(i,j) = sum_{k=1}^{j-1} Q(i+j-1,k). When P(i,j) is larger than the threshold value p_0, the chart cell can be pruned.
The value of p_0 should be set empirically.
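The threshold search described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the function names are hypothetical, the tag distribution of each word is assumed to be a list q with q[k] the probability of tag k (q[0] being the "wider than T" tag), and the loop also checks length T+1, which is one possible reading of the iteration bound, since P stops growing beyond that point.

```python
from typing import List, Optional, Sequence


def first_pruned_length(q: Sequence[float], p0: float) -> Optional[int]:
    """Smallest span length t with P(i, t) > p0 for one word, or None.

    q[k] is the tagger's probability of tag k for this word, k = 0..T;
    q[0] is the "wider than T" tag and never enters the sum.
    """
    max_tag = len(q) - 1          # T
    p = 0.0                       # P(i, 1): the empty sum
    for length in range(1, max_tag + 2):
        if p > p0:
            return length
        if length <= max_tag:
            p += q[length]        # p is now P(i, length + 1)
    return None


def cell_is_pruned(start: int, length: int,
                   begin_limit: List[Optional[int]],
                   end_limit: List[Optional[int]]) -> bool:
    """Cell (start, length) is pruned if its first word (begin tags) or its
    last word (end tags) rules out spans of this length or longer."""
    b = begin_limit[start]
    e = end_limit[start + length - 1]
    return (b is not None and length >= b) or (e is not None and length >= e)
```

With T = 4 and p_0 = 0.92, a word whose tag probabilities are [0.02, 0.90, 0.05, 0.02, 0.01] gets the limit 3 (all cells of length 3 or more starting at that word are pruned), while a word with [0.70, 0.10, 0.10, 0.05, 0.05] gets no limit.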
Wikipedia data
Binary tags, tested on around 100 manually annotated sentences. Trained with 40000 and 200000 sentences from parser output. The parser was trained on WSJ.
| Method | Fscore | Coverage | Speed |
| baseline | 83.92 | 99.07 | 34.05 |
| both.40000 train | 84.42 | 100.00 | 47.09 |
| both.200000 train | 84.84 | 100.00 | 54.09 |
| both.1000000 train gis | 84.50 | 100.00 | 59.55 |
| both.1000000 train bfgs | 84.57 | 100.00 | 55.48 |
Tested on around 200 manually annotated sentences.
| Method | Fscore | Coverage | Speed |
| wiki.baseline | 79.26 | 99.00 | 46.75 |
| wiki.both.40000 | 78.41 | 99.50 | 69.92 |
| wiki.both.200000 | 78.94 | 99.50 | 70.28 |
| wiki.both.1000000 | 79.64 | 99.50 | 70.41 |
| wiki.both.1000000 | 79.17 | 99.50 | 68.69 |
Tested on a larger set of data (sentence count: 2500). Speed goes from 47.9 to:
- 78.6 when trained with 1000000 sentences (gis)
- 76.0 when trained with 1000000 sentences (bfgs)
- 77.4 when trained with 200000 sentences
- 75.5 when trained with 40000 sentences
Trained using wsj02-21, binary tagger report:
| Method | Fscore | Coverage | Speed |
| wsj.test.baseline | 83.92 | 99.07 | 34.02 |
| wsj.test.both | 84.36 | 100.00 | 56.70 |
200 sentences:
| Method | Fscore | Coverage | Speed |
| wiki.baseline | 79.26 | 99.00 | 45.56 |
| wiki.both | 78.05 | 99.00 | 72.11 |
Tested on the larger set: speed goes from 46.6 to 80.8.
Testing another tag set (4 tags)
First, using around 100 manually annotated sentences:
| Method | Fscore | Coverage | Speed |
| test.baseline | 83.92 | 99.07 | 34.05 |
| test.both.4.40000 | 84.15 | 100.00 | 59.71 |
| both.4.200000 | 84.57 | 100.00 | 56.67 |
| both.4.1000000 gis | 84.58 | 100.00 | 56.20 |
| both.4.1000000 bfgs | 84.40 | 99.07 | 58.80 |
Second, using around 200 manually annotated sentences:
| Method | Fscore | Coverage | Speed |
| wiki.4.baseline | 79.26 | 99.00 | 46.63 |
| wiki.4.both.40000 | 79.74 | 99.50 | 83.14 |
| wiki.4.both.200000 | 79.36 | 99.50 | 84.83 |
| wiki.4.both.1000000 gis | 80.04 | 99.50 | 80.76 |
| wiki.4.both.1000000 bfgs | 80.08 | 99.50 | 81.98 |
Trained with WSJ:
| Method | Fscore | Coverage | Speed |
| wsj.test.baseline | 83.92 | 99.07 | 34.14 |
| wsj.test.both.4.40000 | 84.34 | 100.00 | 61.13 |
200 sentences:
| Method | Fscore | Coverage | Speed |
| wiki.wsj.4.baseline | 79.26 | 99.00 | 46.69 |
| wiki.wsj.4.both | 78.11 | 99.50 | 78.97 |
Tested on a larger set of data (sentence count: 2500). Speed goes:
- from 45.6 to 92.8 when using 40000 sentences to train
- from 46.5 to 92.5 using 200000
- from 46.6 to 96.6 using 1000000
- from 46.8 to 91.5 using 1000000
Trained with WSJ, the speed goes from 45.6 to 93.7 after tagging.
- Beam ratio
Byung Gyu is working on this. A beam ratio is used to eliminate constituents whose probability is too low relative to that of the best constituent.
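A minimal sketch of this kind of pruning, under assumptions that are not stated in the notes: the comparison is made within a single chart cell, a constituent is dropped when its probability is below the beam ratio times the best probability in that cell, and a ratio of 0 disables the beam. The function name `apply_beam` and the dictionary representation of a cell are hypothetical.

```python
from typing import Dict


def apply_beam(cell: Dict[str, float], beam_ratio: float) -> Dict[str, float]:
    """Keep only the constituents of one chart cell that survive the beam.

    cell maps a constituent label to its probability (assumed representation).
    """
    if not cell or beam_ratio <= 0:
        return dict(cell)  # beam disabled: keep everything
    threshold = beam_ratio * max(cell.values())
    return {label: p for label, p in cell.items() if p >= threshold}
```

With a cell {"NP": 0.5, "VP": 0.1, "S": 0.000001} and a beam ratio of 1e-05, the threshold is 5e-06, so only "S" is removed; a larger ratio removes more constituents, which is consistent with the speed and accuracy trade-off in the tables below.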
Experiments were conducted on WSJ Section 00. The number of sentences is as follows:
| Word counts | Sentence counts |
| No limit | 1913 (100%) |
| 1-30 | 1442 (75.38%) |
| 1-40 | 1784 (93.26%) |
| 1-60 | 1901 (99.37%) |
| 1-80 | 1908 (99.74%) |
Results show that as the beam ratio increases, the speed generally increases, but accuracy drops at the same time.
Results for all sentences:
| Beam ratio | Speed | F-score (labeled) | Coverage |
| 0 | 15.4804 | 86.80 | 98.69 |
| 1e-05 | 17.8676 | 86.06 | 99.32 |
| 5e-05 | 19.2137 | 86.01 | 99.27 |
| 0.0001 | 19.0202 | 85.95 | 99.27 |
| 0.0005 | 19.6181 | 85.61 | 99.22 |
| 0.001 | 19.9333 | 85.56 | 99.22 |
| 0.005 | 21.7883 | 85.10 | 99.11 |
| 0.01 | 23.8065 | 85.03 | 98.59 |
| 0.05 | 22.6318 | 83.76 | 96.92 |
| 0.1 | 24.242 | 83.24 | 94.09 |
Results for 1-30 word sentences:
| Beam ratio | Speed | F-score (labeled) | Coverage |
| 0 | 41.5353 | 87.66 | 99.58 |
| 1e-05 | 44.7832 | 87.31 | 99.51 |
| 5e-05 | 45.2688 | 87.32 | 99.45 |
| 0.0001 | 46.9333 | 87.32 | 99.45 |
| 0.0005 | 44.2239 | 87.10 | 99.31 |
| 0.001 | 46.9176 | 87.08 | 99.31 |
| 0.005 | 47.3899 | 86.61 | 99.24 |
| 0.01 | 50.2478 | 86.53 | 98.61 |
| 0.05 | 49.3484 | 85.45 | 97.50 |
| 0.1 | 47.5403 | 84.86 | 95.42 |
Results for 1-40 word sentences:
| Beam ratio | Speed | F-score (labeled) | Coverage |
| 0 | 26.5175 | 86.98 | 99.38 |
| 1e-05 | 33.0422 | 86.52 | 99.61 |
| 5e-05 | 30.3617 | 86.44 | 99.55 |
| 0.0001 | 30.065 | 86.42 | 99.55 |
| 0.0005 | 28.218 | 86.16 | 99.44 |
| 0.001 | 27.7347 | 86.17 | 99.44 |
| 0.005 | 31.7449 | 85.83 | 99.27 |
| 0.01 | 32.3019 | 85.75 | 98.77 |
| 0.05 | 34.5538 | 84.65 | 97.37 |
| 0.1 | 35.6805 | 84.01 | 94.79 |
Results for 1-60 word sentences:
| Beam ratio | Speed | F-score (labeled) | Coverage |
| 0 | 17.3675 | 86.84 | 99.16 |
| 1e-05 | 23.454 | 86.25 | 99.58 |
| 5e-05 | 26.9272 | 86.17 | 99.53 |
| 0.0001 | 26.8184 | 86.11 | 99.53 |
| 0.0005 | 27.2245 | 85.86 | 99.37 |
| 0.001 | 28.0771 | 85.84 | 99.37 |
| 0.005 | 30.4088 | 85.47 | 99.16 |
| 0.01 | 28.5134 | 85.44 | 98.63 |
| 0.05 | 24.9792 | 84.21 | 96.95 |
| 0.1 | 21.6632 | 83.57 | 94.21 |
Results for 1-80 word sentences:
| Beam ratio | Speed | F-score (labeled) | Coverage |
| 0 | 10.2419 | 86.80 | 98.95 |
| 1e-05 | 17.9858 | 86.06 | 99.58 |
| 5e-05 | 21.5694 | 86.01 | 99.53 |
| 0.0001 | 23.9083 | 85.95 | 99.53 |
| 0.0005 | 26.5148 | 85.70 | 99.37 |
| 0.001 | 25.8473 | 85.66 | 99.37 |
| 0.005 | 27.3008 | 85.31 | 99.16 |
| 0.01 | 27.3252 | 85.25 | 98.64 |
| 0.05 | 27.5279 | 84.03 | 96.96 |
| 0.1 | 27.4514 | 83.43 | 94.18 |
Results when the --force_words option is not specified (it is true by default):
| Beam ratio | Speed | F-score (labeled) | Coverage |
| 0 | 18.3145 | 85.55 | 100.00 |
| 1e-05 | 19.6152 | 85.37 | 100.00 |
| 5e-05 | 20.3056 | 85.31 | 100.00 |
| 0.0001 | 20.4203 | 85.25 | 100.00 |
| 0.0005 | 21.6801 | 84.99 | 100.00 |
| 0.001 | 20.9608 | 84.94 | 100.00 |
| 0.005 | 23.1756 | 84.49 | 100.00 |
| 0.01 | 24.867 | 84.18 | 100.00 |
| 0.05 | 23.0082 | 81.94 | 100.00 |
| 0.1 | 23.8857 | 79.86 | 100.00 |
Results for sentences with 1-40 words:
| Beam ratio | Speed | F-score (labeled) | Coverage |
| 0 | 29.1301 | 86.66 | 100.00 |
| 1e-05 | 37.3851 | 86.37 | 100.00 |
| 5e-05 | 38.8166 | 86.27 | 100.00 |
| 0.0001 | 39.5901 | 86.24 | 100.00 |
| 0.0005 | 40.8766 | 85.94 | 100.00 |
| 0.001 | 41.2198 | 85.95 | 100.00 |
| 0.005 | 40.3587 | 85.51 | 100.00 |
| 0.01 | 42.2446 | 85.20 | 100.00 |
| 0.05 | 42.7934 | 83.34 | 100.00 |
| 0.1 | 41.3743 | 81.37 | 100.00 |
Results for all sentences for selective beam search:
| Beam ratio | Speed | F-score (labeled) | Coverage | Description |
| 0 | 15.4804 | 86.80 | 98.69 | Beam disabled |
| 1e-05 | 17.8676 | 86.06 | 99.32 | Beam enabled for all sentences |
| 1e-05 | 17.5669 | 86.45 | 99.11 | Selective beam for long sentences (41+ words) |
There is something odd about unary rules. Some experimental results are given below.
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 19.3461 | 18.3344 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 19.2946 | 18.3006 | 85.37 | 86.06 | 100.00 | 99.32 |
| gold oracle-guided | 42.1773 | 40.4683 | 91.39 | 98.85 | 95.97 | 85.10 |
| self oracle-guided | 47.5441 | 45.9091 | 86.02 | 86.08 | 99.32 | 99.22 |
| beam with gold oracle | 14.4566 | 15.7595 | 86.91 | 87.31 | 95.97 | 95.50 |
| beam with self oracle | 21.4434 | 22.6088 | 86.09 | 86.09 | 99.32 | 99.32 |
Normal is the setting without a beam value. Beam search is the setting with a beam value of 0.00001. For the gold and self oracle-guided settings, any constituent that does not match the oracle is pruned. For the beam with gold and self oracle settings, a beam value of 0.00001 is used and less probable constituents are eliminated, except for those matching the oracle.
For settings with an oracle, coverage(fw) represents the ratio of sentences which have an oracle parse and whose oracle parse chart can be regenerated from the pipe format output.
Same setting, second try:
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 19.4838 | 19.2757 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 20.0884 | 19.2959 | 85.37 | 86.06 | 100.00 | 99.32 |
| gold oracle-guided | 44.1463 | 41.0358 | 91.39 | 98.85 | 95.97 | 85.10 |
| self oracle-guided | 48.093 | 46.2996 | 86.02 | 86.08 | 99.32 | 99.22 |
| beam with gold oracle | 12.5157 | 13.4216 | 86.91 | 87.31 | 95.97 | 95.50 |
| beam with self oracle | 19.5327 | 20.7197 | 86.09 | 86.09 | 99.32 | 99.32 |
In the two experiments above, the beam is applied before the unary rules, so the cells resulting from the unary rules are not pruned. I tried another version that applies the beam after the unary rules, and obtained the following results.
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 19.6607 | 19.3346 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 18.5759 | 16.4958 | 85.95 | 86.65 | 100.00 | 99.32 |
| gold oracle-guided | 32.3877 | 26.4007 | 91.15 | 98.86 | 95.97 | 84.79 |
| self oracle-guided | 30.1072 | 34.2314 | 86.02 | 86.08 | 99.32 | 99.22 |
| beam with gold oracle | 13.7288 | 15.8256 | 86.81 | 87.27 | 95.97 | 95.45 |
| beam with self oracle | 18.6477 | 17.9787 | 86.57 | 86.62 | 99.32 | 99.27 |
So the point at which the beam is applied has an impact on speed, F-scores, and coverage.
Applying the beam before the unary rules is more sensible, so from here on we consider only this case.
Beam search is most effective for shorter spans, so we stopped applying the beam for longer spans (21+). Here are the results (run twice).
On login:
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 16.6298 | 16.6547 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 18.1389 | 18.0394 | 85.65 | 86.30 | 100.00 | 99.37 |
| gold oracle-guided | 29.7831 | 29.8276 | 89.19 | 92.25 | 95.97 | 92.05 |
| self oracle-guided | 44.0289 | 43.3253 | 86.10 | 86.10 | 99.32 | 99.32 |
| beam with gold oracle | 18.5215 | 18.3924 | 86.89 | 87.29 | 95.97 | 95.50 |
| beam with self oracle | 21.5692 | 21.6244 | 86.33 | 86.38 | 99.32 | 99.27 |
On x01:
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 17.3751 | 17.53 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 18.1917 | 18.075 | 85.65 | 86.30 | 100.00 | 99.37 |
| gold oracle-guided | 30.4756 | 29.5378 | 89.19 | 92.25 | 95.97 | 92.05 |
| self oracle-guided | 42.5663 | 42.8408 | 86.10 | 86.10 | 99.32 | 99.32 |
| beam with gold oracle | 18.6783 | 18.9849 | 86.89 | 87.29 | 95.97 | 95.50 |
| beam with self oracle | 22.3771 | 22.4117 | 86.33 | 86.38 | 99.32 | 99.27 |
We also tried different formulas to determine the beam value.
0.01 / span
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 17.027 | 17.3116 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 20.9719 | 20.9543 | 85.05 | 85.79 | 100.00 | 99.16 |
0.1 / span
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 16.9504 | 17.0326 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 20.066 | 19.6766 | 84.64 | 85.47 | 100.00 | 98.90 |
fitting (gold oracle)
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 24.5969 | 24.6609 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 28.412 | 28.4924 | 85.35 | 86.41 | 100.00 | 98.95 |
fitting (self beam)
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 24.5593 | 24.417 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 30.6583 | 30.3048 | 85.05 | 86.26 | 100.00 | 98.64 |
fitting (self nobeam)
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 24.645 | 24.6025 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 29.219 | 28.9936 | 85.18 | 86.26 | 100.00 | 98.90 |
We tried using the Viterbi score instead of the inside probability as the score of each constituent.
With a beam value of 0.00001:
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 24.6679 | 24.5751 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 16.7912 | 16.743 | 84.46 | 85.53 | 100.00 | 98.95 |
fitting (self viterbi)
| Title | Speed(fw) | Speed(nf) | lf(fw) | lf(nf) | Coverage(fw) | Coverage(nf) |
| normal | 24.285 | 24.4829 | 85.55 | 86.80 | 100.00 | 98.69 |
| beam search | 16.9799 | 16.8835 | 84.46 | 85.53 | 100.00 | 98.95 |
- Oracles (SC)
The ultimate self-oracle; perhaps the best we can possibly do during the 6 weeks (as far as pruning is concerned):
prune all cells which don't contain a constituent in the final parse (Yue's ultimate oracle); and prune all constituents which don't end up in the final parse (BG's ultimate oracle). So we're essentially just building the final derivation with no ambiguity. Implementing this will require BG's code which reads in the gold-standard derivation. BG and Yue to discuss.
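The two pruning decisions could look roughly like this once the final parse is available as a set of labelled spans. The class name `UltimateOracle`, its methods, and the (first word, last word, label) representation are all hypothetical; reading the derivation itself is left to the existing code mentioned above.

```python
from typing import FrozenSet, Set, Tuple

Constituent = Tuple[int, int, str]  # (first word, last word, label), inclusive


class UltimateOracle:
    """Answers the two pruning questions from a known final parse."""

    def __init__(self, final_parse: Set[Constituent]) -> None:
        self.constituents: FrozenSet[Constituent] = frozenset(final_parse)
        self.cells: FrozenSet[Tuple[int, int]] = frozenset(
            (first, last) for first, last, _ in final_parse
        )

    def keep_cell(self, first: int, last: int) -> bool:
        """Cell-level oracle: keep only cells that hold a final constituent."""
        return (first, last) in self.cells

    def keep_constituent(self, first: int, last: int, label: str) -> bool:
        """Constituent-level oracle: keep only constituents of the final parse."""
        return (first, last, label) in self.constituents
```

A parser loop would skip any cell for which `keep_cell` is false and discard any constituent for which `keep_constituent` is false, which leaves only the final derivation.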