Two main directions

  • Sequence labeling

Yue is working on this. The first method that has been tried is BE tagging, where binary tags are assigned to each word in the input sentence to indicate whether it can be the beginning or ending of a multi-word span. Begin tags and end tags are assigned separately.

  • tag = 0 means that the corresponding word cannot begin or end a multi-word span
  • tag = 1 means that the corresponding word can begin or end a multi-word span
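The effect of the binary tags on the chart can be sketched as follows. This is a minimal illustration, not the parser's actual code: the function name, the (start, length) cell indexing, and the rule that single-word cells are never pruned are all assumptions made for the example.

```python
def allowed_cells(begin_tags, end_tags):
    """Return the chart cells (start, length) that survive BE pruning.

    begin_tags[i] == 1: word i may begin a multi-word span.
    end_tags[i] == 1:   word i may end a multi-word span.
    Assumption: single-word cells (length 1) are never pruned.
    """
    n = len(begin_tags)
    cells = set()
    for start in range(n):
        for length in range(1, n - start + 1):
            end = start + length - 1
            if length == 1 or (begin_tags[start] == 1 and end_tags[end] == 1):
                cells.add((start, length))
    return cells


# Hypothetical 4-word sentence: only words 0 and 2 may begin a span,
# only words 1 and 3 may end one.
begin = [1, 0, 1, 0]
end = [0, 1, 0, 1]
cells = allowed_cells(begin, end)
```

In this toy case 7 of the 10 possible cells remain: the four single-word cells plus (0, 2), (0, 4) and (2, 2).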

Some results.

Model Speed Accuracy Coverage
baseline 24.49 84.89 98.69
be 01 26.79 84.83 98.90
en 01 30.83 84.68 98.90
both 01 32.94 84.76 98.85
gold oracle 32.35 85.36 98.90
self oracle 46.81 84.89 98.69

Force words

Model Speed Accuracy Coverage
baseline 24.48 83.65 100.00
be 01 26.35 83.78 100.00
en 01 31.27 83.62 100.00
both 01 32.65 83.64 100.00
gold oracle 32.25 84.31 100.00

Sentences shorter than 40 words

Model Speed Accuracy Coverage
baseline 39.62 85.09 99.44
baseline (fw) 39.28 84.83 100.00
be 01 44.23 85.04 99.50
be 01 (fw) 44.00 84.81 100.00
en 01 49.96 84.92 99.50
en 01 (fw) 49.87 84.69 100.00
both 01 52.78 84.94 99.50
both 01 (fw) 51.22 84.71 100.00
gold oracle 51.18 85.55 99.50
gold oracle (fw) 51.21 85.32 100.00
self oracle 58.51 85.09 99.44

Sentences shorter than 100 words

Model Speed Accuracy Coverage
baseline 26.25 84.89 98.85
baseline (fw) 26.25 83.98 100.00
be 01 28.90 84.83 99.06
be 01 (fw) 28.87 84.12 100.00
en 01 33.38 84.68 99.06
en 01 (fw) 33.30 83.95 100.00
both 01 35.33 84.76 99.01
both 01 (fw) 35.23 83.97 100.00
gold oracle 34.94 85.36 99.06
gold oracle (fw) 34.95 84.64 100.00
self oracle 48.39 84.89 98.85

All sentences, beam=1e-5

Model Speed Accuracy Coverage
baseline 25.56 84.05 99.27
baseline (fw) 25.46 83.36 100.00
be 01 28.09 83.99 99.27
be 01 (fw) 27.94 83.29 100.00
en 01 29.67 83.94 99.32
en 01 (fw) 30.10 83.32 100.00
both 01 31.73 83.92 99.32
both 01 (fw) 31.55 83.30 100.00
gold oracle 31.48 84.50 99.32
gold oracle (fw) 31.47 83.87 100.00
self oracle 56.00 84.38 98.64

All sentences, beam=1e-4

Model Speed Accuracy Coverage
baseline 28.32 84.07 99.16
baseline (fw) 28.64 83.34 100.00
be 01 31.30 83.95 99.22
be 01 (fw) 31.38 83.29 100.00
en 01 33.20 83.92 99.22
en 01 (fw) 33.08 83.27 100.00
both 01 33.09 83.81 99.32
both 01 (fw) 32.92 83.29 100.00
gold oracle 34.42 84.52 99.22
gold oracle (fw) 34.27 83.85 100.00
self oracle 58.27 84.41 98.54

All sentences, beam=1e-1

Model Speed Accuracy Coverage
baseline 38.47 81.36 93.83
baseline (fw) 38.53 78.05 100.00
be 01 39.77 81.28 93.83
be 01 (fw) 39.53 77.97 100.00
en 01 39.56 81.30 93.83
en 01 (fw) 39.69 77.97 100.00
both 01 40.17 81.25 93.83
both 01 (fw) 40.37 77.92 100.00
gold oracle 40.28 81.45 93.73
gold oracle (fw) 40.06 78.06 100.00
self oracle 59.75 81.80 93.05

Tagging

begin tag precision: 0.961494430012 end tag precision: 0.961384351195

begin statistics

Tag Count(output) Count(reference)
0 14692 14875
1 30730 30547

end statistics

Tag Count(output) Count(reference)
0 30732 30548
1 14690 14874

The second method is multiple tagging. Instead of using a binary tag to indicate whether a word can start a span, a class is assigned to the word to indicate which spans it can start. The number assigned to a word indicates the maximum span that it can start or end. In particular:

  • class = 0 means that there is no limit on the span length
  • class = 1 means that the word can only start or end a length 1 span
  • class = 2 means that the word can start or end a span of up to length 2
  • class = 3 means that the word can start or end a span of up to length 3
  • class = 4 means that the word can start or end a span of up to length 4
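One way such classes could be derived from a bracketing is sketched below. The function name, the inclusive (start, end) span format, and the example sentence are hypothetical; the notes do not say how the training tags were actually extracted.

```python
def level_tags(n_words, spans, max_level=4):
    """Derive begin/end level tags from constituent spans.

    spans: iterable of (start, end) word indices, inclusive.
    A word's tag is the length of the widest span it begins (or ends);
    widths above max_level are mapped to class 0 ("no limit").
    """
    widest_begin = [1] * n_words
    widest_end = [1] * n_words
    for start, end in spans:
        length = end - start + 1
        widest_begin[start] = max(widest_begin[start], length)
        widest_end[end] = max(widest_end[end], length)

    def cap(width):
        return width if width <= max_level else 0

    return [cap(w) for w in widest_begin], [cap(w) for w in widest_end]


# Hypothetical 6-word sentence with a mostly right-branching bracketing.
spans = [(0, 5), (0, 1), (2, 5), (3, 5), (4, 5)]
begin_tags, end_tags = level_tags(6, spans)
# begin_tags == [0, 1, 4, 3, 2, 1]
# end_tags   == [1, 2, 1, 1, 1, 0]
```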

Some results:

Model Speed Accuracy Coverage
baseline 23.47 84.89 98.69
baseline (fw) 23.83 83.65 100.00
be layer 31.10 84.31 98.85
be layer (fw) 31.03 83.21 100.00
en layer 34.41 84.21 98.90
en layer (fw) 33.82 83.15 100.00
both layer 31.93 84.11 98.75
both layer (fw) 32.73 82.90 100.00
gold oracle 37.98 85.95 98.90
gold oracle (fw) 38.03 84.87 100.00
self oracle 59.98 84.89 98.69

Sentences shorter than 40 words

Model Speed Accuracy Coverage
baseline 38.38 85.09 99.44
baseline (fw) 38.51 84.83 100.00
be layer 47.17 84.49 99.50
be layer (fw) 48.58 84.26 100.00
en layer 54.73 84.47 99.50
en layer (fw) 55.07 84.24 100.00
both layer 54.68 84.28 99.50
both layer (fw) 54.71 84.06 100.00
gold oracle 60.23 86.14 99.50
gold oracle (fw) 60.50 85.90 100.00
self oracle 65.71 85.09 99.44

Tagging

begin tag precision: 0.82750649465 end tag precision: 0.914446743869

begin statistics

Tag Count(output) Count(reference)
0 19122 17041
1 14888 14875
2 6828 6737
3 3130 4122
4 1454 2647

end statistics

Tag Count(output) Count(reference)
0 7180 7169
1 30990 30548
2 4200 4107
3 2327 2336
4 725 1262

Another tagset

Only three tags besides the zero tag in this set

Model Speed Accuracy Coverage
baseline 24.97 84.89 98.69
baseline (fw) 24.23 83.65 100.00
be layer 29.85 84.49 98.90
be layer (fw) 29.88 83.45 100.00
en layer 35.22 84.39 98.90
en layer (fw) 35.26 83.33 100.00
both layer 33.10 84.35 98.80
both layer (fw) 33.50 83.19 100.00
gold oracle 36.25 85.83 98.90
gold oracle (fw) 35.94 84.76 100.00
self oracle 57.93 84.89 98.69

Another tagset

Only two tags besides the zero tag in this set

Model Speed Accuracy Coverage
baseline 24.80 84.89 98.69
baseline (fw) 24.76 83.65 100.00
be layer 28.45 84.74 98.90
be layer (fw) 29.54 83.70 100.00
en layer 34.92 84.62 98.95
en layer (fw) 34.43 83.63 100.00
both layer 35.68 84.56 98.85
both layer (fw) 36.43 83.48 100.00
gold oracle 34.85 85.72 98.85
gold oracle (fw) 34.34 84.59 100.00
self oracle 53.34 84.89 98.69

Pruning every iteration

The case with binary values for the begin and end constraint tags

Model Speed Accuracy Coverage Pruned cells total cells
baseline 25.15 84.89 98.69 0 6530585
baseline (fw) 25.00 83.65 100.00 0 6530585
be 01 27.76 84.71 98.95 1598979 6722969
be 01 (fw) 27.55 83.72 100.00 1598979 6722969
en 01 30.42 84.56 98.69 3732056 6718049
en 01 (fw) 30.48 83.43 100.00 3732056 6718049
both 01 34.09 84.60 98.64 4424065 6676385
both 01 (fw) 34.55 83.42 100.00 4424065 6676385
join 01 34.20 84.60 98.64 4423753 6676385
join 01 (fw) 34.35 83.42 100.00 4423753 6676385
gold oracle 34.03 85.67 98.48 4387771 6644412
gold oracle (fw) 33.99 84.41 100.00 4387771 6644412
self oracle 55.12 84.89 98.69 4289579 6530585

More attempts:

Prune all levels:

Binary:

Model Speed Accuracy Coverage Pruned cells total cells
baseline 25.10 84.89 98.69 0 6530585
be 01 27.49 84.71 98.95 1598979 6722969
be 01 backoff(90) 26.93 84.79 98.90 1259108 6635029
be 01 backoff(95) 26.67 84.85 98.85 1110058 6625149
en 01 30.33 84.56 98.69 3732056 6718049
en 01 backoff(90) 29.16 84.91 98.85 3443327 6727488
en 01 backoff(95) 28.59 84.89 98.90 3298394 6739829
both 01 33.90 84.60 98.64 4424065 6676385
both 01 backoff(90) 31.25 84.93 98.90 4082676 6737368
both 01 backoff(95) 30.33 84.91 98.95 3889253 6749709
join 01 33.78 84.60 98.64 4423753 6676385
gold oracle 33.60 85.67 98.48 4387771 6644412
self oracle 55.25 84.89 98.69 4289579 6530585

fw

Model Speed Accuracy Coverage Pruned cells total cells
baseline (fw) 24.96 83.65 100.00 0 6530585
be 01 (fw) 27.38 83.72 100.00 1598979 6722969
be 01 backoff(90) (fw) 26.91 83.75 100.00 1259108 6635029
be 01 backoff(95) (fw) 26.82 83.77 100.00 1110058 6625149
en 01 (fw) 30.31 83.43 100.00 3732056 6718049
en 01 backoff(90) (fw) 29.16 83.89 100.00 3443327 6727488
en 01 backoff(95) (fw) 28.62 83.91 100.00 3298394 6739829
both 01 (fw) 33.85 83.42 100.00 4424065 6676385
both 01 backoff(90) (fw) 31.18 83.94 100.00 4082676 6737368
both 01 backoff(95) (fw) 30.25 83.96 100.00 3889253 6749709
join 01 (fw) 33.91 83.42 100.00 4423753 6676385
gold oracle (fw) 33.03 84.41 100.00 4387771 6644412

Level tag 2:

Model Speed Accuracy Coverage Pruned cells total cells
baseline 24.98 84.89 98.69 0 6530585
be layer 31.27 84.65 99.01 2320090 6747773
be layer backoff(90) 28.01 84.75 99.01 1714690 6746617
be layer backoff(95) 27.41 84.81 98.95 1433814 6667538
en layer 36.04 84.35 98.27 4103717 6676743
en layer backoff(90) 29.99 84.87 98.75 3761852 6713679
en layer backoff(95) 30.23 84.93 98.75 3515843 6699030
both layer 40.17 84.13 98.38 5036902 6710187
both layer backoff(90) 32.79 84.92 98.75 4500090 6682675
both layer backoff(95) 32.19 84.95 98.80 4216252 6708910
join layer 40.66 84.14 98.27 5020057 6705227
gold oracle 47.31 86.49 98.43 5476803 6736195
self oracle 76.07 84.89 98.69 5264641 6530585

fw

Model Speed Accuracy Coverage Pruned cells total cells
baseline (fw) 24.94 83.65 100.00 0 6530585
be layer (fw) 31.18 83.71 100.00 2320090 6747773
be layer backoff(90) (fw) 28.01 83.82 100.00 1714690 6746617
be layer backoff(95) (fw) 27.34 83.80 100.00 1433814 6667538
en layer (fw) 35.96 82.97 100.00 4103717 6676743
en layer backoff(90) (fw) 29.93 83.75 100.00 3761852 6713679
en layer backoff(95) (fw) 30.35 83.82 100.00 3515843 6699030
both layer (fw) 40.13 82.88 100.00 5036902 6710187
both layer backoff(90) (fw) 32.63 83.79 100.00 4500090 6682675
both layer backoff(95) (fw) 32.45 83.87 100.00 4216252 6708910
join layer (fw) 40.87 82.84 100.00 5020057 6705227
gold oracle (fw) 47.44 85.29 100.00 5476803 6736195

Level tag 3:

Model Speed Accuracy Coverage Pruned cells total cells
baseline 25.11 84.89 98.69 0 6530585
be layer 31.96 84.34 99.01 2691134 6747773
be layer backoff(90) 28.22 84.74 99.01 1769193 6746617
be layer backoff(95) 27.22 84.80 98.95 1466113 6736737
en layer 37.76 83.65 97.75 4336222 6671536
en layer backoff(90) 32.60 84.95 98.64 3897912 6720901
en layer backoff(95) 30.45 84.93 98.75 3610419 6682675
both layer 39.26 83.16 97.33 5175622 6527949
both layer backoff(90) 34.32 84.96 98.64 4637517 6704041
both layer backoff(95) 32.50 84.93 98.75 4287735 6682675
join layer 44.63 82.97 97.33 5260951 6634250
gold oracle 43.39 86.30 98.43 5247246 6636011
self oracle 72.70 84.89 98.69 5129679 6530585

fw

Model Speed Accuracy Coverage Pruned cells total cells
baseline (fw) 25.27 83.65 100.00 0 6530585
be layer (fw) 32.04 83.41 100.00 2691134 6747773
be layer backoff(90) (fw) 28.22 83.81 100.00 1769193 6746617
be layer backoff(95) (fw) 27.19 83.83 100.00 1466113 6736737
en layer (fw) 37.55 82.09 100.00 4336222 6671536
en layer backoff(90) (fw) 32.63 83.82 100.00 3897912 6720901
en layer backoff(95) (fw) 30.51 83.79 100.00 3610419 6682675
both layer (fw) 39.64 81.28 100.00 5175622 6527949
both layer backoff(90) (fw) 34.34 83.83 100.00 4637517 6704041
both layer backoff(95) (fw) 32.60 83.80 100.00 4287735 6682675
join layer (fw) 44.77 81.23 100.00 5260951 6634250
gold oracle (fw) 43.39 85.06 100.00 5247246 6636011

Level tag 4:

Model Speed Accuracy Coverage Pruned cells total cells
baseline 25.09 84.89 98.69 0 6530585
be layer 32.37 84.13 99.01 2822348 6717193
be layer backoff(80) 29.48 84.73 98.95 2165852 6740633
be layer backoff(90) 28.28 84.76 99.01 1795396 6746617
be layer backoff(95) 27.37 84.80 98.90 1436566 6635029
en layer 38.67 83.05 97.39 4536320 6783474
en layer backoff(80) 36.51 84.71 98.48 4219428 6731999
en layer backoff(90) 33.30 84.96 98.69 3954752 6722925
en layer backoff(95) 30.59 84.97 98.75 3626653 6682675
both layer 42.18 82.40 96.08 5282303 6513771
both layer backoff(80) 39.71 84.73 98.48 5048676 6734234
both layer backoff(90) 34.91 84.95 98.75 4713175 6732300
both layer backoff(95) 33.12 84.97 98.75 4298681 6682675
join layer 46.44 82.21 95.97 5510032 6763578
join layer backoff(80) 38.63 84.68 98.64 4978461 6754580
join layer backoff(90) 31.78 84.82 98.85 4321360 6743359
join layer backoff(95) 29.88 84.79 98.90 3162651 6665608
gold oracle 47.45 86.49 98.43 5476803 6736195
self oracle 77.35 84.89 98.69 5264641 6530585

fw

Model Speed Accuracy Coverage Pruned cells total cells
baseline (fw) 25.13 83.65 100.00 0 6530585
be layer (fw) 32.29 83.18 100.00 2822348 6717193
be layer backoff(80) (fw) 29.39 83.77 100.00 2165852 6740633
be layer backoff(90) (fw) 28.32 83.83 100.00 1795396 6746617
be layer backoff(95) (fw) 27.19 83.75 100.00 1436566 6635029
en layer (fw) 38.67 81.40 100.00 4536320 6783474
en layer backoff(80) (fw) 36.45 83.53 100.00 4219428 6731999
en layer backoff(90) (fw) 33.08 83.86 100.00 3954752 6722925
en layer backoff(95) (fw) 30.43 83.84 100.00 3626653 6682675
both layer (fw) 41.72 79.93 100.00 5282303 6513771
both layer backoff(80) (fw) 39.85 83.57 100.00 5048676 6734234
both layer backoff(90) (fw) 34.60 83.91 100.00 4713175 6732300
both layer backoff(95) (fw) 32.89 83.84 100.00 4298681 6682675
join layer (fw) 46.39 79.85 100.00 5510032 6763578
join layer backoff(80) (fw) 38.48 83.62 100.00 4978461 6754580
join layer backoff(90) (fw) 31.92 83.81 100.00 4321360 6743359
join layer backoff(95) (fw) 30.24 83.75 100.00 3162651 6665608
gold oracle (fw) 46.89 85.29 100.00 5476803 6736195

Discussions:

With level tags, the tagging accuracy goes down, but the oracle improves.

With level tags, the difference between begin and end tags is larger. End tags are lower, probably due to right-branching.

With level tags, the joint tagger is more effective at pruning.

Generalization from the binary case to multiple tags:

The BE tags can be generalized from the binary case to multiple values. Instead of indicating whether a word can begin or end a multi-word constituent, a tag indicates the widest constituent that the word starts or ends. Intuitively, for a sentence of N words the number of possible tags is N. However, when N becomes large, the tagging problem could become overly complex. We therefore fix the tag set so that there are only T+1 tags. In this set, tags 1 to T give the width of the widest constituent that the corresponding word starts or ends in the sentence. When the widest constituent has more than T words, the tag is set to 0.

As in the binary case, tagging can be performed with a maximum entropy model, and the forward-backward algorithm can be used to find the probability of each tag. Once tagging is done, the probabilities of all tags are computed for each word and then used to prune the chart.

For begin tags, denote by Q(i,j) the probability that chart cell (i,j) contains the widest constituent that word i starts, and by P(i,j) the probability that chart cell (i,j) does not contain any constituent according to the begin tags. Q values are computed by the begin tagger. P values can be computed by

  P(i,j) = \sum_{k=1}^{j-1} Q(i,k)

When P(i,j) is larger than a threshold value p_0, the chart cell can be pruned. Because P(i,j+1) >= P(i,j) holds for all j < T+1, and P(i,j+1) = P(i,j) for all j > T, we can iterate j from 1 to T and compute P. As soon as we find a cell (i,t) that can be pruned, we record t and prune all cells (i,j) with j >= t during parsing. If no cell from (i,1) to (i,T) can be pruned, no pruning is performed.

Similarly, for end tags, denote by Q(i,j) the probability that word i ends a widest constituent of width j. Q values are computed by the end tagger. Chart cell (i,j) ends at word i+j-1, so P values can be computed by

  P(i,j) = \sum_{k=1}^{j-1} Q(i+j-1,k)

When P(i,j) is larger than the threshold value p_0, the chart cell can be pruned.

The value of p_0 should be set according to empirical data.
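The threshold procedure above can be sketched as follows. This is an illustration only: the function names, the 0-based word indices, the list layout of the tag probabilities (index 0 for tag 0, indices 1..T for the level tags) and the example numbers are assumptions, not taken from the tagger or parser code.

```python
def tag_cutoffs(tag_probs, p0):
    """For each word, find the smallest width t in 1..T such that
    P(t) = sum_{k=1}^{t-1} Q(k) exceeds p0, or None if there is none.

    tag_probs[w][k] is the tagger probability Q of level tag k for word w;
    index 0 holds the probability of tag 0 ("wider than T").
    """
    cutoffs = []
    for q in tag_probs:
        T = len(q) - 1
        p = 0.0              # P for width 1 is an empty sum
        cutoff = None
        for t in range(1, T + 1):
            if p > p0:
                cutoff = t
                break
            p += q[t]        # p now holds P for width t + 1
        cutoffs.append(cutoff)
    return cutoffs


def cell_pruned(begin_cutoffs, end_cutoffs, i, j):
    """Cell (i, j) starts at word i and ends at word i + j - 1."""
    b = begin_cutoffs[i]
    e = end_cutoffs[i + j - 1]
    return (b is not None and j >= b) or (e is not None and j >= e)


# Hypothetical begin-tag distributions for two words, with T = 4.
q_begin = [
    [0.02, 0.90, 0.05, 0.02, 0.01],   # almost certainly starts nothing wide
    [0.70, 0.10, 0.10, 0.05, 0.05],   # probably starts something wider than T
]
begin_cut = tag_cutoffs(q_begin, p0=0.94)
# begin_cut == [3, None]: cells (0, j) with j >= 3 are pruned, word 1 is unconstrained
```

Recording one cutoff per word, instead of one decision per cell, relies on the monotonicity of P in j noted above.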

Wikipedia data

Binary tags, tested on around 100 manually annotated sentences. The tagger was trained with 40000 and 200000 sentences of parser output. The parser was trained on WSJ.

Method Fscore Coverage Speed
baseline 83.92 99.07 34.05
both.40000 train 84.42 100.00 47.09
both.200000 train 84.84 100.00 54.09
both.1000000 train gis 84.50 100.00 59.55
both.1000000 train bfgs 84.57 100.00 55.48

Tested on around 200 manually annotated sentences.

Method Fscore Coverage Speed
wiki.baseline 79.26 99.00 46.75
wiki.both.40000 78.41 99.50 69.92
wiki.both.200000 78.94 99.50 70.28
wiki.both.1000000 79.64 99.50 70.41
wiki.both.1000000 79.17 99.50 68.69

Tested on a larger set of data (2500 sentences). Speed goes from 47.9 to:

  • 78.6 when trained with 1000000 sentences (gis)
  • 76.0 when trained with 1000000 sentences (bfgs)
  • 77.4 when trained with 200000 sentences
  • 75.5 when trained with 40000 sentences

trained using wsj02-21, binary tagger report

Method Fscore Coverage Speed
wsj.test.baseline 83.92 99.07 34.02
wsj.test.both 84.36 100.00 56.70

200 sentences

Method Fscore Coverage Speed
wiki.baseline 79.26 99.00 45.56
wiki.both 78.05 99.00 72.11

Tested on the larger set: speed goes from 46.6 to 80.8.

Testing another tag set (4 tags)

First, using around 100 manually annotated sentences:

Method Fscore Coverage Speed
test.baseline 83.92 99.07 34.05
test.both.4.40000 84.15 100.00 59.71
both.4.200000 84.57 100.00 56.67
both.4.1000000 gis 84.58 100.00 56.20
both.4.1000000 bfgs 84.40 99.07 58.80

Second, using around 200 manually annotated sentences:

Method Fscore Coverage Speed
wiki.4.baseline 79.26 99.00 46.63
wiki.4.both.40000 79.74 99.50 83.14
wiki.4.both.200000 79.36 99.50 84.83
wiki.4.both.1000000 gis 80.04 99.50 80.76
wiki.4.both.1000000 bfgs 80.08 99.50 81.98

trained with wsj

Method Fscore Coverage Speed
wsj.test.baseline 83.92 99.07 34.14
wsj.test.both.4.40000 84.34 100.00 61.13

200 sentences

Method Fscore Coverage Speed
wiki.wsj.4.baseline 79.26 99.00 46.69
wiki.wsj.4.both 78.11 99.50 78.97

Tested on a larger set of data (2500 sentences). Speed goes:

  • from 45.6 to 92.8 when using 40000 sentences to train
  • from 46.5 to 92.5 using 200000
  • from 46.6 to 96.6 using 1000000
  • from 46.8 to 91.5 using 1000000

Trained with WSJ, the speed goes from 45.6 to 93.7 after tagging.


  • Beam ratio

Byung Gyu is working on this. A beam ratio is used to eliminate constituents whose probability is less than that ratio times the probability of the best constituent.
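A minimal sketch of this kind of beam pruning is given below. It assumes that the comparison is made against the best constituent in the same chart cell and that a ratio of 0 disables the beam; the function name and the example labels and probabilities are hypothetical.

```python
def apply_beam(cell, beam_ratio):
    """Keep the constituents whose probability is at least
    beam_ratio times that of the best constituent in the cell.

    cell: dict mapping a constituent label to its probability.
    Assumption: beam_ratio <= 0 means the beam is disabled.
    """
    if not cell or beam_ratio <= 0:
        return dict(cell)
    threshold = beam_ratio * max(cell.values())
    return {label: p for label, p in cell.items() if p >= threshold}


# Hypothetical cell contents.
cell = {"NP": 1e-3, "S": 5e-7, "VP": 1e-9}
kept = apply_beam(cell, beam_ratio=1e-4)   # threshold is about 1e-7
```

With these numbers "VP" falls below the threshold and is dropped, while "NP" and "S" are kept.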

Experiments were conducted on WSJ Section 00. The number of sentences is as follows:

Word counts Sentence counts
No limit 1913 (100%)
1-30 1442 (75.38%)
1-40 1784 (93.26%)
1-60 1901 (99.37%)
1-80 1908 (99.74%)

Results show that as the beam ratio increases (that is, as pruning becomes more aggressive), the speed generally increases, but accuracy drops at the same time.

Results for all sentences:

Beam ratio Speed F-score (labeled) Coverage
0 15.4804 86.80 98.69
1e-05 17.8676 86.06 99.32
5e-05 19.2137 86.01 99.27
0.0001 19.0202 85.95 99.27
0.0005 19.6181 85.61 99.22
0.001 19.9333 85.56 99.22
0.005 21.7883 85.10 99.11
0.01 23.8065 85.03 98.59
0.05 22.6318 83.76 96.92
0.1 24.242 83.24 94.09

Results for 1-30 word sentences:

Beam ratio Speed F-score (labeled) Coverage
0 41.5353 87.66 99.58
1e-05 44.7832 87.31 99.51
5e-05 45.2688 87.32 99.45
0.0001 46.9333 87.32 99.45
0.0005 44.2239 87.10 99.31
0.001 46.9176 87.08 99.31
0.005 47.3899 86.61 99.24
0.01 50.2478 86.53 98.61
0.05 49.3484 85.45 97.50
0.1 47.5403 84.86 95.42

Results for 1-40 word sentences:

Beam ratio Speed F-score (labeled) Coverage
0 26.5175 86.98 99.38
1e-05 33.0422 86.52 99.61
5e-05 30.3617 86.44 99.55
0.0001 30.065 86.42 99.55
0.0005 28.218 86.16 99.44
0.001 27.7347 86.17 99.44
0.005 31.7449 85.83 99.27
0.01 32.3019 85.75 98.77
0.05 34.5538 84.65 97.37
0.1 35.6805 84.01 94.79

Results for 1-60 word sentences:

Beam ratio Speed F-score (labeled) Coverage
0 17.3675 86.84 99.16
1e-05 23.454 86.25 99.58
5e-05 26.9272 86.17 99.53
0.0001 26.8184 86.11 99.53
0.0005 27.2245 85.86 99.37
0.001 28.0771 85.84 99.37
0.005 30.4088 85.47 99.16
0.01 28.5134 85.44 98.63
0.05 24.9792 84.21 96.95
0.1 21.6632 83.57 94.21

Results for 1-80 word sentences:

Beam ratio Speed F-score (labeled) Coverage
0 10.2419 86.80 98.95
1e-05 17.9858 86.06 99.58
5e-05 21.5694 86.01 99.53
0.0001 23.9083 85.95 99.53
0.0005 26.5148 85.70 99.37
0.001 25.8473 85.66 99.37
0.005 27.3008 85.31 99.16
0.01 27.3252 85.25 98.64
0.05 27.5279 84.03 96.96
0.1 27.4514 83.43 94.18

Results without --force_words option (which has a true value by default):

Beam ratio Speed F-score (labeled) Coverage
0 18.3145 85.55 100.00
1e-05 19.6152 85.37 100.00
5e-05 20.3056 85.31 100.00
0.0001 20.4203 85.25 100.00
0.0005 21.6801 84.99 100.00
0.001 20.9608 84.94 100.00
0.005 23.1756 84.49 100.00
0.01 24.867 84.18 100.00
0.05 23.0082 81.94 100.00
0.1 23.8857 79.86 100.00

Results for sentences with 1-40 words:

Beam ratio Speed F-score (labeled) Coverage
0 29.1301 86.66 100.00
1e-05 37.3851 86.37 100.00
5e-05 38.8166 86.27 100.00
0.0001 39.5901 86.24 100.00
0.0005 40.8766 85.94 100.00
0.001 41.2198 85.95 100.00
0.005 40.3587 85.51 100.00
0.01 42.2446 85.20 100.00
0.05 42.7934 83.34 100.00
0.1 41.3743 81.37 100.00

Results for all sentences for selective beam search:

Beam ratio Speed F-score (labeled) Coverage Description
0 15.4804 86.80 98.69 Beam disabled
1e-05 17.8676 86.06 99.32 Beam enabled for all sentences
1e-05 17.5669 86.45 99.11 Selective beam for long sentences (41+ words)

Something odd happens with unary rules. Some experimental results follow.

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 19.3461 18.3344 85.55 86.80 100.00 98.69
beam search 19.2946 18.3006 85.37 86.06 100.00 99.32
gold oracle-guided 42.1773 40.4683 91.39 98.85 95.97 85.10
self oracle-guided 47.5441 45.9091 86.02 86.08 99.32 99.22
beam with gold oracle 14.4566 15.7595 86.91 87.31 95.97 95.50
beam with self oracle 21.4434 22.6088 86.09 86.09 99.32 99.32

Normal is the setting without a beam value. Beam search is the setting with a beam value of 0.00001. For the gold and self oracle-guided settings, any constituent that does not match the oracle is pruned. For the beam with gold and self oracle settings, a beam value of 0.00001 is used and less probable constituents are eliminated, except for those constituents that match the oracle.

For settings with an oracle, coverage(fw) represents the ratio of sentences which have an oracle parse and whose oracle parse chart can be regenerated from the pipe format output.
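The difference between the oracle-guided settings and the beam-with-oracle settings can be sketched as follows. The (start, length, label) keys, the function names and the example values are assumptions made for illustration; they do not reflect the parser's internal representation.

```python
def oracle_guided(cell, oracle):
    """Oracle-guided setting: drop every constituent not in the oracle."""
    return {c: p for c, p in cell.items() if c in oracle}


def beam_with_oracle(cell, oracle, beam_ratio=1e-5):
    """Beam-with-oracle setting: apply the beam, but never eliminate
    a constituent that matches the oracle."""
    if not cell:
        return {}
    threshold = beam_ratio * max(cell.values())
    return {c: p for c, p in cell.items() if p >= threshold or c in oracle}


# Hypothetical contents of one cell; keys are (start, length, label).
cell = {
    (0, 2, "NP"): 1e-2,
    (0, 2, "S"): 1e-4,
    (0, 2, "N"): 1e-9,    # improbable, but part of the oracle parse
    (0, 2, "VP"): 1e-10,  # improbable and not in the oracle
}
oracle = {(0, 2, "N")}

guided = oracle_guided(cell, oracle)
beamed = beam_with_oracle(cell, oracle)
```

Here the oracle-guided cell keeps only "N", whereas the beam-with-oracle cell keeps "NP", "S" and "N" and drops only "VP".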

Same setting, second try:

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 19.4838 19.2757 85.55 86.80 100.00 98.69
beam search 20.0884 19.2959 85.37 86.06 100.00 99.32
gold oracle-guided 44.1463 41.0358 91.39 98.85 95.97 85.10
self oracle-guided 48.093 46.2996 86.02 86.08 99.32 99.22
beam with gold oracle 12.5157 13.4216 86.91 87.31 95.97 95.50
beam with self oracle 19.5327 20.7197 86.09 86.09 99.32 99.32

In the two experiments above, the beam is applied before the unary rules, so the cells produced by the unary rules are not pruned. I tried another version that applies the beam after the unary rules, and obtained the following results.

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 19.6607 19.3346 85.55 86.80 100.00 98.69
beam search 18.5759 16.4958 85.95 86.65 100.00 99.32
gold oracle-guided 32.3877 26.4007 91.15 98.86 95.97 84.79
self oracle-guided 30.1072 34.2314 86.02 86.08 99.32 99.22
beam with gold oracle 13.7288 15.8256 86.81 87.27 95.97 95.45
beam with self oracle 18.6477 17.9787 86.57 86.62 99.32 99.27

So the point at which beam search is applied has an impact on the results in terms of speed, F-score, and coverage.

Applying beam search before unary rules application is more sensible, so afterwards we consider only this case.
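The two orderings can be contrasted with a toy cell-filling routine. Everything here is a simplified assumption: unary rules are applied once (no closure), a parent keeps the best score among its derivations, and the rule table and probabilities are made up.

```python
def prune(cell, beam_ratio):
    """Drop entries below beam_ratio times the best probability."""
    if not cell:
        return {}
    threshold = beam_ratio * max(cell.values())
    return {c: p for c, p in cell.items() if p >= threshold}


def apply_unary(cell, unary_rules):
    """Add the parents licensed by unary rules (single pass)."""
    out = dict(cell)
    for child, p in cell.items():
        for parent, rule_p in unary_rules.get(child, []):
            out[parent] = max(out.get(parent, 0.0), p * rule_p)
    return out


def fill_cell(binary_results, unary_rules, beam_ratio, beam_first=True):
    if beam_first:
        # Beam first: the results of unary rules escape pruning.
        return apply_unary(prune(binary_results, beam_ratio), unary_rules)
    # Beam last: the results of unary rules compete in the beam too.
    return prune(apply_unary(binary_results, unary_rules), beam_ratio)


binary_results = {"N": 1e-2, "VP": 1e-9}
unary_rules = {"N": [("NP", 1e-6)]}      # hypothetical low-probability rule

beam_before = fill_cell(binary_results, unary_rules, 1e-5, beam_first=True)
beam_after = fill_cell(binary_results, unary_rules, 1e-5, beam_first=False)
```

In this example the "NP" produced by the unary rule survives only when the beam is applied first, which illustrates why the two orderings can differ in speed, F-score and coverage.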

Beam search is most effective for shorter spans, so we stopped applying the beam to longer spans (21+ words). Here are the results (run twice).

On login:

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 16.6298 16.6547 85.55 86.80 100.00 98.69
beam search 18.1389 18.0394 85.65 86.30 100.00 99.37
gold oracle-guided 29.7831 29.8276 89.19 92.25 95.97 92.05
self oracle-guided 44.0289 43.3253 86.10 86.10 99.32 99.32
beam with gold oracle 18.5215 18.3924 86.89 87.29 95.97 95.50
beam with self oracle 21.5692 21.6244 86.33 86.38 99.32 99.27

On x01:

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 17.3751 17.53 85.55 86.80 100.00 98.69
beam search 18.1917 18.075 85.65 86.30 100.00 99.37
gold oracle-guided 30.4756 29.5378 89.19 92.25 95.97 92.05
self oracle-guided 42.5663 42.8408 86.10 86.10 99.32 99.32
beam with gold oracle 18.6783 18.9849 86.89 87.29 95.97 95.50
beam with self oracle 22.3771 22.4117 86.33 86.38 99.32 99.27

And we tried different formulas to determine the beam value.

0.01 / span

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 17.027 17.3116 85.55 86.80 100.00 98.69
beam search 20.9719 20.9543 85.05 85.79 100.00 99.16

0.1 / span

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 16.9504 17.0326 85.55 86.80 100.00 98.69
beam search 20.066 19.6766 84.64 85.47 100.00 98.90
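The span-dependent settings tried so far (no beam for spans of 21+ words, and a beam value of the form c / span) could be expressed as small helper functions like the ones below. The names, the defaults, and the convention that a value of 0 means "no beam" are assumptions for illustration.

```python
def fixed_beam_with_cutoff(span, beam=1e-5, max_span=20):
    """Use a fixed beam for short spans and no beam (0.0) for spans
    longer than max_span words."""
    return beam if span <= max_span else 0.0


def scaled_beam(span, base=0.01):
    """Beam value that shrinks with the span length: base / span."""
    return base / span


# Beam values for a few span lengths under each scheme.
fixed = [fixed_beam_with_cutoff(s) for s in (1, 20, 21)]
scaled = [scaled_beam(s, base=0.01) for s in (1, 2, 4)]
```

Under the scaled scheme, longer spans get a smaller beam value and are therefore pruned less aggressively.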

fitting (gold oracle)

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 24.5969 24.6609 85.55 86.80 100.00 98.69
beam search 28.412 28.4924 85.35 86.41 100.00 98.95

fitting (self beam)

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 24.5593 24.417 85.55 86.80 100.00 98.69
beam search 30.6583 30.3048 85.05 86.26 100.00 98.64

fitting (self nobeam)

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 24.645 24.6025 85.55 86.80 100.00 98.69
beam search 29.219 28.9936 85.18 86.26 100.00 98.90

We tried using the Viterbi score instead of the inside probability as the score of each constituent.

When the beam value is 0.00001:
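The two scores differ in how they combine the derivations of a constituent: the inside probability sums over them, while the Viterbi score takes only the best one. A toy comparison, with made-up derivation probabilities, shows that the ranking of two constituents can flip:

```python
def inside_score(derivation_probs):
    """Inside probability: sum over all derivations of the constituent."""
    return sum(derivation_probs)


def viterbi_score(derivation_probs):
    """Viterbi score: probability of the single best derivation."""
    return max(derivation_probs)


# Hypothetical constituents: A has several mediocre derivations,
# B has one good derivation.
a = [0.02, 0.02, 0.02]
b = [0.05]

inside_prefers_a = inside_score(a) > inside_score(b)
viterbi_prefers_b = viterbi_score(b) > viterbi_score(a)
```

Because the best-scoring constituent sets the beam threshold, changing the score also changes which constituents fall outside the beam.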

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 24.6679 24.5751 85.55 86.80 100.00 98.69
beam search 16.7912 16.743 84.46 85.53 100.00 98.95

fitting (self viterbi)

Title Speed(fw) Speed(nf) lf(fw) lf(nf) Coverage(fw) Coverage(nf)
normal 24.285 24.4829 85.55 86.80 100.00 98.69
beam search 16.9799 16.8835 84.46 85.53 100.00 98.95

  • Oracles (SC)

The ultimate self-oracle; perhaps the best we can possibly do during the 6 weeks (as far as pruning is concerned):

prune all cells which don't contain a constituent in the final parse (Yue's ultimate oracle); and prune all constituents which don't end up in the final parse (BG's ultimate oracle). So we're essentially just building the final derivation with no ambiguity. Implementing this will require BG's code which reads in the gold-standard derivation. BG and Yue to discuss.
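The combination of the two ultimate oracles can be sketched as follows. The chart layout (a dict from (start, length) cells to sets of labels), the function name and the toy parse are hypothetical; reading the final or gold-standard derivation is assumed to happen elsewhere.

```python
def prune_to_final_parse(chart, final_parse):
    """Keep only the cells, and within them only the constituents,
    that appear in the final parse.

    chart: dict mapping (start, length) to a set of labels.
    final_parse: iterable of (start, length, label) constituents.
    """
    constituents = set(final_parse)
    cells = {(start, length) for start, length, _ in constituents}
    pruned = {}
    for cell, labels in chart.items():
        if cell not in cells:
            continue                      # cell-level oracle
        start, length = cell
        pruned[cell] = {                  # constituent-level oracle
            label for label in labels
            if (start, length, label) in constituents
        }
    return pruned


# Hypothetical 3-word sentence.
final_parse = [(0, 1, "NP"), (1, 1, "V"), (2, 1, "NP"), (1, 2, "VP"), (0, 3, "S")]
chart = {
    (0, 1): {"N", "NP"}, (1, 1): {"V", "N"}, (2, 1): {"NP"},
    (0, 2): {"S"}, (1, 2): {"VP", "NP"}, (0, 3): {"S", "NP"},
}
pruned = prune_to_final_parse(chart, final_parse)
```

After pruning, each remaining cell holds exactly the constituents of the final derivation, so no ambiguity is left.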