I am trying to reproduce and extend in Factorie some experiments that I originally implemented with Mallet/GRMM, and I observe quite large differences in accuracy depending on how an instance is defined. I had the same problem with Mallet and GRMM, but I would now like to ask this group for advice or an explanation of the discrepancy.
The task is the common segmentation of references with a linear-chain CRF. If an instance/sentence corresponds to one reference, I get a lower accuracy than if an instance/sentence corresponds to all references of a reference section (separated by a default token with an O label). Both experiments rely on the same data and features; the second dataset is generated from the first one simply by concatenating the instances.
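To make the construction of the second dataset concrete, here is a minimal sketch in Python (not the Factorie code used for the experiments). The names `join_references` and `SEPARATOR`, the `<SEP>` surface form, and the example labels are all hypothetical; the only detail taken from the setup above is that references are joined with a default token labelled O.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, label)

# Hypothetical separator: the description only says "a default token with an O label".
SEPARATOR: Token = ("<SEP>", "O")


def join_references(references: List[List[Token]],
                    separator: Token = SEPARATOR) -> List[Token]:
    """Concatenate per-reference sequences into one section-level sequence,
    placing one separator token between consecutive references."""
    section: List[Token] = []
    for i, reference in enumerate(references):
        if i > 0:
            section.append(separator)  # boundary between two references
        section.extend(reference)
    return section


# Two toy references with made-up labels, joined into one "section" instance.
refs = [
    [("Smith", "AUTHOR"), ("2001", "DATE")],
    [("Jones", "AUTHOR"), ("Nature", "JOURNAL")],
]
section = join_references(refs)
print(section)
```

In this sketch a section of n references gains n - 1 separator tokens, so the section-level data contains more tokens than the reference-level data. That matches the direction of the word totals in the logs (16923 vs. 16497 for training, 5004 vs. 4887 for testing).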
I can imagine many reasons why the results are not exactly the same, but not why the difference is this large: an increase in accuracy of approx. 3 percentage points, which amounts to roughly a 50% error reduction.
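For reference, the two figures can be derived from the final test accuracies reported in the logs. The following is only a small arithmetic sketch; the helper name `error_reduction` is made up for illustration.

```python
def error_reduction(acc_baseline: float, acc_new: float) -> float:
    """Relative reduction of the error rate when going from acc_baseline to acc_new."""
    err_baseline = 1.0 - acc_baseline
    err_new = 1.0 - acc_new
    return (err_baseline - err_new) / err_baseline


# Final test accuracies copied from the two logs.
acc_reference = 0.9339062819725803  # instance/sentence = reference
acc_section = 0.9685092886270956    # instance/sentence = section

absolute_gain = acc_section - acc_reference
relative_reduction = error_reduction(acc_reference, acc_section)

print(f"absolute gain: {absolute_gain * 100:.2f} percentage points")
print(f"relative error reduction: {relative_reduction * 100:.1f}%")
```

With these numbers the absolute gain is about 3.5 percentage points and the relative error reduction is roughly 52%.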
Any help/explanation is greatly appreciated.
Here is some logging output for a linear-chain CRF (it uses the example in the tutorial package):
Instance/sentence = Reference
Loaded 444 sentences with 16497 words total from ...
Loaded 122 sentences with 4887 words total from ...
Using 31731 observable features.
Iteration 1
Train accuracy = 0.9406558768260896
Test accuracy = 0.8817270308983016
Iteration 2
Train accuracy = 0.9680547978420319
Test accuracy = 0.9183548189073051
Iteration 3
Train accuracy = 0.9949687822028248
Test accuracy = 0.9318600368324125
Iteration 4
Train accuracy = 0.9990301266897011
Test accuracy = 0.9347247800286475
Iteration 5
Train accuracy = 0.9996362975086379
Test accuracy = 0.9349294045426643
Final Test accuracy = 0.9339062819725803
Finished in 25.379 seconds
MaxBP Test accuracy = 0.9339062819725803
SumBP Test accuracy = 0.9359525271127481
Gibbs Test accuracy = 0.9349294045426643
Instance/sentence = Section
Loaded 17 sentences with 16923 words total from ...
Loaded 4 sentences with 5004 words total from ...
Using 31731 observable features.
Iteration 1
Train accuracy = 0.936676450384063
Test accuracy = 0.927956502038967
Iteration 2
Train accuracy = 0.9916942484231562
Test accuracy = 0.9546896239238786
Iteration 3
Train accuracy = 0.9979391744207831
Test accuracy = 0.9642048028998641
Iteration 4
Train accuracy = 0.9997502029600949
Test accuracy = 0.9676030811055731
Iteration 5
Train accuracy = 1.0
Test accuracy = 0.9680561848663344
Final Test accuracy = 0.9685092886270956
Finished in 24.962 seconds
MaxBP Test accuracy = 0.9685092886270956
SumBP Test accuracy = 0.9651110104213865
Gibbs Test accuracy = 0.9644313547802447