I ran the code again without changing anything, except that I used one GPU and one CPU. I got a final accuracy of 61.9%, whereas the paper reports 63.8%. I am using different versions of torchvision and allennlp; could that be why my result differs? However, when I download your
best.th and evaluate it, I get 64.0%.