Why does the T5 baseline for ToTTo not match the result in <<Text-to-Text Pre-Training for Data-to-Text Tasks>>?

Knight Zhang

Mar 9, 2021, 7:54:19 AM
to gem-benchmark
Hello everyone,

I have a question about the paper <<Text-to-Text Pre-Training for Data-to-Text Tasks>>, which used T5-3B on the ToTTo dataset and achieved a BLEU score of 49.5. In GEM, however, the BLEU score is 42.2, which is much lower, and even lower than the BERT-to-BERT model used in the original paper <<ToTTo: A Controlled Table-To-Text Generation Dataset>>.

Can anyone tell me the reason? Is GEM using a smaller T5 model, or does it use the whole table as input instead of only the highlighted part of the table?

Thanks a lot!
