Hi Leon,
I just found an inconsistency between the formats of the training data and the test data.
For example, in the training data, "@" and "#" are not split from the tokens they are attached to, whereas in the test data each "@" and "#" is treated as a separate token.
I believe the dataset would be more useful for future work if this inconsistency were either eliminated or documented publicly.
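In case it is useful, here is a minimal sketch of how the training tokens might be brought in line with the test format. It assumes each sentence is a plain list of token strings and that the symbol is always the first character of the token. The function name split_prefix_symbols is my own and not part of the dataset tooling, and any per-token labels would need to be realigned separately.

```python
def split_prefix_symbols(tokens, symbols=("@", "#")):
    """Split a leading "@" or "#" off each token so it becomes its own token.

    Hypothetical helper: assumes `tokens` is a list of strings for one sentence.
    Tokens that consist of the symbol alone are left unchanged.
    """
    out = []
    for tok in tokens:
        if len(tok) > 1 and tok[0] in symbols:
            out.append(tok[0])   # the symbol as a separate token (test-data style)
            out.append(tok[1:])  # the remainder of the original token
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    # Training-style input with the symbols still attached.
    print(split_prefix_symbols(["@bill", "loves", "#nlp"]))
```

Going the other way (merging the separate symbols in the test data back onto the following token) would work just as well. The main point is that both splits end up using the same convention.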
Thanks and regards,
Bill