We've just asked this question to Rui, who is the developer of that EDA. His answer is below:
==================================================================================
Originally, the RTE data set was created from different sources of data. One source is information extraction (IE) tasks: the hypothesis (H) is constructed from named entities and their relations, and such T-H pairs are annotated with "task=IE". The same holds for the other tasks (IR, QA, etc.). For more details, you may check the data preparation section of the original RTE challenge paper.
As for the performance difference: when training the model, the task (if available) is used as one feature. We found that T-H pairs from different task sources have different characteristics (e.g., there are more named entities in the IE pairs). If you train the model with this feature, then whether you provide the "task" attribute at test time will affect the result; if your test data do not have this information, I would suggest removing "task" from the training set to obtain a general-purpose model.
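If your test data lack the "task" attribute, here is a minimal sketch of how one might strip it from the training file before building such a general-purpose model. It assumes the standard RTE challenge XML layout with <pair> elements carrying a task attribute (attribute names vary slightly across RTE editions, and the file names are placeholders):

```python
# Sketch: remove the "task" attribute from an RTE-style XML training file,
# assuming pairs of the (approximate) form:
#   <pair id="1" entailment="YES" task="IE">
#     <t>Text sentence ...</t>
#     <h>Hypothesis sentence ...</h>
#   </pair>
import xml.etree.ElementTree as ET

def strip_task_attribute(in_path: str, out_path: str) -> None:
    tree = ET.parse(in_path)
    for pair in tree.getroot().iter("pair"):
        # Drop the task-source annotation if present.
        pair.attrib.pop("task", None)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    strip_task_attribute("rte_train.xml", "rte_train_no_task.xml")
```

Retraining on the stripped file means the model never sees the task feature, so its predictions no longer depend on whether test pairs carry that annotation.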