About Dataset

Ömer Koçbil

May 11, 2021, 1:05:56 PM

to mmsys21-grandcha...@googlegroups.com

Hi,

I am Ömer. I am taking part in the MMSys cheapfakes detection competition, and I have some questions about the data.

1) I looked at the train, test, and validation JSON files. The test JSON file has a "context_label" field, from which I can tell whether a picture and its news caption are fake or real. However, the train and validation JSON files have no "context_label". Aren't we supposed to train our model on this feature? Could you give more detail about this?

2) I examined some of the news items in the test JSON. Some of the fake items have pictures and captions that are related. Why do we call them fake?

I believe there is a lack of information about the features and outputs. I am in touch with many other participants and they have the same problem. If you could share a document describing the dataset, the features, and the model's inputs and outputs, everything would be much clearer for the participants.

Thank you for your time.

Sincerely,

Ömer Koçbil

shivangi.tum

May 11, 2021, 2:07:29 PM

to MMSys'21 Grand Challenge on Detecting Cheapfakes

Hi,

1) If you refer to the paper, we propose a self-supervised training strategy, where we do not require the so-called "context_label" that you see in the test file. The core objective during training should be to improve the match/scoring between the image and its associated captions, i.e. correct grounding of objects in the image with the captions. This grounding would ultimately help you achieve the end goal, i.e. detecting out-of-context image use. This is explained in detail in Sections 4.2 and 4.3 of the main paper. For a quick explanation about the exact training and test pipeline, refer to the video posted here.

2) I assume that by "fake" you are referring to out-of-context image-caption pairs. We don't use the label "fake" because, in principle, the news caption could be true on its own, but associating it with an unrelated image would still be classified as out-of-context. Regarding your comment "Some of the fake news have related picture and captions.": this is exactly what out-of-context use of images is. It looks as though the news caption is about the image it is linked with in the dataset, but in reality the image is evidence of a totally different event.
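To make point 1 a bit more concrete, here is a minimal sketch of how a match score can be trained without any "context_label". It assumes a margin-based ranking objective, in which an image's own caption should outscore a caption borrowed from a different image. That choice of loss is an assumption made for illustration, as is every name below (`margin_ranking_loss`, `batch_loss`, `margin`); the scores would come from whatever image-caption scoring model you build. Please treat Sections 4.2 and 4.3 of the paper as the reference for the actual objective.

```python
def margin_ranking_loss(match_score: float, mismatch_score: float,
                        margin: float = 1.0) -> float:
    """Hinge-style loss for one image (illustrative assumption, not the paper's code).

    match_score:    score of the image with its own caption
    mismatch_score: score of the same image with a caption taken from another image
    The loss is zero once the matching caption wins by at least `margin`.
    """
    return max(0.0, margin - match_score + mismatch_score)


def batch_loss(score_pairs, margin: float = 1.0) -> float:
    """Average the loss over (match_score, mismatch_score) pairs."""
    pairs = list(score_pairs)
    return sum(margin_ranking_loss(m, r, margin) for m, r in pairs) / len(pairs)
```

Note that no out-of-context label appears anywhere in this objective: the supervision comes only from knowing which caption was published with which image. The "context_label" in the test file is then needed only to evaluate the final out-of-context predictions.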

A readme about the model inputs and outputs will be added to this GitHub repo soon for your reference.

Btw, we will also be releasing source code for the original paper at this GitHub repo in the coming week, in case you are interested.

Regards,

MMSys2021 Cheapfakes Challenge Team
