Nick Landia
Jun 30, 2022, 5:54:26 PM
to RecSys-Challenge-2022
Dear Participants,
Looking through the code of the submissions, we have found several cases where test set data was mistakenly used as an input to model training. This is against the rules, which state:
- When you train the model, only use the data from the training file. Do not use any data from the "leaderboard" or "final" test files.
- When predicting, treat each test session independently of all other test sessions (i.e., when predicting for test session B, the model should not have any knowledge of test session A, even if session A came before it in timestamp order).
However, from the code we can see that this is a very easy mistake to make, and we don't want to disqualify many teams because of this oversight. Instead, we are re-opening submissions for about a week. They will open in the next 24 hours and will remain open until 2022-07-06 23:59:59 PST. Please double-check whether your submission uses the test data in a way that is not allowed, and resubmit corrected predictions and code.
Some more information: in almost all of the cases we have come across, the mistake was in a feature-engineering step. Teams would concatenate the training, leaderboard_test and final_test sessions together and do feature engineering on all of the data. This leaks information: the test data influences which features and values are generated. The resulting features are then used in the model training step. That step only uses the training sessions directly, but the feature engineering has already looked at all of the test data, and the model receives this information via the features. This is partly why we decided to re-open submissions rather than disqualify teams directly: all of the cases we have encountered look like honest mistakes rather than deliberate attempts to break the rules, and we felt it is in the spirit of the competition to allow teams to correct this.
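To illustrate the pattern described above, here is a minimal pandas sketch. The toy data and the "item_id"/"session_id" column names are assumptions for illustration; only the file roles (train vs. test sessions) come from the challenge. The first feature is leaky because the test sessions contribute to it; the second is computed from training sessions only.

```python
import pandas as pd

# Toy stand-ins for the session files (columns are hypothetical).
train = pd.DataFrame({"session_id": [1, 1, 2], "item_id": [10, 11, 10]})
leaderboard_test = pd.DataFrame({"session_id": [3], "item_id": [10]})
final_test = pd.DataFrame({"session_id": [4], "item_id": [11]})

# LEAKY: item popularity computed over train + test sessions concatenated.
# The test data has influenced the feature values before training begins.
all_sessions = pd.concat([train, leaderboard_test, final_test])
leaky_popularity = all_sessions["item_id"].value_counts()

# ALLOWED: the same feature computed from the training sessions only.
clean_popularity = train["item_id"].value_counts()
```

Even though the model-fitting step would later use only the training sessions, a model fed `leaky_popularity` has indirectly seen the test data.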
Thanks,
Nick
PS: To be absolutely clear about what data can be used, from all the data files available:
the model can use all data from:
train_purchases.csv
train_sessions.csv
item_features.csv
candidate_items.csv
the model is asked to predict for each session independently; information from one session must not influence any other session:
test_final_sessions.csv
not used at all when generating the "final" prediction file:
test_leaderboard_sessions.csv
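The allowed data flow can be sketched as follows: fit on training data only, then score each test session in isolation, with no state carried over between sessions. The popularity-based recommender and the toy frames here are hypothetical; the point is only the shape of the loop.

```python
import pandas as pd

# Toy stand-ins for train_sessions.csv and test_final_sessions.csv
# (columns are assumptions for illustration).
train_sessions = pd.DataFrame({"session_id": [1, 1, 2], "item_id": [10, 11, 10]})
test_final_sessions = pd.DataFrame({"session_id": [3, 4], "item_id": [11, 10]})

# Fit: a global item-popularity ranking built from training sessions only.
popularity = train_sessions["item_id"].value_counts()

predictions = {}
for session_id, session in test_final_sessions.groupby("session_id"):
    # Each test session is handled independently: the only inputs are the
    # fitted model (popularity) and this session's own items.
    seen = set(session["item_id"])
    predictions[session_id] = [i for i in popularity.index if i not in seen]
```

Any statistic that aggregates across multiple test sessions (e.g., item counts over all of test_final_sessions) would violate the independence rule above.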