Time-Series Lookahead

TrueRank

Apr 2, 2022, 5:39:05 PM
to RecSys-Challenge-2022
The rules specify:
  • Do not use data from the test sessions in model training, use test session data only in the prediction step

Will submissions be restricted to:
a) accessing only item information (e.g. item popularity) from timestamps prior to the date (or last timestamp) of each session?
b) the same as a), but using only the leaderboard set or only the final set, respectively, not both?
c) not using *any* leaderboard or final set data?

Nick Landia

Apr 4, 2022, 10:56:37 AM
to RecSys-Challenge-2022
Hey,

short answer: it's c).

As background information for everyone, this page has details about the training/test split: http://www.recsyschallenge.com/2022/dataset.html

For the rule mentioned, it means that when you train the model you should use only the data from the training file; don't use any data from the leaderboard or final test set files. When predicting, treat each test session independently of all other test sessions, i.e. when predicting for test session B, the model should not have any knowledge of test session A, even if A came before it in terms of timestamp.
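
To make the independence requirement concrete, here is a minimal sketch, not an official reference implementation. It assumes a toy popularity model and made-up data; the names `fit_popularity`, `predict_session`, `train_purchases`, `test_sessions` and `candidates` are hypothetical and do not come from the challenge files. The point is only the data flow: the model is fit on training data alone, and each test session is scored using nothing but its own items plus that trained model.

```python
from collections import Counter


def fit_popularity(train_purchases):
    """Count purchases per item using ONLY training data (hypothetical format:
    a list of (session_id, item_id) pairs)."""
    return Counter(item for _, item in train_purchases)


def predict_session(session_items, popularity, candidates, k=3):
    """Rank candidates for ONE test session.

    The only inputs are this session's own items and the training-derived
    popularity counts; no other test session is visible here.
    """
    seen = set(session_items)
    ranked = sorted(
        (c for c in candidates if c not in seen),
        key=lambda item: (-popularity[item], item),  # most popular first, ties by id
    )
    return ranked[:k]


# Toy data, invented for illustration only.
train_purchases = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")]
test_sessions = {"A": ["a", "d"], "B": ["b"]}
candidates = ["a", "b", "c", "d"]

popularity = fit_popularity(train_purchases)

# Each session is predicted on its own; the popularity counts are never
# updated with anything observed in the test sessions.
predictions = {
    sid: predict_session(items, popularity, candidates)
    for sid, items in test_sessions.items()
}
```

Under this structure, the prediction for session B is the same whether or not session A exists, which is the property described above. Updating `popularity` from test sessions, or re-ranking B's output using A's items, would break it.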

jiwei liu

May 19, 2022, 3:52:56 PM
to RecSys-Challenge-2022
Thank you for the clarification. Could you please update the rule with your message above? It is broader than what the rule says. The rule literally only forbids training with test data, but what you said here also forbids test-time augmentation using other sessions and post-processing of predictions using other sessions in the test data. I'm not sure everyone in the competition sees this thread. It should be stated clearly and officially as part of the rules on the competition web page.

Best regards,

Nick Landia

May 20, 2022, 10:39:54 AM
to RecSys-Challenge-2022
yes, good shout