Hello organizing team,
To add to the list of issues participants are raising (and we are at day 4 of the contest when folks still cannot see their scores or where they stand on the overall leaderboard!), please make sure, while you are ironing out the technical glitches, that the portal is tested (both functionally and under load) against the points below. These are the obvious ones that come to mind; I am sure there are more. Please treat these as constructive feedback - I know you all have been working hard to resolve the glitches, provide a streamlined experience for everyone, and make the contest a fun-filled learning experience:
Point # 1. On the functional side, please verify that files are indeed picked up for execution once a participant has made at least one submission, and that failed executions return error messages that are a little more descriptive. Descriptive errors alone would cut down the many support queries constantly raised on this forum in cases where participants know for a fact that the mistake is at their end. It would be a much better user experience if participants, once they have uploaded their code, could see the status on the portal right then and there (with the last upload timestamp, when the latest code was last executed, and so on; a rough sketch of such a status record follows), but I presume that is not something that can be built in the near term while the contest is underway.
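As a rough illustration of what I mean by "status", even a per-submission record as small as the one below, surfaced on the portal, would answer most of the questions people post here. Every field name in this sketch is my own guess, not anything the portal exposes today:

```python
# Hypothetical per-submission status record - all field names are
# illustrative, not part of the actual portal.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class SubmissionStatus:
    participant_id: str
    upload_timestamp: datetime             # when the latest file was received
    last_executed_at: Optional[datetime]   # None if not yet picked up
    state: str                             # e.g. "queued", "running", "succeeded", "failed"
    error_message: Optional[str] = None    # descriptive reason when state == "failed"
```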
Point # 2. Please ensure that the latest code (provided it was submitted before the prescribed deadline, a few hours before the match begins) is picked up for execution, and not a previous version. I have a hunch (and will be glad to be proven wrong) that this will be the next broad category of system issues participants begin reporting, along the lines of: "the portal shows 15 as the absolute error for the last match, but if I run my latest code (which, by the way, was also uploaded to the portal in time) on my local machine, I get 25 as the absolute error". A sketch of the selection rule I mean follows.
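To be concrete about that selection rule, here is a minimal sketch, assuming each submission carries an upload timestamp (the function and field names are mine, purely for illustration):

```python
from datetime import datetime
from typing import Optional

def pick_submission(submissions: list[dict], cutoff: datetime) -> Optional[dict]:
    """Return the most recent submission uploaded strictly before the cutoff.

    Each submission is assumed to carry an 'uploaded_at' datetime. Anything
    uploaded after the cutoff is ignored - but an older eligible version must
    never silently win over a newer one that also made the deadline.
    """
    eligible = [s for s in submissions if s["uploaded_at"] < cutoff]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s["uploaded_at"])
```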
Point # 3. The 20 second execution limit can be a problem going forward - how are you distinguishing between these two very different cases?
Case a: The code really did execute for the full 20 seconds and simply does not finish within the allocated time (the training step will be the longest part, for sure). Then yes, there is a point in flagging those executions as genuine "time out" cases.
-versus-
Case b: The execution environment itself slows down (for whatever reason - resource contention, locks, etc.) and a run that would otherwise have taken a mere 5 seconds (for example) could not finish in the allocated time, through no fault of the code.
Can the portal distinguish between the two? If not, a significant number of time-out cases will have to be looked at on a case-by-case basis. If it can, then for case (b) there should (I'd say) be some automated retry built in; I sketch one possible way to tell the cases apart below.

Another option (and I am just thinking out loud here): could a serialized version of the model be run instead, say a pickle file or whatever other option is available within the list of software versions allowed in the contest? That should significantly reduce the running time on the portal side. The actual training code (feature engineering plus training) could be another method in the Model class, kept aside for manual review purposes and not actually executed by the runtime; see the second sketch below. I know this is far from ideal (you need the full code for a later review, not just the serialized model), but if that is the constraint, then the system should at least be able to distinguish a genuine time out (caused by the program) from a "false positive" time out (caused not by the code, but by intermittent issues in the execution environment).
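On telling case (a) from case (b): one simple heuristic (and I am only guessing at how the runner is built, so treat this strictly as a sketch; it is Unix-only because of the resource module) is to compare the CPU time the submission actually consumed against the wall-clock time. Code that genuinely computes for 20 seconds burns CPU the whole way, while a run starved by the environment shows a large gap between the two, and that gap is a reasonable trigger for the automated retry:

```python
import resource
import subprocess
import time

WALL_LIMIT = 20.0        # the contest's stated execution limit, in seconds
STARVATION_RATIO = 0.5   # illustrative threshold, not an official value

def classify_run(cmd: list[str]) -> str:
    """Run one submission and label how it ended.

    Returns "ok", "genuine_timeout" (case a: the code itself burned the full
    budget), or "environment_timeout" (case b: the wall clock expired while
    the process got little CPU - a candidate for an automated retry).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=WALL_LIMIT)
        return "ok"
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the child so its CPU time is accounted for
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    if cpu / wall < STARVATION_RATIO:
        return "environment_timeout"  # case b: retry instead of flagging
    return "genuine_timeout"          # case a: a real time out
```

The 0.5 threshold is of course arbitrary; the point is only that the signal (CPU time versus wall time) exists and is cheap to record alongside each run.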
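And for the serialized-model idea, a minimal sketch of what I have in mind, assuming the portal instantiates a Model class and calls a predict-style entry point (the exact class shape and method names are my assumptions, not the contest's actual interface):

```python
import pickle

class Model:
    """Sketch of a submission class: a fast inference path for the portal,
    with the training code kept in its own method for manual review only."""

    def __init__(self, model_path: str = "model.pkl"):
        # Load the pre-trained, serialized model. Deserialization plus
        # prediction is all the 20-second budget would need to cover.
        with open(model_path, "rb") as f:
            self._model = pickle.load(f)

    def predict(self, features):
        # No feature engineering or fitting at run time; the pickled
        # pipeline is assumed to encapsulate any preprocessing.
        return self._model.predict(features)

    def train(self, raw_data):
        # The full feature engineering + training code would live here so
        # reviewers can read it, but the portal runtime never calls it.
        ...
```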