Leaderboard Updates


Michael Majurski

Sep 22, 2022, 11:16:33 AM
to trojai-community
We have now enabled the multi-leaderboard website: https://pages.nist.gov/trojai

This release makes it possible to submit containers to multiple leaderboards and datasets. Going forward, we will no longer use the naming convention round#, but instead the format '<task name>-<date>'. To submit to a round, you will need to name your container to target both the leaderboard you are submitting to and the dataset. For instance, "object-detection-aug2022_test_my-awesome-container.simg" would submit to the object-detection-aug2022 leaderboard and the test dataset. The specific submission requirements are listed on the website next to each dataset name.
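As a sketch of how a container filename maps to a leaderboard and dataset under the new convention (the exact parsing the test harness performs may differ; `parse_submission_name` is a hypothetical helper, and the assumption here is that underscores separate the three parts):

```python
def parse_submission_name(filename: str):
    """Split a container filename of the assumed form
    '<leaderboard>_<data split>_<container name>.simg'
    into its three parts."""
    stem = filename.removesuffix(".simg")
    # Split on the first two underscores only, since the container
    # name itself may contain underscores.
    leaderboard, split, container = stem.split("_", 2)
    return leaderboard, split, container

print(parse_submission_name("object-detection-aug2022_test_my-awesome-container.simg"))
# → ('object-detection-aug2022', 'test', 'my-awesome-container')
```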

As we spin up new leaderboards, we will keep older leaderboards online so that users can still submit to them; however, the new leaderboard should take priority and be the focus of the round.

There are a couple of extra features that also come with this release.

First, after you finish executing on the test dataset, we will automatically submit your container to the train dataset if you have not already run that exact container (modulo a renaming) on the train split of the leaderboard. The idea is to track progress on both datasets, which gives us useful metrics when analyzing overfitting. This auto-launch is a convenience so that you don't need to submit to both manually if you only want test results. We expect the typical pattern to be: a team develops and submits several containers to train, and once they are happy with a container's performance, they rename it and submit it to test.

Second, we have made it possible to view older leaderboards that have been retired. This keeps a history of all prior rounds we have explored. Currently we are only showing round2 (renamed to "image-classification-aug2020"). Archived leaderboards are for visualization purposes only, so you cannot submit containers to them. We are in the process of converting all prior rounds, so eventually all of them will be added.

Lastly, we have created a mechanism to share additional result data with actors. We will use it to compute metrics on your submissions, such as plots, and then share them with you on Google Drive. The framework for this is in place, so expect new plots to be coming soon. We'll send another Slack announcement when any new metrics are added. Here is a link to the metric class: https://github.com/usnistgov/trojai-test-harness/blob/multi-round/leaderboards/metrics.py. If you have any suggestions/requests, feel free to let us know, and we can work with you to implement them. The main idea is to provide useful meta-analysis of your container execution without exposing too much information.

I expect issues to come up, so submit to the new structure when you can, and if you find any issues, post them in the test-server channel or email the TrojAI T&E team at tro...@nist.gov.

Also, note another 'feature': from now on we will highlight performer submission timestamps in the jobs tables. If your submission is old (>1 week), it will be colored orange, then red once it becomes very old (>2 weeks). This helps us track who is actively submitting to a round. If you have no entry in the submission timestamp column of the jobs tables, that indicates to us that you have never submitted to that leaderboard. Green means you have submitted within the past week.
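The color coding above can be sketched as a simple age threshold check (a hypothetical helper, not the harness's actual implementation):

```python
from datetime import datetime, timedelta

def submission_color(last_submission: datetime, now: datetime) -> str:
    """Map a submission timestamp to the highlight color:
    green within 1 week, orange after 1 week, red after 2 weeks."""
    age = now - last_submission
    if age > timedelta(weeks=2):
        return "red"
    if age > timedelta(weeks=1):
        return "orange"
    return "green"
```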

With the new leaderboard updates, pay attention to this new error:
Shared File Error - Team has an issue with one or more of the shared files.
"Format" indicates incorrect file name, should be "leaderboard_name-data_split_name".
"Leaderboard name" indicates invalid leaderboard name.
"Data split name" indicates invalid data split name."

This is not a failure; it just indicates there are files you are sharing with the TrojAI Google Drive account which do not map to any leaderboard.

Just to be clear, any message in the "General Status" field is a warning for you and will not prevent job launches. It is purely to let you know that you have shared files formatted in an unexpected way. If you remove those files from being shared with TrojAI, the message will quickly resolve.
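A rough illustration of how a shared file name could map to those three error cases (this mirrors the error messages above but is only a sketch; `check_shared_file` is a hypothetical helper, the leaderboard names are illustrative examples drawn from the round list below, and splitting on the last hyphen is an assumption):

```python
from typing import Optional

# Illustrative examples only; the real set is whatever leaderboards are live.
VALID_LEADERBOARDS = {"object-detection-jul2022", "image-classification-sep2022"}
VALID_SPLITS = {"train", "test", "holdout"}

def check_shared_file(name: str) -> Optional[str]:
    """Return which Shared File Error applies to a shared file name of the
    assumed form 'leaderboard_name-data_split_name', or None if it is valid."""
    # Leaderboard names contain hyphens, so split on the last hyphen only.
    leaderboard, sep, split = name.rpartition("-")
    if not sep:
        return "Format"           # not of the form leaderboard_name-data_split_name
    if leaderboard not in VALID_LEADERBOARDS:
        return "Leaderboard name"
    if split not in VALID_SPLITS:
        return "Data split name"
    return None

print(check_shared_file("image-classification-sep2022-train"))  # → None
```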

Finally, with the switch to the new round naming convention, we have gone through and renamed the past rounds. This name update should already be reflected for files on Google Drive, and data.nist.gov will be updated as soon as the system is back online.

Below I have included a full list of the name translation for those who are interested.

The TrojAI Team

Round1 Train = image-classification-jun2020-train
Round1 Test = image-classification-jun2020-test
Round1 Holdout = image-classification-jun2020-holdout

Round2 Train = image-classification-aug2020-train
Round2 Test = image-classification-aug2020-test
Round2 Holdout = image-classification-aug2020-holdout

Round3 Train = image-classification-dec2020-train
Round3 Test = image-classification-dec2020-test
Round3 Holdout = image-classification-dec2020-holdout

Round4 Train = image-classification-feb2021-train
Round4 Test = image-classification-feb2021-test
Round4 Holdout = image-classification-feb2021-holdout

Round5 Train = nlp-sentiment-classification-mar2021-train
Round5 Test = nlp-sentiment-classification-mar2021-test
Round5 Holdout = nlp-sentiment-classification-mar2021-holdout

Round6 Train = nlp-sentiment-classification-apr2021-train
Round6 Test = nlp-sentiment-classification-apr2021-test
Round6 Holdout = nlp-sentiment-classification-apr2021-holdout

Round7 Train = nlp-named-entity-recognition-may2021-train
Round7 Test = nlp-named-entity-recognition-may2021-test
Round7 Holdout = nlp-named-entity-recognition-may2021-holdout

Round8 Train = nlp-question-answering-sep2021-train
Round8 Test = nlp-question-answering-sep2021-test
Round8 Holdout = nlp-question-answering-sep2021-holdout

Round9 Train = nlp-summary-jan2022-train
Round9 Test = nlp-summary-jan2022-test
Round9 Holdout = nlp-summary-jan2022-holdout

Round10 Train = object-detection-jul2022-train
Round10 Test = object-detection-jul2022-test
Round10 Holdout = object-detection-jul2022-holdout

Round11 Train = image-classification-sep2022-train
Round11 Test = image-classification-sep2022-test
Round11 Holdout = image-classification-sep2022-holdout