Sorry about the issues with DART. The file in the original repository was changed, which means that the examples don't match up anymore. This is something we cannot easily fix, but I am currently looking into the issue. CommonGen and ToTTo have private test sets, so we unfortunately cannot distribute any information about them.
For your system description, are you able to get scores on the validation set? If not, please open an issue on the metrics repository and we will try to help or run the evaluation for you.