Short answer: We will check the system reports and the code.
Long answer: In the era of LLMs, leakage is not an easily defined concept.
In the earlier days of ML this was different: leakage mostly referred
to overlaps between training and test samples. Such simple cases can still
occur, e.g., if someone fine-tunes on Wikidata, and this is not allowed
(a minimal overlap check is sketched below).
However, with LLMs there is the general pre-training step, which makes
the issue quite ill-defined. In pre-training, all state-of-the-art LLMs use
large web document collections, which naturally include Wikipedia. So they will
have seen relevant content, and most likely infoboxes too.
We do not penalize this, for two reasons: (i) by extension, any content on the web would be problematic, so one would either have to construct automated evaluation data that appears nowhere on the web (which is impossible), or employ post-hoc human annotation of system results (which, however, means that systems can only be evaluated reasonably a single time); (ii) the focus of the challenge is not to predict topics that have never been reported anywhere, but to see to what degree LLMs distill web content so as to enable KBC in the long tail, i.e., for topics that likely appear here and there in web documents, but not at high frequency.
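For illustration, here is a minimal sketch of the classic train/test overlap check mentioned above; the entity lists and the function name are hypothetical and not part of the challenge's evaluation tooling:

    # Illustrative sketch only (hypothetical entity lists, not the official
    # evaluation tooling): the classic notion of leakage as train/test overlap.
    def overlap_ratio(train_subjects, test_subjects):
        """Fraction of test subjects that also occur in the training data."""
        train_set = {s.strip().lower() for s in train_subjects}
        if not test_subjects:
            return 0.0
        shared = sum(1 for s in test_subjects if s.strip().lower() in train_set)
        return shared / len(test_subjects)

    if __name__ == "__main__":
        train = ["Douglas Adams", "Kyoto", "Nauru"]    # hypothetical fine-tuning subjects
        test = ["Nauru", "Ada Lovelace", "Lake Chad"]  # hypothetical evaluation subjects
        print(f"Test subjects seen in training: {overlap_ratio(train, test):.0%}")

A non-zero ratio alone does not invalidate a submission; it simply flags cases that merit the case-by-case inspection described in the summary below.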
Summary: We (a) will check on a case-by-case basis, (b) put trust in the authors' honest self-reporting of potentially critical issues, and (c) accept that leakage is not a binary yes/no concept, but that there may be gray cases (which do not invalidate work and may make for interesting points of discussion).