> I am using 2012 corpus for some work and noticed that the KBAScore.py
> script does not check for duplicate docids.
IIRC, the validation script that runs when uploading official run
submissions rejects submissions containing duplicate stream_ids, so the
scoring script did not need to handle this. We might add this enhancement
in the next rev of the KBAScore tool.
See here for info on duplicate doc_ids and stream_ids:
https://groups.google.com/d/msg/streamcorpus/Bsd1XF-aLpY/UqXg1irNQMUJ
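
In the meantime, if you want to check a run file yourself before scoring,
something like the rough sketch below might do. It is not part of
KBAScore.py. The function name find_duplicate_keys and the column layout are
my assumptions: it treats the file as whitespace-separated text, skips lines
starting with "#", and takes the stream_id from the third column (index 2).
Adjust key_columns to match your files.

```python
from collections import Counter


def find_duplicate_keys(lines, key_columns=(2,)):
    """Return {key: count} for every key that occurs more than once.

    Assumptions (not taken from KBAScore.py): fields are separated by
    whitespace, lines starting with '#' are comments, and the stream_id
    sits in column index 2. Pass other indices if your layout differs.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip rows too short to contain every key column.
        if len(fields) <= max(key_columns):
            continue
        key = tuple(fields[i] for i in key_columns)
        counts[key] += 1
    # Keep only the keys seen more than once.
    return {key: n for key, n in counts.items() if n > 1}
```

You can pass it an open file handle, e.g. find_duplicate_keys(open(path)).
If the same document may legitimately appear once per target entity in your
run, key on both columns instead, e.g. key_columns=(2, 3), again assuming
the entity name is in the fourth column.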
jrf