Fixing an Inconsistency Between Baseline Output, Check Script, and Guidelines

Cody Buntain

Sep 9, 2022, 12:23:42 PM
to TREC-IS
Hi, all.

I just found an inconsistency between the GitHub code and the CrisisFACTS guidelines. The guidelines say run files should be newline-delimited JSON, but the baseline produces a JSON file containing a single array.

To resolve this, I sided with the guidelines and have updated both the check script and the baseline to produce newline-delimited JSON rather than an array.

If you already have a JSON-array version of your submission, you can quickly convert it to newline-delimited JSON with the following snippet:

import json
import pandas as pd

# Write each row of the JSON array as one JSON object per line
with open("revised_submission.json", "w") as out_file:
    for idx, row in pd.read_json("orig_submission.json").iterrows():
        out_file.write("%s\n" % json.dumps(dict(row)))
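If you would rather not depend on pandas, the same conversion can be sketched with only the standard library, which also sidesteps any type conversion pandas may apply when reading the file. The function names `array_to_ndjson` and `is_ndjson` are hypothetical helpers for illustration; they are not part of the CrisisFACTS check script, and `is_ndjson` is only a rough sanity check, not a replacement for it.

```python
import json


def array_to_ndjson(src_path, dst_path):
    """Convert a file holding one JSON array into newline-delimited JSON.

    Returns the number of records written.
    """
    with open(src_path, "r", encoding="utf-8") as in_file:
        records = json.load(in_file)  # the whole array is loaded into memory
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array in %s" % src_path)
    with open(dst_path, "w", encoding="utf-8") as out_file:
        for record in records:
            out_file.write(json.dumps(record) + "\n")
    return len(records)


def is_ndjson(path):
    """Return True if every non-empty line parses as a JSON object."""
    with open(path, "r", encoding="utf-8") as in_file:
        lines = [line for line in in_file if line.strip()]
    try:
        return bool(lines) and all(
            isinstance(json.loads(line), dict) for line in lines
        )
    except json.JSONDecodeError:
        return False
```

For example, `array_to_ndjson("orig_submission.json", "revised_submission.json")` writes the converted file, and `is_ndjson("revised_submission.json")` should then return True.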

There was also a bug in the baseline code, where I wasn't normalizing the importance scores into [0,1]. That's fixed too.
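For anyone whose own system emits unbounded importance scores, here is a minimal sketch of one common way to map them into [0,1], namely min-max scaling. This is an assumption for illustration: the post does not say which normalization the baseline uses, and `min_max_normalize` is a hypothetical helper, not a function from the baseline code.

```python
def min_max_normalize(scores):
    """Rescale a list of numeric scores into the range [0, 1]."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so there is no spread to rescale.
        # Mapping them to 0.0 is an arbitrary choice made here.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

Calling `min_max_normalize([2, 4, 6])` returns `[0.0, 0.5, 1.0]`. Note that the result depends on the set of scores passed in, so whether to normalize per run, per event, or per day is a design choice you would need to make yourself.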

Sorry for the inconvenience!

-Cody