Hi John,
I apologize for the delayed response.
We have been using the BioNLP-ST format since 2011 and have contemplated using other formats. We considered the CoNLL format but ruled it out for several reasons:
- we have never provided, do not provide, and probably never will provide gold tokenization or gold sentence splitting
- the BIO model is impractical for representing discontinuous and overlapping entities, and these phenomena cannot be neglected in the BB corpus
- relations and normalization are awkward to represent due to the lack of annotation identifiers
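
To illustrate the second point, here is a minimal sketch in Python. It assumes a standoff line of the form "ID<TAB>Type start end;start end<TAB>text", with discontinuous fragments separated by semicolons. The example sentence, offsets, tokenization, and the helper names parse_text_bound and bio_tags are all hypothetical and are not taken from the corpus or from any tool we distribute.

```python
def parse_text_bound(line):
    """Parse one standoff line, e.g. 'T1\\tHabitat 0 5;17 26\\thuman intestine'."""
    ann_id, type_and_spans, surface = line.split("\t")
    ann_type, span_str = type_and_spans.split(" ", 1)
    # Each fragment is a (start, end) character offset pair.
    fragments = [tuple(int(x) for x in frag.split()) for frag in span_str.split(";")]
    return ann_id, ann_type, fragments, surface


def bio_tags(tokens, annotations):
    """Collect every BIO tag each token would need (tokens are (start, end) pairs)."""
    tags = [[] for _ in tokens]
    for _ann_id, ann_type, fragments, _surface in annotations:
        first = True
        for i, (tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the entity if it overlaps any fragment.
            if any(tok_start < f_end and f_start < tok_end for f_start, f_end in fragments):
                tags[i].append(("B-" if first else "I-") + ann_type)
                first = False
    return tags


# Hypothetical sentence: "human and bovine intestine"
tokens = [(0, 5), (6, 9), (10, 16), (17, 26)]  # non-gold tokenization
annotations = [
    parse_text_bound("T1\tHabitat 0 5;17 26\thuman intestine"),    # discontinuous
    parse_text_bound("T2\tHabitat 10 26\tbovine intestine"),       # overlaps T1
]
tags = bio_tags(tokens, annotations)
for (start, end), token_tags in zip(tokens, tags):
    print(start, end, token_tags or ["O"])
```

In this sketch the last token would need two tags at once, and nothing in a single BIO column says which entity each I- tag continues, whereas the standoff lines keep both entities and their identifiers intact.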
We do provide tokenization and sentence splitting on the Web site (in the "Supporting Resources" section). It is not gold, though: it was computed with a domain-specific set of patterns.
If you succeed in converting the data to CoNLL, and if you agree, then I think it may be a valuable resource to share with the other participants.
Best regards,
Robert