A friend of mine noticed that the BNC reader doesn't provide any way to access the BNC metadata, which is arguably where much of the real value of the BNC corpus lies in the first place. I'm going to write a new version of the reader for him, and I was wondering whether there are any guidelines on dependencies (lxml?), style, etc., that I should follow if I want to contribute it when I'm done.

Are there other corpus readers that could use similar treatment? I probably don't want to fuss with the base XML corpus reader class, but if I'm going to write some etree-traversing code for one corpus, I may as well do it for a couple more while I'm in that headspace.
Cheers,
Andrew