Adding metadata retrieval to corpus readers


Andrew P

May 22, 2013, 4:44:07 PM
to nltk...@googlegroups.com
A friend of mine noticed that the BNC reader has no way of accessing the BNC metadata, which in many ways is probably the real value of the BNC in the first place. I'm going to write a new version of the reader for him, and was wondering whether there are guidelines on dependencies (lxml?), style, etc. that I should adhere to if I want to contribute it when I'm done. Are there other corpus readers that could use similar treatment? I probably don't want to fuss with the base XML corpus reader class, but if I'm going to write etree-traversing code for one corpus, I may as well do it for a couple while I'm in that headspace.
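
To make the question concrete, here is a rough sketch of the kind of etree-traversing code I have in mind. It uses only the standard library's xml.etree.ElementTree, so it sidesteps the lxml question for now. The sample document, the element path under teiHeader, and the function name read_header_metadata are all illustrative assumptions, not the real BNC schema or an existing NLTK API.

```python
import io
import xml.etree.ElementTree as ET

# Attribute key ElementTree uses for xml:id (the xml prefix is predefined).
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily simplified TEI-style document; real corpus files are far richer.
SAMPLE = """<bncDoc xml:id="X00">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title> Sample title </title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <wtext type="FICTION">
    <s n="1"><w>Hello</w></s>
  </wtext>
</bncDoc>"""


def read_header_metadata(source):
    """Return a dict of header fields, stopping as soon as the header is parsed.

    `source` is a filename or a binary file object. Parsing incrementally
    means the (much larger) text body is never read.
    """
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem  # the first "start" event is always the root element
        if event == "end" and elem.tag == "teiHeader":
            title = elem.findtext("fileDesc/titleStmt/title") or ""
            return {"id": root.get(XML_ID), "title": title.strip()}
    return {}  # no header element found


if __name__ == "__main__":
    print(read_header_metadata(io.BytesIO(SAMPLE.encode("utf-8"))))
```

A reader method built this way could return one such dict per file ID, which would keep the metadata access separate from the existing word and sentence views.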


Cheers,
Andrew