Adding metadata retrieval to corpus readers

Andrew P

May 22, 2013, 4:44:07 PM

to nltk...@googlegroups.com

A friend of mine noticed that the BNC reader has no way of accessing the BNC metadata, which in many ways is probably the real value of the BNC in the first place. I'm going to write a new version of the reader for him, and was wondering whether there are guidelines on dependencies (lxml?), style, etc. that I should follow if I want to contribute it when I'm done. Are there other corpus readers that could use similar treatment? I probably don't want to fuss with the base XML corpus reader class, but if I'm going to write etree-traversing code for one corpus, I may as well do it for a couple while I'm in that headspace.
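
To make the idea concrete, here is a minimal sketch of the kind of etree traversal this would involve. It uses only the standard library's xml.etree.ElementTree, so it needs no lxml dependency. The sample document is a hypothetical, heavily trimmed stand-in for a BNC-style TEI header, and read_metadata is an illustrative name, not an existing NLTK function.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for a BNC-style document with a TEI header.
# Real files carry many more fields; the element names here are assumptions.
SAMPLE = """<bncDoc xml:id="A00">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample title</title>
      </titleStmt>
    </fileDesc>
    <profileDesc>
      <textClass>
        <catRef targets="WRI ALLTIM1"/>
        <classCode scheme="DLEE">W nonAc</classCode>
      </textClass>
    </profileDesc>
  </teiHeader>
  <wtext type="NONAC"/>
</bncDoc>"""

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def read_metadata(root):
    """Collect a few header fields from a parsed document into a dict."""
    header = root.find("teiHeader")
    if header is None:
        return {}
    title = header.findtext("fileDesc/titleStmt/title", default="")
    cat_ref = header.find("profileDesc/textClass/catRef")
    return {
        "id": root.get(XML_ID),
        "title": title.strip(),
        # catRef packs several category codes into one space-separated attribute.
        "categories": cat_ref.get("targets", "").split() if cat_ref is not None else [],
        # Map each classification scheme to its code text.
        "class_codes": {
            code.get("scheme"): (code.text or "").strip()
            for code in header.iter("classCode")
        },
    }


metadata = read_metadata(ET.fromstring(SAMPLE))
```

Returning a plain dict keeps the sketch independent of any reader class; a real reader would presumably expose the same fields per file ID.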

Cheers,

Andrew
