I'm interested in identifying frequency/range of word families across approx. 1,500 academic articles. Two questions:
1. I'm hoping to sort the families by GSL 1000, 2000, AWL, and offlist. The program seems to provide this info for the first 3 lists, but it uses types (rather than families) for offlist words. Is there a way for the program to "familize" offlist words? I've considered trying to use the BNC/COCA lists, but there is so much overlap w/ the GSL 1000/2000 + AWL....
2. Approx. how long should it take to run 1,500 files (ranging in size from 100-2000 KB) through the profiler? I've now tried on 3 different computers (for 4 hrs., 6 hrs., and 10 hrs.), and it doesn't get beyond "creating lexical profile". (Note: I used AntFileConverter to convert PDFs to txt, and then EncodeAnt to ensure the files are UTF-8.)
Thanks for any advice!