This is perhaps a naive question, but is it possible to replace the current set of three reference lists with a subdivided list that offers substantially more coverage?
I am trying to work with some literary texts in order to quantify literary style or register. The two texts that I am currently interested in are "The Short Happy Life of Francis Macomber" by Ernest Hemingway and "Winter Dreams" by F. Scott Fitzgerald.
I have a briefcase containing the BNC_COCA_25000 on my desktop, but I am not sure how to make AntWordProfiler access it. To be honest, I also cannot find where on my computer the current three reference lists are stored (1_gsl_1st_1000, 2_gsl_2nd_1000 and 3_awl_570).
Obviously, I would then need to slice up the BNC_COCA_25000. Ideally, I think slicing the list into nine subdivisions would be great. But that might not be possible, of course.
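To make the request concrete, here is a minimal Python sketch of the kind of slicing I have in mind. It rests on assumptions I have not verified: that the BNC_COCA_25000 arrives as 25 plain-text files of 1,000 word families each, that they are named something like basewrd1.txt to basewrd25.txt, and that simply concatenating neighbouring files gives a usable list. The output file names are invented for illustration. Since 25 does not divide evenly by 9, the sketch makes the first seven groups three bands wide and the last two groups two bands wide.

```python
from pathlib import Path


def group_bands(n_bands=25, n_groups=9):
    """Split band numbers 1..n_bands into n_groups contiguous, near-equal groups."""
    base, extra = divmod(n_bands, n_groups)
    groups, start = [], 1
    for i in range(n_groups):
        # The first `extra` groups take one additional band each.
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def merge_band_files(folder, groups, pattern="basewrd{}.txt"):
    """Concatenate the band files of each group into one file per group.

    `pattern` and the output names are assumptions, not AntWordProfiler
    conventions; adjust them to match the files actually on disk.
    """
    folder = Path(folder)
    for i, bands in enumerate(groups, start=1):
        parts = []
        for band in bands:
            text = (folder / pattern.format(band)).read_text(encoding="utf-8")
            # Make sure one file's last line does not run into the next file.
            parts.append(text if text.endswith("\n") else text + "\n")
        out_path = folder / f"{i}_bnc_coca_group.txt"
        out_path.write_text("".join(parts), encoding="utf-8")


if __name__ == "__main__":
    for number, bands in enumerate(group_bands(), start=1):
        print(number, bands)
```

Whether the nine merged files can then be loaded as reference lists in the same way as the three default lists is exactly what I am unsure about.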
On a related note, it would be amazing if a future version of AntWordProfiler could graph the distribution of the words after they have been sorted against a large corpus-based list such as the BNC_COCA. My strong impression is that the graphs would describe a variety of curves that obey Zipf's Law, and allow us to see, for example, that Fitzgerald accesses a much greater range of rarer vocabulary items than Hemingway does.
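As a rough illustration of the numbers such a graph would need, the following Python sketch computes two things for a text: its rank-frequency pairs (the raw material for a Zipf plot) and a count of tokens per frequency band. The tokenizer is deliberately crude, the word-to-band dictionary is a hypothetical stand-in for a lookup built from the BNC_COCA lists, and band 0 is my own label for words found in no list. None of this reflects how AntWordProfiler works internally.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words, allowing one apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rank_frequency(text):
    """Return (rank, frequency) pairs, with the most frequent word at rank 1."""
    counts = Counter(tokenize(text))
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def band_distribution(tokens, word_to_band):
    """Count tokens per band; band 0 collects words not found in any list."""
    return Counter(word_to_band.get(token, 0) for token in tokens)


if __name__ == "__main__":
    sample = "The cat saw the ocelot"
    # Hypothetical lookup; a real one would be built from the band files.
    bands = {"the": 1, "cat": 1, "saw": 1}
    print(rank_frequency(sample))
    print(band_distribution(tokenize(sample), bands))
```

Plotting the rank-frequency pairs on log-log axes should give a roughly straight line if Zipf's Law holds, and comparing the per-band counts for the two stories would be one way to test my impression about Fitzgerald and Hemingway.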
At any rate, as I slowly become more familiar with AntWordProfiler, I can see we are once again deeply in your debt, Professor. The programs you have set up are a major advance on coding in R or RStudio for a whole range of tasks!
All best wishes,
Terence Murphy
Full Professor of Rhetoric and Composition
Yonsei University
Seoul, KOREA