KLD4C: New book coming out soon


Stefan Th. Gries

Jun 3, 2024, 11:36:54 AM
to CorpLing with R
With apologies for cross-posting and self-advertising: Keep your eyes open for my new book,
  • whose website you can find here;
  • whose resources website you can find here;
  • which is expected for October (I just sent off the corrections for the second proofs and the index).
Here is some more info about the book (which kind of complements both QCLWR2 and SFLWR3):

1. The book's blurb (from JB's website above) is this:

This book is an attempt to revisit the main specifically corpus-linguistic statistics/measures the field has been relying on for decades: frequency, dispersion, association, and keyness. The book first discusses the purpose of these measures and how they have been measured. Then, the book makes three main proposals: First, that many measures of dispersion, association, and keyness are too confounded with frequency and how to 'take frequency out of them' to obtain conceptually cleaner and more interpretable measures. Second, that many existing measures can be replaced by the simple information-theoretic measure of the Kullback-Leibler divergence and that it, too, can have frequency 'removed' from it. Third, that corpus linguistics should abandon the tradition of trying to describe its findings with a single number and adopt a tupleization approach instead, where we use several separate dimensions of information for description and interpretation. The book is written in an informal, hands-on style [much like SFLWR3] and comes with its own R package featuring functions, example data, and several thousand lines of code exemplifying all applications.
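To make the second proposal a bit more concrete, here is a minimal sketch of the Kullback-Leibler divergence used as a dispersion measure. It is an illustration only, not code from the book or its R package: the function names, the four-part toy corpus, and the counts are all made up, and it assumes the divergence is computed between a word's distribution across corpus parts and the parts' shares of the corpus.

```python
import math


def proportions(counts):
    """Turn raw counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """D_KL(p || q) in bits; terms with p_i == 0 contribute 0 by convention."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy corpus with four equally sized parts (assumption).
part_sizes = [1000, 1000, 1000, 1000]
# Two hypothetical words with the same overall frequency (20 tokens each).
even_word = [5, 5, 5, 5]      # spread evenly over the parts
clumpy_word = [20, 0, 0, 0]   # concentrated in one part

expected = proportions(part_sizes)
even_kld = kl_divergence(proportions(even_word), expected)
clumpy_kld = kl_divergence(proportions(clumpy_word), expected)

print(f"even word:   {even_kld:.3f} bits")
print(f"clumpy word: {clumpy_kld:.3f} bits")
```

Both toy words have the same frequency, but the evenly spread one gets a divergence of 0 while the clumpy one gets a larger value, which is the kind of information a frequency count alone does not carry.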

2. One bit from the introduction summarizing the book fairly well:

Corpus-linguistic statistics such as frequencies, dispersion, association, and keyness measures have been around for decades and have given rise to a huge number of often insightful results. Nevertheless, in a way and without an intent to oversell or self-aggrandize, the purpose of this book is nothing less than an attempt to, minimally, (re)think out loud what nearly all of us have been doing over the last few decades (and, again, that includes earlier work of mine!) and to, maximally, revolutionize our (applications of) corpus-linguistic statistics. We will see – in the sense of ‘time will tell’ – how far on this continuum I will be able to push things with you, but I at least hope to expose both beginners and long-time corpus users to a lot of food for thought and to get more discussion started regarding what our corpus statistics actually do and represent. I certainly would have loved to know all of what’s in this book 10?, 15? years ago, and I would have loved even more to not have been trying to learn, unlearn, re-learn, and figure everything out on my own but to have had a book that at least guides my thinking about everything (even if time were to reveal that this book, too, is probably not always right either). As someone with quantitative interests but actually no quantitative training at all, I sometimes wonder how much better and/or more productive I could have been had I known all this earlier … To give you just some examples of what I mean by that: I will try and make readers rethink ideas that have been widely held and used for studies such as
  • corpus frequencies are the best way to measure ‘commonness’;
  • dispersion measures measure dispersion;
  • range is a good dispersion measure that complements frequency counts very well;
  • the log-likelihood value G^2^ is a good association or keyness measure;
  • association measures such as the log-likelihood value G^2^ or p~Fisher-Yates~ also indicate whether a collocation is significant;
  • bi-directional association measures mostly reflect how two things are mutually associated, or associated with each other;
  • collocations are ‘interesting’ when, say, their (P)MI-score exceeds 3;
  • more generally, rank-ordering collocations by any such score is unproblematic;
  • to capture interesting low- and high-frequency collocations we should use PMI and the t-score at the same time.
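As a rough illustration of why the last few points deserve a second look, the sketch below computes PMI and the t-score from their usual textbook definitions (PMI = log2(O/E), t = (O − E)/√O, with E = f1·f2/N). It is not taken from the book; the function names are hypothetical and the counts are invented toy values.

```python
import math


def expected_cooccurrence(f1, f2, n):
    """Expected co-occurrence frequency under independence."""
    return f1 * f2 / n


def pmi(observed, f1, f2, n):
    """Pointwise mutual information: log2 of observed over expected."""
    return math.log2(observed / expected_cooccurrence(f1, f2, n))


def t_score(observed, f1, f2, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    return (observed - expected_cooccurrence(f1, f2, n)) / math.sqrt(observed)


N = 1_000_000  # hypothetical corpus size

# Invented counts: (co-occurrence frequency, frequency of word 1, frequency of word 2)
rare_pair = (5, 10, 10)
frequent_pair = (1000, 10_000, 10_000)

for label, (o, f1, f2) in [("rare", rare_pair), ("frequent", frequent_pair)]:
    print(f"{label:9s} PMI = {pmi(o, f1, f2, N):6.2f}   t = {t_score(o, f1, f2, N):6.2f}")
```

With these toy counts the rare pair gets the higher PMI and the frequent pair the higher t-score, so the two rankings largely track frequency in opposite directions.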
Obviously, you all need to buy it for yourself, your students, and your universities, and I'm sure it's also a great Christmas, Mother's, and Father's Day gift, if not even the perfect present for all your friends and family! ;-)

Stefan Th. Gries

Jul 9, 2024, 7:46:35 AM
to CorpLing with R
Last self-advertisement: the new book is out, so buy copies for yourself and all your friends and family members!