Clustering algorithm

Никита Мамаев

Feb 28, 2018, 10:40:13 AM

to computationalstylistics

First of all, thank you for this software, it's extremely helpful.

I was wondering if I could get a description of how the clustering algorithm works? I could've missed the segment in the documentation, I'm sorry if that's the case.

Joanna Byszuk

Feb 28, 2018, 11:30:59 AM

to Никита Мамаев, computationalstylistics

Hi,

the information on clustering algorithms in stylo is in section 8 of the HowTo document: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxjb21wdXRhdGlvbmFsc3R5bGlzdGljc3xneDpmM2U3OGUzZTM2YjkyYzM (or in section 4 of the stylo PDF available at https://sites.google.com/site/computationalstylistics/stylo).

The clustering algorithms used in stylo are among the most popular ones. The default is Ward's method; the other implemented options are:

"nj" (neighbor joining), "single" (single linkage), "complete" (complete linkage), "average" (average linkage), "mcquitty" (McQuitty's method), "median", and "centroid". Good descriptions of each method are easy to find online. You can change the clustering algorithm by calling stylo with the appropriate parameter, e.g.: stylo(linkage = "nj")
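To give a rough idea of what the linkage choice changes, here is a minimal sketch in base R. It does not show stylo's internals: the frequency table, the text names and the word columns are made up for illustration, and it assumes only the standard `dist()`, `hclust()` and `cutree()` functions from the `stats` package. The distance table is computed once; only the rule for merging clusters differs between the two trees.

```r
# Hypothetical toy data: 4 texts x 3 word frequencies (not real measurements).
freqs <- matrix(
  c(5.1, 3.0, 1.2,
    5.0, 3.1, 1.1,
    2.0, 4.5, 3.9,
    2.1, 4.4, 4.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A_1", "A_2", "B_1", "B_2"), c("the", "of", "and"))
)

# Step 1: pairwise distances between texts (Manhattan distance here).
d <- dist(freqs, method = "manhattan")

# Step 2: agglomerative clustering. Each text starts as its own cluster and
# the two closest clusters are merged repeatedly; the linkage method defines
# what "closest" means for clusters that contain more than one text.
tree_ward     <- hclust(d, method = "ward.D")
tree_complete <- hclust(d, method = "complete")

# Step 3: cut the Ward tree into two groups and inspect the assignment.
groups <- cutree(tree_ward, k = 2)
print(groups)
```

Calling `plot(tree_ward)` and `plot(tree_complete)` draws the two dendrograms so they can be compared side by side.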

If you are looking for a general introduction to how clustering algorithms work, there are many good ones online, some more math-oriented and others covering just the basic concepts and methods; Stanford's is fairly good.

Best,

Joanna Byszuk

Никита Мамаев

Feb 28, 2018, 2:38:47 PM

to computationalstylistics

Thank you for the fast and detailed answer, Joanna!

I'd like to ask one more question: I'm currently writing a research paper on a stylometric problem. How can I justify using Ward's clustering method for text analysis data, and more specifically with derivatives of the Delta measure? Is there an article I can cite showing that this method is particularly effective?

Maciej Eder

Feb 28, 2018, 2:59:06 PM

to computationalstylistics

Hi Nikita,

your question is not an easy one, because there have been no systematic studies that convincingly settle this problem. I discuss the choice of the linkage algorithm (or rather, the lack of any justification for it) here, pp. 51-52:

Eder, M. (2017). Visualization in stylometry: cluster analysis using networks. Digital Scholarship in the Humanities, 32(1): 50–64. <https://academic.oup.com/dsh/article-abstract/32/1/50/2957386>, a pre-print version of the paper is here.

The choice of the distance measure is even more complex. I think the Würzburg group will say more on that, since they've done some interesting comparisons. If they stay silent, search for "Cosine Delta". If nothing turns up, please refer to my very preliminary study, keeping in mind that progress has been made since then:

Eder, M. (2015). Taking stylometry to the limits: benchmark study on 5,281 texts from “Patrologia Latina”, <http://dh2015.org/abstracts/>
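For readers who have not met the term: Cosine Delta is usually described as the cosine distance between z-scored word-frequency vectors, whereas classic Burrows's Delta is the mean absolute difference of the same z-scores. The sketch below follows that common description in base R only. The frequency table is hypothetical, and this is not necessarily how stylo or the studies cited above compute the measures.

```r
# Hypothetical toy data: 4 texts x 3 word frequencies (not real measurements).
freqs <- matrix(
  c(5.1, 3.0, 1.2,
    5.0, 3.1, 1.1,
    2.0, 4.5, 3.9,
    2.1, 4.4, 4.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A_1", "A_2", "B_1", "B_2"), c("the", "of", "and"))
)

# z-score every word (column) across the corpus.
z <- scale(freqs)

# Burrows's Delta: mean absolute difference of z-scores between two texts.
burrows_delta <- as.matrix(dist(z, method = "manhattan")) / ncol(z)

# Cosine Delta: 1 minus the cosine similarity of the z-score vectors.
norms <- sqrt(rowSums(z^2))
cosine_delta <- 1 - (z %*% t(z)) / (norms %o% norms)

print(round(burrows_delta, 3))
print(round(cosine_delta, 3))
```

Either matrix can be passed through `as.dist()` to `hclust()` to build a dendrogram.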

All the best,

Maciej

Никита Мамаев

Mar 1, 2018, 10:25:30 AM

to computationalstylistics

Many thanks for sharing your studies, Mr. Eder!

I presume you referred to this article by Peter W.H. Smith and W. Aldridge, right?
