Joining clusters


Magali

May 16, 2012, 5:12:48 AM
to WordSmith Tools, natassi...@uclouvain.be
Dear Mike,

I’ve been using the ‘joining clusters’ option in WST5 recently and I
thought it was great to see related clusters together (and read fewer
of them :-) ). However, there is one major reason why this option is
still not ideal for research, I think: the total frequency of joined
clusters consists of an addition of all frequencies:

Example for ‘joining smaller ones to largest’:

  Head cluster:     Est identique à celui
  Total shown:      308
  Joined clusters:  est identique à celui [15], à celui [293]

= The total frequency is incorrect (308 instead of 15) and
dramatically inflated


Example for ‘joining larger ones to smallest’:

  Head cluster:     Sur les
  Total shown:      3,801
  Joined clusters:  sur les [3774], à distance sur les [27]

= The total frequency of ‘sur les’ is incorrect (should still be 3774)
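
To make the arithmetic in the two examples explicit, here is a minimal Python sketch. It is not WordSmith's actual code: the function name `joined_total` is hypothetical, and it only reproduces the figures quoted above. It assumes that the displayed total is the head cluster's own frequency plus the frequency of every joined member.

```python
def joined_total(head_freq, member_freqs):
    """Total as displayed after joining: head frequency plus all member frequencies."""
    return head_freq + sum(member_freqs)

# 'joining smaller ones to largest':
# head 'est identique à celui' [15], joined member 'à celui' [293]
print(joined_total(15, [293]))    # 308

# 'joining larger ones to smallest':
# head 'sur les' [3774], joined member 'à distance sur les' [27]
print(joined_total(3774, [27]))   # 3801
```

Every occurrence of a longer cluster necessarily contains the shorter one, so if both figures are raw counts from the same texts, the 27 occurrences of 'à distance sur les' are already among the 3774 of 'sur les'. Adding them again counts those occurrences twice.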


Have you had some feedback on the joining clusters option?

Would you consider revising the option to re-compute the frequency of
clusters so as to avoid inflated frequencies? That would be most
useful! Or perhaps this is something you already implemented in WST6?


I guess that the ultimate tool would be able to recompute frequencies
of smaller clusters (so as not to count things twice – not to count
occurrences that are already counted in larger bundles).
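
One way such a recomputation could work is sketched below in Python. This is only an illustration of the idea, not how WordSmith computes anything: the function names, the toy token list and the rule that a shorter cluster is skipped whenever it lies inside an occurrence of a longer cluster from the list are all assumptions.

```python
def find_occurrences(tokens, cluster):
    """Return the start positions where `cluster` (a tuple of words) occurs in `tokens`."""
    n = len(cluster)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == cluster]

def exclusive_counts(tokens, clusters):
    """Count each cluster, ignoring occurrences already covered by a longer cluster."""
    counts = {}
    spans = []  # (start, end) of every occurrence of the clusters processed so far
    # Longest clusters first, so their spans are known when shorter ones are counted.
    for cluster in sorted(set(clusters), key=len, reverse=True):
        n = len(cluster)
        found = [(start, start + n) for start in find_occurrences(tokens, cluster)]
        kept = [
            (start, end) for start, end in found
            if not any(s <= start and end <= e and (e - s) > n for s, e in spans)
        ]
        counts[cluster] = len(kept)
        spans.extend(found)
    return counts

# Toy text (invented for illustration): 'sur les' occurs three times,
# once inside 'à distance sur les'.
tokens = "à distance sur les routes et sur les ponts sur les toits".split()
clusters = [("sur", "les"), ("à", "distance", "sur", "les")]
print(exclusive_counts(tokens, clusters))
```

In this toy text the raw count of 'sur les' is 3, but only 2 occurrences are left once the one inside 'à distance sur les' is set aside.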


Many thanks for your help,

Magali Paquot


Mike Scott

May 16, 2012, 5:18:19 AM
to WordSmith Tools
Dear All

Magali said:

> of them :-) ). However, there is one major reason why this option is
> still not ideal for research, I think: the total frequency of joined
> clusters consists of an addition of all frequencies:

> Have you had some feedback on the joining clusters option?
>
> Would you consider revising the option to re-compute the frequency of
> clusters so as to avoid inflated frequencies? That would be most
> useful! Or perhaps this is something you already implemented in WST6?
>
> I guess that the ultimate tool would be able to recompute frequencies
> of smaller clusters (so as not to count things twice – not to count
> occurrences that are already counted in larger bundles).

No, as far as I recall I haven't had any feedback on this issue since
it was implemented in WS5. I agree with you that things are a bit
confusing.

I guess if you have clusters like this

P Q R S
R S
P Q R

there would be two different questions: A) what to show and B) what
frequencies to give.

A) SHOWING
Would you want them joined and shown so that
a "head-cluster" RS shows PQRS as a member
and
a different "head-cluster" PQR shows PQRS as a member too,
but PQRS disappears from the list of head-clusters when scanning down
(a more economical list)?
Or should PQRS stay in the list, even though it also appears in the RS
and PQR cluster displays (a much longer list)?
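
The two display options can be compared with a small Python sketch. It is purely illustrative and not taken from WordSmith: the function names and the `hide_members` switch are hypothetical, and "member" is assumed to mean a longer cluster that contains the head as a contiguous run of words.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )

def join_to_smallest(clusters, hide_members=True):
    """Pair each head-cluster with the longer clusters that contain it.

    hide_members=True  -> clusters shown as a member no longer appear as heads
    hide_members=False -> every cluster also stays in the list of heads
    """
    members = {c: [m for m in clusters if contains(m, c)] for c in clusters}
    absorbed = {m for ms in members.values() for m in ms}
    heads = [c for c in clusters if not (hide_members and c in absorbed)]
    return [(head, members[head]) for head in heads]

clusters = [("P", "Q", "R", "S"), ("R", "S"), ("P", "Q", "R")]

for hide in (True, False):
    print("hide_members =", hide)
    for head, joined in join_to_smallest(clusters, hide_members=hide):
        print("  ", " ".join(head), "<-", [" ".join(m) for m in joined])
```

With `hide_members=True` the list has two heads (R S and P Q R), each showing P Q R S as a member. With `hide_members=False` P Q R S is also listed as a head of its own, so the list has three entries.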

B) FREQUENCIES
These as you have seen are at present computed by adding the total of
the members to the total of the head-cluster. I think that is probably
the best way (and matches what happens to single-word lemmas), and the
real problem is in A) showing.

Perhaps you and other users could advise me as to what is best?

Best -- Mike
