How does subgrouping work with different levels of hypothesized similarity?


Laura Murray

Dec 5, 2023, 3:42:17 PM
to gimme-r
Hello! 
My question relates to a new analysis I'm planning. Our data include participants with a lot of heterogeneity: we have some healthy controls, and then a number of subjects with a variety of substance use disorders and levels of addiction severity. I'm wondering how the Walktrap community detection algorithm that S-GIMME uses for subgrouping behaves when the potential groups are more or less different from one another. Ultimately, I'm trying to decide whether to include the controls in the sample or to look only at heterogeneity within the substance users. If I use everyone, will S-GIMME stop after identifying the two most distinct groups (e.g., group 1: controls and group 2: psychopathology) and not dig deeper to differentiate potential additional subgroups of substance users? Would I get a finer-grained subgrouping if I excluded the healthy controls and asked GIMME to identify subgroups within the more homogeneous sample of only substance users?
Thank you! 
Laura


Jonathan J Park

Dec 6, 2023, 1:55:55 PM
to gimme-r
Hi Laura,

This is a fantastic question regarding the nature of the clustering algorithm.
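Walktrap tries to identify clusters of individuals with similar dynamics. It uses short random walks on the similarity network to merge individuals into progressively larger clusters, and the solution that is kept is the one that maximizes modularity. Modularity is, essentially, the observed connectivity among individuals in the same cluster compared to what would be expected in a random graph in which every node keeps the same degree as in the total network.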
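For reference, the modularity being maximized can be written out. This is the standard Newman-Girvan form, given here as a sketch under the assumption that the nodes are individuals and that A_ij is the (possibly weighted) similarity between individuals i and j; the symbols are not taken from the gimme documentation.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\qquad k_i = \sum_j A_{ij},
\qquad m = \frac{1}{2} \sum_{i,j} A_{ij}
```

Here c_i is the cluster of individual i and \delta(c_i, c_j) is 1 when i and j share a cluster and 0 otherwise. The term k_i k_j / (2m) is the connectivity expected between i and j in a random graph with the same degrees, so Q is large when within-cluster connectivity exceeds that expectation.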

Fortunato has previously done work on the resolution limit of modularity maximization techniques. It indicates that as the size of the total network increases, these kinds of algorithms have more and more difficulty distinguishing groups below a particular size. Depending on your N, though, I don't think this is a problem you will run into.
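To make the resolution limit concrete, here is a minimal Python sketch of the textbook "ring of cliques" illustration. It is not S-GIMME or Walktrap itself: the toy graph, the function names (`ring_of_cliques`, `modularity`, `compare`), and the clique size of 5 are all assumptions chosen for illustration. It only compares the modularity of two fixed partitions, one that keeps every small group separate and one that merges neighbouring groups in pairs.

```python
from itertools import combinations


def ring_of_cliques(n_cliques, clique_size):
    """Edge list for n_cliques fully connected groups joined in a ring.

    Hypothetical toy graph (assumes n_cliques >= 3 and clique_size >= 2).
    """
    edges = []
    for c in range(n_cliques):
        first = c * clique_size
        nodes = range(first, first + clique_size)
        edges.extend(combinations(nodes, 2))  # all within-group edges
        # single bridge from the last node of this group to the first of the next
        next_first = ((c + 1) % n_cliques) * clique_size
        edges.append((first + clique_size - 1, next_first))
    return edges


def modularity(edges, membership):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}    # edges with both ends inside a cluster
    degree_sum = {}  # summed degree of the nodes in a cluster
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


def compare(n_cliques, clique_size=5):
    """Return (Q with every group separate, Q with neighbouring groups paired)."""
    edges = ring_of_cliques(n_cliques, clique_size)
    n_nodes = n_cliques * clique_size
    separate = [node // clique_size for node in range(n_nodes)]
    paired = [(node // clique_size) // 2 for node in range(n_nodes)]
    return modularity(edges, separate), modularity(edges, paired)


if __name__ == "__main__":
    for n in (6, 30):
        q_separate, q_paired = compare(n)
        print(n, round(q_separate, 4), round(q_paired, 4))
    # 6 groups:  separate 0.7424, paired 0.6212 (small groups are preferred)
    # 30 groups: separate 0.8758, paired 0.8879 (merged groups are preferred)
```

With 6 groups the separate partition scores higher, but with 30 groups of the same size the merged partition wins, even though the groups themselves have not changed. That is the sense in which a larger total network can hide small subgroups.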

I think the choice between running the models on the full sample and running them on the specific sample is more a matter of your interests and the interpretability of the results.
For instance, do you want a targeted focus on individuals with substance use disorders and the "clusters" of individuals within that group, or do you want to see whether those with substance use differ in their dynamics relative to controls?

Hope this helps.

Best,
Jonathan J. Park

Laura Murray

Dec 7, 2023, 5:12:29 PM
to gimme-r
Hi Jonathan, 
Thank you for this response! The sample will probably include around 100-200 people who endorse using substances and 100-200 who do not (aka "controls"). Do you think that is large enough to avoid the difficulty in distinguishing groups? My overall hypothesis is that some "controls" may have dynamics that place them in a group with mostly substance users, and similarly that some with SUD may have dynamics that place them in a subgroup with mainly controls. I also think that some dynamics will make those who use substances more similar to one another, and that there will be some clusters within those who use substances. Maybe it's silly, but I made a diagram of the point I'm trying to get at. Will S-GIMME be able to identify ALL of these potential groups (not just subgroup on the BIG hand-drawn circles, but also pick out the smaller subgroups within them)? If it will struggle to do this, then maybe it would be best for me to focus on the substance-use-only sample.

Many thanks for any guidance you can provide! 
Best, 
Laura
[Attachment: Screenshot 2023-12-07 at 4.27.46 PM.png]

Katie Gates

Dec 8, 2023, 11:15:29 AM
to gimme-r
Hi Laura and Jonathan, 

Thanks for the great insights, Jonathan! I agree that there might be a resolution problem, wherein you sometimes get "supergroups" that subsume smaller ones. In simulation studies for S-GIMME, Walktrap has been shown to do a pretty good job of separating out even small subgroups, especially when the sample size is as large as yours. But it all depends on how different the people in one subgroup are from those in other subgroups, and on how homogeneous the within-subgroup patterns are.

The drawing really helps. Correct me if I'm wrong, but it seems like you might have two research questions, Laura: (1) Do those with an SUD dx generally cluster together, and how do their networks differ from the networks of those who don't have one? and (2) Within those with an SUD dx, are there meaningful subgroups?

To me it seems like it might make sense to propose this plan of analysis: (1) S-GIMME with the entire combined group and (2) S-GIMME with just those who have an SUD dx. The known problem with super clusters justifies this plan of action.

What do you think?  
Katie

Laura Murray

Dec 11, 2023, 4:30:53 PM
to gimme-r
Thank you so much for the additional info. Yes, I think a staged approach makes the most sense given the data and potential for super clusters. Thanks for helping me understand!