Discretize variables

Pedro Henrique

Sep 19, 2021, 8:37:23 PM
to DSI Studio
Hello, Frank!

I am performing a connectometry analysis and I want to regress out the effects of age, sex, and education. However, education is a categorical variable with 7 ordered levels (1 = low education ... 7 = high education). Will the software treat it as categorical, or will it consider it a continuous variable ranging from 1 to 7? I am considering discretizing the variable into 7 new variables (each taking only the values 0 and 1) and entering them in the analysis. I am also considering doing the same for any variable with more than 2 categories.
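For reference, the 0/1 recoding described above is usually called dummy (one-hot) coding. Below is a minimal sketch in plain Python. The function name `dummy_code` and the sample values are hypothetical, and this is not DSI Studio's own input format. The sketch assumes the regression model includes an intercept, in which case only 6 columns are needed: the seven indicators always sum to 1, so the seventh would be redundant.

```python
def dummy_code(levels, n_levels=7, reference=1):
    """Return one 0/1 row per subject, one column per non-reference level."""
    # Columns correspond to every level except the reference level.
    kept = [k for k in range(1, n_levels + 1) if k != reference]
    return [[1 if level == k else 0 for k in kept] for level in levels]


# Hypothetical education codes for six subjects (1 = low ... 7 = high).
education = [1, 3, 7, 4, 4, 2]
rows = dummy_code(education)

for level, row in zip(education, rows):
    print(level, row)
```

A subject at the reference level gets a row of zeros; every other subject gets a single 1 in the column for their level.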

What is your suggestion?

Thank you very much

Frank Yeh

Sep 19, 2021, 9:15:55 PM
to dsi-s...@googlegroups.com
The default connectometry setting (after 2021) uses the nonparametric Spearman correlation, which can accommodate nonlinear (monotonic) relations between variables and diffusion metrics.

As long as the levels have an incremental (monotonic) relation with the metric, the rank correlation should address your concern, and you don't have to create 7 variables.
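To illustrate why a rank correlation handles an incremental but nonlinear relation, here is a small sketch in plain Python. The data are made up, and the helper functions are written out for illustration only; they are not DSI Studio code. Spearman correlation is simply the Pearson correlation computed on ranks, so any strictly increasing relation gives a coefficient of 1 even when the Pearson coefficient on the raw values is lower.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up example: the metric rises with education level, but not linearly.
education = [1, 2, 3, 4, 5, 6, 7]
metric = [0.30, 0.31, 0.33, 0.40, 0.55, 0.56, 0.90]

print("Spearman:", spearman(education, metric))
print("Pearson: ", pearson(education, metric))
```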

Best regards,
Frank

Pedro Henrique

Sep 19, 2021, 9:25:18 PM
to DSI Studio
Thank you very much for the amazing response. 

Frank Yeh

Sep 19, 2021, 9:36:35 PM
to dsi-s...@googlegroups.com
Sorry, I just noticed that you listed education as a covariate. The Spearman correlation is applied only to the study variable, not to the other covariates. For covariates, DSI Studio uses only linear regression to regress out their effects.

Nonetheless, it may not be necessary to use 7 variables to eliminate the effect of education (also, how about age?). A simple linear term is a reasonable approximation unless education interacts a lot with your study variable.
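As a rough illustration of what "regressing out" a covariate with a single linear term means, here is a sketch in plain Python with made-up values. It handles one covariate only and is not DSI Studio's implementation; a real analysis would fit all covariates (age, sex, education) jointly. The function name `regress_out` is hypothetical.

```python
def regress_out(y, covariate):
    """Remove the least-squares linear effect of one covariate from y."""
    n = len(y)
    mean_c = sum(covariate) / n
    mean_y = sum(y) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (v - mean_y) for c, v in zip(covariate, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_c
    # Residuals: what is left of y after subtracting the fitted line.
    return [v - (intercept + slope * c) for c, v in zip(covariate, y)]


# Made-up example: education coded 1..7 and a diffusion metric per subject.
education = [1, 2, 3, 4, 5, 6, 7]
fa = [0.41, 0.44, 0.43, 0.47, 0.49, 0.48, 0.52]

residuals = regress_out(fa, education)
print(residuals)
```

The residuals have zero mean and no remaining linear association with the covariate; only one slope is estimated, regardless of how many levels the covariate has.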

Hope this helps.

Best regards,
Frank

Pedro Henrique

Sep 20, 2021, 4:14:32 PM
to DSI Studio

Hello, Frank!

Thanks for the answer. I ran the analysis both ways (with and without discretization of the education variable) and am attaching the results. The two results differ noticeably. Should I discretize the education variable, or any variable that has more than two categories?

Additionally, the result without discretization is the one I was expecting, since the higher the WMH, the lower the FA. So I am confused about which approach is appropriate. Any suggestions?
Attachments: Discretize.pdf, No_discrete.pdf

Frank Yeh

Sep 20, 2021, 4:30:50 PM
to dsi-s...@googlegroups.com
It seems to me that the discretized analysis had a problem, likely because some of the education groups contain very few subjects, which pushes the regression solution toward near singularity.
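One quick sanity check before dummy coding a variable is to count the subjects in each level. The sketch below uses made-up group sizes, and the cutoff of 5 subjects is an arbitrary assumption for illustration, not a DSI Studio rule. A level with very few subjects yields an indicator column that is almost all zeros, and a level with no subjects at all yields an all-zero column, which makes the design matrix exactly singular.

```python
from collections import Counter


def sparse_levels(levels, min_count=5):
    """Return {level: count} for levels with fewer than min_count subjects."""
    counts = Counter(levels)
    return {level: n for level, n in sorted(counts.items()) if n < min_count}


# Made-up group sizes: most subjects fall in the middle education levels.
education = [1] * 2 + [2] * 40 + [3] * 55 + [4] * 60 + [5] * 38 + [6] * 3 + [7] * 1

print(sparse_levels(education))
```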

You should definitely use the non-discretized result, and also increase the T threshold to 3 or higher because of the large sample size.

Hope this helps.
Frank


Pedro Henrique Rodrigues da Silva

Sep 20, 2021, 7:33:24 PM
to DSI Studio
Thank you very much. Regarding the T threshold, I was replicating some studies that used T = 1, 2, and 3, but I will follow your suggestion. Should I use T = 2, 3, and 4? Thank you!

Frank Yeh

Sep 21, 2021, 7:09:18 PM
to dsi-s...@googlegroups.com
Yes, I would go with T = 2 and 3.
T = 4 may be too high, but you can still give it a try.

Pedro Henrique

Sep 21, 2021, 10:25:28 PM
to DSI Studio

Thank you very much!