Calculating a subset of a genetic correlation matrix

Uku Vainik

Feb 14, 2024, 3:06:36 AM

to Genomic SEM Users

Hi!

I'm interested in calculating correlations between two sets of variables, Set A and Set B. The outcome would be similar to this figure.

https://www.biorxiv.org/content/biorxiv/early/2021/08/19/2021.08.18.456908/F4.large.jpg

I like working in R, so my default is to put everything into the ldsc() function. However, this makes for a very long calculation, since each set is large: dozens or hundreds of variables. Ideally, I would like to avoid calculating the within-set correlations, i.e. among the variables in Set A and among the variables in Set B.

Right now, I have set up a pairwise loop that reads in two variables at a time, one from Set A and one from Set B, and calculates their correlation. But this spends a lot of time reading in the sumstats.

Is there a way to read in all the sumstats at once but have ldsc() focus on a specific subset of the correlation matrix?

Thanks!

Uku

Michel Nivard

Feb 14, 2024, 3:27:16 AM

to Uku Vainik, Genomic SEM Users

Hi Uku!

(Good to hear from you, it's been a while!) This sounds like an excellent option to have. It's currently the default behavior in the original Python implementation of ldsc, so that might be an option for you?

Happy to work on an implementation in our ldsc() as well, but that might take a bit longer…

Best,

Michel


Uku Vainik

Feb 14, 2024, 3:33:04 AM

to Michel Nivard, Genomic SEM Users

Hi Michel!

Great to hear from you, too! Thanks for getting back so quickly and pointing me toward the Python implementation.

If I plan ahead, I can live with a remote computer spending a week calculating all the scores. It has a slow drive, I think 😃

To optimise the reading part: is there an agreed-upon minimum set of EUR SNPs? Right now, I use the HM3 SNP set provided by the Boulder tutorial.

Thanks

Uku

Michel Nivard

Feb 14, 2024, 4:56:47 AM

to Uku Vainik, Genomic SEM Users

I think you tried the two extremes, namely pairwise and all at once. I would try to run it in small chunks, but not too small, so that reading stays efficient. Say Set A is 100 traits and Set B is 100 traits: I would split each set into 10 subsets of 10 traits and run every A subset together with every B subset. You'd never match any of the A subsets with other A subsets, or B subsets with other B subsets, and that would cut down on computation a lot.
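
To compare chunk sizes before committing a week of compute, the chunking idea can be tallied with a few lines of plain Python. This is only a rough planning sketch, not part of GenomicSEM or ldsc: the helper names (`chunk`, `plan_runs`, `tally`) and the trait labels are hypothetical, and it assumes each run reads every file it is given once and estimates every pair among the traits loaded in that run.

```python
from itertools import product


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_runs(set_a, set_b, chunk_size):
    """One run per (A chunk, B chunk) pair; A chunks are never paired with A chunks."""
    a_chunks = chunk(set_a, chunk_size)
    b_chunks = chunk(set_b, chunk_size)
    return [a + b for a, b in product(a_chunks, b_chunks)]


def tally(runs, set_a):
    """Count file reads and pair estimates, assuming each run estimates all its pairs."""
    in_a = set(set_a)
    reads = sum(len(r) for r in runs)
    pairs = sum(len(r) * (len(r) - 1) // 2 for r in runs)
    cross = 0
    for r in runs:
        n_a = sum(t in in_a for t in r)
        cross += n_a * (len(r) - n_a)  # A-by-B pairs are the ones of interest
    return {"runs": len(runs), "file_reads": reads,
            "pairs_estimated": pairs, "cross_pairs": cross}


# Hypothetical trait labels standing in for sumstats files.
set_a = [f"A{i}" for i in range(100)]
set_b = [f"B{i}" for i in range(100)]

for size in (1, 10, 100):  # pairwise loop, 10-trait chunks, everything at once
    print(size, tally(plan_runs(set_a, set_b, size), set_a))
```

Each entry of `plan_runs(...)` is the list of traits to hand to a single run. Under the stated assumptions, 10-trait chunks need 2,000 file reads compared with 20,000 for the pairwise loop, and each run holds only 20 traits at a time; the `pairs_estimated` count shows how much within-set work a given chunk size still carries, so the size can be tuned to the machine.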