Help with linkage disequilibrium - .gz files

Dr Jyothi

Jul 22, 2023, 3:49:33 PM
to plink2-users
Hello everyone, I am new to PLINK. I intend to perform clumping on SNPs using 1000 Genomes data, which comes as .gz files. I am having trouble loading them into PLINK.
I tried R, but it takes a long time to process large datasets.

Any help is deeply appreciated.

Thank you,

Jyothi.

Khadija Sana

Jul 22, 2023, 6:56:21 PM
to Dr Jyothi, plink2-users
Hello Jyothi, 
Can you please share more details? What is the file format, and what options did you use in PLINK? That way it will be easier for others to answer your question. If possible, please share the log file.

Best regards, 
Khadija

Dr Jyothi

Jul 22, 2023, 7:52:54 PM
to Khadija Sana, plink2-users
Hi Khadija,

I have a .txt file of SNPs on which I intend to perform clumping. I started off using the LDlinkR package in R. LDmatrix has a limit of 3000 SNPs, so I had to split the data manually, verifying the chromosomes and positions being included. My .txt file has the SNP ID, chromosome, position, p-value, tested allele, and other allele. I was able to build the LD matrix and use an R-squared threshold to filter out the SNPs that are correlated. I am stuck from there: I am unable to get working code that effectively filters out correlated SNPs.

I want to try PLINK, as I heard it's faster. However, I wasn't able to load the .txt file. I went back and got the .gz file from the 1000 Genomes dataset, but I am unsure how to load it. The PLINK documentation only talks about .bed, .ped, or .map files.

Thank you,

Jyothi.

Christopher Chang

Jul 23, 2023, 12:26:26 PM
to plink2-users
You must have only looked at the nearly-15-year-old PLINK 1.0 documentation.

In PLINK 1.9, --vcf (https://www.cog-genomics.org/plink/1.9/input#vcf) can be used to import the 1000 Genomes .vcf.gz files. Note that you should import to the .bed+.bim+.fam format (--make-bed), not the obsolete .ped+.map format.
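
As a rough sketch of how the import and a follow-up clumping run might look in PLINK 1.9: the file names below are hypothetical, the column names SNPID and Pvalue are assumed to match the header of the .txt file, and the thresholds are illustrative rather than recommended. The .txt file of p-values is not loaded as genotype data; it is passed to --clump, while the imported 1000 Genomes fileset serves as the LD reference. The variant IDs in the .txt file need to match those in the resulting .bim file.

```shell
# Sketch only: file names, column names, and thresholds are assumptions.

# Import a 1000 Genomes .vcf.gz into PLINK's binary .bed/.bim/.fam fileset.
#   $1 = input .vcf.gz, $2 = output prefix
import_vcf() {
    plink --vcf "$1" --make-bed --out "$2"
}

# Clump summary statistics, using that fileset as the LD reference.
#   $1 = fileset prefix, $2 = summary-statistics text file, $3 = output prefix
clump_snps() {
    plink --bfile "$1" \
        --clump "$2" \
        --clump-snp-field SNPID \
        --clump-field Pvalue \
        --clump-p1 5e-8 \
        --clump-r2 0.1 \
        --clump-kb 250 \
        --out "$3"
}

# Example calls (hypothetical file names):
#   import_vcf chr22.vcf.gz chr22
#   clump_snps chr22 snps.txt chr22_clumped
```

If the run succeeds, the clumping results should be written to a file named after the output prefix with a .clumped extension.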
