Using dosage files with genetic risk scoring

333 views
Skip to first unread message

Patrick Sullivan

unread,
Dec 29, 2014, 10:10:36 AM12/29/14
to plink2...@googlegroups.com
Hi - 

Above all else, thanks for putting all the time into plink v2 - wow - I've found it extremely useful in making analyses possible that would otherwise be very difficult. If you ever need a letter for a grant or something ... 

I've read the plink2 documentation and searched this group, wasn't able to find an answer. Given that there was a very recent fix to a --dosage related function, I also tried the most recent build (PLINK v1.90b2t 64-bit (20 Dec 2014)). 

I need to compute genetic risk scores using dosage data. The dosage files are in 3 mb chunks, 924 in total. In theory, plink2 should handle this but in practice I get an error. Each dosage file has a header line which I suspect is the problem. 

Am I doing something wrong? A potential work-around would be to combine all the files into one big .gz after stripping all but 1 header line or to do it in 924 separate runs and then combine - but inelegant. 

Have tried: 
- the FID/IID it complains about below is present once in each header and once in the .fam
- RTFM
- searched this group
- adding "sepheader" or "noheader" after list in the first command line below gave a different error
- it seems to work correctly if I give it 1 dosage file on the command line or in the dose list file 
- if there are 2 or 10 or 924 files in dose.list, get the same error as below
- the complicated FID/IID have worked with plink2 for many other runs

Help? Best, PF Sullivan
 

COMMAND

$ ./plink2 --dosage ~/sw/1kg/dosage.data/swe1.dose.list list format=2 \

>   --fam ~/sw/1kg/dosage.data/swe1.fam \

>   --allow-no-sex \

>   --out ~/sw/grs2/output/swe1.add1 \

>   --score ~/gwas.results/assoc/add1.score sum \

>   --q-score-range ~/sw/grs2/qsrange.txt ~/gwas.results/assoc/add1.qscore


LOG FILE

PLINK v1.90b2t 64-bit (20 Dec 2014)        https://www.cog-genomics.org/plink2

(C) 2005-2014 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to /sullivan-lab/pfsulliv/sw/grs2/output/swe1.add1.log.

32097 MB RAM detected; reserving 16048 MB for main workspace.

435 people (226 males, 203 females, 6 ambiguous) loaded from .fam.

Ambiguous sex IDs written to

~/sw/grs2/output/swe1.add1.nosex .

435 phenotype values loaded from .fam.

Using 1 thread (no multithreaded calculations invoked).

435 people pass filters and QC.

Among remaining phenotypes, 221 are cases and 214 are controls.

--score: 65913 entries loaded from

~/gwas.results/assoc/add1.score.

--q-score-range: 8 ranges loaded.

--dosage: Reading from

~/sw/1kg/dosage.data/dasu_scz_swe1_eur-qc.ch.fl/dos_scz_swe1_eur-qc.ch.fl.chr10_000_003.out.dosage.gz,

~/sw/1kg/dosage.data/dasu_scz_swe1_eur-qc.ch.fl/dos_scz_swe1_eur-qc.ch.fl.chr1_000_003.out.dosage.gz,

...

~/sw/1kg/dosage.data/dasu_scz_swe1_eur-qc.ch.fl/dos_scz_swe1_eur-qc.ch.fl.chr9_138_141.out.dosage.gz,

and

~/sw/1kg/dosage.data/dasu_scz_swe1_eur-qc.ch.fl/dos_scz_swe1_eur-qc.ch.fl.chr9_141_144.out.dosage.gz.

Error: 'case_scz_swe1_eur_A5.0*Sw1_PT-1RTZ STORK_36112292' appears multiple times.


Christopher Chang

unread,
Dec 29, 2014, 12:24:31 PM12/29/14
to plink2...@googlegroups.com
Has --dosage worked without --score on that combination of files?

If yes, then it's probably a --dosage + --score bug; I'll investigate this later today.  Test input files and a command line I can use to replicate the error would be helpful.

Patrick Sullivan

unread,
Dec 29, 2014, 5:08:01 PM12/29/14
to plink2...@googlegroups.com
I've replied directly to you ... 

Christopher Chang

unread,
Dec 29, 2014, 5:43:34 PM12/29/14
to plink2...@googlegroups.com
Okay, it turns out that you just need to change your dose.list file format.

When you only have a filename on each line, PLINK assumes that each dosage file has different samples.  If they all contain the same samples, but different SNPs, you should use the "variant batch numbers" format, e.g.

1 dosage_file_1
2 dosage_file_2
3 dosage_file_3
...

instead of just

dosage_file_1
dosage_file_2
dosage_file_3
...

Let me know if you still have any problems.

Patrick Sullivan

unread,
Dec 30, 2014, 7:50:36 AM12/30/14
to plink2...@googlegroups.com
Thanks for the super quick response, and explanation. 

Perhaps add this to the plink2 documentation? 

Best - pfs

Christopher Chang

unread,
Dec 30, 2014, 7:38:01 PM12/30/14
to plink2...@googlegroups.com
A remark on this has been added to the --dosage documentation.
Reply all
Reply to author
Forward
0 new messages