Is it possible to output compressed VCF files with --recode vcf?


freeseek

Feb 5, 2015, 5:32:58 PM
to plink2...@googlegroups.com
"plink --bfile binary_fileset --recode vcf" will convert the dataset to VCF format. In principle this could be handy for running analyses with software such as Beagle. Is it possible to write the output file compressed?

Christopher Chang

Feb 5, 2015, 5:38:14 PM
to plink2...@googlegroups.com
This is not currently supported.  It would not be difficult to add a 'gz' modifier to some --recode operations, and I probably will do this for VCF soon, but I worry about where to draw the line.

freeseek

Feb 5, 2015, 5:46:48 PM
to plink2...@googlegroups.com
You have a good point, and I feel bad asking. For outputs that consist of a single file (like VCF, though this applies more generally to other kinds of analyses), it would be nice if plink had a general option to write the output file to /dev/stdout and the log file to /dev/stderr. That way there would be no need to build all these extra features into plink. It could work something like this:
plink --bfile binary_fileset --recode vcf --stdout 2>output.log | bgzip > output.gz
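
Since --stdout here is only a proposed flag, a similar result can be approximated today by letting plink write its usual files and compressing afterwards. The Python sketch below is one hedged way to script that; the function names are invented for illustration, and it assumes that both plink and bgzip are on the PATH and that plink names its outputs <prefix>.vcf and <prefix>.log.

```python
import subprocess
from pathlib import Path


def plink_recode_cmd(bfile, out_prefix):
    """Command line for a plain-text VCF export to <out_prefix>.vcf."""
    return ["plink", "--bfile", bfile, "--recode", "vcf", "--out", out_prefix]


def bgzip_cmd(vcf_path):
    """Command line that streams the bgzipped file to stdout (bgzip -c)."""
    return ["bgzip", "-c", vcf_path]


def export_bgzipped_vcf(bfile, out_prefix):
    """Run plink, bgzip the resulting VCF, and remove the uncompressed copy.

    Hypothetical helper: assumes the `plink` and `bgzip` executables exist.
    """
    subprocess.run(plink_recode_cmd(bfile, out_prefix), check=True)
    vcf = Path(out_prefix + ".vcf")
    compressed = Path(out_prefix + ".vcf.gz")
    with compressed.open("wb") as handle:
        # bgzip -c leaves the input untouched and writes to our file handle
        subprocess.run(bgzip_cmd(str(vcf)), stdout=handle, check=True)
    vcf.unlink()  # keep only the compressed file
    return compressed
```

Calling export_bgzipped_vcf("binary_fileset", "output") would leave output.vcf.gz next to plink's own output.log, at the cost of temporarily storing the uncompressed VCF on disk.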

Christopher Chang

Feb 6, 2015, 4:51:59 PM
to plink2...@googlegroups.com
Yes, that's the clean Unix way of doing things.  For better or worse, though, PLINK is designed to generate more than one relevant output file at a time, so there is limited value in retrofitting it to be able to dump a single file to stdout.

So I will just target the common gzip cases.  VCF output is perhaps the most common case of all, so the February 6 development build has a basic "--recode vcf gz" implementation.  I will try to speed it up (on multicore systems, at least) over the weekend.



freeseek

Feb 6, 2015, 5:37:58 PM
to plink2...@googlegroups.com
I forgot to stress that VCF files should be bgzipped rather than gzipped; otherwise most programs will not handle them.
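
To make the distinction concrete: bgzip output (BGZF) is a series of small gzip members, each carrying a "BC" extra field that records the block size, so ordinary gzip tools can still read it while indexing tools can seek inside it. The Python sketch below uses only the standard library and follows the BGZF layout described in the SAM specification. The names bgzf_block, bgzf_compress and is_bgzf are invented for illustration, and this is not how bgzip or PLINK are actually implemented.

```python
import gzip
import struct
import zlib

# Uncompressed bytes per block; chosen so that a compressed block stays
# within the 64 KiB limit imposed by the 16-bit block-size field.
MAX_BLOCK_INPUT = 0xFF00


def bgzf_block(data):
    """Wrap up to MAX_BLOCK_INPUT bytes in a single BGZF block."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate
    cdata = compressor.compress(data) + compressor.flush()
    bsize = 18 + len(cdata) + 8 - 1  # total block length minus one
    if bsize > 0xFFFF:
        raise ValueError("block too large for BGZF")
    header = struct.pack(
        "<BBBBIBBHBBHH",
        31, 139, 8, 4,  # gzip magic, deflate, FLG.FEXTRA set
        0, 0, 255,      # MTIME, XFL, OS (unknown)
        6,              # XLEN: one 6-byte extra subfield
        66, 67, 2,      # subfield id 'B', 'C' and its payload length
        bsize,
    )
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


def bgzf_compress(data):
    """Compress bytes as BGZF: data blocks followed by an empty EOF block."""
    blocks = [
        bgzf_block(data[i:i + MAX_BLOCK_INPUT])
        for i in range(0, len(data), MAX_BLOCK_INPUT)
    ]
    blocks.append(bgzf_block(b""))  # empty block marks end of file
    return b"".join(blocks)


def is_bgzf(raw):
    """Return True if the bytes start with a gzip header holding a BC subfield."""
    if len(raw) < 12 or raw[:3] != b"\x1f\x8b\x08" or not (raw[3] & 0x04):
        return False
    (xlen,) = struct.unpack_from("<H", raw, 10)
    extra = raw[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False


sample = b"##fileformat=VCFv4.2\n" * 10
print(is_bgzf(bgzf_compress(sample)), is_bgzf(gzip.compress(sample)))
# prints: True False
```

A plain gzip file fails the check because its header has no extra field, which is one way to tell which kind of .vcf.gz a tool has produced.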

Christopher Chang

Feb 6, 2015, 5:41:14 PM
to plink2...@googlegroups.com
Hmm, that's true; this will probably be changed to "--recode vcf bgz" with the corresponding functionality.

Christopher Chang

Feb 9, 2015, 3:49:00 AM
to plink2...@googlegroups.com
"--recode vcf bgz" should be working in the February 9 development build.

Florian Privé

Feb 11, 2020, 5:31:37 AM
to plink2-users

Thanks for this.