error loading VCFv4.2

Janete Chung

Jan 6, 2016, 9:25:42 AM
to plinkseq-users
Hi plinkseq users!


I am trying to load my VCF file (VCFv4.2):

> pseq myproj1 load-vcf my_vcf.vcf

ERROR Message:

pseq error: could not recognize VCF version VCFv4.2
plinkseq warning: preparing Query no such table: metaphenotypes (repeated 4 times)
plinkseq warning: unable to open database file (repeated 4 times)

I am using plinkseq version 0.10.

Thanks

Janete

Kevin Rue

Jan 7, 2016, 8:26:08 AM
to plinkseq-users
Hi,

I have a similar problem (only the error "pseq error: could not recognize VCF version VCFv4.2", not the other two lines).
My file is a multi-sample (merged) VCF. Is yours?

If I use one of the original individual VCFs, it works.

Kevin

Janete Chung

Jan 7, 2016, 8:33:40 AM
to plinkse...@googlegroups.com
Hi Kevin

Mine is also a multi-sample merged VCF file.

But are your original individual VCFs v4.2?



--
You received this message because you are subscribed to the Google Groups "plinkseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plinkseq-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Janete Chung

Kevin RUE

Jan 7, 2016, 4:55:43 PM
to plinkse...@googlegroups.com
Hi Janete,

Actually no, my original VCF files are 4.1.
I noticed that if I change the header of my merged VCF to 4.1, it works.
Conversely, if I change the header of my single sample VCF to 4.2, it fails.

The magic code for me:
sed 's/VCFv4\.2/VCFv4.1/' <old.vcf >new.vcf
I hope that helps (it seems like an expensive fix, duplicating the file to change a single number).
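
The substitution above runs on every line of the file. A slightly tighter variant, sketched below, touches only the first line, where the ##fileformat declaration is expected, and prints the result so you can confirm what pseq will read. The file names and the three-line sample VCF are made up for illustration.

```shell
# Build a tiny stand-in VCF (hypothetical content, for illustration only).
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t50\tPASS\t.\n' > "$tmpdir/old.vcf"

# Rewrite the version on line 1 only; every other line passes through untouched.
sed '1s/^##fileformat=VCFv4\.2$/##fileformat=VCFv4.1/' "$tmpdir/old.vcf" > "$tmpdir/new.vcf"

# Show the header line that pseq will read.
head -n 1 "$tmpdir/new.vcf"
```

If the first line still reads VCFv4.2 afterwards, the pattern did not match exactly; trailing whitespace or Windows line endings on that line are possible causes.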

Kevin


--
You received this message because you are subscribed to a topic in the Google Groups "plinkseq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/plinkseq-users/A2TXA8SAt5Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to plinkseq-user...@googlegroups.com.

Janete Chung

Jan 8, 2016, 7:52:53 AM
to plinkse...@googlegroups.com
Hi Kevin,

The magic code didn't work for me.
I also tried changing the header by editing the file, but pseq still recognizes it as VCFv4.2.

I also tried converting the file with vcftools, which gave me a VCFv4.0 file,
but I still get error messages: pseq error: could not recognize VCF from VCFv4.2
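
When pseq keeps reporting VCFv4.2 after an edit, it can help to confirm what the ##fileformat line of the file being loaded actually says. A minimal sketch, assuming a gzip-compressed file (the file name and sample content are hypothetical):

```shell
# Build a small compressed stand-in VCF header (hypothetical content).
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n##source=example\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' | gzip -c > "$tmpdir/my_vcf.vcf.gz"

# List every ##fileformat line with its line number.
# The VCF spec expects exactly one, as the first line of the file.
gzip -dc "$tmpdir/my_vcf.vcf.gz" | grep -n '^##fileformat='
```

For an uncompressed file, run the same grep on the file directly. If the line shown is not the version you wrote, the edit probably went to a different copy of the file than the one passed to pseq.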


Janete





Kevin RUE

Jan 8, 2016, 8:08:19 AM
to plinkse...@googlegroups.com
Hi Janete,

I am afraid I am too new to the field to help further then... the simple fix was enough for me to load my variants. 

I hope someone else will be able to come along and help you better. 
Kevin

Kevin RUE

Apr 13, 2016, 11:10:12 AM
to plinkseq-users
Dear PlinkSeq developers,

Could we have some expert feedback on that point?
Maybe I am missing something obvious here, but my current solution, manually replacing the header line and recompressing the VCF file, is time-consuming and inefficient :/
I would be happy to receive some guidance, or to be notified of a bug fix that solves this compatibility issue with VCF versions.
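
Until such a fix exists, the replace-and-recompress step can at least be scripted for several files at once. The loop below is a sketch under assumptions: the inputs are plain gzip files, the chr1/chr2 names and the v41 output directory are invented, and only the first line of each file is rewritten.

```shell
# Create two small compressed stand-in files (hypothetical names and content).
tmpdir=$(mktemp -d)
for chr in chr1 chr2; do
  printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' | gzip -c > "$tmpdir/$chr.vcf.gz"
done

# Stream each file through sed and recompress into a separate directory;
# no uncompressed copy is written to disk and the originals stay untouched.
mkdir "$tmpdir/v41"
for f in "$tmpdir"/*.vcf.gz; do
  gzip -dc "$f" | sed '1s/VCFv4\.2/VCFv4.1/' | gzip -c > "$tmpdir/v41/$(basename "$f")"
done
```

This still doubles the compressed disk footprint, so it only removes the manual work, not the storage cost.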

Thanks.
Kevin



A. P. Levine

Apr 18, 2016, 6:22:27 AM
to plinkseq-users
I am having the same problem as described in this thread:
"pseq error : could not recognize VCF version VCFv4.2"

Whilst I can try and work around it, it would be much more convenient if PlinkSeq recognised this format directly. Is there any chance of that happening in the near future?

Thank you,

Adam

Adam P. Levine

Kevin RUE

Apr 18, 2016, 8:19:52 AM
to plinkseq-users, aple...@gmail.com
Hi Adam,

As a user, it bothers me as well (as it happens, I am currently changing the header of my problematic files from ##fileformat=VCFv4.2 to ##fileformat=VCFv4.1).
That (painfully) works around the problem for me.

The following won't help you much, but I thought I would highlight the information for future visitors of this page:
On the page https://atgu.mgh.harvard.edu/plinkseq/download.shtml one can read:
"Note : these should be considered as experimental, early-release versions of this package. As such, aspects of the code, database format, range of available options, are likely to undergo change. We are also not able to offer any support at this stage, although reporting clearly documented and replicable errors or problems will be appreciated. At this stage, we suggest that only expert users (i.e. who wish to play the role of early adopter and beta-version tester) should use PLINK/Seq."

I highlighted in red the bits that may explain why we haven't received any feedback from the developers in the last months (although, I would appreciate some sort of release schedule).
As a developer (of other tools) myself, I can understand that they still prioritise development of the tool over feedback to early users. But again, a minimum of public relations wouldn't hurt.

Best wishes
Kevin

A. P. Levine

Apr 18, 2016, 8:24:16 AM
to Kevin RUE, plinkseq-users
Dear Kevin,

Thank you for that. I will try your fix and will not await a reply from the PLINK/Seq developers then.

Kind regards,

Adam

Adam P. Levine

Kevin RUE

Apr 26, 2016, 12:47:44 PM
to plinkseq-users, kevin...@gmail.com, aple...@gmail.com
Dear all,

While the website is currently down (no idea why), I just found a Bitbucket repository, which doesn't seem to have been updated since June 2015 (no idea whether the project has migrated elsewhere, or whether development has paused).

In any case, I did a bit of digging and found the file "vcf.cpp", whose code is at:
https://bitbucket.org/statgen/plinkseq/src/1d4c27e5c24dd99b79aab894e6a00a5ceabdc6b6/sources/plinkseq/sources/lib/vcf.cpp?at=master&fileviewer=file-view-default

It contains the following chunk of code:
  if ( tok[0] == "fileformat" || tok[0] == "format" )
    {
      // match on VCFv3.3, vcf3.3, VCF3.3, etc
      // and also BCFv4.0
      if ( tok[1].size() >= 3 && tok[1].substr( tok[1].size() - 3 ) == "3.3" )
        version = VCF_3_3;
      else if ( tok[1].size() >= 3 && tok[1].substr( tok[1].size() - 3 ) == "4.0" )
        version = VCF_4_0;
      else if ( tok[1].size() >= 3 && tok[1].substr( tok[1].size() - 3 ) == "4.1" )
        version = VCF_4_1;
      else
        Helper::halt( "could not recognize VCF version " + tok[1] );
    }
  else if ( version == VCF_UNKNOWN )
    {
      Helper::halt( "Version number not specified for VCF" );
    }

Needless to say, given the problem detailed in this thread, my plan is to add an innocent bit of code:
      else if ( tok[1].size() >= 3 && tok[1].substr( tok[1].size() - 3 ) == "4.2" )
        version = VCF_4_2;

I thought I would post this before trying it myself, to speed things up in case anyone else would like to try this solution.
Best,
Kevin

Kevin RUE

Apr 26, 2016, 1:39:04 PM
to plinkseq-users, kevin...@gmail.com, aple...@gmail.com
In which case, you will also want to change another line later in the same file:

else if ( version == VCF_4_0 || version == VCF_4_1 )
becomes
else if ( version == VCF_4_0 || version == VCF_4_1 || version == VCF_4_2 )

Best wishes,
Kevin

Kevin RUE

Apr 27, 2016, 5:50:33 AM
to plinkseq-users, kevin...@gmail.com, aple...@gmail.com
Alright, summary of my two quick-and-dirty fixes:

Fix Number 1
Change the VCF header (first line)
    • zcat vcf_in.vcf.gz | sed 's/VCFv4\.2/VCFv4.1/' | gzip -c > vcf_out.vcf.gz


Fix Number 2
Tweak the PlinkSEQ source code
    • Edit plinkseq/sources/plinkseq/sources/lib/vcf.cpp
Add these lines (around line 200):

  else if ( tok[1].size() >= 3 && tok[1].substr( tok[1].size() - 3 ) == "4.2" )
    version = VCF_4_2;

Change line ~275 to read:

  else if ( version == VCF_4_0 || version == VCF_4_1 || version == VCF_4_2 )

    • Edit plinkseq/sources/plinkseq/sources/include/plinkseq/vcf.h
Toward the top of the file, add the following to the VCF_version enum (following the existing code formatting):
VCF_4_2
    • Compile the sources and use the resulting pseq binary :)

Pros and cons for fixes 1 and 2:
  1. Pro: simplest fix. Con: doubles the disk space needed (a copy of each VCF file must be generated for PlinkSEQ).
  2. Pro: processes VCF4.2 files as they are (saves disk space and time). Cons: requires fiddling with the code; you need to pull new commits from Bitbucket, then edit and recompile the code for each new VCF version.

Important:
In both cases, these fixes make Plink/SEQ process VCF4.2 as if it were VCF4.1. There may be issues if features new in the VCF4.2 spec are not handled, or are mishandled, by the code designed for VCF4.1.
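
One way to gauge that risk is to look for header declarations that only exist from VCFv4.2 onward. As far as I know, Number=R (one value per allele, reference included) in INFO and FORMAT definitions is one such addition. The check below is a sketch that looks for that single feature only and is not a full comparison of the specs; the sample header is invented.

```shell
# Write a small stand-in header (hypothetical content).
tmpdir=$(mktemp -d)
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">' \
  '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">' \
  > "$tmpdir/header.vcf"

# Count INFO/FORMAT definitions that declare Number=R.
grep -Ec '^##(INFO|FORMAT)=<.*Number=R[,>]' "$tmpdir/header.vcf"
```

A non-zero count means the file uses at least one construct that code written for VCFv4.1 may not expect. A count of zero does not prove the file is safe to relabel, since other differences between the versions are not checked here.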

Best wishes to all
Looking forward to an official release and "proper" fixes

Kevin

tojo...@gmail.com

Jul 11, 2016, 9:59:28 AM
to plinkseq-users
I forked plinkseq.git and added changes to accept VCFv4.2. Here you


I should have looked at the changelog between VCF4.2 and previous versions, but I haven't found one in a quick search.
/ TJ 

Martin MOKREJŠ

Jul 11, 2016, 4:43:05 PM
to plinkse...@googlegroups.com
Hi,
But did anybody actually check what is different in the 4.2 and later specs? I have not, but blindly enabling the code is not the right way to do it. Also, when Kevin RUE reported this on 04/27/16 11:50 via this mailing list, it seemed he had never inspected the differences between the specs. There is a reason why PLINKSEQ states that it supports only VCF <= 4.1; don't bypass that unless you are confident the functionality is not affected.

http://samtools.github.io/hts-specs/VCFv4.2.pdf
http://samtools.github.io/hts-specs/VCFv4.3.pdf

Hope this helps,
Martin

Kevin RUE

Jul 11, 2016, 5:07:18 PM
to plinkseq-users
Hi Martin,

Thanks for your message.
Indeed I should have been clearer: my workaround is in no way a "fix" to the problem. I just wanted to get my variants into PLINKSEQ, to get familiar with it, compare with other, slower, methods I was using, and see if I got comparable results.
I never intended to post a "solution" to the problem. Initially I was hoping that my message would trigger feedback from the developers or, better, a new release of PLINKSEQ. Posting my workaround was merely a way to revive the thread and say: "The problem is still here, I did that, it's probably not ideal, please let me know if you have any other idea".

All the best,
Kevin

Claire Malley

Sep 7, 2016, 10:22:58 AM
to plinkseq-users
Hi Kevin et al,

This experimental build of pseq still does not recognize VCFv4.2. I copied and installed from https://bitbucket.org/jcode99/plinkseq.git. This is what I ran:

plinkseq/build/execs/pseq chr1.vcf.gz write-vcf --format BGZF --file chr1.vcf.bgzf.gz

The output of plinkseq/build/execs/pseq . version:

PSEQ: 0.10
PSEQ DATE: 14-Jul-14
LOCDB: 3
PLINKSEQ: 0.10(11-Jul-2014)
PROJN_SPEC_FILE: 2
REFDB: 3
SQLITE3_HEADER: 3.8.0.2
SQLITE3_LIBRARY: 3.8.0.2
VARDB: 5
VCF: VCFv4.1
ZLIB: 1.2.8


I hope the official plinkseq developers will take action to accommodate current VCF versions.

Thanks,
Claire

Claire Malley

Sep 7, 2016, 10:56:32 AM
to plinkseq-users
Hi again, an update for others who might run into the same problem: I think you still need to edit the VCF header as Kevin wrote earlier, to pretend it is VCFv4.1. This plus the experimental plinkseq binary seems to work.

Good luck everyone,
Claire