vcfeval error: Error: Invalid VCF header. VCF column header line missing required columns on line: [unprintable binary data]


KIRTI

Mar 3, 2018, 2:00:41 PM
to RTG Users
Error: Invalid VCF header. VCF column header line missing required columns on line: [unprintable binary data]

Input command:   rtg-tools-3.8.4/rtg vcfeval -t NA12878_reference/hg19_new.sdf -b gold_standard_common_all_20170710.vcf.bgz -c SRR1611184bowtie_SNP_GATK.vcf.bgz -o output4

Sean Irvine

Mar 3, 2018, 2:04:48 PM
to KIRTI, RTG Users
Hi,

The rtg tools currently determine the file type from the file name extension. In your case they don't know what ".bgz" is, so they attempt to interpret the file as plain text. Assuming your ".bgz" files are block-compressed gzip, renaming them to use the more standard ".gz" extension should make everything work as expected.

Sean.
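
When several files are affected, the rename described above can be scripted. This is a minimal sketch, assuming the ".bgz" files really are block-compressed gzip. The function name rename_bgz is made up for illustration, and any index files sitting next to the data would need renaming or rebuilding separately.

```shell
# Hypothetical helper: rename every *.vcf.bgz in a directory to *.vcf.gz so
# that extension-based file type detection treats it as compressed VCF.
rename_bgz() {
  dir="${1:-.}"
  for f in "$dir"/*.vcf.bgz; do
    [ -e "$f" ] || continue        # the glob matched nothing
    mv -- "$f" "${f%.bgz}.gz"      # foo.vcf.bgz -> foo.vcf.gz
  done
}
```

After running rename_bgz in the directory holding the VCFs, the -b and -c arguments to vcfeval would need to use the new ".vcf.gz" names.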


--
You received this message because you are subscribed to the Google Groups "RTG Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rtg-users+unsubscribe@realtimegenomics.com.
Visit this group at https://groups.google.com/a/realtimegenomics.com/group/rtg-users/.

KIRTI BHADHADHARA

Mar 4, 2018, 11:23:21 PM
to Sean Irvine, RTG Users
Error: /biobank/seq/kirti/new/NA12878/nature_paper/gold_standard/common_gold_standard.vcf.gz is not in bgzip format

Sean Irvine

Mar 4, 2018, 11:50:46 PM
to KIRTI BHADHADHARA, RTG Users
Many rtg commands require block compression so that records for particular regions or sequences of the genome can be read directly.

You can convert a non-block compressed file to block compressed using something like:

zcat -f /biobank/seq/kirti/new/NA12878/nature_paper/gold_standard/common_gold_standard.vcf.gz | rtg bgzip - >temp.vcf.gz
mv temp.vcf.gz /biobank/seq/kirti/new/NA12878/nature_paper/gold_standard/common_gold_standard.vcf.gz

Usually you will want to follow the block compression by creating a tabix index:

rtg index /biobank/seq/kirti/new/NA12878/nature_paper/gold_standard/common_gold_standard.vcf.gz

Sean.
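
Whether a ".gz" file is already block compressed can be checked before converting it. This is a rough sketch, assuming the BGZF layout that bgzip-style tools write: each block is a gzip member with the extra-field flag set and a "BC" subfield placed first in the extra field. Other writers could in principle order the subfields differently, so a negative result is a hint rather than proof. The name is_bgzf is hypothetical and is not an rtg command.

```shell
# Hypothetical check: succeed if the file starts with a BGZF block header.
is_bgzf() {
  # Dump the first 14 bytes as one hex string (28 hex digits).
  hdr=$(head -c 14 "$1" | od -An -tx1 | tr -d ' \n')
  case "$hdr" in
    # 1f 8b = gzip magic, 08 = deflate, 04 = extra-field flag,
    # then 8 bytes we do not inspect, then 42 43 = "BC" subfield id.
    1f8b0804????????????????4243) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_bgzf common_gold_standard.vcf.gz || echo "needs block compression" would flag a file written by ordinary gzip.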

Kax

Nov 26, 2019, 12:51:17 PM
to RTG Users

Greetings, 
I'm getting the same error and wondering if you have any insight. I bgzipped and indexed each of my vcf files, but still seem to have some kind of header issue. 
./rtg bgzip /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/Gold_standard_NA12878_phased_variants
./rtg index -f vcf /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/Gold_standard_NA12878_phased_variants.gz

./rtg vcfeval \
-b /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/Gold_standard_NA12878_phased_variants.vcf.gz \
-c /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/NA12878_HG001_500ng_highconf.vcf.gz \
-o /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/vcfeval1/ \
-t /Users/xx/Downloads/hg19.sdf \
-m split

Error: Invalid VCF header. VCF column header line missing required columns on line: [unprintable binary data]

Len Trigg

Nov 26, 2019, 12:56:20 PM
to Kax, RTG Users
Hi Kax,

Comments below...

On Wed, 27 Nov 2019 at 06:51, Kax <katheri...@aruplab.com> wrote:

I'm getting the same error and wondering if you have any insight. I bgzipped and indexed each of my vcf files, but still seem to have some kind of header issue. 
./rtg bgzip /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/Gold_standard_NA12878_phased_variants
./rtg index -f vcf /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/Gold_standard_NA12878_phased_variants.gz

The input filename used in your bgzip command above should have a .vcf extension (it will then produce a .vcf.gz output file).
When the filename has the correct extension, you won't need to give the "-f vcf" option to index; it will identify the file type automatically.
 
./rtg vcfeval \
-b /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/Gold_standard_NA12878_phased_variants.vcf.gz \
-c /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/NA12878_HG001_500ng_highconf.vcf.gz \
-o /Users/xx/Desktop/Vcf_12878_compare_vcf_calls/vcfeval1/ \
-t /Users/xx/Downloads/hg19.sdf \
-m split

The baseline VCF file you gave to vcfeval above doesn't match the one you gave to the index command above.

Double-check that you have the right file extension and that you are bgzipping and indexing the correct files, and it should all be good.

Cheers,
Len.
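
The double-check suggested above can be done mechanically before running vcfeval. This is a sketch only: check_vcf_inputs is a hypothetical helper, and it assumes the index is a tabix file stored next to the VCF with ".tbi" appended to the name, which may not hold for every setup.

```shell
# Hypothetical pre-flight check for the VCF files passed to vcfeval.
check_vcf_inputs() {
  status=0
  for f in "$@"; do
    # The extension drives file type detection, so insist on .vcf.gz.
    case "$f" in
      *.vcf.gz) ;;
      *) echo "unexpected extension: $f" >&2; status=1 ;;
    esac
    [ -f "$f" ] || { echo "missing file: $f" >&2; status=1; }
    # Assumption: the index sits beside the data file as <name>.tbi.
    [ -f "$f.tbi" ] || { echo "missing index: $f.tbi" >&2; status=1; }
  done
  return "$status"
}
```

Calling it with the same paths given to -b and -c would catch a mismatch like the one in this thread, where the file that was compressed and indexed ended in ".gz" but the file passed to vcfeval ended in ".vcf.gz".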

 
