Hi everyone,
I'm trying to compare two VCFs: the first one comes from the 1000 Genomes Project, and the second one comes from my pipeline.
The command I'm running and the errors I get are the following:
Command:
for i in $(ls /media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/*-snps-indels-predicted-effect-snpEff.vcf | sed -E 's/\/media.*GATK_results\/|-.*//g'); do
    rm -rf /media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/${i}
    java -jar ~/Softwares/rtg-tools/build/rtg-tools.jar vcfeval \
        --all-records \
        --decompose \
        -b /media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/${i}-snps-indels-predicted-effect-snpEff.vcf.gz \
        -c /media/biomehub/HD-4Tb/MIOSeq/BAM/Variants-original/DMD-1kG-${i}.vcf.gz \
        -t /media/biomehub/HD-4Tb/MIOSeq/hsa-genome/GRCh38_full_analysis_set_plus_decoy_hla.sdf \
        -f QUAL \
        -o /media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/${i} \
        --bed-regions=/media/biomehub/HD-4Tb/MIOSeq/hsa-genome/DMD-SMN-seq/DMD-GRCh38.bed \
        --output-mode=combine
done
Error:
Error: SAM input has an irrecoverable problem. Invalid GZIP header
Error: An IO problem occurred: "/media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/HG02586-snps-indels-predicted-effect-snpEff.vcf.gz has invalid uncompressedLength: -1135658362"
Error: An IO problem occurred: "/media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/HG02594-snps-indels-predicted-effect-snpEff.vcf.gz has invalid uncompressedLength: -396169846"
Error: SAM input has an irrecoverable problem. Invalid GZIP header
Error: An IO problem occurred: "/media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/HG02620-snps-indels-predicted-effect-snpEff.vcf.gz has invalid uncompressedLength: -200145969"
One observation: when I compare either file against a dbSNP subset (VCF format), there is no error.
All files are compressed as vcf.gz and indexed with tabix.
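In case it helps with diagnosing this, here is a minimal sketch of how the compression of each file could be double-checked. The function name check_vcf_gz and its three output labels are hypothetical, made up for illustration. It assumes that a block-gzipped (bgzip) file starts with the bytes 1f 8b 08 04, while a file written by plain gzip usually has a different fourth byte. It only inspects the start of the file, so it is a rough check and not proof that the whole file is valid BGZF.

```shell
# Hypothetical helper (not part of rtg-tools or htslib).
# Prints BAD_GZIP, BGZF_LIKE or PLAIN_GZIP for the file given as $1.
check_vcf_gz() {
    local f="$1"
    # gzip -t decompresses the whole stream and fails if it is corrupt or not gzip.
    if ! gzip -t "$f" 2>/dev/null; then
        echo "BAD_GZIP"
        return 1
    fi
    # Read the first four bytes as hex: gzip magic (1f 8b), deflate (08), flags.
    # Assumption: bgzip output sets the FEXTRA flag, so the fourth byte is 04.
    local magic
    magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
    if [ "$magic" = "1f8b0804" ]; then
        echo "BGZF_LIKE"
    else
        echo "PLAIN_GZIP"
    fi
}
```

It could be run over the pipeline outputs with something like: for f in /media/biomehub/HD-4Tb/MIOSeq/CRAM/GATK-1000G-pipeline/GATK_results/*.vcf.gz; do echo "$f $(check_vcf_gz "$f")"; done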
Cheers,
Luis Arge