SweeD error - number of nucleotides


Brian Mack

May 19, 2020, 4:48:54 PM
to OmegaPlus
Hi, I can run SweeD successfully on a VCF file, but when I subset the VCF to a smaller set of samples, I get the following error:

~/bin/sweed/SweeD -name test -input test.vcf -grid 4718 -folded
 SweeD version 3.3.1 released by Nikolaos Alachiotis and Pavlos Pavlidis in January 2015.
 Command:
       /home/brian/bin/sweed/SweeD -name test -input test.vcf -grid 4718 -folded
 Input file format (0:ms, 1:fasta, 2:macs, 3:vcf, 4:sf): 3
 Total number of samples in the VCF: 4
 Samples excluded from the analysis: 0

 Alignment 1

 ERROR: There are more than 6 nucleotides in line 76. Expected 6 (according to the first SNP in line 72).

SweeD: SweeD_Input.c:3742: readLine_VCF: Assertion `j<*SNP_SZ' failed.
        Chromosome:             chrom_1Aborted (core dumped)

Here are the lines that it is referring to:
     71 #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  618     581     580     573
     72 chrom_1 528     .       A       G       1105.31 .       .       GT:GQ:DP:AD:RO:QR:AO:QA:GL      0/0:0:3:3,0:3:108:0:0:0,-0.90309,-10.0752       1/1:18:2:0,2:0:0:2:72:-6.83672,-0.60206,0       .:.:.:.:.:.:.:.:.       .:.:.:.:.:.:.:.:.
     73 chrom_1 534     .       G       A       1206.52 .       .       GT:GQ:DP:AD:RO:QR:AO:QA:GL      0/0:0:3:3,0:3:108:0:0:0,-0.90309,-10.0752       1/1:22:2:0,2:0:0:2:72:-6.83672,-0.60206,0       .:.:.:.:.:.:.:.:.       .:.:.:.:.:.:.:.:.
     74 chrom_1 2453    .       CA      TG      419.43  .       .       GT:GQ:DP:AD:RO:QR:AO:QA:GL      0/0:99:58:58,0:58:2038:0:0:0,-17.4597,-183.691  .:.:.:.:.:.:.:.:.       0/0:99:59:59,0:59:2041:0:0:0,-17.7608,-183.958  1/1:0:20:0,18:0:0:18:646:-58.1206,-5.41854,0
     75 chrom_1 2497    .       AAAGAGAC        AGAAGAGAC       1305.07 .       .       GT:GQ:DP:AD:RO:QR:AO:QA:GL      0/0:99:49:48,0:48:1647:0:0:0,-14.4494,-148.371  .:.:.:.:.:.:.:.:.       0/0:99:55:55,0:55:1890:0:0:0,-16.5566,-170.378  1/1:99:47:0,46:0:0:46:1604:-144.306,-13.8474,0
     76 chrom_1 2557    .       T       C       1835.23 .       .       GT:GQ:DP:AD:RO:QR:AO:QA:GL      0/0:99:73:73,0:73:2510:0:0:0,-21.9752,-226.149  .:.:.:.:.:.:.:.:.       0/0:99:62:62,0:62:2130:0:0:0,-18.6639,-191.96   1/1:99:61:0,61:0:0:61:2154:-194.127,-18.3628,0

I don't understand what the error is referring to.
Thanks,
Brian

Nikolaos Alachiotis

May 20, 2020, 9:32:57 AM
to OmegaPlus
Can you please send the VCF file that generates this (just the first lines, in full)?
Also, SweeD is currently at version 4.0.0.
Nikos

--
You received this message because you are subscribed to the Google Groups "OmegaPlus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to omegaplus+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/omegaplus/64f867de-a24d-472c-9bbf-b3936a27ece4%40googlegroups.com.


--
Nikolaos Alachiotis

Brian Mack

May 20, 2020, 10:47:11 AM
to OmegaPlus
The first 100 lines of the file are attached. Thanks for alerting me about the current version. Where would I download version 4 from?
test2.vcf

Nikolaos Alachiotis

May 21, 2020, 11:26:08 AM
to OmegaPlus
Thanks. We will look into that as soon as possible.
You can get the latest version of SweeD here: https://github.com/alachins/sweed
You can also try RAiSD and OmegaPlus to see if they parse your data correctly. If you do, please let us know.


--
Nikolaos Alachiotis

Brian Mack

May 26, 2020, 11:51:42 AM
to OmegaPlus
Nikolaos,

I tried RAiSD, and it works fine, but OmegaPlus gives me the exact same error:
/home/brian/bin/omegaplus/OmegaPlus -name test -input ./test.vcf -ld RSQUARE -grid 100 -length 199998 -minwin 1000 -maxwin 10000 -seed 12345
 Input file format (0:ms, 1:fasta, 2:macs, 3:vcf, 4:sf, 5:mbs): 3
 Gap (-) imputation:                                            OFF
 Ambiguous character (N) imputation:                            OFF
 Omega search strategy:                                         Exhaustive
 Alignment deduction to binary:                                 OFF

 Total number of samples in the VCF:                            4
 Samples excluded from the analysis:                            0
 Alignment 1
                Chromosome:             chrom_1

 ERROR: There are more than 6 nucleotides in line 76. Expected 6 (according to the first SNP in line 72).
OmegaPlus: OmegaPlus_input.c:3067: readLine_VCF: Assertion `j<*SNP_SZ' failed.

Brian

Nikolaos Alachiotis

May 30, 2020, 7:12:06 AM
to OmegaPlus
Hi Brian,
We have looked into the SweeD parser again, and unfortunately it is still unclear what's causing the failed assertion.
Since RAiSD can parse the file correctly, can you please generate the site report?
It might be that the file is parsed correctly but most or all sites are discarded.
Best regards,
Nikos


--
Nikolaos Alachiotis

Brian Mack

Jun 1, 2020, 3:16:28 PM
to OmegaPlus
Nikos,

Sorry, I misspoke. RAiSD fails on that particular file, but it does work on other files that SweeD fails on.

Here is the site report for a file that RAiSD works on:

 RAiSD, Raised Accuracy in Sweep Detection
 This is version 2.8 (released in April 2020)
 Copyright (C) 2017, and GNU GPL'd, by Nikolaos Alachiotis and Pavlos Pavlidis
 Contact n.alachiotis/pavlidisp at gmail.com

 Command: /home/brian/bin/RAiSD/raisd-master/RAiSD -n test2 -I test2.vcf -f -a 9382 -D

 Index: Name | Sites = SNPs + Discarded | Discarded = HeaderCheckFailed + MAFCheckFailed + WithMissing + Monomorphic

 0: chrom_1 | 38562 = 25474 + 13088 | 13088 = 9793 + 0 + 403 + 2892
 1: chrom_2 | 38440 = 25681 + 12759 | 12759 = 9926 + 0 + 509 + 2324
 2: chrom_3 | 32182 = 21424 + 10758 | 10758 = 8238 + 0 + 594 + 1926
 3: chrom_4 | 31901 = 19236 + 12665 | 12665 = 8126 + 0 + 574 + 3965
 4: chrom_5 | 33148 = 21739 + 11409 | 11409 = 8697 + 0 + 595 + 2117
 5: chrom_6 | 23872 = 14290 + 9582 | 9582 = 6079 + 0 + 361 + 3142
 6: chrom_7 | 19918 = 13068 + 6850 | 6850 = 5483 + 0 + 421 + 946
 7: chrom_8 | 27058 = 17749 + 9309 | 9309 = 7177 + 0 + 610 + 1522

Here is the error from SweeD on that same file:
 /home/brian/bin/sweed_4.0/sweed/SweeD -name test2 -input test2.vcf -grid 4718
 Input file format (0:ms, 1:fasta, 2:macs, 3:vcf, 4:sf, 5:mbs): 3
 Total number of samples in the VCF:    4
 Samples excluded from the analysis:    0
 Alignment 1
 ERROR: There are more than 6 nucleotides in line 76. Expected 6 (according to the first SNP in line 73).
SweeD: SweeD_Input.c:3838: readLine_VCF: Assertion `j<*SNP_SZ' failed.
                Chromosome:             chrom_1Aborted (core dumped)

I attached the first 100 lines of this new file.

Thanks,
Brian
test2_n100.vcf

Nikolaos Alachiotis

Jun 2, 2020, 3:19:39 PM
to OmegaPlus
Hi Brian,
When you say "fails", do you mean that it does not parse the file at all, or that it parses the file but does not produce any output?
Do you see a failed assertion with RAiSD as well? That might help me locate what's causing this problem.
Best regards,
Nikos



--
Nikolaos Alachiotis

Brian Mack

Jun 3, 2020, 2:32:17 PM
to OmegaPlus
Nikos,
Yes, there is a failed assertion:

RAiSD, Raised Accuracy in Sweep Detection
 This is version 2.8 (released in April 2020)
 Copyright (C) 2017, and GNU GPL'd, by Nikolaos Alachiotis and Pavlos Pavlidis
 Contact n.alachiotis/pavlidisp at gmail.com
 Command: /home/brian/bin/RAiSD/raisd-master/RAiSD -n test -I test.vcf -f -a 9382 -D
 Samples: 4
 Format:  vcf
 A pattern structure of 65536 patterns (max. capacity) and approx. 2 MB memory footprint has been created.
RAiSD: sources/RAiSD_Dataset.c:2162: RSDDataset_getSetRegionLength_vcf: Assertion `rcnt==1' failed.
Aborted (core dumped)

Brian

Nikolaos Alachiotis

Jun 3, 2020, 3:08:36 PM
to OmegaPlus
This is helpful. Is the file attached in your previous message the one that causes this failing assertion?
If not, can you please send that one or any small part of it that still causes this failing assertion?
Nikos


--
Nikolaos Alachiotis

Brian Mack

Jun 4, 2020, 11:59:44 AM
to OmegaPlus
So, I checked the VCF file with vcf-validator and found a line that did not have the same number of genotype fields as the rest of the file. The preceding vcftools step must not have terminated properly, leaving a truncated line. Thanks for your time.

Brian
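
For anyone hitting the same thing, the check described above (every data line should have as many tab-separated columns as the #CHROM header) can be sketched in a few lines of Python. This is only a rough illustration, not a replacement for vcf-validator; the function name `find_ragged_lines` and the toy records are made up for the example.

```python
import io


def find_ragged_lines(handle):
    """Yield (line_number, n_columns, expected) for data lines whose
    tab-separated column count differs from the #CHROM header line."""
    expected = None
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # Skip meta-information lines and empty lines.
        if not line or line.startswith("##"):
            continue
        n_columns = len(line.split("\t"))
        if line.startswith("#CHROM"):
            expected = n_columns
            continue
        if expected is not None and n_columns != expected:
            yield lineno, n_columns, expected


# Toy input (hypothetical): the last record is missing one sample column,
# as a truncated write might leave it.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "chrom_1\t528\t.\tA\tG\t50\t.\t.\tGT\t0/0\t1/1\n"
    "chrom_1\t534\t.\tG\tA\t50\t.\t.\tGT\t0/0\n"
)
print(list(find_ragged_lines(demo)))  # [(4, 10, 11)]
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("test.vcf")`.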

wu meiming

Apr 17, 2022, 11:35:24 PM
to OmegaPlus
Hello, I ran into the same error when running SweeD, and I could not work out the solution from this thread. How did you solve it? This problem has been bothering me for a few days, and I would appreciate a reply.


SweeD version 4.0.0 released by Nikolaos Alachiotis and Pavlos Pavlidis in July 2018.
 Code contributions by Antonis Kioukis and Aggelos Koropoulis.

 Command:

     /public1/home/casdao/kdylvfenghua/kdy_wumm/software/sweed-master/bin/SweeD -name aegagrus-YCLR -input aegagrus-Y25.vcf -sampleList aegagrus-Y25.txt -grid 960 -minsnps 200 -maf 0.05 -missing 0.1



 Input file format (0:ms, 1:fasta, 2:macs, 3:vcf, 4:sf, 5:mbs): 3
 Total number of samples in the VCF:    25

 Samples excluded from the analysis:    0

 Alignment 1

 ERROR: There are 49 nucleotides in line 227. Expected 50 (according to the first SNP in line 56).

SweeD: SweeD_Input.c:3867: readLine_VCF: Assertion `j==*SNP_SZ' failed.
        Chromosome:        CM027080.1Aborted (core dumped)


Thanks.
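
The messages in this thread compare a per-line count against the count taken from the first SNP line. A minimal sketch of one way to reproduce such a count, assuming (this is not confirmed from the SweeD source) that one "nucleotide" is counted per allele entry in each sample's GT field, so that `0/0` contributes two and a bare `.` contributes one. Under that assumption, 25 samples with two alleles each give the expected 50, and 49 would result if one sample had a single-allele entry. The function name `count_gt_alleles` is made up for the example.

```python
import re


def count_gt_alleles(vcf_line):
    """Count allele entries across all samples' GT fields in one VCF record.

    Assumption for illustration: each '/'- or '|'-separated token in GT
    counts as one allele, and a bare '.' counts as a single token.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    total = 0
    for sample in fields[9:]:
        values = sample.split(":")
        gt = values[gt_index] if gt_index < len(values) else "."
        total += len(re.split(r"[/|]", gt))
    return total


# GT patterns modelled on lines 72 and 76 of the first post, GT only.
line_a = "chrom_1\t528\t.\tA\tG\t50\t.\t.\tGT\t0/0\t1/1\t.\t."
line_b = "chrom_1\t2557\t.\tT\tC\t50\t.\t.\tGT\t0/0\t.\t0/0\t1/1"
print(count_gt_alleles(line_a), count_gt_alleles(line_b))  # 6 7
```

If the counts differ between lines of your file, that would be worth checking alongside the vcf-validator output.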

kcam...@gmail.com

Apr 18, 2022, 10:39:08 AM
to OmegaPlus
The error in my case was due to a malformed VCF file. You can try checking your VCF file with vcf-validator to see if there is something wrong with it.

tou wowo

Apr 19, 2022, 10:06:16 AM
to OmegaPlus
Thank you very much for your reply. I ran vcf-validator and got the results below. I searched online, but I still don't understand what is wrong with my VCF file, so I have to ask again: do you know what the problem is? Your reply would be greatly appreciated.

According to the VCF specification, the input file is not valid
Warning: A valid 'reference' entry is not listed in the meta section. This occurs 1 time(s), first time in line 59.
Error: Sample #240, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 2 time(s), first time in line 60.
Error: Sample #249, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 2 time(s), first time in line 61.
Error: Sample #409, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 2 time(s), first time in line 97.
Error: Sample #13, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 1 time(s), first time in line 103.
Error: Sample #204, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 3 time(s), first time in line 108.
Error: Sample #58, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 2 time(s), first time in line 112.
Error: Sample #138, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 1 time(s), first time in line 143.
Error: Sample #41, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 2 time(s), first time in line 148.
Error: Sample #52, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 3 time(s), first time in line 153.
Error: Sample #291, field PL does not match the meta specification Number=G (expected 2 value(s)). This occurs 1 time(s), first time in line 156.
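
For reference, `Number=G` in the VCF specification means one value per possible genotype, so the expected length of PL depends on the number of alleles at the site and on the ploidy of the genotype. A minimal sketch of that arithmetic follows; the function name `expected_g_values` is made up for the example.

```python
from math import comb


def expected_g_values(n_alt, ploidy):
    """Number of possible genotypes, i.e. the expected length of a
    Number=G field such as PL, for n_alt ALT alleles and a given ploidy."""
    n_alleles = n_alt + 1  # REF plus the ALT alleles
    # Multisets of size `ploidy` drawn from `n_alleles` alleles.
    return comb(n_alleles + ploidy - 1, ploidy)


print(expected_g_values(n_alt=1, ploidy=1))  # 2 (haploid, biallelic)
print(expected_g_values(n_alt=1, ploidy=2))  # 3 (diploid, biallelic)
```

"Expected 2 value(s)" matches a haploid genotype at a biallelic site. One possible reading, which the output above does not confirm, is that the flagged samples carry a PL of a different length (for example 3, as for a diploid call) at those sites.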