PathConsensus Assertion `s.length() > overlap' failed


Matthew MacManes

Jul 10, 2015, 2:44:33 PM
to abyss...@googlegroups.com

With this command:


  for k in 91 71 111; do
      abyss-pe -C k$k np=40 k=$k name=M$k l=25 n=5 \
      lib='pe1 pe2' mp1_l=30 mp1_l=30 \
      mp='mp1 mp2' long='long1 long2' v=-v \
      pe1='v300.P2_1P.fq.gz v300.P2_2P.fq.gz' \
      pe2='v500.P2_1P.fq.gz v500.P2_2P.fq.gz' \
      mp1='v5kb_1.fastq v5kb_2.fastq' \
      mp2='v10kb_1.fastq v10kb_2.fastq' \
      long1='corrected_LR.fa' \
      long2='v.Trinity.fasta'
  done

 

Getting an assertion error: 


PathConsensus -v --dot -k91  -p0.9 -s Mya91-7.fa -g Mya91-7.dot -o Mya91-7.path Mya91-6.fa Mya91-6.dot Mya91-6.path
Reading `M91-6.dot'...
Reading `M91-6.fa'...
Reading `M91-6.path'...
Read 68347 paths
PathConsensus: warning: Two paths have identical sequence, which may be caused by a transitive edge in the overlap graph.
        15230623+ 15047466+ 221641-
        15230623+ 14628135+ 15047466+ 221641-
PathConsensus: warning: Two paths have identical sequence, which may be caused by a transitive edge in the overlap graph.
        1075777- 15073905+ 15438670+ 1151053+ 12699952+
        1075777- 15073905+ 15230530+ 15438670+ 1151053+ 12699952+
PathConsensus: PathConsensus.cpp:378: void mergeContigs(const Graph&, unsigned int, Sequence&, const Sequence&, const ContigNode&, const Path&): Assertion `s.length() > overlap' failed.
make: *** [M91-7.dot] Aborted (core dumped)
make: *** [M91-7.dot] Deleting file `M91-7.fa'



Anybody have any hints as to what is going on here?


Thanks, Matt

Matthew MacManes

Jul 10, 2015, 2:45:01 PM
to abyss...@googlegroups.com
Sorry, this is with ABySS 1.9.0

Ben Vandervalk

Jul 10, 2015, 3:40:06 PM
to Matthew MacManes, abyss...@googlegroups.com
Hi Matthew,

This seems like a tough one.

Based on the assertion message, it sounds like "buried" contigs (sequence of one contig contained inside another) are somehow being created and are causing problems downstream.
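To make "buried" concrete: contig A is buried in contig B when A's sequence, in either orientation, occurs entirely within B. The sketch below checks that for two plain sequence strings; `revcomp` and `is_buried` are hypothetical helper names, not ABySS tools, and this only illustrates the definition, not how ABySS detects or handles such contigs. Read literally, the assertion `s.length() > overlap` requires the sequence passed to `mergeContigs` as `s` (presumably the contig being appended) to be longer than its overlap, which a contig buried in its neighbour might not satisfy.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ABySS) illustrating "buried" contigs.

# Print the reverse complement of an upper-case ACGT sequence.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }' |
    tr 'ACGT' 'TGCA'
}

# Succeed if sequence $1, or its reverse complement, lies inside sequence $2.
is_buried() {
  a=$1
  b=$2
  a_rc=$(revcomp "$a")
  case $b in
    *"$a"* | *"$a_rc"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: ACGG is buried in TTCCGTAA as its reverse complement CCGT.
if is_buried ACGG TTCCGTAA; then echo buried; else echo "not buried"; fi
```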

Also, by comparing the pairs of paths listed in the warnings, I would guess that the buried contigs are 14628135 and 15230530.
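The comparison of the warning pairs can be done mechanically: for each warning, list the contigs that appear in the longer path but not in the shorter one. A minimal sketch; `extra_contigs` is a hypothetical helper name, it ignores contig order, and it treats the two orientations of a contig as different entries.

```shell
#!/bin/sh
# Hypothetical helper: print the oriented contigs that occur in the longer
# path ($2) but not in the shorter one ($1). Paths are space-separated,
# as in the PathConsensus warnings.
extra_contigs() {
  awk -v shorter="$1" -v longer="$2" 'BEGIN {
    n = split(shorter, s, " ")
    for (i = 1; i <= n; i++) seen[s[i]] = 1
    m = split(longer, l, " ")
    for (i = 1; i <= m; i++) if (!(l[i] in seen)) print l[i]
  }'
}

# The two warnings from the log in the first message:
extra_contigs "15230623+ 15047466+ 221641-" \
  "15230623+ 14628135+ 15047466+ 221641-"
extra_contigs "1075777- 15073905+ 15438670+ 1151053+ 12699952+" \
  "1075777- 15073905+ 15230530+ 15438670+ 1151053+ 12699952+"
```

Run on those two warnings, it prints `14628135+` and `15230530+`, the same two contigs named above.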

Unfortunately, I'm not sure how the problem arose or what the proper fix is.  As a temporary workaround, you might try removing those two contigs from the *-6.* files and seeing what happens.
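One way to script that workaround, as a sketch under assumptions: FASTA headers start with `>ID`, and every node, edge, and path in the `.dot` and `.path` files sits on its own line with oriented IDs such as `14628135+`. `drop_contigs` and the `.filtered.*` output names are hypothetical. Dropping a `.path` line removes the whole path containing the contig (here, the longer path of each warning pair), which may or may not be what PathConsensus needs, so keep the original files.

```shell
#!/bin/sh
# Hypothetical workaround sketch: write copies of PREFIX.fa/.dot/.path with
# the given contig IDs removed. Assumes ">ID ..." FASTA headers and one
# node, edge, or path per line in the .dot and .path files.
# Usage: drop_contigs M91-6 14628135 15230530
drop_contigs() {
  prefix=$1
  shift
  alt=$(printf '%s|' "$@")
  alt=${alt%|}                      # e.g. 14628135|15230530

  # FASTA: skip every record whose header ID is in the list.
  awk -v ids="$*" '
    BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) drop[a[i]] = 1 }
    /^>/  { id = substr($1, 2); skip = (id in drop) }
    !skip
  ' "$prefix.fa" > "$prefix.filtered.fa"

  # .dot and .path: drop lines mentioning an oriented copy (ID+ or ID-).
  for ext in dot path; do
    grep -Ev "(^|[^0-9])($alt)[+-]" "$prefix.$ext" > "$prefix.filtered.$ext"
  done
}
```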

It might also be helpful to know at what stage of the pipeline those contigs first appear (e.g. -1, -2, -3,...).
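A quick way to answer the stage question is to look for the contig's header in each numbered FASTA file in order. A sketch, assuming the files are named `NAME-1.fa`, `NAME-2.fa`, and so on in the assembly directory, and that headers start with `>ID` followed by a space; `first_appearance` is a hypothetical helper and the upper limit of 20 stages is arbitrary. It matches on ID only, so if IDs were reused between stages, compare the sequences as well.

```shell
#!/bin/sh
# Hypothetical helper: print the first NAME-N.fa (N = 1..20) that contains
# a header for contig ID, e.g. first_appearance M91 14628135
first_appearance() {
  name=$1
  id=$2
  i=1
  while [ "$i" -le 20 ]; do
    f="$name-$i.fa"
    # Match ">ID" followed by a space or end of line, so 146281350 is not hit.
    if [ -f "$f" ] && grep -Eq "^>$id( |$)" "$f"; then
      echo "$f"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}
```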

Do your runs at other k values fail for the same reason?

In order to debug it, I would need access to a data set that reproduces the problem.

- Ben


Matthew MacManes

Jul 10, 2015, 7:14:12 PM
to abyss...@googlegroups.com, macm...@gmail.com
Ben, 

Here are Mya91-6.fa, Mya91-6.dot, and Mya91-6.path.


Not sure about other kmers - this was the 1st one I did. k71 is running now - I'll report back. 

Both contigs 1st appear in *-6.fa

>14628135 2892 55040 4243398+,8331463+,10334040-
ACCTCATGACAGTGAACCTTCATGGCAAGTTTTTTGAAAATCCTATAATGAATAGTCAAGACACAGCCCAGACAAGGATCATTTGATCTTTAAGTGTGACATTGACCATTTAAGTACAGACCTTGGTCTTGTGTAAAATAGAGAAGGTCTATTTTAGCTATATAAAAAAATTAATCCATCACAGGAGTGATAAGGGCCCCGAATGATAGCATATTTGAAATGGTGTGTAAATTAAGTATATAACCCAGAATATATCTGTTAATCATCACATAAGAACATATTATTACTGTGGGCAAGTCTTGAAAGTTTGAAGTCCGTACCGTTTCCCGTTTTAAAGAAAATCAGTACATACACAGCACGGTTACTTCCCTTTGCCGTACATTAGCAATAATCGATTATGGAGTAATGAAATCGATTATCGTCGACCTCGGACACAAATAATCGACTTTCGACAATACTCAATCATCGTTTTATCCCTATCAGGCAAGCTAAAAAACATCAAAAATTTAGATGGACAGACCGACAGAGGGGCAGACAGGGTGTTTCCTATAGGCCACCCGCATATTTTTGCGGAGTTCTAATGAAAATGCTTTAGTGATTATTTTTTAATCATGGTGATCCTCATTATATTATTCTATTCTATATTTTACGACACATTTCAAAGTTCATTCAGGACCAAATACAAACCTATCACTGAAAGTAGACCCATTTTGTGGAACTGCCGTGTACTGCTGACCAGAGGGCAGCATGGTGGAGATGACCGCAGAGTTCACGTAAAACCCGTGATTTGCATCCGGTGGGTACTCTGTCCATTTCAGGAATGCTCGATCAAACTCAAAGCTGATCTTGGTGACAGAAAGGGGGCGGAGCCGGAAAACAACTTCAAGCTGATGAGGAAGCTTTCTGTCCTTCCCGGGATTAAACTGTGTCTTGTCTAAAAATATACAATAAAGGTGAATGCCTGCAAACATCAGTTTTACTTAATCAAACAATAAGCATAACTGGACAACTACTAATGCCAAATTTATAGGTCCTGCTAAACATGTGCATATTGTTTCTAGCAACATGTGCTCAAAGTTTCATCAGAATATCCCCCCTTCATTTCGATGTATGGCCAATGTAAAGTTATTGCACAATGAAACTGATGAGGACAAGGGCATCAAGACTATGACAATACTTTTACTTTTTTTCTTCAAAAAACAGACTGCTAAAAATCACAAAAGTGTTCATGTTCTGGATAAAACCAGTTATGATGTTGGTCCATTGGGAGACGCCGTCAGAAAGTGCCCCATGTGGAGCTTGAACCAAAGACCTCTTATGTCAGATGCTGACACCAGTCCCACAATAACCTGCTGAGGGAATAAATCACATACAGGTACAAGGGTCGAATAATACAATCAATTTGTGTAAAAGTATGTAGTACAGATTTTCCACTTATTTGTTACTAGAATAATACTCAAACACATCTTACAAGGTTTGATTGTAACTCCATATGCCTCAATATGTACAGAGCTGAAGTAGACCCGAAGGTACCATGGGATGGTCTCCATGTAGATGACTGTGAGGTTTGTGTCGAGATTGTTATACAAGATACACGAGATTCCACCCTTCTCCAGGCCGTAACCTGGAATAGATGAACATCATGAACATTTTGGGCCCATTTTTGGTATATATCTATAGCAAAACAAGATTTTTCATGGAATGTGCGATATTGTTGATAAACCGCTTTATATTGCATTACCAAACAGATGTGGTACGTATGGATACAATCTTAACCAACCTGTCACAAATCTCTGTGCATACAGAGGGGGTGGAGTAACATCCTGATACTGAACCCTGGAGTTGTAAGATGCCTGTAGGTTGAATGGTACGCTGTCTCTTGTGTGTTGTCTGAGGTCGTACACTGCGTACACACGCTTCTCACCCCCTCGTCTGGTGTTCAACACCTCATCTGGCTCCGGAGTAAGTAACATTTTTGGTCTCTACAAAAATGTTGATTT
AAAAATGACATAATTTATTACAAATAGTATGGCAAACTTTCTTGGAAATATTTAACTGTGGAATAAAAAAGTTCAATAAAATTTAGAAGAACAGGTACCATTCGGAGATAAAAAACAACAGATTAAGGCCATGCCCTTTTTCCCCAAAGAATATTTTTACAGCAGTGTTTTACAGCAGAAAATTATTACAATATTGACTTTAGTAAACCTATACCCACAAATGTGGGAAATCAAGATAGAATGCCCTCTGAGTTAATCCCTTTTTTAGGGTAGGCCATTACATTGGTAAATACTTGTATATGAAGAATATTGTAAAACCACTCACAGGATGCAGAGTTGAATCTACGTATAGTTTGCTGAATTTTGCCACCGGGCACGTCTTGTGAAGACTTCGATCAAACATTGACTTAAGACTCCAATCTGAATATAGAAACATGGCATGTCACATGGAGAAATTGTTGGAAAGGCGGTTTCTTAAATAGGGCCAATAAAGAATGACAGCGAATTCAATTGGTACACAGATTATTTTCAGGCAAGATTCATGTAAAGATTACATGATAACTGGGACATGTAGAAAAGTTATCCCACGTTATTCAGTAAATTTTCAGTGTACCATGGTTGGACTCATTTCACAATCATGGGCACTGAGCAAAAACACAACAATCCAAGACATAGTTGTGAGAACCCTGCTGCTGCACATAGCTAAGAATAATTGGTCCTCTCACAAAACTAAAAGTCCTCTCACAAAACAAAAAGTCTATCATTTGGAATGCAGAGCAAAATGAATCAGTGCTAATAGCTTAAGCCCGGAATTATAGTTATTCGGGTTTTTCCAGCCTAATCCTCCGGTACAAACAAAATTAAACCGTTGTTATATACACACGTTACTTCG

>15230530 3228 52079 12998567+,14012701+
GACAGGAAAGTACGAAGACAGAAGGTGTACTATCGGGAAAACGACTGGACAAAGTAACGAGAGAAAAGGGAAAGTGACTAGAACGGCGACAGGGACAATGCGCAGATAGGAAGTGTTCTGACGGACAACGACAAGGAAAGTCCAGGCACAGACGGGCTTCTGACGGGAACATAAAAAGGGAAAGTACGAGCACAAAAAAGGCGTCTGACAGAAACGGCGACATGGAAATGACCGGTAAGAAGGGCAAGAAAAGAACCTTTGTTTAACGGGTACTGACGGTAACAGCGACCGGGAAAGTACGGATATTGAAGTTCTGACGGATACCGATAGGGAAAGTACAAAGACCGAAGGGGAACTGATTGGAACAACAACATTTCACAATGACAAGACAGAGAGATACTGACAGGAATAGCGACGGGGAAGTTCGAAAACAGAAATTGTATGGACAAAAACGGCGACAGGGAAATGACTGAGACGGAAAGGATGTTTAAGTCAACAGCGACACTGAGATAACGAATATCGAACGGGTACTGACGTAAAGGACGACAGGGAAAGAACTGACGGCGACAAGTAACATACGGAAATAGAAGAAGTACTGACAGGAACACCGATAGGGTAAACACAGATACAGAAAAGAAACTGACATGAACAGCGACAGTAAAAGTACAGATGCAGATATGATACCGACATTACCAGGGATAGAACGGCGAATAACATGTTAGTAATAACTGACCAATAGATTTAATGAAGTCATTAGTGTATGACCCTAAGATTAACTTTGCATTAAGTTGCATAATGTTTACCACCCAAGGCCAAGTTTGAATTTAGATCACATAACATTATGTTTTTTCTTTCAGGATACTGTAATTGCATTACAAGCATTAACAGAATATAGCATGAAAGCGAAACGGCCCGATGTCAACCTTGAAGTCACGGTTACCAGACGATATGGAACTTGGGCTAAAATAGGAAGTTTAAGGATGGACAATGAAAATGACCTTCTAATTAAATCATTGGAACTTGACAACGGAAACGTAAATATTTTCACGTTATTCGATGCATGTCTTTTCTTTTAACCCAAAACGTCTTTGTCATTTACTTTATAGATACATATTCGTAAATTATGCATAACTAATTCAATGAAAACATTTATATGTTTGCTATTTATCGGGGCGGTTTATGTTCGTTCATTTTCACGATTCATTCACATGGATAGCTCAGCTTTCGGGACAACTGATTTTTTCAATCAATTTGCGTTTAAAATGGCCTTTCATAATTTTCGTAAACAATAATGATATTGGTGTTTATGCCAGTTGGCAAAACGTTTTCTTAAGCACATTCCAGATACGTCTTATTTTTAAAATAAACAGTTTTTTATTGTTTGTTACGTCATACGATATATTCTTATACATCAATTTACAACTATTCGAGATTCTGTCATTCAAGGTAAAAAATATTTACAAAAACAACAGTATTTTTTTATACATAGATGTATCAGTTGATACATGATTGTATCCGTTGATATATAGTGGTACCTGCTGATATATATTTGTATCGACTCCGCAAGTCGAAACGGATACGTGTTTACTCACCGAACGAGACGTAATACAGGTCACCAGAAAAGCGTTGTATCGTACATTCCATTAATTATATACCTGCCTATTCAAATATTTCGTTTTATTTTTCGTTTTCAATTTTATATATAAATTTACTGCCATCTGTCAAATTCGTGTATGAGTTCGAATGTGTTACATAGTTTGTGTATTGAAGTCAAGGGAGGCTACTACAGAGAAACGAAGTCTGAAGTGACATAATTCGTTTATTGAGCAAAATAAGAACAAAATCGTTTGTGTTTTAGCAAAATTAACAAATTGCTAATACGAGAACAATTAGTGTATGTCACGGCACAAACCAAAACAACAAAAACAAAACGTTACTTATTTACAAGTATACACTAAACGTCATTTCCTGTAGACGTAACGCGGTCATTTTTAACAAAG
TAAATACTCGTGGGATTGTTTTAACGATCATAATGTGCAGAATTGAGGGAAGTTCGTGAGTCAATCGCTATTACCATCGATGACGCGAATTCGCCTTCTTCTTTGTATTGGCAGGCTGCTCATTTTATTGAATAATGCAACAACTTAAGAACTTCGAAATGGCAAACAACTAATTCACTGAAGTAGCTGTCTAGATAAATTTGAAGTGTTATCTGTTTAATTCATGCATATCTTCATGTTTTTTTGAAGGACATGAGAAAAATGTTCCAGAATATTAAGATGCTTCTGGAGCGTGAGTAAAGTTAGATAAGTCCATGTATTTAAAATTTGTCAGGTTTAAAGACTTTTTTTTTGGTCACGCTATTTTTATCAACGCAAAAGTAGTTCCGAGATTACACACGGTACTCGCATGATACGGAATCAGAAAACGACATAATCATGATTATATTTATATTTAGGCAAAAAGCTACAGAAACAAACTCAGTAGCAAAAAGATTAAACATAAACCTGGTTTAGTGGCACTTTGCAAACGCCAAAGACATAGAACCCAAAATTACAAACAAGAAACATAGAAGAACAGCACAAAACTTAACAAACAGCATAGTGCATACTGATTTTACAATACGGCGTGCTGTATTTTTGTTGATACATATTCGTAACTATTGATGTATAGTTGTATCTTTTGATACTTACATCTCTTTGAATGATTTTAGCAGCCTGCAAATATTGTAGAGGAGAAAGGATAAATCAGAGAGACACATTCTTATTTTCCCAAAGATTAAGAGCTGAAAAATGATGGAAAGGCAGTATATGCGAATCATATCAAGAACATATTCCCAAGTGAAATATACAGTCATACTGATATTATGTTAAACAAAATATATCAATAATACATTACAATTTAAAATATATTATATATTAAAAAATAAAACTATTGAAAATTGAATTATGTAAAAAGTGTAATTAAACTATATATCAAAGCTTTTGTGCTTTTAAACCGTATTCAAACTAAACCTAAATGAGGACAACGGAAGTCTAAATCATGGTAACACTCATGTTAAAAGACCTATTCAAATGTAAGACATTTTTAGACACATAGGCTTTTAAATTATCAATTATAAATACGATTTCGCCTACGTGTTCTCCTGTATTAAACACAATGAAAAAATAGGTTGTATTTATTTATATATATTACGCCCACTCGCTTCAATTTTCCTTAGCTTTATTC

Matt

Ben Vandervalk

Jul 14, 2015, 3:03:44 PM
to Matthew MacManes, abyss...@googlegroups.com
Hi Matt,

Sorry, no solution yet.

Thanks for your patience.  Unfortunately, I think I will need even more data.  Even though we know that the problem sequences first appear in -6.fa, there are some upstream .path/.adj/.dot/.fa files that would provide some valuable information about the origin of those sequences.  Would it be possible for you to send me the complete contents of your assembly directory, along with the full log file?  (I understand it may be difficult to find a place to host that much data; also, feel free to send the download info in a private e-mail if you prefer.)
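A sketch of bundling one k's run for sharing. abyss-pe writes its log to the terminal and the loop in the first message does not save it, so the comment shows one way to capture it on the next run; `pack_assembly` and the `k91.log` file name are hypothetical.

```shell
#!/bin/sh
# To keep a full log next time, pipe each abyss-pe call through tee, e.g.
#   abyss-pe -C k$k ... 2>&1 | tee k$k.log
# (a plain sh pipeline reports tee's exit status, not abyss-pe's).

# Hypothetical helper: archive an assembly directory plus its log as DIR.tar.gz.
pack_assembly() {
  dir=$1
  log=$2
  tar -czf "$dir.tar.gz" "$dir" "$log" && echo "$dir.tar.gz"
}
# Usage: pack_assembly k91 k91.log
```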

It is possible that your problem will go away serendipitously with some parameter adjustment. For example, the 's' parameter sets the minimum length cutoff for sequences that are joined together into contigs or scaffolds. (By default it is 200.) Setting it higher would cause the contig/scaffold algorithms to ignore a lot of the smaller sequences. Maybe try setting it to 1000 by adding "s=1000" to your abyss-pe command. It might also be worth trying "pcopt=-a1", which should effectively disable the PathConsensus steps in the pipeline. (The -a option specifies the number of alternate paths to merge into a consensus.)
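A sketch of trying both suggestions for a single k, each in its own directory so the original run is left alone. `s=1000` and `pcopt=-a1` are the settings named above and the other parameters are copied from the original command; `run`, `rerun_with_suggestions`, `DRY_RUN`, and the `k91-s`/`k91-pcopt` directory names are hypothetical.

```shell
#!/bin/sh
# Sketch only: re-run one k with each suggested setting in its own directory.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

rerun_with_suggestions() {
  k=$1
  for extra in s=1000 pcopt=-a1; do
    dir=k$k-${extra%%=*}            # k91-s, k91-pcopt
    run mkdir -p "$dir"             # abyss-pe -C expects the directory to exist
    # With -C the relative read paths are resolved inside $dir; adjust them
    # (e.g. ../) if the reads live elsewhere.
    run abyss-pe -C "$dir" np=40 k="$k" name="M$k" l=25 n=5 "$extra" \
      lib='pe1 pe2' mp='mp1 mp2' long='long1 long2' mp1_l=30 v=-v \
      pe1='v300.P2_1P.fq.gz v300.P2_2P.fq.gz' \
      pe2='v500.P2_1P.fq.gz v500.P2_2P.fq.gz' \
      mp1='v5kb_1.fastq v5kb_2.fastq' \
      mp2='v10kb_1.fastq v10kb_2.fastq' \
      long1='corrected_LR.fa' \
      long2='v.Trinity.fasta'
  done
}
```

For example, `DRY_RUN=1 rerun_with_suggestions 91` prints the `mkdir` and `abyss-pe` lines for both variants without running anything; drop `DRY_RUN` to launch them.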

- Ben

Matthew MacManes

Aug 24, 2015, 5:46:48 PM
to ABySS, macm...@gmail.com
Any thoughts on this, Ben? I've just had this error crop up again, I'm afraid.

Matt

Ben Vandervalk

Aug 25, 2015, 12:01:35 PM
to Matthew MacManes, ABySS
Sorry Matt, I worked on it for a while but I wasn't able to figure it out. I will give it another try.

Are you able to work around it with "pcopt=-a1"?

- Ben

Matthew MacManes

Aug 25, 2015, 5:23:01 PM
to Ben Vandervalk, ABySS
Yes. "pcopt=-a1" works - just not sure how much I lose by using that.

Ben Vandervalk

Aug 25, 2015, 7:27:14 PM
to Matthew MacManes, ABySS
Sure, I understand; it's definitely not ideal.

Just a suggestion: You may be able to get some idea of what you are losing with "pcopt=-a1" by doing a comparison with a non-problematic data set (if you have such a data set).

- Ben

Ben Vandervalk

Aug 25, 2015, 7:28:40 PM
to Matthew MacManes, ABySS
More explicitly, what I was thinking of was comparing abyss-fac results with/without "pcopt=-a1".
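abyss-fac takes several FASTA files and prints one row of statistics per file, e.g. `abyss-fac without-a1.fa with-a1.fa` (placeholder file names; point it at the final contigs or scaffolds of each run). For a rough look at what changed, the sketch below computes the sequence count, total length, and N50 over all sequences with awk; `fasta_stats` is a hypothetical helper, and because abyss-fac skips sequences below a length cutoff, its numbers will not match abyss-fac exactly.

```shell
#!/bin/sh
# Hypothetical helper: print file, sequence count, total length, and N50,
# computed over all sequences (no minimum-length cutoff, unlike abyss-fac).
# First awk: one length per record (handles wrapped FASTA).
# Second awk: walk lengths from longest down until half the total is covered.
fasta_stats() {
  awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
       { len += length($0) }
       END { if (seen) print len }' "$1" |
    sort -rn |
    awk -v f="$1" '
      { lens[NR] = $1; total += $1 }
      END {
        acc = 0; n50 = 0
        for (i = 1; i <= NR; i++) {
          acc += lens[i]
          if (2 * acc >= total) { n50 = lens[i]; break }
        }
        printf "%s\tn=%d\tsum=%d\tN50=%d\n", f, NR, total, n50
      }'
}
# Usage (placeholder paths): for f in without-a1.fa with-a1.fa; do fasta_stats "$f"; done
```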

- Ben

Ben Vandervalk

Aug 25, 2015, 7:43:34 PM
to MacManes, Matthew, ABySS
A smaller data set would be awesome, if you are able to do it.  Understandable if you can't.

- Ben

On Tue, Aug 25, 2015 at 4:34 PM, MacManes, Matthew <Matthew....@unh.edu> wrote:
I do, just not with the longer kmers. I’m going to try and make a smaller dataset that recreates the issue - will make it easier for debugging. I’m also happy to be told this is some data-specific issue if indeed this is the case.

Matt
______________________________________________
Matthew MacManes, Ph.D. 
University of New Hampshire | Assistant Professor of Genome Enabled Biology
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH  03824
Phone: 603-862-4052 | Twitter: @PeroMHC | Web: genomebio.org
Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall
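For cutting the data down, one simple approach is to keep the first N pairs of each library, taking the same number of records from both mate files so the pairs stay in sync. A sketch, assuming 4-line FASTQ records and mates listed in the same order in both files; `read_fastq` and `take_first_pairs` are hypothetical helpers. The first N pairs are not a random sample, and since the bug seems to depend on the data, a subset may or may not reproduce it; seqtk's `sample` command, run with the same `-s` seed on both mate files, is a common way to draw a random subset instead.

```shell
#!/bin/sh
# Hypothetical helpers: copy the first N read pairs from two mate files
# (gzipped or plain FASTQ) into OUT_1.fq and OUT_2.fq.
# Assumes 4-line FASTQ records and mates in the same order in both files.

# Decompress .gz input; pass other files through unchanged.
read_fastq() {
  case $1 in
    *.gz) gzip -dc "$1" ;;
    *) cat "$1" ;;
  esac
}

take_first_pairs() {
  n=$1
  lines=$((n * 4))                  # 4 lines per FASTQ record
  read_fastq "$2" | head -n "$lines" > "${4}_1.fq"
  read_fastq "$3" | head -n "$lines" > "${4}_2.fq"
}
# Usage: take_first_pairs 1000000 v300.P2_1P.fq.gz v300.P2_2P.fq.gz v300.small
```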
