Testing the reduction part of Redundans


Nina

Oct 12, 2017, 3:09:52 AM
to Redundans
Hi Leszek,

I have had the pleasure of trying your software Redundans, which I downloaded a few weeks ago.

I have tried to understand the reduction part specifically. For that I have created a very small contigs fasta file with 9 strings. These 9 strings I then copied and just renamed the name of the individual strings.
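
For anyone who wants to build a similar test input, here is a minimal Python sketch of how a contigs file can be doubled with renamed copies. It is not part of Redundans: the function names (`read_fasta`, `duplicate_contigs`, `write_fasta`) and the `offset=10` renaming scheme are assumptions chosen only to match the NODE_1 / NODE_11 naming used in this post.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA handle (headers assumed non-empty)."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    # Emit the final record, which has no following header to trigger it.
    if name is not None:
        yield name, "".join(chunks)


def duplicate_contigs(records, offset=10):
    """Return the records followed by copies renamed <prefix>_<number + offset>."""
    records = list(records)
    copies = []
    for name, seq in records:
        prefix, number = name.rsplit("_", 1)
        copies.append((f"{prefix}_{int(number) + offset}", seq))
    return records + copies


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Small in-memory example; real use would open the contigs file instead.
example = io.StringIO(">NODE_1\nACGT\nAC\n>NODE_2\nTTGC\n")
doubled = duplicate_contigs(read_fasta(example))
out = io.StringIO()
write_fasta(doubled, out)
print(out.getvalue())
```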


In an attempt to test if Redundans would prune out Node 11 to Node 18 I ran Redundans with the following parameters:


>redundans.py --noscaffolding --nogapclosing -f contigs-double8.fa --identity 0.51 --overlap 0.8 --minLength 50 --log double8.log -o double8
[ERROR] Empty FastA file encountered: double8/contigs.reduced.fa !


I assume I get the ERROR message because the .fa file is too small:

> wc double8/contigs.reduced.fa      
 19  19 651 double8/contigs.reduced.fa

However, much to my surprise, the file 'double8/contigs.reduced.fa' contains both NODE_8 and NODE_18. The other contigs are reduced as I would expect. I have tried the same exercise with nine duplicated contigs. In that case, too, the last contig was present twice (NODE_9 and NODE_19) after the reduction.

Any thoughts on what is going on?

 Kind regards
 Nina


My contigs file (contigs-double8.fa):

>NODE_1
ACGTATAGGGTGTCGAGCACATATCAATATGTCATGAACTGAGAACCTTTACCTTTTTGG
AGTAAGTACCCCTTTTGGGGAGAAGAA
>NODE_2
TTGCATTGAGAGGCGGTATGTTTTTCCAAGATTCTCAAGTAACAAGATTTTTAGTCTAGG
>NODE_3
AATGCTTGAATTTTAAAATTCACTGCAATTAAAGTAAGAGAAGGGTTGTGTACATCAAAA
>NODE_4
AATTACTTTTAAAATCATAAAGGTTGATAATCAGAAGTCAAAGTACTATACTTTTGCTTA
>NODE_5
AAATTAGACGAATTTCACGAGATAAAAATAGCTACATGCCTTTCCGCATTAATGCAAGAA
>NODE_6
TGAAAAACTACCTTGCAATACAGAATCCTAACTATAATCTAGATCCAATCAGTTTGAGGT
>NODE_7
GCGCATTTGCATTGCGTAGCCAAGGGATTAATGACAAGTATACACCAAAATAGAACGCAC
>NODE_8
GCGATAGCTTCAATACTCCAATCGAAAATGAATGGCGTGTTTTTATTCAACAAAGTCTTA
>NODE_11
ACGTATAGGGTGTCGAGCACATATCAATATGTCATGAACTGAGAACCTTTACCTTTTTGG
AGTAAGTACCCCTTTTGGGGAGAAGAA
>NODE_12
TTGCATTGAGAGGCGGTATGTTTTTCCAAGATTCTCAAGTAACAAGATTTTTAGTCTAGG
>NODE_13
AATGCTTGAATTTTAAAATTCACTGCAATTAAAGTAAGAGAAGGGTTGTGTACATCAAAA
>NODE_14
AATTACTTTTAAAATCATAAAGGTTGATAATCAGAAGTCAAAGTACTATACTTTTGCTTA
>NODE_15
AAATTAGACGAATTTCACGAGATAAAAATAGCTACATGCCTTTCCGCATTAATGCAAGAA
>NODE_16
TGAAAAACTACCTTGCAATACAGAATCCTAACTATAATCTAGATCCAATCAGTTTGAGGT
>NODE_17
GCGCATTTGCATTGCGTAGCCAAGGGATTAATGACAAGTATACACCAAAATAGAACGCAC
>NODE_18
GCGATAGCTTCAATACTCCAATCGAAAATGAATGGCGTGTTTTTATTCAACAAAGTCTTA


The reduced contig output file (double8/contigs.reduced.fa):

>NODE_8
GCGATAGCTTCAATACTCCAATCGAAAATGAATGGCGTGTTTTTATTCAACAAAGTCTTA
>NODE_7
GCGCATTTGCATTGCGTAGCCAAGGGATTAATGACAAGTATACACCAAAATAGAACGCAC
>NODE_6
TGAAAAACTACCTTGCAATACAGAATCCTAACTATAATCTAGATCCAATCAGTTTGAGGT
>NODE_5
AAATTAGACGAATTTCACGAGATAAAAATAGCTACATGCCTTTCCGCATTAATGCAAGAA
>NODE_4
AATTACTTTTAAAATCATAAAGGTTGATAATCAGAAGTCAAAGTACTATACTTTTGCTTA
>NODE_3
AATGCTTGAATTTTAAAATTCACTGCAATTAAAGTAAGAGAAGGGTTGTGTACATCAAAA
>NODE_2
TTGCATTGAGAGGCGGTATGTTTTTCCAAGATTCTCAAGTAACAAGATTTTTAGTCTAGG
>NODE_11
ACGTATAGGGTGTCGAGCACATATCAATATGTCATGAACTGAGAACCTTTACCTTTTTGG
AGTAAGTACCCCTTTTGGGGAGAAGAA
>NODE_18
GCGATAGCTTCAATACTCCAATCGAAAATGAATGGCGTGTTTTTATTCAACAAAGTCTTA
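
A quick way to check a reduced file for leftover exact duplicates is sketched below. This is an independent helper, not something Redundans provides: `fasta_sequences` and `exact_duplicates` are hypothetical names, and the check only catches identical sequences on the same strand (no reverse complements, no partial overlaps).

```python
import io
from collections import defaultdict


def fasta_sequences(handle):
    """Return {name: sequence} for a FASTA handle."""
    seqs, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line
    return seqs


def exact_duplicates(seqs):
    """Return sorted groups of contig names that share an identical sequence."""
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq.upper()].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# In-memory stand-in for a reduced file; real use would open the file instead.
reduced = io.StringIO(">NODE_8\nGCGATAGC\n>NODE_7\nGCGCATTT\n>NODE_18\nGCGATAGC\n")
print(exact_duplicates(fasta_sequences(reduced)))
```

On the reduced file listed above, only NODE_8 and NODE_18 share a sequence, so they should be the single group reported.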

Nina

Oct 12, 2017, 3:17:06 AM
to Redundans
 For that I have created a very small contigs fasta file with 9 strings. These 9 strings I then copied and just renamed the name of the individual strings.

Should have been:

 For that I have created a very small contigs fasta file with 8 strings. These 8 strings I then copied and just renamed the name of the individual strings.
 

l.p.p...@gmail.com

Oct 17, 2017, 9:45:10 AM
to Nina, Redundans
Dear Nina, 

Thanks a lot for the report! I've found the bug. It's a minor thing: the last element of the LAST hits was not processed, so it only affects cases like your test case, where there are only one-to-one hits. In a real-life scenario you would also have suboptimal hits, which is why I never spotted it...

Nevertheless, it's solved. I just pushed a corrected version of fasta2homozygous.py to GitHub.
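
To illustrate the general class of bug (this is not the actual fasta2homozygous.py code, which is not shown in this thread), here is a hedged Python sketch of how a loop can silently skip the last hit. The hit tuple layout `(query, target, identity, overlap)`, the function names, and the parameter names are all assumptions made for illustration; the default thresholds are simply borrowed from the command line earlier in the thread.

```python
def passes(hit, min_identity, min_overlap):
    """True if a (query, target, identity, overlap) hit meets both thresholds."""
    _query, _target, identity, overlap = hit
    return identity >= min_identity and overlap >= min_overlap


def redundant_buggy(hits, min_identity=0.51, min_overlap=0.8):
    """Flag redundant queries, but examine each hit only once the next one arrives."""
    redundant = set()
    previous = None
    for hit in hits:
        if previous is not None and passes(previous, min_identity, min_overlap):
            redundant.add(previous[0])
        previous = hit
    # Bug: `previous` still holds the final hit here and is never examined.
    return redundant


def redundant_fixed(hits, min_identity=0.51, min_overlap=0.8):
    """Flag redundant queries, examining every hit including the last."""
    return {hit[0] for hit in hits if passes(hit, min_identity, min_overlap)}


# One-to-one hits only: the last query is missed by the buggy version.
one_to_one = [("NODE_17", "NODE_7", 1.0, 1.0), ("NODE_18", "NODE_8", 1.0, 1.0)]
print(redundant_buggy(one_to_one), redundant_fixed(one_to_one))

# With an extra (suboptimal) trailing hit, dropping the last element is harmless.
with_suboptimal = one_to_one + [("NODE_18", "NODE_3", 0.6, 0.85)]
print(redundant_buggy(with_suboptimal) == redundant_fixed(with_suboptimal))
```

In this toy model, a trailing suboptimal hit masks the problem, which would be consistent with the bug only showing up when every contig has exactly one hit.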

Thanks again! 
L. 


--
You received this message because you are subscribed to the Google Groups "Redundans" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redundans+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/redundans/9e68204b-70aa-4419-a70e-e5c81430c23c%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Nina

Oct 19, 2017, 5:14:12 AM
to Redundans
Thanks for the quick reply and fix!
I'm happy to have helped.
 
I can see how this is not a problem that would be exposed through normal use of Redundans. I was trying to understand it and hence was testing the limits.

Cheers Nina

l.p.p...@gmail.com

Oct 19, 2017, 5:20:32 AM
to Nina, Redundans
I'm really grateful, as one usually uses more complex test cases, and some unusual scenarios can't be spotted that way...

btw: let me know how Redundans works for you. Plus, if you have ideas for new features, I'll be happy to work on them, given there is substantial interest.
One of the things we've been discussing lately is providing an all-in-one solution around Redundans that would just take your FastQ libs and return the assembly, without any prior knowledge about lib types and genome characteristics. Is anyone willing to help with that?

Bests, 
L. 


