[WARNING] Nothing reduced, but why? Issues with output?


Daniel Fernando Paulo

Apr 15, 2019, 8:44:56 AM
to Redundans

Hi everyone,

 

I am pretty new to Redundans and I am trying to perform the “Redundancy Reduction” step on a very heterozygous insect genome.

 

For that, I used the following command line:

 

redundans.py --verbose --noscaffolding --nogapclosing --fasta genome.fasta --identity 0.95 --overlap 0.66 --minLength 500 --threads 7 --outdir outdir --log genome.log

 

Please note that I used the --noscaffolding and --nogapclosing parameters, as I’m only interested in the Redundancy Reduction step of the pipeline.

 

After running for some hours, I got the following message (at the "Final Reduction Step"):

 

“[WARNING] Nothing reduced”


The statistics in my “.log” file look like this:

 

#fname                contigs  bases     GC [%]  contigs>1kb  bases>1kb  N50      N90      Ns      longest
genome.fasta          30       48147108  28.249  30           48147108   7306273  3975295  226300  13637223
contigs.fa            30       48147108  28.249  30           48147108   7306273  3975295  226300  13637223
contigs.reduced.fa    12       47642345  28.255  12           47642345   7306273  3975295  226300  13637223
scaffolds.reduced.fa  12       47642345  28.255  12           47642345   7306273  3975295  226300  13637223

 

 

What is very weird is that my genome has 17,817 scaffolds, but it seems that Redundans analyzed only 30 of them.

 

When I run grep -c ">" contigs.fa, it returns “17817”, so the program does seem to load all contigs into the pipeline.

 

So my guess is that Redundans is working but not processing the entire FASTA file, and thus the assembly is unchanged (fewer than 1% of the contigs are reported in the “.log” file).

 

Any ideas of what is happening? Am I interpreting the results wrongly?

 

Thank you all for the help,

Dani.

Leszek

Apr 16, 2019, 5:47:46 AM
to Daniel Fernando Paulo, Redundans
Dear Dani, 
There are two possible explanations:
  1. Your contigs are very short - Redundans skips contigs shorter than 200 nt (--minLength 200).
  2. Your contigs have non-unique names - you can check with `grep -m50 ">" genome.fasta`
Please send me the FASTA index (genome.fasta.fai). It'll help in resolving the problem.
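
Both explanations can also be checked directly on the FASTA file. The following is only a minimal sketch: it assumes the record name is the first whitespace-delimited word of each header line (how Redundans itself parses names is not stated in this thread), and dup_names and count_short are hypothetical helper names, not part of Redundans.

```shell
# Hypothetical helpers for checking a FASTA file; not part of Redundans.

# List record names (first word after ">") that occur more than once.
# Assumption: the name is the first whitespace-delimited word of the header.
dup_names() {
  grep '^>' "$1" | awk '{print substr($1, 2)}' | sort | uniq -d
}

# Count records whose total sequence length is below a minimum (default 200).
count_short() {
  awk -v min="${2:-200}" '
    /^>/ { if (seen && len < min) n++; seen = 1; len = 0; next }
         { len += length($0) }
    END  { if (seen && len < min) n++; print n + 0 }
  ' "$1"
}
```

For example, `dup_names genome.fasta | head` prints nothing when all names are unique, and `count_short genome.fasta 200` reports how many records fall below the length cutoff.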

Bests, 
L.



Daniel Fernando Paulo

Apr 16, 2019, 10:18:17 AM
to Leszek, Redundans
Dear Leszek,

Thanks for your fast reply.

Yes, I suspected there was a problem with the input ".fa" genome file.
So I changed all headers to unique values by doing: 

awk '/^>/{print ">NODE_" ++i; next}{print}' < genome.fa > genome_rename.fa

Now it seems to work fine, and Redundans reported all the 17,817 scaffolds as expected.
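
Note that the renaming above discards the original headers. If they are needed later (for example, to map results back to the original scaffold names), a variant can write a lookup table alongside the renamed file. This is only a sketch; rename_with_map and the output file names are hypothetical.

```shell
# rename_with_map IN OUT MAP
# Same NODE_<n> renaming as the awk one-liner above, but also writes a
# tab-separated table (new name, original header) to MAP.
rename_with_map() {
  awk -v map="$3" '
    /^>/ { n++; print "NODE_" n "\t" substr($0, 2) > map; print ">NODE_" n; next }
         { print }
  ' "$1" > "$2"
}
```

Usage: `rename_with_map genome.fa genome_rename.fa genome_rename.map.tsv`.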
Thanks again and sorry for sending two emails about the same issue.
Dani.
_____________________________

MSc. Daniel Fernando Paulo
State University of Campinas (Unicamp).
Center for Molecular Biology and Genetic Engineering (CBMEG).
Laboratory of Animal Genetics and Evolution (LabGEA).
13083-875 - Campinas, SP - Brazil.
