Redundans not producing expected output when trying to reduce metagenomic assemblies


Jesse McNichol

Feb 2, 2018, 11:38:03 AM
to Redundans
Hi Leszek,

I've been trying to use redundans to reduce and quantify the redundancy in metagenomic assemblies I'm working with. I've run into a few issues:

-contigs.reduced.fa.hetero.tsv is always empty
-contigs.reduced.fa.hist.png is always empty

Even if I don't set the --noreduction flag, it prints out the following config information with the -v flag:

Options: Namespace(fasta='180124_all_concatenated_cd-hit_culled.fasta', fastq=[], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f5d772171e0>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='180202_test_cd-hit_culled_redundans_default', overlap=0.8, reference='', resume=False, threads=4, verbose=True)

Even though this flag is reported as above, I do get good reduction of about 95% for shorter contigs (<1000bp) in some samples (which is what I expected intuitively). For others, it does almost nothing (maybe it removes a few contigs but removes <1% for sure). This might be a real result but I'm just not sure whether I'm causing the unpredictable behaviour by using redundans for a case where the complexity is much greater than what you designed the software for. I expect there is more allelic variation compared to a single genome assembly since we're sequencing complex and mixed natural communities. Basically, I'm just not sure whether I'm running up into some limit of the software for making multiple comparisons/alignments.

If you have any insights, I'd be grateful to hear it. Can provide logs/input files if you'd like.

Best,

Jesse

l.p.p...@gmail.com

Feb 5, 2018, 4:38:11 AM
to Jesse McNichol, Redundans
Hi Jesse, 

Honestly, I've never tried Redundans on microbiome data, but it could be an interesting application!
You may see short contigs removed even without reduction: contigs shorter than 200 bp (minLength=200) are skipped, because they complicate scaffolding and the resulting gaps can usually be closed later with PE/MP reads.
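
To judge how much of the observed "reduction" could come from the length filter alone, one could count the contigs below the cutoff in the input FASTA. The snippet below is only a rough sketch and not part of Redundans: the helper names fasta_lengths and count_short are made up for illustration, and it assumes that contigs of exactly 200 bp are kept.

```python
def fasta_lengths(lines):
    """Yield (header, sequence_length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        elif name is not None:
            # Sequence lines may be wrapped, so lengths are accumulated.
            length += len(line)
    if name is not None:
        yield name, length


def count_short(lines, min_length=200):
    """Return (number of contigs shorter than min_length, total contigs)."""
    lengths = [n for _, n in fasta_lengths(lines)]
    short = sum(1 for n in lengths if n < min_length)
    return short, len(lengths)


# Tiny in-memory example: contig lengths are 240, 40 and 250 bp.
example = [">a", "ACGT" * 60, ">b", "ACGT" * 10, ">c", "A" * 100, "C" * 150]
short, total = count_short(example)
print(f"{short} of {total} contigs are shorter than 200 bp")
```

On a real assembly, the same function can be given an open file handle instead of a list, for example the FASTA passed to Redundans.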

First of all, which version/commit are you running (git log | head -n5)? 

> I've been trying to use redundans to reduce and quantify the redundancy in metagenomic assemblies I'm working with. I've run into a few issues:
>
> -contigs.reduced.fa.hetero.tsv is always empty
> -contigs.reduced.fa.hist.png is always empty
>
> Even if I don't set the --noreduction flag, it prints out the following config information with the -v flag:
>
> Options: Namespace(fasta='180124_all_concatenated_cd-hit_culled.fasta', fastq=[], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f5d772171e0>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='180202_test_cd-hit_culled_redundans_default', overlap=0.8, reference='', resume=False, threads=4, verbose=True)
That's correct: when you set --noreduction, this parameter becomes False and reduction is not performed, so noreduction=True in your output means reduction is still enabled (I know it's counterintuitive :/).
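
For anyone puzzled by the inverted value, this is the behaviour you get from a flag defined with argparse's store_false action. The snippet below is a minimal sketch of that general pattern; that Redundans defines --noreduction exactly this way is an assumption, not something confirmed in this thread.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumed definition: with store_false the stored value is True by default
# and flips to False only when the flag is given on the command line.
parser.add_argument("--noreduction", action="store_false", default=True)

default_opts = parser.parse_args([])               # flag not given
skip_opts = parser.parse_args(["--noreduction"])   # flag given

print(default_opts)  # Namespace(noreduction=True)  -> reduction would run
print(skip_opts)     # Namespace(noreduction=False) -> reduction would be skipped
```

Read this way, the name holds the opposite of what it suggests: True means the step is kept.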
 

> Even though this flag is reported as above, I do get good reduction of about 95% for shorter contigs (<1000bp) in some samples (which is what I expected intuitively). For others, it does almost nothing (maybe it removes a few contigs but removes <1% for sure). This might be a real result but I'm just not sure whether I'm causing the unpredictable behaviour by using redundans for a case where the complexity is much greater than what you designed the software for. I expect there is more allelic variation compared to a single genome assembly since we're sequencing complex and mixed natural communities. Basically, I'm just not sure whether I'm running up into some limit of the software for making multiple comparisons/alignments.
>
> If you have any insights, I'd be grateful to hear it. Can provide logs/input files if you'd like.
It may be related to the data itself.
Which de novo assembler are you using?
Could you send me contigs.reduced.fa.hist.png and logs/statistics for good and bad cases? 

Bests, 
L. 

