Hi Leszek,
I've been trying to use redundans to reduce and quantify the redundancy in metagenomic assemblies I'm working with. I've run into a few issues:
- contigs.reduced.fa.hetero.tsv is always empty
- contigs.reduced.fa.hist.png is always empty
Also, even though I never pass the --noreduction flag, running with -v prints the following configuration, which reports noreduction=True:
Options: Namespace(fasta='180124_all_concatenated_cd-hit_culled.fasta', fastq=[], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f5d772171e0>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='180202_test_cd-hit_culled_redundans_default', overlap=0.8, reference='', resume=False, threads=4, verbose=True)
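In case it helps narrow things down, here's a minimal sketch (my own guess, not your actual source) of how a store_true flag with an accidental True default would always report noreduction=True in the Namespace, whether or not the flag is passed on the command line:

```python
# Hypothetical illustration only -- not redundans' real argument parser.
# If a store_true flag's default were mistakenly True, passing or
# omitting the flag would make no difference to the parsed Namespace.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--noreduction", action="store_true", default=True)

print(parser.parse_args([]))                  # Namespace(noreduction=True)
print(parser.parse_args(["--noreduction"]))   # Namespace(noreduction=True)
```

That would match what I see in the -v output, though since reduction does still run (see below), it may just be a reporting quirk rather than the actual behaviour.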
Even though that flag is reported, reduction does happen: for some samples I get good reduction of about 95% for shorter contigs (<1000bp), which is what I expected intuitively. For others it does almost nothing, removing well under 1% of contigs. That might be a real result, but I'm not sure whether I'm causing the unpredictable behaviour by applying redundans to a case with much greater complexity than what you designed the software for. Since we're sequencing complex, mixed natural communities, I expect far more allelic variation than in a single-genome assembly. In short, I'm not sure whether I'm running up against some limit of the software when it makes this many comparisons/alignments.
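For reference, this is roughly how I'm quantifying the reduction myself (a quick sketch with made-up filenames; the length cutoff of 1000 bp is my own choice):

```python
# Count contigs below/above a length cutoff in a FASTA file, so the
# input and the reduced assembly can be compared per size class.
def count_by_length(path, cutoff=1000):
    short = long_ = 0
    seq = []

    def tally():
        nonlocal short, long_
        if seq:
            n = sum(len(s) for s in seq)
            if n < cutoff:
                short += 1
            else:
                long_ += 1
            seq.clear()

    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                tally()  # close out the previous record
            else:
                seq.append(line.strip())
        tally()  # last record
    return short, long_

# Example comparison (filenames are placeholders for my actual files):
# before = count_by_length("180124_all_concatenated_cd-hit_culled.fasta")
# after = count_by_length("contigs.reduced.fa")
```

The ~95% figure I quote is the drop in the short-contig count between the input and contigs.reduced.fa computed this way.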
If you have any insights, I'd be grateful to hear them. I can provide logs/input files if that would help.
Best,
Jesse