Hello all,
Sorry if this is a repost. I was sure I posted this last week, but I cannot find the original using Google Groups search.
I am getting some conflicting results from the SOAPdenovo scaffolding process. I found that if I use the output of the prepare tool (http://soap.genomics.org.cn/down/prepare.tgz), I get different, arguably better, results.
After error correction (http://soap.genomics.org.cn/down/correction.tar.gz), I am left with 242,857,233 reads in 113 libraries. Running pregraph and contig with a kmer of 49, I generate “original.contig”. Following the normal steps, I ran map and scaff, resulting in “original.scafSeq”.
I also copied “original.contig” to a new directory and ran “prepare -g rescaf -K 49 -c original.contig”. Running map and scaff on “rescaf.contig” results in the file “rescaf.scafSeq”. The following is a summary of the assemblies.
File              Contigs  Total_length  Longest  N50
original.contig   2459270  144845786     5254     54
rescaf.contig     2459270  144845786     5254     54
original.scafSeq  15289    4964403       5254     532
rescaf.scafSeq    9875     5416552       36403    2579
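For anyone who wants to recompute these columns, here is a minimal Python sketch of the usual definitions (sequence count, total length, longest sequence, N50) for a FASTA file. The function names are hypothetical and this is not the tool that produced the table above; other tools may apply length cutoffs or a slightly different N50 convention, so the numbers need not match exactly.

```python
def fasta_lengths(lines):
    """Return the length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that records of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lines):
    """Collect the four summary columns for one FASTA file."""
    lengths = fasta_lengths(lines)
    return {
        "count": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

It can be run on a file with something like `summarize(open("rescaf.scafSeq"))`.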
Does anyone know the explanation for this difference? The only difference between “original.contig” and “rescaf.contig” is the lack of “cvg_X.X_tip_X” in the latter.
If any information on the accompanying files is needed, I can provide that as well.
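One way to double-check that the two contig files differ only in their header lines is to compare the sequences while ignoring headers altogether. The following is a rough Python sketch under that assumption; the function names are hypothetical, and it holds all sequences in memory, which should be manageable for files of this size but is not tuned for it.

```python
from collections import Counter


def read_sequences(lines):
    """Yield each sequence from FASTA lines, discarding the header text."""
    parts = None  # pieces of the current record, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if parts is not None:
                yield "".join(parts)
            parts = []
        elif parts is not None and line:
            parts.append(line)
    if parts is not None:
        yield "".join(parts)


def same_sequences(lines_a, lines_b):
    """True if both inputs hold the same multiset of sequences, in any order."""
    count_a = Counter(seq.upper() for seq in read_sequences(lines_a))
    count_b = Counter(seq.upper() for seq in read_sequences(lines_b))
    return count_a == count_b
```

Calling `same_sequences(open("original.contig"), open("rescaf.contig"))` would return True if the sequences are identical and only the headers differ. Note that this compares content only, so it would not reveal a change in contig order.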
Regards, Keith