SOAPdenovo and colorspace reads

David H

Jul 4, 2012, 8:49:36 AM
to bgi-...@googlegroups.com

I would like to use SOAPdenovo for scaffolding.

I have contigs.  I have both Illumina paired-end reads and SOLiD mate pair reads.

May I please ask: can SOAPdenovo be made to work with SOLiD colorspace reads (csfasta files)?

Ruibang Luo

Jul 6, 2012, 10:57:08 PM
to bgi-...@googlegroups.com
You can translate color space reads to base space first. But you will lose the advantage of color space reads.

rb




Manoj Samanta

Jul 8, 2012, 2:32:50 PM
to bgi-...@googlegroups.com
The best way to proceed would be to

i) correct color space mate pair reads using contigs assembled by SOAPdenovo,

ii) translate corrected color space to nucleotide space,

iii) treat the translated mate pair library like an Illumina mate pair library.

The straightforward approach of translating color space into
nucleotide space and proceeding has the drawback that a single color
space error in a read makes the translation meaningless from that
point onward
(http://www.homolog.us/blogs/2010/02/15/the-mathematics-behind-color-space-sequencing/).
In my experience, SOLiD reads do have frequent singleton errors, and
that may throw the assembler off.
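
To illustrate why one color error is so damaging, here is a minimal Python sketch. It assumes the usual SOLiD two-base encoding (A=0, C=1, G=2, T=3, with each color being the XOR of the codes of two adjacent bases) and a read given as a known primer base plus a string of colors. The function name and the example read are made up for illustration.

```python
BASES = "ACGT"  # index is the 2-bit code; color = code(prev) XOR code(next)

def decode_colors(primer_base, colors):
    """Translate a color string to bases, starting from the known primer base."""
    code = BASES.index(primer_base)
    out = []
    for c in colors:
        code ^= int(c)          # each base depends on all colors before it
        out.append(BASES[code])
    return "".join(out)

good = decode_colors("T", "0123012301")
bad = decode_colors("T", "0133012301")   # one wrong color at index 2

# Positions where the two translations disagree
mismatches = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
print(good, bad, mismatches)
```

Every base from the erroneous color onward comes out wrong, which is why correcting in color space before translating matters.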


How to do (i) listed above?

(a) Run 'SOAPdenovo pregraph' and 'SOAPdenovo contig' to assemble
contigs from Illumina reads.
(b) Translate the contigs to color space (pseudonucleotide) and use Bowtie
to map the color space reads.
(c) Based on above mapping, correct as many color space reads as possible.
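
As a rough illustration of (b) and (c), the Python sketch below encodes a contig into colors and replaces isolated single-color mismatches in a read with the contig's color. It is only one possible reading of step (c), not a tested pipeline: it assumes contigs contain only A, C, G, T, that the mapped position of each read is already known from the aligner output, and that the read and the reference slice have equal length. The function names are hypothetical. Runs of two or more adjacent mismatches are left untouched, since a real SNP changes two adjacent colors.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3; color = XOR of adjacent base codes

def encode_colors(seq):
    """Return the colors for each adjacent base pair (length len(seq) - 1)."""
    codes = [BASES.index(b) for b in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

def correct_read(read_colors, ref_colors):
    """Replace isolated single-color mismatches with the reference color.

    Adjacent mismatches are kept as they are, because they may reflect
    a real variant rather than a sequencing error.
    """
    n = len(read_colors)
    mism = [r != f for r, f in zip(read_colors, ref_colors)]
    out = list(read_colors)
    for i in range(n):
        left = i > 0 and mism[i - 1]
        right = i + 1 < n and mism[i + 1]
        if mism[i] and not left and not right:
            out[i] = ref_colors[i]
    return "".join(out)

contig_colors = encode_colors("ACGTACGTAC")
ref_slice = contig_colors[2:7]            # read assumed to map at offset 2
print(correct_read("13031", ref_slice))   # isolated error gets corrected
print(correct_read("10231", ref_slice))   # adjacent mismatches are kept
```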

Kevin McKernan

Jul 9, 2012, 1:00:05 PM
to bgi-...@googlegroups.com, bgi-...@googlegroups.com
Pavel Pevzner published work on k-mer error correction of SOLiD reads.
Error correction is even better in color space because SNPs look different from errors. I don't know if the SOAP tools can do this, but one trick we've used to make other base space tools work on color space data is to perform a literal transliteration of 0,1,2,3 -> A,T,C,G.
Perform the SOAP error correction and then translate back to 0,1,2,3.

You'll then have error-corrected color space reads, on which you can perform color-to-base translations that are immune to the frame-shifting issues.
I think Pavel's technique takes into consideration that adjacent errors are SNPs and is better tuned for this. I think Life Tech may have a version of this available.

I use this before assembling.
Translating to base space before error correction or assembly will leave you with little data.
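
A minimal Python sketch of the transliteration trick, assuming csfasta-style reads where the first character is the primer base and the rest are colors. The helper names are made up for illustration; the primer base is kept aside so that it is not confused with the pseudo-bases on the way back.

```python
# Literal transliteration 0,1,2,3 -> A,T,C,G and back (not a real decoding)
TO_PSEUDO = str.maketrans("0123", "ATCG")
FROM_PSEUDO = str.maketrans("ATCG", "0123")

def csfasta_to_pseudo(read):
    """Split off the primer base and transliterate the colors to letters."""
    primer, colors = read[0], read[1:]
    return primer, colors.translate(TO_PSEUDO)

def pseudo_to_csfasta(primer, pseudo):
    """Turn (possibly corrected) pseudo-bases back into a color space read."""
    return primer + pseudo.translate(FROM_PSEUDO)

primer, pseudo = csfasta_to_pseudo("T0123310")
# ... run a base space error corrector on `pseudo` here ...
restored = pseudo_to_csfasta(primer, pseudo)
print(pseudo, restored)
```

Characters outside 0-3 (for example a "." for a missing color call) pass through unchanged in this sketch, so they would need separate handling before feeding a base space tool.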

Do the ILMN mate pairs have the same read orientation as the SOLiD mate pairs?


Ruibang Luo

Jul 11, 2012, 10:58:32 PM
to <bgi-soap@googlegroups.com>, bgi-...@googlegroups.com
Please be aware that SOLiD uses FF orientation, whereas Illumina uses FR in paired-end reads and RF in mate pairs. When using SOAPdenovo for SOLiD, please reverse complement the right-side reads and set reverse_seq=0 in the library config.
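
For the reverse-complement step, a small Python sketch follows. It assumes the right-side reads have already been translated to base space and are stored as FASTA with one sequence line per record; the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse complement a base space sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]

def revcomp_fasta(lines):
    """Yield FASTA lines with every sequence line reverse complemented."""
    for line in lines:
        line = line.rstrip("\n")
        # Header lines pass through untouched
        yield line if line.startswith(">") else reverse_complement(line)

records = [">read1/2\n", "AACGN\n"]
print(list(revcomp_fasta(records)))
```

The output can then be listed as the second read file of the library, with the library treated as forward-reverse.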


Manoj Samanta

Jul 12, 2012, 10:23:10 AM
to bgi-...@googlegroups.com
Thanks Kevin. Does Pavel's method use the Illumina reads used to
build contigs, or is it only based on the SOLiD library?

Kevin McKernan

Jul 12, 2012, 1:53:17 PM
to bgi-...@googlegroups.com, bgi-...@googlegroups.com
His method is just an error correction tool. It works on both ILMN and SOLiD.
I wouldn't assemble SOLiD data without error correction, so I was suggesting this as a preprocessing step to any assembler.



luiscunhamx

Aug 29, 2013, 5:23:39 AM
to bgi-...@googlegroups.com
Dear Manoj,



I just found this post, and I am having exactly the same problem: I have three mate pair libraries of different sizes and deep coverage with Illumina data. I am very interested in trying the correction strategy you mentioned. Could you please explain the last point in more detail? I am not sure how to do it once the reads are mapped:


(c) Based on above mapping, correct as many color space reads as possible.



Thanks in advance. I look forward to your answer.


Luis