How to use colorspace data in Trinity? Or what is the best way to use SOLiD data in Trinity?

Leandro de Mattos

Feb 23, 2015, 10:36:58 AM
to trinityrn...@googlegroups.com
Dear bioinformatics colleagues,
 
I would like to know how to run Trinity on SOLiD data (colorspace.fasta). Can I use this format directly? If not, which program is recommended for converting it to base space?
 
Thanks for any help,
Leandro
 
 

Brian Haas

Feb 23, 2015, 10:42:06 AM
to Leandro de Mattos, trinityrn...@googlegroups.com, David Eccles
Hi Leandro,

Trinity is not compatible with colorspace.  David Eccles (CC'd) is our local colorspace expert and will likely have some advice for you.

best,

~brian



--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Leandro de Mattos

Feb 23, 2015, 11:15:40 AM
to Brian Haas, trinityrn...@googlegroups.com, David Eccles
Thanks Dr. Brian,

Best,

Leandro.

David Eccles (gringer)

Feb 23, 2015, 1:24:17 PM
to Leandro de Mattos, trinityrn...@googlegroups.com
On 24.02.2015 04:42, Brian Haas wrote:
> Trinity is not compatible with colorspace. David Eccles (CC'd) is
> our local colorspace expert and will likely have some advice for
> you.

[Brian, any chance this can be put onto the website as a FAQ?]

If you're trying to use colour-space at all, my first advice is to run:
run away, or rerun your samples on a base-space system. Colour-space
is *really* confusing, and trying to explain why aligners produce odd
mappings is difficult. I wrote up some information on colour-space on
seqanswers a while back, which you can read if you want to know more:

http://seqanswers.com/forums/showpost.php?p=59156&postcount=4

So, let's assume you've come back from that and still like the idea of
colour-space. Maybe there's a strong financial incentive for that.
Next up is my post to the trinityrnaseq-users group about this:

On 11.10.2012 11:06, David Eccles (gringer) wrote:
> I was trying to shoehorn colourspace into Trinity a while back.
> There are a few issues with this format that make assembly
> considerably more difficult:
>
> 1. You can't treat double-encoded colourspace as base-space and
> expect things to work properly. The reverse of colourspace is the
> reverse of a sequence, rather than reverse-complement. The assembly
> algorithm needs to be modified to accommodate this (*in addition to
> double-encoding*), and you can't distinguish between some important
> codons and/or features. For example, a polyA tail is no different
> from a polyT head, so Trinity may happily join transcripts together
> with a polyA/polyT split.
>
> 2. There are two types of sequencing errors which have different
> outcomes on the determined sequence: Phasing errors will result in
> frame shifts (as per base-space sequencing), but incorrect colour
> addition in a sequence completely changes the interpreted sequence.
> The second error means that any sufficiently long sequence
> (say > 25-30bp) has about a 1 in 4 chance of having the wrong
> encoding at the end of the sequence. This could change stop
> codons to other amino acids, or convert polyA tails to polyG tails,
> for example.
>
> My initial idea was to double-encode sequences, then derive the four
> possible base-space encodings for each sequence in the Inchworm
> output. Unfortunately, because of the stated issues you get
> unnatural chimera assemblies, and the resultant assembled sequences
> aren't particularly useful.
>
> You would be best to avoid colour-space if at all possible. If you
> can't do that, try mapping your sequences to a similar well-assembled
> transcriptome to derive base-space sequences for each read, then
> assemble in Trinity using the derived reads.

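[Editorial sketch, not part of the original post.] The second kind of error in point 2 (a wrong colour call) can be illustrated by decoding a read twice, once with a single miscalled colour. This again assumes the standard XOR-style SOLiD encoding; `decode` and the example colour strings are made up for illustration.

```python
BASES = "ACGT"


def decode(primer_base, colours):
    """Convert colours to bases, starting from a known primer base."""
    value = BASES.index(primer_base)
    out = []
    for colour in colours:
        value ^= int(colour)  # next base = previous base XOR colour
        out.append(BASES[value])
    return "".join(out)


true_colours = "310331030"
bad_colours = "310231030"  # one miscalled colour at index 3 (3 read as 2)

good = decode("A", true_colours)
bad = decode("A", bad_colours)

print(good)  # TGGCGTTAA
print(bad)   # TGGATGGCC

# Bases before the error agree; every base from the error onwards differs.
assert good[:3] == bad[:3]
assert all(g != b for g, b in zip(good[3:], bad[3:]))
```

One wrong colour shifts every downstream base to a different letter, which is how a stop codon or a polyA tail near the end of a read can silently turn into something else.
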
Still interested? Then here's another post I wrote about how you might
go about doing it. The first suggestion follows similar lines to how
you assemble high-error long reads (e.g. Nanopore, PacBio), which is to
use a process to correct the reads so that their errors in base-space
are similar to Illumina reads, and then treat the output as Illumina reads:

On 18.10.2013 02:49, David Eccles (gringer) wrote:
> Okay, so it seems your general question is "how do I shoehorn
> colour-space data into an RNA assembly program?"
>
> My first recommendation, which will save you time, pain, and probably
> money, is to re-run your samples using a non-colourspace system. If
> this can't be done, you might still come out better off by telling
> people that assembly of colour-space transcripts is not possible.
>
> The biggest problem I have with colour-space is that two different
> types of errors (base switch errors and sequence errors) are [possibly
> necessarily] represented by a single number. Two additional problems
> cause issues in RNA assembly:
> * base switching due to incorrect reads of colours makes guessing
> amino acids almost impossible
> * poly-A tails cannot be differentiated from poly-T heads (reverse
> complement), or any other long homopolymer, causing accidental chimeric
> transcripts
>
> Considering this, and throwing caution to the wind, there are two
> avenues that *might* work:
>
> 1. Use bowtie (version 1) to map the colour-space reads to your good
> (or not so good) reference assembly. The resulting SAM file will have
> the converted base-space sequence, with any errors due to base
> switching fixed up to match the reference. Extract out that base
> sequence into a new FASTA file, and continue as normal with Trinity.
>
> 2. Double-encode the colourspace data (by trimming off the first base
> then changing 0->A, 1->C, 2->G, 3->T), then run it through the
> Inchworm step as a strand-specific run [you are doing strand-specific
> sequencing, right?]. When that's done, convert the Inchworm
> transcripts back to colourspace, then convert to base space by
> generating the four different base-space sequences possible from each
> colourspace sequence (i.e. append A/C/G/T to the start of the numeric
> sequence). I think this might have been what I got Trinity to do a few
> moons ago. After that, run the transcripts through Chrysalis /
> Butterfly as per normal. If your colour-space sequencer has perfect
> reads, then there'll be no base switching in the middle of the
> colour-space version of the transcripts, so you might get a few good
> transcripts out the other end.
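
[Editorial sketch, not part of the original post.] For avenue 1, the "extract the base sequence into a new FASTA file" step might look like the following. It relies only on the standard SAM column layout (read name in column 1, FLAG in column 2, SEQ in column 10) and assumes the aligner has already written decoded base-space sequence into SEQ; `sam_to_fasta` and the example records are hypothetical.

```python
import io

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fasta(sam_lines, out):
    """Write the SEQ field of each mapped SAM record as a FASTA entry."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4 or seq == "*":  # unmapped, or no sequence stored
            continue
        if flag & 0x10:
            # SAM stores reverse-strand hits reverse-complemented;
            # flip them back to the original read orientation.
            seq = seq.translate(COMPLEMENT)[::-1]
        out.write(f">{name}\n{seq}\n")
        count += 1
    return count


# Made-up records, for illustration only.
example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "read1\t0\tref1\t5\t255\t8M\t*\t0\t0\tACGTTGCA\tIIIIIIII\n",
    "read2\t16\tref1\t9\t255\t6M\t*\t0\t0\tAACCGG\tIIIIII\n",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
]
buffer = io.StringIO()
written = sam_to_fasta(example, buffer)
print(written)
print(buffer.getvalue(), end="")
```

Restoring reverse-strand reads to their original orientation is a design choice that matters for strand-specific libraries; unmapped reads are dropped because they have no reference-corrected sequence.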

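[Editorial sketch, not part of the original post.] The two conversions in avenue 2 can be written down as follows, assuming colours are numbered 0 to 3 as in SOLiD csfasta files and reads start with a primer base (e.g. "T310331030"). The helper names are hypothetical, and this covers only the format conversion, not the Trinity steps themselves.

```python
BASES = "ACGT"
TO_LETTER = str.maketrans("0123", BASES)  # double-encoding: colour -> pseudo-base
TO_COLOUR = str.maketrans(BASES, "0123")  # and back again


def double_encode(csfasta_read):
    """Drop the leading primer base and write each colour as a pseudo-base."""
    return csfasta_read[1:].translate(TO_LETTER)


def candidate_sequences(double_encoded):
    """Return the four base-space sequences consistent with a double-encoded contig."""
    colours = double_encoded.translate(TO_COLOUR)
    candidates = []
    for first in BASES:
        value = BASES.index(first)
        seq = [first]
        for colour in colours:
            value ^= int(colour)  # next base = previous base XOR colour
            seq.append(BASES[value])
        candidates.append("".join(seq))
    return candidates


encoded = double_encode("T310331030")
print(encoded)  # TCATTCATA
for candidate in candidate_sequences(encoded):
    print(candidate)
```

Only one of the four candidates is the real transcript, and nothing in the colour string itself says which, so the choice has to come from outside evidence such as similarity to known sequences.
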
Hope this helps,

--
David Eccles
Bioinformatics Research Analyst, Gringene Bioinformatics
Room 2.10 x857
Malaghan Institute of Medical Research
http://www.malaghan.org.nz

Brian Haas

Feb 23, 2015, 4:10:55 PM
to David Eccles (gringer), Leandro de Mattos, trinityrn...@googlegroups.com
Good idea, David.  I'll see if I can link directly to this post in the Google group.

many thx!

~b

David Eccles (gringer)

Feb 25, 2015, 1:21:19 PM
to trinityrn...@googlegroups.com
On 26.02.2015 03:17, Leandro de Mattos wrote:
> Dear David,
> I think you may be able to help me. Can I use the Velvet assembler script
> solid_denovo_preprocessor.pl to obtain input for Trinity?
> Can I use the file doubleEncoded_input.de to run Trinity?

Probably, but you're going to get lots of weird chimeras from homopolymer sequences joining together, and any consistent errors will
completely alter the base-space representation of the sequence. Your assembly is only likely to be useful in colour-space for mapping other
colour-space reads.

- David