Preprocessing reads before alignment for reference-guided assembly


Thomas Sandmann

May 31, 2016, 9:10:07 PM
to trinityrnaseq-users
Dear all,

I would like to align paired-end reads to a reference genome with STAR and then use Trinity's reference-guided assembly approach.
The manual states  

If quality trimming or normalization are desired, these processes should be performed prior to aligning the reads to the genome, as Trinity will only use the reads as they exist in the coordinate-sorted bam file provided to it.

STAR uses soft-clipping to mark parts of the reads that didn't align to the reference, e.g. due to low quality. Is that sufficient / useful as input for Trinity? Or should I perform quality trimming before the STAR alignment instead?
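[For context, soft-clipping is recorded as "S" operations in a read's CIGAR string in the BAM file; the clipped bases remain in the record but are excluded from the alignment. A minimal sketch of inspecting this, assuming plain CIGAR strings as input (the function names are illustrative, not part of STAR or Trinity):

```python
import re

def split_cigar(cigar):
    """Parse a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def soft_clipped_bases(cigar):
    """Count bases soft-clipped ('S') at either end of the alignment."""
    return sum(n for n, op in split_cigar(cigar) if op == "S")

# A 100 bp read whose first 10 bases were soft-clipped, e.g. due to low quality:
print(soft_clipped_bases("10S90M"))  # -> 10
```
]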

Also, does normalization provide a benefit for the reference-guided assembly? STAR is quite fast and it seems to me that Trinity's memory requirements when starting with a BAM file are already much lower than in a de novo assembly.

Many thanks for any feedback,
Thomas

Brian Haas

May 31, 2016, 9:59:48 PM
to Thomas Sandmann, trinityrnaseq-users
Hi Thomas,

In genome-guided mode, Trinity will assemble the full reads as represented in the bam file, regardless of any soft or hard clipping in the alignments.  In general, if any quality trimming is done on the reads before de novo transcript assembly, it's done very gently (minQ=5).  Normalization will certainly help with genome-guided de novo assembly, just as it does with genome-free assembly, for those genes that have massive coverage.
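[A gentle minQ=5 trim means only removing low-quality bases from the read ends, similar in spirit to Trimmomatic's LEADING/TRAILING steps. A toy sketch, assuming Phred+33 quality encoding (not Trinity's actual trimming code):

```python
def gentle_trim(seq, qual, min_q=5, phred_offset=33):
    """Trim bases with quality below min_q from both ends of a read.
    Internal low-quality bases are kept; only the ends are clipped."""
    quals = [ord(c) - phred_offset for c in qual]
    start = 0
    while start < len(quals) and quals[start] < min_q:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]

seq  = "ACGTACGTAC"
qual = "##IIIIII#I"   # '#' = Q2 (below threshold), 'I' = Q40
print(gentle_trim(seq, qual))  # -> ('GTACGTAC', 'IIIIII#I')
```

Note that the internal Q2 base survives: with a threshold this gentle, trimming only shaves unreliable read ends rather than fragmenting reads.]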

best,

~brian

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Thomas Sandmann

May 31, 2016, 10:45:25 PM
to Brian Haas, trinityrnaseq-users
Hi Brian,

Thanks so much for your instantaneous reply and your explanations. I will trim adapters and perform gentle quality trimming, as you suggested, before starting the normalization step you outlined in the manual. Then I think I am ready to align the reads and begin the assembly.

In addition, I would also like to follow the de novo assembly path. As I have more than a billion read pairs, memory is a constraint. I think I can start with the normalization and then use the resulting read pairs both for de novo assembly and for alignment to the reference, correct?

Also, do you have any recommendations about error correcting the reads (or not)? There is some evidence in the literature that this might be useful, especially for high coverage data like mine, so any advice is appreciated.

Thanks again for making these great tools available to the research community!

Thomas

Brian Haas

Jun 1, 2016, 7:10:43 AM
to Thomas Sandmann, trinityrnaseq-users
Hi Thomas,

That's right, you can use the normalized reads for both genome-guided and de novo assembly separately.  We haven't thoroughly explored error correction. I suspect that, given how Trinity works, any improvements from error correction would be negligible.
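[The idea behind in-silico read normalization can be sketched as follows: discard a read once the median count of its k-mers already meets a target coverage, so deeply covered genes contribute fewer reads. This is a toy illustration of the concept, with made-up parameter values, not Trinity's actual normalization implementation:

```python
from collections import defaultdict

def normalize(reads, k=25, target_cov=30):
    """Keep a read only if the median count of its k-mers, over the
    reads kept so far, is still below the target coverage."""
    kmer_counts = defaultdict(int)
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:          # read shorter than k: skip it
            continue
        counts = sorted(kmer_counts[km] for km in kmers)
        if counts[len(counts) // 2] < target_cov:
            kept.append(read)
            for km in kmers:
                kmer_counts[km] += 1
    return kept

# Ten identical reads, target coverage 3: only the first few are kept.
print(len(normalize(["ACGTACGTACGT"] * 10, k=4, target_cov=3)))
```

A real normalizer streams reads and hashes k-mers compactly, but the core decision rule is the same: redundant reads from high-coverage transcripts are dropped while rare k-mers are preserved.]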

If you need a machine with a lot of RAM, you could leverage the free computational resources we make available at IU:

best,

~b