Dual RNA-Seq

Kamil

Nov 20, 2017, 8:44:58 PM
to rna-star
Dear Alex,
Thank you so much for your help on this Google Group page; I have managed to solve most of my problems just by looking through the previous queries. I have one that has not yet been covered, though. I performed RNA-seq of ribodepleted RNA from human gut tissue, 25 bp PE (short sequences, as we were QCing). The read quality was excellent. I used bowtie2 (kneaddata) to remove any ribosomal reads. As I want to perform dual host and microbiome RNA-seq, I then took the unmapped paired files and aligned them against hg19 with STAR. 12% of the reads were unmapped, and I aligned these against the gut microbiome reference database. Unfortunately, this second STAR mapping resulted in no uniquely mapped reads; instead, nearly all reads were multimapped. If I skip the hg19 STAR alignment, then around 11% of reads map uniquely to the gut microbiome database.

What am I doing wrong? Is this result just a consequence of the short read length? Should I use bowtie2 to pull out the hg19-aligned reads first and then map the unmapped reads against the gut microbiome database using STAR?

Many thanks for your help with this!
Kamil

Alexander Dobin

Nov 21, 2017, 8:55:21 PM
to rna-star
Hi Kamil,

First, I would like to make a general comment about multimappers. Multimapping reads are not in any way "worse" than unique alignments. If a read maps to multiple locations in your reference, that does not mean your pipeline is bad. It may indicate that your reference contains repeated sequences, which may or may not reflect biological reality. The only problem with multimappers is that they are harder to deal with in post-mapping analyses, as you need to aggregate the mappings over several loci.
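
One common way to aggregate multimappers over several loci is fractional counting, where a read that aligns to n loci contributes 1/n to each of them. The sketch below is only an illustration of that idea, not something prescribed in this thread: the function name, the (read ID, locus) input format, and the locus labels are all hypothetical, and in practice the pairs would be extracted from the alignment file.

```python
from collections import Counter, defaultdict

def fractional_counts(alignments):
    """Split each read's weight evenly across the loci it aligns to.

    alignments: iterable of (read_id, locus) pairs, one per reported
    alignment (hypothetical input format, chosen for illustration).
    """
    loci_per_read = defaultdict(set)
    for read_id, locus in alignments:
        loci_per_read[read_id].add(locus)

    counts = Counter()
    for loci in loci_per_read.values():
        weight = 1.0 / len(loci)  # a unique read keeps its full weight of 1
        for locus in loci:
            counts[locus] += weight
    return dict(counts)

# Toy data: read2 is a multimapper hitting two (made-up) loci.
example = [
    ("read1", "speciesA_16S"),
    ("read2", "speciesA_16S"),
    ("read2", "speciesB_16S"),
    ("read3", "speciesB_geneX"),
]
counts = fractional_counts(example)
```

With this scheme the total count still equals the number of reads, so multimappers are neither discarded nor counted twice.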

I think your procedure of mapping the reads first to the human genome and then to the microbiome is correct.
In your particular case, one possible reason why you see only multimappers to the microbiome is that the reads are too short.
Another possibility is that the microbiome reference is highly repetitive even for longer sequences. You can check this by simulating random reads from the microbiome reference and mapping them back to see whether they come up as multimappers.
Yet another possibility is that the rRNA depletion did not work well for bacteria, so that most of the bacterial RNA you see is rRNA, which maps predominantly as multimappers.
The 11% of reads that map uniquely when aligned directly to the microbiome must also be mappable to the human genome, since they no longer appear once the human-mapped reads are removed. This means that the microbiome reference contains some sequences identical to human, which could be biological reality or a technical artifact (contamination of the microbiome reference with the human genome).
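
The simulation check mentioned above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: all function names are hypothetical, the reference is assumed to be an uncompressed FASTA file, reads are error-free, and the 200 bp fragment length is an arbitrary choice (only the 25 bp read length comes from the question).

```python
import random

def read_fasta(path):
    """Return {name: sequence} from an uncompressed FASTA file."""
    seqs, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs

# Other IUPAC codes, if present, are left unchanged by translate().
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def simulate_pairs(seqs, n_pairs, read_len=25, fragment_len=200, seed=0):
    """Draw error-free paired-end reads uniformly over fragment start positions."""
    rng = random.Random(seed)
    names = [n for n, s in seqs.items() if len(s) >= fragment_len]
    if not names:
        raise ValueError("no reference sequence is as long as fragment_len")
    # Weight each sequence by its number of possible fragment starts.
    weights = [len(seqs[n]) - fragment_len + 1 for n in names]
    pairs = []
    for i in range(n_pairs):
        name = rng.choices(names, weights=weights)[0]
        start = rng.randrange(len(seqs[name]) - fragment_len + 1)
        frag = seqs[name][start:start + fragment_len]
        mate1 = frag[:read_len]                        # forward strand
        mate2 = reverse_complement(frag[-read_len:])   # opposite strand
        pairs.append((f"sim{i}_{name}_{start}", mate1, mate2))
    return pairs

def write_fastq(pairs, path1, path2):
    """Write mates to two FASTQ files with a constant dummy quality string."""
    with open(path1, "w") as f1, open(path2, "w") as f2:
        for read_id, m1, m2 in pairs:
            f1.write(f"@{read_id}\n{m1}\n+\n{'I' * len(m1)}\n")
            f2.write(f"@{read_id}\n{m2}\n+\n{'I' * len(m2)}\n")
```

For example, write_fastq(simulate_pairs(read_fasta("microbiome.fa"), 100000), "sim_1.fq", "sim_2.fq") would produce a pair of files to map back with the same STAR settings (the file names here are placeholders). If the simulated 25 bp pairs come back mostly as multimappers but longer simulated reads do not, read length is the likely cause; if longer reads are also multimapped, the reference itself is repetitive.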

Cheers
Alex