read mapping to multiple species


Silpa Suthram

Sep 24, 2013, 5:13:57 PM
to sailfis...@googlegroups.com
Hi,
  We have RNA-seq data from mouse and human combined.  We usually align reads to their combined transcriptome, but remove any reads mapping to both species or ribosomal RNA.  How does Sailfish deal with this issue?  Is there a good way to do this with Sailfish?
Silpa

smount

Sep 25, 2013, 6:39:05 PM
to sailfis...@googlegroups.com
What Sailfish will do is assign k-mers within those reads to either the mouse or the human transcript in a way that is most consistent with the other data.  If your annotation file is correct, this should work very well. 

Muad Abd El Hay

Nov 28, 2017, 4:45:44 AM
to Sailfish Users Group
I would like to piggyback on this so I don't open a new thread.

It seems like I have contamination of human sequences in addition to my mouse input.

I would like to run Salmon on both mouse and human transcriptomes and fish out only the features that map to the mouse transcriptome (or ideally split them into two results).

How do I best do this with Salmon?
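
One way to get the split described above, as a minimal sketch rather than an official Salmon feature: quantify against a combined index, then divide the resulting quant.sf by transcript ID prefix. This assumes the transcript names are Ensembl stable IDs (ENSMUST... for mouse, ENST... for human); the function name and output file names are hypothetical.

```shell
#!/bin/bash
# Split a Salmon quant.sf into per-species tables by transcript ID prefix.
# Assumption: names are Ensembl IDs (ENSMUST* = mouse, ENST* = human).
# Rows matching neither prefix are dropped.
split_quant_by_species() {
    local quant_file=$1 out_dir=$2
    mkdir -p "$out_dir"
    awk -v mouse="$out_dir/mouse.sf" -v human="$out_dir/human.sf" '
        NR == 1         { print > mouse; print > human; next }  # header goes to both files
        $1 ~ /^ENSMUST/ { print > mouse; next }
        $1 ~ /^ENST/    { print > human }
    ' "$quant_file"
}

# Example call (paths are placeholders):
#   split_quant_by_species quants_combined/sample_quant/quant.sf split/sample
```

Note that the TPM column in each split file is still scaled over the whole combined index, so it would need renormalising if per-species TPMs are wanted; the NumReads column can be used as is.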

Muad Abd El Hay

Nov 29, 2017, 9:30:07 AM
to Sailfish Users Group
I tried to simply concatenate the two fasta files for human and mouse that I got from here:

Mouse
ftp://ftp.ensembl.org/pub/release-90/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz
Human
ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz

I extracted both to get the .fa files and then ran:

cat mmusculus.fa hsapiens.fa > combined.fa
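
A quick sanity check on the concatenation, as a minimal sketch: the combined file should contain exactly as many FASTA records as the two inputs together. The function names are hypothetical; the file names are the ones used above.

```shell
#!/bin/bash
# Count FASTA records (header lines start with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Succeeds only if the combined file holds exactly the records of both inputs.
check_combined_fasta() {
    local mouse=$1 human=$2 combined=$3
    local expected=$(( $(count_fasta_records "$mouse") + $(count_fasta_records "$human") ))
    [ "$(count_fasta_records "$combined")" -eq "$expected" ]
}

# Example call:
#   check_combined_fasta mmusculus.fa hsapiens.fa combined.fa || echo "record count mismatch" >&2
```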

Then I ran salmon index as follows:

salmon index -t combined.fa -i combined_index

When I then try to run salmon quant as follows:

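#!/bin/bash
# Quantify each paired-end sample against the combined mouse + human index.
for f1 in *_1_sequence.txt.gz
do
    # Derive the mate-2 file name from the mate-1 file name.
    f2="${f1%%_1_sequence.txt.gz}_2_sequence.txt.gz"
    salmon quant -i ../cdna/combined_index -l A --gcBias \
         -1 "$f1" \
         -2 "$f2" \
         -p 36 -o "quants_combined/${f1}_quant"
done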

I get an error telling me that versionInfo.json is missing:

[Error: The index version file ../cdna/combined_index/versionInfo.json doesn't seem to exist.  Please try re-building the salmon index.]

And indeed, the file versionInfo.json is missing in the combined_index folder while it is present in the indexes created for the mouse and human transcriptomes separately.
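
To catch this before the quantification loop starts, a small guard can test for the file named in the error message. This is a minimal sketch: it assumes only that a usable index directory contains versionInfo.json, as the error implies, and it does not explain why the file is missing. One possibility worth ruling out is that salmon index did not run to completion, which its exit status and log output should show. The function name is hypothetical.

```shell
#!/bin/bash
# Return success only if the index directory contains versionInfo.json.
# Assumption: this file is necessary (not sufficient) for a usable index.
index_is_complete() {
    local index_dir=$1
    [ -f "$index_dir/versionInfo.json" ]
}

# Example guard before running salmon quant:
#   if ! index_is_complete ../cdna/combined_index; then
#       echo "combined_index looks incomplete; re-run salmon index and check its output" >&2
#       exit 1
#   fi
```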

How can I create a combined human and mouse index that salmon quant will accept?