Hi Rob,
Lately I was trying to use Salmon (v0.8.0) along with tximport to study a downloaded single-cell data on gene-level. And I came across something werid that I found almost 20k genes on average per cell, which is way higher than expected. Realistically, the scRNA-seq protocol I followed will only have a gene number detection of roughly 10k. I re-do the analysis with STAR + featurecounts and I observed a per-cell gene number of 6k on average. Just wonder which part of my code goes wrong. Attached please find my code for Salmon and subsequent R script for tximport. Any advice or suggestion will be much appreciated! I do love Salmon for its speed and convenience.
Salmon index
#!/bin/bash
salmon index -t /home1/garyhe/workingdir/ref/gencode/gencode.v25.transcripts.ercc.fa -i /home1/garyhe/workingdir/ref/index/salmonindex/v25/19mer --gencode --type quasi -k 19
Salmon quant
#!/bin/bash
cd /home1/garyhe/workingdir/data/bjorklund2016ni/00_raw
for i in $(ls *.gz | cut -c 1-10 | uniq)
do
salmon quant -i /home1/garyhe/workingdir/ref/index/salmonindex/v25/19mer \
-l U \
-r ${i}_1.fastq.gz \
--writeMappings=/home1/garyhe/workingdir/data/bjorklund2016ni/01_aligned/${i}.sam \
-o /home1/garyhe/workingdir/data/bjorklund2016ni/02_quant/${i}
Done
R script for tximport
#condense the ensemble transcript ID counts to gene ID counts
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("./Sequencing_run/gencode.vM12.primary_assembly.annotation.ercc.gtf")
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
tx2gene <- df[, 2:1]
head(tx2gene)
library(tximport)
library(readr)
dir <- "./Sequencing_run/salmon_output/sc/"
list.files(dir)
samples <- read.table("./Sequencing_run/salmon_output/scsampleinfo.txt", header=TRUE)
samples
files <- file.path(dir, samples$Sample_ID, "quant.sf")
names(files) <- paste0(samples$Sample_ID)
names(files) <- gsub("[:_:].*$", "", names(files))
#gene-level
txi.salmon.g <- tximport(files, type = "salmon", tx2gene = tx2gene, reader = read_tsv)
names(txi.salmon.g)
head(txi.salmon.g$counts)
gcounts.txt <- txi.salmon.g$counts
write.csv(gcounts.txt, "./Sequencing_run/salmon_output/sc/gcounts.csv")
Best!