ISSUE: number of genes detected in a single cell

33 views

Skip to first unread message

Gary Ho

unread,

Feb 15, 2017, 12:54:03 AM2/15/17

to Sailfish Users Group

Hi Rob,

Lately I was trying to use Salmon (v0.8.0) along with tximport to study a downloaded single-cell data on gene-level. And I came across something werid that I found almost 20k genes on average per cell, which is way higher than expected. Realistically, the scRNA-seq protocol I followed will only have a gene number detection of roughly 10k. I re-do the analysis with STAR + featurecounts and I observed a per-cell gene number of 6k on average. Just wonder which part of my code goes wrong. Attached please find my code for Salmon and subsequent R script for tximport. Any advice or suggestion will be much appreciated! I do love Salmon for its speed and convenience.

Salmon index

#!/bin/bash

salmon index -t /home1/garyhe/workingdir/ref/gencode/gencode.v25.transcripts.ercc.fa -i /home1/garyhe/workingdir/ref/index/salmonindex/v25/19mer --gencode --type quasi -k 19

Salmon quant

#!/bin/bash

cd /home1/garyhe/workingdir/data/bjorklund2016ni/00_raw

for i in $(ls *.gz | cut -c 1-10 | uniq)

salmon quant -i /home1/garyhe/workingdir/ref/index/salmonindex/v25/19mer \

-l U \

-r ${i}_1.fastq.gz \

--writeMappings=/home1/garyhe/workingdir/data/bjorklund2016ni/01_aligned/${i}.sam \

-o /home1/garyhe/workingdir/data/bjorklund2016ni/02_quant/${i}

Done

R script for tximport

#condense the ensemble transcript ID counts to gene ID counts

library(GenomicFeatures)

txdb <- makeTxDbFromGFF("./Sequencing_run/gencode.vM12.primary_assembly.annotation.ercc.gtf")

k <- keys(txdb, keytype = "GENEID")

df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")

tx2gene <- df[, 2:1]

head(tx2gene)

library(tximport)

library(readr)

dir <- "./Sequencing_run/salmon_output/sc/"

list.files(dir)

samples <- read.table("./Sequencing_run/salmon_output/scsampleinfo.txt", header=TRUE)

samples

files <- file.path(dir, samples$Sample_ID, "quant.sf")

names(files) <- paste0(samples$Sample_ID)

names(files) <- gsub("[:_:].*$", "", names(files))

#gene-level

txi.salmon.g <- tximport(files, type = "salmon", tx2gene = tx2gene, reader = read_tsv)

names(txi.salmon.g)

head(txi.salmon.g$counts)

gcounts.txt <- txi.salmon.g$counts

write.csv(gcounts.txt, "./Sequencing_run/salmon_output/sc/gcounts.csv")

Best!

Rob

unread,

Feb 21, 2017, 2:19:54 PM2/21/17

to Sailfish Users Group

Hi Gary,

Thanks for all of the info here. To keep things focused, I'm responding to everything over on GitHub (where Valentine already posted a few things).