ISSUE: number of genes detected in a single cell

33 views
Skip to first unread message

Gary Ho

unread,
Feb 15, 2017, 12:54:03 AM2/15/17
to Sailfish Users Group

Hi Rob,

Lately I was trying to use Salmon (v0.8.0) along with tximport to study a downloaded single-cell data on gene-level. And I came across something werid that I found almost 20k genes on average per cell, which is way higher than expected. Realistically, the scRNA-seq protocol I followed will only have a gene number detection of roughly 10k. I re-do the analysis with STAR + featurecounts and I observed a per-cell gene number of 6k on average. Just wonder which part of my code goes wrong. Attached please find my code for Salmon and subsequent R script for tximport. Any advice or suggestion will be much appreciated! I do love Salmon for its speed and convenience.

 

Salmon index

 

#!/bin/bash

 

salmon index -t /home1/garyhe/workingdir/ref/gencode/gencode.v25.transcripts.ercc.fa -i /home1/garyhe/workingdir/ref/index/salmonindex/v25/19mer --gencode --type quasi -k 19

 

Salmon quant

#!/bin/bash

 

cd /home1/garyhe/workingdir/data/bjorklund2016ni/00_raw

 

for i in $(ls *.gz | cut -c 1-10 | uniq)

 

do

 

salmon quant -i  /home1/garyhe/workingdir/ref/index/salmonindex/v25/19mer \

-l U \

-r ${i}_1.fastq.gz \

--writeMappings=/home1/garyhe/workingdir/data/bjorklund2016ni/01_aligned/${i}.sam \

-o /home1/garyhe/workingdir/data/bjorklund2016ni/02_quant/${i}

 

Done

 

R script for tximport

#condense the ensemble transcript ID counts to gene ID counts

library(GenomicFeatures)

txdb <- makeTxDbFromGFF("./Sequencing_run/gencode.vM12.primary_assembly.annotation.ercc.gtf")

k <- keys(txdb, keytype = "GENEID")

df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")

tx2gene <- df[, 2:1]

head(tx2gene)

 

 

library(tximport)

library(readr)

dir <- "./Sequencing_run/salmon_output/sc/"

list.files(dir)

samples <- read.table("./Sequencing_run/salmon_output/scsampleinfo.txt", header=TRUE)

samples

files <- file.path(dir, samples$Sample_ID, "quant.sf")

names(files) <- paste0(samples$Sample_ID)

names(files) <- gsub("[:_:].*$", "", names(files))

 

#gene-level

txi.salmon.g <- tximport(files, type = "salmon", tx2gene = tx2gene, reader = read_tsv)

names(txi.salmon.g)

head(txi.salmon.g$counts)

gcounts.txt <- txi.salmon.g$counts

write.csv(gcounts.txt, "./Sequencing_run/salmon_output/sc/gcounts.csv")

 

Best!

Rob

unread,
Feb 21, 2017, 2:19:54 PM2/21/17
to Sailfish Users Group
Hi Gary,

  Thanks for all of the info here.  To keep things focused, I'm responding to everything over on GitHub (where Valentine already posted a few things).

--Rob
Reply all
Reply to author
Forward
0 new messages