Hi Rob,
I managed to get a first complete run through of salmon 0.2.7. I've attached a powerpoint with some of the slides in case you want to look at them but I'll summarize in text my opinions
1) My previous issue with sailfish using paired-end libaries was clearly related to the fasta file we used, using a cDNA fasta from ensembl fixed the issue
2) In general my gut tells me salmon is an improvement over sailfish. Principally because there are a number of genes that were TPM>1 in sailfish that go to TPM=0 in salmon. I suspect this is expect due to the better use of paired-end fragments in salmon
3) Running salmon 0.2.7 read based or alignment based I'm seeing a weirdly near perfect TPM correlation for stranded versus unstranded using both read and alignment based quantitation. This seems fishy as I'd expect things to be similar but the added value of standed should change some expression measures more that we are seeing in my opinion.
4) The gene level summary is working differently for read and alignment based quantifications. If you do read based the number of genes is based on the fasta file used to create the index NOT the GTF supplied for the summary function, which seems correct. While the alignment based quant outputs genes based on the GTF provided to STAR during index build OR the same GTF provided to salmon, NOT the transcriptome fasta provided to Salmon during quantitation, which was my assumption since you are providing a fasta to salmon that quantitation would be limited to it. It looks like this is also the case for transcript level summary.
Last:
Is there any chance you can post a CentOS binary for v3.0 like you have for previous releases (
https://github.com/kingsfordgroup/sailfish/releases). Our IT team is having little success building from source on our CentOS 6.5 cluster and I'd like to test v0.3.0 as you suggested earlier.
Jonathan
#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -N Salmon_Dev
#PBS -l walltime=48:00:00
#PBS -m abe
#PBS -M jke...@tgen.org
#PBS -j oe
#PBS -q overflow
cd $PBS_O_WORKDIR
#Load sailfish module, can't seem to figure out how to build independent of IT
module load sailfish/0.6.3
#####################################################
##
## VARIABLES
##
#####################################################
SALMON_BINARY=/home/jkeats/downloads/Salmon-v0.2.7_CentOS-6.3/bin/salmon
SALMON_INDEX=/scratch/jkeats/salmon/salmon_index_E74hs37d5
SAILFISH_INDEX_DIR=/scratch/jkeats/salmon/sailfish_index
STAR_BINARY=/home/jkeats/downloads/STAR-STAR_2.4.0i/bin/Linux_x86_64/STAR
STAR_INDEX=/scratch/jkeats/STAR_NEW/124bpOverHang/
#TRANSCRIPTOME_FASTA=/home/tgenref/pecan/sailfish_index/Homo_sapiens.GRCh37.74.gtf.hs37d5.EGFRvIII.fa
TRANSCRIPTOME_FASTA=/scratch/jkeats/salmon/Homo_sapiens.GRCh37.74.cdna.hs37d5.fasta
#GTF=/home/tgenref/pecan/ensembl_v74/Homo_sapiens.GRCh37.74_hs37d5_EGFRvIII_Sailfish.gtf
GTF=/home/tgenref/pecan/ensembl_v74/Homo_sapiens.GRCh37.74.gtf.hs37d5.EGFRvIII.gtf
FORWARD=/scratch/jkeats/salmon/RPMI8226_RETparental_p0_CL_Whole_T2_KBMRS_K12719.proj.R1.fastq.gz
REVERSE=/scratch/jkeats/salmon/RPMI8226_RETparental_p0_CL_Whole_T2_KBMRS_K12719.proj.R2.fastq.gz
#####################################################
##
## PARAMETERIZED CODE
##
#####################################################
# Create the Salmon Index
${SALMON_BINARY} index -t ${TRANSCRIPTOME_FASTA} --index salmon_index_E74hs37d5 -p 12
${SALMON_BINARY} quant \
--index ${SALMON_INDEX} \
--libType IU \
-1 <(zcat ${FORWARD}) \
-2 <(zcat ${REVERSE}) \
--threads 12 \
--geneMap ${GTF} \
--output salmon_unstranded
${SALMON_BINARY} quant \
--index ${SALMON_INDEX} \
--libType ISR \
-1 <(zcat ${FORWARD}) \
-2 <(zcat ${REVERSE}) \
--threads 12 \
--geneMap ${GTF} \
--output salmon_stranded
${STAR_BINARY} --genomeDir ${STAR_INDEX} \
--readFilesIn ${FORWARD} ${REVERSE} \
--outSAMattributes NH HI \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.04 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--quantMode TranscriptomeSAM \
--runThreadN 12 \
--readFilesCommand zcat
${SALMON_BINARY} quant \
--targets ${TRANSCRIPTOME_FASTA} \
--libType IU \
--alignments Aligned.toTranscriptome.out.bam \
--threads 12 \
--geneMap ${GTF} \
--output salmon_alignment_unstranded
${SALMON_BINARY} quant \
--targets ${TRANSCRIPTOME_FASTA} \
--libType ISR \
--alignments Aligned.toTranscriptome.out.bam \
--threads 12 \
--geneMap ${GTF} \
--output salmon_alignment_stranded
sailfish index -t ${TRANSCRIPTOME_FASTA} -o sailfish_index -k 20 -p 16
#The libtype will need to be dynamic for Single End runs, or stranded runs. Currently assumes Illumina PE, unstranded library
sailfish quant -i ${SAILFISH_INDEX_DIR} --libtype "T=PE:O=><:S=U" --mates1 <(zcat ${FORWARD}) --mates2 <(zcat ${REVERSE}) --out sailfish_unstranded --threads 12 --polya
cd sailfish_unstranded
#Rename and process output to generate gene level expression estimates
/home/tgenref/pecan/bin/Sailfish_0.6.3/TranscriptsToGenes.sh --exp-file quant_bias_corrected.sf --gtf-file ${GTF} --res-file sailfish_genes.expr
cd ..
#The libtype will need to be dynamic for Single End runs, or stranded runs. Currently assumes Illumina PE, unstranded library
sailfish quant -i ${SAILFISH_INDEX_DIR} --libtype "T=PE:O=><:S=AS" --mates1 <(zcat ${FORWARD}) --mates2 <(zcat ${REVERSE}) --out sailfish_stranded --threads 12 --polya
cd sailfish_stranded
#Rename and process output to generate gene level expression estimates
/home/tgenref/pecan/bin/Sailfish_0.6.3/TranscriptsToGenes.sh --exp-file quant_bias_corrected.sf --gtf-file ${GTF} --res-file sailfish_genes.expr
cd ..