Where is the link to Salmon?


ying chen

Oct 7, 2014, 2:27:46 PM
to sailfis...@googlegroups.com
Hi guys,

Can someone point me to the location of Salmon? I saw it somewhere on the Sailfish GitHub page before, but I can't find it anymore.

Thanks a lot for the help!

Ying

ying chen

Oct 7, 2014, 4:20:17 PM
to sailfis...@googlegroups.com
Sorry, I just found out that it's part of sailfish-develop :(

Ying

Rob Patro

Oct 7, 2014, 4:23:37 PM
to ying chen, sailfis...@googlegroups.com
Hi Ying,

  It’s true that the development of Salmon is happening on the development branch of Sailfish. However, the first beta of Salmon is also available here (https://github.com/kingsfordgroup/sailfish/releases/tag/salmon-v0.1.0). This contains both source code and a pre-compiled binary for 64-bit Linux. Though we’re still actively developing Salmon, it’s already giving us promising results, and we’d be very grateful for any feedback you or other users have if you’re willing to give it a try!

Thanks,
Rob

-- 
Rob Patro
Sent with Airmail
--
Sailfish is available at http://www.cs.cmu.edu/~ckingsf/software/sailfish/
Citation:
Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms
Rob Patro, Stephen M. Mount, and Carl Kingsford
manuscript submitted (2013)
http://arxiv.org/pdf/1308.3700.pdf
---
You received this message because you are subscribed to the Google Groups "Sailfish Users Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sailfish-user...@googlegroups.com.
To post to this group, send email to sailfis...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sailfish-users/d0b3edba-1019-4fc7-92f9-74df912b6de1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ying chen

Oct 8, 2014, 4:18:09 PM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi Rob,

I downloaded salmon-v0.1.0.zip and tried to build it. It went well after I modified the libgff part in CMakeLists.txt. But it seems that the build process does not build the salmon executable: there is only sailfish in /bin, no salmon. Also, there is no mention of a salmon build in CMakeLists.txt.

Then I downloaded the binary SalmonBeta, but it failed to run:

$ salmon -h
salmon: /lib64/libm.so.6: version `GLIBC_2.15' not found (required by salmon)
salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by salmon)
salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /myhps/users/c135523/local/bin/SalmonBeta/lib/libtbb.so.2)
salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /myhps/users/c135523/local/bin/SalmonBeta/lib/libtbbmalloc.so.2)

I tried the trick you suggested for genesum:

$ LD_LIBRARY_PATH=/myhps/apps/gcc/gcc-4.9.1/lib64:$LD_LIBRARY_PATH  salmon -h

But I got the same error. 

Any suggestion?

Thanks a lot,

Ying

Rob Patro

Oct 8, 2014, 4:21:26 PM
to ying chen, sailfis...@googlegroups.com, njs...@gmail.com
Hi Ying,

  The issue with the binary is that your GLIBC is not new enough. I’m currently trying to build a binary that doesn’t depend on as new a version of glibc. However, the version you built from source should work. The issue you’re seeing is that salmon is currently built, but not installed. So, you can try just copying it over from build/src/salmon to bin (or you can just run it from build/src/salmon). Please let me know if you can run the executable you built!
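A minimal sketch for checking this diagnosis on a Linux machine, assuming Python 3 is available: it pulls the highest `GLIBC_x.y` version out of loader error messages like the ones quoted above and compares it with the glibc version reported by `os.confstr("CS_GNU_LIBC_VERSION")`. The file name `check_glibc.py` and all helper names are hypothetical, not part of Salmon or Sailfish, and `os.confstr` may report nothing on non-glibc systems.

```python
import os
import re
import sys


def parse_version(v):
    """Turn a dotted version string such as '2.14' into a comparable tuple (2, 14)."""
    return tuple(int(part) for part in v.split("."))


def required_glibc(error_text):
    """Return the highest GLIBC_x.y version named in loader errors, or None if none appear."""
    found = re.findall(r"GLIBC_(\d+(?:\.\d+)*)", error_text)
    return max(found, key=parse_version) if found else None


def installed_glibc():
    """Return the running glibc version (e.g. '2.12'), or None if it cannot be determined."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # typically "glibc 2.12"
    except (ValueError, OSError, AttributeError):
        return None
    if not value:
        return None
    name, _, version = value.partition(" ")
    return version if name == "glibc" and version else None


if __name__ == "__main__":
    # Hypothetical usage:
    #   ./salmon -h 2> loader_errors.txt
    #   python3 check_glibc.py loader_errors.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            needed = required_glibc(fh.read())
        have = installed_glibc()
        print(f"binary needs glibc {needed}, system provides {have}")
        if needed and have and parse_version(have) < parse_version(needed):
            print("this binary cannot run here; building from source is the alternative")
```

Tuples are compared instead of raw strings so that, for example, 2.10 correctly sorts after 2.9.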

Thanks,
Rob


ying chen

Oct 9, 2014, 12:48:45 PM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi Rob,

Thanks a lot for the help! Once I copied salmon from build/src to /bin, it worked. But when I tried a test run, I got a segmentation fault (core dumped):

$ salmon quant -t Homo_sapiens.GRCh37.75.cdna.all.fa -l "IU" -p 6 -g Homo_sapiens.GRCh37.75.gtf -a blca_17_C1EPNACXX121228_1.bam blca_17_C1EPNACXX121228_2.bam blca_17_C1EPNACXX121228_3.bam blca_17_C1EPNACXX121228_4.bam -o salmon_quant

At first I got a lot of warnings saying that transcripts in the reference did not appear in my BAM files, such as:

WARNING: Transcript ENST00000390522 appears in the reference but did not appear in the BAM

Then,

Resetting BAMQueue from file blca_17_C1EPNResetting BAMQueue from file [blca_17_C1EPNACXX121228_1.bam] . . .doneequired = 50000000
Started parsingds. . . doneequired = 50000000
processed 0 reads. . . doneequired = 50000000
killing thread 4 . . . doneequired = 50000000
writing output 174802 / # required = 50000000

Freeing memory used by read queue . . . done
Post-hoc bias correction is not yet supported in salmon; disabling
Computing gene-level abundance estimates
There were 215170 transcripts mapping to 63677 genes
Segmentation fault (core dumped)

When I looked at quant.sf, there was actually some data there (only 212 lines in total):

# [ targets ] => { /Homo_sapiens.GRCh37.75.cdna.all.fa }
# [ libtype ] => { IU }
# [ threads ] => { 6 }
# [ gene_map ] => { /Homo_sapiens.GRCh37.75.gtf }
# [ alignments ] => { blca_17_C1EPNACXX121228_1.bam }
# [  ] => { blca_17_C1EPNACXX121228_2.bam }
# [  ] => { blca_17_C1EPNACXX121228_3.bam }
# [  ] => { blca_17_C1EPNACXX121228_4.bam }
# [ output ] => { salmon_quant }
# Name       Length    TPM       FPKM      NumReads
chrM         16569     996787    43361.9   371637
chr1         2.49E+08  3.25766   0.141714  18271
chr2         2.43E+08  1.72427   0.075009  9436
chr3         1.98E+08  1.68676   0.073377  7516
chr4         1.91E+08  1.24635   0.054219  5361
chr5         1.81E+08  1.68732   0.073401  6869
chr6         1.71E+08  1.58684   0.06903   6110
chr7         1.59E+08  1.79562   0.078113  6430
chr8         1.46E+08  1.45591   0.063335  4795
chr9         1.41E+08  1.5666    0.06815   4978
chr10        1.36E+08  1.60437   0.069793  4893
chr11        1.35E+08  2.1258    0.092476  6458
chr12        1.34E+08  2.34202   0.101882  7054
chr13        1.15E+08  0.85856   0.037349  2225
chr14        1.07E+08  1.81696   0.079041  4389
chr15        1.03E+08  1.78792   0.077777  4125
chr16        90354753  2.80254   0.121915  5698
chr17        81195210  3.09679   0.134716  5658
chr18        78077248  1.16228   0.050561  2042
chr19        59128983  3.96988   0.172697  5282
chr20        63025520  2.25004   0.097881  3191
chr21        48129895  0.990751  0.043099  1073
chr22        51304566  2.24089   0.097483  2587
chrX         1.55E+08  1.12797   0.049069  3941
chrY         59373566  0.303139  0.013187  405
NT_167249.1  4928567   4.46338   0.194165  495
NT_167250.1  590426    1.88172   0.081858  25
NT_167251.1  1680828   1.6657    0.072461  63
NT_167244.1  4622290   4.57646   0.199084  476
NT_167245.1  4610396   7.1523    0.311137  742
NT_167246.1  4683263   4.50739   0.196079  475
NT_167247.1  4833398   4.75355   0.206787  517
NT_167248.1  4611984   7.71835   0.335761  801

 .....

Any suggestion?

Thanks,

Ying

Rob Patro

Oct 13, 2014, 4:44:22 PM
to ying chen, sailfis...@googlegroups.com, njs...@gmail.com
Hi Ying,

  What is in the BAM file? The version of Salmon that uses pre-computed alignments (like eXpress and RSEM) assumes that you’ve aligned the reads directly to the transcripts, not to the genome. From the test output that you’ve provided, it seems that some of the things being quantified are chromosomes. Are these what appear in the BAM files? The warnings occur because we expect the set of transcripts provided in the FASTA file to be the same as the set of transcripts against which you aligned. Since the alignment is against the genome, it presumably can’t find many (or any) of the transcripts in the header of the BAM file.

  Since aligning against a genome is such a prevalent mode of operation in RNA-seq (at least it’s prevalent among alignments that have already been computed), we’re actually working toward a tool that will allow “un-projecting” genomic alignments onto a set of target transcripts. However, this is not yet ready. In the meantime, there are a few things you might try. (1) If you have the raw reads available, try aligning them directly against your transcript file (Homo_sapiens.GRCh37.75.cdna.all.fa). (2) If you have the raw reads available, try using the mode of Salmon that doesn’t require pre-computed alignments. Salmon is capable of using its own lightweight alignment method to quantify abundances directly from a Salmon index (created with the `./salmon index` command) and a set of FASTA files. (3) If you don’t have the raw reads available, you can recover them from the BAM file using, e.g., a BAM => FAST{A/Q} tool, and then try (1) or (2).
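One quick way to tell which case applies is to compare the reference names in the BAM header (`samtools view -H` prints them as `@SQ` lines with `SN:` tags) against the record names in the transcript FASTA. Below is a minimal Python sketch under that assumption; the script name `compare_refs.py` and the helper functions are hypothetical. A fraction near zero (for example `chr1` and `chrM` in the header versus `ENST...` names in the FASTA) suggests the reads were aligned to the genome rather than the transcriptome.

```python
import sys


def sam_header_refs(header_lines):
    """Collect reference names (SN: tags) from the @SQ lines of a SAM/BAM header."""
    refs = []
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                refs.append(field[3:])
    return refs


def fasta_names(fasta_lines):
    """Collect record names: the first whitespace-delimited token after '>'."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def shared_fraction(bam_refs, transcript_names):
    """Fraction of BAM reference names that also name a record in the transcript FASTA."""
    if not bam_refs:
        return 0.0
    known = set(transcript_names)
    return sum(name in known for name in bam_refs) / len(bam_refs)


if __name__ == "__main__":
    # Hypothetical usage:
    #   samtools view -H sample.bam > header.sam
    #   python3 compare_refs.py header.sam Homo_sapiens.GRCh37.75.cdna.all.fa
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as header, open(sys.argv[2]) as fasta:
            frac = shared_fraction(sam_header_refs(header), fasta_names(fasta))
        print(f"{frac:.1%} of BAM reference names are transcripts in the FASTA")
```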

Thanks,
Rob

P.S.  Currently, the alignment-based mode of Salmon assumes that all of the alignments for a sample are present in a single .bam/.sam file. As such, you’ll want to merge multiple files that actually correspond to the same condition. However, this limitation isn’t inherent; it’s temporary, and soon you’ll be able to provide multiple separate .bam files to be considered together to quantify a single sample.


ying chen

Oct 15, 2014, 11:05:50 AM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi Rob,

Thanks a lot for the detailed explanation. Unfortunately my bam files were all aligned to the genome :(.

Ying

Rob Patro

Oct 15, 2014, 11:15:48 AM
to ying chen, sailfis...@googlegroups.com, njs...@gmail.com
Hi Ying,

  I figured as much ;P.  As I said, this is such a common mode of operation (aligning to the genome), that a huge portion of existing pre-computed alignments are of this form.  Unfortunately, unlike Cufflinks, many of the newer tools (RSEM, eXpress, TIGAR, Sailfish, etc. and now Salmon) deal directly with the transcripts.  This has a number of benefits, like usually requiring smaller working memory, having smaller intermediate files, and significantly, allowing the tools to work immediately on de novo assembled transcripts.  However, it also limits their immediate application to a large range of existing processed data (if one is willing to re-process, dealing with the raw data is less of a problem).

  One solution to this is to provide a tool for efficiently converting genomic alignments into transcriptomic alignments. There is not much conceptual difficulty in this problem, but such a tool does not (to the best of my knowledge) currently exist. However, there are a few promising developments, and I’m hopeful that such a thing will be available relatively soon, opening up a vast array of existing pre-processed data to re-analysis with these new tools. I’ll be sure to keep you (and others) posted on any developments. For the time being, though, your best bet for analyzing such data is along the lines of what I suggested earlier (BAM => FASTA, and then using that FASTA either directly with read-based Salmon, or realigning it to the transcriptome and using the result with alignment-based Salmon).
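The core coordinate arithmetic behind such a conversion is simple for a single aligned base: walk the transcript's exons in transcript order and add up exon lengths. The Python sketch below (hypothetical function name) maps one 1-based genomic position to a 1-based transcript position, assuming exons are given as non-overlapping, 1-based inclusive `(start, end)` pairs, as in a GTF. It is not the tool mentioned above: a real converter would also have to rewrite CIGAR strings, handle reads that span splice junctions or fall into introns, reverse-complement reads on minus-strand transcripts, and emit one alignment per compatible transcript.

```python
def genome_to_transcript(pos, exons, strand):
    """Map a 1-based genomic position to a 1-based transcript position.

    exons: list of (start, end) pairs, 1-based inclusive, non-overlapping.
    strand: "+" or "-". Returns None if pos is not inside any exon.
    """
    ordered = sorted(exons)
    if strand == "-":
        # On the minus strand the transcript starts at the highest genomic coordinate.
        ordered.reverse()
    offset = 0  # transcript bases contributed by exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            if strand == "+":
                return offset + (pos - start) + 1
            return offset + (end - pos) + 1
        offset += end - start + 1
    return None  # intronic or outside the transcript


if __name__ == "__main__":
    exons = [(100, 199), (300, 349)]
    print(genome_to_transcript(300, exons, "+"))  # 101: first base of the second exon
```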

Best,
Rob


Rob

Oct 15, 2014, 11:52:40 PM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi Ying,

  Just a quick update. If you build Salmon from the develop branch of Sailfish, it can now read from multiple separate BAM/SAM files to perform quantification (currently, these files must have the same header, i.e. they must have been mapped to the same reference). These BAM/SAM files are still expected to be with respect to the transcripts, but the unnecessary restriction that all the alignments come from a single BAM file has been removed.
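Before passing several BAM/SAM files together, one way to check the "same header" requirement is to compare their `@SQ` lines. The Python sketch below assumes each header has been saved as text (e.g. with `samtools view -H file.bam > file.header`). It treats "same header" as the same reference names and lengths in the same order. That interpretation is an assumption: Salmon's own check may be stricter or looser. The script and function names are hypothetical.

```python
import sys


def sq_entries(header_text):
    """Return the (SN, LN) pairs from the @SQ lines of a SAM header, in order."""
    entries = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:] if ":" in field)
        entries.append((tags.get("SN"), tags.get("LN")))
    return entries


def same_references(headers):
    """True if every header lists the same references, with the same lengths, in the same order."""
    if not headers:
        return True
    first = sq_entries(headers[0])
    return all(sq_entries(h) == first for h in headers[1:])


if __name__ == "__main__":
    # Hypothetical usage: python3 check_headers.py a.header b.header c.header
    texts = []
    for path in sys.argv[1:]:
        with open(path) as fh:
            texts.append(fh.read())
    if texts:
        print("headers match" if same_references(texts) else "headers differ")
```

Other header lines (`@HD`, `@PG`, `@RG`) are ignored here, since they commonly differ between runs without changing the reference.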

--Rob

ying chen

Oct 16, 2014, 5:00:06 PM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi Rob,

Thanks a lot for the update. I just downloaded and compiled the sailfish-develop, and tried to run a simple test. But I got the error message:

$ sailfish quant -i SailfishIndexGRCh38 --libtype "T=PE:O=><:S=U"  --mates1 <(gunzip -c /248T_P_51_1.fq.gz-batches/FCD21A3ACXX_1.fq.gz) --mates2 <(gunzip -c /248T_P_51_2.fq.gz-batches/FCD21A3ACXX_1.fq.gz) --out /248T_P_51 --threads 8
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at www.cs.cmu.edu/~ckingsf/sailfish.
============
Exception : [unknown library format string : T=PE:O=><:S=U]
============
sailfish quant was invoked improperly.
For usage information, try sailfish quant --help
Exiting.

The same command works with the sailfish-develop I compiled a week ago.

Any suggestion?

Thanks a lot,

Ying

Rob Patro

Oct 17, 2014, 6:38:52 PM
to ying chen, sailfis...@googlegroups.com, njs...@gmail.com
Hi Ying,

  First, I should say that most of the changes happening on the develop branch right now are related to Salmon. However, I did make a change to Sailfish with a recent commit. In particular, we’re unifying the way that the library format is specified between Sailfish and Salmon. The Sailfish library format string has a few shortcomings: it’s a little bit complicated, and it also contains “special” characters that have to be escaped in certain situations. We’ve redesigned how the library format is specified in Salmon, and I’ve back-ported this specification to Sailfish. Actually, the format specification is a bit more important in Salmon than in Sailfish, as Salmon (in both the alignment-free and alignment-based modes) makes better use of paired-end information. The new format strings are described here (http://sailfish.readthedocs.org/en/develop/library_type.html). Basically, if your old format was “T=PE:O=><:S=U”, your new format string is IU. This just means that the reads are oriented inward (I) and the library is unstranded (U).
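For scripts that still build the old strings, the conversion for the unstranded case can be sketched in Python as below. The function name is hypothetical. Only the `T=PE:O=><:S=U` to `IU` mapping is confirmed in the message above; the single-end `U` mapping follows the `-l U` usage shown later in this thread; and the `<>` to `O` and `>>` to `M` orientation entries are assumptions to verify against the library type documentation. Stranded libraries are deliberately rejected rather than guessed.

```python
# Old Sailfish O= field -> new single-letter orientation code.
# Only "><" -> "I" is confirmed in this thread; the other two entries are
# assumptions to check against the library type documentation.
ORIENTATION = {"><": "I", "<>": "O", ">>": "M"}


def convert_libtype(old):
    """Convert an old-style *unstranded* format string such as 'T=PE:O=><:S=U'."""
    fields = dict(part.split("=", 1) for part in old.split(":"))
    if fields.get("S") != "U":
        raise NotImplementedError("stranded libraries are not handled by this sketch")
    if fields.get("T") == "SE":
        return "U"  # single-end, unstranded
    if fields.get("T") == "PE":
        return ORIENTATION[fields["O"]] + "U"
    raise ValueError(f"unrecognized library type in {old!r}")


if __name__ == "__main__":
    print(convert_libtype("T=PE:O=><:S=U"))  # IU
```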

Best,
Rob

ying chen

Oct 21, 2014, 10:12:41 AM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi Rob,

I just downloaded and compiled sailfish-develop. The installation went well, but I got a segmentation fault when trying to run salmon:


$ salmon -h
Segmentation fault (core dumped)
$ salmon quant -h
Segmentation fault (core dumped)

Thanks,

Ying

Diego Mauricio Riaño Pachón

Jan 15, 2015, 1:26:46 PM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi,
I am trying to make salmon run on my system without any success.

I downloaded the available binary SalmonBeta-v0.2.2_ubuntu-14.04, but I got a segfault just trying to run it.

Then, I tried to compile it from source (v.0.2.2) using:

mkdir build
cd build
cmake -DFETCH_BOOST=TRUE ../
make

but after a while I am getting this error:

make[3]: Leaving directory `/bioinf/progs/src/sailfish-0.2.2/external/Shark'
cd /bioinf/progs/src/sailfish-0.2.2/external/Shark && /usr/bin/cmake -E touch /bioinf/progs/src/sailfish-0.2.2/build/libshark-prefix/src/libshark-stamp/libshark-install
/usr/bin/cmake -E cmake_progress_report /bioinf/progs/src/sailfish-0.2.2/build/CMakeFiles 41
[ 83%] Completed 'libshark'
/usr/bin/cmake -E make_directory /bioinf/progs/src/sailfish-0.2.2/build/CMakeFiles
/usr/bin/cmake -E touch /bioinf/progs/src/sailfish-0.2.2/build/CMakeFiles/libshark-complete
/usr/bin/cmake -E touch /bioinf/progs/src/sailfish-0.2.2/build/libshark-prefix/src/libshark-stamp/libshark-done
make[2]: Leaving directory `/bioinf/progs/src/sailfish-0.2.2/build'
/usr/bin/cmake -E cmake_progress_report /bioinf/progs/src/sailfish-0.2.2/build/CMakeFiles  41 42 43 44 45 46 47 48
[ 83%] Built target libshark
make[1]: Leaving directory `/bioinf/progs/src/sailfish-0.2.2/build'
make: *** [all] Error 2

This is happening on the following system (output of lsb_release -a):

LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.6 (Final)
Release: 6.6
Codename: Final

Compiling with:

cmake 2.8.12.2
g++ 4.7.2 

I would really appreciate your help getting salmon to run.

Best,
Diego

Rob

Jan 15, 2015, 1:42:08 PM
to sailfis...@googlegroups.com, njs...@gmail.com
Hi Diego,

  I'd be happy to help you compile salmon. However, the first thing I'd suggest you try is to grab the binary release from the latest successful build on Travis-CI. That can be obtained at (https://github.com/kingsfordgroup/sailfish/releases/download/TravisCI/SalmonBeta-latest_ubuntu-12.04.tar.gz). This binary doesn't depend on as new a version of libc as the one you tried before, and so it may work out of the box. If not, could you please post back the actual output you get when you run it?

  If that doesn't work, to figure out what's going wrong with the compilation, we'll have to take a look at the CMake output logs.

Best,
Rob

Diego Mauricio Riaño Pachón

Jan 16, 2015, 8:06:58 AM
to sailfis...@googlegroups.com, njs...@gmail.com
Dear Rob,

thanks a lot for your reply.

I tried the binary you suggested. I got the following error:

./salmon 
./salmon: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by ./salmon)
./salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./salmon)
./salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/gabriela/SalmonBeta-latest_ubuntu-12.04/bin/../lib/libpthread.so.0)
./salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/gabriela/SalmonBeta-latest_ubuntu-12.04/bin/../lib/libtbb.so.2)
./salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/gabriela/SalmonBeta-latest_ubuntu-12.04/bin/../lib/libtbbmalloc.so.2)
./salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/gabriela/SalmonBeta-latest_ubuntu-12.04/bin/../lib/librt.so.1)
./salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/gabriela/SalmonBeta-latest_ubuntu-12.04/bin/../lib/libstdc++.so.6)
./salmon: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/gabriela/SalmonBeta-latest_ubuntu-12.04/bin/../lib/libgcc_s.so.1)

Cheers,
Diego

Shawn Rupp

May 7, 2015, 1:08:49 PM
to sailfis...@googlegroups.com, njs...@gmail.com
I have a couple of questions.

First, I'm running into an error when I run genesum:

./genesum -g /home/yitzhak/Documents/Squamates/Genome/ASU_Acar_v2.2.1.gtf -e /home/yitzhak/Documents/RNA_Seq/Sailfish/All_Tissues/all_female_tissues.txt/quant_bias_corrected.sf -o /home/yitzhak/Documents/RNA_Seq/Sailfish/All_Tissues/all_female_tissues.txt/quant_bc_gene.txt

Aggregating estimates using key [gene_name]
Parsing GTF/GFF [/home/yitzhak/Documents/Squamates/Genome/ASU_Acar_v2.2.1.gtf] . . .terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct null not valid
Aborted (core dumped)

I haven't found any solutions yet, so maybe you'll be able to help.

I also have a few questions about Salmon. Most importantly, can it aggregate output by gene, instead of just transcripts? Also, are you still developing sailfish, or have you been putting all your time into salmon? (I only ask to get an idea of what to expect for each program.)

On that note, I ran into an error with salmon where it said a FASTQ file was missing a header, but the same files worked fine with Sailfish.

./salmon quant -i /home/yitzhak/Documents/RNA_Seq/Salmon/Index -l U -r /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S1.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S2.fastq (..) -o /home/yitzhak/Documents/RNA_Seq/Salmon/all_male _tissues.txt
Version Info: This is the most recent version

# salmon (smem-based) v0.2.2
# [ program ] => salmon 
# [ command ] => quant 
# [ index ] => { /home/yitzhak/Documents/RNA_Seq/Salmon/Index }
# [ libtype ] => { U }
# [ unmated_reads ] => { /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S1.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S2.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S3.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S4.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S5.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D026S1.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D026S2.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D026S3.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D026S4.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D026S5.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/b48adrenal716.2.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/b48adrenal716.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/liverSRR391651.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/liverSRR391653.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/liverSRR391656.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/lungSRR391654.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/lungSRR391655.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/lungSRR391657.fastq }
# [ output ] => { /home/yitzhak/Documents/RNA_Seq/Salmon/all_male }
# [  ] => { _tissues.txt }
Logs will be written to /home/yitzhak/Documents/RNA_Seq/Salmon/all_male/logs
there is 1 lib
[2015-05-07 09:51:45.127] [jointLog] [info] parsing read library format
[M::bwa_idx_load_from_disk] read 0 ALT contigs
processed 78800000 fragmentss
hits per frag:  1.40054terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid fastq file: header missing
Aborted (core dumped)

Any ideas how to fix this? I'm using an Ubuntu machine with 32 GB of RAM, if that helps.

Thanks for any help!

Rob

May 7, 2015, 1:25:45 PM
to sailfis...@googlegroups.com, smru...@gmail.com, njs...@gmail.com
On Thursday, May 7, 2015 at 1:08:49 PM UTC-4, Shawn Rupp wrote:
I have a couple of questions.

First, I'm running into an error when I run genesum:

./genesum -g /home/yitzhak/Documents/Squamates/Genome/ASU_Acar_v2.2.1.gtf -e /home/yitzhak/Documents/RNA_Seq/Sailfish/All_Tissues/all_female_tissues.txt/quant_bias_corrected.sf -o /home/yitzhak/Documents/RNA_Seq/Sailfish/All_Tissues/all_female_tissues.txt/quant_bc_gene.txt

Aggregating estimates using key [gene_name]
Parsing GTF/GFF [/home/yitzhak/Documents/Squamates/Genome/ASU_Acar_v2.2.1.gtf] . . .terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct null not valid
Aborted (core dumped)

I haven't found any solutions yet, so maybe you'll be able to help.

Hrmm ... it seems that it's having an issue parsing the GTF file. The error is generic enough that I probably can't tell you why it's happening unless I have a look at the GTF file itself.
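Since genesum reported "Aggregating estimates using key [gene_name]", one plausible (unconfirmed) cause is a GTF record that lacks a `gene_name` attribute or is not split into nine tab-separated columns. The Python sketch below scans a GTF file for both conditions. The script and function names are hypothetical, and a clean report does not rule out other parsing problems.

```python
import re
import sys


def gtf_problems(lines, key="gene_name"):
    """Yield (line_number, reason) for GTF lines that are malformed or lack the attribute `key`."""
    pattern = re.compile(r"\b" + re.escape(key) + r' "[^"]*"')
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            yield number, f"expected 9 tab-separated columns, found {len(columns)}"
        elif not pattern.search(columns[8]):
            yield number, f"missing {key} attribute"


if __name__ == "__main__":
    # Hypothetical usage: python3 check_gtf.py ASU_Acar_v2.2.1.gtf
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for number, reason in gtf_problems(fh):
                print(f"line {number}: {reason}")
```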
 

I also have a few questions about Salmon. Most importantly, can it aggregate output by gene, instead of just transcripts? Also, are you still developing sailfish, or have you been putting all your time into salmon? (I only ask to get an idea of what to expect for each program.)


Yes, salmon can aggregate output by gene instead of just transcripts. With salmon, this functionality is integrated into the program (rather than provided by a separate tool like genesum). Specifically, salmon can take a GTF file, or a gene <-> transcript mapping in a simple text format, and, at the end of the run, it will aggregate transcript-level expression into gene-level expression.
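The basic idea of transcript-to-gene aggregation can be sketched in a few lines of Python. This is a minimal illustration, not salmon's implementation. It assumes a whitespace-separated mapping file with the transcript name in the first column and the gene name in the second, and it simply sums TPM and read counts per gene; salmon may compute other gene-level quantities (such as an effective gene length) differently. Function names are hypothetical.

```python
from collections import defaultdict


def read_gene_map(lines):
    """Parse 'transcript gene' lines into a dict (assumed two-column format)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def aggregate_by_gene(quant_rows, tx2gene):
    """Sum per-transcript TPM and NumReads into per-gene totals.

    quant_rows: iterable of (name, tpm, num_reads). Transcripts missing from
    tx2gene are kept under their own name rather than silently dropped.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for name, tpm, num_reads in quant_rows:
        gene = tx2gene.get(name, name)
        totals[gene][0] += tpm
        totals[gene][1] += num_reads
    return {gene: tuple(values) for gene, values in totals.items()}
```

Keeping unmapped transcripts under their own name makes a mismatch between the annotation and the index visible in the output instead of hiding it.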

Though there will be some maintenance releases of sailfish (and potentially some new features if they make sense), our development is now strongly focused on salmon. In particular, we believe that salmon is, in many ways, Pareto-optimal with respect to sailfish (i.e. it is both faster and more accurate). Though there are still some cases where sailfish may be more appropriate (e.g. quantifying tiny RNAs or collections of small junctions), salmon is generally intended as a replacement for sailfish.
 

On that note, I ran into an error with salmon where it said a FASTQ file was missing a header, but the same files worked fine with Sailfish.

./salmon quant -i /home/yitzhak/Documents/RNA_Seq/Salmon/Index -l U -r /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S1.fastq /home/yitzhak/Documents/Squamates/FASTQ_files/Tail/D015S2.fastq (..) -o /home/yitzhak/Documents/RNA_Seq/Salmon/all_male _tissues.txt

Any ideas how to fix this? I'm using an Ubuntu machine with 32 GB of RAM, if that helps.

Well, it looks like it's not happy with one of the FASTQ files, complaining that a read is missing its header. Unfortunately, there seem to be quite a few FASTQ files, so it's not clear which one the error is coming from. I would suggest the following. First, try running salmon with just a single FASTQ file; try a few of them and see which, if any, complete without error. Second, comb through the FASTQ files to see if there are, indeed, any strangely formatted reads (i.e. reads without a header line; there may be a tool for verifying FASTQ files, but I'm unsure). Third (and I would do this regardless), try upgrading to the latest version of salmon to see if that has any effect. There have been many accuracy & performance enhancements and bug fixes since v0.2.2.
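For the second suggestion, a minimal FASTQ checker can be sketched in Python. It is not an existing tool. It assumes plain four-line records (no wrapped sequence or quality lines) and reports the first record whose header does not start with `@`, whose separator does not start with `+`, whose sequence and quality lengths differ, or that is truncated. The script and function names are hypothetical.

```python
import sys
from itertools import islice


def first_bad_record(lines):
    """Return (record_number, reason) for the first malformed FASTQ record, or None.

    Assumes plain four-line FASTQ records (no wrapped sequence/quality lines).
    """
    it = iter(lines)
    record = 0
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return None
        record += 1
        if not any(chunk) and next(it, None) is None:
            return None  # only blank line(s) left at the end of the file
        if len(chunk) < 4:
            return record, "truncated record"
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            return record, "header missing"
        if not plus.startswith("+"):
            return record, "'+' separator line missing"
        if len(seq) != len(qual):
            return record, "sequence and quality lengths differ"


if __name__ == "__main__":
    # Hypothetical usage: python3 check_fastq.py D015S1.fastq D015S2.fastq ...
    for path in sys.argv[1:]:
        with open(path) as fh:
            problem = first_bad_record(fh)
        print(path, "OK" if problem is None else f"record {problem[0]}: {problem[1]}")
```

The file is read as a stream, so even large FASTQ files are checked without loading them into memory.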
 

Thanks for any help!

Sure. Let me know if I can help with any of this. For example, if you can track down the FASTQ file that's causing the error (and can't find an obvious defect in it), I'd be happy to take a look.

Best,
Rob