'std::out_of_range' error

martis....@gmail.com

Oct 21, 2015, 3:49:26 AM

to trinityrnaseq-users

Hello everyone,

I'm trying to run Trinity on a large data set containing 24 samples and over 1 billion PE reads. Since I have limited resources, I have split the job into 3 steps: trimming + normalization, inchworm, and chrysalis/butterfly. The trimming step worked quite well, but I got an error in the normalization step for the first sample. The error seems to be in the Jellyfish part, but I don't really understand it.

I'm using Trinity v.2.1.0 with the following options:
Trinity --seqType fq --SS_lib_type FR --max_memory 500G --CPU 16 --min_kmer_cov 2 --inchworm_cpu 12 --bflyGCThreads 10 --bflyCPU 16 --trimmomatic --quality_trimming_params "ILLUMINACLIP:TruSeq3-SE_all_indexes.fa:2:30:10" --normalize_reads --normalize_max_read_cov 30 --normalize_by_read_set --no_run_inchworm --verbose --left $LEFT --right $RIGHT --output $OUT

$LEFT and $RIGHT each list the files of the 24 samples, separated by commas (no spaces between them). This is the error message I got:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 30 Coverage --
-- /scratch/6235623/trinity_24samples_run/norm_for_read_set_1 --
---------------------------------------------------------------

Tuesday, October 20, 2015: 21:17:41 CMD: trinity/2.1.0/util/insilico_read_normalization.pl --seqType fq --JM 500G --max_cov 30 --CPU 16 --output /scratch/6235623/trinity_24samples_run/norm_for_read_set_1 --SS_lib_type FR --left 1_150528_AC714LANXX_P2024_2001_1.fastq.PwU.qtrim.fq --right 1_150528_AC714LANXX_P2024_2001_2.fastq.PwU.qtrim.fq --pairs_together --PARALLEL_STATS

Converting input files. (both directions in parallel)
CMD: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /scratch/6235623/trinity_24samples_run/1_150528_AC714LANXX_P2024_2001_1.fastq.PwU.qtrim.fq >> left.fa
CMD: /pica/sw/apps/bioinfo/trinity/2.1.0/milou/util/..//trinity-plugins/fastool/fastool --rev --illumina-trinity --to-fasta /scratch/6235623/trinity_24samples_run/1_150528_AC714LANXX_P2024_2001_2.fastq.PwU.qtrim.fq >> right.fa
Sequences parsed: 107810583
CMD finished (1208 seconds)
Sequences parsed: 107810583
CMD finished (1273 seconds)
CMD: touch left.fa.ok
CMD finished (0 seconds)
CMD: touch right.fa.ok
CMD finished (0 seconds)
Done converting input files.
CMD: cat left.fa right.fa > both.fa
CMD finished (39 seconds)
CMD: touch both.fa.ok
-------------------------------------------
----------- Jellyfish --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------
CMD finished (0 seconds)
CMD: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//trinity-plugins/jellyfish/bin/jellyfish count -t 16 -m 25 -s 71491756417 both.fa
CMD finished (1173 seconds)
CMD: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//trinity-plugins/jellyfish/bin/jellyfish histo -t 16 -o jellyfish.K25.min2.kmers.fa.histo mer_counts.jf
CMD finished (200 seconds)
CMD: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//trinity-plugins/jellyfish/bin/jellyfish dump -L 2 mer_counts.jf > jellyfish.K25.min2.kmers.fa
CMD finished (589 seconds)
CMD: touch jellyfish.K25.min2.kmers.fa.success
CMD finished (0 seconds)
CMD: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --num_threads 16 > left.fa.K25.stats
CMD: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads right.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --num_threads 16 > right.fa.K25.stats
-reading Kmer occurences...
-reading Kmer occurences...

done parsing 2012703648 Kmers, 1942890574 added, taking 3891 seconds.
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr
bash: line 1: 18634 Aborted (core dumped) /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --num_threads 16 > left.fa.K25.stats
Thread 3 terminated abnormally: Error, cmd: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --num_threads 16 > left.fa.K25.stats died with ret 34304 at /pica/sw/apps/bioinfo/trinity/2.1.0/util/insilico_read_normalization.pl line 733.
Error, thread exited with error Error, cmd: /pica/sw/apps/bioinfo/trinity/2.1.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --num_threads 16 > left.fa.K25.stats died with ret 34304 at /pica/sw/apps/bioinfo/trinity/2.1.0/util/insilico_read_normalization.pl line 733.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
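
The "ret 34304" in the log can be decoded by hand. A minimal sketch, assuming the value is a raw wait status of the kind Perl's system() returns (exit code in the high byte) and that the shell follows the common "128 + signal number" convention; decode_wait_status is a hypothetical helper name, not part of Trinity:

```python
def decode_wait_status(status: int) -> dict:
    """Split a 16-bit wait status into exit code and signal information."""
    exit_code = status >> 8        # high byte: exit code of the child shell
    direct_signal = status & 0x7F  # low 7 bits: signal that killed the child itself
    # Shells report "command killed by signal N" as exit code 128 + N.
    shell_signal = exit_code - 128 if exit_code > 128 else 0
    return {
        "exit_code": exit_code,
        "direct_signal": direct_signal,
        "shell_signal": shell_signal,
    }


result = decode_wait_status(34304)
print(result)  # {'exit_code': 134, 'direct_signal': 0, 'shell_signal': 6}
```

Signal 6 is SIGABRT on Linux, which matches the "Aborted (core dumped)" line: the program aborted itself after the uncaught C++ exception.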

Does anyone have an idea why Jellyfish ends with the above-mentioned error?

Best,
Mihaela

Mark Chapman

Oct 21, 2015, 4:03:20 AM

to martis....@gmail.com, trinityrnaseq-users

Hi Mihaela,

I've not seen that before, but maybe it's because of the way you're listing your read files? The Trinity standard is to give two text files, one for F and one for R reads:

# Or, if you have read collections in different files you can use 'list' files, where each line in a list
# file is the full path to an input file. This saves you the time of combining them just so you can pass
# a single file for each direction.
#     --left_list <string> :left reads, one file path per line
#     --right_list <string> :right reads, one file path per line
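
A minimal sketch of preparing such list files, one absolute path per line. The sample file names and the output names left_list.txt and right_list.txt are placeholders for illustration, not taken from the run above; the only thing assumed about the format is what the usage text says.

```python
from pathlib import Path


def write_list_file(read_files, list_path):
    """Write one absolute file path per line and return the list file's path."""
    lines = [str(Path(f).resolve()) for f in read_files]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return Path(list_path)


# Placeholder names for illustration only.
left_reads = ["sampleA_1.fq", "sampleB_1.fq"]
right_reads = ["sampleA_2.fq", "sampleB_2.fq"]

# Left and right lists should describe the same samples in the same order.
assert len(left_reads) == len(right_reads)

left_list = write_list_file(left_reads, "left_list.txt")
right_list = write_list_file(right_reads, "right_list.txt")
```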

Also, you say you have limited resources, do you mean memory? You can try normalising each pair of files and then doing a final normalisation of the normalised pairs.

Hope this helps, but it might not :)

-Mark

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

--

Dr. Mark A. Chapman

M.Ch...@soton.ac.uk

+44 (0)2380 594396

------------------------------------

Centre for Biological Sciences
University of Southampton

Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

martis....@gmail.com

Oct 21, 2015, 4:20:26 AM

to trinityrnaseq-users, martis....@gmail.com

Hi Mark,

I'm not sure, but as I understand it, you need 'list' files only if you run the normalization separately using the "insilico_read_normalization.pl" script. If you run the normalization step as part of the pipeline, it is enough to join the file names with commas and pass them as an argument:

"If you have multiple sets of fastq files, such as corresponding to multiple tissue types or conditions, etc., you can indicate them to Trinity like so:

 Trinity --seqType fq --max_memory 50G  \
         --left condA_1.fq.gz,condB_1.fq.gz,condC_1.fq.gz \
         --right condA_2.fq.gz,condB_2.fq.gz,condC_2.fq.gz \
         --CPU 6  "

Nevertheless I will try your suggestion. Maybe it helps.

Since Trinity can't run across multiple nodes, I'm limited to 1 node with 16 cores and 512G memory. I think this will be enough if I split the job. The trimming (using --trimmomatic) worked quite nicely.

As you suggested, I have already tried to run the normalization step for each pair of files individually, followed by a final normalization step for all normalized pairs. That is why I used the options "--normalize_reads --normalize_max_read_cov 30 --normalize_by_read_set".

Best,
Mihaela

Tiago Hori

Oct 21, 2015, 7:25:35 AM

to martis....@gmail.com, trinityrnaseq-users

We have seen this out-of-range error before. It is a C++ error caused by a substring call that is out of range, i.e., trying to get a substring that starts past the end of the target string.
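
For reference, C++'s std::string::substr(pos, count) throws std::out_of_range only when the start position pos is greater than the string's length; an over-long count is silently clamped. The sketch below models that rule in Python purely for illustration (cpp_substr is a made-up helper, and this is not Trinity's code):

```python
def cpp_substr(s: str, pos: int, count: int) -> str:
    """Mimic std::string::substr: raise only if pos lies beyond the end of s."""
    if pos > len(s):
        raise IndexError("basic_string::substr: pos > size()")
    return s[pos:pos + count]  # count is clamped to the remaining length


read = "ACGTACGT"  # an 8-base toy read

print(cpp_substr(read, 4, 25))        # over-long count is fine: ACGT
print(repr(cpp_substr(read, 8, 25)))  # pos == size() gives an empty string: ''

try:
    cpp_substr(read, 9, 25)  # pos > size(): the case that aborts the C++ program
    raised = False
except IndexError:
    raised = True
```

In other words, the exception points to a start offset computed beyond the end of some string, not merely to a request for too many characters.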

Was it because of headers? I can't remember. Have you tried searching the Google Groups archive?

T.

"Profanity is the only language all programmers understand"

Sent from my iPhone, the universal excuse for my poor spelling.

Mihaela

Oct 21, 2015, 7:31:02 AM

to Tiago Hori, trinityrnaseq-users

Hi Tiago,

I did not find any entry specific to Trinity and this error, but I saw that there are some entries for other tools. I will check these entries, too.
I'm not sure the headers are the problem, since I already assembled one of the 24 samples without the mentioned error and the headers are similar.
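
One way to test the header/record hypothesis directly is to scan the converted left.fa that fastaToKmerCoverageStats was reading when it aborted. The sketch below is a hypothetical check, not something from Trinity: it counts records and flags empty sequences and sequences shorter than the k-mer size (25 in the log). Whether either condition actually triggers the exception is not established in this thread.

```python
import io


def scan_fasta(handle, k=25):
    """Return (n_records, problems); problems is a list of (header, reason)."""
    problems = []
    n_records = 0
    header, seq_len = None, 0

    def check(name, length):
        # Flag records that a k-mer based tool might not expect.
        if length == 0:
            problems.append((name, "empty sequence"))
        elif length < k:
            problems.append((name, f"shorter than k={k} ({length} bases)"))

    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                check(header, seq_len)  # close the previous record
            header, seq_len = line[1:], 0
            n_records += 1
        elif header is None:
            problems.append((line[:30], "sequence before first header"))
        else:
            seq_len += len(line)        # sequences may span several lines
    if header is not None:
        check(header, seq_len)          # close the last record
    return n_records, problems


# Toy input; for real data use: with open("left.fa") as fh: scan_fasta(fh)
demo = io.StringIO(">r1/1\n" + "A" * 30 + "\n>r2/1\nACGT\n>r3/1\n")
n_records, problems = scan_fasta(demo)
print(n_records, problems)
```

The scan streams the file line by line, so it should also be usable on a file with the roughly 108 million reads reported in the log.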

I'm currently trying an older Trinity version (2.0.6) to see whether the error is due to a faulty installation of the new version.

/M

Tiago Hori

Oct 21, 2015, 7:33:49 AM

to Mihaela, trinityrnaseq-users

It is crashing on Jellyfish, so it is a Jellyfish issue!

T.

Mihaela

Oct 21, 2015, 7:50:09 AM

to trinityrn...@googlegroups.com

You are right. Nevertheless, I still could not find a solution for this error.

Btw, do you know if it is possible to change the Jellyfish version used in Trinity (I'm just curious)?
As far as I can see, Trinity is using Jellyfish version 2.1.4, but the newest release is 2.2.3.

/M.

martis....@gmail.com

Oct 22, 2015, 8:48:28 AM

to trinityrnaseq-users

Hi,

It seems that the error I posted yesterday occurs only with Trinity 2.1.0. Running the same command with 2.0.6 works fine. So either there is a bug in Trinity 2.1.0, or the problem is specific to our installation of it.

Best,
Mihaela
