Error in Insilico normalization process


Yuwei Xiao

Sep 26, 2023, 8:40:58 PM
to trinityrnaseq-users
Hi,

I'm working on a de novo transcriptome assembly of 24 samples (each ~7 GB of paired-end data). I'm running the latest version of Trinity with the command "Trinity --seqType fq --max_memory 50G --samples_file sample_info.txt --CPU 4 --normalize_by_read_set --no_salmon" (the samples file is attached in case you need it).
Below is part of what was shown on the terminal; I bolded the text I think is most relevant. I don't understand why both.fa is reported as 0 bytes, because when I checked the folder tmp_normalized_reads/ its size was indeed the combined size of left.fa and right.fa. Maybe that's not the actual reason for the failure? The first error was reported by insilico_read_normalization.pl, saying it "died with ret 512 at /opt/conda/envs/denovoassembly/bin/Trinity line 2919." Some other text appeared afterwards, but I don't know whether it's important. Thank you so much for your help!

------------------------------------------------------------------------------
Tuesday, September 26, 2023: 16:35:22   CMD: java -Xmx64m -XX:ParallelGCThreads=2  -jar /opt/conda/envs/denovoassembly/opt/trinity-2.15.1/util/support_scripts/ExitTester.jar 0
Tuesday, September 26, 2023: 16:35:22   CMD: java -Xmx4g -XX:ParallelGCThreads=2  -jar /opt/conda/envs/denovoassembly/opt/trinity-2.15.1/util/support_scripts/ExitTester.jar 1

----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 200 Coverage --
---------------------------------------------------------------


## Running in silico normalization, processing each read set separately
# running normalization on reads: $VAR1 = [
          [
            '/data-store/iplant/home/xiluo/potentilla/Pg5A_1_clean.fastq.gz'
          ],
          [
            '/data-store/iplant/home/xiluo/potentilla/Pg5A_2_clean.fastq.gz'
          ]
        ];


Tuesday, September 26, 2023: 16:35:23   CMD: /opt/conda/envs/denovoassembly/opt/trinity-2.15.1/util/insilico_read_normalization.pl --seqType fq --JM 50G  --max_cov 200 --min_cov 1 --CPU 6 --output /data-store/iplant/home/xiluo/potentilla/trinity_out_dir/norm_for_read_set_1 --max_CV 10000  --left /data-store/iplant/home/xiluo/potentilla/Pg5A_1_clean.fastq.gz --right /data-store/iplant/home/xiluo/potentilla/Pg5A_2_clean.fastq.gz --pairs_together  --PARALLEL_STATS  
-prepping seqs
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -A -R 1  <(gunzip -c /data-store/iplant/home/xiluo/potentilla/Pg5A_1_clean.fastq.gz) >> left.fa
CMD: seqtk-trinity seq -A -R 2  <(gunzip -c /data-store/iplant/home/xiluo/potentilla/Pg5A_2_clean.fastq.gz) >> right.fa
CMD finished (156 seconds)
CMD finished (163 seconds)
CMD: touch left.fa.ok
CMD finished (1 seconds)
CMD: touch right.fa.ok
CMD finished (1 seconds)
Done converting input files.CMD: cat left.fa right.fa > both.fa
                       
CMD finished (1196 seconds)
both.fa (0 bytes) is different from the combined size of left.fa and right.fa (17377774715 bytes)
Error, cmd: /opt/conda/envs/denovoassembly/opt/trinity-2.15.1/util/insilico_read_normalization.pl --seqType fq --JM 50G  --max_cov 200 --min_cov 1 --CPU 6 --output /data-store/iplant/home/xiluo/potentilla/trinity_out_dir/norm_for_read_set_1 --max_CV 10000  --left /data-store/iplant/home/xiluo/potentilla/Pg5A_1_clean.fastq.gz --right /data-store/iplant/home/xiluo/potentilla/Pg5A_2_clean.fastq.gz --pairs_together  --PARALLEL_STATS   died with ret 512 at /opt/conda/envs/denovoassembly/bin/Trinity line 2919.
        main::process_cmd("/opt/conda/envs/denovoassembly/opt/trinity-2.15.1/util/insili"...) called at /opt/conda/envs/denovoassembly/bin/Trinity line 3472
        main::normalize("/data-store/iplant/home/xiluo/potentilla/trinity_out_dir/norm"..., 200, ARRAY(0x5599dc927e58), ARRAY(0x5599dc928710)) called at /opt/conda/envs/denovoassembly/bin/Trinity line 3389
        main::run_normalization(200, ARRAY(0x5599dc6f3738), ARRAY(0x5599dc6f3768)) called at /opt/conda/envs/denovoassembly/bin/Trinity line 1450
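For context, the step that failed boils down to concatenating left.fa and right.fa into both.fa and checking that both.fa matches their combined size. A tiny stand-in version of that step (the demo reads below are made up; the real files are multi-GB):

```shell
# Minimal reproduction of the failing step: concatenate left.fa and
# right.fa into both.fa, then verify the combined size, as
# insilico_read_normalization.pl does. Demo reads are made up.
printf '>r1/1\nACGT\n' > left.fa
printf '>r1/2\nTTGG\n' > right.fa
cat left.fa right.fa > both.fa

left=$(wc -c < left.fa)
right=$(wc -c < right.fa)
both=$(wc -c < both.fa)
if [ "$both" -eq $((left + right)) ]; then
  echo "both.fa size OK ($both bytes)"
else
  echo "both.fa size mismatch: $both vs $((left + right))"
fi
```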
sample info.txt

Brian Haas

Sep 28, 2023, 3:15:35 PM
to Yuwei Xiao, trinityrnaseq-users
Hi,

For some reason, it thinks the size of the both.fa file is zero (i.e., empty). If that's not actually the case, try removing the both.fa.ok file if it exists, then rerun your original command; it should pick up again at that last step.
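Something along these lines (the path is illustrative, based on the output directory shown in the log above; adjust it to wherever your run writes its tmp_normalized_reads/ directory):

```shell
# Hypothetical checkpoint cleanup before rerunning Trinity.
# Adjust NORM_DIR to your actual run's normalization directory.
NORM_DIR=trinity_out_dir/norm_for_read_set_1/tmp_normalized_reads

# Remove the stale checkpoint and the (possibly truncated) both.fa
# so the concatenation step is redone on the next run.
rm -f "$NORM_DIR/both.fa.ok" "$NORM_DIR/both.fa"

# Then rerun the original command; Trinity resumes at the failed step:
# Trinity --seqType fq --max_memory 50G --samples_file sample_info.txt \
#   --CPU 4 --normalize_by_read_set --no_salmon
```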

best,

~b



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

Mohd Noor Mat Isa

Sep 28, 2023, 3:17:08 PM
to Yuwei Xiao, trinityrnaseq-users
Hi,

I'm having the same error. I'm using a Singularity image to run 6 samples.

Hopefully Brian and the Trinity community can enlighten us.


Thanks 
Mohd Noor
Malaysia Genome and Vaccine Institute


Yuwei Xiao

Sep 29, 2023, 6:05:54 PM
to trinityrnaseq-users
Hi Brian,

The most confusing part is that although both.fa was reported as 0 bytes in the output, it actually was not (see attachment). I reran it several times, and sometimes both.fa was ~200 MB in size. I have no idea why. Also, I don't get a both.fa.ok file. I ran the command in the Cloud Shell app on CyVerse, and I think enough memory and disk space were allocated (see the attached top command output screenshot).

Also, with salmon enabled by default I got an error saying "Error, cannot determine salmon version installed from (salmon: error while loading shared libraries: libboost_iostreams.so.1.60.0: cannot open shared object file: No such file or directory) at /opt/conda/envs/denovo/bin/Trinity line 4111" (see attached screenshot), so I just added "--no_salmon". Is there another solution to this issue?

Last question: I'm wondering what the canonical way is to use Trinity to de novo assemble transcripts from dozens of datasets or samples. I read some discussion pages where people first concatenated all left read files into a single file and all right read files into another, then used those as input to produce a single assembly. In other cases (I think this is also my case; please correct me if I misunderstood), with "--normalize_by_read_set" the pairs of sequencing files are processed sequentially, and the transcripts assembled from the different datasets are incorporated in some way into a single final transcript fasta file. What are the pros and cons of the two strategies?

Thank you so much for your attention and reply!

Yuwei
salmon issue.png
folder screenshot.png
top output.png

Brian Haas

Sep 30, 2023, 6:10:59 PM
to Yuwei Xiao, trinityrnaseq-users
Hi Yuwei,

responses below


The most confusing part is that although both.fa was reported as 0 bytes in the output, it actually was not (see attachment). I reran it several times, and sometimes both.fa was ~200 MB in size. I have no idea why. Also, I don't get a both.fa.ok file. I ran the command in the Cloud Shell app on CyVerse, and I think enough memory and disk space were allocated (see the attached top command output screenshot).


It is very peculiar - I can't explain it either.  It's the first time in all the years of Trinity that I'm hearing about this particular issue.

 
Also, with salmon enabled by default I got an error saying "Error, cannot determine salmon version installed from (salmon: error while loading shared libraries: libboost_iostreams.so.1.60.0: cannot open shared object file: No such file or directory) at /opt/conda/envs/denovo/bin/Trinity line 4111" (see attached screenshot), so I just added "--no_salmon". Is there another solution to this issue?

If you can use our singularity or docker image for Trinity, maybe that'll work better for you as it comes with everything installed and working.

Here's how we install salmon in our images:
 
The version info is there as well.  It really just needs to be installed and available via your PATH setting.  If you can run salmon yourself outside of Trinity, then Trinity should in theory be able to as well.
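One way to check that outside of Trinity is a quick shell sanity test (generic sketch, not Trinity-specific): if the libboost error shows up here too, the problem is the salmon installation or environment rather than Trinity.

```shell
# Sanity-check that salmon is on the PATH and actually runnable.
# A shared-library failure would surface on the --version call.
if command -v salmon >/dev/null 2>&1; then
  salmon --version || echo "salmon found but fails to run (missing shared library?)"
else
  echo "salmon is not on your PATH"
fi
```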


Last question: I'm wondering what the canonical way is to use Trinity to de novo assemble transcripts from dozens of datasets or samples. I read some discussion pages where people first concatenated all left read files into a single file and all right read files into another, then used those as input to produce a single assembly. In other cases (I think this is also my case; please correct me if I misunderstood), with "--normalize_by_read_set" the pairs of sequencing files are processed sequentially, and the transcripts assembled from the different datasets are incorporated in some way into a single final transcript fasta file. What are the pros and cons of the two strategies?


The --normalize_by_read_set option is just a way to save on memory usage during the normalization process. Our standard protocol involves assembling all reads into a single assembly so that it can serve as a single target for your downstream quantification and DE analyses. This only makes sense if all the samples derive from the same target organism. Be sure to use the --samples_file parameter if you have multiple sets of reads to assemble together.
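For reference, the samples file is tab-delimited with four columns per paired-end sample: condition, replicate name, left reads, right reads. A sketch (condition/replicate names and the second pair of files are made up; only Pg5A's file names come from this thread):

```shell
# Write a hypothetical tab-delimited samples file for --samples_file.
# Columns: condition, replicate name, left reads, right reads.
printf '%s\t%s\t%s\t%s\n' \
  condA condA_rep1 Pg5A_1_clean.fastq.gz Pg5A_2_clean.fastq.gz \
  condA condA_rep2 sampleB_1_clean.fastq.gz sampleB_2_clean.fastq.gz \
  > sample_info.txt

cat sample_info.txt
```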

Hope this helps,

~b

 


Mohd Noor Mat Isa

Oct 1, 2023, 9:04:39 AM
to Brian Haas, Yuwei Xiao, trinityrnaseq-users
Hi,

FYI, I managed to overcome the in silico normalization issue by using Singularity image version 2.14.0 (trinityrnaseq.v2.14.0.simg).

I don't know why, or what differs between v2.14.0 and v2.15.1 in the in silico normalization step, but I finally managed to complete the run.

Regards
Mohd Noor
MGVI

On Sun, Oct 1, 2023 at 1:58 PM Mohd Noor Mat Isa <emn...@gmail.com> wrote:
Hi Brian,

I have the same problem; the difference is that I'm running with a Singularity image. I got the same issue in the in silico normalization process.

Mohd Noor
MGVI




Yuwei Xiao

Oct 3, 2023, 6:04:12 PM
to trinityrnaseq-users
Hi Brian,

Thank you so much for your reply. I downloaded and installed Trinity 2.15.1 via conda rather than the Singularity or Docker image. Could that be the reason? I didn't find any conda installation or running guidelines (or a link to conda) on the Trinity wiki page. Do you think installing and running Trinity via conda is viable? In any case, I will try Docker now and see if it works. Thank you again for your patience and time.

Best,
Yuwei  

Brian Haas

Oct 3, 2023, 7:57:14 PM
to Yuwei Xiao, trinityrnaseq-users
The conda installations tend to be fine, but I'm not responsible for them.  I do build the docker and singularity images, though, and stand by them.

best,

~b
