Trinity transcriptome assembly stalls at Chrysalis step

Ollie White

Jul 11, 2020, 10:53:07 AM
to trinityrnaseq-users
Hello, 

I am using Trinity v2.10.0 to assemble a transcriptome dataset. It runs perfectly at first but it seems to stall or at least stop progressing at the Chrysalis step. I have pasted the script and log file below.

It stopped at the Chrysalis step and the log file did not change for about four days before the job reached the maximum walltime on my university computing facility. I was able to monitor the job and it was running on 32 CPUs as expected. 

Note that I also normalised the reads using insilico_read_normalization.pl prior to the assembly.

If the job stalls like this again, is it worth stopping the job and restarting from a checkpoint? 

Or perhaps it just needs to run for longer? If so, can I run the assembly in steps so that the job does not exceed the walltime limit of my university's computing facility? 

Best wishes
Ollie


Script code
Trinity \
   --seqType fq \
   --max_memory 250G \
   --SS_lib_type RF \
   --left /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq \
   --right /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq \
   --CPU 32 \
   --min_kmer_cov 2 --max_internal_gap_same_path 15 --max_diffs_same_path 4 \
   --output /dev/shm/trinity_sun \
   --full_cleanup


Logfile

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.10.0



Left read files: $VAR1 = [
          '/scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
        ];
Right read files: $VAR1 = [
          '/scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
        ];
Trinity version: Trinity-v2.10.0
-ERROR: couldn't run the network check to confirm latest Trinity software version.

Monday, July 6, 2020: 11:54:21  CMD: java -Xmx64m -XX:ParallelGCThreads=2  -jar /home/local/software/trinity/trinityrnaseq-v2.10.0/util/support_scripts/ExitTester.jar 0
Monday, July 6, 2020: 11:54:21  CMD: java -Xmx4g -XX:ParallelGCThreads=2  -jar /home/local/software/trinity/trinityrnaseq-v2.10.0/util/support_scripts/ExitTester.jar 1
Monday, July 6, 2020: 11:54:21  CMD: mkdir -p /dev/shm/trinity_sun
Monday, July 6, 2020: 11:54:22  CMD: mkdir -p /dev/shm/trinity_sun/chrysalis


----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 200 Coverage --
---------------------------------------------------------------

# running normalization on reads: $VAR1 = [
          [
            '/scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
          ],
          [
            '/scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
          ]
        ];


Monday, July 6, 2020: 11:54:22  CMD: /home/local/software/trinity/trinityrnaseq-v2.10.0/util/insilico_read_normalization.pl --seqType fq --JM 250G  --max_cov 200 --min_cov 2 --CPU 32 --output /dev/shm/trinity_sun/insilico_read_normalization --max_CV 10000  --SS_lib_type RF  --left /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq --right /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq --pairs_together --PARALLEL_STATS
-prepping seqs
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -r -A -R 1  /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq >> left.fa
CMD: seqtk-trinity seq -A -R 2  /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq >> right.fa
CMD finished (8 seconds)
CMD finished (10 seconds)
CMD: touch left.fa.ok
CMD finished (0 seconds)
CMD: touch right.fa.ok
CMD finished (0 seconds)
Done converting input files.CMD: cat left.fa right.fa > both.fa
CMD finished (3 seconds)
CMD: touch both.fa.ok
CMD finished (0 seconds)
-kmer counting.
-------------------------------------------
----------- Jellyfish  --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------

CMD: jellyfish count -t 32 -m 25 -s 100000000  both.fa
CMD finished (73 seconds)
CMD: jellyfish histo -t 32 -o jellyfish.K25.min2.kmers.fa.histo mer_counts.jf
CMD finished (65 seconds)
CMD: jellyfish dump -L 2 mer_counts.jf > jellyfish.K25.min2.kmers.fa
CMD finished (141 seconds)
CMD: touch jellyfish.K25.min2.kmers.fa.success
CMD finished (0 seconds)
-generating stats files
CMD: /home/local/software/trinity/trinityrnaseq-v2.10.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25  --num_threads 16  > left.fa.K25.stats
CMD: /home/local/software/trinity/trinityrnaseq-v2.10.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads right.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25  --num_threads 16  > right.fa.K25.stats
-reading Kmer occurrences...-reading Kmer occurrences...


 done parsing 251518720 Kmers, 166418795 added, taking 844 seconds.

 done parsing 251518720 Kmers, 166418795 added, taking 848 seconds.
STATS_GENERATION_TIME: 239 seconds.
STATS_GENERATION_TIME: 255 seconds.
CMD finished (1151 seconds)
CMD finished (1177 seconds)
CMD: touch left.fa.K25.stats.ok
CMD finished (0 seconds)
CMD: touch right.fa.K25.stats.ok
CMD finished (0 seconds)
-sorting each stats file by read name.
CMD: head -n1 left.fa.K25.stats > left.fa.K25.stats.sort && tail -n +2 left.fa.K25.stats | /bin/sort -k1,1 -T . -S 125G >> left.fa.K25.stats.sort
CMD: head -n1 right.fa.K25.stats > right.fa.K25.stats.sort && tail -n +2 right.fa.K25.stats | /bin/sort -k1,1 -T . -S 125G >> right.fa.K25.stats.sort
CMD finished (9 seconds)
CMD finished (10 seconds)
CMD: touch left.fa.K25.stats.sort.ok
CMD finished (0 seconds)
CMD: touch right.fa.K25.stats.sort.ok
CMD finished (0 seconds)
-defining normalized reads
CMD: /home/local/software/trinity/trinityrnaseq-v2.10.0/util/..//util/support_scripts//nbkc_merge_left_right_stats.pl --left left.fa.K25.stats.sort --right right.fa.K25.stats.sort --sorted > pairs.K25.stats
-opening left.fa.K25.stats.sort
-opening right.fa.K25.stats.sort
-done opening files.
CMD finished (288 seconds)
CMD: touch pairs.K25.stats.ok
CMD finished (0 seconds)
CMD: /home/local/software/trinity/trinityrnaseq-v2.10.0/util/..//util/support_scripts//nbkc_normalize.pl --stats_file pairs.K25.stats --max_cov 200  --min_cov 2 --max_CV 10000 > pairs.K25.stats.C200.maxCV10000.accs
8495931 / 9307971 = 91.28% reads selected during normalization.
0 / 9307971 = 0.00% reads discarded as likely aberrant based on coverage profiles.
812015 / 9307971 = 8.72% reads discarded as below minimum coverage threshold=2
CMD finished (165 seconds)
CMD: touch pairs.K25.stats.C200.maxCV10000.accs.ok
CMD finished (0 seconds)
-search and capture.
-preparing to extract selected reads from: /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq ... done prepping, now search and capture.
-capturing normalized reads from: /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq
-preparing to extract selected reads from: /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq ... done prepping, now search and capture.
-capturing normalized reads from: /scratch/user/transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq
CMD: touch /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq.ok
CMD finished (0 seconds)
CMD: touch /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq.ok
CMD finished (0 seconds)
CMD: ln -sf /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq left.norm.fq
CMD finished (0 seconds)
CMD: ln -sf /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq right.norm.fq
CMD finished (0 seconds)
-removing tmp dir /dev/shm/trinity_sun/insilico_read_normalization/tmp_normalized_reads


Normalization complete. See outputs:
        /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq
        /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq
Monday, July 6, 2020: 12:30:54  CMD: touch /dev/shm/trinity_sun/insilico_read_normalization/normalization.ok
Converting input files. (in parallel)Monday, July 6, 2020: 12:30:54     CMD: cat /dev/shm/trinity_sun/insilico_read_normalization/left.norm.fq | seqtk-trinity seq -r -A -R 1 - >> left.fa
Monday, July 6, 2020: 12:30:54  CMD: cat /dev/shm/trinity_sun/insilico_read_normalization/right.norm.fq | seqtk-trinity seq -A -R 2 - >> right.fa
Monday, July 6, 2020: 12:31:02  CMD: touch right.fa.ok
Monday, July 6, 2020: 12:31:03  CMD: touch left.fa.ok
Monday, July 6, 2020: 12:31:03  CMD: touch left.fa.ok right.fa.ok
Monday, July 6, 2020: 12:31:03  CMD: cat left.fa right.fa > /dev/shm/trinity_sun/both.fa
Monday, July 6, 2020: 12:31:06  CMD: touch /dev/shm/trinity_sun/both.fa.ok
-------------------------------------------
----------- Jellyfish  --------------------
-- (building a k-mer (25) catalog from reads) --
-------------------------------------------

* [Mon Jul  6 12:31:08 2020] Running CMD: jellyfish count -t 32 -m 25 -s 100000000 -o mer_counts.25.asm.jf /dev/shm/trinity_sun/both.fa
* [Mon Jul  6 12:32:04 2020] Running CMD: jellyfish dump -L 2 mer_counts.25.asm.jf > jellyfish.kmers.25.asm.fa
* [Mon Jul  6 12:34:07 2020] Running CMD: jellyfish histo -t 32 -o jellyfish.kmers.25.asm.fa.histo mer_counts.25.asm.jf
----------------------------------------------
--------------- Inchworm (K=25, asm) ---------------------
-- (Linear contig construction from k-mers) --
----------------------------------------------

* [Mon Jul  6 12:34:55 2020] Running CMD: /home/local/software/trinity/trinityrnaseq-v2.10.0/Inchworm/bin//inchworm --kmers jellyfish.kmers.25.asm.fa --run_inchworm -K 25 --monitor 1   --num_threads 6  --PARALLEL_IWORM  -L 25  --no_prune_error_kmers  > /dev/shm/trinity_sun/inchworm.fa.tmp
Kmer length set to: 25
Min assembly length set to: 25
Monitor turned on, set to: 1
setting number of threads to: 6
-setting parallel iworm mode.
-reading Kmer occurrences...
 [236M] Kmers parsed.
 done parsing 236472718 Kmers, 236472718 added, taking 866 seconds.

TIMING KMER_DB_BUILDING 866 s.
-populating the kmer seed candidate list.
Kcounter hash size: 236472718
Processed 236472718 non-zero abundance kmers in kcounter.
-Not sorting list of kmers, given parallel mode in effect.
-beginning inchworm contig assembly.
Total kcounter hash size: 236472718 vs. sorted list size: 236472718
num threads set to: 6
Done opening file. tmp.iworm.fa.pid_8142.thread_0
Done opening file. tmp.iworm.fa.pid_8142.thread_1
Done opening file. tmp.iworm.fa.pid_8142.thread_2
Done opening file. tmp.iworm.fa.pid_8142.thread_3
Done opening file. tmp.iworm.fa.pid_8142.thread_4
Done opening file. tmp.iworm.fa.pid_8142.thread_5

        Iworm contig assembly time: 229 seconds = 3.81667 minutes.

TIMING CONTIG_BUILDING 229 s.

TIMING PROG_RUNTIME 1207 s.
* [Mon Jul  6 12:56:56 2020] Running CMD: mv /dev/shm/trinity_sun/inchworm.fa.tmp /dev/shm/trinity_sun/inchworm.fa
Monday, July 6, 2020: 12:56:56  CMD: touch /dev/shm/trinity_sun/inchworm.fa.finished
--------------------------------------------------------
-------------------- Chrysalis -------------------------
-- (Contig Clustering & de Bruijn Graph Construction) --
--------------------------------------------------------

inchworm_target: /dev/shm/trinity_sun/both.fa
bowtie_reads_fa: /dev/shm/trinity_sun/both.fa
chrysalis_reads_fa: /dev/shm/trinity_sun/both.fa
* [Mon Jul  6 12:56:56 2020] Running CMD: /home/local/software/trinity/trinityrnaseq-v2.10.0/util/support_scripts/filter_iworm_by_min_length_or_cov.pl /dev/shm/trinity_sun/inchworm.fa 100 10 > /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100
* [Mon Jul  6 12:58:52 2020] Running CMD: /local/software/biobuilds/2017.11/bin/bowtie2-build --threads 32 -o 3 /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100 /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100 1>/dev/null
* [Mon Jul  6 13:02:28 2020] Running CMD: bash -c " set -o pipefail;/local/software/biobuilds/2017.11/bin/bowtie2 --local -k 2 --no-unal --threads 32 -f --score-min G,20,8 -x /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100 /dev/shm/trinity_sun/both.fa  | samtools view -@ 32 -F4 -Sb - | samtools sort -m 4194304000 -@ 32 -no - - > /dev/shm/trinity_sun/chrysalis/iworm.bowtie.nameSorted.bam"

 ===============================================================================
 Job finished at Sat Jul 11 11:54:28 BST 2020


Requested resource limits are mem=250gb,neednodes=1:ppn=32,nodes=1:ppn=32,walltime=120:00:00
Used resource limits are cput=3693:54:49,mem=14467392kb,vmem=262648060kb,walltime=120:00:15
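
A rough sanity check on the resource summary above, as a minimal sketch: dividing the reported CPU time by the walltime gives the average number of cores kept busy over the job. The helper name `hms_to_seconds` is made up for illustration; the two time values are copied from the scheduler summary.

```shell
# hms_to_seconds is a hypothetical helper, not part of Trinity or the scheduler.
# It converts H:MM:SS (hours may exceed 24) to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ printf "%d\n", $1 * 3600 + $2 * 60 + $3 }'
}

cput=$(hms_to_seconds "3693:54:49")      # cput from the job summary
walltime=$(hms_to_seconds "120:00:15")   # walltime from the job summary

# Average number of cores kept busy over the whole job.
avg_cores=$(awk -v c="$cput" -v w="$walltime" 'BEGIN { printf "%.2f", c / w }')
echo "average busy cores: $avg_cores"
```

This comes out at roughly 30.8 of the 32 requested cores, which matches the observation that the job was running on 32 CPUs. High CPU use on its own does not show whether the step is progressing slowly or spinning without progress.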

Brian Haas

Jul 11, 2020, 11:04:24 AM
to Ollie White, trinityrnaseq-users
Hi Ollie,

I just released a new version of Trinity a short while ago:



It has updates that might perform better for you here.  Try giving it a whirl next time. 

best,

~brian

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trinityrnaseq-users/538ad401-6644-40af-a7b9-9b269cceeef3o%40googlegroups.com.


--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Ollie White

Jul 11, 2020, 11:08:09 AM
to trinityrnaseq-users
Hi Brian, 

Thanks for the reply, I will try the latest version and let you know if I have any success.

Best wishes
Ollie




Ollie White

Jul 17, 2020, 11:25:36 AM
to trinityrnaseq-users
Hi Brian, 

I have installed the most recent version (v2.11.0) as suggested, repeated the read normalisation, and started the assembly.

At the moment it is running the Chrysalis step as before, but the log file has not changed since 5pm yesterday. I am wondering whether it will again fail to progress beyond this step.

This is the log file so far:


     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.11.0



Left read files: $VAR1 = [
          '/scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
        ];
Right read files: $VAR1 = [
          '/scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
        ];
Trinity version: Trinity-v2.11.0
-ERROR: couldn't run the network check to confirm latest Trinity software version.

Thursday, July 16, 2020: 15:54:50       CMD: java -Xmx64m -XX:ParallelGCThreads=2  -jar /home/local/software/trinity/trinityrnaseq-v2.11.0/util/support_scripts/ExitTester.jar 0
Thursday, July 16, 2020: 15:54:51       CMD: java -Xmx4g -XX:ParallelGCThreads=2  -jar /home/local/software/trinity/trinityrnaseq-v2.11.0/util/support_scripts/ExitTester.jar 1
Thursday, July 16, 2020: 15:54:52       CMD: mkdir -p /dev/shm/trinity_sun
Thursday, July 16, 2020: 15:54:52       CMD: mkdir -p /dev/shm/trinity_sun/chrysalis


----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 200 Coverage --
---------------------------------------------------------------

# running normalization on reads: $VAR1 = [
          [
            '/scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
          ],
          [
            '/scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq'
          ]
        ];


Thursday, July 16, 2020: 15:54:52       CMD: /home/local/software/trinity/trinityrnaseq-v2.11.0/util/insilico_read_normalization.pl --seqType fq --JM 250G  --max_cov 200 --min_cov 2 --CPU 32 --output /dev/shm/trinity_sun/insilico_read_normalization --max_CV 10000  --SS_lib_type RF  --left /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq --right /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq --pairs_together  --PARALLEL_STATS
-prepping seqs
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -r -A -R 1  /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq >> left.fa
CMD: seqtk-trinity seq -A -R 2  /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq >> right.fa
CMD finished (8 seconds)
CMD finished (10 seconds)
CMD: touch left.fa.ok
CMD finished (0 seconds)
CMD: touch right.fa.ok
CMD finished (0 seconds)
Done converting input files.CMD: cat left.fa right.fa > both.fa
CMD finished (2 seconds)
CMD: touch both.fa.ok
CMD finished (0 seconds)
-kmer counting.
-------------------------------------------
----------- Jellyfish  --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------

CMD: jellyfish count -t 32 -m 25 -s 100000000  both.fa
CMD finished (74 seconds)
CMD: jellyfish histo -t 32 -o jellyfish.K25.min2.kmers.fa.histo mer_counts.jf
CMD finished (63 seconds)
CMD: jellyfish dump -L 2 mer_counts.jf > jellyfish.K25.min2.kmers.fa
CMD finished (142 seconds)
CMD: touch jellyfish.K25.min2.kmers.fa.success
CMD finished (0 seconds)
-generating stats files
CMD: /home/local/software/trinity/trinityrnaseq-v2.11.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25  --num_threads 16  > left.fa.K25.stats
CMD: /home/local/software/trinity/trinityrnaseq-v2.11.0/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads right.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25  --num_threads 16  > right.fa.K25.stats
-reading Kmer occurrences...-reading Kmer occurrences...


 done parsing 251484642 Kmers, 166389879 added, taking 759 seconds.

 done parsing 251484642 Kmers, 166389879 added, taking 768 seconds.
STATS_GENERATION_TIME: 229 seconds.
STATS_GENERATION_TIME: 253 seconds.
CMD finished (1064 seconds)
CMD finished (1092 seconds)
CMD: touch left.fa.K25.stats.ok
CMD finished (0 seconds)
CMD: touch right.fa.K25.stats.ok
CMD finished (0 seconds)
-sorting each stats file by read name.
CMD: head -n1 left.fa.K25.stats > left.fa.K25.stats.sort && tail -n +2 left.fa.K25.stats | /bin/sort -k1,1 -T . -S 125G >> left.fa.K25.stats.sort
CMD: head -n1 right.fa.K25.stats > right.fa.K25.stats.sort && tail -n +2 right.fa.K25.stats | /bin/sort -k1,1 -T . -S 125G >> right.fa.K25.stats.sort
CMD finished (10 seconds)
CMD finished (11 seconds)
CMD: touch left.fa.K25.stats.sort.ok
CMD finished (0 seconds)
CMD: touch right.fa.K25.stats.sort.ok
CMD finished (0 seconds)
-defining normalized reads
CMD: /home/local/software/trinity/trinityrnaseq-v2.11.0/util/..//util/support_scripts//nbkc_merge_left_right_stats.pl --left left.fa.K25.stats.sort --right right.fa.K25.stats.sort --sorted > pairs.K25.stats
-opening left.fa.K25.stats.sort
-opening right.fa.K25.stats.sort
-done opening files.
CMD finished (281 seconds)
CMD: touch pairs.K25.stats.ok
CMD finished (0 seconds)
CMD: /home/local/software/trinity/trinityrnaseq-v2.11.0/util/..//util/support_scripts//nbkc_normalize.pl --stats_file pairs.K25.stats --max_cov 200  --min_cov 2 --max_CV 10000 > pairs.K25.stats.C200.maxCV10000.accs
8496701 / 9308845 = 91.28% reads selected during normalization.
0 / 9308845 = 0.00% reads discarded as likely aberrant based on coverage profiles.
812124 / 9308845 = 8.72% reads discarded as below minimum coverage threshold=2
CMD finished (159 seconds)
CMD: touch pairs.K25.stats.C200.maxCV10000.accs.ok
CMD finished (0 seconds)
-search and capture.
-preparing to extract selected reads from: /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq ... done prepping, now search and capture.
-capturing normalized reads from: /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq
-preparing to extract selected reads from: /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq ... done prepping, now search and capture.
-capturing normalized reads from: /scratch/oww1c19/argyranthemum_transcriptomics/normalise_reads/normalise_sun/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq
CMD: touch /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq.ok
CMD finished (0 seconds)
CMD: touch /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq.ok
CMD finished (0 seconds)
CMD: ln -sf /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq left.norm.fq
CMD finished (0 seconds)
CMD: ln -sf /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq right.norm.fq
CMD finished (0 seconds)
-removing tmp dir /dev/shm/trinity_sun/insilico_read_normalization/tmp_normalized_reads


Normalization complete. See outputs:
        /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_1.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq
        /dev/shm/trinity_sun/insilico_read_normalization/trim_paired_A7_2.fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq.normalized_K25_maxC200_minC2_maxCV10000.fq
Thursday, July 16, 2020: 16:29:43       CMD: touch /dev/shm/trinity_sun/insilico_read_normalization/normalization.ok
Converting input files. (in parallel)Thursday, July 16, 2020: 16:29:43  CMD: cat /dev/shm/trinity_sun/insilico_read_normalization/left.norm.fq | seqtk-trinity seq -r -A -R 1 - >> left.fa
Thursday, July 16, 2020: 16:29:43       CMD: cat /dev/shm/trinity_sun/insilico_read_normalization/right.norm.fq | seqtk-trinity seq -A -R 2 - >> right.fa
Thursday, July 16, 2020: 16:29:51       CMD: touch right.fa.ok
Thursday, July 16, 2020: 16:29:52       CMD: touch left.fa.ok
Thursday, July 16, 2020: 16:29:52       CMD: touch left.fa.ok right.fa.ok
Thursday, July 16, 2020: 16:29:52       CMD: cat left.fa right.fa > /dev/shm/trinity_sun/both.fa
Thursday, July 16, 2020: 16:29:55       CMD: touch /dev/shm/trinity_sun/both.fa.ok
-------------------------------------------
----------- Jellyfish  --------------------
-- (building a k-mer (25) catalog from reads) --
-------------------------------------------

* [Thu Jul 16 16:29:58 2020] Running CMD: jellyfish count -t 32 -m 25 -s 100000000 -o mer_counts.25.asm.jf /dev/shm/trinity_sun/both.fa
* [Thu Jul 16 16:30:53 2020] Running CMD: jellyfish dump -L 2 mer_counts.25.asm.jf > jellyfish.kmers.25.asm.fa
* [Thu Jul 16 16:32:53 2020] Running CMD: jellyfish histo -t 32 -o jellyfish.kmers.25.asm.fa.histo mer_counts.25.asm.jf
----------------------------------------------
--------------- Inchworm (K=25, asm) ---------------------
-- (Linear contig construction from k-mers) --
----------------------------------------------

* [Thu Jul 16 16:33:39 2020] Running CMD: /home/local/software/trinity/trinityrnaseq-v2.11.0/Inchworm/bin//inchworm --kmers jellyfish.kmers.25.asm.fa --run_inchworm -K 25 --monitor 1   --num_threads 6  --PARALLEL_IWORM   --min_any_entropy 1.0   -L 25  --no_prune_error_kmers  > /dev/shm/trinity_sun/inchworm.fa.tmp
Kmer length set to: 25
Min assembly length set to: 25
Monitor turned on, set to: 1
min entropy set to: 1
setting number of threads to: 6
-setting parallel iworm mode.
-reading Kmer occurrences...
 [236M] Kmers parsed.
 done parsing 236437792 Kmers, 236437792 added, taking 851 seconds.

TIMING KMER_DB_BUILDING 851 s.
Pruning kmers (min_kmer_count=1 min_any_entropy=1 min_ratio_non_error=0.005)
Pruned 187920 kmers from catalog.
        Pruning time: 202 seconds = 3.36667 minutes.

TIMING PRUNING 202 s.
-populating the kmer seed candidate list.
Kcounter hash size: 236437792
Processed 236249872 non-zero abundance kmers in kcounter.
-Not sorting list of kmers, given parallel mode in effect.
-beginning inchworm contig assembly.
Total kcounter hash size: 236437792 vs. sorted list size: 236249872
num threads set to: 6
Done opening file. tmp.iworm.fa.pid_20697.thread_0
Done opening file. tmp.iworm.fa.pid_20697.thread_1
Done opening file. tmp.iworm.fa.pid_20697.thread_2
Done opening file. tmp.iworm.fa.pid_20697.thread_3
Done opening file. tmp.iworm.fa.pid_20697.thread_4
Done opening file. tmp.iworm.fa.pid_20697.thread_5

        Iworm contig assembly time: 231 seconds = 3.85 minutes.

TIMING CONTIG_BUILDING 231 s.

TIMING PROG_RUNTIME 1398 s.
* [Thu Jul 16 16:58:52 2020] Running CMD: mv /dev/shm/trinity_sun/inchworm.fa.tmp /dev/shm/trinity_sun/inchworm.fa
Thursday, July 16, 2020: 16:58:52       CMD: touch /dev/shm/trinity_sun/inchworm.fa.finished
--------------------------------------------------------
-------------------- Chrysalis -------------------------
-- (Contig Clustering & de Bruijn Graph Construction) --
--------------------------------------------------------

inchworm_target: /dev/shm/trinity_sun/both.fa
bowtie_reads_fa: /dev/shm/trinity_sun/both.fa
chrysalis_reads_fa: /dev/shm/trinity_sun/both.fa
* [Thu Jul 16 16:58:52 2020] Running CMD: /home/local/software/trinity/trinityrnaseq-v2.11.0/util/support_scripts/filter_iworm_by_min_length_or_cov.pl /dev/shm/trinity_sun/inchworm.fa 100 10 > /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100
* [Thu Jul 16 17:00:49 2020] Running CMD: /local/software/biobuilds/2017.11/bin/bowtie2-build --threads 32 -o 3 /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100 /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100 1>/dev/null
* [Thu Jul 16 17:05:18 2020] Running CMD: bash -c " set -o pipefail;/local/software/biobuilds/2017.11/bin/bowtie2 --local -k 2 --no-unal --threads 32 -f --score-min G,20,8 -x /dev/shm/trinity_sun/chrysalis/inchworm.fa.min100 /dev/shm/trinity_sun/both.fa  | samtools view -@ 32 -F4 -Sb - | samtools sort -m 4194304000 -@ 32 -no - - > /dev/shm/trinity_sun/chrysalis/iworm.bowtie.nameSorted.bam"



I monitored the job using top, and it seems to be running on all threads.



Is it normal for this step to take this long?

Not sure if this is informative, but the left and right input read files are about 3 G each.

Best wishes
Ollie

Brian Haas

Jul 17, 2020, 11:35:28 AM
to Ollie White, trinityrnaseq-users
It looks like it's still running the bowtie2 step and bowtie2 looks like it's running at peak usage.  If you think there might be a bowtie2 problem, you could kill it and relaunch with --no_bowtie.   A long time ago, we had issues w/ bowtie2 versions locking up, but I haven't heard anything about this in a while... maybe check your bowtie2 software version.

best,

~b
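
As a rough sketch of the two suggestions above: print the version of the bowtie2 found on the PATH, then relaunch the Trinity command from the original post with `--no_bowtie` appended. The `DRY_RUN` switch and the `run` helper are illustrative additions, not Trinity options; the paths and other options are copied from the original script.

```shell
# Sketch only: DRY_RUN and run() are illustrative helpers, not Trinity options.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Report the bowtie2 version found on the PATH, if any.
if command -v bowtie2 >/dev/null 2>&1; then
    bowtie2 --version | head -n 1
else
    echo "bowtie2 not found on PATH"
fi

# Input locations copied from the original script.
READ_DIR=/scratch/user/transcriptomics/normalise_reads/normalise_sun
SUFFIX=fq.gz_ext_all_reads.normalized_K25_maxC30_minC0_maxCV10000.fq

# Same options as the original script, with --no_bowtie added.
run Trinity \
    --seqType fq \
    --max_memory 250G \
    --SS_lib_type RF \
    --left "$READ_DIR/trim_paired_A7_1.$SUFFIX" \
    --right "$READ_DIR/trim_paired_A7_2.$SUFFIX" \
    --CPU 32 \
    --min_kmer_cov 2 --max_internal_gap_same_path 15 --max_diffs_same_path 4 \
    --output /dev/shm/trinity_sun \
    --full_cleanup \
    --no_bowtie
```

Setting `DRY_RUN=0` would execute the command rather than print it. The version check is only meaningful if it runs in the same environment as the job, since the log shows Trinity calling whichever bowtie2 it finds there.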


Ollie White

Jul 17, 2020, 12:58:07 PM
to trinityrnaseq-users
Hi Brian, 

Thanks for the reply, I will give it a try. Is the bowtie step important for the final assembly? 

Trinity is using Bowtie 2 version 2.3.1. Would a more recent version potentially resolve the issue? 

Best wishes
Ollie



Brian Haas

Jul 17, 2020, 1:01:36 PM
to Ollie White, trinityrnaseq-users
This is the version that I've been using for a while now:

It's what I've put in our dockerfile and haven't felt the need to update it:

Can you just use our Docker or Singularity image?

~b



Brian Haas

Jul 17, 2020, 2:11:42 PM
to Ollie White, trinityrnaseq-users
Also, the bowtie step helps a little bit, but I often wonder if it's worth the effort in a lot of cases. It wasn't part of the original Trinity; I added it to help with clustering inchworm contigs based on mate pairs. The cluster definitions are generally improved, but if it ends up adding a lot to the total runtime, then it might not be worth it. The last release was supposed to further reduce the complexity of the bowtie alignments, so I'm thinking maybe you're in the 'bowtie2 lock' situation.

If you want to get results sooner rather than later, then just kill it and restart Trinity with the --no_bowtie parameter.

best,

~b
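
As a sketch of the relaunch described above, the function below reuses the options from the original script and appends --no_bowtie. The function name trinity_no_bowtie, the DRY_RUN switch, and the placeholder file names are illustrative assumptions, not part of Trinity; substitute your own read files and output directory.

```shell
# Relaunch Trinity with the original options plus --no_bowtie.
# $1 = left reads, $2 = right reads, $3 = output directory.
# If DRY_RUN is set to a non-empty value, the command is printed, not run.
trinity_no_bowtie() {
    ${DRY_RUN:+echo} Trinity \
        --seqType fq \
        --max_memory 250G \
        --SS_lib_type RF \
        --left "$1" \
        --right "$2" \
        --CPU 32 \
        --min_kmer_cov 2 --max_internal_gap_same_path 15 --max_diffs_same_path 4 \
        --output "$3" \
        --full_cleanup \
        --no_bowtie
}

# Placeholder inputs: print the command so it can be reviewed first.
DRY_RUN=1 trinity_no_bowtie left.normalized.fq right.normalized.fq trinity_out_dir
```

With DRY_RUN set, the command is only printed so it can be checked before submitting; leave it unset to actually launch Trinity.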

Ollie White

Jul 30, 2020, 6:01:00 PM
to trinityrnaseq-users
Hi Brian, 

Thanks for the reply. Unfortunately, I can't use a Docker or Singularity image on my university's HPC.

The link you sent seems to be for bowtie, not bowtie2 (I think). Is bowtie2 not the dependency for Trinity?

Apologies if I have missed something simple

Best wishes
Ollie

Brian Haas

Jul 30, 2020, 6:59:01 PM
to Ollie White, trinityrnaseq-users
Sorry about that. It looks like we install both bowtie and bowtie2 into the Docker image. Here's the bowtie2 that's used in Trinity:

## Bowtie2
WORKDIR $SRC
RUN wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.4.1/bowtie2-2.3.4.1-linux-x86_64.zip/download -O bowtie2-2.3.4.1-linux-x86_64.zip && \
unzip bowtie2-2.3.4.1-linux-x86_64.zip && \
mv bowtie2-2.3.4.1-linux-x86_64/bowtie2* $BIN && \
rm *.zip && \
rm -r bowtie2-2.3.4.1-linux-x86_64
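
Where Docker and Singularity are not available, the same steps can in principle be run in user space. The sketch below adapts the Dockerfile lines above; PREFIX (defaulting to $HOME/software) and the function name install_bowtie2 are assumptions for illustration, and it presumes wget and unzip are available on the cluster. Unlike the Dockerfile, it leaves the unpacked directory in place and adds it to PATH rather than moving the binaries into $BIN.

```shell
# Version and download URL taken from the Dockerfile snippet above.
BOWTIE2_VERSION="2.3.4.1"
BOWTIE2_DIR="bowtie2-${BOWTIE2_VERSION}-linux-x86_64"
BOWTIE2_URL="https://sourceforge.net/projects/bowtie-bio/files/bowtie2/${BOWTIE2_VERSION}/${BOWTIE2_DIR}.zip/download"

# Hypothetical install location; override by setting PREFIX beforehand.
PREFIX="${PREFIX:-$HOME/software}"

# Download and unpack bowtie2 under $PREFIX.
# The body runs in a subshell so the caller's working directory is unchanged.
install_bowtie2() (
    mkdir -p "$PREFIX" &&
    cd "$PREFIX" &&
    wget "$BOWTIE2_URL" -O "${BOWTIE2_DIR}.zip" &&
    unzip -q "${BOWTIE2_DIR}.zip" &&
    rm "${BOWTIE2_DIR}.zip"
)

# Nothing is downloaded until install_bowtie2 is called explicitly.
echo "to install, run: install_bowtie2"
echo "then: export PATH=\"$PREFIX/$BOWTIE2_DIR:\$PATH\""
```

After the export, `bowtie2 --version` should report 2.3.4.1 if the new copy is the one being picked up.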
