Hi everybody,
After quality trimming (FASTX Toolkit) and rRNA contamination removal (SortMeRNA), I am using Trinity v2.0.6 on SE reads from an Ion Proton sequencing machine with the following command:
/opt/trinityrnaseq-2.0.6/Trinity --seqType fq --max_memory 240G --single 001_chip1_qtrimmed_non_rRNA_cleaned.fq,001_chip2_qtrimmed_non_rRNA.fq,001_chip3_qtrimmed_non_rRNA.fq,002_chip1_qtrimmed_non_rRNA_cleaned.fq,002_chip2_qtrimmed_non_rRNA.fq,002_chip3_qtrimmed_non_rRNA_cleaned.fq,003_chip1_qtrimmed_non_rRNA.fq,003_chip2_qtrimmed_non_rRNA_blankline.fq,003_chip3_qtrimmed_non_rRNA.fq,004_chip1_qtrimmed_non_rRNA_cleaned.fq,004_chip2_qtrimmed_non_rRNA.fq,004_chip3_qtrimmed_non_rRNA.fq,005_chip1_qtrimmed_non_rRNA_cleaned.fq,005_chip2_qtrimmed_non_rRNA.fq,005_chip3_qtrimmed_non_rRNA.fq,006_chip1_qtrimmed_non_rRNA_cleaned.fq,006_chip2_qtrimmed_non_rRNA.fq,006_chip3_qtrimmed_non_rRNA.fq,007_chip1_qtrimmed_non_rRNA_cleaned.fq,007_chip2_qtrimmed_non_rRNA.fq,007_chip3_qtrimmed_non_rRNA.fq,008_chip1_qtrimmed_non_rRNA_cleaned.fq,008_chip2_qtrimmed_non_rRNA.fq,008_chip3_qtrimmed_non_rRNA.fq --SS_lib_type F --normalize_reads --min_contig_length 200 --min_kmer_cov 2 --inchworm_cpu 24 --bflyHeapSpaceInit 24G --bflyHeapSpaceMax 240G --bflyCalculateCPU --CPU 24 --output ./trinity_results2 > trinity_proton.log
I got the following error (I have omitted the full path of the files).
-------------------------------------------
----------- Jellyfish --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------
CMD finished (0 seconds)
CMD finished (245 seconds)
CMD: /opt/trinityrnaseq-2.0.6/util/..//trinity-plugins/jellyfish/bin/jellyfish histo -t 24 -o jellyfish.K25.min2.kmers.fa.histo mer_counts.jf
CMD finished (29 seconds)
CMD: /opt/trinityrnaseq-2.0.6/util/..//trinity-plugins/jellyfish/bin/jellyfish dump -L 2 mer_counts.jf > jellyfish.K25.min2.kmers.fa
CMD finished (67 seconds)
CMD: touch jellyfish.K25.min2.kmers.fa.success
CMD finished (0 seconds)
CMD: /opt/trinityrnaseq-2.0.6/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads single.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --num_threads 24 > single.fa.K25.stats
-reading Kmer occurences...
STATS_GENERATION_TIME: 1476 seconds.
CMD finished (1759 seconds)
CMD: touch single.fa.K25.stats.ok
-sorting each stats file by read name.
CMD finished (0 seconds)
CMD: /usr/bin/sort --parallel=24 -k5,5 -T . -S 240G single.fa.K25.stats > single.fa.K25.stats.sort
CMD finished (356 seconds)
CMD: touch single.fa.K25.stats.sort.ok
CMD finished (0 seconds)
CMD: /opt/trinityrnaseq-2.0.6/util/..//util/support_scripts//nbkc_normalize.pl single.fa.K25.stats.sort 50 200 > single.fa.K25.stats.sort.C50.pctSD200.accs
36101866 / 172720634 = 20.90% reads selected during normalization.
3847930 / 172720634 = 2.23% reads discarded as likely aberrant based on coverage profiles.
0 / 172720634 = 0.00% reads missing kmer coverage (N chars included?).
CMD finished (421 seconds)
CMD: touch single.fa.K25.stats.sort.C50.pctSD200.accs.ok
CMD finished (0 seconds)
Thread 2 terminated abnormally: Error, not all specified records have been retrieved (missing 11490858) from [path to my files] at /opt/trinityrnaseq-2.0.6/util/insilico_read_normalization.pl line 526.
Error encountered with thread.
Error, at least one thread died at /opt/trinityrnaseq-2.0.6/util/insilico_read_normalization.pl line 424.
Error, cmd: /opt/trinityrnaseq-2.0.6/util/insilico_read_normalization.pl --seqType fq --JM 240G --max_cov 50 --CPU 24 --output [path to my files] died with ret 6400 at /opt/trinityrnaseq-2.0.6/Trinity line 2116.
Previously, I deleted one blank line in one file (that error was detected by Trinity). At the same time, I checked the top few lines of each fastq file, and everything seems OK in the headers (no spaces). I have also checked for blank lines, indentation, etc. Here are a few lines of one input fastq file:
@C60IL:03002:04349
GCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCG
I am really interested in performing the preprocessing step with FASTX Toolkit, as well as removing rRNA, before the de novo assembly.
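For anyone who wants to run the same checks on their own files, these are the kinds of commands I used (shown here on one of my files as an example; both counts should be 0 for a clean file):

grep -c '^$' 001_chip1_qtrimmed_non_rRNA_cleaned.fq            # number of blank lines
grep -c '[[:blank:]]' 001_chip1_qtrimmed_non_rRNA_cleaned.fq   # number of lines containing spaces or tabs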
Please let me know your opinion and any possible solutions.
Thanks so much!!
Laura
Hello,
Now, I am facing the following error:
“Number of Commands: 118
WARNING, cannot remove output directory /data01/proton/trinity/data_sed/trinity_results/read_partitions/Fb_0/CBin_442/c44235.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(1) 0.847458% completed. WARNING, cannot remove output directory /data01/proton/trinity/data_sed/trinity_results/read_partitions/Fb_0/CBin_442/c44248.trinity.reads.fa.out, since not created in this run. (safety precaution).
.
succeeded(14) 11.8644% completed. OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00002aaf75600000, 17179869184, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 17179869184 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /data01/proton/trinity/data_sed/trinity_results/read_partitions/Fb_0/CBin_442/c44244.trinity.reads.fa.out/hs_err_pid820.log
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00002b4f43780000, 25823281152, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 25823281152 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /data01/proton/trinity/data_sed/trinity_results/read_partitions/Fb_0/CBin_444/c44417.trinity.reads.fa.out/hs_err_pid7015.log
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00002b77ae480000, 25746735104, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2
..”
Please also find attached a file from one of the CBin directories.
Before launching the Trinity command, I set the stack size to unlimited because of the "There is insufficient memory for the Java Runtime Environment to continue" error, by typing:
ulimit -s unlimited
ulimit -a
What should I do?? Perhaps it is something related to tweaking Trinity to play nicely with Java, or to including the --bflyGCThreads parameter in my Trinity command (but I am using v2.0.6)??
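In case it matters, my current guess (an assumption on my part, not something I have confirmed) is that each parallel Butterfly job can grow its Java heap up to --bflyHeapSpaceMax, so many jobs at 240G each would vastly oversubscribe my 240G of physical RAM. A sketch of the settings I am thinking of trying instead, so that the combined heaps stay under the available memory:

# e.g. 10 Butterfly jobs x 20G max heap = 200G, which fits in 240G of RAM
/opt/trinityrnaseq-2.0.6/Trinity ... --bflyHeapSpaceInit 1G --bflyHeapSpaceMax 20G --bflyCPU 10 ...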
Thanks a lot for any help!!
L.
On Wed, Jul 8, 2015 at 2:12 PM, Laura Entrambasaguas <lent...@gmail.com> wrote:
Hi Tiago,
Thanks so much!!
Before receiving your email, I had again removed any possible spaces (cat file.fa | sed '/^$/d;s/[[:blank:]]//g' > output.fa), and now Trinity is running. I suppose I had not removed all the spaces.. Let's see what happens now!!
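As an extra check after the cleaning step (a sketch; it assumes plain 4-line FASTQ records with no wrapped sequence lines), I also verified that only intact records remain:

lines=$(wc -l < output.fa)
echo $(( lines % 4 ))   # should print 0 if every record still has exactly 4 lines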
Sorry, I sent the previous email without finishing it...
Hi Brian,
Thanks for your quick but disquieting answer... So:
1. Do you think the de novo assembly is wrong, even though I didn't get any error during the process and I even got a success message at the end of the log file??
2. From this point:
- Since the fastq read names from the Ion Proton server don't have the /1 suffix, I suppose I should add it to each input file (a sketch of how I would do this is below)?
- After the run, should I find all the input files (24) in the insilico_read_normalization folder?
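In case it is useful to others, this is the minimal awk sketch I have in mind for adding the suffix (file names are just placeholders; it assumes 4-line records and drops anything after the first space in the header):

awk 'NR % 4 == 1 { sub(/ .*/, ""); print $0 "/1"; next } { print }' reads.fq > reads.suffix1.fq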
Thanks a lot for your answers and warning.
L.

On Mon, Jul 20, 2015 at 4:53 PM, Brian Haas <bh...@broadinstitute.org> wrote:
Hi Laura,
I think the code still wants to see the /1 suffix. Sorry!
~brian

On Mon, Jul 20, 2015 at 10:33 AM, Laura Entrambasaguas <lent...@gmail.com> wrote:
Dear Brian,
Thanks. But I am working with SE reads; do you still think my files must have the /1 or /2 suffix, or do you think the Trinity de novo assembly could be wrong??
Hello,
After reading Cristina's post (both.fa.read_count does not include all reads), I realize that the both.fa.read_count file generated after my de novo assembly is also disconcerting. This file indicates 44,805,904 reads, but I performed the assembly from two pooled fastq PE files of 348,521,669 reads each.
I first ran Trinity with Butterfly settings (--inchworm_cpu 24 --bflyHeapSpaceInit 8G --bflyHeapSpaceMax 240G --bflyCPU 24), but after receiving an error (# There is insufficient memory for the Java Runtime Environment to continue...), I reran Trinity without those parameters, and it seemed that I must have been really close to the end of the Butterfly process because the analysis finished in minutes. The .log indicated that everything was completed successfully.
The final Trinity command was:
/opt/trinityrnaseq-2.0.6/Trinity --seqType fq --max_memory 240G --left all_1.fq --right all_2.fq --SS_lib_type RF --normalize_reads --min_contig_length 200 --min_kmer_cov 2 --CPU 24 --output /data01/illumina/ddlab.sci.univr.it/results/trinity > trinity.log &
Also, the .stats seemed OK:
################################
## Counts of transcripts, etc.
################################
Total trinity transcripts: 222548
Percent GC: 40.53
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 6171
Contig N20: 4780
Contig N30: 3924
Contig N40: 3255
Contig N50: 2690
Median contig length: 785
Average contig: 1439.30
Total assembled bases: 320312502
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 5691
Contig N20: 4304
Contig N30: 3370
Contig N40: 2601
Contig N50: 1886
Median contig length: 427
Average contig: 918.21
Total assembled bases: 95815280
For its part, the bowtie total alignment rate was approximately 82.76% (approximately 18% aligned uniquely).
In addition, I've just assessed the read content of the transcriptome assembly, and I got scared when I saw the results:
#read_type count pct
single 1 100.00
Total aligned reads: 1
I've also run another Trinity assembly. Both the .log and .stats seem OK, and these latest results are similar to the first Trinity .stats results. I've also assessed the read content of this second transcriptome assembly, and I get the same results:
#read_type count pct
single 1 100.00
Total aligned reads: 1
I'm really worried about these results, above all because I have been working with the first trinity.fasta assembly for some time. I would really appreciate any advice.
Thanks so much,
Hi Brian,
Thanks.
Yes, I did realize that the normalization significantly reduces the number of reads, but from 697,043,338 reads down to 44,805,904?? That is a huge quantity of "lost" reads; can the assembly still be reliable??!!
I know that Trinity's in silico normalization is necessary for large data sets, to reduce memory requirements and improve runtimes, but taking into account that these data come from a non-model organism and I'm mainly interested in analysing gene expression (I would want to use as many reads as possible in the assembly to maximize the coverage level), do you still recommend that I perform the in silico read normalization?? If not, a sketch of the rerun I have in mind is below.
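For reference, the rerun without normalization would simply be my previous command with the --normalize_reads flag dropped (everything else unchanged):

/opt/trinityrnaseq-2.0.6/Trinity --seqType fq --max_memory 240G --left all_1.fq --right all_2.fq --SS_lib_type RF --min_contig_length 200 --min_kmer_cov 2 --CPU 24 --output /data01/illumina/ddlab.sci.univr.it/results/trinity > trinity.log &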
In relation to the bowtie_PE step results, it was my fault because I forgot the '--' separator before the aligner parameters. Now it seems to be running properly (it has been working for more than 1.5 h).. Let's hope everything comes out right!!
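For the record, the corrected command looks roughly like this (written from memory of the Trinity documentation, so please check the exact script path and options in your own installation; everything after the bare '--' is passed through to bowtie itself):

/opt/trinityrnaseq-2.0.6/util/bowtie_PE_separate_then_join.pl --seqType fq --left all_1.fq --right all_2.fq --target Trinity.fasta --aligner bowtie -- -p 4 --all --best --strata -m 300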
Thanks again!!
Laura
On Mon, Nov 9, 2015 at 12:07 PM, Brian Haas <bh...@broadinstitute.org> wrote: