Re: [trinityrnaseq-users] Single end and paired end data


Tiago Hori

May 1, 2015, 1:15:23 PM
to duncan...@gmail.com, trinityrn...@googlegroups.com
In paired-end mode, Trinity discards any reads that have no mate, so the short answer is no, it is not using them. If you want to use them all, concatenate all your reads into one file and run in single-end mode.

T.
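
A minimal sketch of the single-end route suggested above. The function names, `all_reads.fastq.gz`, and `trinity_single_out` are hypothetical; the resource flags are copied from the original command quoted further down the thread. `--SS_lib_type` is left out on the assumption that a file mixing left, right, and unpaired reads no longer has one consistent strand orientation. The second function only prints the command (a dry run), so nothing is assembled until you run the printed line yourself.

```shell
# Join several gzipped FASTQ files into one. Concatenated gzip files form a
# valid multi-member gzip stream, so no decompression is needed.
combine_reads() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Print (do not run) a single-end Trinity command for the combined file.
# Assumption: strand-specific mode is dropped because read orientations are mixed.
trinity_single_cmd() {
    reads=$1
    outdir=$2
    echo Trinity --seqType fq --single "$reads" \
        --max_memory 200G --CPU 20 --min_contig_length 200 \
        --output "$outdir"
}

# Hypothetical usage:
#   combine_reads all_reads.fastq.gz left_1.fastq.gz right_2.fastq.gz
#   trinity_single_cmd all_reads.fastq.gz trinity_single_out
```

Printing the command first makes it easy to review the flags before committing to a run of this size.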

On May 01, 2015, at 02:15 PM, duncan...@gmail.com wrote:

Hi everyone,

I am working on a transcriptome assembly for a species of whiptail lizard, Aspidoscelis inornata. I am using a very large data set of both paired-end (100 bp Illumina HiSeq) and single-end (100 bp Illumina HiSeq) data. I am currently running Trinity version 2.0.6. I followed the instructions on the Trinity website and concatenated the single-end reads into left_1.fastq.gz. My assembly is still running, but I am a little concerned about whether the single-end reads are being used. Both the stdout from the Trimmomatic step and the length of the trimmed fastq files suggest that the single-end reads are no longer included in the trimmed fastq files. Is there a different way I should be designating the single-end data? I am new to transcriptome assembly and Trinity, so any suggestions would be much appreciated.

Thanks for the help,
Duncan Tormey

initial command:

/home/dut/local/bin/trinityrnaseq-2.0.6/Trinity --seqType fq --SS_lib_type RF --max_memory 200G --normalize_max_read_cov 50 --min_kmer_cov 2 --trimmomatic --min_contig_length 200 --CPU 20 --output /home/dut/projects/lizard_transcriptomics/assembl2/dmt_transcriptome_assembler/trinity_dmt_out --left left_1.fastq.gz --right right_2.fastq.gz 

numbers of reads in input files:

left_1.fastq.gz: 5,622,334,875 reads

right_2.fastq.gz: 1,238,550,485 reads

number of reads in trimmed files:

left_1.fastq.gz.PwU.qtrim.fq: 1,237,252,378

right_2.fastq.gz.PwU.qtrim.fq: 1,158,590,302

trinity output for trimmomatic step:

---------------------------------------------------------------
------ Quality Trimming Via Trimmomatic  ---------------------
<< ILLUMINACLIP:/home/dut/local/bin/trinityrnaseq-2.0.6/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25 >>
---------------------------------------------------------------

Tuesday, April 21, 2015: 15:36:57 CMD: java -jar /home/dut/local/bin/trinityrnaseq-2.0.6/trinity-plugins/Trimmomatic/trimmomatic.jar PE -threads 20 -phred33  /home/dut/projects/lizard_transcriptomics/assembl2/dmt_transcriptome_assembler/left_1.fastq.gz /home/dut/projects/lizard_transcriptomics/assembl2/dmt_transcriptome_assembler/right_2.fastq.gz  left_1.fastq.gz.P.qtrim left_1.fastq.gz.U.qtrim  right_2.fastq.gz.P.qtrim right_2.fastq.gz.U.qtrim  ILLUMINACLIP:/home/dut/local/bin/trinityrnaseq-2.0.6/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25 
TrimmomaticPE: Started with arguments: -threads 20 -phred33 /home/dut/projects/lizard_transcriptomics/assembl2/dmt_transcriptome_assembler/left_1.fastq.gz /home/dut/projects/lizard_transcriptomics/assembl2/dmt_transcriptome_assembler/right_2.fastq.gz left_1.fastq.gz.P.qtrim left_1.fastq.gz.U.qtrim right_2.fastq.gz.P.qtrim right_2.fastq.gz.U.qtrim ILLUMINACLIP:/home/dut/local/bin/trinityrnaseq-2.0.6/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 1238550485 Both Surviving: 1157622971 (93.47%) Forward Only Surviving: 79629407 (6.43%) Reverse Only Surviving: 967331 (0.08%) Dropped: 330776 (0.03%)
TrimmomaticPE: Completed successfully
Tuesday, April 21, 2015: 19:10:50 CMD: cat left_1.fastq.gz.P.qtrim left_1.fastq.gz.U.qtrim > left_1.fastq.gz.PwU.qtrim.fq
Tuesday, April 21, 2015: 19:48:07 CMD: cat right_2.fastq.gz.P.qtrim right_2.fastq.gz.U.qtrim > right_2.fastq.gz.PwU.qtrim.fq
Tuesday, April 21, 2015: 20:19:53 CMD: touch trimmomatic.ok
Tuesday, April 21, 2015: 20:19:53 CMD: gzip left_1.fastq.gz.P.qtrim left_1.fastq.gz.U.qtrim right_2.fastq.gz.P.qtrim right_2.fastq.gz.U.qtrim &
Converting input files. (in parallel)
Tuesday, April 21, 2015: 20:19:53 CMD: /home/dut/local/bin/trinityrnaseq-2.0.6/trinity-plugins/fastool/fastool --rev  --illumina-trinity --to-fasta left_1.fastq.gz.PwU.qtrim.fq >> left.fa 2> left_1.fastq.gz.PwU.qtrim.fq.readcount 
Tuesday, April 21, 2015: 20:19:53 CMD: /home/dut/local/bin/trinityrnaseq-2.0.6/trinity-plugins/fastool/fastool --illumina-trinity --to-fasta right_2.fastq.gz.PwU.qtrim.fq >> right.fa 2> right_2.fastq.gz.PwU.qtrim.fq.readcount
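
The numbers in this post can be cross-checked with a few lines of shell arithmetic. Every value is copied from the read counts and the Trimmomatic summary line above; the variable names are only for illustration. If the sums hold, Trimmomatic took in exactly as many left reads as there are right reads, which suggests the extra reads appended to left_1.fastq.gz were never processed.

```shell
# Values copied from the post and the Trimmomatic summary line.
left_input=5622334875     # reads in left_1.fastq.gz
input_pairs=1238550485    # "Input Read Pairs" (same as the right-hand file)
both=1157622971           # "Both Surviving"
fwd_only=79629407         # "Forward Only Surviving"
rev_only=967331           # "Reverse Only Surviving"
dropped=330776            # "Dropped"

# The four outcome categories should add up to the input pairs.
accounted=$(( both + fwd_only + rev_only + dropped ))

# The PwU files are paired survivors plus unpaired survivors on that side.
left_trimmed=$(( both + fwd_only ))
right_trimmed=$(( both + rev_only ))

# Reads in the left file beyond the pairs Trimmomatic reports as input.
left_unread=$(( left_input - input_pairs ))

echo "pairs accounted for: $accounted (input pairs: $input_pairs)"
echo "left PwU expected:   $left_trimmed"
echo "right PwU expected:  $right_trimmed"
echo "left reads not read: $left_unread"
```

The expected PwU counts come out to 1,237,252,378 and 1,158,590,302, matching the trimmed file sizes reported above, and 4,383,784,390 reads in the left file are unaccounted for by the trimming step.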



--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

duncan...@gmail.com

May 1, 2015, 3:01:00 PM
to trinityrn...@googlegroups.com, duncan...@gmail.com
Thank you for the quick response. 

I guess I misunderstood the FAQ page. I didn't realize that combining single-end and paired-end data meant you lost the paired-end information. The paired-end data comes from embryonic and blood RNA libraries. The single-end data comes from normal juvenile/adult tail tissue as well as regenerating tail tissue. My rationale for including the single-end data was that if I were to use only the paired-end data to generate a reference transcriptome for differential expression analysis, I could miss transcripts that are unique to the regeneration process.

Would it be better to assemble the single end data and paired end data separately and then merge the assemblies? Or run them all in single end mode?

I suppose I could always try both...

Thanks again,
Duncan

Tiago Hori

May 1, 2015, 3:13:46 PM
to duncan...@gmail.com, trinityrn...@googlegroups.com
I would try both, but my gut tells me the merging is going to give you better results.

T.
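
If you go the merge route, one practical detail is that two separate Trinity runs can produce identical contig names. A minimal sketch, assuming each run wrote the usual Trinity.fasta; the function name, the PE/SE tags, and the output directory names are hypothetical. Collapsing redundant contigs afterwards (for example with a clustering tool such as CD-HIT-EST) is a separate step that is not shown.

```shell
# Prefix every FASTA header in an assembly with a tag so that contig IDs
# from different Trinity runs cannot collide after concatenation.
tag_assembly() {
    tag=$1
    fasta=$2
    sed "s/^>/>${tag}_/" "$fasta"
}

# Hypothetical usage:
#   { tag_assembly PE trinity_pe_out/Trinity.fasta
#     tag_assembly SE trinity_se_out/Trinity.fasta
#   } > merged_transcripts.fasta
```

Keeping the tag in the header also makes it possible to tell later which library type a contig came from.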
