using samples.txt for bio rep, trinity only reads top entry


z yan wang

Dec 7, 2016, 6:13:42 PM
to trinityrnaseq-users
Hello,

I'm very excited by the --samples_file option in the newest version of Trinity! I'm running into problems getting Trinity to read/use any entry after the top entry. My samples.txt file (tab-delimited) shows 4 conditions, 3 of which have 3 biological replicates, 1 of which has only 2 biological replicates:

Feed Feed_rep1 CR-ZYW-0_R1.fastq.gz CR-ZYW-0_R2.fastq.gz
Feed Feed_rep2 CR-ZYW-A_S3_L007_R1_001.fastq.gz CR-ZYW-A_S3_L007_R2_001.fastq.gz
Feed Feed_rep3 CR-ZYW-B_S4_L007_R1_001.fastq.gz CR-ZYW-2_S4_L007_R2_001.fastq.gz
Fast Fast_rep1 CR-ZYW-C_S5_L007_R1_001.fastq.gz CR-ZYW-C_S5_L007_R2_001.fastq.gz
Fast Fast_rep2 CWR-ZYW-E_S1_L002_R1_001.fastq.gz CWR-ZYW-E_S1_L002_R2_001.fastq.gz
Fast Fast_rep3 CWR-ZYW-F_S2_L002_R1_001.fastq.gz CWR-ZYW-F_S2_L002_R2_001.fastq.gz
Dec Dec_rep1 CR-ZYW-D_S6_L007_R1_001.fastq.gz CR-ZYW-D_S6_L007_R2_001.fastq.gz
Dec Dec_rep2 CWR-ZYW-G_S3_L002_R1_001.fastq.gz CWR-ZYW-G_S3_L002_R2_001.fastq.gz
Juv Juv_rep1 CWR-ZYW-H_S4_L002_R1_001.fastq.gz CWR-ZYW-H_S4_L002_R2_001.fastq.gz
Juv Juv_rep2 CWR-ZYW-I_S5_L002_R1_001.fastq.gz CWR-ZYW-I_S5_L002_R2_001.fastq.gz
Juv Juv_rep3 CWR-ZYW-J_S6_L002_R1_001.fastq.gz CWR-ZYW-J_S6_L002_R2_001.fastq.gz
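A quick way to confirm that a samples file parses into the layout described above (condition, replicate name, left reads, right reads) is to count replicates per condition. This is only a sketch: the function name `replicates_per_condition` and the file names in the example are hypothetical, and it assumes exactly four tab-separated fields per line.

```python
from collections import Counter


def replicates_per_condition(samples_text):
    """Count replicate rows per condition in tab-delimited samples-file text."""
    counts = Counter()
    # Split on "\n" only, so unusual line endings are not silently accepted.
    for line_no, line in enumerate(samples_text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines, including the one after a final newline
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 tab-separated fields, found {len(fields)}"
            )
        counts[fields[0]] += 1  # first column is the condition name
    return dict(counts)


# Hypothetical example rows, not the real files from this post.
example = (
    "Feed\tFeed_rep1\tfeed1_R1.fastq.gz\tfeed1_R2.fastq.gz\n"
    "Feed\tFeed_rep2\tfeed2_R1.fastq.gz\tfeed2_R2.fastq.gz\n"
    "Fast\tFast_rep1\tfast1_R1.fastq.gz\tfast1_R2.fastq.gz\n"
)
print(replicates_per_condition(example))  # {'Feed': 2, 'Fast': 1}
```

If the counts do not match the experimental design, or a line raises an error, the file is probably not being read the way it looks in an editor.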

I'm running Trinity v2.3.2 with --trimmomatic and using the --samples_file option. The issue first appears with Trimmomatic: Trimmomatic only trims the top entry, then Trinity proceeds with in silico normalization:
----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

---------------------------------------------------------------
------ Quality Trimming Via Trimmomatic  ---------------------
<< ILLUMINACLIP:/apps/software/trinityrnaseq/2.3.2/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25 >>
---------------------------------------------------------------

Friday, December 2, 2016: 16:36:09 CMD: java -jar /apps/software/trinityrnaseq/2.3.2/trinity-plugins/Trimmomatic/trimmomatic.jar PE -threads 8 -phred33  /group/rags-lab/Mom_Sequencing/CR-ZYW-0_R1.fastq.gz /group/rags-lab/Mom_Sequencing/CR-ZYW-0_R2.fastq.gz  CR-ZYW-0_R1.fastq.gz.P.qtrim CR-ZYW-0_R1.fastq.gz.U.qtrim  CR-ZYW-0_R2.fastq.gz.P.qtrim CR-ZYW-0_R2.fastq.gz.U.qtrim  ILLUMINACLIP:/apps/software/trinityrnaseq/2.3.2/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25 
TrimmomaticPE: Started with arguments: -threads 8 -phred33 /group/rags-lab/Mom_Sequencing/CR-ZYW-0_R1.fastq.gz /group/rags-lab/Mom_Sequencing/CR-ZYW-0_R2.fastq.gz CR-ZYW-0_R1.fastq.gz.P.qtrim CR-ZYW-0_R1.fastq.gz.U.qtrim CR-ZYW-0_R2.fastq.gz.P.qtrim CR-ZYW-0_R2.fastq.gz.U.qtrim ILLUMINACLIP:/apps/software/trinityrnaseq/2.3.2/trinity-plugins/Trimmomatic/adapters/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 129462733 Both Surviving: 127080559 (98.16%) Forward Only Surviving: 2381584 (1.84%) Reverse Only Surviving: 0 (0.00%) Dropped: 590 (0.00%)
TrimmomaticPE: Completed successfully
Friday, December 2, 2016: 16:53:14 CMD: cp CR-ZYW-0_R1.fastq.gz.P.qtrim CR-ZYW-0_R1.fastq.gz.PwU.qtrim.fq
Friday, December 2, 2016: 16:57:09 CMD: cp CR-ZYW-0_R2.fastq.gz.P.qtrim CR-ZYW-0_R2.fastq.gz.PwU.qtrim.fq
Friday, December 2, 2016: 17:01:47 CMD: touch trimmomatic.ok
Friday, December 2, 2016: 17:01:47 CMD: gzip CR-ZYW-0_R1.fastq.gz.P.qtrim CR-ZYW-0_R1.fastq.gz.U.qtrim CR-ZYW-0_R2.fastq.gz.P.qtrim CR-ZYW-0_R2.fastq.gz.U.qtrim &
---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 50 Coverage --
-- /scratch/zwang3/OG_replicate_trinity/insilico_read_normalization --
---------------------------------------------------------------

Friday, December 2, 2016: 17:01:47 CMD: /apps/software/trinityrnaseq/2.3.2/util/insilico_read_normalization.pl --seqType fq --JM 15G  --max_cov 50 --CPU 8 --output /scratch/zwang3/OG_replicate_trinity/insilico_read_normalization   --max_pct_stdev 10000  --SS_lib_type RF  --left CR-ZYW-0_R1.fastq.gz.PwU.qtrim.fq --right CR-ZYW-0_R2.fastq.gz.PwU.qtrim.fq --pairs_together --PARALLEL_STATS  
Converting input files. (both directions in parallel)CMD: /apps/software/trinityrnaseq/2.3.2/util/..//trinity-plugins/fastool/fastool --rev  --illumina-trinity --to-fasta /scratch/zwang3/OG_replicate_trinity/CR-ZYW-0_R1.fastq.gz.PwU.qtrim.fq >> left.fa
CMD: /apps/software/trinityrnaseq/2.3.2/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /scratch/zwang3/OG_replicate_trinity/CR-ZYW-0_R2.fastq.gz.PwU.qtrim.fq >> right.fa
Sequences parsed: 127080559
CMD finished (605 seconds)
Sequences parsed: 127080559
CMD finished (898 seconds)
CMD: touch left.fa.ok
CMD finished (0 seconds)
CMD: touch right.fa.ok
CMD finished (1 seconds)
Done converting input files.CMD: cat left.fa right.fa > both.fa
CMD finished (272 seconds)
CMD: touch both.fa.ok

(etc etc...this run of Trinity completed without any errors logged, but the assembly that was created is just an assembly of the top file)

My files are in the working directory and the names of the files are correct. If I change the order that the replicate names are listed, the same thing happens--only the top entry gets trimmed and shuttled along for further analyses. Any insight into this? I'm currently trimming each library individually and will try running Trinity again with the --samples_file option and no --trimmomatic.

Thanks so much,
yan


Brian Haas

Dec 7, 2016, 6:22:18 PM
to z yan wang, trinityrnaseq-users
I'll look into this ASAP tonight and get back to you.

-Brian
(by iPhone)

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

z yan wang

Dec 7, 2016, 6:37:44 PM
to trinityrnaseq-users, yan.w...@gmail.com
Thanks, I really appreciate it!
y

Brian Haas

Dec 7, 2016, 8:11:09 PM
to z yan wang, trinityrnaseq-users
My test run is working as expected...

could you send me your samples file?

thx,

~b

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

z yan wang

Dec 7, 2016, 10:46:31 PM
to trinityrnaseq-users
Yes, here it is.

samples.txt

Brian Haas

Dec 8, 2016, 6:52:56 AM
to z yan wang, trinityrnaseq-users
The samples.txt file looks perfect.  I can't explain why it seems to be misbehaving here. Attached is a slightly modified version of the Trinity script that provides some more logging information about what samples were read in and are being processed at these earlier stages. Using the replacement script, try running Trinity in a new directory (or delete or rename the existing trinity_out_dir) and let's see how it goes.

best,

~b

Trinity

z yan wang

Dec 9, 2016, 12:10:56 PM
to trinityrnaseq-users
Thanks--I'm running into some Perl issues on my end so I can't report back on the replacement script yet. 

However, I did try listing all of the --left and --right reads of my samples as comma-separated lists, like this:

 Trinity --seqType fq --max_memory 50G  \
         --left condA_1.fq.gz,condB_1.fq.gz,condC_1.fq.gz \
         --right condA_2.fq.gz,condB_2.fq.gz,condC_2.fq.gz

and that seems to feed into Trinity just fine. I'll try using the samples.txt file again for transcript quantification once the assembly has finished.
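For anyone using the same workaround, the comma-separated --left and --right lists can be built from the samples file instead of typed by hand. This is a rough sketch, assuming four tab-separated columns in the order condition, replicate, left reads, right reads; the function name `left_right_args` and the example rows are hypothetical.

```python
def left_right_args(samples_text):
    """Return (left, right) comma-separated read lists from samples-file text."""
    lefts, rights = [], []
    for line in samples_text.split("\n"):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-separated fields, got: {fields!r}")
        _condition, _replicate, left, right = fields
        lefts.append(left)
        rights.append(right)
    return ",".join(lefts), ",".join(rights)


# Hypothetical example rows.
example = (
    "condA\tcondA_rep1\tcondA_1.fq.gz\tcondA_2.fq.gz\n"
    "condB\tcondB_rep1\tcondB_1.fq.gz\tcondB_2.fq.gz\n"
)
left, right = left_right_args(example)
print(f"Trinity --seqType fq --max_memory 50G --left {left} --right {right}")
```

The printed command uses only the options already shown in this post; any other options a real run needs would have to be added.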

I'll post an update on the replacement script once things are sorted out on my end.

Thanks again,
yan

Brian Haas

Dec 9, 2016, 1:49:41 PM
to z yan wang, trinityrnaseq-users
thanks. I'm very curious about what was happening there.  It's troubling to say the least.

~b


z yan wang

Dec 12, 2016, 5:14:07 PM
to trinityrnaseq-users, yan.w...@gmail.com

Hi Brian,


Ok, so I ran this version of Trinity (renamed Brian_Trinity) and I got the following error:


Error, more than 4 fields found in samples file line: [Feed Feed_rep1 /scratch/zwang3/OG_replicate_trinity/TrimmedFiles/CR-ZYW-0_R1.fastq.gz.P.qtrim /scratch/zwang3/OG_replicate_trinity/TrimmedFiles/CR-ZYW-0_R2.fJuvt.gz.Juv_rep33 CWR-ZYW-J_S6_L002_R1_001.fastq.gz.P.qtrim       CWR-ZYW-J_S6_L002_R2_001.fastq.gz.P.qtrim]  at ./Brian_Trinity.sh line 3450, <$fh> line 1.



So it looks like there is something secretly wrong with my samples file.
(I did find a small typo in the samples.txt file I sent you originally: a "2" was supposed to be a "B" or something like that, but that was fixed for this try.)

y


Brian Haas

Dec 12, 2016, 6:18:37 PM
to z yan wang, trinityrnaseq-users
I suspected this was the case.  My guess is that there are weird linefeed characters being used here.

If you try

   cat -te samples.txt

you'll see what it 'really looks like', with all characters including whitespace rendered.
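The same kind of view can be approximated in Python when `cat -te` is not handy. The sketch below is an assumption-laden imitation, not a reimplementation of `cat`: it only renders tabs as `^I`, carriage returns as `^M`, and marks each line feed with `$`. The function name `show_invisibles` and the sample bytes are hypothetical.

```python
def show_invisibles(raw: bytes) -> str:
    """Render tabs as ^I, carriage returns as ^M, and mark each line feed with $."""
    text = raw.decode("ascii", errors="replace")
    return (
        text.replace("\t", "^I")
            .replace("\r", "^M")
            .replace("\n", "$\n")
    )


# Hypothetical two-line sample saved with Windows-style (CRLF) line endings.
sample = b"Feed\tFeed_rep1\ta_R1.fq\ta_R2.fq\r\nFast\tFast_rep1\tb_R1.fq\tb_R2.fq\r\n"
print(show_invisibles(sample), end="")
```

Reading the file in binary mode, for example `open("samples.txt", "rb").read()`, keeps Python from translating the line endings before they can be inspected. A `^M` in the output points to a carriage return that a plain text view would hide.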

It's best to create your samples.txt file from within linux using a text editor like vim, emacs, nano, pico, etc. and saving as raw text.  If you're using some fancy tool or writing it outside of linux and then ftp'ing it over, it's often a recipe for problems.
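If the file has already been written elsewhere, its line endings can be checked and converted to plain line feeds. This is a minimal sketch under the assumption that stray carriage returns are the only problem; the function names `line_ending_counts` and `normalize_line_endings` and the sample bytes are hypothetical.

```python
def line_ending_counts(raw: bytes) -> dict:
    """Count CRLF, lone CR, and lone LF line endings in raw file bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": raw.count(b"\r") - crlf,
        "LF": raw.count(b"\n") - crlf,
    }


def normalize_line_endings(raw: bytes) -> bytes:
    """Convert CRLF first, then any remaining lone CR, to LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical sample that uses carriage returns alone as line endings.
cr_only = b"Feed\tFeed_rep1\ta_R1.fq\ta_R2.fq\rFast\tFast_rep1\tb_R1.fq\tb_R2.fq\r"
print(line_ending_counts(cr_only))                          # {'CRLF': 0, 'CR': 2, 'LF': 0}
print(line_ending_counts(normalize_line_endings(cr_only)))  # {'CRLF': 0, 'CR': 0, 'LF': 2}
```

A file that contains only carriage returns is a single line to any reader that splits on line feeds, which would be consistent with an error reported at "line 1" with too many fields. To apply the conversion to a real file, read it with `open(path, "rb")` and write the result back with `open(path, "wb")`.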

best

~b


z yan wang

Dec 12, 2016, 7:14:17 PM
to trinityrnaseq-users
Thanks so much for your advice and assistance, Brian!