Assembly of multiple biological replicates/libraries


Charles Foster

Oct 9, 2019, 10:19:18 PM
to Trans-ABySS
Dear Trans-ABySS users,

I am interested in trying out Trans-ABySS to assemble a transcriptome. Here's a summary of the data I have:
  • Stranded RNAseq
  • Three experimental conditions; four replicates of each condition = 12 biological replicates
  • 24 FASTQ read files: one forward and one reverse file for each of the 12 biological replicates
If I only had one forward file and one reverse file I would use the following command:

transabyss -k ${kmer} --pe reads_R1.fq reads_R2.fq --SS --outdir results --name ${name} --threads $THREADS

However, I would like the assembly to be based on all biological replicates. How would I do this, considering I have 12 forward and 12 reverse files? Can I provide something like a comma-separated list (e.g., --pe reads_R1_1.fq,reads_R1_2.fq,reads_R1_3.fq, etc.)?

Alternatively, do I need to combine all forward reads into one file and all reverse reads into another file? If so, what's the best way to do this for my stranded data?

Many thanks,
Charles

Ka Ming Nip

Oct 10, 2019, 11:01:19 AM
to Trans-ABySS

Hi Charles,


You don't need to combine the read files. You can specify all pairs of files in a single command, i.e.:


transabyss --SS --pe replicate_01_R1.fq replicate_01_R2.fq replicate_02_R1.fq replicate_02_R2.fq ... replicate_12_R1.fq replicate_12_R2.fq ...
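
For anyone scripting this, here is a minimal sketch of building the paired file list in a loop instead of typing all 24 names. It assumes the replicate_NN_R1.fq / replicate_NN_R2.fq naming used in the command above; the k-mer size, assembly name, and thread count are placeholder values, not recommendations. It only prints the command (a dry run) so it can be checked before anything is launched.

```shell
#!/bin/sh
# Sketch: build the --pe argument list for 12 replicates and print the command.
# Assumed file naming: replicate_NN_R1.fq / replicate_NN_R2.fq.
# kmer, name and threads are hypothetical placeholder values.
kmer=32
name=all_replicates
threads=8

set --                        # clear the positional parameters
i=1
while [ "$i" -le 12 ]; do
    n=$(printf '%02d' "$i")   # zero-pad: 01, 02, ..., 12
    # Append the forward and reverse file of this replicate, in that order.
    set -- "$@" "replicate_${n}_R1.fq" "replicate_${n}_R2.fq"
    i=$((i + 1))
done

# Dry run: print the command instead of executing it.
cmd="transabyss -k $kmer --SS --pe $* --outdir results --name $name --threads $threads"
echo "$cmd"
```

Once the printed command looks right, it can be run by pasting it into the shell. Keeping each R1 file immediately before its R2 partner matches the pair ordering shown in the command above.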


Hope that helps!


Ka Ming


--
Ka Ming Nip
Graduate Student | Dr. Inanc Birol Lab
Canada's Michael Smith Genome Sciences Centre

Charles Foster

Oct 10, 2019, 11:59:30 PM
to Trans-ABySS
Hi Ka,

Thanks for that - too easy.

Cheers,
Charles

Charles Foster

Nov 11, 2019, 6:54:30 PM
to Trans-ABySS
Hi Ka,

I've got a couple of quick follow-up questions that I felt didn't warrant a new topic, so I thought I'd just post here!

I've assembled a different data set using trans-abyss. Afterwards I realised that I accidentally left in the --SS option, even though this data set comprises reads that are not strand specific. Is this an issue? Is there a difference inherent to the --SS option that will lead to a poorer assembly if the data are not strand specific? Of course I can just assemble the data set again without the --SS option, but I'd prefer not to use up compute resources unless necessary.

Additionally, I just noticed in the advanced options that --useblat can remove redundant sequences. Is this option on by default, or does it need to be invoked? I previously did not include it in my assembly command, and I do seem to have a fair few redundant sequences. Is it something I can use post-hoc leveraging the files created in the initial trans-abyss run, or would I need to start again?

Thanks again,
Charles

Ka Ming Nip

Nov 11, 2019, 9:04:58 PM
to trans...@googlegroups.com

Hi Charles,


Yes, there is a difference. If you use the `--SS` option for non-strand-specific data, then the assembly will be less contiguous and you will also see more duplicated sequences. That is why it is not turned on by default.


The `--useblat` option is better at removing duplicated sequences during the initial assembly steps, but it is quite slow. If redundant sequences are an issue, you can use `transabyss-merge` after the assembly has completed.


Ka Ming


Charles Foster

Nov 11, 2019, 9:11:43 PM
to Trans-ABySS
Hi Ka,

Thanks for the fast reply and clarification. I'll re-assemble without the `--SS` option. I'm also already assembling with four different k-mer sizes and then merging the resulting assemblies with `transabyss-merge`. Does this mean there is no benefit to also using `--useblat`?

Cheers,
Charles

Ka Ming Nip

Nov 12, 2019, 12:05:40 PM
to Trans-ABySS

Hi Charles,


There are benefits in removing low-coverage sequencing errors early in the assembly process. However, since you are already using multiple k-mer sizes and merging the assemblies with `transabyss-merge`, the difference made by `--useblat` should be less noticeable.
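
For readers following along, the multi-k workflow described above might be sketched roughly as below. Several things here are assumptions rather than details from this thread: the four k values, the read file names, and the thread count are hypothetical; the `<outdir>/<name>-final.fa` output path and the `--mink`, `--maxk`, `--prefixes`, and `--out` options of `transabyss-merge` are written as I understand them from the Trans-ABySS README, so please confirm them with `transabyss-merge --help`. The script only prints the commands (a dry run). `--SS` is left out because the data set in question is not strand specific.

```shell
#!/bin/sh
# Sketch: one assembly per k-mer size, then a single transabyss-merge call.
# The k values, read file names and thread count are hypothetical.
kmers="25 32 45 64"
mink=25    # smallest k in the list above
maxk=64    # largest k in the list above

assemblies=""
prefixes=""
for k in $kmers; do
    # Dry run: print each assembly command rather than running it.
    echo "transabyss -k $k --pe reads_R1.fq reads_R2.fq --outdir k$k --name k$k --threads 8"
    assemblies="$assemblies k$k/k$k-final.fa"   # assumed output naming
    prefixes="$prefixes k$k."                   # one prefix per input assembly
done

# Both lists start with a space, so they are appended directly after the flag.
merge_cmd="transabyss-merge --mink $mink --maxk $maxk --prefixes$prefixes --out merged.fa$assemblies"
echo "$merge_cmd"
```

The prefixes are listed in the same order as the input assemblies, so each merged contig name should indicate which k-mer size it came from.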


Ka Ming

