Assembly with and without stranded option

64 views
Skip to first unread message

Brian Lawney

unread,
Feb 8, 2018, 12:23:27 PM2/8/18
to Trans-ABySS

Hello,


I'm playing with TransAbyss by using some mock data and I'm looking for some help with interpreting the results.  Namely, I have simulated strand-specific RNA-seq reads (using BioC Polyester), but transAbyss fails when I use the --SS flag.

Details:
I created a mock genome by sampling some human genes (all from the + strand) and creating an accompanying GTF file.  I use those files to simulate paired-end RNA-seq reads using Bioconductor's Polyester package.  The command (run from within in R session) was:

simulate_experiment(seqpath=SEQPATH, gtf=GTF,

                  strand_specific=T,

                  num_reps=c(1,1),

                  paired=T,

                  fold_changes=c(1,1),

                  reads_per_transcript=1000)

This created paired FASTA for two samples, but I'm only concerned with one of the samples here.  Note the "strand_specific=T" tag, which allegedly generated strand-specific simulated reads.  I aligned these reads with STAR aligner and I get the IGV screenshot below (colored by "first-of-pair strand") which leads me to believe the reads are indeed strand-specific since all the R1 reads directly align to the + strand.  Additionally if I take the R2 read, reverse-complement it, then it aligns to my genome.

If necessary, I put the fasta files in a public storage bucket here:
https://storage.googleapis.com/transabyss-ss-query/sample_01_1.fasta
https://storage.googleapis.com/transabyss-ss-query/sample_01_2.fasta

Anyway, assuming the reads are indeed strand-specific, I put these fasta files through transabyss using the following command (note --SS flag):


/trans-abyss/transabyss-1.5.5/transabyss --pe sample_01_1.fasta sample_01_2.fasta --SS --threads 32


This eventually fails with (full log at https://storage.googleapis.com/transabyss-ss-query/log_with_ss.txt):


Building the character occurrence table...
Mateless       0
Unaligned      0
Singleton     49  0.219%
FR             0
RF             0
FF             0
Different  22322  99.8%
Total      22371
error: the histogram `transabyss-3.hist' is empty
sort: write failed: standard output: Broken pipe


If I remove the --SS flag, it "works", although I'm not certain it's doing the correct thing.  The full log is athttps://storage.googleapis.com/transabyss-ss-query/log_without_ss.txt

In particular, does this seem reasonable? 
When I have run TransAbyss on real data before, the FR field was 54.7% and here it is VERY low:

Building the character occurrence table...
Mateless       0
Unaligned      0
Singleton     55  0.246%
FR           199  0.89%
RF             0
FF             0
Different  22117  98.9%
Total      22371
 



Any takes on this?  Is it just because I have a very small/mock dataset?  Or am I missing something?  I will still take the successful final contigs and see how they align to my genome, etc. and maybe that will shed some light. Thanks! 














Ka Ming Nip

unread,
Feb 8, 2018, 12:41:33 PM2/8/18
to trans...@googlegroups.com
Hi,

When the strand-specific option is used, it is expected that the /2 reads are in the sense orientation whereas the /1 reads are in the anti-sense orientation. This is what dUTP strand-specific RNA-seq reads would look like and it is also illustrated on our wiki:
https://github.com/bcgsc/transabyss/wiki#17-strand-specific-assembly

So, this means your reads are in the exact opposite orientation than expected.
To work around the issue, you just need to rename the /2 reads with a /1 suffix and vice versa.

Hope that helps!

Ka Ming


--
Ka Ming Nip
Graduate Student | Dr. Inanc Birol Lab (BTL)
Canada's Michael Smith Genome Sciences Centre
________________________________________
From: trans...@googlegroups.com [trans...@googlegroups.com] On Behalf Of Brian Lawney [brian...@gmail.com]
Sent: February 8, 2018 9:23 AM
To: Trans-ABySS
Subject: Assembly with and without stranded option

Hello,

I'm playing with TransAbyss by using some mock data and I'm looking for some help with interpreting the results. Namely, I have simulated strand-specific RNA-seq reads (using BioC Polyester), but transAbyss fails when I use the --SS flag.

Details:
I created a mock genome by sampling some human genes (all from the + strand) and creating an accompanying GTF file. I use those files to simulate paired-end RNA-seq reads using Bioconductor's Polyester package. The command (run from within in R session) was:

simulate_experiment(seqpath=SEQPATH, gtf=GTF,
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

strand_specific=T,
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

num_reps=c(1,1),
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

paired=T,
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

fold_changes=c(1,1),
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

reads_per_transcript=1000)

This created paired FASTA for two samples, but I'm only concerned with one of the samples here. Note the "strand_specific=T" tag, which allegedly generated strand-specific simulated reads. I aligned these reads with STAR aligner and I get the IGV screenshot below (colored by "first-of-pair strand") which leads me to believe the reads are indeed strand-specific since all the R1 reads directly align to the + strand. Additionally if I take the R2 read, reverse-complement it, then it aligns to my genome.

If necessary, I put the fasta files in a public storage bucket here:
https://storage.googleapis.com/transabyss-ss-query/sample_01_1.fasta
https://storage.googleapis.com/transabyss-ss-query/sample_01_2.fasta

Anyway, assuming the reads are indeed strand-specific, I put these fasta files through transabyss using the following command (note --SS flag):<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

/trans-abyss/transabyss-1.5.5/transabyss --pe sample_01_1.fasta sample_01_2.fasta --SS --threads 32
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

This eventually fails with (full log at https://storage.googleapis.com/transabyss-ss-query/log_with_ss.txt):
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

Building the character occurrence table...
Mateless 0
Unaligned 0
Singleton 49 0.219%
FR 0
RF 0
FF 0
Different 22322 99.8%
Total 22371
error: the histogram `transabyss-3.hist' is empty
sort: write failed: standard output: Broken pipe
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

If I remove the --SS flag, it "works", although I'm not certain it's doing the correct thing. The full log is athttps://storage.googleapis.com/transabyss-ss-query/log_without_ss.txt <https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

In particular, does this seem reasonable? When I have run TransAbyss on real data before, the FR field was 54.7% and here it is VERY low:<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

Building the character occurrence table...
Mateless 0
Unaligned 0
Singleton 55 0.246%
FR 199 0.89%
RF 0
FF 0
Different 22117 98.9%
Total 22371



<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


Any takes on this? Is it just because I have a very small/mock dataset? Or am I missing something? I will still take the successful final contigs and see how they align to my genome, etc. and maybe that will shed some light. Thanks!
<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

[https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s320/igv_ss_stranded.png]<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>

[https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s320/igv_ss_stranded.png]<https://lh3.googleusercontent.com/-PclRhlSpsbo/Wnx_-exLGBI/AAAAAAAACNk/gvW1yUDsd0Y4CHKwJYTmBdy74mcXgjcKwCLcBGAs/s1600/igv_ss_stranded.png>


--
You received this message because you are subscribed to the Google Groups "Trans-ABySS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trans-abyss...@googlegroups.com<mailto:trans-abyss...@googlegroups.com>.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages