Scaffolding modes


l.p.p...@gmail.com

Feb 7, 2017, 9:36:03 AM
to stephan...@ugent.be, Redundans
Try it; it depends on how closely related the genome is and how well synteny (gene order) is conserved.

L.

2017-02-07 15:12 GMT+01:00 <stephan...@ugent.be>:
That's great! I should have read the manual more thoroughly, my apologies (I am trying too many tools at once).
Redundans takes into account every scenario (short, hybrid or long). Would it be lenient enough to use a closely related genome for scaffolding?


On Tuesday, February 7, 2017 at 2:44:03 PM UTC+1, lpryszcz wrote:
Hi Stephanie,

:) I think I was not clear, can Redundans use long-read information as well?
Yes 
If so, corrected or non-corrected (eg lordec)?
Both 
If so, how come Redundans automatically identifies all reads as "pairs"?
See information from Fuyou Fu (assuming that e.g. 5000_2.fastq.gz are PacBio reads of 5000bp length on average):
You misunderstood, the dataset below is Illumina paired-end (i.e. 550_1.fastq.gz and 550_2.fastq.gz) and mate-pairs (i.e. 5000_1.fastq.gz and 5000_2.fastq.gz), not long reads...

[WARNING] Poor quality: Major orientation (RF) represent 58.04% of pairs in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/2000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/2000_2.fastq.gz: [9, 4177, 5804, 10]
[WARNING] Poor quality: Major orientation (FR) represent 58.66% of pairs in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_2.fastq.gz: [14, 5866, 4105, 15]
[WARNING] Poor quality: Major orientation (FR) represent 67.36% of pairs in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_2.fastq.gz: [23, 6736, 3214, 27]
[WARNING] Highly variable insert size (196 +- 206.38) in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_2.fastq.gz!
[WARNING] Highly variable insert size (240 +- 270.10) in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_2.fastq.gz!
[WARNING] Highly variable insert size (411 +- 287.91) in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/550_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/550_2.fastq.gz!
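
The percentages in the orientation warnings above can be reproduced from the four counts at the end of each line. The sketch below is only an illustration: the label order FF, FR, RF, RR is inferred from where the FR and RF counts fall in these warnings (the FF and RR positions are an assumption), and `major_orientation` is a hypothetical helper, not a Redundans function.

```python
# Assumed order of the four counts; FR and RF positions are inferred from the
# warnings in this thread, FF and RR positions are a guess.
ORIENTATIONS = ("FF", "FR", "RF", "RR")


def major_orientation(counts):
    """Return (label, percent) for the most common pair orientation."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no pairs counted")
    best = max(range(len(counts)), key=lambda i: counts[i])
    return ORIENTATIONS[best], 100.0 * counts[best] / total


# Counts copied from the warnings for the 2000, 5000 and 10000 libraries.
for name, counts in [
    ("2000", [9, 4177, 5804, 10]),
    ("5000", [14, 5866, 4105, 15]),
    ("10000", [23, 6736, 3214, 27]),
]:
    label, pct = major_orientation(counts)
    print(f"{name}: {label} {pct:.2f}%")
```

In each library the minor orientation still accounts for roughly 32-42% of pairs, which is presumably why the libraries are flagged as poor quality.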


On Tuesday, February 7, 2017 at 2:06:56 PM UTC+1, lpryszcz wrote:
Dear Stephanie, 

As you mentioned, scaffolding with short reads uses paired-end information. 
With long reads, you can find individual reads that span gaps, that is, reads that connect two or more contigs, and use them to join those contigs together.
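
As a rough illustration of the long-read idea, the sketch below counts how many reads align to each pair of contigs. It is a toy model under stated assumptions: the `(read_id, contig_id)` input format, the `contig_links` function and the `min_reads` cutoff are all hypothetical, and a real scaffolder also has to consider alignment positions, orientation and gap size.

```python
from collections import defaultdict


def contig_links(alignments, min_reads=1):
    """Count long reads that align to each pair of contigs.

    alignments: iterable of (read_id, contig_id) tuples (hypothetical input).
    Returns {(contig_a, contig_b): n_reads} with contig_a < contig_b.
    """
    contigs_by_read = defaultdict(set)
    for read_id, contig_id in alignments:
        contigs_by_read[read_id].add(contig_id)

    links = defaultdict(int)
    for contigs in contigs_by_read.values():
        ordered = sorted(contigs)
        # Every pair of contigs touched by the same read gets one vote.
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                links[(a, b)] += 1
    return {pair: n for pair, n in links.items() if n >= min_reads}


toy = [
    ("read1", "contigA"), ("read1", "contigB"),
    ("read2", "contigA"), ("read2", "contigB"),
    ("read3", "contigC"),  # aligns to a single contig, so it links nothing
]
print(contig_links(toy, min_reads=2))  # {('contigA', 'contigB'): 2}
```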

Bests, 
L. 

2017-02-07 13:57 GMT+01:00 <stephan...@ugent.be>:
Hi,
Did I understand correctly that you can scaffold with short (e.g. Illumina) as well as long (e.g. PacBio) reads?
How come Redundans automatically identifies all reads as "pairs"?

Cheers,

Stephanie

On Friday, January 27, 2017 at 5:32:13 AM UTC+1, Ching-Ho Chang wrote:
Hi Redundans developer,

I tried to use PacBio sequences to scaffold the genome.
Here is the result.
#fname                              contigs  bases      GC [%]  contigs >1kb  bases in contigs >1kb  N50     N90    Ns       longest
runmecate/contigs.fa                9274     503102573  36.370  9274          503102573              105793  21932  5492     1053140
runmecate/contigs.reduced.fa        9142     501703970  36.367  9142          501703970              106102  22145  5492     1053140
runmecate/scaffolds.longreads.1.fa  4583     443058017  36.370  4583          443058017              404424  28820  2398945  7166112
runmecate/scaffolds.longreads.fa    4583     443058017  36.370  4583          443058017              404424  28820  2398945  7166112

It greatly increased the N50 and the longest contig size, but it lost many sequences.
I found that some of the sequences it removed are not redundant.
It also lost ~8% of BUSCO (Benchmarking Universal Single-Copy Orthologs) genes.
I was wondering if there is any parameter I can tune for the long-read scaffolding.
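
For readers checking these numbers, here is a minimal sketch of how N50 is usually defined, together with the size change taken from the table above. The `n50` helper and the toy lengths are illustrative only; the two base counts are copied from the contigs.fa and scaffolds.longreads.fa rows.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty length list")


# Toy example: 10 + 8 = 18 bases is the first running sum to reach half of 30.
print(n50([10, 8, 6, 4, 2]))  # 8

# Size change between the input contigs and the long-read scaffolds.
bases_before = 503102573  # runmecate/contigs.fa
bases_after = 443058017   # runmecate/scaffolds.longreads.fa
lost = bases_before - bases_after
print(f"lost {lost} bases ({100.0 * lost / bases_before:.1f}%)")  # lost 60044556 bases (11.9%)
```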

Thank you,
Ching-Ho Chang


--
You received this message because you are subscribed to the Google Groups "Redundans" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redundans+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/redundans/e3c70eef-e3c9-4152-865c-405f694c6c0c%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Fuyou Fu

Feb 7, 2017, 7:35:45 PM
to l.p.p...@gmail.com, stephan...@ugent.be, Redundans
Hi Leszek,
With identity=0.95, the figure is still not as good as the one for your test data. My genome should contain some repetitive sequences.
Do you think 0.95 is a good identity value, based on your experience?
Thanks,
Fuyou




--
Fuyou Fu, Ph.D.
Department of Botany and Plant Pathology
Purdue University
USA

contigs.reduced.fa.hist.png

l.p.p...@gmail.com

Feb 8, 2017, 5:15:02 AM
to Fuyou Fu, Redundans
Yes, this is to be expected, as only alignments with identity >95% were kept. How much of the assembly has been removed this time?
95% is a good value; it should get rid of regions that complicate the scaffolding process.
If the assembly is too big, you can try 90%. 
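
One way to picture why lowering the identity cutoff removes more of the assembly is the toy filter below. It is not Redundans's implementation: the hit tuple layout, the `redundant_contigs` function and the 0.80 overlap default are assumptions made purely for illustration.

```python
def redundant_contigs(hits, min_identity=0.95, min_overlap=0.80):
    """Return contigs that look redundant under the given cutoffs.

    hits: iterable of (query, target, identity, overlap, query_len, target_len),
    a hypothetical summary of contig-versus-contig alignments.
    A query counts as redundant if it aligns to a different contig at least as
    long as itself with identity and overlap at or above the cutoffs.
    """
    removed = set()
    for query, target, identity, overlap, qlen, tlen in hits:
        if (query != target and qlen <= tlen
                and identity >= min_identity and overlap >= min_overlap):
            removed.add(query)
    return removed


hits = [
    ("c2", "c1", 0.97, 0.90, 1000, 5000),  # passes at 0.95 and at 0.90
    ("c3", "c1", 0.92, 0.95, 1200, 5000),  # passes only at 0.90
    ("c4", "c1", 0.99, 0.50, 800, 5000),   # overlap too low at either cutoff
]
print(sorted(redundant_contigs(hits, min_identity=0.95)))  # ['c2']
print(sorted(redundant_contigs(hits, min_identity=0.90)))  # ['c2', 'c3']
```

A looser cutoff marks more diverged copies as redundant, so the reduced assembly shrinks as the identity value drops.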

Hope it helps.
L.

Fuyou Fu

Feb 8, 2017, 5:51:16 AM
to l.p.p...@gmail.com, Redundans
Hi Leszek,
The contigs total about 4 Gb before reduction and 1 Gb after reduction.
If I use identity = 61%, the reduced contigs total 200 Mb.
My predicted genome size is about 1.63 Gb.

The attachment is from a run in which I used scaffolds as the input contigs. They total about 2.1 Gb before reduction and 1.9 Gb after reduction.
Do you think this is good?
Thanks,
Fuyou
contigs.reduced.fa.hist.png

l.p.p...@gmail.com

Feb 8, 2017, 6:06:39 AM
to Fuyou Fu, Redundans
Try the reduced contigs (1 Gb) right away.
You can also reduce the scaffolds further (to ~1 Gb as well), e.g. with identity 0.9.

I guess your final assembly will be substantially smaller due to fragmentation and repeats that won't be fully represented in your assembly.
L.