Scaffolding modes

l.p.p...@gmail.com

Feb 7, 2017, 9:36:03 AM

to stephan...@ugent.be, Redundans

Try it. It depends on how closely related the genome is and how well the synteny (gene order) is conserved.

L.

2017-02-07 15:12 GMT+01:00 <stephan...@ugent.be>:

That's great! I should have read the manual more thoroughly, my apologies (I am trying too many tools at once).
Redundans takes into account every scenario (short, hybrid or long). Would it be lenient enough to use a closely related genome for scaffolding?

On Tuesday, February 7, 2017 at 2:44:03 PM UTC+1, lpryszcz wrote:
Hi Stephanie,
:) I think I was not clear, can Redundans use long-read information as well?
Yes
If so, corrected or non-corrected (eg lordec)?
Both
If so, How come Redundans automatically identifies all reads as "pairs"?
See information from Fuyou Fu (assuming that e.g. 5000_2.fastq.gz are PacBio reads of 5000bp length on average):
You misunderstood: the dataset below is Illumina paired-end (i.e. 550_1.fastq.gz and 550_2.fastq.gz) and mate-pairs (i.e. 5000_1.fastq.gz and 5000_2.fastq.gz), not long reads.
[WARNING] Poor quality: Major orientation (RF) represent 58.04% of pairs in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/2000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/2000_2.fastq.gz: [9, 4177, 5804, 10]
[WARNING] Poor quality: Major orientation (FR) represent 58.66% of pairs in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_2.fastq.gz: [14, 5866, 4105, 15]
[WARNING] Poor quality: Major orientation (FR) represent 67.36% of pairs in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_2.fastq.gz: [23, 6736, 3214, 27]
[WARNING] Highly variable insert size (196 +- 206.38) in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/10000_2.fastq.gz!
[WARNING] Highly variable insert size (240 +- 270.10) in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/5000_2.fastq.gz!
[WARNING] Highly variable insert size (411 +- 287.91) in /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/550_1.fastq.gz - /scratch/snyder/f/fu115/Genome_assembly/DBG/Redundans/seq/550_2.fastq.gz!
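
For anyone reading these warnings later: the bracketed list at the end of each "Major orientation" line appears to hold pair counts per orientation, and the reported percentage is the largest count divided by the total. Below is a minimal Python sketch that reproduces the percentages. It is not Redundans code. The order FF, FR, RF, RR is an assumption: the warnings only confirm that the second count is FR and the third is RF, and the helper names are made up for illustration.

```python
import re

# Assumed order of the four counts. The warnings confirm index 1 = FR and
# index 2 = RF; labelling the outer two as FF and RR is a guess.
ORIENTATIONS = ("FF", "FR", "RF", "RR")

def parse_counts(warning):
    """Pull the trailing [a, b, c, d] list out of a warning line."""
    match = re.search(r"\[(\d+(?:,\s*\d+)*)\]\s*$", warning)
    return [int(x) for x in match.group(1).split(",")] if match else None

def major_orientation(counts):
    """Return (orientation, percent of pairs) for the most common orientation."""
    total = sum(counts)
    best = max(range(len(counts)), key=lambda i: counts[i])
    return ORIENTATIONS[best], 100.0 * counts[best] / total

# Counts copied from the three "Major orientation" warnings above.
libraries = {
    "2000": [9, 4177, 5804, 10],
    "5000": [14, 5866, 4105, 15],
    "10000": [23, 6736, 3214, 27],
}
for name, counts in libraries.items():
    orientation, pct = major_orientation(counts)
    print(f"{name}: {orientation} {pct:.2f}%")
```

A well-behaved library would have nearly all pairs in a single orientation, so values close to 58% mean the pairs are split almost evenly between FR and RF.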

On Tuesday, February 7, 2017 at 2:06:56 PM UTC+1, lpryszcz wrote:
Dear Stephanie,

As you mentioned, scaffolding with short reads uses paired-end information.
With long reads, you can find individual reads that span gaps, that is, reads that connect two or more contigs and join them together.
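
The idea of a read that spans a gap can be illustrated with a toy Python sketch. This is not how Redundans implements it. The alignments, the contig names and the `MIN_READS` threshold are all hypothetical, and the sketch ignores orientation, distance and alignment quality. It only counts how many reads hit each pair of contigs.

```python
from collections import Counter
from itertools import combinations

def count_links(read_hits):
    """Count, for each contig pair, how many reads align to both contigs."""
    links = Counter()
    for contigs in read_hits.values():
        for a, b in combinations(sorted(set(contigs)), 2):
            links[(a, b)] += 1
    return links

# Hypothetical alignments: read name -> contigs that the read aligns to.
read_hits = {
    "read1": ["contigA", "contigB"],
    "read2": ["contigA", "contigB"],
    "read3": ["contigB", "contigC"],
    "read4": ["contigD"],
}

MIN_READS = 2  # hypothetical minimum number of supporting reads per join
supported = {pair: n for pair, n in count_links(read_hits).items() if n >= MIN_READS}
print(supported)  # {('contigA', 'contigB'): 2}
```

Here only contigA and contigB would be joined, because the contigB-contigC link is supported by a single read.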

Bests,
L.


2017-02-07 13:57 GMT+01:00 <stephan...@ugent.be>:
Hi,
Did I understand correctly that you can scaffold with short (e.g. Illumina) as well as long (e.g. PacBio) sequences?
How come Redundans automatically identifies all reads as "pairs"?

Cheers,

Stephanie

On Friday, January 27, 2017 at 5:32:13 AM UTC+1, Ching-Ho Chang wrote:
Hi redundans developer,

I tried to use PacBio sequences to scaffold the genome.
Here is the result.
#fname                              contigs  bases      GC [%]  contigs >1kb  bases in contigs >1kb  N50     N90    Ns       longest
runmecate/contigs.fa                9274     503102573  36.370  9274          503102573              105793  21932  5492     1053140
runmecate/contigs.reduced.fa        9142     501703970  36.367  9142          501703970              106102  22145  5492     1053140
runmecate/scaffolds.longreads.1.fa  4583     443058017  36.370  4583          443058017              404424  28820  2398945  7166112
runmecate/scaffolds.longreads.fa    4583     443058017  36.370  4583          443058017              404424  28820  2398945  7166112
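
To put numbers on the change, here is a small Python sketch using the values from the table above. The non-N figure assumes that the "bases" column includes the Ns, which the table itself does not state.

```python
# Values copied from the table above: (contigs, bases, N50, Ns).
stats = {
    "contigs.fa": (9274, 503102573, 105793, 5492),
    "contigs.reduced.fa": (9142, 501703970, 106102, 5492),
    "scaffolds.longreads.fa": (4583, 443058017, 404424, 2398945),
}

def percent_change(before, after):
    """Signed change from before to after, in percent of before."""
    return 100.0 * (after - before) / before

start_bases = stats["contigs.fa"][1]
start_n50 = stats["contigs.fa"][2]
for name, (contigs, bases, n50, ns) in stats.items():
    # Assumption: "bases" counts Ns too, so subtracting Ns gives real sequence.
    non_n = bases - ns
    print(f"{name}: bases {percent_change(start_bases, bases):+.2f}%, "
          f"N50 x{n50 / start_n50:.2f}, non-N bases {non_n}")
```

Reduction alone removes well under 1% of the bases, while the long-read scaffolds are about 12% smaller than the input contigs, so nearly all of the loss happens in the scaffolding step.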

It greatly increased the N50 and the longest contig size, but many sequences were lost.
I found that some of the removed sequences are not redundant.
The assembly lost ~8% of BUSCO (Benchmarking Universal Single-Copy Orthologs) genes.
I was wondering whether there is any parameter I can tune for long-read scaffolding.

Thank you,
Ching-Ho Chang

--
You received this message because you are subscribed to the Google Groups "Redundans" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redundans+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/redundans/e3c70eef-e3c9-4152-865c-405f694c6c0c%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Fuyou Fu

Feb 7, 2017, 7:35:45 PM

to l.p.p...@gmail.com, stephan...@ugent.be, Redundans

Hi Leszek,

The figure is still not as good as the one for your test data, even with identity=0.95. My genome probably contains some repeated sequences.

Do you think 0.95 is a good identity value, based on your experience?

Thanks,

Fuyou

--

Fuyou Fu, Ph.D.
Department of Botany and Plant Pathology
Purdue University
USA

contigs.reduced.fa.hist.png

l.p.p...@gmail.com

Feb 8, 2017, 5:15:02 AM

to Fuyou Fu, Redundans

Yes, this is to be expected, as only alignments with identity >95% were kept. How much of the assembly has been removed this time?

95% is a good value; it should get rid of regions that complicate the scaffolding process.

If the assembly is too big, you can try 90%.

Hope it helps.

L.

Fuyou Fu

Feb 8, 2017, 5:51:16 AM

to l.p.p...@gmail.com, Redundans

Hi Leszek,

The contigs total about 4 Gb before reduction. The reduced contigs total 1 Gb.

If I use identity = 61%, the reduced contigs total 200 Mb.

My predicted genome size is about 1.63 Gb.

The attachment is from the run where I used my scaffolds as contigs for the assembly. These contigs total about 2.1 Gb before reduction and 1.9 Gb after reduction.

Do you think this is good?
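
For reference, the sizes quoted in this message can be compared with the predicted genome size using a short Python sketch. The run labels are descriptive only. That the 1 Gb result belongs to the identity 0.95 run is inferred from the earlier message and not stated outright, and the identity used for the scaffold run is not given.

```python
EXPECTED_GENOME_GB = 1.63  # predicted genome size quoted in this message

# (size before reduction, size after reduction) in Gb, as reported above.
runs = {
    "contigs (presumably identity 0.95)": (4.0, 1.0),
    "contigs (identity 61%)": (4.0, 0.2),
    "scaffolds used as contigs": (2.1, 1.9),
}

def summarize(before_gb, after_gb, expected_gb=EXPECTED_GENOME_GB):
    """Return (percent of input removed, reduced size / expected genome size)."""
    removed_pct = 100.0 * (before_gb - after_gb) / before_gb
    ratio = after_gb / expected_gb
    return removed_pct, ratio

for label, (before, after) in runs.items():
    removed, ratio = summarize(before, after)
    print(f"{label}: removed {removed:.0f}%, {ratio:.2f}x predicted genome size")
```

A ratio well below 1 suggests that more than the redundant sequence was removed, and a ratio above 1 suggests that redundancy remains.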

Thanks,

Fuyou

contigs.reduced.fa.hist.png

l.p.p...@gmail.com

Feb 8, 2017, 6:06:39 AM

to Fuyou Fu, Redundans

Try the reduced contigs (1 Gb) right away.

You can reduce the scaffolds further (also to ~1 Gb), e.g. with identity 0.9.

I guess your final assembly will be substantially smaller than the predicted genome size, due to fragmentation and repeats that won't be fully represented in your assembly.

L.
