one PE library unused in scaffolding


Richard Buggs

Jun 15, 2012, 7:00:21 AM
to bgi-...@googlegroups.com
Hi all,

I am running a SOAPdenovo assembly with Illumina PE and MP libraries with 200, 500, 800, 2000 and 5000bp insert sizes (sequenced at BGI). There are 381,167,444 reads in total, of which 80,391,068 are in the 500bp library. All the reads load OK in the pregraph, contig and map steps of the assembly pipeline, but when I get to the scaff step, the 500bp library seems to fail to load properly. In the log of the scaff step I get:

0 PEs with insert size 500 attached, 36508765 + 0 + 0 ignored
on contigs longer than 500, 0 pairs found,insert_size estimated: 0
0 new connections

The command I am running is:
./SOAPdenovo63mer scaff -g EC -F > scaff19.log

The full log file of the scaff step is below.

I would be grateful for any advice helping to resolve this.

many thanks

Richard

------
Scaff.log

Version 1.05: released on July 29th, 2010

there're 6 grads, 381167444 reads, max read len 95
K = 33
there're 15353747 edge in edge file
average contig coverage is 25, 3304396 contig masked
Mask contigs shorter than 35, 7815986 contig masked
21073689 arcs loaded
input 7651783 contigs
done loading updated edges
time spent on loading edges 88s

22994701 PEs with insert size 200 attached, 6893 + 474309 + 0 ignored
estimated PE size 189, by 18554312 pairs
on contigs longer than 200, 15964778 pairs found,SD=12, insert_size estimated: 190
6670123 new connections
0 PEs with insert size 500 attached, 36508765 + 0 + 0 ignored
on contigs longer than 500, 0 pairs found,insert_size estimated: 0
0 new connections

23923032 PEs with insert size 800 attached, 2474 + 30247 + 0 ignored
estimated PE size 756, by 3618798 pairs
on contigs longer than 800, 3498249 pairs found,SD=63, insert_size estimated: 773
7345614 new connections
21245228 PEs with insert size 2000 attached, 3296 + 411214 + 0 ignored
estimated PE size 1841, by 579710 pairs
on contigs longer than 2000, 508826 pairs found,SD=332, insert_size estimated: 2121
10134492 new connections
3634696 PEs with insert size 5000 attached, 8759 + 638 + 0 ignored
estimated PE size 2048, by 11644 pairs
on contigs longer than 5000, 4442 pairs found,SD=872, insert_size estimated: 4894
2003361 new connections
3705365 PEs with insert size 5000 attached, 5568 + 692 + 0 ignored
estimated PE size 2007, by 11961 pairs
on contigs longer than 5000, 4467 pairs found,SD=845, insert_size estimated: 4898
2111793 new connections
all PEs attached
time spent on loading pair end info 529s

4830869 link to masked contigs, 0 links on a single scaff
Insert size 200: 6670122 links input
Cutoff for number of pairs to make a reliable connection: 3
1712300 weak connects removed (there were 3678506 active cnnects))
85 circles removed 
variance for insert size 20
a remove transitive lag, 31137 connections removed
a remove transitive lag, 0 connections removed
Picked  127642 subgraphs,7 have conflicting connections,117010 have significant overlapping, 18 eligible
maskRepeat: 85415 contigs masked from 96659 puzzles
a remove transitive lag, 358 connections removed
a remove transitive lag, 0 connections removed
Picked  8236 subgraphs,0 have conflicting connections,8092 have significant overlapping, 1 eligible
Masked 7082 contigs, 0 puzzle left
Freezing is done....

the 1 rank
302356 scaffolds from 1577863 contigs sum up 247608010bp, with average length 818, 2 gaps filled
1117781 scaffolds&singleton sum up 396930629bp, with average length 355
the longest is 25453bp,scaffold N50 is 661 bp, scaffold N90 is 125 bp

Insert size 500: 0 links input

5348133 link to masked contigs, 0 links on a single scaff
Insert size 800: 7345614 links input
Cutoff for number of pairs to make a reliable connection: 3
1954282 weak connects removed (there were 3563932 active cnnects))
92 circles removed 
variance for insert size 30
a remove transitive lag, 98955 connections removed
a remove transitive lag, 4192 connections removed
a remove transitive lag, 244 connections removed
a remove transitive lag, 14 connections removed
a remove transitive lag, 1 connections removed
a remove transitive lag, 0 connections removed
Picked  299749 subgraphs,280 have conflicting connections,223462 have significant overlapping, 295 eligible
maskRepeat: 35471 contigs masked from 174739 puzzles
a remove transitive lag, 11444 connections removed
a remove transitive lag, 265 connections removed
a remove transitive lag, 7 connections removed
a remove transitive lag, 2 connections removed
a remove transitive lag, 0 connections removed
Picked  81246 subgraphs,207 have conflicting connections,69985 have significant overlapping, 120 eligible
Masked 48066 contigs, 10 puzzle left
Freezing is done....

the 2 rank
205803 scaffolds from 1577863 contigs sum up 398678077bp, with average length 1937, 2 gaps filled
808483 scaffolds&singleton sum up 491341098bp, with average length 607
the longest is 70220bp,scaffold N50 is 2522 bp, scaffold N90 is 177 bp

Report from smallScaf: 223437 scaffolds by smallPE
7235968 link to masked contigs, 788290 links on a single scaff
Insert size 2000: 10134492 links input
Cutoff for number of pairs to make a reliable connection: 5
downSliding is done...orienConflict 241976, fall inside 641577
1844714 weak connects removed (there were 1945680 active cnnects))
360 circles removed 
variance for insert size 50
a remove transitive lag, 13535 connections removed
a remove transitive lag, 4 connections removed
a remove transitive lag, 0 connections removed
Picked  35589 subgraphs,827 have conflicting connections,29138 have significant overlapping, 282 eligible
maskRepeat: 8891 contigs masked from 27960 puzzles
a remove transitive lag, 597 connections removed
a remove transitive lag, 0 connections removed
Picked  17333 subgraphs,690 have conflicting connections,16317 have significant overlapping, 110 eligible
Masked 14186 contigs, 0 puzzle left
Freezing is done....

the 3 rank
108804 scaffolds from 1577863 contigs sum up 403995335bp, with average length 3713, 2 gaps filled
765915 scaffolds&singleton sum up 504285138bp, with average length 658
the longest is 169177bp,scaffold N50 is 6220 bp, scaffold N90 is 166 bp

2787832 link to masked contigs, 585377 links on a single scaff
Insert size 5000: 4115154 links input
0 link to masked contigs, 1 links on a single scaff
Insert size 5000: 1 links input
Cutoff for number of pairs to make a reliable connection: 5
Report from checkScaf: 27 scaffold segments broken
downSliding is done...orienConflict 46772, fall inside 264843
327450 weak connects removed (there were 419658 active cnnects))
2382 circles removed 
variance for insert size 50
a remove transitive lag, 8017 connections removed
a remove transitive lag, 2 connections removed
a remove transitive lag, 0 connections removed
Picked  18447 subgraphs,638 have conflicting connections,14344 have significant overlapping, 225 eligible
maskRepeat: 2141 contigs masked from 14534 puzzles
a remove transitive lag, 250 connections removed
a remove transitive lag, 0 connections removed
Picked  11716 subgraphs,612 have conflicting connections,10826 have significant overlapping, 167 eligible
non-strict linearization
Picked  11062 subgraphs,875 have conflicting connections,8658 have significant overlapping, 162 eligible
Masked 7779 contigs, 3 puzzle left
Freezing is done....
39178 contigs recovered
all links loaded
time spent on creating scaffolds 687s

the final rank
90204 scaffolds from 1577863 contigs sum up 466793117bp, with average length 5174, 2 gaps filled
710937 scaffolds&singleton sum up 560416754bp, with average length 788
the longest is 274889bp,scaffold N50 is 11754 bp, scaffold N90 is 199 bp
Found 7 weak points in scaffolds

Start to load reads for gap filling. 50 length discrepancy is allowed
...
Loaded 152228207 reads from EC.readInGap
8 thread created
...
Processed 1000 scaffolds
Processed 90000 scaffolds
Done with 90204 scaffolds, 787077 gaps finished, 1177379 gaps overall
Threads processed 90204 scaffolds
time elapsed: 36m
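
A library that contributes nothing to scaffolding can be spotted mechanically from a log like the one above. The following is a minimal sketch, assuming the per-library summary line keeps the exact wording seen in this log ("N PEs with insert size X attached, a + b + c ignored"); the function name `find_unattached_libraries` is illustrative and not part of SOAPdenovo.

```python
import re

# Matches the per-library summary line printed by the scaff step, e.g.
# "0 PEs with insert size 500 attached, 36508765 + 0 + 0 ignored"
PE_LINE = re.compile(
    r"^\s*(\d+) PEs with insert size (\d+) attached, "
    r"(\d+) \+ (\d+) \+ (\d+) ignored"
)


def find_unattached_libraries(log_text):
    """Return (insert_size, total_ignored) for each library with 0 attached PEs."""
    problems = []
    for line in log_text.splitlines():
        match = PE_LINE.match(line)
        if match is None:
            continue  # not a per-library summary line
        attached, insert_size, *ignored = (int(g) for g in match.groups())
        if attached == 0:
            problems.append((insert_size, sum(ignored)))
    return problems


# Three lines copied from the log above.
sample = """\
22994701 PEs with insert size 200 attached, 6893 + 474309 + 0 ignored
0 PEs with insert size 500 attached, 36508765 + 0 + 0 ignored
23923032 PEs with insert size 800 attached, 2474 + 30247 + 0 ignored
"""

for insert_size, ignored in find_unattached_libraries(sample):
    print(f"insert size {insert_size}: 0 PEs attached, {ignored} ignored")
```

On the three sample lines this reports only the 500bp library. It flags the symptom, not the cause, so the configuration and input files still need to be inspected by hand.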

Ruibang Luo

Jun 17, 2012, 11:04:27 PM
to bgi-...@googlegroups.com
The reads you are using are shorter than the kmer you are using.

rb



--
You received this message because you are subscribed to the Google Groups "BGI-SOAP" group.
To view this discussion on the web visit https://groups.google.com/d/msg/bgi-soap/-/0z5kD2BQu2UJ.
To post to this group, send email to bgi-...@googlegroups.com.
To unsubscribe from this group, send email to bgi-soap+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/bgi-soap?hl=en.

Richard Buggs

Jun 18, 2012, 4:39:03 AM
to bgi-...@googlegroups.com
I am using a kmer of 33 and the read length for the 500bp insert library is 100bp, so this can't be the problem.

Richard
-- 
Richard Buggs MA, DPhil
NERC Fellow & Senior Lecturer
School of Biological and Chemical Sciences
Queen Mary, University of London
London
E1 4NS
United Kingdom

email: r.b...@qmul.ac.uk
website: http://www.sbcs.qmul.ac.uk/staff/richardbuggs.html
office: +44(0)207 882 3058
mobile: +44(0)772 992 0401
twitter: @RJABuggs

Yunjie Liu

Jun 18, 2012, 4:55:17 AM
to bgi-...@googlegroups.com
Hi,

I think it may be due to the reverse_seq field in the configuration of the 500bp insert size library. Try using reverse_seq=0 and see if it works.


--
Sincerely,
Yunjie Liu

Richard Buggs

Jun 18, 2012, 4:58:33 AM
to bgi-...@googlegroups.com
Hi Yunjie Liu,

I already have reverse_seq=0 for that library in the cont.config file.

Richard

Ruibang Luo

Jun 18, 2012, 5:12:53 AM
to bgi-...@googlegroups.com
Would you mind sending me the configuration file and the first ten lines of both FASTQ files of the 500bp insert size library?

rb

Richard Buggs

Jun 18, 2012, 6:58:05 AM
to bgi-...@googlegroups.com
max_rd_len=95
[LIB]
avg_ins=200
reverse_seq=0
asm_flags=3
rank=1 
q1=/data_n2/rbuggs/HK11583_BETaezD/filter_data/200BP/SZAXPI001450-5_1.fq
q2=/data_n2/rbuggs/HK11583_BETaezD/filter_data/200BP/SZAXPI001450-5_2.fq
[LIB]
avg_ins=500
reverse_seq=0
asm_flags=3
rank=2
q1=/data_n2/rbuggs/HK11583_BETaezD/filter_data/500BP/SZAIPI001449-6_1.fq
q2=/data_n2/rbuggs/HK11583_BETaezD/filter_data/500BP/SZAIPI001449-6_1.fq
[LIB]
avg_ins=800
reverse_seq=0
asm_flags=3
rank=3
q1=/data_n2/rbuggs/HK11583_BETaezD/filter_data/800BP/SZAMPI001448-7_1.fq
q2=/data_n2/rbuggs/HK11583_BETaezD/filter_data/800BP/SZAMPI001448-7_2.fq
[LIB]
avg_ins=2000
reverse_seq=1
asm_flags=3
rank=4
q1=/data_n2/rbuggs/HK11583_BETaezD/filter_data/2000BP/BETaezDAADWAAPEI-37_1.fq
q2=/data_n2/rbuggs/HK11583_BETaezD/filter_data/2000BP/BETaezDAADWAAPEI-37_2.fq
[LIB]
avg_ins=5000
reverse_seq=1
asm_flags=3
rank=5
q1=/data_n2/rbuggs/HK11583_BETaezD/filter_data/5000BP/I246_BETaezDABDLAAPEI-33_1.fq
q2=/data_n2/rbuggs/HK11583_BETaezD/filter_data/5000BP/I246_BETaezDABDLAAPEI-33_2.fq
[LIB]
#I have two runs of 5000BP, which I want to give the same rank
avg_ins=5000
reverse_seq=1
asm_flags=3
rank=5
q1=/data_n2/rbuggs/HK11583_BETaezD/filter_data/5000BP/I247_BETaezDABDLAAPEI-33_1.fq
q2=/data_n2/rbuggs/HK11583_BETaezD/filter_data/5000BP/I247_BETaezDABDLAAPEI-33_2.fq
cont.config (END) 

/data_n2/rbuggs/HK11583_BETaezD/filter_data/500BP/SZAMPI001448-7_1.fq

@FCC0BFYACXX:2:1101:1472:2089#GCCAATAT/1
AATGGACGAGTTCGAGCTCGAGCTCGCACAAGATTTTTCTTTGGCGAGCCGAGCTCAAGCCTAGCTTTTGATACTCGTCACAAGCTCGAGGTCGG
+
bb_eeeeeeegggefffhihghaggfhdfihfegfgihdgiiihfhieeecc^_abccccT_`bbc_bcbc`cbbbaaccccccccbBBBBBBBB
@FCC0BFYACXX:2:1101:1308:2094#GCCAATAT/1
GACACTAATGGTTGAAGTGAATTCTCCGAAGAGAAAATGGATTATGGGAGTGTGTGACTTGAACTATTGATTAGTCCGTGCAGATATATTACTTA
+
___cceeeggcegfdghaggegfbghfgcWb_cgf`Zff]aacgfbgh]ccdgeefSWVV\_\\bfbghZbdg_^V\a^^WU___TZ]Zac_]`b
@FCC0BFYACXX:2:1101:1450:2097#GCCAATAT/1
AAGAGCACTCATCCATAAACCGGTTACTGGTACAAATAACATAAAGAAATGTAACCAACGTTTATTGGAAAAAGCAACCCCAAAGATTTGGGACC


/data_n2/rbuggs/HK11583_BETaezD/filter_data/500BP/SZAIPI001449-6_1.fq

@FCC0BFYACXX:2:1101:1472:2089#GCCAATAT/2
TTAGATGAAGAAATTTACATGCAAATGCCTTAAGGGTTTGTTGTCAAGGGGGTGTCTCAAGTATGTAAACTTCAAAAATCACTTTATGGACTGAA
+
___cccecggfgehiiifiicfihdggdhhhhhhiidgfhhhhghhfhhhffS_\dgfgfgeeeeeecbddddbdcccccbbcccbcccbbca`b
@FCC0BFYACXX:2:1101:1308:2094#GCCAATAT/2
TTTTATTTGATTCCCAAAAAATAAAAAATATCGAGACTTTTTACATACAAAGAATTTACTTGTTACTAATAGAGTCTAGCCTCTGTTCTTCTTTA
+
___eacccggggaeegafgeU`eaggihbfbefffXfghhdg_ccf]b_ffe_fged`f_cgdgbgddgdgeeeb]acbb_bbc`T_Z``bbb`b
@FCC0BFYACXX:2:1101:1450:2097#GCCAATAT/2
ATGTTACGTCAATTTGAACTTGCTCGATCTGTTCAATTGCGACCTTATAATGCAATCGCATTCTCTGGTCCAATTGCTGTTTTTGTTTCTGTATT
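
The two excerpts above can also be compared mechanically. This is a minimal sketch, assuming plain 4-line FASTQ records and read IDs that end in `/1` and `/2` as in the excerpts; `read_ids` and `check_pairing` are illustrative names, and the sequence and quality strings in the example are placeholders rather than real data.

```python
from itertools import islice


def read_ids(lines, limit):
    """Yield the IDs (header minus '@') of the first `limit` 4-line FASTQ records."""
    for header in islice(lines, 0, limit * 4, 4):
        yield header.strip().lstrip("@")


def check_pairing(lines1, lines2, limit=1000):
    """Return a list of problems found in the first `limit` record pairs."""
    problems = []
    pairs = zip(read_ids(lines1, limit), read_ids(lines2, limit))
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 == id2:
            # Typical when the same file is supplied as both read 1 and read 2.
            problems.append(f"record {n}: identical ID {id1} in both files")
        elif not (id1.endswith("/1") and id2.endswith("/2")):
            problems.append(f"record {n}: expected /1 and /2, got {id1} and {id2}")
        elif id1[:-2] != id2[:-2]:
            problems.append(f"record {n}: {id1} and {id2} are not mates")
    return problems


# IDs taken from the excerpts above; "ACGT" and "IIII" are placeholders.
read1 = ["@FCC0BFYACXX:2:1101:1472:2089#GCCAATAT/1", "ACGT", "+", "IIII"]
read2 = ["@FCC0BFYACXX:2:1101:1472:2089#GCCAATAT/2", "ACGT", "+", "IIII"]

print(check_pairing(read1, read2))  # properly paired files
print(check_pairing(read1, read1))  # same file given twice
```

On real data the two arguments would be open file handles for the paths listed as q1 and q2 in the configuration, so that the check covers what the assembler actually reads.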


李振宇

Jun 18, 2012, 7:38:02 AM
to bgi-...@googlegroups.com
Both q1 and q2 were assigned the read 1 file '/data_n2/rbuggs/HK11583_BETaezD/filter_data/500BP/SZAIPI001449-6_1.fq'.

Please assign the read 2 file to q2 and rerun the assembly.
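
This kind of mistake can be caught before starting a run. Below is a minimal sketch, assuming the configuration follows the layout shown in this thread (`[LIB]` section headers, `key=value` lines, whole-line `#` comments); `parse_libs` and `find_same_file_pairs` are illustrative helpers, not part of SOAPdenovo, and the paths in the example are hypothetical.

```python
def parse_libs(config_text):
    """Split a SOAPdenovo-style config into one dict of lists per [LIB] section."""
    libs = []
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if line == "[LIB]":
            current = {}
            libs.append(current)
        elif "=" in line and current is not None:
            # Global settings before the first [LIB] are ignored here.
            key, value = line.split("=", 1)
            current.setdefault(key.strip(), []).append(value.strip())
    return libs


def find_same_file_pairs(libs):
    """Return (avg_ins, path) for every library whose q1 and q2 are the same file."""
    hits = []
    for lib in libs:
        for q1, q2 in zip(lib.get("q1", []), lib.get("q2", [])):
            if q1 == q2:
                hits.append((lib.get("avg_ins", ["?"])[0], q1))
    return hits


# Hypothetical, shortened paths; the structure mirrors the config in this thread.
example = """\
max_rd_len=95
[LIB]
avg_ins=200
q1=/data/200BP/lib200_1.fq
q2=/data/200BP/lib200_2.fq
[LIB]
avg_ins=500
q1=/data/500BP/lib500_1.fq
q2=/data/500BP/lib500_1.fq
"""

for avg_ins, path in find_same_file_pairs(parse_libs(example)):
    print(f"avg_ins={avg_ins}: q1 and q2 both point to {path}")
```

Values are stored as lists because a library section may contain more than one q1/q2 pair. The check compares path strings only, so two different paths that resolve to the same file would not be flagged.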


From: bgi-...@googlegroups.com [bgi-...@googlegroups.com] on behalf of Richard Buggs [r.b...@qmul.ac.uk]
Sent: June 18, 2012, 18:58
To: bgi-...@googlegroups.com
Subject: Re: [BGI-SOAP:492] one PE library unused in scaffolding


Richard Buggs

Jun 18, 2012, 8:26:58 AM
to bgi-...@googlegroups.com
Aha, thanks! That would explain it!

Richard

