Two mate pair libraries have 0 reads mapped during scaffolding

Ray Cui

Feb 9, 2017, 9:01:37 AM
to BGI-SOAP
Hello,

        I am currently doing a de novo assembly with some old (>5 years) Illumina PE and MP data. An earlier version of the genome is already available, so we know the insert sizes and characteristics of each read library quite well. I determined the read orientation and insert size of each library by mapping to the previous version of the reference genome with BWA.

        The strange thing is that 2 libraries (50 bp x 2 reads, phred33, MP) have 0 connections during scaffolding, while the other libraries seem fine (100 bp x 2, phred64/phred33, PE + MP). These 2 libraries are contaminated with normal PE reads (about 30% - 50%):

For insert size: 2134
 Total PE links                      0
 Normal PE links on same contig      0
 Incorrect oriented PE links         0
 PE links of too small insert size   0
 PE links of too large insert size   0
 Correct PE links                    0
 Accumulated connections             0
Use contigs longer than 2134 to estimate insert size: 
 PE links               0
Too few PE links.
0 new connections.

===========================
For insert size: 4886
 Total PE links                      0
 Normal PE links on same contig      0
 Incorrect oriented PE links         0
 PE links of too small insert size   0
 PE links of too large insert size   0
 Correct PE links                    0
 Accumulated connections             0
Use contigs longer than 4886 to estimate insert size: 
 PE links               0
Too few PE links.
0 new connections.
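
For anyone checking a long scaffolding log for the same symptom, here is a minimal Python sketch that lists the insert sizes for which "Total PE links" is 0. It assumes the log has the same layout as the excerpt above; the function name `libraries_without_links` is made up for this illustration and is not part of SOAPdenovo.

```python
import re

def libraries_without_links(log_text):
    """Return the insert sizes whose 'Total PE links' count is 0."""
    empty = []
    current = None  # insert size of the library block being read
    for line in log_text.splitlines():
        m = re.match(r"\s*For insert size:\s*(\d+)", line)
        if m:
            current = int(m.group(1))
            continue
        m = re.match(r"\s*Total PE links\s+(\d+)", line)
        if m and current is not None and int(m.group(1)) == 0:
            empty.append(current)
    return empty

# Shortened copy of the log excerpt from this post.
sample = """For insert size: 2134
 Total PE links                      0
 Correct PE links                    0
For insert size: 4886
 Total PE links                      0
"""
print(libraries_without_links(sample))
```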

        These are the definitions I have in the config file for these two libraries:
[LIB]
avg_ins=2134.14
reverse_seq=1
asm_flags=2
rd_len_cutoff=1000
rank=4
pair_num_cutoff=4
map_len=32
q1=BGI2K_MP_1.fq.gz.trimmed.paired.fq.gz
q2=BGI2K_MP_2.fq.gz.trimmed.paired.fq.gz

[LIB]
avg_ins=4886.89
reverse_seq=1
asm_flags=2
rd_len_cutoff=1000
rank=5
pair_num_cutoff=4
map_len=32
q1=./BGI5K_MP_1.fq.gz.trimmed.paired.fq.gz
q2=./BGI5K_MP_2.fq.gz.trimmed.paired.fq.gz
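
To double-check what each library block declares, the config can be read into a list of dictionaries. This is only a sketch: `parse_soap_config` is a hypothetical helper written for this illustration, and it assumes plain `key=value` lines under repeated `[LIB]` headers as shown above. (Python's standard `configparser` rejects repeated section names by default, hence the hand-written loop.)

```python
def parse_soap_config(text):
    """Collect the key=value pairs of every [LIB] block into one dict per library."""
    libs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "[LIB]":
            libs.append({})
        elif "=" in line and libs:
            # Settings that appear before the first [LIB] are skipped.
            key, value = line.split("=", 1)
            libs[-1][key.strip()] = value.strip()
    return libs

# Abbreviated copy of the two library blocks from this post.
example_config = """
[LIB]
avg_ins=2134.14
reverse_seq=1
asm_flags=2
rank=4
map_len=32

[LIB]
avg_ins=4886.89
reverse_seq=1
asm_flags=2
rank=5
map_len=32
"""
libs = parse_soap_config(example_config)
for lib in libs:
    print(lib["rank"], lib["avg_ins"], lib["reverse_seq"], lib["map_len"])
```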

            Could this be caused by the contaminating PE reads in the MP libraries, or does it have something to do with the short read length?
Best regards,
Rongfeng Cui

Ray Cui

Feb 13, 2017, 11:32:49 AM
to BGI-SOAP
I think I figured it out: the problem is the -k parameter (the k-mer size used when mapping reads to contigs), which by default is set to the same value as -K from the contigging stage. After changing it to 35, it works.

Best
Ray
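
One plausible reading of why this helps: a read of length L contains only L - k + 1 k-mers, so a read shorter than k contains none and has nothing to seed a mapping with. The thread does not say which -K was used for contigging, so the value 63 below is purely hypothetical, and the helper `kmers_per_read` is made up for this sketch rather than taken from SOAPdenovo.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers in a read of the given length (0 if the read is shorter than k)."""
    return max(0, read_len - k + 1)

read_len = 50          # read length of the two affected MP libraries (before trimming)
for k in (63, 35):     # 63 is a hypothetical contig-stage -K; 35 is the -k that worked
    print(f"k={k}: {kmers_per_read(read_len, k)} k-mers per {read_len} bp read")
```

Under this assumption, trimmed reads would be even shorter than 50 bp, which makes the effect worse for large k.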