Hello,
I am currently doing a de novo assembly with some old (>5 years) Illumina PE and MP data. An earlier version of the genome is already available, so we know the insert sizes and characteristics of each read library quite well. I determined the read orientation and insert size of each library by mapping it to the previous version of the reference genome with BWA.
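For reference, the orientation and insert-size check described above can be sketched roughly as follows. This is a minimal illustration, not the exact procedure used here: it assumes the BWA output is available as SAM text lines, uses only the standard FLAG and TLEN fields, and decides "FR" (inward-facing, typical PE) versus "RF" (outward-facing, typical MP) from leftmost mapping positions, which is a simplification. The function names `classify_pair` and `summarize` are made up for this example.

```python
from statistics import median


def classify_pair(flag, tlen):
    """Classify a pair as 'FR', 'RF' or 'tandem' from the first read's FLAG and TLEN.

    Returns None for records that are skipped: not read 1, unmapped read or mate,
    secondary/supplementary alignments, or TLEN == 0 (e.g. mates on different
    reference sequences).
    """
    if not (flag & 0x1) or not (flag & 0x40):
        return None  # unpaired, or not the first read of the pair
    if flag & (0x4 | 0x8 | 0x100 | 0x800):
        return None  # unmapped read/mate, secondary or supplementary alignment
    if tlen == 0:
        return None  # no usable template length
    read_rev = bool(flag & 0x10)
    mate_rev = bool(flag & 0x20)
    if read_rev == mate_rev:
        return "tandem"  # both mates on the same strand
    read_is_left = tlen > 0  # positive TLEN: this read is the leftmost one
    forward_is_left = read_is_left != read_rev
    return "FR" if forward_is_left else "RF"


def summarize(sam_lines):
    """Return {orientation: (pair_count, median_abs_tlen)} for an iterable of SAM lines."""
    sizes = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        tlen = int(fields[8])
        kind = classify_pair(int(fields[1]), tlen)
        if kind is not None:
            sizes.setdefault(kind, []).append(abs(tlen))
    return {kind: (len(v), median(v)) for kind, v in sizes.items()}
```

On a mate-pair library, the share of "FR" pairs with a short median template length gives a rough estimate of the PE contamination, and the "RF" median gives the value to compare against avg_ins.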
The strange thing is that two libraries (2 x 50 bp reads, phred33, MP) have 0 connections, while the other libraries seem fine (2 x 100 bp, phred64/phred33, PE + MP). These two libraries are contaminated with normal PE reads (about 30-50%). The scaffolding log for them is:
For insert size: 2134
Total PE links 0
Normal PE links on same contig 0
Incorrect oriented PE links 0
PE links of too small insert size 0
PE links of too large insert size 0
Correct PE links 0
Accumulated connections 0
Use contigs longer than 2134 to estimate insert size:
PE links 0
Too few PE links.
0 new connections.
===========================
For insert size: 4886
Total PE links 0
Normal PE links on same contig 0
Incorrect oriented PE links 0
PE links of too small insert size 0
PE links of too large insert size 0
Correct PE links 0
Accumulated connections 0
Use contigs longer than 4886 to estimate insert size:
PE links 0
Too few PE links.
0 new connections.
These are the definitions I have in the config file for these two libraries:
[LIB]
avg_ins=2134.14
reverse_seq=1
asm_flags=2
rd_len_cutoff=1000
rank=4
pair_num_cutoff=4
map_len=32
q1=BGI2K_MP_1.fq.gz.trimmed.paired.fq.gz
q2=BGI2K_MP_2.fq.gz.trimmed.paired.fq.gz
[LIB]
avg_ins=4886.89
reverse_seq=1
asm_flags=2
rd_len_cutoff=1000
rank=5
pair_num_cutoff=4
map_len=32
q1=./BGI5K_MP_1.fq.gz.trimmed.paired.fq.gz
q2=./BGI5K_MP_2.fq.gz.trimmed.paired.fq.gz
Could this be caused by the contaminating PE reads in the MP libraries, or does it have something to do with the short read length?
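One way to test the read-length side of this question is to count how many trimmed reads are still at least map_len bases long. The sketch below assumes that map_len=32 acts as a minimum aligned length for placing a read on a contig, and that the files are standard 4-line FASTQ; both are assumptions, and the function names are made up for this example. The same check could be repeated with the assembly k-mer size as the threshold, on the (unverified) assumption that reads shorter than the k-mer cannot be placed either.

```python
import gzip
from itertools import islice


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in 4-line FASTQ text."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of every record is the sequence
            yield len(line.strip())


def fraction_usable(fastq_lines, min_len):
    """Return the fraction of reads with length >= min_len (0.0 if there are no reads)."""
    total = usable = 0
    for n in read_lengths(fastq_lines):
        total += 1
        usable += n >= min_len
    return usable / total if total else 0.0


def fraction_usable_gz(path, min_len, max_reads=1_000_000):
    """Same check on a gzipped FASTQ file, sampling at most max_reads records."""
    with gzip.open(path, "rt") as handle:
        return fraction_usable(islice(handle, 4 * max_reads), min_len)
```

For example, `fraction_usable_gz("BGI2K_MP_1.fq.gz.trimmed.paired.fq.gz", 32)` would report the share of reads in the first 2 kb library file that are at least 32 bases long after trimming. A value near zero would point to read length rather than PE contamination.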
Best regards,
Rongfeng Cui