Are chimeric reads a problem in de novo assembly?


lamz1...@163.com

Jun 14, 2016, 9:23:47 AM
to BGI-SOAP

Hi, dear all!

 

I want to perform a de novo assembly with four libraries (insert sizes of 270 bp, 500 bp, 2 kb and 5 kb); the reads are paired and 150 bp long. After mapping the reads to the reference with BWA, about one third of the reads in the two mate-pair libraries (2 kb and 5 kb) were chimeric. Should I filter out these reads? I found little information on Google. Since the short-insert libraries are used to construct contigs, and the reads from the long-insert libraries are then mapped to those contigs to link them, I think the assembler could still use the chimeric reads to link contigs. However, my labmate thinks chimeric reads were rare in previous experiments because the reads were shorter, and that the assembler may not be able to deal with them. Furthermore, since these reads were identified by mapping to a reference, I think filtering them out would mean the result is no longer a true de novo assembly. So should I filter out the chimeric reads in the mate-pair libraries before running SOAPdenovo?
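For reference, here is a minimal sketch of how such a chimeric fraction could be counted from a SAM file. It assumes the reads were mapped with BWA-MEM and defines "chimeric" as a read with a supplementary record (flag 0x800) or an SA:Z tag, which is how BWA-MEM reports split alignments. The exact definition used in the post is not stated, and discordant pairs (wrong orientation or insert size) are not counted here. The file name in the usage line is hypothetical.

```python
# Sketch: estimate the chimeric-read fraction from BWA-MEM SAM output.
# Assumption: a read (mate) is chimeric if any of its records has the
# supplementary flag (0x800) or an SA:Z tag. Discordant pairs are NOT counted.
import sys
from typing import Iterable

FLAG_FIRST_IN_PAIR = 0x40
FLAG_SECOND_IN_PAIR = 0x80
FLAG_SUPPLEMENTARY = 0x800


def chimeric_fraction(sam_lines: Iterable[str]) -> float:
    """Return (chimeric reads) / (all reads), counting each mate once."""
    all_reads, chimeric = set(), set()
    for raw in sam_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        # Read 1 and read 2 share a QNAME, so key on (QNAME, mate number).
        if flag & FLAG_FIRST_IN_PAIR:
            mate = 1
        elif flag & FLAG_SECOND_IN_PAIR:
            mate = 2
        else:
            mate = 0
        key = (qname, mate)
        all_reads.add(key)
        # Optional tags start at the 12th column (index 11).
        has_sa = any(tag.startswith("SA:Z:") for tag in fields[11:])
        if flag & FLAG_SUPPLEMENTARY or has_sa:
            chimeric.add(key)
    return len(chimeric) / len(all_reads) if all_reads else 0.0


if __name__ == "__main__":
    # Usage (hypothetical file): python chimeric.py lib2k.sam
    with open(sys.argv[1]) as sam:
        print(f"chimeric fraction: {chimeric_fraction(sam):.3f}")
```

The denominator includes unmapped mates, so the fraction is relative to all reads in the file, not only mapped ones.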

 

Also, one library has an insert size of 270 bp. According to the paper "Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques", the overlapping reads were filtered out before running SOAPdenovo. Should I also filter out these reads before running SOAPdenovo?

 

Any suggestions would be appreciated!

lizh...@genomics.cn

Jun 14, 2016, 11:02:07 PM
to bgi-soap
Hi,

When building scaffolds, SOAPdenovo doesn't use all libraries at the same time. It first uses the library with the smallest 'rank' value, which is specified in the config file, then the library with the next larger 'rank' value, and so on. Generally the short-insert libraries are given small 'rank' values so that they are used first, and the large-insert libraries are used last. This can reduce the effect of chimeric reads in the large-insert libraries, though not perfectly.
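As an illustration of the rank ordering, the sketch below writes a config for the four libraries in this thread, giving smaller ranks to smaller inserts. The keys (max_rd_len, [LIB], avg_ins, reverse_seq, asm_flags, rank, pair_num_cutoff, map_len, q1, q2) follow the example config distributed with SOAPdenovo2, and the cutoff values are the ones that example suggests; the FASTQ paths and the 1 kb mate-pair threshold are hypothetical, so please check everything against the documentation of your SOAPdenovo version.

```python
# Sketch: write a SOAPdenovo config in which 'rank' follows insert size.
# Keys mirror the SOAPdenovo2 example config; the FASTQ paths and the
# 1 kb mate-pair threshold are hypothetical.
from dataclasses import dataclass
from typing import List

MATE_PAIR_MIN_INSERT = 1000  # assumption: inserts >= 1 kb are mate-pair


@dataclass
class Library:
    avg_ins: int  # average insert size (bp)
    q1: str       # read-1 FASTQ path
    q2: str       # read-2 FASTQ path


def make_config(libs: List[Library], max_rd_len: int = 150) -> str:
    lines = [f"max_rd_len={max_rd_len}"]
    # Smaller insert -> smaller rank -> used earlier during scaffolding.
    ordered = sorted(libs, key=lambda x: x.avg_ins)
    for rank, lib in enumerate(ordered, start=1):
        mate_pair = lib.avg_ins >= MATE_PAIR_MIN_INSERT
        lines += [
            "[LIB]",
            f"avg_ins={lib.avg_ins}",
            # Circularized mate-pair libraries are reverse-forward (1);
            # ordinary paired-end libraries are forward-reverse (0).
            f"reverse_seq={1 if mate_pair else 0}",
            # 3 = use for contigs and scaffolds; 2 = scaffolding only.
            f"asm_flags={2 if mate_pair else 3}",
            f"rank={rank}",
            # Cutoffs suggested in the SOAPdenovo2 example config.
            f"pair_num_cutoff={5 if mate_pair else 3}",
            f"map_len={35 if mate_pair else 32}",
            f"q1={lib.q1}",
            f"q2={lib.q2}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    libs = [
        Library(270, "lib270_1.fq", "lib270_2.fq"),
        Library(500, "lib500_1.fq", "lib500_2.fq"),
        Library(2000, "lib2k_1.fq", "lib2k_2.fq"),
        Library(5000, "lib5k_1.fq", "lib5k_2.fq"),
    ]
    print(make_config(libs), end="")
```

Libraries with the same rank are used together during scaffolding, so similar inserts (for example 270 bp and 500 bp) could also share one rank; this sketch simply gives every library its own.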

I think you can filter out the overlapping reads before running SOAPdenovo if you are confident that they are contamination.

Best,



lamz1...@163.com

Jun 15, 2016, 10:41:47 PM
to BGI-SOAP, lizh...@genomics.cn
Hi,

Thanks for your reply! Maybe I didn't state it clearly. The overlapping reads aren't contamination. Because I plan to use both SOAPdenovo and ALLPATHS, I designed a library with 270 bp fragments and 150 bp reads, as required by ALLPATHS, so the two reads of each pair overlap. Then I found that in the paper these overlapping reads were filtered out for SOAPdenovo, and I wonder why they had this step. Can SOAPdenovo use overlapping reads to construct contigs?

For the chimeric reads, I wonder whether SOAPdenovo can map them to contigs and still link and order the contigs correctly, given that the chimeric rate in my data is much higher than in the paper (30% vs 5%).

Best wishes!

On Wednesday, June 15, 2016 at 11:02:07 AM UTC+8, lizh...@genomics.cn wrote:

lizh...@genomics.cn

Jun 16, 2016, 2:07:22 AM
to bgi-soap
Hi,

SOAPdenovo can certainly use this kind of read for contig construction. You can also merge each overlapping PE read pair into a single long read and then provide these to SOAPdenovo. Please refer to 'COPE: An accurate k-mer based pair-end reads connection tool to facilitate genome assembly' (http://bioinformatics.oxfordjournals.org/content/early/2012/10/08/bioinformatics.bts563.full.pdf).
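To make the merging idea concrete: with 270 bp fragments and 150 bp reads, the mates overlap by about 2 × 150 − 270 = 30 bp on average. The sketch below is a simple overlap-based merge, not COPE's k-mer method: it reverse-complements read 2 and accepts the longest suffix/prefix overlap with read 1 whose mismatch rate is low enough. The parameter values are illustrative assumptions, the overlap bases are taken from read 1 without any quality-aware consensus, and fragments shorter than the read length (read-through) are not handled.

```python
# Sketch: merge an overlapping PE read pair into one longer read.
# Simple overlap search, NOT the COPE algorithm; thresholds are illustrative.
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 20,
               max_mismatch_rate: float = 0.05) -> Optional[str]:
    """Return the merged fragment, or None if no acceptable overlap."""
    r2_rc = revcomp(r2)  # put read 2 on the same strand as read 1
    # Try overlaps from longest to shortest; keep the first acceptable one.
    for ov in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        tail, head = r1[-ov:], r2_rc[:ov]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * ov:
            # Keep read 1 in full, then append the non-overlapping part of read 2.
            return r1 + r2_rc[ov:]
    return None
```

For example, `merge_pair("AAAAACCCCC", "CCCCCGGGGG", min_overlap=5, max_mismatch_rate=0.0)` reconstructs the 15 bp fragment `AAAAACCCCCGGGGG` from two 10 bp reads overlapping by 5 bp.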

As I said, SOAPdenovo can't handle chimeric reads perfectly. The only way to find out is to do the assembly and then evaluate it. You may also try assembling after removing these chimeric reads and compare that assembly with the one built without filtering them.
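One simple way to start such a comparison is to compute N50 for both assemblies, as in the sketch below. It assumes the lengths are read from SOAPdenovo's FASTA outputs (e.g. the `.contig` and `.scafSeq` files); the file names in the usage line are hypothetical. N50 alone can be misleading here, because wrong joins caused by chimeric links may make scaffolds longer, so a higher scaffold N50 does not by itself show that the unfiltered assembly is better; checking the scaffolds against the reference you already mapped to would also help.

```python
# Sketch: compare assemblies by N50 of their contigs or scaffolds.
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Largest L such that sequences of length >= L cover half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def fasta_lengths(path: str) -> List[int]:
    """Sequence lengths from a (possibly multi-line) FASTA file."""
    lengths, current = [], None
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths
```

Usage, with hypothetical output prefixes: `n50(fasta_lengths("filtered.scafSeq"))` versus `n50(fasta_lengths("unfiltered.scafSeq"))`.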

Best,


lamz1...@163.com

Jun 23, 2016, 10:21:14 AM
to BGI-SOAP, lizh...@genomics.cn
Hi, thanks for your answer, and sorry for my late reply.

I have tried the assembly both ways.

Best wishes!

On Thursday, June 16, 2016 at 2:07:22 PM UTC+8, lizh...@genomics.cn wrote: