Quick questions


Parit

Apr 14, 2011, 3:22:08 AM

to Zorro - The masked assembler

Hello,

It's great to see a hybrid assembler. I have 454 and Illumina data sets
of E. coli MG1655. I want to try Zorro on them and see how good the
results are. Just 2 quick questions: 1. Is there a limit/recommended range
on the k-mer size? 2. Illumina data is typically a lot more than 10x,
so do I need to sample out reads from the Illumina data beforehand?

- Parit

Gustavo Lacerda

Apr 14, 2011, 6:39:05 AM

to zorro-a...@googlegroups.com

Hi Parit. Thank you for testing Zorro.

1. Is there a limit/recommended range on the k-mer size?

Zorro is an assembly merger. It takes pre-assembled contigs as input (for example, you can use Newbler, MIRA, or another assembler for the 454 reads, and Velvet, SOAPdenovo, ABySS, or another assembler for the Illumina reads). The reads file that Zorro requires is used only for AB INITIO REPEAT DETECTION, not for assembly. That is why we recommend subsampling the reads. The k-mer size in Zorro therefore has a completely different meaning and does not need to be optimized (as it does in most assembly software). k should be large enough that a k-length word is not expected to occur in the genome by chance. k=22 is enough for E. coli, since 4^22 (about 1.8 x 10^13) is much larger than the genome size of about 5 Mb.
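
To make the "large enough k" argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of Zorro. It assumes a random genome with uniform, independent bases (real genomes are not like that), and the function names and the 1e-3 threshold are made up for illustration.

```python
def expected_chance_hits(k: int, genome_size: int) -> float:
    """Expected number of places where ONE given k-mer occurs by chance.

    Assumption: bases are uniform and independent, so a given k-mer
    matches any position with probability 1 / 4**k. Both strands are
    counted, hence the factor of 2.
    """
    positions = 2 * (genome_size - k + 1)
    return positions / 4 ** k


def smallest_k(genome_size: int, max_expected: float = 1e-3) -> int:
    """Smallest k whose expected chance hits fall below max_expected.

    max_expected is an arbitrary illustrative threshold, not a Zorro default.
    """
    k = 1
    while expected_chance_hits(k, genome_size) > max_expected:
        k += 1
    return k


if __name__ == "__main__":
    genome_size = 5_000_000  # about 5 Mb, as in the E. coli example
    for k in (12, 16, 22):
        print(k, 4 ** k, expected_chance_hits(k, genome_size))
    print("smallest k under threshold:", smallest_k(genome_size))
```

Under these assumptions, k=22 leaves a wide margin for a 5 Mb genome. A much larger genome would need a larger k by the same reasoning.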

2. Illumina data is typically a lot more than 10x, so do I need to sample out reads from the Illumina data beforehand?

Not required, but subsampling will speed things up and consume fewer resources. Because the reads are not used for assembly (only for ab initio repeat detection), we do not need all of them. Since you have both 454 and Illumina data, in your case you could use only 10X coverage of 454 reads, for example (454 typically provides less biased coverage of the genome).
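
If it helps, subsampling to a target coverage can be sketched like this. This is only an illustration, not a Zorro tool: it assumes the reads are already loaded as (name, sequence) pairs, and the function name, parameters, and seed are hypothetical. Each read is kept with probability target/current coverage, so the resulting coverage is approximately (not exactly) the target.

```python
import random


def subsample_reads(reads, genome_size, target_coverage=10.0, seed=0):
    """Randomly keep reads so that coverage is roughly target_coverage.

    reads: list of (name, sequence) tuples (an assumed in-memory format).
    genome_size: estimated genome length in bases.
    """
    total_bases = sum(len(seq) for _, seq in reads)
    current_coverage = total_bases / genome_size
    if current_coverage <= target_coverage:
        # Already at or below the target: keep everything.
        return list(reads)
    keep_fraction = target_coverage / current_coverage
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [read for read in reads if rng.random() < keep_fraction]


if __name__ == "__main__":
    # Toy data: 1000 reads of 100 bases over a 1 kb "genome" (100X coverage).
    toy_reads = [(f"read{i}", "A" * 100) for i in range(1000)]
    sample = subsample_reads(toy_reads, genome_size=1000, target_coverage=10.0)
    print(len(sample), "reads kept")
```

For real data you would read and write the reads in whatever format your pipeline uses; the selection logic stays the same.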

I will update the documentation to clarify those aspects.

Best regards,
Gustavo


--
Gustavo Gilson Lacerda Costa
Bioinformatician at State University of Campinas (UNICAMP)
Work:(19)3521-6651 Cell:(19)9243-1559 Skype:gustavo.unicamp
www.researcherid.com/rid/B-6312-2009
