Hi,
I have been using ABySS to reconstruct contigs that are missing from the platypus reference genome.
I have around 30 samples, and for each one I assemble contigs from the reads that did not map to the reference genome.
For most samples ABySS gives great results, but I have noticed a specific issue that I would like to understand better.
In some individuals, the samples are contaminated with viral or bacterial DNA. When I run ABySS on these samples, I can reconstruct the entire genome of these micro-organisms, but nothing else! I then have to add an extra step: I realign all the unmapped reads to the reconstructed genomes, remove the reads that align, and rerun ABySS. In this second round, I get many contigs that are likely to be platypus DNA.
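The read-removal part of this clean-up step can be sketched roughly as follows. This is an illustrative Python sketch only: it assumes the names of the reads that aligned to the reconstructed contaminant genomes have already been collected into a set (for example from the aligner output), and the function names `parse_fastq` and `drop_reads` are hypothetical.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        # Read name = header without '@', up to the first whitespace.
        name = header.strip()[1:].split()[0]
        yield name, seq, qual


def drop_reads(records, contaminant_ids):
    """Keep only the records whose read name is not in contaminant_ids."""
    return [rec for rec in records if rec[0] not in contaminant_ids]


# Toy input: three unmapped reads, one of which aligned to a contaminant.
fastq_lines = [
    "@r1 extra", "ACGT", "+", "IIII",
    "@r2", "GGCC", "+", "IIII",
    "@r3", "TTAA", "+", "IIII",
]
contaminant_ids = {"r2"}  # assumed to come from the realignment step

kept = drop_reads(parse_fastq(fastq_lines), contaminant_ids)
kept_names = [name for name, _, _ in kept]
```

The reads left in `kept` are the ones that go into the second assembly round.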
Although this iterative approach (mostly) solves the problem, I am wondering why ABySS does not pick up the platypus contigs in the first place. I have noticed that this happens when the contaminating genomes are very highly covered. Is there something in the ABySS algorithm that treats only the bacterial/viral k-mers as signal and the rest as noise, because of the difference in coverage between the sequences? This seems a likely explanation to me, but I am not sure at which step of the ABySS algorithm this would happen, and I would like to understand the issue better.
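To make the hypothesis concrete, here is a toy Python sketch of the effect I have in mind. It is not ABySS's actual algorithm: the cutoff rule (a fixed fraction of the highest k-mer coverage), the sequences, and all function names are assumptions made up for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kept_kmers(counts, fraction=0.1):
    """Hypothetical filter: keep k-mers whose coverage is at least
    `fraction` of the highest coverage seen in the data set."""
    cutoff = fraction * max(counts.values())
    return {kmer for kmer, cov in counts.items() if cov >= cutoff}


k = 5
contaminant = "ACGTACGGTCA"  # made-up, highly covered sequence
host = "TTGACCATGGA"         # made-up, lowly covered sequence
host_kmers = set(count_kmers([host], k))

# Round 1: contaminant reads dominate, so the cutoff is set by them.
mixed_reads = [contaminant] * 100 + [host] * 3
round1 = kept_kmers(count_kmers(mixed_reads, k))

# Round 2: contaminant reads removed, the cutoff drops with the coverage.
clean_reads = [host] * 3
round2 = kept_kmers(count_kmers(clean_reads, k))

print("host k-mers kept in round 1:", len(round1 & host_kmers))
print("host k-mers kept in round 2:", len(round2 & host_kmers))
```

In this toy setting, the host k-mers fall below the cutoff while the contaminant is present and pass it once the contaminant reads are removed, which mirrors what I see between my first and second runs.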
Many thanks and best wishes,
Julie
--
Julie Hussin, PhD
Human Frontiers Postdoctoral Fellow
WTCHG, University of Oxford