How to perform an "exhaustive" search for contaminants


Matthew Oldach

May 21, 2020, 8:25:11 PM
to Edwards Lab Tools

Most of the time, the contaminating species are known ahead of time and are used to assemble the reference database(s).

I would like to use DeconSeq to remove contaminants from human WGS data derived from saliva.

I started out by downloading the Human Oral Microbiome Database (HOMD) and testing it on a proband first. It contains ~1,900 genomes, resulting in a 4.0 Gb FASTA file, which needed to be split in order to build indexes with BWA. Now I would like to perform a more exhaustive search for contaminants, including those that may not be known beforehand.
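
For anyone repeating the splitting step, here is a minimal Python sketch of one way to do it, not necessarily how it was done for HOMD. It groups whole FASTA records into chunks that stay under a base-count limit, so no genome record is cut in half. The function names `read_records` and `chunk_records` and the `max_bases` parameter are hypothetical, chosen for illustration only.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, List[str]]  # (header line, sequence lines)


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence_lines) for each FASTA record."""
    header = None
    seq: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def chunk_records(records: Iterable[Record], max_bases: int) -> Iterator[List[Record]]:
    """Group whole records into chunks of at most max_bases bases.

    A single record larger than max_bases still gets a chunk of its own.
    """
    chunk: List[Record] = []
    size = 0
    for header, seq in records:
        n = sum(len(s) for s in seq)
        # Start a new chunk if adding this record would exceed the limit.
        if chunk and size + n > max_bases:
            yield chunk
            chunk, size = [], 0
        chunk.append((header, seq))
        size += n
    if chunk:
        yield chunk
```

In practice you would pass an open file handle to `read_records` and write each chunk to its own FASTA file before building one index per file.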

What is the best way of doing this? Is there a FASTA file somewhere that has all of the known genomes? (I could then subset it by removing human and search against everything else.)

Matthew Oldach

Jun 8, 2020, 9:00:35 PM
to Edwards Lab Tools
The correct answer is to go to NCBI's RefSeq Genomes and follow their README for how to download.
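
As a rough illustration of the "remove human, keep everything else" step, here is a minimal Python sketch. It assumes you have already downloaded a tab-delimited assembly summary table as the README describes, and that its header line names `taxid` and `ftp_path` columns; check the README for the actual column layout before relying on this. The function name `non_human_ftp_paths` is hypothetical. 9606 is the NCBI taxonomy ID for Homo sapiens.

```python
from typing import List, Optional


def non_human_ftp_paths(summary_text: str, exclude_taxid: str = "9606") -> List[str]:
    """Return download paths for every assembly whose taxid is not exclude_taxid.

    Assumes a tab-delimited table whose header line starts with '#' and
    includes 'taxid' and 'ftp_path' columns (an assumption; see the README).
    """
    columns: Optional[List[str]] = None
    paths: List[str] = []
    for line in summary_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("#"):
            # Comment or header line: keep it only if it names the columns.
            names = [fields[0].lstrip("# ")] + fields[1:]
            if "taxid" in names and "ftp_path" in names:
                columns = names
            continue
        if columns is None:
            raise ValueError("no header line with 'taxid' and 'ftp_path' found")
        record = dict(zip(columns, fields))
        path = record.get("ftp_path", "")
        # Skip the excluded organism and rows without a usable path.
        if record.get("taxid") != exclude_taxid and path not in ("", "na"):
            paths.append(path)
    return paths
```

The returned paths can then be fed to whatever download tool you prefer, and the resulting genomes combined into the contaminant database.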