How to perform an "exhaustive" search for contaminants

Matthew Oldach

May 21, 2020, 8:25:11 PM

to Edwards Lab Tools

Most of the time, the contaminating species are known ahead of time and are used to assemble the database(s).

I would like to use DeconSeq to remove contaminants from human WGS data derived from saliva.

I started out by downloading the Human Oral Microbiome Database (HOMD) and testing it on a proband first. It contains ~1,900 genomes, resulting in a 4.0 Gb FASTA file, which needed to be split in order to build indexes with BWA. Now I would like to perform a more exhaustive search for contaminants, including those which may not be known beforehand.
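For reference, the splitting step mentioned above could look roughly like the sketch below. It is not the script actually used for this thread; the function names and the size limit are hypothetical, and the limit is measured in characters of text rather than exact bytes on disk. The one important property is that a record (header plus sequence lines) is never cut in half.

```python
def read_records(handle):
    """Yield one FASTA record (header line plus its sequence lines) at a time."""
    record = []
    for line in handle:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def split_fasta(handle, max_chars):
    """Yield chunks of whole records, each at most max_chars long.

    A single record longer than max_chars is emitted as its own chunk
    rather than being split.
    """
    chunk, size = [], 0
    for rec in read_records(handle):
        if chunk and size + len(rec) > max_chars:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(rec)
        size += len(rec)
    if chunk:
        yield "".join(chunk)
```

Each chunk can then be written to its own file (for example part_001.fasta, part_002.fasta, and so on; the names are only illustrative) and indexed separately.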

What is the best way of doing this? Is there a FASTA file somewhere that contains all of the known genomes? (I could then subset it by removing human and search for everything else.)

Matthew Oldach

Jun 8, 2020, 9:00:35 PM

to Edwards Lab Tools

The correct answer is to go to NCBI's RefSeq Genomes and follow their README for how to download.
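As a rough illustration of the "remove human, keep everything else" idea, the sketch below filters a RefSeq assembly summary table. It assumes the table is tab-separated, that the last line starting with "#" before the data holds the column names, and that columns named species_taxid, organism_name and ftp_path exist; check all of this against the README before relying on it. The function name is hypothetical, and 9606 is the NCBI taxonomy ID for Homo sapiens.

```python
def non_human_ftp_paths(lines, exclude_taxid="9606"):
    """Yield (organism_name, ftp_path) for every assembly not in exclude_taxid.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Column names are an assumption; verify them against the README.
    """
    header = None
    for line in lines:
        if line.startswith("#"):
            # Assumed: the last comment line before the data names the columns.
            header = line.lstrip("# ").rstrip("\n").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the first data row")
        row = dict(zip(header, line.rstrip("\n").split("\t")))
        if row.get("species_taxid") == exclude_taxid:
            continue  # skip human assemblies
        path = row.get("ftp_path", "na")
        if path != "na":  # assumed placeholder for a missing path
            yield row.get("organism_name", ""), path
```

The resulting paths still have to be downloaded and concatenated (or split) into FASTA files before building the DeconSeq databases.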
