Most Efficient Means to Find Homologous Genes in Metagenome Datasets


Tags: img-Data-Content, img-USER-INTERFACE, metagenome


Bradley Tolar


Jan 30, 2017, 6:22:45 PM

to IMG User Forum

Hello everyone,

I have a rather simple goal that seems to be incredibly difficult given IMG's restrictions. I want to gather the available diversity of sequences for a specific archaeal gene from metagenome datasets; however, BLAST limits me to only 20 metagenomes at a time (even though the search page says the maximum is 100). This makes the task next to impossible, if only because it is so hard to keep track of which of the >7000 available datasets I have already searched (very few subsets within the "Tree" view contain fewer than 20). I'd be more than willing to increase the specificity or change parameters (such as excluding all sequences shorter than a specified length) if this would allow me to search more datasets at once.

So far my best success has been to make a few Workspaces of metagenomes and go from there, but this is incredibly time-consuming, largely because many of the datasets don't contain any of my genes of interest to begin with.

Due to difficulties and inaccuracies in annotation pipelines, I cannot simply search using a locus tag (which would only let me search 50 at a time). And unfortunately NCBI itself (which has no limits) does not come anywhere close to the breadth of ecosystem types that IMG has.

Any tips?

Thanks!
-Bradley

Barbara MacGregor


Jan 31, 2017, 4:32:22 PM

to IMG User Forum

Hi,

This isn't a HUGE help, but it could let you keep track of what you've finished: make a Workspace file with everything, then delete sequences from it as you check them. It's like using trees to keep track of things in Arb.
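The same bookkeeping could also be done outside IMG. Here is a minimal sketch in Python, assuming you have exported the metagenome IDs to a plain list. The IDs, the function names `batches` and `remaining`, and the `BATCH_SIZE` constant are all made up for illustration; nothing here calls IMG itself, it only splits the list into groups of 20 (the limit reported above) and tracks which ones are done.

```python
BATCH_SIZE = 20  # per-search BLAST limit reported in this thread (assumption)


def batches(ids, size=BATCH_SIZE):
    """Yield consecutive groups of at most `size` dataset IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def remaining(all_ids, done):
    """Return the IDs not yet searched, preserving the original order."""
    done_set = set(done)
    return [i for i in all_ids if i not in done_set]


# Hypothetical dataset IDs; in practice these would come from an exported list.
all_ids = [f"metagenome_{n:04d}" for n in range(1, 46)]
done = []

# Take the next batch to paste into a search, then record it as finished.
first = next(batches(remaining(all_ids, done)))
done.extend(first)
todo = remaining(all_ids, done)
```

Saving `done` to a text file between sessions would serve the same purpose as deleting entries from a Workspace set as you go.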

Hopefully there's a better answer!

Barbara
