Hi Megha,
Yes, there are some approaches for this. When you annotate the functions as sucrose synthase, for example, are you doing so from the RefSeq functional results, or from the Subsystems results?
Each of these functions is called at the read level, so for any given read you can see what else it maps to.
- If you're calling functions from Subsystems results, you can use the Subsys_to_RefSeq_mapper.py script to get the RefSeq organism of origin for those reads.
- If you want to find the organisms behind a specific function in RefSeq, or the functions for a specific organism, you can use the DIAMOND_specific_organism_retriever.py script to pull all the reads matching a search criterion, and then annotate that subset against your database of choice.
Either of these approaches can help you get the organisms performing a specific function.
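To illustrate the underlying idea (this is only a toy sketch, not code from either script), linking a function to its organisms comes down to a join on read ID. The function name `organisms_for_function`, the `(read_id, annotation)` tuple layout, and the sample reads below are all made up for the example; the real pipeline outputs would need to be parsed into this shape first.

```python
from collections import Counter


def organisms_for_function(function_hits, organism_hits, keyword):
    """Count organisms among reads whose function annotation contains keyword.

    Both inputs are iterables of (read_id, annotation) tuples. This layout is
    an assumption for illustration, not the pipeline's actual file format.
    """
    keyword = keyword.lower()
    # Step 1: collect the IDs of reads annotated with the function of interest.
    matching_reads = {
        read_id
        for read_id, function in function_hits
        if keyword in function.lower()
    }
    # Step 2: tally the organism annotation of each of those reads.
    return Counter(
        organism
        for read_id, organism in organism_hits
        if read_id in matching_reads
    )


# Hypothetical per-read annotations (made-up data).
function_hits = [
    ("read_1", "Sucrose synthase"),
    ("read_2", "DNA gyrase subunit A"),
    ("read_3", "Sucrose synthase"),
]
organism_hits = [
    ("read_1", "Organism A"),
    ("read_2", "Organism B"),
    ("read_3", "Organism A"),
]

counts = organisms_for_function(function_hits, organism_hits, "sucrose synthase")
print(counts.most_common())  # [('Organism A', 2)]
```

Matching on a case-insensitive substring keeps the sketch simple; for real annotations you may prefer an exact match so that similarly named functions are not lumped together.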
It's been a couple of years since I last delved into the pipeline, so feel free to ask other questions; I might be a bit rusty, but I'll do my best to help out.
Best,
Sam