Annotating to a more general taxonomic classification & functional annotation using other databases


Moya Farahyah

Jul 22, 2020, 8:03:53 AM
to SAMSA bioinformatics group
Hi, Sam. 

I've run the whole pipeline for organism annotation, and apparently my data is so diverse that 60% of my bar graph is occupied by the "Other" genus category. I've tried increasing the number of genera shown in the bar graph, but the result stays the same. Is there, by any chance, a way to annotate to a more general taxonomic classification (preferably phylum level)? Also, I can't seem to access The SEED download page (ftp://ftp.theseed.org). I'm assuming the site is down for a while (or is it just me?). Is there any other database I can use for functional annotation? I want to try eggNOG, but I have no idea which files to download to process it locally. Any ideas?

Sam Westreich

Sep 4, 2020, 5:49:43 PM
to Moya Farahyah, SAMSA bioinformatics group
Hi Moya,

Apologies for the delayed response. I do have a way to annotate to a higher taxonomy level, but I had to hunt down the script (I hadn't used it in a while, and it's not in the SAMSA2 GitHub repo because... well, I didn't include it). I'm attaching it here, and I'll also see about adding it to the repo.

I'll clean this up a bit before uploading, but here's the usage statement:

USAGE STATEMENT
-Q Enables quiet mode
-F Input file, necessary
-R Reference index file, necessary
-T Final taxonomy level desired: Kingdom, Phylum, Class, Order, Family, Genus
-O Output file (default is input_file.shifted)
-V Verbose mode, shows exceptions
-E Exclusion, will exclude all exceptions if present

So you run this as such:

$ python taxonomy_shifter_v4.py -F input_summary_file_stage_5.tsv -R Bacteria_Genus_flattened.tsv -T Phylum 

In this case, everything annotated at a level below Phylum gets shifted up to the Phylum level; identical rows should then be consolidated, with their percentages and read totals summed.
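
For anyone curious what that shift-and-consolidate step amounts to, here is a minimal sketch of the idea. It is not the code in taxonomy_shifter_v4.py: the function name, the (read count, name) row layout, and the in-memory lineage dictionary are all assumptions made for illustration, and the real script reads its lineage information from the reference index file instead. Names with no entry in the lookup are counted as exceptions and either kept unchanged or dropped, loosely mirroring the -E flag.

```python
from collections import defaultdict


def shift_taxonomy(rows, lineage, level, exclude_exceptions=False):
    """Shift (read_count, name) rows up to `level` and merge identical rows.

    rows:    list of (read_count, organism_name) tuples (assumed layout)
    lineage: dict mapping organism_name -> {rank: taxon} (hypothetical lookup)
    Returns (shifted_rows, exception_count), where each shifted row is
    (percentage, read_count, taxon), sorted by read count, largest first.
    """
    totals = defaultdict(int)
    exceptions = 0
    for count, name in rows:
        ranks = lineage.get(name)
        if ranks is None or level not in ranks:
            # No lineage known for this name: treat it as an exception.
            exceptions += 1
            if exclude_exceptions:
                continue
            target = name  # keep the original annotation unchanged
        else:
            target = ranks[level]
        totals[target] += count  # identical taxa are consolidated here

    grand_total = sum(totals.values())
    shifted = [
        (100.0 * count / grand_total if grand_total else 0.0, count, taxon)
        for taxon, count in totals.items()
    ]
    shifted.sort(key=lambda row: row[1], reverse=True)
    return shifted, exceptions


# Made-up example data, for illustration only.
rows = [
    (50, "Escherichia"),
    (30, "Salmonella"),
    (15, "Bacteroides"),
    (5, "Unknown organism"),
]
lineage = {
    "Escherichia": {"Phylum": "Proteobacteria"},
    "Salmonella": {"Phylum": "Proteobacteria"},
    "Bacteroides": {"Phylum": "Bacteroidetes"},
}

shifted, n_exceptions = shift_taxonomy(rows, lineage, "Phylum")
for percent, count, taxon in shifted:
    print(f"{percent:.2f}\t{count}\t{taxon}")
print(f"Exceptions: {n_exceptions}/{len(rows)}")
```

Note that in this sketch the percentages are recomputed from the summed read counts rather than added up row by row; the two agree as long as nothing is excluded.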

Regarding your other questions: TheSeed is likely having some issues, since I can't access it either. If you're looking to get the database, you can pull the mirror that I have hosted at Bioshare, as linked on line 38 of this document: https://github.com/transcript/samsa2/blob/master/setup_and_test/full_database_download.bash

I've not tried eggNOG before, but if you link to where I can download it, I can look into converting it into a compatible format.

Best,
Sam






--
Sam Westreich
Microbiome Scientist, DNAnexus, 
Attachments:
taxonomy_shifter_v4.py
Bacteria_Genus_flattened.tsv

Amanda Zahorik

Nov 3, 2022, 12:23:31 PM
to SAMSA bioinformatics group
Hi Sam,

Jumping in because I was looking for this solution and wanted to follow up. I ran the script you shared (thank you!), but didn't have wildly successful results. It looks like a significant portion of my organism annotations didn't consolidate. The script runs fine, but gives me the following summary:

Exceptions: 7282/7726, 94.2531711105%
Exception fraction: 6441093/6540756, 98.4762770542%

And when I snoop in the resulting .tsv, many of the organisms are still annotated at the species level. Any suggestions as to how to address this?

Thanks for SAMSA2, and for any help you can offer!
Amanda