Is the SEED database suitable for fungi?

radz150193

Apr 21, 2020, 3:20:40 AM
to SAMSA bioinformatics group
I downloaded the fungal RefSeq database and it worked well with SAMSA2. Does the default SEED database downloaded during setup include fungal data as well, or is it only suitable for bacterial work? If not, is there a fungal SEED database? I'm having trouble finding one online.

Sam Westreich

Apr 22, 2020, 12:53:33 PM
to radz150193, SAMSA bioinformatics group
Hello,

I'm not sure how well the SEED database will work for fungi. It was manually curated by the PATRIC group, and I believe their focus was on bacteria, so it likely won't work as well for fungi.

You can read more about the database in its paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965101/) and give it a shot, but it may be significantly less useful for fungi.

If you do find a different database that works well, please feel free to let me know so I can look at adding it as an enhancement.

Best,
Sam

--
You received this message because you are subscribed to the Google Groups "SAMSA bioinformatics group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to samsa-bioinformatic...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/samsa-bioinformatics-group/b9a8b3e6-cee3-4200-8877-d359758f746b%40googlegroups.com.


--
Sam Westreich
Microbiome Scientist, DNAnexus

radz150193

Apr 22, 2020, 8:05:45 PM
to SAMSA bioinformatics group
Thanks, Sam. I also wanted to ask: is there a way to include the RefSeq ID in the step 5 output, rather than just the protein name?


radz150193

Apr 23, 2020, 7:28:42 PM
to SAMSA bioinformatics group
Thanks for the update, Sam. For some reason I can't see your response about adding the -R option, but thanks for that! I noticed that it outputs just the reference ID. Is there a way to output both the protein name and the reference ID in the same TSV file?

So the columns would be for example:

Percentage            Read Counts               Protein                 RefSeq ID

I've noticed that the first few lines of the TSV are identical between the -R and the -F output, but the rest have different count values, for example:

-R output:

10.297731834981146      16467   XP_015464343.1
6.943949371204873       11104   XP_015464362.1
2.4926677047570807      3986    XP_013238560.1
0.7291647124301947      1166    XP_025336506.1
0.5521890575264682      883     XP_021875969.1
0.40960796452982634     655     XP_009544859.1
0.3789655366489691      606     XP_011125707.1
0.334565284005278       535     XP_031005759.1
0.2914157427036627      466     XP_012185442.1
0.29079038703262483     465     XP_025339630.1
0.2876636086774353      460     XP_007873121.1
0.22888017559987242     366     XP_025168914.1

-F output:

10.297731835 16467 hypothetical protein AC631_06000
6.9439493712 11104 hypothetical protein AC631_05981, partial
2.49266770476 3986 hypothetical protein DI09_207p20
1.31324690918 2100 elongation factor 1-alpha
0.965549156083 1544 predicted protein
0.769812831048 1231 ferritin-like superfamily
0.72916471243 1166 uncharacterized protein CXQ87_003410
0.675384124721 1080 cytochrome c oxidase subunit 1 (mitochondrion)
0.532177676053 851 Elongation factor 1-alpha
0.483399933712 773 P-loop containing nucleoside triphosphate hydrolase protein
0.40960796453 655 hypothetical protein HETIRDRAFT_316868, partial
0.378965536649 606 hypothetical protein AOL_s00169g48


Sam Westreich

Jul 16, 2020, 7:55:27 PM
to radz150193, SAMSA bioinformatics group
Hi Michael,

Just clearing up some old messages, and for anyone else who searches for this: you can now use the -R flag together with the -F or -O flag to get the RefSeq IDs in addition to the function or organism names.
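
For anyone scripting this, a minimal Python sketch of assembling such a call follows. It assumes the step 5 counter script is `DIAMOND_analysis_counter.py` and that it takes `-I` (input file) and `-D` (database), as in the SAMSA2 repository; those details are not stated in this thread, and the file names and the `counter_command` helper are placeholders, so check the script's own usage text before relying on it.

```python
import shlex


def counter_command(infile, database, mode="-F", with_refseq=True,
                    script="DIAMOND_analysis_counter.py"):
    """Build an argument list for the step 5 counter script.

    The script name and the -I/-D flags are assumptions (see the note above);
    -F, -O and -R are the flags discussed in this thread.
    """
    if mode not in ("-F", "-O"):
        raise ValueError("mode must be '-F' (function) or '-O' (organism)")
    cmd = ["python", script, "-I", infile, "-D", database, mode]
    if with_refseq:
        cmd.append("-R")  # also report RefSeq IDs
    return cmd


# Placeholder file names, for illustration only.
cmd = counter_command("sample.annotated", "fungal_refseq.fa")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.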

Best,
Sam

Kabir Peay

Jun 10, 2022, 4:40:07 PM
to SAMSA bioinformatics group
It's been two years since this post. Has anyone found updated recommendations for databases to use with fungi in SAMSA2?