and concatenate them together, would this give me a database built equivalently to the one provided with SAMSA2?
Regards,
Rachael
--
You received this message because you are subscribed to the Google Groups "SAMSA bioinformatics group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to samsa-bioinformatics-group+unsub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/samsa-bioinformatics-group/21efae37-7848-4822-8837-4854c8a4142d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thanks Sam,
Should it be the x.protein.faa.gz files or the nonredundant_protein.x.protein.faa.gz files? It looks like the non-redundant ones are provided only for bacterial, not viral genomes - so perhaps I should use the former?
Cheers,
Rachael
On 17/07/18 6:14 AM, Sam Westreich wrote:
Hi Rachael,
Yes, if you download the protein files (.faa.gz) from those links you shared and concatenate them together, you can use that as the reference for SAMSA2. You can make the DIAMOND-structured version with the command:
diamond makedb --in merged.faa --db merged.dmnd
The database provided with SAMSA2 is bacterial only, but this approach should handily add viruses as well.
Best,
Sam
On Mon, Jul 16, 2018 at 2:58 AM, <rachael...@gmail.com> wrote:
Hi Sam,
Can you provide detail on how the RefSeq_bac.fa file, downloadable by SAMSA2, was created?
I would like to create an updated database in the same way. I am looking to use the most recent version of RefSeq and would also like to include viral genomes. The database should contain genomes in both complete and draft stage, not just all of the complete ones (my species of interest have assemblies in 'scaffold' stage).
If I download all of the non-redundant protein files in:
ftp://ftp.ncbi.nlm.nih.gov/refseq/release/bacteria
ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral
and concatenate them together, would this give me a database built equivalently to the one provided with SAMSA2?
Regards,
Rachael
Thanks Sam,
At the moment (release 89, July 13th 2018) those non-redundant viral files are very small and only contain a handful of proteins - an issue I have alerted the RefSeq staff to. I'll use the regular protein files for viruses and non-redundant for bacteria. Thank you!
Cheers,
Rachael
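
For anyone following the same route, the steps discussed in this thread (decompress and concatenate the downloaded .faa.gz files, then run diamond makedb) could be sketched roughly as below. This is only a sketch, not the procedure used to build the database shipped with SAMSA2. It assumes the files have already been downloaded, that gzip and diamond are on the PATH, and the function names, directory names and file globs are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: merge downloaded RefSeq protein FASTA files and build a DIAMOND database.
# Function names and paths are illustrative assumptions, not part of SAMSA2.

# Decompress every gzipped FASTA given as arguments 2..N and
# write the concatenated result to the file named by argument 1.
merge_faa() {
    local out="$1"
    shift
    gzip -dc "$@" > "$out"
}

# Count the sequences (header lines starting with '>') in a FASTA file,
# as a quick sanity check on the merged output.
count_seqs() {
    grep -c '^>' "$1"
}

# Build the DIAMOND database from the merged FASTA.
# Argument 1 is the input FASTA, argument 2 the database name.
build_db() {
    diamond makedb --in "$1" --db "$2"
}

# Example usage (directories and globs are hypothetical; adjust to your download):
#   merge_faa merged.faa bacteria/*nonredundant_protein.*.protein.faa.gz viral/*.protein.faa.gz
#   count_seqs merged.faa
#   build_db merged.faa merged.dmnd
```

The usage lines follow the choice described above: non-redundant protein files for bacteria and regular protein files for viruses. Checking the sequence count before building is optional, but it is a cheap way to spot an unexpectedly small download.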