Annotation of biosynthetic gene clusters

Yi-Ming Shi

Sep 24, 2019, 11:58:33 AM

to Anvi'o

Hello,

I'm a biochemist and a beginner when it comes to computational science... I'm analyzing the pangenome of 40 strains and would like to compare their gene clusters for secondary-metabolite biosynthesis. I've worked through the pangenomics workflow. I then exported the gene sequences (FASTA) of every strain and ran antiSMASH to find and annotate the biosynthetic gene clusters, intending to import the results back into the contigs database. Here I ran into a problem: the assembled contigs (or genome) in the original FASTA used to generate the contigs database were "split" into thousands of individual genes in the exported gene-sequence FASTA, so antiSMASH can only annotate those single genes rather than find and annotate gene clusters. Is there a way to export the original assembled contigs together with the gene IDs or gene calls assigned by anvi'o? I would also appreciate it if you could suggest an alternative approach to functional annotation and visualization of biosynthetic gene clusters in anvi'o.

Best regards,

Yi-Ming

A. Murat Eren

Sep 24, 2019, 1:17:09 PM

to Anvi'o

Hi Yi-Ming,

If you used anvi'o to export gene sequences (e.g., via anvi-get-sequences-for-gene-calls, anvi-get-sequences-for-hmm-hits, etc.), then it is very easy to connect them back to the contigs they come from.

Here is the key information you need, and the rest will come together very quickly:

$ sqlite3 CONTIGS.db 'select * from genes_in_contigs limit 10;' -separator $'\t' -header | column -t
gene_callers_id  contig            start  stop  direction  partial  source    version
0                Day17a_QCcontig1  0      186   f          1        prodigal  v2.60
1                Day17a_QCcontig1  214    1219  f          0        prodigal  v2.60
2                Day17a_QCcontig1  1265   2489  f          0        prodigal  v2.60
3                Day17a_QCcontig1  2561   3452  f          0        prodigal  v2.60
4                Day17a_QCcontig1  3552   3783  f          0        prodigal  v2.60
5                Day17a_QCcontig1  4172   4613  f          0        prodigal  v2.60
6                Day17a_QCcontig1  4628   5594  f          0        prodigal  v2.60
7                Day17a_QCcontig1  5646   5874  f          0        prodigal  v2.60
8                Day17a_QCcontig1  6010   6967  f          0        prodigal  v2.60
9                Day17a_QCcontig1  6999   7929  f          0        prodigal  v2.60

I hope these column names make sense. The gene caller ID (gene_callers_id) is what anvi'o uses internally to uniquely identify each gene, and this table shows which contig each gene is on, along with its start and stop positions. You can then use anvi-import-misc-data to connect the genes back to the interface.

The same command will work on your own contigs database, too.
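For example, to list only the genes that sit entirely within a coordinate window on one contig, a minimal sketch could filter that table with awk. It assumes the tab-separated output of the command above (header line included, without the `| column -t` part) arrives on standard input. The function names `genes_in_range` and `sample_table` are hypothetical; `sample_table` merely replays the first four columns of the ten example rows shown above so the sketch runs without a contigs database.

```shell
#!/bin/sh
# Hypothetical helper: print the gene_callers_id of every gene on contig $1
# whose start and stop both fall inside the window [$2, $3].
# Input: tab-separated rows of the genes_in_contigs table, header line first
# (columns: gene_callers_id, contig, start, stop, ...).
genes_in_range() {
    awk -v contig="$1" -v lo="$2" -v hi="$3" '
        BEGIN { FS = "\t" }
        NR == 1 { next }    # skip the header line
        $2 == contig && $3 + 0 >= lo + 0 && $4 + 0 <= hi + 0 { print $1 }
    '
}

# Hypothetical stand-in for the sqlite3 output: the first four columns
# of the ten example rows shown above.
sample_table() {
    printf 'gene_callers_id\tcontig\tstart\tstop\n'
    printf '%s\tDay17a_QCcontig1\t%s\t%s\n' \
        0 0 186  1 214 1219  2 1265 2489  3 2561 3452  4 3552 3783 \
        5 4172 4613  6 4628 5594  7 5646 5874  8 6010 6967  9 6999 7929
}

# Genes lying entirely between positions 1265 and 6967 on Day17a_QCcontig1;
# prints the IDs 2 through 8, one per line.
sample_table | genes_in_range Day17a_QCcontig1 1265 6967
```

On a real database you would pipe the table in instead, e.g. `sqlite3 -header -separator $'\t' CONTIGS.db 'select * from genes_in_contigs;' | genes_in_range Day17a_QCcontig1 1265 6967` (the `$'\t'` quoting needs bash or zsh). The same filter can also be written directly in SQL: `sqlite3 CONTIGS.db "select gene_callers_id from genes_in_contigs where contig = 'Day17a_QCcontig1' and start >= 1265 and stop <= 6967;"`. Genes that only partially overlap the window are excluded by this choice of comparison.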

Best wishes,

--

A. Murat Eren (Meren)
http://merenlab.org :: twitter :: gpg

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/cb933538-8416-44ca-8fe1-e456a1b16647%40googlegroups.com.

Yi-Ming Shi

Sep 25, 2019, 12:09:28 PM

to Anvi'o

Hi Meren,

Thanks for your reply. I'm getting there.

Since I don't need all gene caller IDs, I'm wondering how to export only those within a given range. For example, in Day17a_QCcontig1, how can I export the gene caller IDs of the genes between positions 1265 (start) and 6967 (stop)?

Thanks and best regards,

Yi-Ming

A. Murat Eren

Sep 25, 2019, 1:47:30 PM

to Anvi'o

Hi,

Unfortunately, the program cannot do that. But you can use `anvi-script-reformat-fasta` with the `--keep-ids` parameter to subset the genes you are interested in.
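A minimal sketch of that subsetting step. It assumes the deflines in the exported gene FASTA are bare gene caller IDs (worth checking with `grep '>' genes.fa | head` first) and that `--keep-ids` takes a plain-text file with one ID per line. The file names `genes.fa`, `gene_ids.txt` and `genes-subset.fa` are hypothetical. The IDs 2 to 8 are the genes lying between positions 1265 and 6967 on Day17a_QCcontig1 in the example `genes_in_contigs` table from the first reply.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own.
fasta=genes.fa
ids_file=gene_ids.txt
out=genes-subset.fa

# One gene caller ID per line (assumed --keep-ids format). Here: genes 2-8,
# i.e. those between 1265 and 6967 on Day17a_QCcontig1 in the example table.
printf '%s\n' 2 3 4 5 6 7 8 > "$ids_file"

# Keep only those genes from the exported gene FASTA; skipped if anvi'o
# is not installed.
if command -v anvi-script-reformat-fasta >/dev/null 2>&1; then
    anvi-script-reformat-fasta "$fasta" --keep-ids "$ids_file" -o "$out"
else
    echo "anvi-script-reformat-fasta not found; only wrote $ids_file" >&2
fi
```

If you already have the IDs from a query on `genes_in_contigs`, redirecting that query's output to `gene_ids.txt` saves typing them by hand.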

Best wishes,

--

A. Murat Eren (Meren)
http://merenlab.org :: twitter :: gpg
