log file flag and AMP size cutoff

Martin Klapper

Sep 23, 2020, 4:04:04 AM

to ampsphere-users

Hi all,

First of all, thanks for developing Macrel and sharing it via conda!

I tested Macrel AMP detection on metagenome assemblies and was wondering if a flag exists to save the log information, something like --logfile path/to/logfile.txt. The log is usually quite big and slows down my Jupyter script.

Is there a length cutoff in AMP detection that only allows AMPs shorter than 50 aa to be detected? With Macrel I did not find any longer AMPs in my assemblies, while ampir also detected longer AMPs (but does not identify short AMPs...).

Cheers

Martin

celio.diasjunior

Sep 23, 2020, 7:58:13 PM

to ampsphere-users

Hi Martin,

We are glad to hear from our users, and we are sure that these interactions will improve both Macrel and the AMPsphere project.

First of all, thank you for suggesting a flag that directs the log to a file; we should implement this. Just to let you know, we also intend to implement a better --help message soon, which will list all command-line options.
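Until such a flag exists, one workaround is to capture the log yourself instead of letting it stream into the notebook. The sketch below assumes Macrel is launched from Python via `subprocess`; `run_with_logfile` is a hypothetical helper, and the Macrel command line in the comment is only an illustration, so please check `macrel --help` for the options of your installed version.

```python
import subprocess
from pathlib import Path


def run_with_logfile(cmd, logfile):
    """Run `cmd` (a list of arguments) and send both stdout and stderr
    to `logfile` instead of the notebook. Returns the exit code."""
    logfile = Path(logfile)
    # Create the log directory if needed so the open() below cannot fail on it.
    logfile.parent.mkdir(parents=True, exist_ok=True)
    with logfile.open("w") as log:
        # stderr=STDOUT merges both streams into the same file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Hypothetical invocation (file names are placeholders):
# rc = run_with_logfile(
#     ["macrel", "contigs", "--fasta", "contigs.fna", "--output", "out_contigs"],
#     "logs/macrel.log",
# )
```

In a Jupyter cell, a plain shell redirection such as `!macrel ... > macrel.log 2>&1` achieves the same thing; the Python version just makes it easier to check the exit code afterwards.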

It is quite interesting that you did not detect any peptides longer than 50 residues. It is true that the model has a size bias, in which longer peptides tend to be treated as negative examples (80% of the cases in the training set). However, in our tests during the Global AMP Survey, in which we screened thousands of genomes and metagenome assemblies, we observed that about 4.6% of predicted AMPs were longer than 50 residues (52,977 / 1,151,506). I suspect this is because AMPsphere has such a large sample size that this imbalance becomes insignificant, whereas in your data the effect is strong.

I would suggest testing larger amounts of data, where the results will probably be a bit different, or checking the longer peptides you got with ampir. If those longer AMPs contain a signal peptide or an extension, you could try removing it and submitting the trimmed sequences directly to Macrel (we have heard that this works, although we have not tested it ourselves).
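To make the sample-size argument a bit more concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the 4.6% rate from the Global AMP Survey applies to your samples and that each predicted AMP is independently "long" with that probability (a simple binomial model); neither assumption is guaranteed for a specific environment. Under that model, the chance of seeing no AMP longer than 50 aa among n predictions is (1 - p)^n.

```python
"""Sketch: how likely is a sample to contain no predicted AMP > 50 aa?"""


def fraction_longer(lengths, cutoff=50):
    """Fraction of peptide lengths strictly greater than `cutoff`."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n > cutoff) / len(lengths)


def prob_no_long_amps(n_predicted, rate):
    """Probability that none of `n_predicted` AMPs is long, assuming each
    one is independently long with probability `rate` (binomial model)."""
    return (1.0 - rate) ** n_predicted


# Rate reported in the thread: 52,977 long AMPs out of 1,151,506 predictions.
GLOBAL_RATE = 52_977 / 1_151_506

if __name__ == "__main__":
    print(f"global rate: {GLOBAL_RATE:.1%}")
    for n in (10, 50, 100, 500):
        print(n, round(prob_no_long_amps(n, GLOBAL_RATE), 3))
```

`fraction_longer` can be applied to the peptide lengths from your own Macrel output to compare against the global rate; with only a few dozen predictions, finding no long AMPs at all is not especially surprising under this model.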

Please make sure you are using the latest version of Macrel: some of the very early versions were unstable during training and were also biased towards the presence of methionine. You can read more in our blog post at:

http://big-data-biology.org/blog/2020/04/29/NME2/
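If you run Macrel from a Jupyter kernel, a quick way to confirm which version that kernel actually sees is a small check like the one below. It assumes Macrel is installed as a Python distribution named `macrel` in the same environment as the kernel (if conda installed it into a different environment, it will report "not found"); `conda list macrel` in a terminal is an alternative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="macrel"):
    """Return the installed version string of `dist_name`, or None if the
    distribution is not found in the current Python environment."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(v):
    """Turn a plain release string such as '0.5.0' into (0, 5, 0).
    Pre-release suffixes like 'rc1' are not handled."""
    return tuple(int(part) for part in v.split("."))


if __name__ == "__main__":
    v = installed_version()
    print("macrel", v if v is not None else "not found in this environment")
```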

Regarding the comparison with ampir, we need to keep in mind two main points:

1. gene prediction - ampir does not include a gene prediction step and was focused on eukaryotic examples (frog, centipede, Arabidopsis, human...). Eukaryotic genes are known to be much larger, and their AMPs usually tend to be more complex than those from prokaryotes. The gene prediction system implemented in Macrel is Prodigal-based, which means it does not work on eukaryotes at all, as it does not account for introns. In addition, the minimum and maximum gene lengths returned are 33 and 303 bp, respectively, because genes outside this range are filtered out during Macrel processing.
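To translate the 33-303 bp window into peptide lengths: assuming the gene length includes the stop codon (an assumption, not stated above), 33 bp is 11 codons, i.e. 10 residues, and 303 bp is 101 codons, i.e. 100 residues. The sketch below shows that conversion and an equivalent length filter; `bp_to_aa` and `passes_length_filter` are illustrative helpers, not Macrel functions.

```python
# Length bounds quoted in the thread for Macrel's gene filter (in bp).
MIN_GENE_BP = 33
MAX_GENE_BP = 303


def bp_to_aa(length_bp, includes_stop=True):
    """Convert an ORF length in bp to a peptide length in residues.
    Assumes a whole number of codons and, by default, that the last
    codon is a stop codon that is not translated."""
    if length_bp % 3:
        raise ValueError(f"{length_bp} bp is not a whole number of codons")
    codons = length_bp // 3
    return codons - 1 if includes_stop else codons


def passes_length_filter(orf_seq):
    """True if a nucleotide ORF falls inside the 33-303 bp window."""
    return MIN_GENE_BP <= len(orf_seq) <= MAX_GENE_BP


if __name__ == "__main__":
    print(bp_to_aa(MIN_GENE_BP), bp_to_aa(MAX_GENE_BP))  # 10 100
    orfs = {"orf1": "ATG" * 20 + "TAA", "orf2": "ATG" * 120 + "TAA"}
    print({name: passes_length_filter(s) for name, s in orfs.items()})
```

Under this assumption, no peptide longer than 100 residues can reach the classifier from the contigs workflow, which is a hard ceiling separate from the model's size bias.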

2. training set - unlike Macrel's, ampir's training set (in its default, or "precursor", mode) was built from example sequences of 50 to 500 residues, which also biases their model towards identifying longer proteins. I do not know whether you have tested ampir's "mature" mode, trained with peptides of 10-60 residues; in this mode your results should be relatively comparable to those obtained with Macrel. The core difference is that Legana divided ampir's training set by length (mature and precursor) and accepted much larger AMPs than Macrel did.
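The two ampir length ranges above overlap between 50 and 60 residues, so some peptides fall inside both training ranges and others inside neither. A small sketch for checking which mode's training range covers a given peptide length is below; treating the bounds as inclusive is an assumption, and `modes_covering` is a hypothetical helper, not part of ampir.

```python
# Training length ranges (in residues) quoted in the thread for ampir's modes.
AMPIR_MODES = {
    "mature": (10, 60),
    "precursor": (50, 500),
}


def modes_covering(length):
    """Names of ampir modes whose training length range includes `length`
    (bounds treated as inclusive)."""
    return [mode for mode, (lo, hi) in AMPIR_MODES.items() if lo <= length <= hi]


if __name__ == "__main__":
    for n in (8, 30, 55, 120):
        print(n, modes_covering(n))
```

For example, a 30-residue peptide is only within the "mature" range, while a 120-residue one is only within the "precursor" range, which is one way to see why the default ampir mode and Macrel tend to report different peptides.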

These two points suggest that ampir is better at predicting eukaryotic AMPs, while Macrel was designed to predict prokaryotic ones. Thus, the goal of your study should determine which tool to use.

Cheers,

Celio Dias Santos Junior.

Martin Klapper

Sep 24, 2020, 3:26:21 AM

to ampsphere-users

Thank you for the detailed answer! As we study amoebae-bacteria interactions, we're using both tools. But indeed, our sample size was small in comparison to your Global AMP Survey.

Martin Klapper

Sep 24, 2020, 3:30:28 AM

to ampsphere-users

We were using the latest version, macrel=0.5.0.

Luis Pedro Coelho

Sep 26, 2020, 7:40:38 AM

to ampsphe...@googlegroups.com

On Wed, 23 Sep 2020, at 10:04 AM, Martin Klapper wrote:

> Hi all,
> at first - thanks for developing Macrel and sharing it via conda!
>
> I tested Macrel AMP detection on metagenome assemblies and was wondering, if a flag exists to save the log information. something like --logfile path/to/logfile.txt. The log is usually quite big and slows down my Jupyter script.

Thanks Martin.

FYI: I turned your suggestion into a GitHub issue so we don't forget:

https://github.com/BigDataBiology/macrel/issues/22

I agree that it would be useful to have it.

Best

Luis

Luis Pedro Coelho | Fudan University | http://luispedro.org

https://orcid.org/0000-0002-9280-7885
