Short protein segments

Claudèle Lemay-St-Denis

Feb 4, 2021, 1:28:24 PM
to IMG User Forum
Hi,

A search of the IMG database (all assembled metagenomes) using a Pfam identifier returned a bit over 2000 unique protein sequences. Weirdly, half of them do not start with a methionine, and the majority of them are really short. How is this possible? Is there a way to retrieve the full-length sequences?

On a related note: is it possible to retrieve, from the protein IDs that I have identified, the original full-length metagenomic sequences?

Thanks in advance,

Claudèle

ikan...@gmail.com

Feb 4, 2021, 8:17:02 PM
to IMG User Forum, claud...@gmail.com
Hi,

The IMG-ER annotation pipeline may translate the alternative start codons GTG and TTG as Val and Leu, respectively, rather than as Met.
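
One rough way to check this hypothesis against a set of downloaded sequences is to tally the first residue of each protein: if the non-Met sequences mostly start with V or L, alternative start codons are a plausible explanation; if they start with many different residues, something else is going on. The sketch below is only an illustration. The function name and the category labels are made up for this example, and it assumes the sequences are already loaded as plain strings.

```python
from collections import Counter

def classify_first_residues(proteins):
    """Tally sequences by first residue.

    Categories (labels are hypothetical, chosen for this sketch):
      "M"      - starts with methionine
      "V_or_L" - starts with Val or Leu, as expected if a GTG/TTG
                 start codon were translated literally
      "other"  - any other first residue
    Empty sequences are skipped.
    """
    counts = Counter()
    for seq in proteins:
        if not seq:
            continue
        first = seq[0].upper()
        if first == "M":
            counts["M"] += 1
        elif first in ("V", "L"):
            counts["V_or_L"] += 1
        else:
            counts["other"] += 1
    return counts

# Toy input, not real IMG data.
example = ["MKTAYI", "VLSPAD", "PQFSA", "LTTGE"]
print(classify_first_residues(example))
```

A large "other" count would suggest that alternative start codons alone do not explain the missing methionines.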

I posted a question on this issue about a year ago, but never got an answer.

Ilnam

On Friday, February 5, 2021 at 3:28:24 AM UTC+9, claud...@gmail.com wrote:

Claudèle Lemay-St-Denis

Feb 8, 2021, 1:49:14 PM
to IMG User Forum, ikan...@gmail.com, Claudèle Lemay-St-Denis
Interesting hypothesis, though the segments I got start with many different amino acids (P, Q, F, S, A, etc.). Very weird!

Claudèle

Rekha Seshadri

Feb 8, 2021, 1:55:04 PM
to Claudèle Lemay-St-Denis, IMG User Forum, ikan...@gmail.com

Since these are encoded on metagenomes, they might be fragments, or incomplete CDSs at the ends of the contigs.
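
If gene coordinates and contig lengths are available, this can be checked by flagging genes that run up to a contig boundary. The following is a minimal sketch, not an IMG tool: the function name, the 1-based inclusive coordinate convention, and the 3-nucleotide margin are all assumptions made for illustration, and the coordinates would have to come from whatever export format you use.

```python
def is_at_contig_edge(start, end, contig_length, margin=3):
    """Return True if a gene lies within `margin` nt of either contig end.

    Assumes 1-based, inclusive coordinates with start <= end.
    A gene that touches a contig boundary may be a truncated CDS,
    which would explain a short protein with no initial Met.
    """
    near_left = start <= margin
    near_right = end > contig_length - margin
    return near_left or near_right

# Toy coordinates, not real IMG data: (start, end, contig_length)
genes = [(1, 300, 1000), (100, 400, 1000), (700, 1000, 1000)]
for start, end, length in genes:
    print(start, end, is_at_contig_edge(start, end, length))
```

Genes flagged this way cannot be extended to full length from the assembly alone, because the rest of the coding sequence lies beyond the end of the contig.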