Closest expected structure excluded from modeling templates

Michael Schultz

Jan 28, 2021, 2:31:51 PM

to Phyre

Hi. I obtained a model of a vertebrate GAPDH that is a very close relative of human somatic-type GAPDH, using the normal modeling mode. There are 17 published structures of human somatic-type GAPDH. All are of the homotetramer, which is the biologically relevant assemblage. In the PSI-BLAST results for my job, the most closely related protein is human somatic GAPDH, as expected. Strangely, no structural information from the human somatic-type homotetramer is listed in Template Information for my target model. I initially thought that Phyre2 could not use structural information for single chains when that information is found in .pdb files of multimeric assemblages. However, a model with 100% confidence and 71% identity is based on a chain from a homotetramer structure (PDB header: oxidoreductase; Chain: O; PDB Molecule: glyceraldehyde-3-phosphate dehydrogenase, testis-specific; PDBTitle: crystal structure of human sperm-specific glyceraldehyde-3-phosphate dehydrogenase (gapds) complex with nad and phosphate; probably 3H9E). I think I am making some sort of rookie mistake but can't see it right now. Thanks, MS

Powell, Harold

Jan 29, 2021, 5:10:51 AM

to Phyre

Hi

First off, I should introduce myself - I've taken over from Lawrence Kelley as the maintainer/developer of Phyre2; please bear with me while I find my feet and am not able to give definitive answers quite so readily (Lawrence worked on Phyre for ~20 years and I have only been "in charge" (whatever that means) since the end of November 2020; our overlap time was affected by current events...).

To answer your question in a general way: without searching the PDB for human somatic GAPDH, I don't know which PDB IDs relate to these proteins, so if you could give the PDB IDs of the expected matching structures, it would save me a lot of work.

Having said that, it wouldn't surprise me if none of the structures you might expect are included in our fold library. To save time when finding homologous models, we don't store the entire PDB in our fold library, just a large set of proteins with known folds (which was originally based on SCOP70 but has expanded over the years). It's interesting (and was predicted by people like Cyrus Chothia and Sarah Teichmann many years ago) that most folds seem to have been discovered already (very few new folds have been added to the PDB over the last 10 years, for example).

It's likely that the 3H9E structure is similar to the human somatic GAPDH structures you expected.

Your "initial thought" was incorrect - as you found out, the fold library does contain information about individual chains in multimeric complexes. So, if you see a hit with a name like "c3g73A_" the "c" stands for "chain", the "3g73" is the PDB ID, and the "A" means the "A" chain in that entry (which actually has two polypeptide chains and a bit of DNA...).
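As a rough illustration of that naming scheme, here is a minimal Python sketch that splits a hit name of this form into its PDB ID and chain. The function name `parse_chain_hit` and the exact pattern (a leading "c", a four-character PDB ID, one chain character, a trailing underscore) are assumptions drawn only from the "c3g73A_" example above; other kinds of fold library entries may be named differently and are not handled here.

```python
import re
from typing import NamedTuple


class ChainHit(NamedTuple):
    pdb_id: str  # four-character PDB ID, e.g. "3g73"
    chain: str   # single-character chain label, e.g. "A"


# Assumed layout, inferred from the single example "c3g73A_":
#   "c" + 4-character PDB ID + 1-character chain + "_"
_CHAIN_HIT_RE = re.compile(r"^c([0-9][A-Za-z0-9]{3})([A-Za-z0-9])_$")


def parse_chain_hit(name: str) -> ChainHit:
    """Split a chain-style hit name into (pdb_id, chain).

    Raises ValueError if the name does not match the assumed layout.
    """
    match = _CHAIN_HIT_RE.match(name)
    if match is None:
        raise ValueError(f"not a chain-style hit name: {name!r}")
    return ChainHit(pdb_id=match.group(1), chain=match.group(2))


if __name__ == "__main__":
    hit = parse_chain_hit("c3g73A_")
    print(hit.pdb_id, hit.chain)
```

Under the same assumption, a hit on chain O of 3H9E would be expected to appear as something like "c3h9eO_"; that name is hypothetical here and should be checked against the actual Template Information listing.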

I hope this helps

Harry Powell

Powell, Harold

Jan 29, 2021, 5:29:31 AM

to Phyre

It's been pointed out to me off-board that if you sign in and go to "expert" mode, you can try "one-to-one threading", which uses your own PDB model and your sequence; Phyre will attempt to align the sequence and structure and construct a model of your sequence, so this might give you an answer more in accord with your expectations.

Harry
