Feedback, ideas and collaborations welcome.
Last year at Science Hack Day SF I led an ardent band of marauders into battle in an attempt to create a simple method to sequence DNA. Though we accomplished some stuff, it was a lot of work! This year I just wanted to hang out and relax and code for 8 or 10 hours straight.
The project I decided to work on was the amino acid dependence of secondary structure lengths in proteins. Proteins, as we all know and love, are little nanomachines. In order to better engineer and create proteins from scratch, we need to understand how they have evolved and why. The smaller parts that make up proteins (i.e. alpha helices, beta sheets, &c.) are made up of specific amino acids with specific properties. I wanted to take that one step further, as I have never seen a paper or heard much about how amino acid composition depends on the size of a secondary structure element. One could imagine that a shorter helix might be less tolerant than a longer helix of amino acids that are less helical.
Protein model of lysozyme: alpha helices are in teal, beta sheets are in red, loops are in purple.
I downloaded the amino acid sequence and secondary structure of each protein in the PDB (http://www.rcsb.org) and then wrote code (code is here) to parse and quantify the data. In the end I was a little disappointed. It doesn't seem like many, if any, amino acids really have a dependence on secondary structure length.
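For anyone who wants to try something similar, here is a minimal sketch of the kind of counting described above. It is not the code I linked; it assumes each protein is available as two equal-length strings (the amino acid sequence and a per-residue secondary structure string where `H` marks a helix residue), and the function names and the toy record are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import groupby


def helix_segments(sequence, secstr, helix_code="H"):
    """Yield the amino acids of each contiguous helix run.

    Assumes `sequence` and `secstr` are aligned residue by residue.
    """
    if len(sequence) != len(secstr):
        raise ValueError("sequence and secondary structure must align")
    pos = 0
    for code, run in groupby(secstr):
        length = len(list(run))
        if code == helix_code:
            yield sequence[pos:pos + length]
        pos += length


def composition_by_length(records):
    """Map helix length -> Counter of amino acids seen in helices of that length."""
    counts = defaultdict(Counter)
    for sequence, secstr in records:
        for segment in helix_segments(sequence, secstr):
            counts[len(segment)].update(segment)
    return counts


def fraction(counts, length, amino_acid):
    """Fraction of residues in helices of `length` that are `amino_acid`."""
    bucket = counts.get(length, Counter())
    total = sum(bucket.values())
    return bucket[amino_acid] / total if total else 0.0


# Toy record (made up): one helix of length 5 spanning "AAELL".
toy_records = [("MKAAELLKG", "CCHHHHHCC")]
toy_counts = composition_by_length(toy_records)
print(fraction(toy_counts, 5, "A"))
```

Plotting `fraction` against helix length for each amino acid gives the sort of curve shown in the graph. With real data you would probably want to bin the rarer long helices together, since a bucket with only a handful of residues gives noisy percentages.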
Here is a graph of the ones that seem to change the most.
Alanine (A) might have some dependence, and the same goes for Proline (P), though I think Proline is an artifact (helices tend to end with prolines as a helix breaker; if this helix-breaking residue is included in the count, it can skew the percentages for shorter helices).
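To make the suspected Proline artifact concrete, here is a small sketch with made-up helices. It assumes the helix breaker is the last residue of the annotated helix and simply drops it before counting; `drop_c_cap` and `proline_fraction` are hypothetical helpers, not something from my original code, and I have not run this check on the real data.

```python
def drop_c_cap(segment, n_cap=1):
    """Return the helix without its last `n_cap` residues (assumed breaker position)."""
    return segment[:-n_cap] if n_cap > 0 else segment


def proline_fraction(segments):
    """Fraction of all residues in `segments` that are proline (P)."""
    total = sum(len(s) for s in segments)
    prolines = sum(s.count("P") for s in segments)
    return prolines / total if total else 0.0


# Made-up helices: each one ends in a single proline.
short_helix = "AELKP"            # 5 residues
long_helix = "A" * 19 + "P"      # 20 residues

# One terminal proline weighs much more in the short helix.
print(proline_fraction([short_helix]), proline_fraction([long_helix]))

# After trimming the last residue the difference disappears.
print(proline_fraction([drop_c_cap(short_helix)]),
      proline_fraction([drop_c_cap(long_helix)]))
```

A single terminal proline is 1 of 5 residues in the short helix but only 1 of 20 in the long one, so the short-helix percentage looks four times higher even though both helices have exactly one breaker.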
Anyways, I want to look at beta sheets next and then maybe the protein size dependence of the amino acid dependence of secondary structure length. Or even the secondary structure length dependence of the protein size dependence.
Josiah Zayner