Protein Unfolding Algorithms?

Cory J. Geesaman

Nov 22, 2017, 4:11:30 AM
to DIYbio
So this might seem outright insane given the computational requirements of protein folding, but I had a thought that I haven't been able to find any existing resources on, and it could be the basis for an actual biotech boom (i.e. not just copying and pasting stuff, but creating things).

The concept would be, to start: take a desired topology, put it into the computer, have the computer generate a matching docking profile, then run a sort of protein unfolding algorithm to generate the best-fitting protein for that docking profile. Eventually, add in things like active sites, inhibition sites, and any special functionality (e.g. which part of the protein should flex when something docks at some arbitrary site, which direction it should flex, and whether it should just change shape, open another site for something else to dock in, or eject something from another site, etc.).

I figure starting with topology --> docking site --> protein --> DNA would be a really good start and would likely be pretty useful, but adding in the other stuff would then be the basis for a code --> DNA compiler (or one of the smaller units of a really powerful biological IDE that could let you build complete systems from proteins on up through organs, transmitters, and such).

Any thoughts on this are welcome (especially if some or all of this already exists in open source form that I can hack on). I decided to heat my house this winter by running Folding@home, but the curecoins generated still leave me $3/day in the hole on electricity, so I might as well play with other computationally intensive genomics.

John Griessen

Nov 22, 2017, 10:20:38 AM
to diy...@googlegroups.com
On 11/22/2017 03:11 AM, Cory J. Geesaman wrote:
> Decided to heat my house this winter by running Folding@home but the curecoins generated are still $3/day in the hole for
> electricity so I might as well play with other computationally intensive genomics.

Crypto coin mining seems to be giving us a clear signal that we need super-efficient solar converters going directly from sunlight to computer driving voltages like 12V, 3V, and 1.8V, along with motherboards that have extra DC input terminals so their AC-DC converters can sit in idle mode whenever solar voltage is coming in. The low-voltage wiring between the panel on the roof and the computer's input terminals has easy electrical code requirements compared to 120VAC.

About the topology program you're thinking of: Nathan is doing some experimenting with DNA as a storage medium that aligns with that somewhat.

Bryan Jones

Nov 22, 2017, 10:22:55 AM
to diy...@googlegroups.com
Cory, I'm not sure if I completely follow your logic, but it sounds like you might be describing something like Rosetta, which can be used to design proteins from scratch based upon a given fold/topology or desired active site. It's far from perfect but keeps getting better. There are lots of people working on it, but the Rosetta project is spearheaded by David Baker at the University of Washington. I'm not sure if it is open source, but it is at least free to use for personal or non-profit uses.
You can check it out here: https://www.rosettacommons.org/software
Here's one recent paper that used Rosetta to design a new TIM barrel protein:

Cory J. Geesaman

Nov 22, 2017, 12:57:14 PM
to DIYbio
Bryan,
    Unless I'm misunderstanding their documentation, it looks like Rosetta operates in the sequence --> folding --> docking direction, with a bit more functionality on the docking side. I'm looking to go from docking --> structure prediction/space filling --> unfolding --> sequence. I'm looking at proteins as more or less a parallel assembly language or machine code for biotech, and trying to compile from code --> organism instead of the more traditional route of decompiling organism --> code. Is there a specific page in the Rosetta documentation that covers such a case?

John Griessen

Nov 22, 2017, 1:03:01 PM
to diy...@googlegroups.com
On 11/22/2017 09:22 AM, Bryan Jones wrote:
> I'm not sure if it is open source

Reading the license terms pages, it is $40K for commercial use, so it's definitely not free-licensed, and I saw no mention of the source code either.

Cory J. Geesaman

Nov 22, 2017, 1:11:52 PM
to DIYbio
To put it another way, think of the arbitrarily stupid use case: sharks with laser beams coming from their eyes.

You would need a way to create proteins that are either covered in quantum wells or assist in the construction of a planar surface of quantum wells, with the required metallic traces to power the thing, in addition to cells designed to produce the electricity needed to power it (forgoing the possibility of chemical or dye lasers for this example, even though they might be easier). Then you would need new amino acids to allow for the structure of the actual quantum wells (think of something like selenocysteine, which in some organisms is encoded by what is normally a stop codon and has cysteine's structure with a selenium atom in place of the sulfur atom). At that point (assuming you've worked out the building blocks required to get the desired elements into side chains) you could seed the cell with the appropriate mRNA components and conceivably build it, but you would still have the really big problem of going from "this is the structure I need" to "this is the DNA which codes for the protein which will fold into that structure, given the prerequisite materials."

I'm trying to build an IDE that lets you use abstract concepts ("replicate N times, differentiate into these cell types, take on this superstructure, etc.") to compile DNA, and ultimately to compile complete chromosomes and organisms. A while back I was trying to tackle this with semantic-search-type methodologies applied to DNA as a whole, but after learning a bit more about the structure of DNA and the way it functions within an organism, it seems the only thing really missing (aside from likely lots of computing power to compile anything) is the ability to go from a desired structure to the DNA that codes for that structure. Everything else is either already built (e.g. DNA printers) or a relatively straightforward adaptation of computing (e.g. the abstractions involved, creating a system of compatible docking sites without overlap, etc.).


On Wednesday, November 22, 2017 at 10:22:55 AM UTC-5, Bryan Jones wrote:

Bryan Jones

Nov 22, 2017, 2:07:49 PM
to diy...@googlegroups.com
People have used Rosetta both ways. For example, the "game" Foldit uses Rosetta plus human intuition to figure out how proteins fold given their sequence. However, the bigger goal of Rosetta is to design new proteins: you start with a desired function/reaction -> define a topology or active-site geometry to catalyze the reaction -> find an overall protein fold that can accommodate the active site -> optimize the sequence -> make and test the new protein.

" it seems the only thing really missing (aside from likely lots of computing power to compile anything) is the ability to go from a desired structure to the DNA which codes for that structure"
It's even less than that. Going from a desired protein sequence -> DNA sequence is very straightforward, although you might need some optimization (promoters and the like) to get good expression levels. So the only really hard step is going from a desired docking site or structure to a protein sequence, and that's exactly what Rosetta is for. I think it's the best there is for doing this; however, it's still not that great. It typically requires trying dozens of designs to find one protein that does the desired job, even very poorly.
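As a rough illustration of why the protein sequence -> DNA step is the easy one, here is a minimal Python sketch of naive back-translation using the standard genetic code. The names (`back_translate`, `translate`, `BACK_TABLE`) are hypothetical, and picking the first codon (in TCAG order) for each amino acid is an arbitrary assumption; a real design would choose codons from the host organism's codon-usage table and add promoters, ribosome binding sites, etc., none of which is modeled here.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Naive reverse table: keep only the first codon seen for each amino acid.
# (Assumption for illustration; real codon choice depends on the host organism.)
BACK_TABLE = {}
for codon, aa in CODON_TABLE.items():
    BACK_TABLE.setdefault(aa, codon)


def back_translate(protein, add_stop=True):
    """Return one DNA coding sequence for a one-letter protein sequence.
    Raises KeyError on letters that are not standard amino acids."""
    dna = "".join(BACK_TABLE[aa] for aa in protein.upper())
    return dna + BACK_TABLE["*"] if add_stop else dna


def translate(dna):
    """Translate a DNA coding sequence codon by codon (ignores a trailing partial codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))
```

For example, `back_translate("MKW")` gives `"ATGAAATGGTAA"`, and translating that back gives `"MKW*"` (the trailing `*` being the appended stop codon).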

Here's a good paper that discusses how to use Rosetta to design new enzymes: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0019230 

--
-- You received this message because you are subscribed to the Google Groups DIYbio group. To post to this group, send email to diy...@googlegroups.com. To unsubscribe from this group, send email to diybio+un...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/diybio?hl=en
Learn more at www.diybio.org

Skyler Gordon

Nov 23, 2017, 12:42:37 PM
to diy...@googlegroups.com
The idea of ‘reverse engineering’ a protein is interesting, but most likely a fruitless venture. People often modify existing proteins instead (i.e. protein engineering), because the ‘shape’ you are looking for does not necessarily have much to do with your final catalytic result.

While the folding of a protein is critical, the shape does not necessarily define the catalytic functions of the active site. Your program would essentially rely on guessing a shape based on other known proteins, getting a ‘best fit’ shape, and then trying to produce it via ‘self-learning’ reassembly of trillions of different protein combinations. You’re going to need a distributed framework or your own server network.
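For a sense of scale, assuming only the 20 standard amino acids and an illustrative chain length of L = 100 residues (not a figure from the thread), the number of distinct sequences a brute-force search would face is:

```latex
N(L) = 20^{L}, \qquad
N(100) = 20^{100} = 10^{100 \log_{10} 20} \approx 10^{130.1} \approx 1.27 \times 10^{130}
```

That is vastly more than trillions (about 10^{12}), so in practice a search has to prune or sample the space rather than enumerate it.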

That being said, best of luck.

-SG

Nathan McCorkle

Nov 23, 2017, 5:34:15 PM
to diybio
On Wed, Nov 22, 2017 at 10:11 AM, Cory J. Geesaman <co...@geesaman.com> wrote:
> problem of going from "this is the structure I need" to "this is the DNA which codes for the protein which will fold into that structure, given the prerequisite materials."

I have been thinking about this sort of thing too, and because I wrote a tool last year for generating circuit-board schematics with constraint-based design rules, I ended up using a SAT solver... and have since searched the literature for SAT (or SMT) + biotech keywords. There are some interesting hits, for sure. I haven't gone through, for example, all of the pages here:

but I think this one is the closest sounding by the title:

In my thoughts on how I might approach this, one way to start would be to model a 3D space/grid at atomic or probably sub-atomic granularity. Then use constraints to say that some grid variables must be adjacent (maybe using the IUPAC name to traverse from each atom to the adjacent one), plus additional constraints to associate those locations with a charge value. You'd write this as generator code, so you have the means to generate copies of your amino acid models (adjacent grid points, with charge); you can offset the overall molecule position as well as rotate it, and set up constraints when you need a residue (or just a specific charge, or group of charges) to be in a specific location.

Oh yeah, I forgot: you'd also probably need a variable storing the location-to-location 'bendiness', i.e. how many degrees it can bend and/or rotate. This will be required for setting up constraints on the peptide bonds, since you know you need all these groups of locations+charge to be connected in a polymer.
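One way the 'bendiness' constraint could be expressed is sketched below in Python, assuming residue positions are 3D coordinates and that "bend" means the angle between consecutive bond vectors (0 degrees = perfectly straight). `bend_angles`, `bends_ok`, and the single `max_bend_deg` limit are hypothetical simplifications; real protein backbones are usually described by phi/psi torsion angles, which this does not model.

```python
import math


def bend_angles(coords):
    """Angle in degrees between each pair of consecutive bond vectors.
    Assumes consecutive coordinates are distinct (no zero-length bonds)."""
    angles = []
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        u = [b[k] - a[k] for k in range(3)]  # bond a -> b
        v = [c[k] - b[k] for k in range(3)]  # bond b -> c
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        # Clamp to [-1, 1] to guard against floating-point drift before acos.
        cos_t = max(-1.0, min(1.0, dot / norm))
        angles.append(math.degrees(math.acos(cos_t)))
    return angles


def bends_ok(coords, max_bend_deg):
    """True if no link in the chain bends more than max_bend_deg (hypothetical limit)."""
    return all(t <= max_bend_deg + 1e-9 for t in bend_angles(coords))
```

A straight chain such as `[(0, 0, 0), (1, 0, 0), (2, 0, 0)]` has a bend of 0 degrees, while turning a corner on a cubic grid gives 90 degrees, so `bends_ok(..., 60)` would reject the corner.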

You'd also need a constraint that says each grid location can only be used once, and probably another constraint that prevents molecules from being positioned like links in a chain (we know benzene rings don't behave like that)... that one could be a bit tough to implement... hmm.

Then you should basically be able to say "solve", and the constraint solver will produce a solution, which you'd have to parse back into a sequence of amino acids.
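A toy Python sketch of the grid-constraint idea follows, under heavy simplifying assumptions: one grid point per residue on a 3D cubic lattice (far coarser than the atomic grid described above), a hypothetical three-letter alphabet of residues carrying charge +1, -1, or 0, and plain backtracking search standing in for a real SAT/SMT solver. It encodes three of the constraints mentioned (consecutive residues adjacent, each grid location used once, required charges at given locations) and reads the solution back out as a sequence. The chain-link constraint is not modeled, and the search is exponential in the worst case.

```python
# Hypothetical toy alphabet: residue letter -> net charge (K/E/A chosen for illustration).
RESIDUES = {"K": +1, "E": -1, "A": 0}

# The six unit steps on a 3D cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def design(length, required, start=(0, 0, 0)):
    """Place `length` residues so consecutive residues are lattice-adjacent, no grid
    point is used twice, and every point in `required` (grid point -> charge) holds a
    residue with that charge. Returns (sequence, positions) or None if impossible."""
    positions, seq, used = [], [], set()

    def search(i):
        if i == length:
            # Every required point must actually be occupied by the chain.
            return all(p in used for p in required)
        if positions:
            # Prune: the remaining residues must still be able to reach
            # every required point that is not yet occupied.
            for p in required:
                if p not in used and manhattan(positions[-1], p) > length - i:
                    return False
            last = positions[-1]
            candidates = [tuple(a + b for a, b in zip(last, s)) for s in STEPS]
        else:
            candidates = [start]
        for pos in candidates:
            if pos in used:  # each grid location can only be used once
                continue
            for res, charge in RESIDUES.items():
                if pos in required and required[pos] != charge:
                    continue  # this location demands a specific charge
                positions.append(pos)
                seq.append(res)
                used.add(pos)
                if search(i + 1):
                    return True
                positions.pop()
                seq.pop()
                used.remove(pos)
        return False

    return ("".join(seq), list(positions)) if search(0) else None
```

For instance, `design(4, {(3, 0, 0): -1})` can only be satisfied by a straight chain, and returns `("KKKE", [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)])`; `design(3, {(3, 0, 0): 1})` returns `None` because three residues cannot reach that point.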

This doesn't account for solvents, salts, etc., though, so there's a lot to think about. That isn't to say that stuff couldn't be modeled; it's just more work, and I'm not sure how to deal with something like diffusion with a solver like this. One of the microfluidics design papers using SAT/SMT (in the directory at the URL above) talks about using piecewise solvers: one for SAT/SMT, in series with one for fluid dynamics (I think; it's been a while since I read these), and I think they fed back and forth to each other iteratively.



Nathan McCorkle

Nov 23, 2017, 5:39:20 PM
to diybio
On Thu, Nov 23, 2017 at 2:33 PM, Nathan McCorkle <nmz...@gmail.com> wrote:

> You'd write this as generator code

Here's an example of a work-in-progress 3D grid router, for microfluidics or circuit boards:

and another using a different solver and a different encoding style (each 'net' to be routed requires another grid's worth of variables to be created... which is not very scalable)
 
--
-Nathan

Cory J. Geesaman

Nov 24, 2017, 5:17:35 AM
to DIYbio
Yeah, the computing requirements will be enormous. I'm planning on starting with a small server (10 GPUs, 4-core CPU) and playing with similarly small proteins to test/debug anything I come up with.