Data science in synthetic biology

osazuwa

Feb 10, 2013, 11:03:27 AM

to diy...@googlegroups.com

I do research on computational statistics methods for problems in systems biology. I am in a very small group of mathematicians and statisticians doing bioinformatics and systems biology work, in an institute full of computer scientists working on machine learning and NLP, social network computing, and distributed computing. qcri.org.qa if you are interested.

Lots of my colleagues are interested in genomics, where knowing how to work with big data is important, and in network biology, where methods from social network computing can be applied. They are also interested in the hackerspace movement, especially the concept behind the big data hackerspace Hack/Reduce.

How can data scientists get involved in synthetic biology? Is in silico prediction of how parts will behave a data science application? Analyzing genome databases to find and define new parts? Is there a role for data scientists in synthetic biology startups?

Dakota Hamill

Feb 10, 2013, 12:13:40 PM

to diy...@googlegroups.com

Predicting how proteins fold would be of tremendous use, as would reverse engineering, so to speak, from a substrate to an enzyme.

If I had a particular molecule that I needed cleaved in half, or phosphorylated, or oxidized, or reduced, or hydroxylated, and it needed to be done in an organism and not in a test tube, who could build that enzyme from scratch?

Trying to model, say, 20-30 amino acids and their functional groups as a "pocket" for the substrate to fit in might be doable. But now add in the 200 other amino acids that make up the rest of the protein, and then figure out where to put your 20-30 amino acids in that sequence so that when the protein finally folds into its final structure, your active site is still accessible to the target substrate and the enzyme can still carry out the intended function. That's where it gets crazy!

Proteomics is very intriguing because after all, enzymes do most of the hard chemistry in living organisms. You're talking enzymes that work at room temperature and pressure and can do chemistry that even the best inorganic catalyst chemists can't mimic, nor can the best organic synthetic chemists make your molecule from scratch at anything above a 1% yield.

Being able to run billions of simulations on protein folding, or to reverse engineer a protein from the 3D structure of a substrate, would be amazing, and people are already working on it (see any of the 3D protein folding games and initiatives you can donate your CPU time to).

The part I've never actually seen or read about is how do you go about writing the algorithms to do that? THAT has to be some insane stuff. Combining statistics, who knows what other kinds of crazy math, as well as physics and chemistry.

But then at the same time, nature is running trillions of iterations a day on finding new enzymes that do something better, and nature is cutting out the whole software computer calculations side.

I think there is something to be said for doing the millions or billions of screening runs with the actual hardware, because at the end you have your finished product, so to speak.

Now nature may do these iterations in a random fashion, but through site-directed mutagenesis and other methods, you could make many, many protein mutants a day and screen them yourself as well.

I guess what I'm trying to say is, you could do tons of planning and simulations, and hope that when you finally start building your enzyme based on the calculations it does what you want in a real environment, or you could build a million enzymes that might work in a real environment, and hope for the one that does.

So what's faster and cheaper to do? I don't know.
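For a sense of scale on the "build a million enzymes" side, here is a back-of-the-envelope sketch in Python. It assumes the 20 standard amino acids, a pocket whose positions are all fully randomized, and a library in which every variant is distinct. The function names are made up for illustration. It does not settle which approach is faster or cheaper; it only shows how quickly sequence space outgrows a library.

```python
AMINO_ACIDS = 20  # assumption: only the 20 standard amino acids are used


def sequence_space(positions):
    """Number of distinct sequences when `positions` residues are fully randomized."""
    return AMINO_ACIDS ** positions


def library_coverage(positions, library_size):
    """Upper bound on the fraction of sequence space a library can cover.

    Assumes every library member is a distinct variant, which is optimistic.
    """
    return min(1.0, library_size / sequence_space(positions))


def max_fully_covered_positions(library_size):
    """Largest number of randomized positions a library could cover exhaustively."""
    n = 0
    while AMINO_ACIDS ** (n + 1) <= library_size:
        n += 1
    return n


if __name__ == "__main__":
    library = 10 ** 6  # "a million enzymes"
    print("positions a million variants can cover fully:",
          max_fully_covered_positions(library))
    for pocket in (5, 20, 25, 30):
        print(pocket, "positions ->", sequence_space(pocket), "sequences,",
              "coverage <=", library_coverage(pocket, library))
```

Under these assumptions a million variants exhaust only 4 randomized positions (20^4 = 160,000), while a 25-residue pocket has 20^25, about 3.4 x 10^32, possible sequences. That gap is why targeting a handful of positions, as with site-directed mutagenesis, matters more than raw library size.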

And on one last note, I think an interesting role for any comp stats or bioinformatics person would be to build a way to find similarities between enzymes of known structure, function, and interest and unknown sequences. I'm guessing that can already be done with BLAST searches, but that generally means you have a short sequence you want to check against a large database.

If you could work in reverse it would be better.

Example:

We've just isolated a new bacterial strain with a 5 million base pair genome and had the entire genome sequenced. It was isolated near a thermal vent near the bottom of the ocean. We believe it could have a new polymerase enzyme for use in PCR and perhaps some new restriction endonucleases.

Now, you could do shotgun cloning and spend tons of time in the lab just trying to get the one gene that codes for a new restriction enzyme to actually express itself, then you'd have to purify that one single fraction of the enzyme, then digest DNA and check for a unique digestion pattern...or

You could take your sequences of known restriction enzymes and run them against your new database (5 million bases of your new sequenced organism) and check for sequence homology.

Oh look, these 3 regions of DNA look an awful lot like other restriction enzymes, with a few modifications. Let's PCR them out to isolate them, and check them for new properties!
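A rough sketch of that kind of homology check, in plain Python. It translates the genome in all six reading frames using the standard genetic code and counts short peptide words (k-mers) shared with each known enzyme. Everything here is an assumption for illustration: the function names, the `k` and `min_hits` thresholds, and the toy sequences are made up, and exact k-mer matching is a crude stand-in for real alignment. For actual work, a translated search such as tblastn from NCBI BLAST+ is the proper tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unrecognized codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(genome):
    """Yield (strand, offset, protein) for all six reading frames."""
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_candidates(genome, known_proteins, k=4, min_hits=3):
    """List (name, strand, offset, shared) for frames sharing >= min_hits k-mers."""
    frames = list(six_frames(genome.upper()))
    hits = []
    for name, protein in known_proteins.items():
        query = kmers(protein, k)
        for strand, offset, frame in frames:
            shared = sum(1 for i in range(len(frame) - k + 1)
                         if frame[i:i + k] in query)
            if shared >= min_hits:
                hits.append((name, strand, offset, shared))
    # Strongest candidates first.
    return sorted(hits, key=lambda h: -h[3])


if __name__ == "__main__":
    # Toy data: a made-up 10-residue "enzyme" hidden in a short made-up genome.
    toy_genome = "CCATGAAAACCGCGTATATTGCGAAACAGCGTTAAGGG"
    print(find_candidates(toy_genome, {"toy_enzyme": "MKTAYIAKQR"}))
```

On a real 5 million base pair genome you would scan in windows and report coordinates rather than whole frames, and you would want a scoring scheme that tolerates substitutions, since the interesting hits are the ones "with a few modifications".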

In bioprospecting for new enzymes, I could see bioinformatics having a big impact: it could drastically reduce the tedious work in the lab and let you focus your lab time on particular areas of interest in a new genome.

But that's my 2 cents; I don't know jack about statistics or crazy algorithms. I wish you luck! Write some algorithms for bioprospecting and I'll get you the organisms to test!

Jonathan Cline

Feb 15, 2013, 9:22:55 PM

to diy...@googlegroups.com, jcline

Relevant conference below. Have you heard of it before?

This is the second call for papers for the 19th DNA Computing and Molecular Programming Conference (DNA19). The conference will take place in Tempe, AZ, USA, September 22-27, 2013.

Detailed information about the conference and submission tracks can be found below, is attached as a pdf, and is also available at the conference web page: dna19.biodesign.asu.edu.

All papers and abstracts should be submitted electronically following the instructions and link at the conference web page. Please note that the submission site is now open and will close on April 15, 2013.

Registration will open on May 20, 2013 and the registration deadline is July 20, 2013.
