Enzymatic elimination of errors from extremely low-cost, very large chemical DNA synthesis using error correction codes

Bryan Bishop

May 17, 2017, 9:00:10 PM
to enzymaticsynthesis, Bryan Bishop
I was recently at the GP-write meeting in NY where a number of folks are interested in advancing DNA synthesis technology to the point where it would be feasible to synthesize all the human chromosomes without breaking the bank.

Some background on that:
http://diyhpl.us/~bryan/papers2/bio/The%20human%20genome%20project%20-%20write,%20hgp-write%20-%202016.pdf

Unfortunately I did not hear any proposals for technologies that would bring the cost down and the yield up. I would like genome synthesis to be in the price range of, say, $1 per genome, and so far there's not much on the horizon that could make the chemistry that cheap. There are some speculative ideas around engineered polymerase enzymes that respond to optical and electrical control, but we don't really know how to build them yet, other than "use lots of azobenzene".

Kent Kemmish mentioned an interesting idea the other day. He proposed that instead of focusing on error-free DNA synthesis, we look for more ways to make use of high-error-rate DNA synthesis. That got me thinking, especially because of bech32, the error-detecting address format proposed in bitcoinland for segwit addresses, believe it or not. (In fact, I had a dream a few months ago about Blockstream entering the biology industry. I was trying to convince them that ECDSA crypto proteins were not yet practical, but I was eager, in my dream, to hear they were interested in moving into biology.) Anyway, there has been recent interest in encodings and error correction codes for data storage in DNA molecules, including from the semiconductor industry, but these techniques do not work for biological information: the stored data is entirely non-executable and can only be reconstituted after DNA sequencing.

Instead, what about encoding biological information using an error correction code? With oligonucleotide synthesis, quite high error rates could be tolerated, depending on the encoding technique. The magic here is to convert from encoded DNA to decoded DNA using enzymes. The necessary enzymes so far seem easier to engineer than our wacky optical/electronic polymerase control concepts...
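To make "tolerating errors through encoding" concrete, here is a minimal sketch using the simplest possible code: write every base three times and decode by majority vote. This is only an illustration, not the scheme proposed above. It assumes substitution errors only (no insertions or deletions, so no frame shifts), and the function names are made up for this example.

```python
from collections import Counter


def encode_repetition(seq: str, n: int = 3) -> str:
    """Write every base n times in a row (hypothetical toy encoding)."""
    return "".join(base * n for base in seq)


def decode_repetition(encoded: str, n: int = 3) -> str:
    """Majority-vote each block of n bases back to a single base.

    Ties (possible when two copies are wrong in different ways) are
    resolved in favour of the base seen first in the block.
    """
    if len(encoded) % n != 0:
        raise ValueError("encoded length must be a multiple of n")
    decoded = []
    for i in range(0, len(encoded), n):
        block = encoded[i:i + n]
        decoded.append(Counter(block).most_common(1)[0][0])
    return "".join(decoded)


def residual_error_bound(p: float) -> float:
    """Upper bound on the per-base failure rate after 3x majority voting.

    A block is decoded correctly whenever at least two of its three
    copies are intact, so with independent substitutions at rate p the
    failure probability is at most 3p^2 - 2p^3.
    """
    return 3 * p**2 - 2 * p**3


target = "ATGGCC"
encoded = encode_repetition(target)   # AAATTTGGGGGGCCCCCC
damaged = "ACATTGGGGAGGCCCCTC"        # one substitution in four of the six blocks
recovered = decode_repetition(damaged)
```

Under these assumptions a raw substitution rate of 10% per base drops to at most 2.8% per decoded base, at the price of synthesizing three times as much DNA. Real codes do far better per extra base, and the hard part remains doing the decoding with enzymes rather than in software.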

Here are some speculative ways that this could work.

Iterative in vitro enzyme decoding of an encoded DNA molecule: in this technique, there would be a few hundred different zinc finger recombinase fusion proteins that would find "encoded" segments of DNA and convert them into decoded segments. The enzyme reaction yield would have to be really high in order to fix encoding errors. There would also have to be "NOPs" (no-ops, i.e. no operations), maybe special unnatural nucleotides, to help with reading frame shift problems. Each enzyme would also have to match a variable fragment of DNA that only roughly approximates its target, because the target probably has errors due to using error-prone DNA synthesis techniques in the first place. After each enzyme reaction, the DNA molecule would become progressively less encoded and more decoded, and at the end it should be an entirely decoded molecule that should work in any organism. The enzymes for this would probably be some mix of fusion proteins based on cas9, recombinases, zinc finger nucleases, zinc finger recombinases, TALENs, etc. Really we need a library of basic string operations in protein machinery to operate on DNA and transform it into whatever result we want, using DNA as an intermediate memory tape device between each operation in vitro (or eventually in vivo but whatever).
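The iterative decoding idea can be sketched in software as a series of fuzzy find-and-replace passes over a tape. Everything specific here is an assumption made for illustration: the three motif/product rules are invented, the reading frame is taken as known (the tape is already split into blocks), and only substitution errors are modelled. Each hypothetical "enzyme" rewrites any block within one mismatch of its motif.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def apply_enzyme(tape, motif, product, max_mismatches=1):
    """One 'enzyme' pass: rewrite every block that fuzzily matches motif."""
    return [
        product
        if len(block) == len(motif) and hamming(block, motif) <= max_mismatches
        else block
        for block in tape
    ]


def decode_iteratively(tape, rules, max_mismatches=1):
    """Apply each enzyme in turn; return the tape after every step."""
    history = [list(tape)]
    for motif, product in rules:
        tape = apply_enzyme(tape, motif, product, max_mismatches)
        history.append(tape)
    return history


# Hypothetical rules: 6-base encoded motif -> 3-base decoded segment.
# The motifs differ from each other in at least 5 positions, so a block
# with a single error can never match the wrong enzyme.
RULES = [("AACCAA", "ATG"), ("GGTTGG", "GCT"), ("CTCTCT", "TAA")]

# Encoded tape containing two synthesis errors (blocks 1 and 2).
tape = ["AACGAA", "GGTTGC", "GGTTGG", "CTCTCT"]
history = decode_iteratively(tape, RULES)
decoded = "".join(history[-1])
```

Each entry of `history` is the tape after one more enzyme reaction, so the molecule becomes progressively less encoded. Dropping the known-frame assumption is where it gets hard: once the tape is one continuous string, a fuzzy match can straddle the boundary between an already-decoded segment and its neighbour, which is presumably what the NOP spacers would have to prevent.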

Magic codons: we could lengthen codons to be more than 3 letters long, with some tolerance for errors. These codons might be more practical for error-prone DNA synthesis. We might also be able to make polymerases that convert from X bp per codon to 3 bp per codon. This way, if you get one of the base pairs wrong in the codon that you're writing, you will still be able to get a fuzzy match, in many cases, to the correct tRNA synthetase and amino acid.
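One way to picture a magic codon, as a sketch only: take X = 5 and append two check bases to each ordinary codon using the [5,3] Hamming code over GF(4). That particular code is a choice made here for illustration, not something proposed above. It happens to have exactly 64 codewords, one per natural codon, and any single wrong base out of the five still decodes to the right 3-base codon. The mapping of A, C, G, T onto field elements 0 to 3 is arbitrary.

```python
from itertools import product

BASES = "ACGT"  # arbitrary assignment of bases to GF(4) elements 0..3

# GF(4) multiplication table; addition in GF(4) is bitwise XOR.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def encode_codon(codon: str) -> str:
    """Extend a 3-base codon to a 5-base codeword with two check bases."""
    m1, m2, m3 = (BASES.index(b) for b in codon)
    p1 = m1 ^ m2 ^ m3                            # m1 + m2 + m3
    p2 = m1 ^ GF4_MUL[2][m2] ^ GF4_MUL[3][m3]    # m1 + 2*m2 + 3*m3
    return codon + BASES[p1] + BASES[p2]


# All 64 magic codons, mapped back to the ordinary codon they stand for.
CODEBOOK = {
    encode_codon("".join(c)): "".join(c) for c in product(BASES, repeat=3)
}


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def decode_magic_codon(word: str) -> str:
    """Fuzzy match: return the 3-base codon of the nearest codeword."""
    nearest = min(CODEBOOK, key=lambda codeword: hamming(codeword, word))
    return CODEBOOK[nearest]


magic = encode_codon("ATG")
# Corrupt the second base, then decode back to the 3-base codon.
damaged = magic[0] + "C" + magic[2:]
restored = decode_magic_codon(damaged)
```

The design choice worth noting is that the first three bases of each magic codon are the natural codon itself, so the X-to-3 conversion is "repair, then drop the last two bases". Two substitutions in one codon, or any insertion or deletion, are beyond what this sketch can fix.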

Checksum enzymes: in the future it would be nice to have checksums and hash functions in tRNA synthetases or in polymerases or DNA repair enzymes to check that the DNA molecule is still correct based on the checksums.
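For a sense of what a checksum on a DNA molecule might look like, here is a minimal software sketch. The scheme is an assumption made up for this example: sum the base values modulo 256 and append the result as four extra bases. It detects any single substitution but does not locate or repair it, and it says nothing about how an enzyme would compute it.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3 (arbitrary assignment)


def checksum_bases(seq: str, n_check: int = 4) -> str:
    """Sum of base values modulo 4**n_check, written as n_check bases."""
    total = sum(BASES.index(b) for b in seq) % (4 ** n_check)
    digits = []
    for _ in range(n_check):
        digits.append(BASES[total % 4])
        total //= 4
    return "".join(reversed(digits))


def add_checksum(seq: str, n_check: int = 4) -> str:
    """Append the checksum bases to the payload sequence."""
    return seq + checksum_bases(seq, n_check)


def verify(tagged: str, n_check: int = 4) -> bool:
    """True if the trailing checksum bases match the payload."""
    payload, tag = tagged[:-n_check], tagged[-n_check:]
    return checksum_bases(payload, n_check) == tag


tagged = add_checksum("ATGGCTGCTTAA")
intact_ok = verify(tagged)
# A single substitution (first base A -> G) should fail verification.
mutated_ok = verify("G" + tagged[1:])
```

A plain sum like this misses many multi-base errors, for example two substitutions that cancel out or two bases swapping places. bech32 uses a much stronger BCH-based checksum for exactly that reason.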

Ribosome repurposing: maybe we could convert a ribosome to construct DNA molecules or oligos or RNA or something; the tRNA synthetase machinery already translates from codons (3 bp) to a single amino acid. This is similar to mechanisms required for information decoding or error correction codes.

Many open questions still remain-- and obviously it's highly speculative. However, making use of cheap DNA synthesis with errors would allow us to build genomes at an extremely low cost using technology that already exists today... minus the magic enzymes and magic codons.

some relevant logs:
 

Ujjwal Thaakar

May 20, 2017, 10:00:34 AM
to Bryan Bishop, enzymatic...@googlegroups.com
Sounds extremely interesting. I'll have to read up a lot to understand this better. I still did not get a complete idea of what you meant. 

--
You received this message because you are subscribed to the Google Groups "enzymaticsynthesis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to enzymaticsynthe...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--

Ujjwal Thaakar

Chief Everything Officer

Kesar | Linkedin | Twitter

Bryan Bishop

Aug 14, 2017, 10:57:40 AM
to enzymaticsynthesis, Bryan Bishop
On Wed, May 17, 2017 at 8:00 PM, Bryan Bishop <kan...@gmail.com> wrote:
Instead, what about encoding biological information using an error correction code?

or use error correction enzymes:

"A systematic comparison of error correction enzymes by next-generation sequencing"

Jelmer Cnossen

Aug 15, 2017, 7:35:40 AM
to enzymatic...@googlegroups.com

Interesting ideas. I really like the encoded DNA step, because it allows for new and different ways to get the sequence data in from the outside. For example, it allows hacking DNA ligase instead of all the TdT ideas. You no longer have to stick to single letters in the first step.

One addition I was thinking about is to have the decoded DNA strand consist only of 1-bit data.
I need to read up on the speculative protein engineering ideas, so maybe this is already in there.

Engineering a protein that responds to light is a smaller task if you don't have to make 4 different ones (or have it respond to something else in 4 different ways). You could engineer some light-controlled proteins to make a strand of 1-bit data, consisting of short DNA templates that only contain 2 letters, say A and C. Or use only 2 kinds of short DNA oligos and join them using a ligase that is blocked by a light-controlled protein (any ideas?).

After generating this (with many errors), you have another set of proteins that take the 1-bit encoded DNA and convert pairs of the short segments (1 bit) in there to the decoded DNA (2 bit data). As you already pointed out there are a few techniques that would fit with that decoding step. 
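As a small sketch of that conversion step: if the 1-bit strand uses only A and C, each pair of letters carries two bits and can be mapped to one of the four normal bases. The particular mapping below is an arbitrary assumption, and for simplicity each bit is a single letter rather than a short oligo segment.

```python
# Hypothetical mapping: A = bit 0, C = bit 1; two bits select one base.
PAIR_TO_BASE = {"AA": "A", "AC": "C", "CA": "G", "CC": "T"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}


def to_one_bit(seq: str) -> str:
    """Write a normal 4-letter sequence using only the letters A and C."""
    return "".join(BASE_TO_PAIR[base] for base in seq)


def to_two_bit(one_bit: str) -> str:
    """Convert pairs of 1-bit letters back into normal bases."""
    if len(one_bit) % 2 != 0:
        raise ValueError("1-bit strand must contain an even number of letters")
    pairs = (one_bit[i:i + 2] for i in range(0, len(one_bit), 2))
    return "".join(PAIR_TO_BASE[pair] for pair in pairs)


one_bit = to_one_bit("ATG")
round_trip = to_two_bit(one_bit)
```

The 1-bit strand is twice as long as the final sequence, which is the price for needing only two kinds of controllable building block. This sketch has no error tolerance of its own, so any error correction would have to be layered on top of it.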

Now you've moved some of the complexity away from the part that processes the sequence information from external input, which might be the hardest part to get accurate. Of course it is still a crazy difficult problem then, but maybe just slightly more achievable. 

- Jelmer

