I think I may have been misunderstood.
We already represent DNA in a linear form, even though we understand
that it *acts* nonlinearly. Indeed, DNA *is* linear, molecularly
speaking. When we design new DNA, we are generally designing new linear
molecules of code.
There is, of course, a pervasive misunderstanding in Synthetic Biology
that DNA can be programmed using the same idioms as von Neumann machine
code. I sometimes laugh, sometimes sigh whenever I see genetic systems
being touted as "NAND" or "NOR" gates that can enable combinatorial
logic and escalating complexity in synthetic biology, as if you could
use more than one in the same cell without causing crosstalk. Or,
indeed, as if it's even the most efficient way to design a
"programmable" cell.
What I was suggesting, rather, is that instead of writing the following
in an existing science-oriented markup format, which is dense, ugly and
hard to edit collaboratively (or even alone, under revision control):
=======================================================
> My DNA thing. Nucleotides 1-100 are domain 1. Nucleotides 101-150, and
151-200, are identical ORFs, and both are optimised for expression in
E.coli K12. 201-280 is a terminator, even though it doesn't look like it
because I was just mashing the keyboard when I wrote it.
gctagcattg catagtcgac tagtcgatca gtcagatcga tcgatagcta
gctagctgca tgcatcagta cgtcagcatg catcgatcag tcagtcagtc
atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa
atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa
tcgcatgcta gctgatgcta gctactagtc gatgcatgct agtcagcatg
catgcatcag actgcatgca cagactgcat
=======================================================
We could write in a more design-oriented markup language, something
resembling this pseudo-python, without making the false assumption that
the DNA behaves any differently just because of its pretty presentation:
=======================================================
def DomainOne: # Here, hack this domain into something useful.
gctagcattg catagtcgac tagtcgatca gtcagatcga tcgatagcta
gctagctgca tgcatcagta cgtcagcatg catcgatcag tcagtcagtc
def MyORF: # This is an ORF optimised for K12 expression
atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa
def RandomTerminator: # I just mashed the keyboard here rly.
tcgcatgcta gctgatgcta gctactagtc gatgcatgct agtcagcatg
catgcatcag actgcatgca cagactgcat
My_DNA_Thing = DomainOne + MyORF * 2 + RandomTerminator
=======================================================
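That pseudo-python is only a short hop from real Python. A minimal
sketch, assuming plain strings are good enough to stand in for
sequences; the seq() helper is a made-up name, there only so the
sequence can keep its block-of-ten layout:

```python
def seq(text):
    """Strip all whitespace so a sequence can be typed in blocks of ten."""
    return "".join(text.split())


# Hack this domain into something useful.
domain_one = seq("""
    gctagcattg catagtcgac tagtcgatca gtcagatcga tcgatagcta
    gctagctgca tgcatcagta cgtcagcatg catcgatcag tcagtcagtc
""")

# An ORF optimised for K12 expression.
my_orf = seq("atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa")

# The keyboard-mash terminator.
random_terminator = seq("""
    tcgcatgcta gctgatgcta gctactagtc gatgcatgct agtcagcatg
    catgcatcag actgcatgca cagactgcat
""")

# Same composition as the markup: one domain, two ORF copies, a terminator.
my_dna_thing = domain_one + my_orf * 2 + random_terminator
```

The final string is the same molecule as the flat listing; only the
way it is written down has changed.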
Now, of course I chose Python as my model because I love Python. I'm
talking about markup formats, but that's only a step away from a script
or a "language", and I don't have any shame in suggesting that DNA can
be "programmed" using machine tools, provided (again) that you don't
fall into the trap of thinking that the resulting code acts linearly.
For example, a class-like construct might be used to create a gene
on-the-fly from an amino-acid string passed to it; this is basically
the workflow of your prototypical synthetic biology project,
functionalised into one chunk. And since that's a task that we repeat a
lot, what's wrong with functionalising it as part of our marked-up
genome? You may find it insulting, as it introduces von Neumann
paradigms, but the way I see it, it just presents the genome in a more
visible and readable format on the engineering side of things, while
making no difference to the final DNA.
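A minimal sketch of such a class-like construct, under loud
assumptions: the Gene class and PREFERRED_CODON table are hypothetical
names, and the table simply picks one valid standard-code codon per
residue. It is not measured E. coli usage data, and real codon
optimisation weighs far more than this.

```python
# One codon per residue. Each codon is valid for its residue in the standard
# genetic code, but which one counts as "preferred" is an illustrative guess.
PREFERRED_CODON = {
    "A": "gcg", "R": "cgt", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}


class Gene:
    """A gene described by its protein; the DNA is generated on demand."""

    def __init__(self, name, protein):
        self.name = name
        self.protein = protein.upper()

    def dna(self, codon_table=PREFERRED_CODON, stop="taa"):
        """Back-translate the protein and append a stop codon."""
        try:
            codons = [codon_table[aa] for aa in self.protein]
        except KeyError as err:
            raise ValueError(
                f"unknown residue {err.args[0]!r} in {self.name}"
            ) from None
        return "".join(codons) + stop


demo = Gene("demo", "MLHAQ")
print(demo.dna())  # atgctgcatgcgcagtaa
```

The readable part (a name and a protein) is what gets shared and
edited; the nucleotides are a build product.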
I don't think there's much point imagining that we'll only ever hack DNA
as raw DNA code. Certainly we can only abstract what we sufficiently
understand, but right now that's enough to make some difference to the
way we represent DNA. We already use code to scan our sequences for
things that we, as humans, can't directly detect, even though we
understand them. When's the last time you manually searched a kilobase
sequence for secondary structures? It's likely you used a function to do
it. I propose that when doing the reverse, moving from analysing to
designing DNA, we use those same functions in the backend and write
readable code, so we can save ourselves the headache.
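As a sketch of the kind of backend function meant here, the following
looks for perfect inverted repeats, the crudest possible hairpin
detector. The name find_hairpins and its default stem and loop sizes
are assumptions chosen for illustration; real secondary-structure tools
use thermodynamic folding models rather than exact string matches.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(dna):
    """Return the reverse complement of a lowercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_hairpins(dna, stem=6, min_loop=3, max_loop=8):
    """List (start, loop_length) for every perfect inverted repeat.

    A hit is a stem-long arm, then a loop of min_loop..max_loop bases,
    then the exact reverse complement of the arm.
    """
    dna = dna.lower()
    hits = []
    for start in range(len(dna) - 2 * stem - min_loop + 1):
        target = reverse_complement(dna[start:start + stem])
        for loop in range(min_loop, max_loop + 1):
            second = start + stem + loop
            # A slice running off the end is shorter than the stem,
            # so it can never equal the target.
            if dna[second:second + stem] == target:
                hits.append((start, loop))
    return hits
```

A design script could call something like this on every assembled
construct and refuse to "compile" when a hit lands inside an ORF.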
In any case, it's where I'm headed with some of my side projects right
now. It wouldn't take much to embed PySplicer's core class into a quick
function that codon-optimises an amino-acid sequence and embeds it into
a promoter/terminator sandwich, optionally daisy-chaining ORFs to make
an operon. In that case, I can start sharing the "source code" of my DNA
projects as backend amino-acid dictionaries, functions, and a quick
compilation script that's easy to comprehend. If you don't like it, you
don't have to collaborate. ;)
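The promoter/terminator sandwich could be sketched roughly like this.
Nothing here is PySplicer's actual interface: build_operon, its
parameters, and toy_optimise are hypothetical stand-ins, with the
optimiser passed in as a plain callable so that any real one could be
swapped in. The part sequences in the example are placeholders, not
real parts.

```python
def build_operon(promoter, proteins, terminator, optimise, rbs=""):
    """Sandwich one or more ORFs between a promoter and a terminator.

    optimise is any callable mapping a protein string to a DNA string
    that already includes its stop codon. rbs, if given, is placed in
    front of every ORF.
    """
    orfs = [rbs + optimise(protein) for protein in proteins]
    return promoter + "".join(orfs) + terminator


# Toy optimiser covering three residues only, purely for demonstration.
TOY_CODONS = {"M": "atg", "K": "aaa", "L": "ctg", "*": "taa"}


def toy_optimise(protein):
    """Back-translate with TOY_CODONS and append a stop codon."""
    return "".join(TOY_CODONS[aa] for aa in protein.upper() + "*")


operon = build_operon(
    promoter="ttgacagctagc",    # placeholder sequence
    proteins=["MK", "ML"],      # two ORFs daisy-chained into one operon
    terminator="tcgcatgcta",    # placeholder sequence
    optimise=toy_optimise,
    rbs="aggagg",               # placeholder spacer in front of each ORF
)
```

With a single protein in the list this yields a single-gene cassette;
with several it yields the daisy-chained operon.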