Benchling -- YAGDS - Yet Another Genome Design Software


Nathan McCorkle

Sep 6, 2012, 2:23:17 PM
to diybio
https://benchling.com/beta/
https://angel.co/benchling

still seems like a text file or wiki can do most of what I need in
relation to sequences and parts description... maybe in a year or two
one of these software packages will come out on top

--
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics

Omri Drory

Sep 6, 2012, 7:09:22 PM
to diy...@googlegroups.com
It's good to see so many efforts in this field - there's obviously a need for design tools combined with cheaper DNA synthesis.

I wish I knew more about them - hard to evaluate closed efforts.

http://genomecompiler.com shameless plug :-)

Cathal Garvey

Sep 7, 2012, 6:25:01 AM
to diy...@googlegroups.com
Indeed, it would be nice to see the source here. Especially as they
compare themselves to a source-sharing platform.

Which is, of course, the same hypocrisy shared with/by github
themselves, but at least github is just an API layer atop a
fundamentally open source project; this thing isn't even that, by the
look of it.
--
www.indiebiotech.com
twitter.com/onetruecathal
joindiaspora.com/u/cathalgarvey
PGP Public Key: http://bit.ly/CathalGKey

Cathal Garvey

Sep 7, 2012, 7:57:58 AM
to diy...@googlegroups.com
What bothers me most about these "github for biology" things is that you
can already use git version control for biological sequences. Sure, some
annotation layers would be nice; like an auto-translator or
site-searcher built into git-diff for comparing revisions, but
fundamentally it's just a body of text, like any other code
pre-compilation. Existing, fully matured revision control systems are
more than able to handle DNA and protein sequences.

I think what would be more useful than an ecosystem of dedicated tools
for handling DNA revisioning and collaborative coding (which *has
already been done*) would be a markup format for forward DNA design.

Before you say "why another markup, we've got millions!", we don't have
one for gene *design*, we have loads of formats repurposed from DNA
*study*. So, there's GenBank, which has all the annotation up top and
the sequence at the end; useless for "reading" the code. There's FASTA,
which technically supports "comments" but in actuality almost no
FASTA-supporting software permits comments.

None of them are really designed for real-time editing and active markup
of the *code*, because it's assumed that you aren't editing code
intelligently, you're just adding features to a piece of DNA that you
are studying. So the entire philosophy behind the markup is backwards
for synthetic biology.

So, what would be my desired features?
* Inline comments, perhaps using the hash-symbol "#"
* Semantic or Regex-ish alternatives to IUPAC such as [TA] instead of W
for "either T or A"
* Code variables.

As a basis, a simple markup format like this could end up being expanded
into a semantic-ish programming language. So you start out with a
simple, parse-able and readable markup for DNA that helps with design of
novel sequences, and allows collaboration.
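
A minimal sketch of how the three wished-for features could be parsed, assuming a made-up syntax: "#" starts a comment, whitespace is ignored, and a bracket group such as [TA] is collapsed to the matching IUPAC ambiguity letter. The function name and the rules are hypothetical; only the IUPAC table itself is standard.

```python
import re

# IUPAC ambiguity codes, keyed by the alphabetically sorted bases they cover.
IUPAC_BY_BASES = {
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def _bracket_to_iupac(match):
    """Collapse one [..] group, e.g. [TA] -> W, [A] -> A."""
    bases = "".join(sorted(set(match.group(1))))
    return bases if len(bases) == 1 else IUPAC_BY_BASES[bases]

def parse_design_markup(text):
    """Turn commented, bracketed design markup into a plain IUPAC string.

    Hypothetical rules invented for this sketch:
      * everything after '#' on a line is a comment
      * whitespace inside the sequence is ignored
      * [TA] means "T or A"
    """
    out = []
    for line in text.splitlines():
        code = line.split("#", 1)[0]             # drop the inline comment
        code = re.sub(r"\s+", "", code).upper()  # drop spacing, normalise case
        out.append(re.sub(r"\[([ACGT]+)\]", _bracket_to_iupac, code))
    return "".join(out)

example = """
atg gc[TA] aaa   # start codon, then a wobble position
tt[ACGT] taa     # any base here, then a stop codon
"""
print(parse_design_markup(example))
```

Because the output is ordinary IUPAC text, a file in this style could still be handed to existing tools after one parsing step, and the commented source could live in git like any other code.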

From there, you could expand this into a pseudo-code, with function
definitions and code generator functions. From parsers to interpreters.

This approach (design the code, not the code-sharing platform) is in my
view far more promising. Reinventing the wheel with a DNA-ish
spin isn't much use.


Bryan Bishop

Sep 7, 2012, 9:22:20 AM
to diy...@googlegroups.com, Bryan Bishop
On Fri, Sep 7, 2012 at 6:57 AM, Cathal Garvey <cathal...@gmail.com> wrote:
> So, what would be my desired features?
> * Inline comments, perhaps using the hash-symbol "#"
> * Semantic or Regex-ish alternatives to IUPAC such as [TA] instead of W
> for "either T or A"
> * Code variables.

It doesn't meet all of your requirements, but have you used SBML and libSBML?

--
- Bryan
http://heybryan.org/
1 512 203 0507

John Griessen

Sep 7, 2012, 9:54:33 AM
to diy...@googlegroups.com
On 09/07/2012 08:22 AM, Bryan Bishop wrote:
> * Semantic or Regex-ish alternatives to IUPAC such as [TA] instead of W
> for "either T or A"
> * Code variables.

Like the idea of making a language with variables. Which compiler language is closest?
What about Python? The bytecode that Python caches for the second run would be
fast enough, it seems to me. But maybe I'm underestimating the scale of dealing with DNA.

XML seems too heavy/slow for DNA variables. Soon, we'll be able to figure probabilistic
"rates-of-success" for some kinds of DNA code changes and use those variables to generate
a useful randomized set of DNA code changes to try on specific bits or sections.

Another probabilistic variable would be the time rates of enzyme or RNA cut-and-paste synthesis reactions,
to see whether a given approach is feasible.

Omri Drory

Sep 7, 2012, 2:05:23 PM
to diy...@googlegroups.com
Hi Cathal, I just forwarded your comments to my developers - we will try to find a way to add them to Genome Compiler.

Omri

Nathan McCorkle

Sep 7, 2012, 2:23:32 PM
to diy...@googlegroups.com
It also seems prudent to have DNA compression as an option in the
software; there's no need to store ATGC as ASCII... enabling non-ASCII
DNA storage, along with compression, will ensure the software won't be
painfully slow in the coming years when we really start to see
designer episomes and then genomes.

Cathal Garvey

Sep 7, 2012, 3:29:25 PM
to diy...@googlegroups.com
Is the overhead of dealing with lots of DNA really that much, though?
Given that most DNA-topology/chemistry effects are pretty local in
scope, and broader-scale sequence analysis is already "sorted" by
available bioinformatic tools, the remaining computational unit of
genome compilation is.. the gene.

And genes are kilobases in scale. As 1 base equates to about 2 bits in
terms of information density.. it's not really a big task. Once each
sub-unit of your genome is "compiled" it just gets concatenated onto the
growing genome, and you're talking about maybe a few megabytes for a
bacterial genome or a few gigabytes for a plant.

DNA's amazing because it can do so much with so little "information",
even though DNA itself has far more meta-information than binary, based
on the inter-base interactions and stacking/context effects, etc...

Nathan McCorkle

Sep 7, 2012, 4:09:27 PM
to diy...@googlegroups.com
On Fri, Sep 7, 2012 at 3:29 PM, Cathal Garvey <cathal...@gmail.com> wrote:
> And genes are kilobases in scale. As 1 base equates to about 2 bits in
> terms of information density.. it's not really a big task. Once each
> sub-unit of your genome is "compiled" it just gets concatenated onto the
> growing genome, and you're talking about maybe a few megabytes for a
> bacterial genome or a few gigabytes for a plant.

unless you need to add methylation info, etc... 2 bits compared to
ASCII's 8 bits is still a 25% reduction in storage and transmission
bandwidth. The only thing code-wise that would need figuring out is
the bitwise translation table, i.e. a standard on which letters
correspond to 00, 11, 01, 10
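
A rough sketch of what such a bitwise translation table could look like in practice. The assignment of letters to bit patterns below is arbitrary and is an assumption of this example, not an existing standard; the function names are likewise made up.

```python
# One possible translation table. Which letter gets which 2-bit pattern
# is an arbitrary choice made for this sketch.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    seq = seq.upper()
    packed = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a short final chunk
        packed.append(byte)
    return len(seq), bytes(packed)      # the length is needed to unpack

def unpack(length, packed):
    """Reverse pack(); 'length' trims the padding in the last byte."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])

n, blob = pack("ATGCATGCA")
print(n, len(blob), unpack(n, blob))
```

The sequence length has to travel with the packed bytes, because the last byte may be padded. Anything beyond the four plain bases (ambiguity codes, methylation marks) would not fit in two bits and would need a wider code.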

Even if you figure bytes are good enough for single bases, we could
just encode the regex [TA] wobble stuff with other letters (ASCII
codes)... but does this exist already? Some standard that people have
thought about and actually works?

> DNA's amazing because it can do so much with so little "information",
> even though DNA itself has far more meta-information than binary, based
> on the inter-base interactions and stacking/context effects, etc...

Sure it's amazing, but there isn't magically more data than not, we
don't capture any of the base stacking, interaction, or electron
'cloud' context in ASCII plaintext.... there are 6 bits of info being
wasted there now (AFAIK)

Cathal Garvey

Sep 7, 2012, 5:19:24 PM
to diy...@googlegroups.com
I was wondering about direct base-to-byte conversions lately, in the
reverse context; how to encode arbitrary binary data in DNA. Before you
even *touch* upon that, you've got to firmly establish ways to avoid
introducing unstable or meddlesome DNA motifs into the greater DNA molecule.

That is, unlike binary digits, DNA bases can and do interact with one
another in potentially troublesome ways. It may be representable in
binary form as a simple string of bases, but the real-world molecule is
far messier and more complex.

To grant wiggle-room to achieve data-to-DNA encoding, I think you have
to sacrifice DNA "compression" for more redundant encoding options. So,
just like cells use redundant codons, your binary-to-DNA code would have
to be redundant, allowing you to choose different "encodons" for
different local contexts.
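
A toy sketch of the "encodon" idea, assuming the simplest possible constraint: no base may repeat its neighbour. Each bit can be written as either of two bases, and the encoder picks whichever one avoids a repeat. The pairing of bits to bases is hypothetical and chosen only for illustration.

```python
# Each bit has two interchangeable bases ("encodons"), so the encoder can
# always avoid repeating the previous base. The pairing is hypothetical.
CHOICES = {"0": ("A", "C"), "1": ("G", "T")}
DECODE = {"A": "0", "C": "0", "G": "1", "T": "1"}

def encode_bits(bits):
    """Encode a string of '0'/'1' as DNA with no two identical bases adjacent."""
    out = []
    for bit in bits:
        first, second = CHOICES[bit]
        # Use the alternative only when the default would repeat the last base.
        out.append(second if out and out[-1] == first else first)
    return "".join(out)

def decode_bases(dna):
    """Decoding ignores which synonym was used, so it needs no context."""
    return "".join(DECODE[base] for base in dna)

message = "0000111101"
dna = encode_bits(message)
print(dna, decode_bases(dna) == message)
```

The price of the wiggle-room is density: this scheme stores one bit per base instead of two. It only avoids homopolymer runs; steering around hairpins, restriction sites or other troublesome motifs would need a richer set of synonyms and a smarter chooser.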

Anyway, not a job I care to undertake atm!

Anselm Levskaya

Sep 7, 2012, 5:48:08 PM
to diy...@googlegroups.com
There's too much comparison going on here with von neumann machine code.

DNA "code" is intrinsically nonlocal, i.e.:
- when you're editing a protein-coding ORF, there are long-range
correlations across the entire structure of the gene that corresponds
to the 3D interactions that define the shape and dynamics of the
protein.
- when you're playing around with regulatory elements (cis factors)
and their protein regulators (trans factors), you're also dealing with
a network of interactions that are non-local and complexly
intertwingled across the sequence.
- eukaryotic cells add a whole new level of state-machine logic on top
of local (de)compaction as mediated by chromatin remodelling systems
and poorly-understood insulator/enhancer systems.

So you can't just expect to write useful linear source code unless
you're doing the most primitive cut-and-paste style synbio of a
handful of unmodified genes. i.e. just screwing around. (which is fun
and great - but not really what synbio is ultimately about)

Abstractions work in machine code because we built the machines to
make abstractions possible. Natural cells don't work like that. The
closest thing to cells in the machine code world are demoscene x86
assembly blobs -- filled with insane hacks to make something awesome
work in a small amount of space, with lots of weird code and data
reuse and generative magic.

It's worse than that though. Never forget that the true "compiler" of
DNA is Physics, specifically that of protein folding, conformational
dynamics, and catalysis. It won't be simulated anytime soon with
anything approaching useful kinetic accuracy. (It's not clear if
we'll ever get kT-accurate quantum simulations of correlated electron
wavefunctions that scale to protein-sized systems, though there is
some hope in the far future with exotic computing architectures.)

So ultimately what useful software for synthetic biology is going to
do is help us curate all of our brute-force efforts to build and
screen pseudorational libraries of proteins, pathways,
and cells. -Not- provide shitty abstraction layers that rest upon our
incredibly shaky understanding of what's going on circa 2012. Think
of curating an incredibly complicated genetic-programming run across a
hundred-thousand clusters -- that gives a better sense of the flavor
of what's needed.

We invented the light bulb long before we understood QFT. We'll be
building amazing cellular machines long before we really understand
them quantitatively. Synbio's (and diybio's) biggest sin is
repeatedly elevating the convenient metaphors with EE/CS into a
slick-looking action plan that doesn't respect the fundamental
differences between these machine architectures.

-a

Bryan Bishop

Sep 7, 2012, 6:31:55 PM
to diy...@googlegroups.com, Bryan Bishop, Anselm Levskaya
On Fri, Sep 7, 2012 at 4:48 PM, Anselm Levskaya wrote:
> We invented the light bulb long before we understood QFT.  We'll be
> building amazing cellular machines long before we really understand
> them quantitatively.  Synbio's (and diybio's) biggest sin is
> repeatedly elevating the convenient metaphors with EE/CS into a
> slick-looking action plan that doesn't respect the fundamental
> differences between these machine architectures.

I don't think that's entirely DIYbio's fault. Synthetic biology has been telling all the programmers that biology is just like programming for almost 10 years now, if not more. So, there's a lot of hype you have to cut through. But, programmers misinterpret this as pessimism instead of fact. I am more optimistic than anyone, overly optimistic, wildly optimistic about things, which is hilarious when people accuse me of pessimism when pointing out that DNA isn't like software programming.

"... there is no source, the bytecode has multiple reentrant abstractions, is unstable and has a very low signal to noise ratio, the runtime is unbootstrappable, the execution is nondeterministic, it tries to randomly integrate and execute code from other computers... multiple reentrant and self-modifying abstractions. absolutely everything has subtle side effects."

Having said that... it's clear that there are many programmers that want to do genetic engineering projects. There's a bunch of bioinformatics tools that need contributors, and device firmware needs to be rewritten for lab equipment, etc. Or compression formats, that's useful too.

John Griessen

Sep 7, 2012, 7:29:36 PM
to diy...@googlegroups.com
On 09/07/2012 04:48 PM, Anselm Levskaya wrote:
> Abstractions work in machine code because we built the machines to
> make abstractions possible. Natural cells don't work like that.

So, that's why I was suggesting using variables for guiding experiments,
not "coding" traits like in a procedural language to run on a processor.

John
-----
EE, machine designer, etc.

Omri Drory

Sep 7, 2012, 10:04:09 PM
to diy...@googlegroups.com
Hi Anselm, I totally agree - simulations in biology are usually crap and will stay that way until true molecular dynamics for more than a small peptide becomes computationally possible. I also agree that many "shitty" efforts come from people who have never used a pipette and don't really understand the messy nature of biology, trying to just copy what worked in CS/engineering into biology.

We should be able to create amazing things WAY before we understand everything. My goal in Genome Compiler is to design to a point and then let libraries, high-throughput experimentation, MAGE-like genome editing and directed evolution help us go through the search space in an economical way.

Also cheap DNA is important - hence the need for what Cambrian and others are doing.

Omri

Gavin Scott

Sep 8, 2012, 5:04:38 PM
to diy...@googlegroups.com


On Friday, September 7, 2012 3:09:52 PM UTC-5, Nathan McCorkle wrote:
> unless you need to add methylation info, etc... 2 bits compared to
> ASCIIs 8 bits is still a 25% reduction in storage and transmission
> bandwidth.

75% actually (but even in this week's all-the-textbooks-are-going-to-have-to-be-rewritten-AGAIN issue of Nature, someone makes the mistake of saying there's only "one bit of information per DNA base-pair" rather than two).

One option is to simply wrap your nice human-readable storage format with something like gzip compression which will likely give you about the same storage reduction as a native binary format. This obviously won't help for in-memory manipulation, but I think most good tools already translate ASCII into an optimized internal binary format for efficiency. Also in other contexts it's common for tools to recognize files with an additional .gz filename extension as a compressed version of whatever the name is with the .gz stripped off.

I just did a trivial test with a 900 kilobase FASTA file and WinZip was able to compress it by 73%, very close to the 75% you'd expect from a simple binary encoding.
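
A test along these lines can be approximated with the standard library. This sketch uses a synthetic sequence of uniformly random bases as a stand-in for a real FASTA file (an assumption; real genomes are not uniformly random and a real file also carries header lines), so the exact percentage will differ from the WinZip figure above.

```python
import gzip
import random

random.seed(0)  # fixed seed so the synthetic sequence is reproducible
# Synthetic stand-in for a FASTA body: 900,000 uniformly random bases.
sequence = "".join(random.choices("ACGT", k=900_000))

ascii_bytes = sequence.encode("ascii")
compressed = gzip.compress(ascii_bytes)

# Fraction of the original size that remains after compression.
ratio = len(compressed) / len(ascii_bytes)
print(f"ASCII: {len(ascii_bytes)} bytes")
print(f"gzip:  {len(compressed)} bytes ({1 - ratio:.0%} smaller)")
print(f"2-bit: {len(ascii_bytes) // 4} bytes (75% smaller)")
```

On random input gzip cannot beat the 2-bit floor, since each base genuinely carries two bits; on real sequence with repeats it sometimes can.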

G.

Gavin Scott

Sep 8, 2012, 5:33:28 PM
to diy...@googlegroups.com
On Friday, September 7, 2012 5:31:58 PM UTC-5, Bryan Bishop wrote:
> [...] people accuse me of pessimism when pointing out that DNA isn't like software programming.
>
> "... there is no source, the bytecode has multiple reentrant abstractions, is unstable and has a very low signal to noise ratio, the runtime is unbootstrappable, the execution is nondeterministic, it tries to randomly integrate and execute code from other computers... multiple reentrant and self-modifying abstractions. absolutely everything has subtle side effects."

Tragically this looks like a very accurate description of several computer software systems I've had to deal with over the years :)
 
Even though you have to keep in mind that ultimately it's the physics by way of chemistry that control the "execution" of your proteins, I think there's still a lot of good work that can be done in silico working with higher abstractions, just as the average computer programmer does not have to worry about the "digital abstraction" and all the analog stuff going on inside the CPU chip to give the appearance of that nice discrete deterministic system.

The biochemists are always going to be the ones who come first just the way you can't have software without hardware to run it on, and, for the foreseeable future at least, doing protein design in a SynBio context is going to be all about the chemistry and molecular dynamics rather than "programming".

But I think a description of the cell as a "stochastic digital information processing system" is valid in many respects, and as such represents an entity that is "hackable"[1] in the same way a computer system is. In fact my current "Why biology is cool" speech starts out by describing the activity as:

Hacking into alien[2] computer systems looking for advanced technology we can steal to unravel mysteries of the universe, solve the energy crisis, and cure cancer.

G.

[1] In the old senses of the term "hack" as in http://www.outpost9.com/reference/jargon/jargon_44.html#SEC51
[2] Not designed by human intelligence

Anselm Levskaya

Sep 8, 2012, 7:30:12 PM
to diy...@googlegroups.com
> Even though you have to keep in mind that ultimately it's the physics by way
> of chemistry that control the "execution" of your proteins, I think there's

Proteins don't "execute". They fold and unfold. They bounce into
things a billion times a second. They stick to things. They unstick
from things. They wiggle, sometimes into new shapes depending on
what's sticking to them or if they were tagged by other proteins.
Sometimes they catalyze chemistry.

> still a lot of good work that can be done in silico working with higher
> abstractions, just as the average computer programmer does not have to worry
> about the "digital abstraction" and all the analog stuff going on inside the
> CPU chip to give the appearance of that nice discrete deterministic system.

Let me reemphasize the point here. Programmers can exist because WE
built computers -explicitly- to support those abstraction layers. The
Wizards of EE formed a powerful magical covenant that protects all
the gentle digital denizens from concerning themselves with the
horrors of physical reality that lie sealed beneath the woven
lithography. It took them decades and a trillion dollars to build
those magical seals.

In biology we haven't even begun to form powerful enough magic to seal
away the chaos of physical reality. Over the next few decades we'll
almost certainly rebuild simple microbial cells (piecemeal, haltingly,
not all at once) with an increasingly modularized set of signaling
components and metabolic cores whose behavior we'll have -evolved- to
be isolated and predictable. It will take bajillions of manhours to
do that, and it will almost entirely be done by limited guesswork and
brute-force screening (i.e. traditional engineering). Only once we've
untangled the gordian knot of the cell will we be able to construct
these magical abstraction layers atop it.... and they'll probably be
leaky layers at that.

> But I think a description of the cell as a "stochastic digital information
> processing system" is valid in many respects, and as such represents an

That's a better description of an animal nervous system than it is of
a cell. Cells exist to replicate, not process information. The vast
majority of their complexity exists to reconvert organic matter into
copies of themselves and to organize their material architecture, not
'process information'.

> entity that is "hackable"[1] in the same way a computer system is. In fact

Yes, but what -isn't- hackable? Look, my gripe here is that cells are
really -nothing- like a von neumann machine. They're both nonlinear
dynamical systems that happen to carry lots of "code" that controls
their evolution in time. That's the strongest similarity. Cells
deserve more than crappy metaphors to other kinds of systems. If
y'all really want to improve how we engineer cells, it's worth taking
a few years to begin understanding how they really work.

As Bryan pointed out, SynBio suffered for a long time under the
domination of a naive pack of EE/CS enthusiasts who couldn't pull the
blinkers from their eyes to see that they weren't operating in the
same kind of world anymore. My recommendation to DiyBio enthusiasts
is not to repeat their mistake.

-a

Omri Drory

Sep 8, 2012, 10:25:58 PM
to diy...@googlegroups.com
<3 Anselm's reality field - one might say it's the opposite force to the late Steve Jobs's reality distortion field :-)

Don't fall into the "too complex" trap either - we can already do amazing, important things today. Also, think exponentially, not linearly, about the future and you will find the future will be here faster than you think.

Nathan McCorkle

Sep 10, 2012, 2:48:13 AM
to diy...@googlegroups.com
On Sat, Sep 8, 2012 at 7:30 PM, Anselm Levskaya <levs...@gmail.com> wrote:
>> Even though you have to keep in mind that ultimately it's the physics by way
>> of chemistry that control the "execution" of your proteins, I think there's
>
> Proteins don't "execute". They fold and unfold. They bounce into
> things a billion times a second. They stick to things. They unstick
> from things. They wiggle, sometimes into new shapes depending on
> what's sticking to them or if they were tagged by other proteins.
> Sometimes they catalyze chemistry.

The definition of execute merely masks abstracted actions though...
"to carry out; to perform, to do" is what dictionary.com says for
'execute'

So I definitely agree that proteins execute, but the system is
currently a very black box.

> Yes, but what -isn't- hackable? Look, my gripe here is that cells are
> really -nothing- like a von neumann machine. They're both nonlinear
> dynamical systems that happen to carry lots of "code" that controls
> their evolution in time. That's the strongest similarity. Cells
> deserve more than crappy metaphors to other kinds of systems. If
> y'all really want to improve how we engineer cells, it's worth taking
> a few years to begin understanding how they really work.

Well I think most discussion refers to 'programming' cells, unless I
missed something I don't think I've ever really seen them compared to
von neumann architecture (or self-replicating machines, if that's what
you meant; at least not on DIYbio). Programming in the sense I think
of is 'instilling a pattern'... I can program waterways by adding dams
or locks, I can program the placement of dominoes to fall in a certain
way, I can program a bunch of metal into a grandfather clock, I can
even program mental traits into young people.

> As Bryan pointed out, SynBio suffered for a long time under the
> domination of a naive pack of EE/CS enthusiasts who couldn't pull the
> blinkers from their eyes to see that they weren't operating in the
> same kind of world anymore. My recommendation to DiyBio enthusiasts
> is not to repeat their mistake.

There are similarities for sure, because a pattern of things exists, a
sort of logic... but yes, there are definitely things in synBio that
require knowledge of chemistry and physics aside from the ideas of
programming.

Computer-hackers should study a bit of developmental biology and
they'll quickly see the similarity to crazy voodoo "demoscene x86
assembly blobs -- filled with insane hacks to make something awesome
work in a small amount of space, with lots of weird code and data
reuse and generative magic."... and then start to feel like they're
drowning because the crazy developmental process just goes further and
further down some rabbit hole.

In the end, it's all for glowing unicorn horn cats, and multi-assed
monkeys! Worthy goals!

Cathal Garvey

Sep 10, 2012, 12:07:43 PM
to diy...@googlegroups.com
I think I may have been misunderstood.

We already represent DNA in a linear form, even though we understand
that it *acts* nonlinearly. Indeed, DNA *is* linear, molecularly
speaking. When we design new DNA, we are generally designing new linear
molecules of code.

There is, of course, a pervasive misunderstanding in Synthetic Biology
that DNA can be programmed using the same idioms as von neumann machine
code. I sometimes laugh, sometimes sigh whenever I see genetic systems
being touted as "NAND" or "NOR" gates that can enable combinatorial
logic and escalating complexity in synthetic biology, as if you could
use more than one in the same cell without causing crosstalk. Or,
indeed, as if it's even the most efficient way to design a
"programmable" cell.

What I was suggesting is that, rather than writing the following
using existing science-oriented markup formats, which are dense, ugly and
hard to edit collaboratively, or even individually, under
revision control:
=======================================================
> My DNA thing. Nucleotides 1-100 are domain 1. Nucleotides 101-150, and
151-200, are identical ORFs, and both are optimised for expression in
E.coli K12. 201-280 is a terminator, even though it doesn't look like it
because I was just mashing the keyboard when I wrote it.
gctagcattg catagtcgac tagtcgatca gtcagatcga tcgatagcta
gctagctgca tgcatcagta cgtcagcatg catcgatcag tcagtcagtc
atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa
atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa
tcgcatgcta gctgatgcta gctactagtc gatgcatgct agtcagcatg
catgcatcag actgcatgca cagactgcat
=======================================================

We could write in a more design-oriented markup language, something
resembling this pseudo-python, without making the false assumption that
the DNA behaves any differently just because of its pretty presentation:
=======================================================
def DomainOne: # Here, hack this domain into something useful.
gctagcattg catagtcgac tagtcgatca gtcagatcga tcgatagcta
gctagctgca tgcatcagta cgtcagcatg catcgatcag tcagtcagtc

def MyORF: # This is an ORF optimised for K12 expression
atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa

def RandomTerminator: # I just mashed the keyboard here rly.
tcgcatgcta gctgatgcta gctactagtc gatgcatgct agtcagcatg
catgcatcag actgcatgca cagactgcat

My_DNA_Thing = DomainOne + MyORF * 2 + RandomTerminator
=======================================================

Now, of course I chose Python as my model because I love python. I'm
talking about markup formats, but that's only a step away from a script
or a "language", and I don't have any shame in suggesting that DNA can
be "programmed" using machine tools, provided (again) that you don't
fall into the trap of thinking about the consequent code linearly.

For example, a class-like construct might be used to create a gene
on-the-fly from a passed amino string; this is basically the workflow of
your prototypical synthetic biology project, functionalised into one
chunk. And since that's a task that we repeat a lot, what's wrong with
functionalising it as part of our marked-up genome? You may find it
insulting, as it introduces von-neumann paradigms, but the way I see it
it just presents the genome in a more visible and readable format at the
engineering side of things, while making no difference to the final DNA.

I don't think there's much point imagining that we'll only ever hack DNA
as raw DNA code. Certainly we can only abstract what we sufficiently
understand, but right now that's enough to make some difference on the
way we represent DNA. We already use code to scan our sequences for
stuff that we can't, as humans, directly detect, even as we might
understand it. When's the last time you manually searched a kilobase
sequence for secondary structures? It's likely you used a function to do
it. I propose that when doing the reverse; when moving from analysing to
designing DNA, that we use those functions in the backend and write
readable code, so we can save ourselves the headache.

In any case, it's where I'm headed with some of my side projects right
now. It wouldn't take much to embed PySplicer's core class into a quick
function that codon optimised an amino sequence and embedded it into a
promoter/terminator sandwich, optionally daisy-chaining ORFs to make an
Operon. In that case, I can start sharing the "source code" of my DNA
projects as backend amino-dictionaries, functions, and a quick script of
compilation matter that's easy to comprehend. If you don't like it, you
don't have to collaborate. ;)
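
A sketch of the kind of helper described above, under loud assumptions: the codon table holds one valid standard-code codon per amino acid picked for illustration (it is not a measured codon-usage table and real codon optimisation weighs far more than this), the promoter and terminator are placeholder strings, and none of these names come from PySplicer.

```python
# One valid standard-code codon per amino acid, picked for illustration.
# NOT a measured codon-usage table and NOT PySplicer's API.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"

def make_orf(aminos):
    """Reverse-translate an amino acid string into an ORF ending in a stop codon."""
    return "".join(CODON_FOR[aa] for aa in aminos.upper()) + STOP

def make_operon(promoter, amino_seqs, terminator):
    """Promoter, then one ORF per amino string daisy-chained, then terminator."""
    return promoter + "".join(make_orf(a) for a in amino_seqs) + terminator

# Placeholder flanks: hypothetical strings, not real regulatory parts.
PROMOTER = "nnnnnnnnnn"
TERMINATOR = "nnnnnnnnnn"

print(make_orf("MKT"))
print(make_operon(PROMOTER, ["MKT", "MW"], TERMINATOR))
```

A usable version would also need ribosome binding sites between the ORFs and some screening of the joined sequence, which is exactly where the backend analysis functions mentioned above would come in.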

Bryan Bishop

Sep 10, 2012, 12:16:18 PM
to diy...@googlegroups.com, Cathal Garvey, Bryan Bishop
On Mon, Sep 10, 2012 at 11:07 AM, Cathal Garvey <cathal...@gmail.com> wrote:
> For example, a class-like construct might be used to create a gene
> on-the-fly from a passed amino string; this is basically the workflow of
> your prototypical synthetic biology project, functionalised into one
> chunk. And since that's a task that we repeat a lot, what's wrong with
> functionalising it as part of our marked-up genome? You may find it
> insulting, as it introduces von-neumann paradigms, but the way I see it
> it just presents the genome in a more visible and readable format at the
> engineering side of things, while making no difference to the final DNA.

So uh, why not just use python for spitting out the DNA? Seems to work in your example.
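
For illustration, here is roughly what that would look like: Cathal's pseudo-python example rewritten as plain Python that runs as-is. The sequences are copied from his post; the `seq` helper and the lowercase variable names are additions made for this sketch.

```python
def seq(text):
    """Join a whitespace-formatted sequence block into one string."""
    return "".join(text.split())

# Here, hack this domain into something useful.
domain_one = seq("""
    gctagcattg catagtcgac tagtcgatca gtcagatcga tcgatagcta
    gctagctgca tgcatcagta cgtcagcatg catcgatcag tcagtcagtc
""")

# This is an ORF optimised for K12 expression.
my_orf = seq("atgctgcatg ctcaggtcgc tagactgatg ctagcatgct agcatgataa")

# Keyboard-mashed "terminator", as in the original example.
random_terminator = seq("""
    tcgcatgcta gctgatgcta gctactagtc gatgcatgct agtcagcatg
    catgcatcag actgcatgca cagactgcat
""")

# Same composition as the markup: one domain, two ORF copies, one terminator.
my_dna_thing = domain_one + my_orf * 2 + random_terminator
print(len(my_dna_thing))
print(my_dna_thing)
```

The result is 280 bases, matching the coordinates in the annotated version of the example (1-100, 101-150, 151-200, 201-280). The trade-off is that a general-purpose language lets a "sequence file" do anything at all, whereas a restricted markup stays safe to parse and diff.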

Omri Drory

Sep 11, 2012, 2:18:37 AM
to diy...@googlegroups.com
Hi Cathal, I think you are making a mistake similar to the one you mentioned (NOR gates and such) by trying to use computer-like code for genetic engineering. The way I see it, and the way we are trying to implement it in Genome Compiler, is graphical "drag and drop" interaction. We want to, and will, add a scripting panel for this in the future, but I think that graphical representations of DNA code (like SBOLv), where you can zoom in and out from the machine code (ATGC) to the translated code (20 AA) to the annotation layer (genetic parts), are the way biologists would like to work.

I don't see people programming in genetic code like this in the future:
10 //get code: aattccg
20 { if code "a" then whatever}

Biology is way too messy for that.

I do see people putting together genetic parts in a graphical way (also libraries and combinatorial parts assemblies), ordering the DNA and doing the transformation/experimentation. 

Nathan McCorkle

Sep 11, 2012, 2:46:41 AM
to diy...@googlegroups.com
Omri, the graphical way is nice, but I think only in the way that
labview is nice as compared to writing C code for an Arduino... it's
convenient, but not much different than how cathal is breaking up the
'parts' into def statements, etc...

If the graphical representation took into account environmental
variables and did rendering of the structures based on hairpins and
stuff, when you zoom out, that would be a good step in the direction I
think Anselm is nudging us toward.

If your tool is still just shuffling and versioning ASCII strings,
then there are going to be people like Bryan who are just going to use
VI or emacs anyway. The way to give a 'Bryan' some real power is to
incorporate a bit of physics and chemistry... to 'compile' or at least
attempt to do so with all the folding prediction tools we've got,
otherwise I think you just have a fancy text editor.
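
As a rough illustration of the "hairpins and stuff" a design tool could flag, here is a toy sketch that looks for perfect inverted repeats (a stem, a short loop, then the reverse complement of the stem). The function names and the default stem and loop sizes are assumptions chosen for the example. This is not a folding predictor: real tools score structures with thermodynamic models, which this sketch does not attempt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (start, stem_sequence, loop_length) for each perfect inverted repeat."""
    seq = seq.upper()
    hits = []
    # i is the start of the left arm of a candidate stem.
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, left, loop))
    return hits


print(find_hairpins("GCATTTTTATGC"))  # [(0, 'GCAT', 4)]
```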

Omri Drory

Sep 11, 2012, 3:42:05 AM
to diy...@googlegroups.com
and yet most people use "fancy text editors" like word and not VI :-)

What we have built so far in Genome Compiler (we started the real work in Q1 2012, released the first product on June 1, 2012, and shipped the first big(ger) update at the end of August 2012) is a nice-looking editor with ever-increasing capabilities. We think it's already useful for people, and we hope to become sustainable so we can build all the *really* advanced things you ask for.

Cathal Garvey (Android)

Sep 11, 2012, 5:45:56 AM
to diy...@googlegroups.com
I really don't get that argument. Whether you drag and drop code (ie a fanciful way of cut/pasting) or write it manually or script it procedurally is ultimately irrelevant. The code is always linear, and the execution of that code is always parallel.

Asking whether it's possible to achieve the desired outcome with one or another tool is like asking if it's possible to walk to the shop in tennis shoes or boots.
--
Sent from Android with K-9 Mail and APG.

Omri Drory

Sep 11, 2012, 7:58:46 AM
to diy...@googlegroups.com
Yes, the code is linear - why not? This is an accurate representation of the code. The fact that it folds in 3D space around histones and that you can read it in many places at once shouldn't change anything.

You might want (and we definitely plan to build) a way to create many different variations (libraries, combinatorial metabolic pathways, etc) - but that can also be designed in a graphical way.

How do you see the future of genetic engineering design? I would definitely want to know. 

----------------------

What I'm really trying to do is look into the future, imagine the capabilities/price we would have for DNA synthesis, construction and high-throughput experimentation, and figure out what tools we would use then to do real work. What works for plasmids doesn't work for metabolic pathways, and whole-genome design/editing is even harder, with different requirements again.


-----------------------

Cathal Garvey

Sep 11, 2012, 8:08:08 AM
to diy...@googlegroups.com
> How do you see the future of genetic engineering design? I would
> definitely want to know.

I see biohacking travelling down an analogous route to other coding
disciplines, albeit without as much code fragmentation (because
everything already uses a universal "instruction set"). That is, I don't
see people exclusively coding in raw DNA, markup-scripting or
graphical-only formats, but rather an ecosystem of the lot. Much like
you can code in machine code, python, or scratch, and in the end it's
all programming.

You may be misunderstanding my intentions here, if you think that I'm
against graphical interfaces. I'm not at all; but ask your programmers
what happens under the graphics and they'll tell you it's a
code-shuffling program, same as what I'm making here. Before you can
make "pretty" overlays, you have to make "expert mode" libraries and
classes that handle the hard work.

Put another way; the code I'm creating for PySplicer (which I want to
turn into a more general toolkit than just codon optimisation!) can be
used to make scripting languages of the sort I prefer, or graphical
tools of the sort you prefer. The user interface to the underlying code
is just a use-philosophy, the code underneath doesn't need to be any
different.
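
To make that concrete, here is a small sketch of one core routine with two thin front ends on top of it. Everything in it is hypothetical: `Part`, `LIBRARY`, `assemble`, `run_script` and `on_drop` are names invented for this example (they are not PySplicer's or Genome Compiler's API), and the sequences are toy strings, not validated parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    sequence: str


# Toy sequences for illustration only, not validated genetic parts.
LIBRARY = {
    "promoter": Part("promoter", "TTGACA"),
    "rbs": Part("rbs", "AGGAGG"),
    "cds": Part("cds", "ATGAAATAA"),
}


def assemble(parts) -> str:
    """Core operation shared by every interface: join parts into one linear sequence."""
    return "".join(p.sequence for p in parts)


def run_script(line: str) -> str:
    """Scripting front end: parse a line such as 'promoter + rbs + cds'."""
    names = [n.strip() for n in line.split("+")]
    return assemble(LIBRARY[n] for n in names)


def on_drop(canvas_items) -> str:
    """Graphical front end: a drag-and-drop canvas hands over part names in order."""
    return assemble(LIBRARY[n] for n in canvas_items)


# Both interfaces end up calling the same core code and give the same DNA.
print(run_script("promoter + rbs + cds") == on_drop(["promoter", "rbs", "cds"]))  # True
```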

Omri Drory

Sep 11, 2012, 10:35:34 AM
to diy...@googlegroups.com
Sure, the code is DNA. The end result (the compiled program) is a physical DNA molecule one uses to transform living things. It doesn't matter how you design it - the end result is the same.

I really think that the look, feel and ease of use (+ capabilities) are very important if we are ever to democratize these tools of creation (quite literally!) and have a truly broad DIYbio revolution.

Omri

John Griessen

Sep 11, 2012, 11:00:12 AM
to diy...@googlegroups.com
On 09/11/2012 09:35 AM, Omri Drory wrote:
> I really think that the look, feel and ease of use (+ capabilities) are very important if we were ever to democratize these tools
> of creation (quite literally!) and have a truly broad DIYbio revolution.

http://www.genomecompiler.com lists OS X and Windows, and says a little about using some MIT-licensed libraries.
Looks like software as a product, right? So, you seem to be
saying "look, feel and ease of use (+ capabilities) are very important", as in: our way should be the one way.

...and then you mention "have a broad revolution", but using your way only, right? Not a bunch of ways with FOSS tools?

Omri Drory

Sep 11, 2012, 11:29:03 AM
to diy...@googlegroups.com
Hi John, why the attitude? Our software is free to use and you don't have to use it. 

It's our way if you use our software (and we are open to feedback, and we change it according to user requirements). You are more than welcome to think up your own way, build it, and put it out into the world.

I'm not part of the holy war between closed/open/whatever - I want to build tools for myself and others like me who want to build things in biology.

Cathal Garvey

Sep 11, 2012, 11:56:58 AM
to diy...@googlegroups.com
I think the attitude comes from being told how to make a "democratised"
ecosystem for biotech, coming from a closed-source vendor, in discussion
of an open format for the same ecosystem. :)

Democratised ecosystems are built from FLOSS software. By definition, if
you're forced to use closed source software, the ecosystem isn't
democratised, because you have no control over it, or say in the process
of its design, or even a direct insight into its processes.

Designing a ground-up Free Libre Open Source format for DNA design, on
which anything from a CLI to a rich webapp can be built:
that's a democratised process. That's what I'm trying to contribute
towards. I value your input of course, but you don't seem to appreciate
that your tools, while really impressive, cannot (by definition) provide
the free, democratised ecosystem that we desire. Heck, I can't even use
it on Linux without installing outdated closed source software with
known security holes. That's fine; it's just not what I need.

Omri Drory

Sep 11, 2012, 12:13:29 PM
to diy...@googlegroups.com
Hmmm, a democratized ecosystem for content creation was created on closed-source Macs and iPhones (just one example of many). We use open file formats (SBOL is open, and I guess the GenBank and FASTA formats are too). Our software is free and no one *has* to use it. If we were to build the best tool for genetic design, would you refuse to use it out of spite?

I guess time will tell. If we won't build the tools people like you and others want to use, then we will go the way of the dodo :-)

Again, why the attitude? there are no bad people here - just people passionate about this field.  

Omri Drory

Sep 11, 2012, 12:14:38 PM
to diy...@googlegroups.com
ps - funny this discussion is under the headlines of yet another startup (for profit!) in this field.

Cathal Garvey

Sep 11, 2012, 5:36:33 PM
to diy...@googlegroups.com
Don't worry, there's no attitude or hostility here; email is a bad
vector for disagreement. I don't bear you any ill will, but that doesn't
mean I have any intention of using the software you develop.

I'm curious now, as your definition of "democratised" really doesn't fit
my image of democracy at all: in what way is the Apple "ecosystem"
democratised? To me it resembles something more like China; a capitalist
sort of dictatorship where the hard work of achieving success is
outsourced, and the system is ruthlessly gatekept to prevent outside
contribution or disturbance.

Let's follow that analogy a bit, then? I have an app on my phone
(Android, Cyanogenmod firmware; a case-study in Open Source's advantages
all to itself) called "Textsecure"[1], which allows me to exchange
normal SMS messages with anyone, or fully cryptographically protected
SMS's with other Textsecure users.

I was able to download the source code for this App, compile it, sign it
myself, and install it on my phone, without paying to get a developer's
license from anyone, or signing any licenses.

On Apple's "ecosystem", not only can you *not* do this, you can't even
find textsecure; it's not the sort of thing they'd even permit on
iPhones (hell, they don't even permit other browsers). Even if you
*could* find Textsecure, Apple are pretty brutal about open source
(because it threatens their somewhat fascist approach to user choice),
and you could never be sure that it wasn't altered by Apple or otherwise
compromised by them. And you can only install it through their marketplace.

So a closed source "ecosystem" is anything but. Rather than behaving
according to the ebb and flow of good or bad software choices, it
behaves according to an external director whose desires do not often
match the needs of the customer. Apple aren't a software or hardware
company, you understand; they are an expert Marketing company, able to
convince people to overlook these failings. And despite that, they are
*still* dying just like they did before, being rapidly replaced by
something far closer to an open system; Android.

Let's come back to Synbio, then. I recognise that your application
probably has some awesome features, but it will never meet my personal
needs, because those needs include most of Stallman's essential software
freedoms[2], and a few more practical engineering needs besides. It's
not spite if I refuse to use your software because it's not open source,
it's because the only family of software that has ever proven itself to
me to be trustworthy and reliable in the long run has been open source,
and it would take a great deal for you to convince me that losing my
"Essential Freedoms" is worth the convenience.

Don't take that as a criticism, it's just a basic incompatibility.
Essentially, I'm not your target customer, and never will be unless you
change something very fundamental about the workings of your company.
Nothing to lament, then!

[1] - You can get this through the Android Market, but I elected not to
install that on my newly flashed phone. If anyone else felt likewise and
wants an APK signed by me (I promise I didn't install spyware), email
me. :) Source code here: https://github.com/WhisperSystems/TextSecure
[2]: https://www.gnu.org/philosophy/free-sw.html

Jonathan Cline

Sep 16, 2012, 11:35:33 PM
to diy...@googlegroups.com, jcline


On Tuesday, September 11, 2012 4:58:46 AM UTC-7, Omri Drory wrote:
> How do you see the future of genetic engineering design? I would definitely want to know.

There's already a great language for that.  It's called English.  The "future" of design will be done in formalized English.  Not in a graphical-description-language and not a terse mnemonic language.  The thousands of "Current Protocols" are already programs written in English.

See here:

Don’t Train the Biology Robot: Have the Machine Read the Protocol and Automate Itself
http://88proof.com/synthetic_biology/blog/archives/290

That's some prior art, just in case anyone was thinking of patenting it.

## Jonathan Cline
## jcl...@ieee.org
## Mobile: +1-805-617-0223
########################


Omri Drory

Sep 17, 2012, 11:42:58 AM
to diy...@googlegroups.com, jcline
Why the hell would I want to patent this? Didn't understand that remark.

John Griessen

Sep 17, 2012, 1:13:38 PM
to diy...@googlegroups.com
On 09/17/2012 10:42 AM, Omri Drory wrote:
> Why the hell would I want to patent this? Didn't understand that remark.
>
>
> On Monday, September 17, 2012 6:35:33 AM UTC+3, Jonathan Cline wrote:

> http://88proof.com/synthetic_biology/blog/archives/290
>
>
> That's some prior art, just in case anyone was thinking of patenting it.

Prior art is one of the things that makes it more difficult to get a patent.
JC is talking in terms of patent busting.

Omri Drory

Sep 17, 2012, 3:44:44 PM
to diy...@googlegroups.com
Ok :-)

I dislike patents - I think they will be the bane of our field if the atrocities of software patents follow us into this space. Innovate, don't litigate.

Nathan McCorkle

Sep 17, 2012, 11:36:48 PM
to diy...@googlegroups.com

Patents are better than closed source though, right?

Omri Drory

Sep 18, 2012, 6:10:49 AM
to diy...@googlegroups.com
Ohh, can't win with this crowd :-)

Yes - like you are all running Linux on everything and never using any even half-closed software (google/github/social networks/phones/etc/etc). Again, I'm not into religious wars and I'm not the closed-source corporate douchebag symbol you are trying to fit me to. I'm someone who wants to create the best CAD tool for biology.

To quote someone else: "What have you done that's so great?" 

Cathal Garvey

Sep 18, 2012, 10:05:00 AM
to diy...@googlegroups.com
Hey, again: I don't dislike you or your project. I'd love to use
Genome Compiler. It's not that I feel any adversarial relationship
towards you guys, it's just that your project is incompatible with my needs.

Glad to hear you dislike patents, of course. ;)

Nathan McCorkle

Sep 19, 2012, 3:52:53 AM
to diy...@googlegroups.com
As long as it's extensible somehow (plug-ins?) and you include an API
for getting data in, and getting data out... I think I might be
interested in it. But there is just too much software out there today
using different formats, you are losing a lot of users by not having
entrances and exits clearly marked.

Perhaps I want to interface with this sort of program:
http://en.wikipedia.org/wiki/Molecular_design_software

They probably all have different APIs and file formats... we can't be
expected to reverse-engineer your protocols and file formats, which
might break with any new versions, and still be happy with your
software.

It's this kind of thing that is stalling 3D design and printing,
because none of the good closed-source CAD companies have an open or
standard format, and most projects end up using the somewhat
inappropriate STL file format. With DNA, we can't afford that kind of loss of
fidelity.
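
As a sketch of what clearly marked "entrances and exits" can look like with an open text format, here is a minimal FASTA writer and reader in plain Python. The function names `write_fasta` and `read_fasta` are made up for this example and do not belong to any of the tools discussed here; it also assumes record names contain no whitespace, since only the first word of a header line is kept on reading.

```python
def write_fasta(records: dict, width: int = 60) -> str:
    """Serialise {name: sequence} to FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def read_fasta(text: str) -> dict:
    """Parse FASTA text back into {name: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header line")
        else:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


records = {"partA": "ATGC" * 20, "partB": "GGCC"}
print(read_fasta(write_fasta(records)) == records)  # True
```

The round trip is lossless for the sequence itself, but FASTA carries no annotations, which is where richer open formats such as GenBank or SBOL come in.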

Omri Drory

Sep 19, 2012, 4:18:16 AM
to diy...@googlegroups.com
One of the co-founders of the Synthetic Biology Open Language (SBOL) is our first employee (Cesar Rodriguez). We currently support export as a txt file (more open than this?) and will support SBOL in the near future. We're not a closed garden - you can import and export your data.