Good god, what are you doing? *Here* is a standard protocol- :-) see,
we're already working on this.
No need to use confusing English. The computer can (later) generate
the confusing English if you insist. :-) Maybe you could review the
pcr.xml example and compare it to the PCR protocol that you wrote up
in your email?
Well, that's nice of you, but that's actually not my point (credit
isn't the point). The point is that a computer-readable format has
previously been discussed here (and I'm fine discussing it again of
course) and it just seems to be going backwards to have this full
block of English text about a protocol. I mean, it's good text of
course, but if we're going to be discussing specs for a machine, let's
just ask it to implement CLP-ML and be done with it. Of course, I
still think it's possible to come up with something better than
CLP-ML, or find errors in the pcr.xml example.
Some of the original discussions from way back when-
As Mac put it-
I think semantic representation of recipes / protocols is fascinating.
I could imagine using a tool to functionally define my starting
conditions and desired ending conditions and having it generate a
custom protocol for me by finding the set of protocols connecting the
initial conditions to the ending conditions. Protocols are
essentially modular operations with defined inputs and outputs, so *in
principle* it would just be a matter of matching inputs and outputs.
For example: how would I purify a 3kb insert from a plasmid carried by
a colony I have growing on a plate?
If it were possible to build a structured representation of laboratory
operations, and we avoided getting bogged down in the semantics (is it
a material? is it a reagent?), I could imagine such a system being used to:
- synthesize custom protocols optimized to produce a final quantity of
a desired product
- optimize workflow by finding the minimum number of operations
required to reach a desired product
- keep better track of supplies in the lab (more real-time)
- form the basis of protocol walk-through educational tools.
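
To make the "matching inputs and outputs" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the protocol names, the material names, and the simplification that materials are never used up. It just does a breadth-first search for the shortest chain of protocols from what you have to what you want.

```python
from collections import deque

# Hypothetical protocol library: name -> (required inputs, produced outputs).
# None of these names come from CLP-ML or EXACT; they are made up.
PROTOCOLS = {
    "overnight_culture": ({"colony_on_plate"}, {"liquid_culture"}),
    "miniprep": ({"liquid_culture"}, {"plasmid_dna"}),
    "restriction_digest": ({"plasmid_dna"}, {"digested_dna"}),
    "gel_extraction": ({"digested_dna"}, {"purified_insert"}),
}


def find_protocol_chain(have, want, protocols=PROTOCOLS):
    """Breadth-first search for the shortest list of protocols that turns
    the materials in `have` into a set that includes `want`.

    Simplification: materials are never consumed, only added.
    Returns None when no chain exists.
    """
    start = frozenset(have)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        materials, chain = queue.popleft()
        if want in materials:
            return chain
        for name, (inputs, outputs) in protocols.items():
            if inputs <= materials:  # all inputs are on hand
                new_materials = frozenset(materials | outputs)
                if new_materials not in seen:
                    seen.add(new_materials)
                    queue.append((new_materials, chain + [name]))
    return None


chain = find_protocol_chain({"colony_on_plate"}, "purified_insert")
```

For the "3kb insert from a colony" question, `chain` lists the four made-up protocols in order. A real system would also need quantities, consumption, and equipment, which this sketch ignores.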
So, anyway, I think there needs to be a way to ask whether or not the
machine can satisfy what the protocol is asking for. For instance, the
pcr.xml example isn't anything particularly fancy. However, in some
cases you need something genuinely fancy, and it would be nice to be
able to check whether or not a machine has the capabilities that
you're requesting of it. For instance, if you tag your pcr protocol
(or it has within it) a requirement about a certain maximum
temperature, then that should be within the operating range of the
thermocycler that you're considering to build, or whatever. In some
cases these are easily tweakable variables, and it doesn't matter much
and this looks like premature optimization. But as it turns out
these issues show up for every project :-) including the gel box
project, or the transilluminator, where there are different settings
and parameters you need to set everything up with. In practice, if the
thermocycler ends up with an ethernet connection, then you'd just
dump/cat the file to it, or something, or if it's USB-wired, you
wouldn't send it through /dev/eth0 but rather elsewhere like a mounted
So, I do mean 'tagging' when I say it- for instance, if Jonathan
packages up his hardware in a format that he and I have been
discussing (or maybe we haven't- I forget)- then we can tell
automatically whether or not the equipment can handle a certain
protocol that you want to execute, just by comparing two pieces of
information (two files): the protocol's file and the machine's
metadata/spec/packaging file. Neat, huh? Makes
everything a lot simpler. So exploring the range of different
variables that we need to address (like the temperature ranges and
volume ranges) is an important first step. But also figuring out how
to express it in a computer-readable format. (Translators for taking
pcr.xml -> English are easier than English -> xml, for anybody
wondering- unless we have some computational linguistics people
sitting around with a few tricks up their sleeves).
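
Here is roughly what "comparing two files" could look like, as a sketch in Python. The two dicts stand in for the parsed protocol file and the parsed machine metadata file. The field names and numbers are assumptions, not anything taken from pcr.xml or from an existing packaging format.

```python
# Stand-in for a parsed protocol file. Each entry is the (lowest, highest)
# value the protocol will ask for. Field names are hypothetical.
protocol_requirements = {
    "temperature_c": (4.0, 95.0),
    "volume_ul": (25.0, 50.0),
}

# Stand-in for a parsed machine metadata file: the ranges it supports.
machine_capabilities = {
    "temperature_c": (4.0, 99.0),
    "volume_ul": (10.0, 100.0),
}


def unmet_requirements(required, capable):
    """Return the names of parameters whose required range is missing from,
    or not fully inside, the range the machine says it supports."""
    problems = []
    for name, (need_lo, need_hi) in required.items():
        if name not in capable:
            problems.append(name)
            continue
        have_lo, have_hi = capable[name]
        if need_lo < have_lo or need_hi > have_hi:
            problems.append(name)
    return problems


problems = unmet_requirements(protocol_requirements, machine_capabilities)
```

An empty list means the machine claims to cover everything the protocol asks for. Anything else is the list of parameters to go argue about.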
For protocols, you mean? I don't know which part you're talking about
since your email quotes Jonathan Cline quoting Jake. So, if you're
asking me about a representation format for biological protocols,
there are three formats that I know of- one was introduced by Mac
(EXACT), and it turned out to have some haskell programming behind it
which was a pretty big plus. The other two formats are CLP-ML and PSL.
CLP-ML was found in the literature as a way to represent clinical
laboratory protocols in XML, so the authors of the paper of course
uploaded an XML DTD as the supplementary material. In this protocol,
you specify- exactly- in a way that is *validated* by the computer (so
it can yell at you if you are being somewhat ambiguous, or something)-
what is required for the experiment, the exact steps, which materials
in particular are involved, etc. etc. I even think they went as far as
doing some crazy voodoo magic with URIs for referencing materials, but
I don't know if that's necessary (it would be neat though, but I'm not
going to implement it right away- somebody else is welcome to, of
course). The other format, PSL, is a process-specification language
that was developed by NIST in the late 90s or something. It was used
for the representation of manufacturing automation processes, such as
to compress the process-information about an automobile factory (for
building a car) into a single 'recipe'. But it turns out that the
project is dead now, and CLP-ML, IMHO, seems more alive, even though
there's nobody using it other than the people on this list who know
about it. The EXACT examples were fairly neat- they had two haskell
scripts; one was to generate the instructions for culturing yeast in
batch, and another one was the core library files. However, oddly
enough they also supplied something that looked pretty neat-
human-readable text (not haskell) that was describing what you should
do at each stage, and there were pre/post-conditions .. the odd part
is that it wasn't generated by the haskell script, so I'm left
wondering what on earth was going on there- maybe they had some other
software that they aren't sharing? I haven't asked them to share it
with me yet, so I don't know if it exists or not, and anyway it's
something that we or I should write on my own, etc.
Now, if you mean hardware packaging formats, that's another entire can
of worms. Let me get to that next.
> Why not just build a spec sheet for it, maybe this format you mention is
> just that, but really it could all be in a text file, no need for CAD
> drawings, etc... just give the dimensions of everything, a list of the
> hardware and electronics, the wiring scheme, and links/examples to/of
> software that would properly address each hardware interface.
Spec sheets are one of the most awesome things that the age of
electronics has brought about. The problem is that the majority of
spec sheets are actually just encoded into PDFs, rather than into a
standardized format. However, there actually has been some work here-
there's something called ECIX, which is a way to represent electronic
datasheets in a computer-parseable way. What this allows is for
electronic design software packages (such as related to VHDL, or VLSI,
or Verilog, or sometimes just PCB-related-electronics) to pick and
choose parts that have compatible specifications. Unfortunately there
is no ECIX integration with gEDA, the free electronic design
automation package (yet). I'm sure this will change in the future-
especially once there are more datasheets published in the ECIX
format. You can google around for it. As I recall, it's basically an
XML DTD, and there's an example dot XML file that references it for
the implementation of a Samsung timer IC, or something. It even had
pin geometry layout information, which is very useful for SMT
(pick-and-place-- the giant machines that pick electronic
components off of reels and put them on circuit boards based on some
PCB definition file for an electronic circuit).
I am not opposed to a text file. Basically a text file on Windows is
just something with the ".txt" file extension (but you already know
this), and you can change which program opens up those files by
default, blah blah blah. So, the whole point in working with PyYAML is
that (1) you can use the interactive python interpreter in real time
to play around with it (if you have to- you probably won't), and (2)
YAML is human-readable, and much more friendly than XML. So, if you
have a file like "pcr.yaml", you could associate that (on Windows, or
any other operating system) with notepad, or wordpad, and it's
practically a text file, except it'll just be used somewhat
differently. I don't know if this is too basic of an explanation or
not. I do agree that the dimensions should be put into this particular
file- it's what I've been calling a metadata file.
When you say a list of the hardware and electronics, what do you mean?
What are you listing in particular? What database are you referencing?
If you say resistor, which one? Is it from a digikey catalog, or
mouser catalog? The advantage of me writing all this software for all
you other folks (hehe) is that you don't have to deal with this nitty
gritty, and just say "these parameters have to be specified", and then
the kit can be ordered with those parts, or somebody can spend some
time and write an interface to the digikey/mouser/amazon/radioshack
catalogs. I heard Best Buy recently exposed their API over the web-
too bad they don't sell things that we actually need ;-).
What do you mean when you say "links to software that would properly
address each interface"? The way that I have been dealing with this is
an algorithm to check whether or not something that says "gives 3000
psi" is wired up to something that says "5 to 20 psi acceptable range"
(more or less- the format is more general than this). This way, two
parts can be checked for compatibility with a simple python function,
or a simple program that you won't have to look at (this is backend
programming stuff, for anybody who thinks this is horrible- and yeah,
I'm being overly elaborate, but I don't think I've explained these
things properly before).
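
A stripped-down sketch of that kind of check, in Python. The class names, the part names ("pump", "valve", "regulator"), and the single-number output are all assumptions made for the example. The actual format is more general than this.

```python
from dataclasses import dataclass


@dataclass
class Output:
    """What a part supplies on one connection (hypothetical structure)."""
    part: str
    unit: str
    value: float


@dataclass
class Input:
    """What a part will accept on one connection (hypothetical structure)."""
    part: str
    unit: str
    minimum: float
    maximum: float


def compatible(out, inp):
    """True when the units match and the supplied value sits inside the
    range the receiving part says it accepts."""
    return out.unit == inp.unit and inp.minimum <= out.value <= inp.maximum


pump = Output(part="pump", unit="psi", value=3000)
valve = Input(part="valve", unit="psi", minimum=5, maximum=20)
regulator = Output(part="regulator", unit="psi", value=15)
```

Here `compatible(pump, valve)` is False and `compatible(regulator, valve)` is True. Unit conversion is deliberately left out: mismatched units just count as incompatible.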
An excellent example of packaging done right is debian. I don't know
if you've ever booted up ubuntu or debian or some other linux
installation. Basically, there is a package management system. Package
management means that you don't have to sit there and install
different software: you type a command, or select a few software
packages from a list, and because there were people who made the
packages in a certain way, they automatically install (or you can
configure them to do something weird). I highly encourage you to go
download and burn an ubuntu disc. And if you're worried about that,
and if you use Windows, go get "wubi", which is an easier way to
install ubuntu. Personally, I don't like Windows, but I just reference
it since I have no idea what people are familiar with, in an attempt
to be verbose and helpful.
There are some documents that explain what these things are, over the web.
Debian new package maintainer's guide
RPM package building guide (concise)
Wikipedia on package management systems in general
Finally, on the topic of CAD. I think that CAD is an important thing
to include. With freely available tools like HeeksCAD, BRLCAD,
avocado, etc., there's little excuse to not just go and play around
with the systems. Note that BRLCAD is going to be complicated on
Windows (I've never tried it)- so I recommend HeeksCAD for anyone who
might not know what they are doing.
Of course, you don't have to bother yourself with CAD- if you have
JPEG drawings, you can upload them to Ponoko, and they convert it into
a 2D CAD file. Sketchup, a Google app, also does something like that
IIRC. But really, it's not as terrible as the 80s with really funky
commands just to move objects around. Unless you use BRLCAD ;-)
(unfortunately, I'm not joking). Alternatively, you can talk with me
about the files or something and I'll be glad to package it up,
especially once the software starts working better- I still recommend
formats like IGES, STEP, etc., as we were recently discussing. Maybe
there's something more I can do to help people? Just speak up ..
I don't see how that differs from what I'm talking about. Same thing.
Maybe you can point out anything that differs significantly. For
instance, you say that your device has 16 holes under the thermocycler
lid (where you place the micro-tubes), and those tubes each have
separate but equal operating conditions, which is exactly what you're
trying to specify. That's the metadata that the format I'm proposing
is going to store (I would say it already sort of works, but I don't
have an entire code suite working at the moment).
Also, you mentioned hyperterminal. That brings back some old, old
memories. Is that really how you use it? Are you building your own
thermocycler, and if so, are you wiring it up to your computer and
accessing it via hyperterminal? Just wondering. I wouldn't bother with
drag/drop for excel- if you need some help, one of us (maybe even me?)
could write a program to just save the data to an OpenOffice Calc
(spreadsheet) file, or something compatible with Microsoft Excel
(maybe a CSV file- a comma-separated value format).
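
For what it's worth, the CSV route is only a few lines with Python's standard `csv` module. This is a sketch: the column names and the readings are made up, and how you actually get numbers out of the thermocycler is left out.

```python
import csv
import io


def readings_to_csv(readings):
    """Turn (seconds, temperature_c) pairs into CSV text that Excel or
    OpenOffice Calc can open directly."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["seconds", "temperature_c"])  # header row
    writer.writerows(readings)
    return buffer.getvalue()


# Made-up readings standing in for whatever the machine reports.
sample = [(0, 25.0), (30, 94.1), (60, 94.0)]
csv_text = readings_to_csv(sample)
```

To get a file instead of a string, hand `csv.writer` a file opened with `open("run.csv", "w", newline="")` and write the same rows.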
Out of curiosity, does anyone else use hyperterminal with other lab equipment?
Standalone USB controller ICs are usually in a surface-mount package
and thus a bit of a pain to solder, but if someone wanted to set up
PCB fabrication with a fab shop once the design is further along, it's
generally pretty trivial to get the fab shop to solder surface-mount
components on for boards that are meant for kits. Limor Fried's
Boarduino (http://www.ladyada.net/make/boarduino/) has a USB version,
and if you order a USB Boarduino kit you get a PCB with the USB chip
presoldered and the rest of the components in a baggie.
Also, many higher-end microcontrollers speak USB natively. I have a
bit of experience with the Freescale 8-bit micros, which are well
within the price range for a DIY project (between $3 and $12 apiece).
These chips also come in surface-mount packages, but I have
successfully soldered a Freescale micro in a QFP package by hand, and
a SchmartBoard makes it even easier.
I strongly recommend going with a micro that speaks USB. They're
really not all that expensive, and can be programmed in C. I'd be
happy to contribute to or even take the lead on programming the
firmware for a Freescale-brained thermocycler. These chips have a ton
of GPIO lines that can be used to control the peltier junction, send
output to an LED display or LCD, control a ventilation fan (if needed;
that could just be always-on) and read input from a thermistor.
> A cheap little microcontroller on a fairly simple and small PCB should
> be able to manage all these functions. The atmega8 has 6 10-bit A/D-
> converters (for temp channels or other sensors), a couple timers and a
> real-time clock, three PWM channels (for driving things or speaker
> noises), it also has a bunch of I/O pins that can be used for lights
> or buttons.
I'm not sure that the 8K of Flash that the ATMega8 provides would be
enough to hold all the stored programs that are needed, but I tend to
be pessimistic about that sort of thing, and there are a lot of tricks
for optimizing program size. I'll have my Arduinos back in about a
week, though, so I can start screwing around with putting something
together on that platform.
Arduino images tend to be a bit bloaty, but one argument for using
that platform -- or possibly the new ATMega128 version -- is that the
barrier to entry for learning to code for them is extra-specially low,
so it might make sense to target the AVR on the grounds that it
becomes easier for people to contribute to the project. And, again,
you get onboard USB. Plus there's nothing stopping us from putting the
Arduino firmware on a custom PCB, should we so desire. This would
actually make it *really* easy to upgrade the thermocycler's firmware
-- plug in USB cable, load up new image, boom, you're done.
> Other features that would be nice would be a beeper or speaker for
> alarms and letting you know when certain points in the cycle are hit.
> Like adding Taq right at the right point after initial denaturation
> for hot start modes. It would also be nice to have it beep for awhile
> when done, or cool and hold your tubes until you retrieve them, or
> cool them down if nobody answers the beeps after awhile. (for
> forgotten rxns)
Ooh, I like those ideas. Easy to program, too.
> You could also just buy broken
> cyclers for cheap and use the parts to build your own one with open
I may do just this when I get some free time.
I would also make an analogy to networked printers. Anybody with
access can print through cupsd or some networked-printing-server. Most
printers out on the market now have a built-in web server and know
about connecting to different types of networks, like SAMBA. So, if
you're going to implement a networked thermocycler (i.e., it's going
to have an IP address and all), consider looking into printer server
architectures for networked devices. Also, for investigating router
software, see OpenWRT.
"OpenWrt is described as a Linux distribution for embedded devices.
Instead of trying to create a single, static firmware, OpenWrt
provides a fully writable filesystem with package management. This
frees you from the application selection and configuration provided by
the vendor and allows you to customize the device through the use of
packages to suit any application. For developer, OpenWrt is the
framework to build an application without having to build a complete
firmware around it; for users this means the ability for full
customization, to use the device in ways never envisioned."
Oh. Hm. I always thought OpenWRT was for reflashing netgear firmware
for ethernet/wireless routers. Guess it has other uses.
Using a microcontroller that supports OpenWRT will increase costs
enormously and open up security holes. I don't want an operating
system on my thermocycler.
I have some ideas about writing a dedicated interface (that will speak
a very limited subset of HTTP, and spit out HTML) on top of uIP
(http://www.sics.se/~adam/uip/index.php/Main_Page) that I'll have to
elaborate on later, since I need to help Len pack for CodeCon, but
don't let me forget this.
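
Until that gets elaborated, here is a guess at what "a very limited subset of HTTP" might boil down to. It is written in Python only to show the logic. The real thing would be C on top of uIP, and none of this uses or reflects uIP's API. The single supported path and the page contents are assumptions.

```python
def handle_request(raw, temperature_c):
    """Answer 'GET /' with a tiny HTML status page; reject everything else.

    `raw` is the request as bytes. Only the first line is looked at.
    """
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = request_line.split(" ")
    if len(parts) != 3:
        status, body = "400 Bad Request", "<html><body>Bad request</body></html>"
    elif parts[0] != "GET":
        status, body = "405 Method Not Allowed", "<html><body>GET only</body></html>"
    elif parts[1] != "/":
        status, body = "404 Not Found", "<html><body>Not found</body></html>"
    else:
        status = "200 OK"
        body = f"<html><body>Block temperature: {temperature_c:.1f} C</body></html>"
    payload = body.encode("ascii")
    head = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


response = handle_request(b"GET / HTTP/1.0\r\n\r\n", 94.0)
```

No headers are parsed and there are no sessions or operating system, which is roughly the appeal.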
I'd like you to show me that. Genuinely, I would, I would love to see
that in action.
> recipes in some ascii-but-otherwise-bizarre format is probably going
> to progress slowly. Biologists become biologists usually because they
> shy away from technology, so even asking these end user to learn HTML
Nobody is asking them to learn HTML. I've talked to you about writing
fancy frontends and wizards, or letting them talk to 'package
maintainers' who know the super secret ninja arts (or whatever we're
> is out of their focus. Throw computation power at the problem rather
> than human effort at the problem. Let the biologists remain experts
> in biology.
Sorry, but you can't just assume a computer is magic.
> In the cooking world, there are/were lexical analyzers to convert
> "plain text cooking recipes" into computer readable interchange format
Please show me. The only reference on this I can find was from a book
published in 1985 called Computational Recipes where a fellow named
David came up with a polish or posix-style notation representation of
cooking. It turns out he had a consulting business in industrial
kitchen automation, or something. But now he doesn't seem to be around
on the internet.
> for data storage and sharing. The "plain text recipe format" was
> pretty loose on the whitespace acceptance, measurement style, verb/
> noun placement, etc. Biologists can write english with specific
> vocabulary and it can be computer readable. They should do more of
> that, in a regularly-formatted way.
I don't see how this is different from using particular interfaces to
validate their grammars and so on. What's the big deal?
> I haven't seen that many biology protocols of course, other than
> what's on OWW and http://www.protocol-online.org . I find the
> pcr.xml to be completely unreadable and unwritable compared to what's
Yeah, that's not the human readable output. I would write a generator
that would take that information and generate instructions, plus
information on each of the pieces of equipment if necessary. Think of
it as a verbosity flag on some shell program (ok, then imagine
whatever GUIs you like on top of that). Btw, that's why I was
originally suggesting YAML for the representation of recipes- it would
be much easier to read and even human-writable, but at the same
time it's an equivalent format for object serialization for python,
perl, and whatever other languages have YAML-implementations.
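
A toy version of that generator, as a Python sketch. The dict is a stand-in for what you might get after loading a pcr.yaml. The field names, temperatures, and times are assumptions for illustration and are not the CLP-ML schema.

```python
# Hypothetical protocol data, the way it might look after loading pcr.yaml.
pcr = {
    "name": "PCR",
    "cycles": 30,
    "steps": [
        {"action": "denature", "temperature_c": 94, "seconds": 30,
         "note": "separates the DNA strands"},
        {"action": "anneal", "temperature_c": 55, "seconds": 30,
         "note": "lets the primers bind to the template"},
        {"action": "extend", "temperature_c": 72, "seconds": 60,
         "note": "lets the polymerase copy the template"},
    ],
}


def to_english(protocol, verbose=False):
    """Render the structured protocol as plain-English instructions.
    With verbose=True, add the protocol name and a note on each step."""
    lines = []
    if verbose:
        lines.append(f"Protocol: {protocol['name']}")
    lines.append(f"Repeat the following {protocol['cycles']} times:")
    for number, step in enumerate(protocol["steps"], start=1):
        line = (
            f"  {number}. {step['action'].capitalize()} at "
            f"{step['temperature_c']} C for {step['seconds']} seconds"
        )
        if verbose and "note" in step:
            line += f" ({step['note']})"
        lines.append(line)
    return "\n".join(lines)


short_text = to_english(pcr)
long_text = to_english(pcr, verbose=True)
```

The `verbose` argument plays the role of the verbosity flag: same data, more or less hand-holding in the output.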
> on the wikis. The following PCR example has a good visual format and
> is likely very parsable by a laptop into "machine-code":
While it looks good, I've seen some pretty terrible things done with
that format. The other day I was reading a protocol that asked me to
then apply "2 volumes EtOh". wtf?
> Or has lexical analyzer been done in bio already?
Dunno, but I'd like to hear about it.
1. The idea of computer-readable protocols. The benefits of this are manifold.
2. The fact that aiming for computer-readable protocols does not mean,
in any way, that we have to write them in raw <inscrutable and arcane,
however logical it may be, computer format here>. We should be able
to specify them in a reasonably intuitive manner. There are plenty of
graphical metaphors for setting out a list of instructions that talk
about well-described operations with well-formed units; imperative
flow charts, declarative data flow "patches," dependency graphs, etc.
I call it a graphical programming language, but call it whatever you
like; in my opinion, protocols and workflows should look like this, if
not more beautiful:
Software could also optimize *your* time - if it knows when you have
to be attending a protocol, and when you can "just let it simmer,"
then you could tell your computer all the experiments you'd love to
do, and it could give you a schedule of what to do this week,
accounting for how many pieces of equipment you have, what products of
which protocol feed into others, who else is using your lab, and when
you'd like to take a lunch break.
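
As a very small first step toward that, here is a Python sketch that only handles the "what feeds into what" part. It assumes made-up task names and durations, unlimited equipment, and no lunch breaks, and it uses the standard `graphlib` module (Python 3.9 or later) to work out the earliest each task could start.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each maps to the set of tasks whose products it needs.
depends_on = {
    "overnight_culture": set(),
    "pour_gel": set(),
    "miniprep": {"overnight_culture"},
    "digest": {"miniprep"},
    "run_gel": {"digest", "pour_gel"},
}

# Made-up durations in minutes.
duration_min = {
    "overnight_culture": 960,
    "pour_gel": 30,
    "miniprep": 45,
    "digest": 90,
    "run_gel": 60,
}


def earliest_starts(depends_on, duration_min):
    """Earliest start time (minutes from now) for each task, assuming a
    task can begin as soon as everything it depends on has finished."""
    starts = {}
    for task in TopologicalSorter(depends_on).static_order():
        starts[task] = max(
            (starts[dep] + duration_min[dep] for dep in depends_on[task]),
            default=0,
        )
    return starts


starts = earliest_starts(depends_on, duration_min)
```

Equipment limits, shared lab time, and attended versus unattended steps would turn this into a proper constraint problem. This only gives the dependency skeleton.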
Thanks goodness we're living in the future! and all this is possible,
It's interesting that this hasn't actually happened yet, though.
Why is it that we can count the number of examples with our fingers? I
mean, to me, this seems fairly obvious, intuitive, even easy. But on
the other hand, it just doesn't really exist at the moment. I do know
however that there has been significant push to get bioinformatics
databases in better states- there was some letter circulating around
in the bioinformatics journals about standards, or something. But I
don't remember anything in that 'open community letter' about protocol
> 2. The fact that aiming for computer-readable protocols does not mean,
> in any way, that we have to write them in raw <inscrutable and arcane,
> however logical it may be, computer format here>. We should be able
That's right- we can have it so that computer-readable protocols are
human-readable, or can be transformed into human readable forms. Also,
there's the idea of parsing human readable text into something that is
a structured computer form- kind of like metadata structure in OCR or
something. But in general I wouldn't trust this going over the
protocol-online.org dataset, which is not a structured dataset.
> I call it a graphical programming language, but call it whatever you
> like; in my opinion, protocols and workflows should look like this, if
> not more beautiful:
I do some work in a lab that generates graphs through an open source
program called 'graphviz'. Essentially, what we do is convert
functional structure diagrams to these sorts of visual graphs. They
are not representations of programming grammars, though.
So, protocols could be converted to graphical visualizations, but I
don't know if you're actually talking about something like 'turtle',
the graphical programming language, or scratch.mit.edu, or something.
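
For the visualization side, a protocol's steps can be turned into graphviz input with very little code. This Python sketch just emits DOT text for a straight-line chain of steps. The step names are made up, and it is not the code we use in the lab.

```python
def protocol_to_dot(name, steps):
    """Return graphviz DOT text that chains the given steps in order."""
    lines = [f'digraph "{name}" {{', "  rankdir=LR;"]
    for first, second in zip(steps, steps[1:]):
        lines.append(f'  "{first}" -> "{second}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = protocol_to_dot("pcr", ["denature", "anneal", "extend"])
```

Save `dot_text` to a file such as pcr.dot and run `dot -Tpng pcr.dot -o pcr.png` to get a picture. Branches and loops would need a real graph structure rather than a flat list.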
> Software could also optimize *your* time - if it knows when you have
> to be attending a protocol, and when you can "just let it simmer,"
> then you could tell your computer all the experiments you'd love to
> do, and it could give you a schedule of what to do this week,
> accounting for how many pieces of equipment you have, what products of
> which protocol feed into others, who else is using your lab, and when
> you'd like to take a lunch break.
Yes. Schedule optimization, like when you should do what, is also a
problem that computers should solve. When I first entered university,
I spent a few hours one day with some sticky notes and a calendar
trying to get a course schedule optimized. After a while I just sat
there and figured that this is a problem that my computer should be
solving- so I wrote a schedule optimizer, a combinatorial constraint
solver sort of, except it's not generic and it's highly constrained to
that particular problem space. I think problems like these that occur
in the lab (like what to do first, or how to organize a project or
something) could be done with computational methods, very easily, or
"saved" results could be used again if you like some particular
organizational scheme for conducting a certain type of protocol, or
> Thanks goodness we're living in the future! and all this is possible,
Bryan, can you give a brief description of the machine language/UML thing you're talking about? (without me looking at that old thread right now)
Why not just build a spec sheet for it, maybe this format you mention is just that, but really it could all be in a text file, no need for CAD drawings, etc... just give the dimensions of everything, a list of the hardware and electronics, the wiring scheme, and links/examples to/of software that would properly address each hardware interface.
Perhaps it's that, at least today, one actually does have to write all
this XML if we want a computer-readable protocol. The lack of a
usable design tool means that you have to care about this in order to
write protocols in a computer-readable way. If a piece of software
existed that (1) made it easy to write protocols and (2) made it easy
to read/share/follow/measure/track/etc. protocols (i.e. a value add
above transcribing), and it was made known, I think we might see more
of these documents pop up.
An alternate question is: imagine a universe with no CAD software, and
asking "why is the only DXF file the one written by the DXF spec
> So, protocols could be converted to graphical visualizations, but I
> don't know if you're actually talking about something like 'turtle',
> the graphical programming language, or scratch.mit.edu, or something.
Yes, I mean a graphical design tool that hides the "source code" from
the protocol expert (biologist, chemist, ...) and lets them think in
terms of their domain. Certainly, the same tool could read a
computer-readable protocol file and produce a flowchart, or English
instructions (or German or Mandarin, for that matter), or a daily
> When I first entered university,
> I spent a few hours one day with some sticky notes and a calendar
> trying to get a course schedule optimized...
Ha! I got a kick out of that story :) Luckily, some clever
programmers at my school built a scheduling tool that scraped course
day/times from the university course registration server and made this
a super simple process, so I chose to not have the pleasure of writing
Anyhow, glad to see the discussion on computer-readable protocols.
Back to hacking hardware so this isn't all daydreaming, I suppose...
Oh, maybe we were. Yeah, so computer engineering projects are still
within the domain. I know this isn't what you were talking about, but
if you haven't seen the packaging that the opencores people are doing,
you should go take a look- that's computer engineering at its finest.
But just for electronics and stuff, ok- thanks for the clarification.
> By a list of hardware and electronics, I mean just that, I think we should
> develop a simple list of parts and their technical identification specifics.
> Just have a list of the needed devices, with manufacturer, part ID, maybe a
> reseller's web link, and a quick description of the part. Then throw in a
Well, that's what I'm trying to move away from. A thousand links of a
thousand different parts is a step backwards- I want to unify that
interface so that I don't have to hunt down all of the parts all the
time. This is somewhat the idea of octopart.com, although it kind of
fails to work. But yeah, that's the basic idea, that there has to be a
way to actually implement the kits or projects and packages and so on-
either with parts that *you* make, or parts that you buy or somehow
happen to have in your inventory, and the various options for buying
them, just like you have various options for which mirror you want to
download stuff from when you use sourceforge or various
high-visibility projects that need that sort of infrastructure.
> pinout schematic with parts labelled, etc, and maybe a pre-arranged PCB
> board file too. When I asked about software device linking, I meant the
IIRC, there are a few standard formats for PCB file representation. I
think gEDA had some free tools for this somewhere.
> actually microcontroller code, for the whatever controller is in the final
Microcontroller code stuffs have already hit the repositories, I seem
to recall an arduino-specific code repository, another one about
buglabs, and I'm sure I've seen microcontroller stuff in the debian
repositories that I'm wired up to.
> design... such as binary modulating of a relay that controls the peltier
> junction, or PWM on the peltier, as controlled by some thermocouple,
> thermistor, or digital two-wire temperature sensor. Time Clock, USB I/O,
> maybe ethernet as well, or possibly just two-wired to an optional netmedia
> siteplayer chip package for web functionality.
Yep, abstracted functions that any thermocycler should implement.
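
One way to picture those abstracted functions is as an interface that protocol-running code talks to, with the relay/PWM/sensor details hidden behind it. This is a sketch in Python. The method names are assumptions, and the fake machine has no hardware behind it.

```python
from abc import ABC, abstractmethod


class Thermocycler(ABC):
    """Hypothetical minimal interface any thermocycler could implement."""

    @abstractmethod
    def set_target(self, temperature_c):
        """Start driving the block toward this temperature."""

    @abstractmethod
    def read_temperature(self):
        """Return the current block temperature in degrees C."""

    @abstractmethod
    def hold(self, seconds):
        """Stay at the current target for this long."""


class FakeThermocycler(Thermocycler):
    """Stand-in with no hardware: it jumps to the target instantly and
    records each hold it was asked to do."""

    def __init__(self):
        self.temperature_c = 25.0
        self.log = []

    def set_target(self, temperature_c):
        self.temperature_c = float(temperature_c)

    def read_temperature(self):
        return self.temperature_c

    def hold(self, seconds):
        self.log.append((self.temperature_c, seconds))


def run_cycles(machine, steps, cycles):
    """Run (temperature_c, seconds) steps in order, `cycles` times."""
    for _ in range(cycles):
        for temperature_c, seconds in steps:
            machine.set_target(temperature_c)
            machine.hold(seconds)


machine = FakeThermocycler()
run_cycles(machine, [(94, 30), (55, 30), (72, 60)], cycles=2)
```

Whether the heater is driven by a relay or by PWM would then be a detail of whichever class implements the interface.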
> Work on the basic piece of equipment, get it, then program OCR software and
> crazy algorithms to translate laboratory experiment protocols. All PCR is
Well, I don't think that we should have to do OCR :-) that's the hard
crazy stuff. OCR, human language analysis stuff, that's what we should
be avoiding. Unless we happen to have somebody who is competent in
that area hanging around here and willing to speak up. :-)
> concerned about is temperatures and times, if we have a solid base piece of
> hardware, it should be no problem to easily implement the functionality you
1) Making it easy to write protocols. So, the one thing that I was
previously working on was a front-end wizard-guide-style program that
would help people as they are writing a protocol. This would be the
hand-holding version. It would work just like the typical
ask-a-question-then-let's-go-to-that-datatype-input software that you
see in introductory programming books (only because I don't have a
better idea). This would also work for asking the programmer whether
the information extracted from a parsed file is correct-
but I really don't know how to write English language parsers that are
that powerful. :-( I guess I have some reading to do. Any suggestions?
Another way to write protocols would be something like a graphical
point-and-click thing, sure, but really I'm more interested in the
underlying structure, because I'm not too fast with a mouse, whereas
with my keyboard, my fingers fly.
2) Making it easy to read, share, follow, measure, track protocols. So,
I think reading, sharing and following the protocols is possible. This
is the idea of a web repository frontend, which is this tiny
ikiwiki+git frontend for the YAML database stuff of metadata for
different open source hardware projects, or different protocol
implementations (i.e., whether it's a PCR machine, thermocycler, or
just a symphony of tubes being orchestrated by human manual labor).
But this comes a bit later after some of the basics are set up.
Preserving both the ability to download the information but also share
it would involve something like hashes or something, which is
irrelevant to everybody except someone interested in the backend
'magic' of it all.
> An alternate question is: imagine a universe with no CAD software, and
> asking "why is the only DXF file the one written by the DXF spec
It's the chicken-and-egg problem: as soon as you have enough DXF
files, some tool will be written to help manage the files. Of course,
nobody wants to manage the files until that tool is written.
>> When I first entered university,
>> I spent a few hours one day with some sticky notes and a calendar
>> trying to get a course schedule optimized...
> Ha! I got a kick out of that story :) Luckily, some clever
> programmers at my school built a scheduling tool that scraped course
> day/times from the university course registration server and made this
> a super simple process, so I chose to not have the pleasure of writing
> that myself.
Yep, that's exactly what I did. Web scraping and everything w/ perl's
WWW::Mechanize. Good times. :-/
A thousand links of a
thousand different parts is a step backwards- I want to unify that
interface so that I don't have to hunt down all of the parts all the
I think you will know the pains I'm talking about when I tell you to
get some item named 'dfuo3ur0892409814' and you don't actually know
what it is or where to acquire one .. these problems intensify as you
get deeper and deeper into obscure machines. Luckily, machines with
something like 15 parts, like a thermocycler, aren't going to be like
that, but in general- this is infrastructure. I think fenn wrote about
this problem the other day pretty well too:
> That makes sense. Labs therefore need to replicate to meet demand as it is
> made practical.
Labs need to focus on automation so that they aren't subjecting their
members to excessive opportunity costs by making everything from scratch.
If you spend 1000 man-hours building the equivalent of a $1000 machine,
have you gained or lost? What about the second or third time you do it?
> For those on this list knowledgeable of working with metals and
> refrigeration. Is fabbing a Fridge from scratch feasible?
Depends what you mean by "scratch" - I'd define scratch as the most
abundant minerals in earth's crust, water, sunlight, and air. Given that,
your "refrigerator" would probably end up looking something like a fat
tree. Not a bad design, all things considered, but our technology simply
isn't there yet.
Now if you take "scratch" as "anything you can buy at a Home Depot" then
the situation changes, but is that really what you mean?
> What are the steps and how long would it take with optimal tools and
Like a refrigerator factory?
The basic components of a conventional refrigerator are:
- refrigerant, a chemical that has a high heat of vaporization and
boiling point near or below the desired temperature; propane or
ammonia are easy to come by. for this you'd need feedstocks,
distillation apparatus, analysis instruments, containment tanks
- compressor motor, usually an electric motor permanently sealed inside
the refrigerant plumbing. requirements: wire rolling and drawing,
brushes or steel laminations, bearings, switches
- compressor pump, requires high precision seals if the motor is not
enclosed in the system. otherwise it's just a relatively complicated
precision mechanical device with multiple bearings and sliding seals..
- heat exchangers, just a long length of copper or aluminum
tubing with fins soldered on. these would have to be made in some
seamless process, probably with a floating mandrel, which is tricky:
- circulation fan. same as for compressor motor, with some bent sheet
- defrost heater, a quartz tube with some nichrome heating wire inside.
no idea how to make these.
- thermostat, usually a long strip of two dissimilar metals bonded together.
supposedly they can simply be riveted together, but i always see them
as a continuous smooth strip
- insulation, often polyurethane foam, but fiberglass and styrofoam are
almost as good. is there an organic chemist in the house?
- shelving, drawers, handles, exterior case, and frame. easy enough if
you know how to make sheet metal.
so based on the level of technology infrastructure required to make each
one of those, i'd say it's not feasible to make from pure chemicals. It's
a lot more attractive idea if you can buy or trade for semi-finished
stock, that is to say, tubing of consistent diameter, sheet metal, wire.
(but if you're doing that, why not just buy a compressor motor? why not
buy or scrounge a fridge?)
you can make a decent produce extender by putting one unglazed ceramic pot
inside another and filling the space between with water.
this isn't even getting into alternative designs like thermoacoustics or
hilsch tubes, which could simplify the technology tree at a cost in
performance and efficiency.
see why we need SKDB?
> I Dunno, maybe I am not understanding something, or maybe I am just more
> comfortable with a regular straightforward parts list. Buy these parts,
> solder like such, place in enclosure, connect power supply and data cables.
Overall that will be the end result, yes. Think of this as infrastructure.
I'm vaguely familiar with flex, bison and yacc. So, BNF grammars and
such. Okay. When I look at these grammars, they are mostly used to
define the syntax and grammar for programming languages, which is then
used as input to a compiler-compiler. Things tend to crash when they
come on to syntax errors (as this poorly formatted, unstandardized
text tends to be) and other problems. It would be absolutely awesome
if we're somehow able to convert all of the information into a
computer-readable format via a flex-based grammar, plus marking of
which files were incorrectly interpreted or parsed, plus some
non-negligible number of protocols which are correctly converted into
(something). Do you have any references on either software that does
this, or something from the literature demonstrating this on large,
poorly structured datasets? Worse than fuzzy.
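For what it's worth, here is the kind of thing I mean by "convert what you can and mark the rest", as a rough Python sketch. It uses one regular expression as a stand-in for a real flex grammar, and the sentence shape it accepts ("<verb> at <temp> C for <time>") is just an assumption picked for illustration, not something taken from any actual protocol collection. The function name and the dictionary keys are made up too.

```python
import re

# One assumed sentence shape: "<verb> at <temp> C for <amount> <unit>".
STEP_RE = re.compile(
    r"^\s*(?P<verb>[A-Za-z]+)\s+at\s+(?P<temp>\d+(?:\.\d+)?)\s*°?\s*C\s+"
    r"for\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>s|sec|min|h)\b",
    re.IGNORECASE,
)


def parse_protocol(text):
    """Return (parsed_steps, flagged_lines) for a block of protocol text."""
    parsed, flagged = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no information
        match = STEP_RE.match(line)
        if match:
            parsed.append({
                "line": number,
                "verb": match["verb"].lower(),
                "temp_c": float(match["temp"]),
                "duration": float(match["amount"]),
                "unit": match["unit"].lower(),
            })
        else:
            # Anything the pattern cannot read is kept for human review.
            flagged.append((number, line.strip()))
    return parsed, flagged
```

The useful number is the ratio of flagged lines to parsed lines on a real dataset, which is exactly what I don't know yet.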
> (or for fun, find an implementation of "adventure shell")
You mean like the old infocom adventure game stuff?
> It could be I'm simplifying biology protocols. So far I'm convinced
> that a lab robot could run straight from protocol-online.org
I have seen so many, many problems with the protocols there. I'll make
a list of just some stuff I've noticed, stuff that would cause a
* out of order sections
* different formatting for the representation of different sections
* grammatical references to previous steps
* references to previous steps but using different terminology not
explicitly defined anywhere else in the same document (i.e., "after
the digestion step" but it didn't tell you that four steps ago you
were using a restriction enzyme to digest a DNA strand)
* ambiguity on what constitutes a 'step' in the process versus when
it's a 'note' or just an added piece of information.
* "magic human touch" protocols- we've all seen these- where there's
something "magical" that some poor graduate student had to master by
hand, or something (so this wouldn't be a good protocol to waste your
time on anyway)
* special information in diagrams that can't easily be extracted into
the "flow of text" for proper parsing
* mangled information- sometimes the weird representation of the
protocol that was chosen was because the traditional format doesn't
apply for some reason (structure is function)
Now, all of those (or almost all of those) are problems that can be
solved by tiny hacks to a parser. But I really don't see the benefit-
especially because of the large variation of the different ways of
representing the protocols that is used across protocol-online.org. In
some cases there are subsets of the dataset that are formatted in the
same way, and I think it would be useful to be able to convert those
subsets (but I haven't found them yet) that follow the same "grammar"
(even though we don't know what grammar that actually is (hm a tough
one)). Those subsets would make good starting spots for converting
protocols or running the parser on. Could you help convince me how an
amazing super grammar can be written with flex to solve this problem?
> "directions". The difficult part of bio is "knowing what to try when
> it doesn't work" (hard AI), not actually performing the individual
> steps (easy AI).
Right, if you get to assume hard ai then yes, everything becomes
magically easier- of course. I spent yesterday talking for a few
hours with somebody who wants to do AGI research- so my thoughts on
this are kind of all fuzzy and I have no idea if you're talking about
ai in the sense that AGI people talk about ai, or the Peter Norvig
sense of fancy tricks with SVMs and machine "learning" algorithms
(which aren't actually about the biological analogs of learning (which
isn't a bad thing, it's just hard to distinguish them for an untrained
> - there are not that many verbs, and the usage is very precise and
> consistent (yay, science).
Not really. Sometimes you even see domain-specific terminology or verbiage.
> - there are not that many nouns (any that don't match can be flagged
> to be added to the dictionary)
> - the measurements are in consistent units
Sometimes. My favorite example is "2 volumes EtOH"- I think I saw this
> - steps are listed as individual bullet or number points
Not always. Sometimesyougetblocksofparagraphsandmangledtextlikethisandyouneedtobeanexpertatenglish.
> = the lexical space is relatively small.
It seems to vary widely to me. Can you show me a subset that has a
small lexical space?
> Writing lexical analyzers can be really quite fun, you might enjoy it
> a lot. Long ago I wrote a couple talking automatons (s/w robots) that
> would chat each other up (as well as any other chat users) in a chat
> system based on an invented dictionary of dialogue with verb-noun
> parsing, in a language that a friend of mine invented as a combination
> of perl and C (it was basically C with strings built in), which was
> compiled in his compiler written using lex. I guess there's not too
> many people around anymore who know lex or its precursors.
Back in the day I was doing something like that, except I was using
chatbots that were built around regular expression matching and
regular expression manipulation. I thought I was the king after that
:-). They weren't very bright, IMHO. ;-)
> At least it's worth looking into the complexity of this as a solution
> before trying to shoehorn bio into any-ML at the user-entry level.
> *ML is a computer-to-computer interchange format and should ideally
> never touch the screen or come from a user's keyboard (the exception
That's fine, we can keep it hidden behind the scenes- either through
wizards or graphical programming interfaces (a recommendation from
this thread) or other possibilities. Another possible wizard would be
one that helps interpret a particular file and asks which parts are
related to which parts and thus how to generate the grammar that would
be put into flex- i.e., it would ask which characteristics are related
to a new step, and which actions are referencing different procedures,
and whether or not those procedures match any particular open source
hardware package (i.e., a step that says "do PCR here using
thermocycler" (I know, this is a bad way of saying it) and then an
option of looking through the local inventory for compatible
machines). Maybe, I'd believe this working- now, how would you go
about asking the user how to parse the text that they pasted? There
would be a few common different blocks that we're expecting, and then
each block has some delineation method- if you've ever imported a
dataset into OpenOffice Calc or any other spreadsheet application,
there tends to be a mechanism by which the program distinguishes the
symbol of delineation between data points in a data set, which is
pretty nifty- something like that could be used here, and then applied
to a regular expression (or lexical parser generator thingy),
especially with regexp specialties like $ and ^ and so on.
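A rough Python sketch of that spreadsheet-import idea, applied to step markers instead of column separators. The three candidate patterns and both function names are assumptions for illustration; a real wizard would show the guess to the user and let them correct it before anything gets parsed.

```python
import re

# Candidate "start of a new step" markers, each anchored with ^.
CANDIDATE_MARKERS = {
    "numbered-dot": re.compile(r"^\s*\d+\.\s+"),
    "numbered-paren": re.compile(r"^\s*\d+\)\s+"),
    "bullet": re.compile(r"^\s*[*-]\s+"),
}


def guess_step_marker(text):
    """Return the name of the marker that starts the most lines, or None."""
    lines = [line for line in text.splitlines() if line.strip()]
    counts = {
        name: sum(1 for line in lines if pattern.match(line))
        for name, pattern in CANDIDATE_MARKERS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def split_steps(text, marker_name):
    """Split text into steps; unmarked lines continue the previous step."""
    pattern = CANDIDATE_MARKERS[marker_name]
    steps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if pattern.match(line):
            steps.append(pattern.sub("", line, count=1).strip())
        elif steps:
            steps[-1] += " " + line.strip()
        # Lines before the first marker are dropped in this sketch.
    return steps
```

The wizard part would then be: "I think steps start with '1)', '2)', ... is that right?" and only after a yes does the split happen.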
> being the developers themselves). Especially ML should never reach
> the biologists.. they're timid & scare easily. ;-D
did you just insult biologists??
I don't know how you're measuring whether or not something has to do
something about designing a thermocycler, but CAD is still an
important aspect of engineering. It allows for the machine to be more
repeatable and more describable. It means it's not just some hack, and
instead it's something that has a specific set of specifications,
transmitted for engineering- for building- for making stuff work.
> I think it's great what Bryan is trying to do, and I'm sure that is
> the future direction of science. But who is going to actually order
> the parts? Who is going to actually solder them onto a breadboard?
> Who is going to program the software? Who is going to input the
> program? Who is going to decide what DNA to PCR up? Who is going to
> load the tubes? Who is going to start the cycle? Who is going to
> remove the tubes? Who is going to clean up the rxn mix?
> The answer to all of those questions is: Me! A computer isn't going
> to do any of that for me. Now it sure would be nice if all we had to
How do you know that a computer isn't going to do any of that? Are you
a programmer? Have you seen websites that already order parts, have
you seen videos that teach you how to solder? Have you seen machines
that preconfigure other machines for lab experiments? It just sounds
like you've not considered any of this.
> do was come up with an idea like "Make a glow-in-the-dark worm so that
> fish will bite on it better," and encode it into an XML file and a
Do fish really bite better on glow-in-the-dark-worms? Since, you know,
they are glowing, and worms don't normally /do/ that.
> computer would figure out all the lab procedures, order the cDNA and
By the way- your computer already orders cDNA and reagents, you're
just the one clicking the browser buttons. Just imagine doing that
without clicking- it's a very simple computer trick.
> reagents, design a thermocycler to the required specs, order the parts
Ever order anything over the web from Amazon? etc.
> for it, walk you through the construction, program it, help you load
Ever read instructions for a lego kit?
> it, and so forth. But the reality is that it's not going to do any of
How do you know?
> So I don't see the actual value of any of this talk about putting
> things into certain machine-readable formats. Maybe I'm missing
> something, if so please try to educate me. What I'd like to know is
> what this computerization of everything is actually going to do to
> help the project?
In truth, I'm not really asking anybody to do anything new- I'm the
one who has been putting the work into this stuff that you think is
impossible (I guess it's all the more awesome if I make something work
that you otherwise consider impossible). But what I do want to see
happen is CAD files of the designs, and so on, and Jonathan Cline's
grammar parser if he's working on that, because that's really really
exciting and almost worth its weight in gold, or however you say that.
> It would be great if someone would take the work and encode it in a
> useful format. But how is it actually going to help get this thing
> designed and built? The only way I know how to do things is to look
CAD files can be used to help build things by telling other machines
where to cut or how to remove metal. And in the case of additive
manufacturing, there are other ways to build plastic components (goo
squirters, anyone?). And of course ultimately there's the classic go
over to the bench and saw some wood or something, yeah.
> at what others have done, borrow and modify their designs,
How do you modify their designs if they are just JPEGs? At that point
you're reconstructing a lot of previous effort, and you're wasting
> collaborate with the electronics people and programmers, come up with
> a prototype design, and build it. Plenty of software tools may help
> along the way, but nothing I know of is going to do ANY of the actual
That's why I'm building it. We're making it happen. Thank goodness we
live in the future.
> Someone mentioned a target cost... I think it should just be as cheap
> as possible. We already know the basics of what it has to do. And
> there are quite a few ideas for features that won't add hardly
> anything to the cost.
When other projects have some expensive component, I most often see
some alternative component that could be switched in if you want to be
ridiculously cheap about the whole thing. So something like that might
be appropriate here, especially if USB/ethernet/SD turns out to be the
price inflation culprit.
> When it comes to USB vs serial vs ethernet vs wireless we need to keep
> the point of the project in mind and use the cheapest possible way.
> I've built a serial interfaced microcontroller board before, so I know
> that's one way that will work just fine. When the group looked at USB
> we decided it was too much more money and too much design time to
> justify. That was quite awhile ago, so maybe USB is cheaper nowadays.
> One thing we looked at was that we'd have to do extra programming to
> use the USB interface rather than just debugging in hyperterminal and
I'm still confused about this. Do you guys really use hyperterminal?
What operating system are you using??
> A couple posts were about integrating a webserver. I don't see how
> that would be of any use in a thermocycler. You put in your tubes and
> start the program. When it's done you take out your tubes and proceed
> to your next step. It's not something you need to control over the
> web. There's really nothing you can do with it on the web, if there's
> no tubes in it there is no point to start it. And when you put your
> tubes in you start it right away. I can't think of a situation where
> you'd load it up with the intention of starting it from a different
> location at a later time. If anyone needs something like that I'm
> sure a simple software timer would work just fine.
Streaming data over the web, repairing the software over the web,
diagnostics- what if I have to fix diybio NYC's thermocycler and I am
all the way over here in Austin? There are other advantages, have you
ever used a print server?
> As was mentioned there's also security concerns. And it could be just
> as bad as stopping a pacemaker. Suppose someone decides to log into
> my thermocycler at 3am and turn it up to full heat? They do get
> pretty hot (at least boiling temps). There's every chance that it
> could start a fire. I could easily see it melting down and catching
> fire if it were run long enough. It probably wouldn't be hard to at
> least ruin the thermocycler.
There are security measures that we could implement to prevent that.
However, I agree that an external interface- to the outside world- is
probably a bad idea. However, having a presence on the internal
network is still game, IMHO.
> So if it had some sort of network connection we'd have to worry about
> all sorts of security measures and failsafes. That goes against the
> idea of cheap. As for the slickness of a web interface... that would
Nah, security software is free.
> be nice, but I'm sure someone will make a nice java or vb program to
> operate it if we come up with the hardware.
VB? What operating system are you using??
>> Also, I don't think the goal should be to construct a device
>> for as cheap as possible.
> I do think we should focus on this. If we make a really cheap device
> it will be available to more people. We also might attract people
> that otherwise would just buy a commercial unit. If our device is
> half the price of a commercial unit some small labs might just decide
> to go with the commercial unit. But if ours is 1/10th the price of a
> crummy commercial unit and it has a lot more features I think a lot of
> people would almost have to at least give the open thermocycler a try.
I agree that the cost should be something like 1/10th the cost of a
commercial system, but that's different than saying "as cheap as possible"
and driving every design decision by cost alone. That's where you get into
sacrificing features and usability to make the system 1/10.5th the cost,
which isn't necessarily worth it. That's why I said design something that
the greatest number of people will use, which certainly reflects cost.
Actually a better way to state it might be "Design something the greatest
number of current non-thermocycler users will use" (i.e. Greatest number of
>> Anyone looking to do PCR as cheap as possible can already do so with 3 water
> I'm hoping that we can make this device cheaper (and with better temp
> control) than the cost of 3 water baths. Now I know you can just use
> a burner and a pot of water, but I don't think you can really get the
> level of temp control required to really do a good job, and
> thermostatically controlled waterbaths aren't cheap.
You can make a pretty cheap water bath with a fish tank heater or two.
> I'm not sure what features would cost all that much. I'm not
> suggesting hacking it together with duct tape and chewing gum. "As
> cheap as possible" to me means as cheap as you can do it while making
> it at least as good as a commercial unit.
I agree that making something at this price point should be doable. I was
referring more to marginal costs like serial/usb. To me, "As cheap as
possible" is different than "As cheap as possible while making it at least
as good as a commercial unit" - but the latter would be a perfect design
> I think we need to figure out how many tubes people need to run. We
> should also check out what size p-junctions are cheap and available
> off the shelf. Then that will give us our optimal block size for
> cost, then hopefully we can reconcile the two numbers.
I usually run about 12 max, what do others do? Has the discussion of a
heated vs non-heated lid come up?
How about this? A program that will automatically grab the data, save
it to multiple open standards file formats (I'm not too interested in
Excel, but if you want to use Excel after an OpenOffice step or
something, that's cool), and then some automatic graphing with
gnuplot. Heck, it's made for this. I don't even mention this stuff
normally because it just seems so ridiculously simple to me ..
dragging-and-dropping into stuff like excel just feels archaic and a
step backwards. Labware can get away with really really old stuff
because the people that use the equipment don't necessarily know how
much better things are in the outside world.
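Since I said it's simple, here is roughly what I have in mind, as a Python sketch. The file names, the column names and the `export_run` function are all made up for illustration, and the readings are assumed to arrive as (seconds, degrees C) pairs from whatever the hardware ends up providing. The gnuplot script uses the `skip` modifier to jump over the CSV header row, which assumes a reasonably recent gnuplot (5.x).

```python
import csv
import json
from pathlib import Path

# Plots run.csv to run.png; "skip 1" passes over the header row.
GNUPLOT_SCRIPT = """set datafile separator ","
set xlabel "time (s)"
set ylabel "temperature (C)"
set terminal png
set output "run.png"
plot "run.csv" skip 1 using 1:2 with lines title "block"
"""


def export_run(readings, out_dir):
    """Write readings as CSV and JSON, plus a gnuplot script to graph them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    csv_path = out / "run.csv"
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["time_s", "temp_c"])
        writer.writerows(readings)

    json_path = out / "run.json"
    records = [{"time_s": t, "temp_c": c} for t, c in readings]
    json_path.write_text(json.dumps(records, indent=2))

    gp_path = out / "plot.gp"
    gp_path.write_text(GNUPLOT_SCRIPT)
    return csv_path, json_path, gp_path
```

After that, running `gnuplot plot.gp` from inside the output directory should produce the graph, and anybody who wants a spreadsheet can open the CSV in whatever they like.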
> I'm sure that someone knows how to do this better than me. However I
> think it can be done how I mentioned and that might be a good starting
> point to build upon.
> As for the rest of your post... Show me the money! I can't really
> spend a bunch of time learning computer formats and CAD file standards
> unless it helps me right now. I think it's great if you want to
> compile the specs and whatnot into CAD files and XML and whathaveyou.
> I just have no idea how to start or what it would accomplish.
Here's how you start. You open up a CAD app, like HeeksCAD, and start
drawing your machine. No more bloody JPEGs. Then go to file->save.
Then upload the file to some publicly accessible place. There are
tutorials on this that I'll assemble in a future email for
CAD-related-links and tutorials and such.
> I think it's great and reasonable to use solidworks or some cad
> program to design and model the block. I just don't know about
> getting it made like this. When I want a block for my system there's
> about a 95% chance that I'm going to pull one out of another
> thermocycler or the like.
If you're making a hacked thermocycler made up of other parts, then
yeah I don't expect you to magically have a CAD file of the
components, but it would be nice. That's like a first-year reverse
engineering project that they don't teach you at a university anymore
(it's one of those introductory projects that people just end up doing
anyway, sort of thing). I don't know what you mean when you say you
don't know how to get it made "like this".
> Failing that I'd take it to my local machine shop with drawings and
> specs and measurements and actual tubes to make sure they fit snug.
Those drawings can be generated from 3D CAD programs, by the way.
(Are you sometimes meaning to say spectrophotometer instead of specifications?)
> But my local machine shop guy wears greasy coveralls and says stuff
> like "Get er dun!". If I start talking about XML files he's just
> going to tell me that it's too fancy and either charge me a buttload
> of cash (and throw the XML files in the trash) or tell me that he
> doesn't do that sort of thing.
Maybe you need better machinists? But seriously, I don't understand
how you think that XML can't be converted into other formats. I also
have not specifically said anything about an XML format for CAD files.
I think you're making this harder than it really is.. XML for
protocols is one thing, but I didn't mention XML for CAD.
> So I see a bit of disconnect between the real world and the idea of
> all this computerization.
I think you're the disconnect at this point. :-/ Sorry man.
I program Arduinos all the time, and it's not that big of a jump from
the Arduino to a plain old AVR. For that matter, the Arduino
bootloader can be installed on an AVR so that new firmware can be
developed and uploaded easily.
I've also worked with Freescale uCs, which are really straightforward.
All this talk about hyperterminal and serial cables is making me hold
my head and groan. Seriously, fuck hyperterminal. What's the point? We
can easily add an RJ-45 socket to the *board*, throw a small TCP/IP
stack onto the microcontroller, set the microcontroller up to receive
commands over the ethernet cable in HTTP, and have it return its
output in XHTML -- i.e., XML that is also valid HTML.
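To show the shape of the idea without any hardware, here is a desktop-side sketch in Python. It is not firmware: a microcontroller would run a small embedded TCP/IP stack instead, and the status values, element ids, function names and port handling here are placeholders I made up. The point is only that the reply to an HTTP GET can be a small XHTML document that a browser renders and a script can parse as XML.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape


def render_status(block_temp_c, state):
    """Build a tiny XHTML status page (well-formed XML, viewable in a browser)."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>Thermocycler status</title></head>"
        "<body>"
        f'<p id="state">{escape(state)}</p>'
        f'<p id="temp">{block_temp_c:.1f}</p>'
        "</body></html>"
    )


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder values; a real device would read its sensor here.
        body = render_status(72.0, "extending").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xhtml+xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Create (but do not start) a server bound to the local machine only."""
    return HTTPServer(("127.0.0.1", port), StatusHandler)
```

Calling `make_server().serve_forever()` and pointing a browser at http://127.0.0.1:8080/ shows the page; binding to 127.0.0.1 keeps the sketch off the network.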
This means that, as Tito suggested, we can have a "Web 2.0
thermocycler". Fuck shitty pushbutton interfaces on the thermocycler
itself. I want to be able to open up a web browser, point it at my
thermocycler's address on my home network, paste the sequence that I'm
amplifying into a form on that webpage, paste in the sequences for the
primers I'm using, and have the *thermocycler itself* compute the
exact right temperature and time settings for this particular PCR run,
and then click "Start". I also want to be able to watch realtime
updates of the data coming back from the thermocycler, in my browser,
and save it off to disk whenever I want.
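As a taste of the "compute the settings from the primers" part, here is a small Python sketch. It uses the Wallace rule of thumb (2 C per A/T, 4 C per G/C), which is only a rough estimate for short primers; which method the thermocycler should really use is an open question, and the 5 C annealing offset is a common convention that I'm assuming here, not a decision anyone has made. Function names are made up.

```python
def wallace_tm(primer):
    """Estimate primer melting temperature (C) with the Wallace rule."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2.0 * at_count + 4.0 * gc_count


def suggest_annealing_temp(forward, reverse, offset_c=5.0):
    """Suggest an annealing temperature: lower primer Tm minus an assumed offset."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset_c
```

For example, a 20-mer with ten A/T and ten G/C bases comes out at 2*10 + 4*10 = 60 C by this rule, so with a second primer at 48 C the suggestion would be 43 C.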
And we *can* do this. I already know how and I'm sure Bryan and other
programmers on this list have an idea of how to do it.
This is why Tito is asking for User Stories of "what happens when I
use a thermocycler". Jake, you don't *need* to know CAD formats or how
to program in order for this project to work -- the programmers are
champing at the bit to make this happen. We want to make it so that
using a thermocycler is as easy as using Facebook.
Microcontrollers that natively support USB are not appreciably more
expensive than microcontrollers that don't. It's literally a matter of
a few cents to a few dollars. I would be shocked if the difference
were more than $5.