A few months ago I wrote a quick email about recipe representation on
the openmanufacturing list. In particular, I think it would be useful
if we could convert protocol-online.org into a repository of XML
descriptions of lab protocols. This way, we can have lists of what
tools, instruments, renewable materials and one-time use materials we
need for each experiment or procedure, as well as the relevant list of
procedures or steps.
So, here's the paper I've been working off of-
Definition of an XML Markup Language for Clinical Laboratory
Procedures and Comparison with Generic XML Markup
Background: Clinical laboratory procedure manuals are typically
maintained as word processor files and are inefficient to store and
search, require substantial effort for review and updating, and
integrate poorly with other laboratory information. Electronic
document management systems could improve procedure management and
utility. As a first step toward building such systems, we have
developed a prototype electronic format for laboratory procedures
using Extensible Markup Language (XML).
Methods: Representative laboratory procedures were analyzed to
identify document structure and data elements. This information was
used to create a markup vocabulary, CLP-ML, expressed as an XML
Document Type Definition (DTD). To determine whether this markup
provided advantages over generic markup, we compared procedures
structured with CLP-ML or with the vocabulary of the Health Level
Seven, Inc. (HL7) Clinical Document Architecture (CDA) narrative block.
Results: CLP-ML includes 124 XML tags and supports a variety of
procedure types across different laboratory sections. When compared
with a general-purpose markup vocabulary (CDA narrative block), CLP-ML
documents were easier to edit and read, less complex structurally, and
simpler to traverse for searching and retrieval.
I made a quick, somewhat incomplete example of PCR in CLP-ML. Check it out:
Take a look at protocol-online:
The site is presently just a giant annotated linkfarm. Many of the
links have grown somewhat dead over the years- some of the pages are
accessible via the kind archivists at the Wayback Machine of the
Internet Archive, but that's not a good strategy. I think it would be
a better idea to save protocols in a standardized format, so that
amateurs (or professionals) could download them all at once, and then
check which ones they can perform, given their current inventory,
tools, etc. Wouldn't that be neat?
That said, CLP-ML has a few problems IMHO. When I was making the PCR
example, I would become spontaneously confused about what to call even
simple things like water- is it a material, reagent (since it's a
chemical, and a definition of a reagent is a chemical), or is it
something else entirely? Whereas, a simpler format that might just be
a YAML list of instruments, tools, renewable materials, non-renewable
reagents and non-renewable materials might be more simple to wrap your
Anyway, hope you guys find that useful. I am sure there are many
corrections that can be made to pcr.xml, since I know how often I was
tripping while writing it, flipping back to Jim Harrison's paper and
Something like this perhaps?
John, I need to run at the moment but wanted to mention that you're on
the right track (okay, and Mackenzie, but his email warrants a
different response from me). This all started for me a while back with
the idea of recipe representation, interoperability and compatibility.
For instance, what if we could represent recipes and then have
automation solutions to solve them?
And then if you *don't* have the automation hardware, the instructions
should be convertible into a human-readable format, so that somebody
can run around doing what a robot would otherwise do (like the EXPO
videos of the EXPO robot or "The Robot Scientist" in action:
Then, we can have sites like Debian's APT repository, instructables,
thingiverse, ponoko, shapeways, protocol-online, etc., except with the
ability to send that data to CAM (computer-aided manufacturing) *or*
to humans in some English-like format. You can convert computer
readable stuff into things for both computers and humans, but you
can't (easily) convert human readable stuff (like instructables) into
computer-readable stuff. So that's one reason why this is all so
So, get back with me if you're interested in pursuing this further-
it's something I've been focusing on from the open manufacturing and
open source hardware side of things.
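As a rough sketch of the machine-readable-to-human direction: the step
format, the action names and the templates below are all made up for
illustration, they aren't from CLP-ML or any existing tool.

```python
# Hypothetical structured steps -> English-like instructions.
# The "action" vocabulary and field names are assumptions for this sketch.
TEMPLATES = {
    "add": "Add {amount} {unit} of {what} to {where}.",
    "incubate": "Incubate {what} at {temp_c} C for {minutes} minutes.",
}


def render_step(step):
    """Turn one structured step (a dict) into a sentence a person can follow."""
    return TEMPLATES[step["action"]].format(**step)


steps = [
    {"action": "add", "amount": 5, "unit": "ul", "what": "buffer", "where": "tube 1"},
    {"action": "incubate", "what": "tube 1", "temp_c": 95, "minutes": 2},
]

for number, step in enumerate(steps, start=1):
    print(f"{number}. {render_step(step)}")
```

The same list of dicts could just as well be handed to automation
hardware; only the last loop is specific to humans.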
Okay, need to run.
Mac, where'd you dig EXACT up from? Anyway, yes, I agree that drafting
some microformats would be useful. And once we do that, there are a
few tools that I've been imagining since forever for these purposes,
such as a "wizard" tool to help construct and pick out protocols, for
properly documenting the machines/instruments and the workflow, in a
timely manner rather than forcing a user to learn a programming
However, I strongly encourage everyone to consider it okay for those
who encode protocols to be programmers, because then we can just sit
and listen to what others have to say about a certain protocol, and
then write it up properly (or do some reading of our own, but it's
really much easier when there's more than one person working on a
protocol at a time).
I would be very excited to see something related to SBML and also
related to these protocols, much like the other aspects of recipe
representation that I brought up before-
So, for all of you out there who are not programmers, please consider
what you find absolutely critical when reading a protocol, and how you
would organize it into different sections, steps and lists-- the
programmerfolk can handle it past that point :-). For instance, I'm
thinking that each protocol basically has some number of "dynamic"
inputs, or inputs that change each time the protocol is executed, and
then a number of static inputs, or tools and supplies that are used
each time and don't really change. And then there are those things
which are acted upon (reactants), and consequent products to deal
with. Among other things- anybody is welcome to help hash out these
groups of things. In the end, what this will turn into is a list of
machines, instruments and tools that you would need to execute a
protocol, and then a separate list of required reagents and other
chemicals, and then a separate list of procedures and steps, and so
on, which will then be combined into a single protocol.
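To make those groups concrete, here's a minimal sketch in Python. The
class name, the field names and the PCR contents are my own
placeholders, not CLP-ML tags or an agreed format, and putting water
under "reagents" is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Protocol:
    """Hypothetical container; field names are illustrative only."""
    name: str
    equipment: List[str] = field(default_factory=list)       # static: machines, instruments, tools
    reagents: List[str] = field(default_factory=list)        # static: reagents and other chemicals
    dynamic_inputs: List[str] = field(default_factory=list)  # change on every run
    products: List[str] = field(default_factory=list)        # what comes out
    steps: List[str] = field(default_factory=list)           # ordered procedure

    def requirements(self) -> Dict[str, List[str]]:
        """Everything that must be on hand before step 1, grouped by list."""
        return {
            "equipment": list(self.equipment),
            "reagents": list(self.reagents),
            "dynamic_inputs": list(self.dynamic_inputs),
        }


# Illustrative contents only, not a complete or validated PCR protocol.
pcr = Protocol(
    name="PCR",
    equipment=["thermocycler", "micropipette"],
    reagents=["polymerase", "dNTPs", "primers", "buffer", "water"],
    dynamic_inputs=["template DNA"],
    products=["amplified DNA"],
    steps=["mix reagents with template", "run thermocycler program"],
)
print(pcr.requirements()["equipment"])
```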
What do you guys think?
That's interesting, and certainly possible. There are two issues to
consider. First, I've been working on shell program interoperability,
by analyzing input switches or parameters and which MIME types each
parameter expects. "In theory", this will eventually be turned into a
program that uses .ggo files (gengetopt input) with the apt system and
the "open-file-with" dialog box on Linux installations to go find a
program that will accept the input (like: PDF, HTML, TXT, etc.)-- this
is analogous to parametric constraints on interoperability of
mechanical components. Secondly, many Linux programs serve as examples
analogous to a "combined protocol"- for example,
there are some programs that are very simple and get a single package,
like mpg123, an mp3-player for the shell. But on the other hand, there
are really big, nasty, complex programs like OpenFOAM (a CFD
simulation package) which are not broken down. So, because of this,
and because of my experience in biology labs, I suspect that some
"interconnected protocols" are going to have to be written by hand to
make a "single over-arching experiment"- which would be done once
(like how I have done pcr.xml and nobody else *has to* do it again
(although they should, because it sucks)).
I wonder though what you mean by having a program find a set of
protocols to walk you through in order to amount to a larger
experiment? Maybe my lab experience has been too minimal, is there
something that I am neglecting or forgetting? If that sort of thing
happens more often than I'm aware of, that would be pretty neat. :-)
> For example: how would I purify a 3kb insert from a plasmid carried by
> a colony I have growing on a plate?
Eek. That's a rather advanced example. I don't think you'd type that
question into the program :-) but rather maybe select what you have
and what you don't have from a list, or something, and /then/ a
program could search the protocols for those known inputs and outputs.
But that's a complicated example- in fact, pcr.xml might have been too
complicated for starters, it might have been a better idea for me to
just go with gel_electrophoresis.xml. :-)
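Something like the following is what I have in mind for the "select
what you have, then search" idea. It's only a sketch: the catalog
entries, the key names and the function name are invented for the
example.

```python
# Hypothetical catalog; each protocol lists what it needs and what it yields.
CATALOG = [
    {
        "name": "gel_electrophoresis",
        "inputs": ["DNA sample", "agarose gel", "gel box"],
        "outputs": ["separated DNA bands"],
    },
    {
        "name": "pcr",
        "inputs": ["template DNA", "primers", "thermocycler"],
        "outputs": ["amplified DNA"],
    },
]


def find_protocols(catalog, have, want):
    """Names of protocols whose inputs are all in `have` and that produce `want`."""
    have = set(have)
    return [
        entry["name"]
        for entry in catalog
        if set(entry["inputs"]) <= have and want in entry["outputs"]
    ]


on_hand = ["DNA sample", "agarose gel", "gel box", "primers"]
print(find_protocols(CATALOG, on_hand, "separated DNA bands"))
```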
> If it were possible to build a structured representation of laboratory
> operations, and we avoided getting bogged down in the semantics (is it
> a material? is it a reagent?), I could imagine such a system being
> used to:
One of the problems I had in a biology lab once was that the protocols
were hardly structured enough, so I wasn't entirely sure about
following the instructions. It wasn't terrible, but it also didn't
help, and I can't imagine the feeling is absent in those doing
protocols at home.
> - synthesize custom protocols optimized to produce a final quantity of
> a desired product
That's certainly doable. Stoichiometry math, maybe.
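The simplest version is plain proportional scaling. This sketch
assumes every reagent scales linearly with the desired yield, which
won't hold for every real protocol; the function name and the numbers
are made up.

```python
def scale_protocol(reagent_amounts, base_yield, target_yield):
    """Scale every reagent amount by target_yield / base_yield (linear assumption)."""
    if base_yield <= 0:
        raise ValueError("base_yield must be positive")
    factor = target_yield / base_yield
    return {name: amount * factor for name, amount in reagent_amounts.items()}


# Made-up recipe: amounts in microliters for a 50-unit yield.
recipe = {"buffer_ul": 5.0, "water_ul": 40.0}
print(scale_protocol(recipe, base_yield=50, target_yield=100))
```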
> - optimize workflow by finding the minimum number of operations
> required to reach a desired product
In computer science, the hard problem is finding the longest path. I think
we can do shortest path easily enough, yes. It would be interesting to
see if we can markup our workflows or our protocols from old PhD
theses (or something) from the community, and then see whether or not
there was an easier way to prove a theory or execute an experiment to
elucidate the relationship between a few variables (a job more fit for
functional induction or symbolic regression, but that's another email
entirely I suspect).
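For the shortest-path part, a breadth-first search is enough if we
pretend each protocol turns exactly one material into one other
material. That's a big simplification, and the protocol names and
materials below are illustrative guesses, not lab advice.

```python
from collections import deque


def shortest_workflow(protocols, start, goal):
    """Fewest protocols leading from `start` to `goal`, or None if unreachable.

    `protocols` is a list of (name, input_material, output_material) tuples.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        material, path = queue.popleft()
        if material == goal:
            return path
        for name, source, result in protocols:
            if source == material and result not in seen:
                seen.add(result)
                queue.append((result, path + [name]))
    return None


# Hypothetical one-input, one-output protocols.
PROTOCOLS = [
    ("miniprep", "colony", "plasmid"),
    ("restriction_digest", "plasmid", "digest"),
    ("gel_extraction", "digest", "purified insert"),
    ("colony_pcr", "colony", "pcr product"),
    ("pcr_cleanup", "pcr product", "purified insert"),
]
print(shortest_workflow(PROTOCOLS, "colony", "purified insert"))
```

Real protocols have several inputs and outputs each, so a proper
version would need a richer graph than this.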
> - keep better track of supplies in the lab (more real-time)
Absolutely. This is a high priority on my todo list. I'm also figuring
that a tool that would be able to compare "what I have" to "what I
need" would be nice. For instance, if this fictional tool was
currently operational, I'd feed in "pcr.xml" to it, and then it would
complain to me that I don't have a thermocycler in the inventory or
tool capabilities lists. I also bet this could help people keep track
of community labs so that they know when to order more supplies. If
everyone is required to use the computer system to get a protocol and
then to use the protocol to "check out" supplies and reagents, then
not only can they be given *exact* safety information for everything,
but the community lab's operations can be more thoroughly, more openly
tracked- but it's also great for collaboration etc. Win-win-win
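The "what I have" versus "what I need" comparison could start out as
small as this. It's a sketch with invented item names and quantities,
and it assumes the requirements have already been pulled out of
something like pcr.xml into a plain dict.

```python
def shortfalls(required, inventory):
    """Map each item we are short on to how many more units are needed."""
    return {
        item: quantity - inventory.get(item, 0)
        for item, quantity in required.items()
        if inventory.get(item, 0) < quantity
    }


# Hypothetical numbers for illustration.
PCR_NEEDS = {"thermocycler": 1, "pcr tube": 8, "micropipette": 1}
INVENTORY = {"pcr tube": 100, "micropipette": 1}

for item, count in shortfalls(PCR_NEEDS, INVENTORY).items():
    print(f"missing {count} x {item}")
```

Subtracting what a protocol consumes from the inventory after each
"check out" would give the reorder tracking for a community lab.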
> - form the basis of protocol walk-through educational tools.
> I like the idea of representing Protocols-Online.org in some
> machine-accessible way. Maybe you could take a shot at making a yaml
> version of the PCR CLP-ML document you linked to?
I'll get around to it eventually. Could somebody throw a shoe at me
every once in a while to remind me? :-/
> A few months ago I wrote a quick email about recipe representation on
> the openmanufacturing list. In particular, I think it would be useful
> if we could convert protocol-online.org into a repository of XML
> descriptions of lab protocols. This way, we can have lists of what
> tools, instruments, renewable materials and one-time use materials we
> need for each experiment or procedure, as well as the relevant list of
> procedures or steps.
There's also Good Laboratory Practice (GLP), under which work is done
according to defined Standard Operating Procedures (SOPs). The
protocols are written by and for people, but are nevertheless fairly
structured and defined. I think GLP tends to get used only for data
for which the regulators require it.