XML for lab protocols

Bryan Bishop

Feb 8, 2009, 10:47:12 PM

to diy...@googlegroups.com, kan...@gmail.com

Hey all,

A few months ago I wrote a quick email about recipe representation on
the openmanufacturing list. In particular, I think it would be useful
if we could convert protocol-online.org into a repository of XML
descriptions of lab protocols. This way, we can have lists of what
tools, instruments, renewable materials and one-time use materials we
need for each experiment or procedure, as well as the relevant list of
procedures or steps.

http://groups.google.com/group/openmanufacturing/msg/1fc4fbbfd4a6fb23

So, here's the paper I've been working from:

Definition of an XML Markup Language for Clinical Laboratory
Procedures and Comparison with Generic XML Markup
http://www.clinchem.org/cgi/content/abstract/52/10/1943
pdf: http://www.clinchem.org/cgi/reprint/52/10/1943
dtd: http://www.clinchem.org/content/vol0/issue2006/images/data/clinchem.2006.071449/DC1/clinchem.2006.071449-2.txt

"""
Background: Clinical laboratory procedure manuals are typically
maintained as word processor files and are inefficient to store and
search, require substantial effort for review and updating, and
integrate poorly with other laboratory information. Electronic
document management systems could improve procedure management and
utility. As a first step toward building such systems, we have
developed a prototype electronic format for laboratory procedures
using Extensible Markup Language (XML).

Methods: Representative laboratory procedures were analyzed to
identify document structure and data elements. This information was
used to create a markup vocabulary, CLP-ML, expressed as an XML
Document Type Definition (DTD). To determine whether this markup
provided advantages over generic markup, we compared procedures
structured with CLP-ML or with the vocabulary of the Health Level
Seven, Inc. (HL7) Clinical Document Architecture (CDA) narrative
block.

Results: CLP-ML includes 124 XML tags and supports a variety of
procedure types across different laboratory sections. When compared
with a general-purpose markup vocabulary (CDA narrative block), CLP-ML
documents were easier to edit and read, less complex structurally, and
simpler to traverse for searching and retrieval.
"""

I made a quick, somewhat incomplete example of PCR in CLP-ML. Check it out:
http://heybryan.org/~bbishop/docs/protocols/pcr.xml

Take a look at protocol-online:
http://protocol-online.org/

The site is presently just a giant annotated link farm. Many of the
links have gone dead over the years. Some of the pages are still
accessible via the kind archivists at the Wayback Machine of the
Internet Archive, but relying on that is not a good strategy. I think
it would be a better idea to save protocols in a standardized format,
so that amateurs (or professionals) could download them all at once,
and then check which ones they can perform, given their current
inventory, tools, etc. Wouldn't that be neat?
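
A minimal Python sketch of that "which ones can I perform" check. The protocol names and item lists are made up for illustration; nothing here is taken from pcr.xml or protocol-online, and the `Protocol` class is a hypothetical stand-in for whatever the standardized format ends up being.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    name: str
    tools: frozenset       # instruments and tools the protocol needs
    materials: frozenset   # reagents and consumables the protocol needs


def missing_items(protocol, inventory):
    """Return the required items that are not in the inventory."""
    return (protocol.tools | protocol.materials) - set(inventory)


def performable(protocols, inventory):
    """Return the names of protocols whose requirements are all on hand."""
    return [p.name for p in protocols if not missing_items(p, inventory)]


# Illustrative data only.
protocols = [
    Protocol("PCR",
             frozenset({"thermocycler", "micropipette"}),
             frozenset({"primers", "dNTPs", "polymerase", "template DNA"})),
    Protocol("gel electrophoresis",
             frozenset({"gel box", "power supply", "micropipette"}),
             frozenset({"agarose", "running buffer", "DNA ladder"})),
]
inventory = {"micropipette", "gel box", "power supply",
             "agarose", "running buffer", "DNA ladder"}

print(performable(protocols, inventory))   # ['gel electrophoresis']
print(sorted(missing_items(protocols[0], inventory)))
```

With a whole downloaded repository, the same two functions would run over every protocol at once.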

That said, CLP-ML has a few problems IMHO. When I was making the PCR
example, I kept getting confused about what to call even simple
things like water: is it a material, a reagent (since it's a
chemical, and one definition of a reagent is a chemical), or is it
something else entirely? A simpler format, perhaps just a YAML list
of instruments, tools, renewable materials, non-renewable reagents
and non-renewable materials, might be easier to wrap your head
around.

Anyway, hope you guys find that useful. I am sure there are many
corrections that can be made to pcr.xml, since I know how often I
tripped up while writing it, flipping back to Jim Harrison's paper
and such.

- Bryan
http://heybryan.org/
1 512 203 0507

Mackenzie Cowell

Feb 9, 2009, 1:07:23 PM

to diy...@googlegroups.com

Bryan,

I think semantic representation of recipes / protocols is fascinating.
I could imagine using a tool to functionally define my starting
conditions and desired ending conditions and having it generate a
custom protocol for me by finding the set of protocols connecting the
initial conditions to the ending conditions. Protocols are
essentially modular operations with defined inputs and outputs, so *in
principle* it would just be a matter of matching inputs and outputs
up.

For example: how would I purify a 3kb insert from a plasmid carried by
a colony I have growing on a plate?
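
A rough sketch of that input/output matching, in Python. The catalogue below is hypothetical: the protocol names and what each one consumes and produces are simplified guesses for illustration, and the model ignores quantities and the fact that materials get used up. The search is a plain breadth-first search, so the chain it returns uses the fewest operations.

```python
from collections import deque

# Hypothetical catalogue: protocol name -> (required inputs, produced outputs).
PROTOCOLS = {
    "overnight culture": ({"colony on plate"}, {"liquid culture"}),
    "miniprep": ({"liquid culture"}, {"plasmid DNA"}),
    "restriction digest": ({"plasmid DNA"}, {"digested DNA"}),
    "gel extraction": ({"digested DNA"}, {"purified insert"}),
    "colony PCR": ({"colony on plate"}, {"PCR product"}),
}


def plan(start, goal, protocols=PROTOCOLS):
    """Breadth-first search for the shortest protocol chain from start to goal.

    Returns a list of protocol names, or None if the goal is unreachable.
    """
    start_state = frozenset(start)
    queue = deque([(start_state, [])])
    seen = {start_state}
    while queue:
        have, steps = queue.popleft()
        if goal in have:
            return steps
        for name, (inputs, outputs) in protocols.items():
            if inputs <= have:                 # every input is available
                new_state = have | outputs     # outputs become available too
                if new_state not in seen:
                    seen.add(new_state)
                    queue.append((new_state, steps + [name]))
    return None


print(plan({"colony on plate"}, "purified insert"))
# ['overnight culture', 'miniprep', 'restriction digest', 'gel extraction']
```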

If it were possible to build a structured representation of laboratory
operations, and we avoided getting bogged down in the semantics (is it
a material? is it a reagent?), I could imagine such a system being
used to:
- synthesize custom protocols optimized to produce a final quantity of
a desired product
- optimize workflow by finding the minimum number of operations
required to reach a desired product
- keep better track of supplies in the lab (more real-time)
- form the basis of protocol walk-through educational tools.

I like the idea of representing protocol-online.org in some
machine-accessible way. Maybe you could take a shot at making a YAML
version of the PCR CLP-ML document you linked to?

Mac

John Brohan

Feb 9, 2009, 1:39:05 PM

to diy...@googlegroups.com

This is a very interesting idea.
I spent several years implementing automatic chemical synthesis on a
Tecan 8-arm robot. In organic synthesis there are steps like: add a
reagent, cool, stir, extract the upper layer, and so on. These steps
were implemented for a rack (~64) of 10 ml test tubes, each containing
a known mass of starting material of known MW.
As I see it, the software that I worked on would be the execution
layer beneath the higher-level <xml> description of the steps to be
executed to do the purification or whatever. Maybe the <xml> would
work for a manual or single-pipette machine to prove the method, which
could then be replicated on a more powerful robot.
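
To make the "execution layer" idea concrete, here is a small Python sketch. The markup is invented for this example (the element and attribute names are not CLP-ML), and the handlers only return strings where a real layer would drive the robot.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration; this is not CLP-ML.
PROTOCOL_XML = """
<protocol name="example workup">
  <step action="add" reagent="reagent A" volume_ml="2"/>
  <step action="cool" target_c="0"/>
  <step action="stir" minutes="10"/>
  <step action="extract" layer="upper"/>
</protocol>
"""


# Stand-ins for robot commands: each handler just describes what it would do.
def add(attrs):
    return f"dispense {attrs['volume_ml']} ml of {attrs['reagent']}"


def cool(attrs):
    return f"cool to {attrs['target_c']} C"


def stir(attrs):
    return f"stir for {attrs['minutes']} min"


def extract(attrs):
    return f"aspirate the {attrs['layer']} layer"


HANDLERS = {"add": add, "cool": cool, "stir": stir, "extract": extract}


def execute(xml_text, handlers=HANDLERS):
    """Walk the steps in document order and hand each to its handler."""
    log = []
    for step in ET.fromstring(xml_text).iter("step"):
        action = step.get("action")
        if action not in handlers:
            raise ValueError(f"no handler for action {action!r}")
        log.append(handlers[action](step.attrib))
    return log


for line in execute(PROTOCOL_XML):
    print(line)
```

Swapping the `HANDLERS` table is what would retarget the same description from a single-pipette machine to a bigger robot.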

Yours Sincerely
John

--
John Brohan National Instruments LabVIEW expert in Montreal
Traders Micro "We connect all sorts of things to computers"
http://www.woundfollowup.com
tel 514 995 3749. email jbr...@tradersmicro.com

Dan

Feb 9, 2009, 2:10:17 PM

to diy...@googlegroups.com

On Mon, Feb 09, 2009 at 01:07:23PM -0500, Mackenzie Cowell wrote:
>
> I think semantic representation of recipes / protocols is fascinating.
> I could imagine using a tool to functionally define my starting
> conditions and desired ending conditions and having it generate a
> custom protocol for me by finding the set of protocols connecting the
> initial conditions to the ending conditions. Protocols are
> essentially modular operations with defined inputs and outputs, so *in
> principle* it would just be a matter of matching inputs and outputs
> up.

Something like this perhaps?

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/14/e464?etoc

regards,

Dan

--
|| Dan || dan[at]dreadportal.com || http://dreadportal.com/ ||
"Reality is that which, when you stop believing in it, doesn't go away."
(Philip K. Dick - How to Build a Universe)

Bryan Bishop

Feb 9, 2009, 2:33:46 PM

to diy...@googlegroups.com, kan...@gmail.com

On Mon, Feb 9, 2009 at 12:39 PM, John Brohan <jbr...@gmail.com> wrote:
> This is a very interesting idea.
> I spent several years implementing automatic chemical synthesis on a
> Tecan 8-arm robot. In organic synthesis there are steps like: add a
> reagent, cool, stir, extract the upper layer, and so on. These steps
> were implemented for a rack (~64) of 10 ml test tubes, each containing
> a known mass of starting material of known MW.
> As I see it, the software that I worked on would be the execution
> layer beneath the higher-level <xml> description of the steps to be
> executed to do the purification or whatever. Maybe the <xml> would
> work for a manual or single-pipette machine to prove the method, which
> could then be replicated on a more powerful robot.

John, I need to run at the moment but wanted to mention that you're on
the right track (okay, and Mackenzie, but his email warrants a
different response from me). This all started for me a while back with
the idea of recipe representation, interoperability and compatibility.
For instance, what if we could represent recipes and then have
automation solutions to solve them?

http://groups.google.com/group/openmanufacturing/browse_thread/thread/a8d8ee245aaae97d/7a19a3b45e8f94d5?#7a19a3b45e8f94d5

And then if you *don't* have the automation hardware, the instructions
should be convertible into a human-readable format, so that somebody
can run around doing what a robot would otherwise do (like the EXPO
robot).

videos of the EXPO robot or "The Robot Scientist" in action:
http://www.aber.ac.uk/compsci/Research/bio/robotsci/video/

Then, we can have sites like Debian's APT repository, instructables,
thingiverse, ponoko, shapeways, protocol-online, etc., except with the
ability to send that data to CAM (computer-aided manufacturing) *or*
to humans in some English-like format. You can convert
computer-readable stuff into things for both computers and humans, but
you can't (easily) convert human-readable stuff (like instructables)
into computer-readable stuff. So that's one reason why this is all so
important.
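
The easy direction (structured data to English) can be sketched in a few lines of Python. The step records and their field names are assumptions made up for this example, not an existing format.

```python
# Hypothetical step records; the field names are invented for illustration.
STEPS = [
    {"action": "add", "what": "YPD medium", "volume": "50 ml",
     "to": "conical flask"},
    {"action": "incubate", "what": "conical flask", "temp": "30 C",
     "time": "12-24 h"},
]

# One sentence template per action type.
TEMPLATES = {
    "add": "Add {volume} of {what} to the {to}.",
    "incubate": "Incubate the {what} at {temp} for {time}.",
}


def to_english(steps, templates=TEMPLATES):
    """Render each structured step as a numbered English sentence."""
    return [f"{number}. " + templates[step["action"]].format(**step)
            for number, step in enumerate(steps, 1)]


for sentence in to_english(STEPS):
    print(sentence)
# 1. Add 50 ml of YPD medium to the conical flask.
# 2. Incubate the conical flask at 30 C for 12-24 h.
```

Going the other way, from free text back to records like `STEPS`, is the part with no comparably simple sketch.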

So, get back to me if you're interested in pursuing this further;
it's something I've been focusing on from the open manufacturing and
open source hardware side of things.

Okay, need to run.

Mackenzie Cowell

Feb 9, 2009, 3:25:49 PM

to diy...@googlegroups.com

EXACT is interesting. But it is so much easier to write:

"inoculate 4ml of liquid YPAD or 10ml of SC and incubate with shaking
overnight at 30 deg C"

than the following EXACT protocol. I would guess that user adoption
(as opposed to anything technical) would be the main barrier to
success for something like EXACT. If only there were a nice webapp
like freebase.com + openwetware.org for building these:

Operating procedure: grow yeast culture
  pre-condition: sealed yeast colonies plate located_in cold room
  pre-condition: YPD media bottle located_in cold room
  experiment action: move 12
    object: YPD media bottle
    start location: in store
    end location: in laminar flow hood
  experiment action: move 13
    object: 500ml conical flask
    start location: in store
    end location: in laminar flow hood
  experiment action: move 14
    object: sealed yeast colonies plate
    start location: in cold room
    end location: in laminar flow hood
  experiment action: add 15
    component 1: YPD medium
    volume: 50ml
    start container: YPD media bottle
    end container: 500ml conical flask
    equipment: pipette
  experiment action: rename 16
    old name: 500ml conical flask
    new name: YPD conical flask
  experiment action: add 17
    component 1: single yeast colony
    volume: small volume
    start container: sealed yeast single colonies plate
    end container: YPD conical flask
    equipment: inoculating loop
  experiment action: rename 18
    old name: YPD conical flask
    new name: yeast culture flask
  experiment action: move 19
    object: yeast culture flask
    start location: in laminar flow hood
    end location: in incubator
  experiment action: incubate 20
    object: yeast culture flask
    equipment: shaking incubator
    rpm: 200
    temp: 30C
    time interval: 12–24h
    goal: grow yeast until medium becomes cloudy
  Post condition: yeast culture located_in incubator
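
One thing the verbose form buys you is that a program can follow along. Below is a rough Python sketch that replays only the flask's move and rename actions from the listing above and checks where the flask ends up. The tuple encoding and the `run_actions` function are my own assumptions for illustration, not part of EXACT.

```python
def run_actions(locations, actions):
    """Apply move/rename actions to an {object: location} map.

    Returns a new map; raises ValueError if a move starts from the wrong place.
    """
    state = dict(locations)
    for kind, *args in actions:
        if kind == "move":
            obj, start, end = args
            if state.get(obj) != start:
                raise ValueError(f"{obj!r} is not in {start!r}")
            state[obj] = end
        elif kind == "rename":
            old, new = args
            state[new] = state.pop(old)
        else:
            raise ValueError(f"unknown action {kind!r}")
    return state


# Only the flask is tracked here (actions 13, 16, 18 and 19 above).
start = {"500ml conical flask": "store"}
actions = [
    ("move", "500ml conical flask", "store", "laminar flow hood"),
    ("rename", "500ml conical flask", "YPD conical flask"),
    ("rename", "YPD conical flask", "yeast culture flask"),
    ("move", "yeast culture flask", "laminar flow hood", "incubator"),
]

final = run_actions(start, actions)
print(final)   # {'yeast culture flask': 'incubator'}
```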

efer...@gmail.com

Feb 9, 2009, 3:46:01 PM

to diy...@googlegroups.com

This is very exciting. If I had the web skills, I would pull an all-nighter weekend and make this happen... with Web 2.0 graphic visualization too. I can see that website goodness in my head. Sigh.
Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: Mackenzie Cowell <m...@diybio.org>

Date: Mon, 9 Feb 2009 15:25:49
To: <diy...@googlegroups.com>
Subject: Re: XML for lab protocols

Mackenzie Cowell

Feb 9, 2009, 3:59:44 PM

to diy...@googlegroups.com

Another approach would be to begin structuring existing protocols on
openwetware with microformats we draft for that purpose, or at least
to mark up elements of the protocol with machine-readable XML while
presenting the plaintext to regular browsers. We could start by
tagging all the materials.
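
A sketch of what reading such tags back out could look like, using Python's standard `html.parser`. The `class="material"` convention is a made-up placeholder for whatever microformat gets drafted, and the parser is deliberately naive (it does not handle void tags such as `<br>` inside a tagged element).

```python
from html.parser import HTMLParser


class MaterialExtractor(HTMLParser):
    """Collect the text of every element whose class list includes 'material'."""

    def __init__(self):
        super().__init__()
        self.materials = []
        self._depth = 0      # greater than 0 while inside a tagged element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or "material" in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.materials.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


# The plaintext still reads normally in a browser; the spans carry the tags.
HTML = ('<p>Inoculate 4 ml of <span class="material">liquid YPAD</span> '
        'or 10 ml of <span class="material">SC</span> and incubate with '
        'shaking overnight at 30 deg C.</p>')

parser = MaterialExtractor()
parser.feed(HTML)
print(parser.materials)   # ['liquid YPAD', 'SC']
```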

mac

Bryan Bishop

Feb 9, 2009, 6:37:59 PM

to diy...@googlegroups.com, kan...@gmail.com

On Mon, Feb 9, 2009 at 2:59 PM, Mackenzie Cowell <m...@diybio.org> wrote:
> Another approach would be to begin structuring existing protocols on
> openwetware with microformats we draft for that purpose. Or at least
> marking up elements of the protocol with machine-readable xml while
> presenting the plaintext to regular browsers. We could start by
> tagging all the materials.

Mac, where'd you dig EXACT up from? Anyway, yes, I agree that drafting
some microformats would be useful. And once we do that, there are a
few tools that I've been imagining since forever for these purposes,
such as a "wizard" tool to help construct and pick out protocols and
to properly document the machines/instruments and the workflow in a
timely manner, rather than forcing a user to learn a programming
language.

However, I strongly encourage everyone to consider it okay for those
who encode protocols to be programmers, because then we can just sit
and listen to what others have to say about a certain protocol, and
then write it up properly (or do some reading of our own, but it's
really much easier when there's more than one person working on a
protocol at a time).

I would be very excited to see something related to SBML and also
related to these protocols, much like the other aspects of recipe
representation that I brought up before-
http://groups.google.com/group/openmanufacturing/msg/1fc4fbbfd4a6fb23

So, for all of you out there who are not programmers, please consider
what you find absolutely critical when reading a protocol, and how you
would organize it into different sections, steps and lists-- the
programmerfolk can handle it past that point :-). For instance, I'm
thinking that each protocol basically has some number of "dynamic"
inputs, or inputs that change each time the protocol is executed, and
then a number of static inputs, or tools and supplies that are used
each time and don't really change. And then there are those things
which are acted upon (reactants), and the resulting products to deal
with, among other things. Anybody is welcome to help hash out these
groups of things. In the end, what this will turn into is a list of
machines, instruments and tools that you would need to execute a
protocol, and then a separate list of required reagents and other
chemicals, and then a separate list of procedures and steps, and so
on, which will then be combined into a single protocol.
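
As a starting point for hashing out those groups, here is a hypothetical Python record that keeps the lists separate and combines them into one protocol. The class name, field names, and the gel electrophoresis entries are all placeholders, not a proposed standard.

```python
from dataclasses import dataclass, field


@dataclass
class ProtocolSpec:
    name: str
    equipment: list = field(default_factory=list)       # machines, instruments, tools
    reagents: list = field(default_factory=list)        # static inputs, same every run
    dynamic_inputs: list = field(default_factory=list)  # change each execution
    products: list = field(default_factory=list)
    steps: list = field(default_factory=list)

    def unbound_inputs(self, run_values):
        """Return the dynamic inputs that a particular run has not supplied."""
        return [name for name in self.dynamic_inputs if name not in run_values]


# Illustrative entries only.
gel = ProtocolSpec(
    name="gel electrophoresis",
    equipment=["gel box", "power supply"],
    reagents=["agarose", "running buffer"],
    dynamic_inputs=["DNA sample"],
    products=["separated DNA bands"],
    steps=["cast gel", "load samples", "run gel"],
)

print(gel.unbound_inputs({}))                         # ['DNA sample']
print(gel.unbound_inputs({"DNA sample": "tube 3"}))   # []
```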

What do you guys think?

Bryan Bishop

Feb 9, 2009, 6:50:39 PM

to diy...@googlegroups.com, kan...@gmail.com

On Mon, Feb 9, 2009 at 12:07 PM, Mackenzie Cowell wrote:
> I think semantic representation of recipes / protocols is fascinating.
> I could imagine using a tool to functionally define my starting
> conditions and desired ending conditions and having it generate a
> custom protocol for me by finding the set of protocols connecting the
> initial conditions to the ending conditions. Protocols are
> essentially modular operations with defined inputs and outputs, so *in
> principle* it would just be a matter of matching inputs and outputs
> up.

That's interesting, and certainly possible. There are two issues to
consider. First, I've been working on shell program interoperability,
via analysis of input switches or parameters and what MIME types they
expect on each parameter. "In theory", this will eventually be turned
into a program that uses .ggo files (gengetopt input) with the apt
system and the "open-file-with" dialog box on Linux installations to
go find a program that will accept the input (like PDF, HTML, TXT,
etc.). This is analogous to parametric constraints on the
interoperability of mechanical components. Second, there are many
Linux programs that serve as examples analogous to a "combined
protocol". For example, there are some programs that are very simple
and get a single package, like mpg123, an MP3 player for the shell.
But on the other hand, there are really big, nasty, complex programs
like OpenFOAM (a CFD simulation package) which are not broken down.
So, because of this, and because of my experience in biology labs, I
suspect that some "interconnected protocols" are going to have to be
written by hand to make a "single over-arching experiment", which
would be done once (like how I have done pcr.xml and nobody else *has
to* do it again, although they should, because it sucks).
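
The MIME-matching half of that can be sketched quickly in Python. The registry below is a hand-written placeholder; in the scheme described above it would be derived from .ggo files and package metadata, which this sketch does not attempt.

```python
import mimetypes

# Hypothetical registry: program name -> MIME types it accepts as input.
ACCEPTS = {
    "mpg123": {"audio/mpeg"},
    "pdftotext": {"application/pdf"},
    "lynx": {"text/html", "text/plain"},
}


def programs_for(filename, registry=ACCEPTS):
    """Return the registered programs that accept this file's MIME type."""
    mime, _encoding = mimetypes.guess_type(filename)
    return sorted(name for name, types in registry.items() if mime in types)


print(programs_for("paper.pdf"))
print(programs_for("page.html"))
```

A protocol version of the same lookup would match "what this step outputs" against "what that step accepts" instead of file types.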

I wonder though what you mean by having a program find a set of
protocols to walk you through in order to amount to a larger
experiment? Maybe my lab experience has been too minimal; is there
something that I am neglecting or forgetting? If that sort of thing
happens more often than I'm aware of, that would be pretty neat. :-)

> For example: how would I purify a 3kb insert from a plasmid carried by
> a colony I have growing on a plate?

Eek. That's a rather advanced example. I don't think you'd type that
question into the program :-) but rather maybe select what you have
and what you don't have from a list, or something, and /then/ a
program could search the protocols for those known inputs and outputs.
But that's a complicated example. In fact, pcr.xml might have been too
complicated for starters; it might have been a better idea for me to
just go with gel_electrophoresis.xml. :-)

> If it were possible to build a structured representation of laboratory
> operations, and we avoided getting bogged down in the semantics (is it
> a material? is it a reagent?), I could imagine such a system being
> used to:

One of the problems I had in a biology lab once was that the protocols
were hardly structured enough, so I wasn't entirely sure about
following the instructions. It wasn't terrible, but it also didn't
help, and I can't imagine the feeling is absent in those doing
protocols at home.

> - synthesize custom protocols optimized to produce a final quantity of
> a desired product

That's certainly doable. Stoichiometry math, maybe.
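
The simplest version is not even stoichiometry, just linear scaling of every amount by the ratio of target to base yield. A Python sketch under that assumption follows; the recipe numbers are invented, and a real protocol may have components that do not scale linearly.

```python
def scale_recipe(recipe, base_yield, target_yield):
    """Scale every amount linearly by target_yield / base_yield."""
    if base_yield <= 0:
        raise ValueError("base_yield must be positive")
    factor = target_yield / base_yield
    return {name: amount * factor for name, amount in recipe.items()}


# Made-up amounts in microlitres for a 50 ul mix; illustration only.
mix_50ul = {"buffer": 5.0, "water": 40.0, "template": 5.0}

print(scale_recipe(mix_50ul, base_yield=50, target_yield=200))
# {'buffer': 20.0, 'water': 160.0, 'template': 20.0}
```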

> - optimize workflow by finding the minimum number of operations
> required to reach a desired product

In computer science, the hard problem is finding the longest path
(NP-hard in general); the shortest path we can do easily enough. It
would be interesting to see if we can mark up our workflows or our
protocols from old PhD theses (or something) from the community, and
then see whether or not there was an easier way to prove a theory or
execute an experiment to elucidate the relationship between a few
variables (a job more fit for functional induction or symbolic
regression, but that's another email entirely, I suspect).

> - keep better track of supplies in the lab (more real-time)

Absolutely. This is a high priority on my todo list. I also figure
that a tool able to compare "what I have" to "what I need" would be
nice. For instance, if this fictional tool were currently operational,
I'd feed "pcr.xml" into it, and it would complain that I don't have a
thermocycler in the inventory or tool capabilities lists. I also bet
this could help people keep track of community labs so that they know
when to order more supplies. If everyone is required to use the
computer system to get a protocol and then to use the protocol to
"check out" supplies and reagents, then not only can they be given
*exact* safety information for everything, but the community lab's
operations can be tracked more thoroughly and more openly, which is
also great for collaboration. Win-win-win situation.
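
A toy Python version of the "check out" step, to show how reorder alerts could fall out of it. The `Inventory` class, the item names, and the reorder thresholds are all hypothetical.

```python
class Inventory:
    """Toy stock tracker for a community lab."""

    def __init__(self, stock, reorder_levels):
        self.stock = dict(stock)
        self.reorder_levels = dict(reorder_levels)

    def check_out(self, needs):
        """Deduct a protocol's consumables from stock.

        Refuses the whole request if anything is short. Returns the items
        that are now at or below their reorder level.
        """
        short = {item: qty - self.stock.get(item, 0)
                 for item, qty in needs.items()
                 if self.stock.get(item, 0) < qty}
        if short:
            raise LookupError(f"insufficient stock: {short}")
        for item, qty in needs.items():
            self.stock[item] -= qty
        return [item for item in needs
                if self.stock[item] <= self.reorder_levels.get(item, 0)]


# Illustrative numbers only.
lab = Inventory(stock={"agarose_g": 10, "ladder_ul": 20},
                reorder_levels={"agarose_g": 5, "ladder_ul": 10})

low = lab.check_out({"agarose_g": 1, "ladder_ul": 10})
print(low)         # ['ladder_ul']
print(lab.stock)   # {'agarose_g': 9, 'ladder_ul': 10}
```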

> - form the basis of protocol walk-through educational tools.

Right.

> I like the idea of representing protocol-online.org in some
> machine-accessible way. Maybe you could take a shot at making a YAML
> version of the PCR CLP-ML document you linked to?

I'll get around to it eventually. Could somebody throw a shoe at me
every once in a while to remind me? :-/

Douglas Ridgway

Feb 9, 2009, 10:44:56 PM

to diy...@googlegroups.com

On Sun, Feb 8, 2009 at 8:47 PM, Bryan Bishop <kan...@gmail.com> wrote:

> A few months ago I wrote a quick email about recipe representation on
> the openmanufacturing list. In particular, I think it would be useful
> if we could convert protocol-online.org into a repository of XML
> descriptions of lab protocols. This way, we can have lists of what
> tools, instruments, renewable materials and one-time use materials we
> need for each experiment or procedure, as well as the relevant list of
> procedures or steps.

There's also Good Laboratory Practice (GLP), under which work is done
according to defined Standard Operating Procedures (SOPs). The
protocols are written by and for people, but are nevertheless fairly
structured and defined. I think GLP tends to get used only for data
for which the regulators require it.

Tom

Feb 10, 2009, 10:38:57 AM

to DIYbio

When considering some standards for either protocols or some custom
apparatus, one might want to consider the model of the MIAME
conventions, widely used among researchers
(http://www.mged.org/Workgroups/MIAME/miame.html). This was originally
developed for microarray analysis to define the minimum amount of
information necessary to describe an experiment so as to be able to
repeat it, but the idea has been adapted to other types of complex
biological data. Setting up some formal standard for those submitting
protocols could be useful. Hope this helps. Also, the 5K for the
genome is nothing; being able to visualize the data and annotate it
with genes and other functional information is the big thing (like the
Watson website). Without that, the raw 3 Gb of assembled sequence is
just noise.
Tom

On Feb 8, 10:47 pm, Bryan Bishop <kanz...@gmail.com> wrote:
> Hey all,
>
> A few months ago I wrote a quick email about recipe representation on
> the openmanufacturing list. In particular, I think it would be useful
> if we could convert protocol-online.org into a repository of XML
> descriptions of lab protocols. This way, we can have lists of what
> tools, instruments, renewable materials and one-time use materials we
> need for each experiment or procedure, as well as the relevant list of
> procedures or steps.
>
> http://groups.google.com/group/openmanufacturing/msg/1fc4fbbfd4a6fb23
>
> So, here's the paper I've been working off of-
>
> Definition of an XML Markup Language for Clinical Laboratory
> Procedures and Comparison with Generic XML Markup
> http://www.clinchem.org/cgi/content/abstract/52/10/1943
> pdf: http://www.clinchem.org/cgi/reprint/52/10/1943
> dtd: http://www.clinchem.org/content/vol0/issue2006/images/data/clinchem.2...
>
> - Bryan
> http://heybryan.org/
> 1 512 203 0507
