ANNOUNCEMENT: Portable Metabolic Binary Standard

Herbert M Sauro

Sep 9, 1999, 3:00:00 AM

Portable Metabolic Binary Standard (pmb files)

This message is to announce the availability of a new portable metabolic
binary format (pmb files). For the moment this is likely only to be of
interest to software developers of cell/metabolic simulation software. The
hope is that by making such a specification available, users of different
simulation packages will be able to exchange both data and models.
Specifically we hope that software developers will add pmb export and import
facilities to their packages. Since pmb is a binary format, it is not
readable by humans; however, a fully structured metabolic language is being
considered alongside pmb, and details will be posted in a later
announcement.

A free set of utilities (including viewers and converters), documentation
and source code is available at:

http://members.tripod.co.uk/sauro/biotech.htm

under the utilities page.

MMFF List Committee

Athel Cornish-Bowden

Sep 10, 1999, 3:00:00 AM

Herbert Sauro wrote:

>Portable Metabolic Binary Standard (pmb files)
>
>This message is to announce the availability of a new portable metabolic
>binary format (pmb files). For the moment this is likely only to be of
>interest to software developers of cell/metabolic simulation software. The
>hope is that by making such a specification available, users of different
>simulation packages will be able to exchange both data and models.
>Specifically we hope that software developers will add pmb export and import
>facilities to their packages. Since pmb is a binary format, it is not
>readable by humans; however, a fully structured metabolic language is being
>considered alongside pmb, and details will be posted in a later
>announcement.

What is the advantage of making it a binary format (and hence "not readable
by humans")?

Most of those of us who follow this list are humans and some of the
advantages of making things readable by humans are obvious. Another
advantage, even with things to be read by machine, is that text files
generate fewer problems when moving data between operating systems.

What are the compensating disadvantages?

For large data files (e.g. graphics files, word-processor files, etc.) I
can think of some advantages in using a binary format, such as

1. Reading and writing are faster;
2. For a commercial developer, using a binary format makes life more
difficult for competitors, pirates, etc.;
3. A binary file may contain less wasted space.

However,

1. For the sort of thing Herbert is talking about, I would be surprised if
the files got so big that one would notice any difference in speed;
2. I wouldn't have thought that making life difficult for competitors was
an issue here;
3. Looking at the sizes of text files and binary files produced by
commercial programs and containing the same information, I get the
impression (possibly wrong) that any advantage in more efficient use of
storage is likely to be trivial.

Doubtless there is some important reason for using a binary format that I
haven't thought of, but I'd be interested in knowing what it is.

Athel

Email: at...@ibsm.cnrs-mrs.fr
Site map: http://ir2lcb.cnrs-mrs.fr/~athel/sitemap.htm
MCA chapter from my book:
http://ir2lcb.cnrs-mrs.fr/~athel/mcai.htm

---

Herbert M Sauro

Sep 10, 1999, 3:00:00 AM

I think some clarification is required on the role of pmb files, since it
perhaps wasn't clear why the format was announced.

Athel Cornish-Bowden <at...@ir2cbm.cnrs-mrs.fr> wrote in message
news:v03007801b3fed8bb3cc4@[193.50.234.80]...

> Herbert Sauro wrote:
>
> >Portable Metabolic Binary Standard (pmb files)
> >
> >This message is to announce the availability of a new portable metabolic

>> .........etc

> >being considered along side pmb and details will be posted in a later
> >announcement.
>
> What is the advantage of making it a binary format (and hence "not readable
> by humans")?
>
> Most of those of us who follow this list are humans and some of the
> advantages of making things readable by humans are obvious. Another
> advantage, even with things to be read by machine is that text files
> generate fewer problems for moving data between operating systems.

****** I hope pmb files are platform independent; there are no issues, for
example, concerning the type of line feed used. Just this very day at work I
had to manually edit in binary some source code we obtained from a Unix
platform, because the debugger on our development platform went crazy when it
found that single LFs were terminating lines (Windows uses CRLF). Now that
took a while to work out!
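
The line-ending problem described here can be reproduced in a few lines. What follows is only a minimal Python sketch: the two-line "model" and the record layout are hypothetical and are not taken from the pmb specification. It shows that the same text differs byte for byte between the Unix and Windows conventions, while a binary record packed with an explicit byte order comes out the same wherever it is written.

```python
import struct

# The same two-line text, as saved on Unix (LF) and on Windows (CRLF).
unix_text = b"S1 -> S2\nS2 -> S3\n"
windows_text = b"S1 -> S2\r\nS2 -> S3\r\n"

# The logical content matches, but the bytes on disk do not.
same_lines = unix_text.splitlines() == windows_text.splitlines()
same_bytes = unix_text == windows_text

# A binary record with an explicit little-endian layout ("<"):
# a 4-byte integer followed by an 8-byte double. This layout is a
# hypothetical illustration, not the pmb format.
record = struct.pack("<id", 2, 0.5)
count, rate = struct.unpack("<id", record)

print(same_lines, same_bytes)    # True False
print(len(record), count, rate)  # 12 2 0.5
```

The explicit "<" matters: without it, `struct` uses the native byte order and padding of the machine, and the portability argument no longer holds.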

>
> What are the compensating disadvantages?
>
> For large data files (e.g. graphics files, word-processor files, etc.) I
> can think of some advantages in using a binary format, such as
>
> 1. Reading and writing are faster;
> 2. For a commercial developer using a binary format makes life more
> difficult for competitors, pirates etc.;
> 3. A binary file may contain less wasted space.
>
> However,
>
> 1. For the sort of thing Herbert is talking about, I would be surprised if
> the files got so big that one would notice any difference in speed;

******* You're right, nothing to do with speed.

> 2. I wouldn't have thought that making life difficult for competitors was
> an issue here;

******* You're right, it's nothing to do with keeping competitors out, quite
the opposite in fact.

> 3. Looking at the sizes of text files and binary files produced by
> commercial programs and containing the same information, I get the
> impression (possibly wrong) that any advantage in more efficient use of
> storage is likely to be trivial.
>

******* You're right, there is no storage problem.

> Doubtless there is some important reason for using a binary format that I
> haven't thought of, but I'd be interested in knowing what it is.
>

******** Indeed there is, and it goes like this.

Anything which humans can meddle with will invariably end up containing
errors: spelling mistakes, syntactic errors and semantic errors.

If human users require readable text files then they need a parser to read
them and cope with all the potential errors that can arise. Parsers are not
particularly easy things to write, especially for users whose main subject
specialism is not computer science. However, it is easy to read binary files.
These are rigidly structured, and because they're binary files they will
99.99% of the time be products of other computer programs rather than
humans, so we can assume they are error free (one can add a sprinkling of
error checking, but it's trivial in comparison to what human-written text
files need). Also, a binary file is less likely to be messed about with by
humans, since they won't have a clue what the contents mean. Not everyone is
equipped to write text parsers, but probably most scientists could write a
Fortran, C or Obj Pascal program to read a binary file. The people who have
developed programs like Gepasi, Mist, E-Cell, DbSolve, MetaMod etc. can now
easily add a pmb import/export section to their software without the tedious
and very time-consuming task of writing a text parser. The next version of
WinScamp, for example (it should be available as you read this), can import
and export pmb files. The hope is that users of these software packages will
be able to exchange simulation models.
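
To give a feel for how little code a rigidly structured binary file needs, here is a minimal Python sketch. The layout (a "DEMO" magic string, a version number, a species count and fixed-size records) is invented for illustration and is not the pmb format; the function names are likewise hypothetical.

```python
import io
import struct

# Hypothetical layout, invented for this sketch (NOT the pmb format):
#   header : 4-byte magic b"DEMO", uint16 version, uint32 species count
#   record : 8-byte NUL-padded ASCII name, float64 initial concentration
HEADER = struct.Struct("<4sHI")
RECORD = struct.Struct("<8sd")

def write_model(stream, species):
    """Write a {name: concentration} mapping in the hypothetical layout."""
    stream.write(HEADER.pack(b"DEMO", 1, len(species)))
    for name, conc in species.items():
        stream.write(RECORD.pack(name.encode("ascii"), conc))

def read_model(stream):
    """Read the layout back, with a 'sprinkling' of error checking."""
    magic, version, count = HEADER.unpack(stream.read(HEADER.size))
    if magic != b"DEMO" or version != 1:
        raise ValueError("not a version 1 DEMO file")
    species = {}
    for _ in range(count):
        raw_name, conc = RECORD.unpack(stream.read(RECORD.size))
        species[raw_name.rstrip(b"\x00").decode("ascii")] = conc
    return species

# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
write_model(buffer, {"ATP": 2.5, "ADP": 0.5})
buffer.seek(0)
model = read_model(buffer)
print(model)
```

A truncated file makes `struct` raise an error on the short read, so most malformed input is rejected without any dedicated parsing code.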

You'll be interested to hear that there is a new language parser being
written for MCA/metabolic models which can spit out these binary files
as well as do the usual numerical stuff (+ lots more). Users will be able to
write their models in a human-readable text form, export them as pmb files,
and give them to someone else, who can import them into something like Gepasi.

That's the reason.

> Athel

Douglas B. Kell

Sep 10, 1999, 3:00:00 AM

>> Doubtless there is some important reason for using a binary format that I
>> haven't thought of, but I'd be interested in knowing what it is.
>>
>
>******** Indeed there is, and it goes like this.
>
>Anything which humans can meddle with will invariably end up containing
>errors, errors like spelling mistakes, syntactic and semantic errors.
>

I'm afraid I can't altogether agree with Herbert here, and thus would
support what I take implicitly to be Athel's reasoning.....see comments
below...since incompetent users have NO NEED to meddle with these things
[and any file exported by a program can be made read-only anyway], but at
least there is a big cohort of users competent enough to READ them to see if
the contents of such a file can help when some analysis has gone
pear-shaped. [BTW, a little hobby of mine, syntactic errors can be stopped
by the use of parse-trees.]

[chomp]

>Not everyone is equipped to write text parsers but probably most scientists
>could write a Fortran, C or Obj Pascal program to read a binary file.

Come off it (and I understand that those who are so equipped would use Perl
anyway). "Most scientists" couldn't even write something to print 'Hello
World'.

>The people
>who have developed programs like Gepasi, Mist, E-Cell, DbSolve, MetaMod etc
>can now easily add a pmb import/export section to their software without the
>tedium and very time consuming task of writing a text parser. The next
>version of WinScamp for example (should be available as you read this) can
>import and export pmb files. The hope is that users of these software
>packages will be able to exchange simulation models.

As we discussed in Visegrad, the MOTIVATION is fine, but (as I also
explained my position there) the RESULT can be achieved in ascii and the
right model to think about (in terms of exchanging data with
otherwise-incompatible systems) can be seen in what the FLOW CYTOMETRY crowd
went through in an EXACTLY analogous manner (and here dealing with competing
commercial manufacturers, not a friendly group who want to spread the word
in the best way possible).

The way it was done there is to have the standard as a FILE HEADER -

see full details at http://www.isac-net.org/fcs3/FCS3.html

including

'2.1 Conventions

2.1.1 The ASCII character code is used for all keywords and most of the
keyword values throughout an FCS3.0 file (see section 3.2.20 regarding the
use of UNICODE characters). '
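
The general idea of a standardised, readable header in front of the data can be sketched as follows. This is a deliberately simplified, hypothetical layout and not the FCS3.0 byte layout: the keyword names, the `KEY=value` line syntax and the blank-line separator are all assumptions made for the example.

```python
import struct

def build_file(metadata, values):
    """ASCII keyword/value header, then the numeric payload as binary.

    Simplified, hypothetical layout (not FCS3.0): one 'KEY=value' line per
    keyword, a blank line, then the values as little-endian doubles.
    """
    keywords = dict(metadata, COUNT=str(len(values)))
    header = "".join(f"{key}={val}\n" for key, val in keywords.items())
    payload = struct.pack(f"<{len(values)}d", *values)
    return header.encode("ascii") + b"\n" + payload

def parse_file(data):
    """Split the readable header from the payload and decode both."""
    header, _, payload = data.partition(b"\n\n")
    lines = header.decode("ascii").splitlines()
    keywords = dict(line.split("=", 1) for line in lines)
    count = int(keywords["COUNT"])
    values = list(struct.unpack(f"<{count}d", payload))
    return keywords, values

data = build_file({"FORMAT": "demo-1", "MODEL": "glycolysis"}, [1.0, 0.25, 4.0])
keywords, values = parse_file(data)
print(keywords)
print(values)
```

Anyone can open such a file in a text editor and read the keywords at the top, while programs still get the numbers in a fixed binary form.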

>
>You'll be interested to hear that there is a new language parser being
>written for MCA/metabolic models which can spit out these binary files
>as well as do the usual numerical stuff (+ lots more). Users will be able to
>write their models in a human readable text form, export it as a pmb file,
>give it to someone else who will import it into something like Gepasi.

This of course **sounds** much more interesting, but then those who write
the aforementioned programs could as easily be invited to write their [what
are called in the flow cytometry trade "LIST MODE"] files with the
standardised header, and I thought/hoped that that was what was going to
happen as a result of some of the earlier discussions. Not knowing what
Herbert has in mind here, I would simply urge at this stage that we consider
asking the writers of metabolic simulators to make both types of option
available (radio button in the program and the parser could obviously do the
rest....)...

Looks like a serious off-line topic for BTK Stellenbosch, where we could
also be wise (Jannie/Johann...?) to think about getting the genome
metabolism crowd involved here in integrating MCA/modelling into the
post-genomic data flood deconvolution activities - the
Mendes/Karp/KEGG/e-cell people in particular - a few links at
http://gepasi.dbs.aber.ac.uk/dbk/metabol.htm - before they all set up their
own systems that we would wish had been different.

>
>That's the reason.
>
>> Athel

and that's mine;-)

Douglas.
(Prof.) Douglas B. Kell, Cledwyn Building,
Institute of Biological Sciences, University of Wales,
Aberystwyth SY23 3DD
Tel: +44 1970 622334 Fax: +44 1970 622354
d...@aber.ac.uk http://gepasi.dbs.aber.ac.uk

Pedro Mendes

Sep 10, 1999, 3:00:00 AM

Well, well,

We (MMFF group) already discussed most of this binary vs. text story.
And to reassure everyone on this list, I repeat what Herbert said in his
original email: we are also going to define a text format. So if you can
wait a bit longer, you will see it happen!

However, I'd like to express my personal opinion that this discussion is
rather futile:

No one ever edits a text file directly (i.e. the strings of 7 bits inside each
byte). One uses a *program* to do so. This program displays things on
screen that we then perceive as letters of the alphabet and other
characters. It turns out that a text file is as binary as anything else.
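
The claim that a text file is as binary as anything else is easy to check. A minimal Python sketch, using a made-up reaction string as the "text file" contents:

```python
text = "ATP + Glc -> ADP + G6P"

# Encoding turns the characters into the bytes that are actually stored.
stored = text.encode("ascii")

# Every ASCII byte is below 128, i.e. its value fits in 7 bits.
all_seven_bit = all(byte < 128 for byte in stored)

# The first three characters ("A", "T", "P") as the bits held on disk.
first_bits = [format(byte, "08b") for byte in stored[:3]]

print(first_bits)  # ['01000001', '01010100', '01010000']
print(all_seven_bit, stored.decode("ascii") == text)  # True True
```

The text editor is simply the program that performs the `decode` step and draws the result on screen.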

Now, rather than text, let's think of a pathway. There is no way of
representing a pathway in a computer other than by strings of zeros and
ones (like anything else). So, to make the parallel with text, what we
need is a pathway editor program which (like the text editor) allows one
to view and edit pathways in a convenient way. And that is what everyone
(me included) ever uses to manipulate pathways with a computer. The
file format is totally irrelevant because one always uses a program to
read it.

The portable metabolic text file format will only be needed by those who
prefer using a text editor to manipulate a pathway (using some
language). That's all. So you emacs, vi, notepad, pfe, etc. fans should
wait a little longer because we will cater for you too.

The main objective of PMB files is for programs to communicate with the
same file format. So that if I use Gepasi to define a pathway model I can
then send it to Herbert and he can read it with Scamp straight away, just
like that. We hope that other programs will adopt the format such that
users can pass data between them without any trouble. In fact our
(NCGR's) future metabolic database, named PathDB, will also produce
these files as output. You will be able to query it over the net and get on
your computer a PMB file such that you can immediately load it into
Scamp, Gepasi, etc. and play away. Again, what the file format looks
like is of no relevance really, since one can use any of these programs to
edit it. (By the way, nobody seems to care much if Word files are ASCII
or binary, since one ends up using Word to edit them anyway.)

But to finish on a positive note: those that really like text files should wait
a while because we will also produce a text-based format. So be patient...

Douglas made a different point in his email, to which I will reply separately
(since it is very important).

Pedro

Dr. Pedro Mendes
http://gepasi.dbs.aber.ac.uk/pedro/prmhome.htm
National Center for Genome Resources,
1800-A, Old Pecos Trail,
Santa Fe, NM 87505, USA
http://www.ncgr.org

Pedro Mendes

Sep 10, 1999, 3:00:00 AM

On 10 Sep 99, at 22:28, Douglas B. Kell wrote:

> ....
> [a lot of other stuff on a different topic]

>
> Looks like a serious off-line topic for BTK Stellenbosch, where we could
> also be wise (Jannie/Johann...?) to think about getting the genome
> metabolism crowd involved here in integrating MCA/modelling into the
> post-genomic data flood deconvolution activities - the
> Mendes/Karp/KEGG/e-cell people in particular - a few links at
> http://gepasi.dbs.aber.ac.uk/dbk/metabol.htm - before they all set up their
> own systems that we would wish had been different.

I totally agree (except that there is yet no database set up by Mendes,
but it won't be long). Like many said in Visegrad, BTKers have to be
proactive in this area. We risk being left out by the molbio community
simply because they do not know about our activities. There are also
enough people out there who despite knowing about our theories,
formalisms and methods, will reinvent the wheel and get all the credit.
We cannot sit down and let this happen.

At Stellenbosch we could do a lot of good by keeping ahead of the game.
From what I see in the preliminary program there are already two good
subjects for this: "What is a metabolic pathway?" and "X-omics".
Especially on the pathways question, the bulk of the molbio crowd is not
yet thinking about this. The large majority take for granted the standard
textbook metabolic pathways. Viz the current pathway databases like
KEGG or WIT. The next generation of pathway databases will go much
further... and before someone rediscovers things like elementary modes
(oops, they already did!) and steals the glory from Schuster et al., we
had better advance our ideas and make them widely heard.

My hope for Stellenbosch is that we have a productive discussion about
what a pathway really is and then take the pains to write a review/position
paper echoing the discussion. Joint authorship by several groups would
be great, and it would really help if someone could do a bit of political
manoeuvring to get it into a big name like Nature or Science.

Herbert M Sauro

Sep 11, 1999, 3:00:00 AM

> >
> >******** Indeed there is, and it goes like this.
> >
> >Anything which humans can meddle with will invariably end up containing
> >errors, errors like spelling mistakes, syntactic and semantic errors.
> >
>
> I'm afraid I can't altogether agree with Herbert here, and thus would
> support what I take implicitly to be Athel's reasoning.....see comments
> below...since incompetent users have NO NEED to meddle with these things

But the reality is that users, both competent and incompetent, *will* meddle,
and when things go wrong, guess who has to tell them how to put things right,
and who they will complain to? I've little enough time without having to
hold the hands of so-called professional scientists who should know better.
If I can reduce user maintenance then it's good for me.

Originally we did think about producing a simple human-readable ASCII file.
But if we (particularly myself) were going to spend weeks of man-hours
writing a parser to deal with syntax and semantic issues, then we might as
well spend the time better. That means spending less of it developing a
binary format (which is relatively easy to do, and which any competent
scientist who develops software can implement; I know for certain that the
guys and gals in your group, Douglas, would have no trouble at all writing a
program to read binary files), and spending the rest of the time, and more,
developing a decent structured language (but nevertheless a simple one) which
anybody with some experience of Basic would be quite at home with. So yes, I
opt for this strategy. No cryptic symbols in the text either (e.g. the
flow-cytometry syntax).

If we can produce one really good parser for the community, with a rich
language and set of facilities, it will save years of work for other people,
especially if the parser also outputs something (i.e. pmb files) that other
programs can easily read, knowing that it will be free of errors and
meddling. An awful lot of work goes into writing bulletproof language
compilers, quite apart from all the stuff that goes on in the background to
turn the language into some sort of executable code. HTML appears to be a
simple human-readable ASCII file, but HTML parsers are by no means trivial
pieces of software. You wouldn't believe the amount of code in Scamp to stop
users making silly semantic mistakes; it's not just incompetent users but
competent users who need protecting too.

> [and any file exported by a program can be made read-only anyway], but at
> least there is a big cohort of users competent enough to READ them to see if
> the contents of such a file can help when some analysis has gone
> pear-shaped. [BTW, a little hobby of mine, syntactic errors can be stopped
> by the use of parse-trees.]
>
> [chomp]
>
> >Not everyone is equipped to write text parsers but probably most scientists
> >could write a Fortran, C or Obj Pascal program to read a binary file.
>
> Come off it (and I understand that those who are so equipped would use Perl
> anyway). "Most scientists" couldn't even write something to print 'Hello
> World'

Remember I work in the financial sector and we don't use things like Perl but
the more usual computer languages, so Perl didn't occur to me.

As far as I remember, every one of the students in your group can
easily write software, and some of them could do it in their sleep. Henrik
Kacser said once (probably more than once, and possibly paraphrasing someone
else) that a scientist only needs to know two languages, English and
mathematics. Today I would add a third, namely the ability to program a
computer (in Perl, Python or whatever).

It had been the intention to write a pmbAscii converter which would convert
the binary format to a human-readable form. This would be fairly easy to do,
and I have one sitting on my hard disk. As an exercise I'll leave it to the
professionals to write the reverse converter; I agree it would be useful, but
I don't have the time right now.

>
>
> The way it was done there is to have the standard as a FILE HEADER -
>
> see full details at http://www.isac-net.org/fcs3/FCS3.html
>
> >

> are called in the flow cytometry trade "LIST MODE"] files with the
> standardised header, and I thought/hoped that that was what was going to
> happen as a result of some of the earlier discussions. Not knowing what
> Herbert has in mind here, I would simply urge at this stage that we consider

The problem is time and resources, which are severely lacking. The solution
we have is, I think, both the cheapest and in the long term the best. If
anyone has anything better in mind, please feel free to go ahead and develop
a standard; I can certainly find far better and more fun things to do with
my spare time.

Herbert Sauro
