[erlang-questions] Is there a good source for documentation on BEAM?

Jonathan Coveney

May 7, 2012, 2:39:35 AM

to erlang-q...@erlang.org

This question seems to come up now and again, and it's surprising to me that a crucial part of the documentation isn't better documented. Is there a reason that it is the case? Is the reason that there is no VM spec to give the devs the flexibility to change the intermediate layer without having to worry about backwards compatibility to the degree that Java does?

Thus far I've found a description of the opcodes:

http://azunyanmoe.wordpress.com/2011/03/30/erlang-vm-opcodes/

and this resource on the file format:

http://www.erlang.se/~bjorn/beam_file_format.html

But there doesn't seem to be a lot of high level talk about what the opcodes do (a la the JVM specification, for example). I know it's not impossible, and could always ask the guys at Erjang how they went about it, but thought I'd ask here.

Please forgive a newbie question, and thanks in advance

Jon

Joe Armstrong

May 7, 2012, 4:47:26 AM

to Erlang

---------- Forwarded message ----------
From: Joe Armstrong <erl...@gmail.com>
Date: Mon, May 7, 2012 at 10:46 AM
Subject: Re: [erlang-questions] Is there a good source for
documentation on BEAM?
To: Jonathan Coveney <jcov...@gmail.com>

Hi,

I did start writing a description but it's not very complete.

This is on my list of things-to-do-one-day-when-you-get-time

See http://dl.dropbox.com/u/4764922/beam.pdf

If there is any interest I could up the priority :-)

/Joe

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Thomas Lindgren

May 7, 2012, 4:46:59 AM

to Jonathan Coveney, erlang-q...@erlang.org

> From: Jonathan Coveney <jcov...@gmail.com>
>To: erlang-q...@erlang.org
>Sent: Monday, May 7, 2012 8:39 AM
>Subject: [erlang-questions] Is there a good source for documentation on BEAM?

>
>
>This question seems to come up now and again, and it's surprising to me that a crucial part of the documentation isn't better documented. Is there a reason that it is the case? Is the reason that there is no VM spec to give the devs the flexibility to change the intermediate layer without having to worry about backwards compatibility to the degree that Java does?

Actually, I don't think such docs are all _that_ crucial -- who really needs to know, except a small number of VM implementors? (And they should read the source to get at all the goodies.) But perhaps someone on the list might be moved to do a tutorial presentation on an Erlang Factory or something?

(By the way, I too assume not doing it is to avoid getting bogged down into minutiae.)

If you want to learn more about some of the intellectual roots, try these:
http://wambook.sourceforge.net/

http://dl.acm.org/citation.cfm?id=188051

Best regards,
Thomas

Richard Carlsson

May 7, 2012, 5:17:20 AM

to erlang-q...@erlang.org

On 05/07/2012 08:39 AM, Jonathan Coveney wrote:
> This question seems to come up now and again, and it's surprising to me
> that a crucial part of the documentation isn't better documented. Is
> there a reason that it is the case? Is the reason that there is no VM
> spec to give the devs the flexibility to change the intermediate layer
> without having to worry about backwards compatibility to the degree that
> Java does?

Yes, that's probably the reason. The BEAM is not the first VM for Erlang
(JAM was used until the late 90s), and might not be the last. In the
case of Java, the JVM was central for defining the language semantics
and ensuring portability across platforms and VM implementations. The
Erlang language, on the other hand, is mostly functional and its
semantics is better specified in terms of the source level code. The
implementation on top of a VM is a detail.

Still, the Beam is an interesting VM, and the implementation has been
thoroughly battle tested, so using it as a target for other languages is
not a bad idea. It deserves to be better documented.

/Richard

Michael Turner

May 7, 2012, 5:27:33 AM

to Thomas Lindgren, erlang-q...@erlang.org

"Actually, I don't think such docs are all _that_ crucial -- who
really needs to know, except a small number of VM implementors?"

Aren't Erlang's chances of greater mindshare improved by making it
easier to become a VM implementor? I doubt very much that Java would
be where it is today had it not been for clear VM specification.
That's not to say that Erlang should follow in all of Java's
footsteps, even if it could. But I have to say I was boggled to
learn that you can't find out what the VM opcodes mean without reading
the source (and maybe not even then, if the source contains bugs
vis-a-vis some idealized machine model.)

-michael turner

Joe Armstrong

May 7, 2012, 7:47:33 AM

to Michael Turner, erlang-q...@erlang.org

I think it works like this:

1) first you don't understand how the X works (X=Beam, JVM, X11,
... you name it)
2) You struggle - and think - google and have a hot bath
3) Eureka - bath flows over
4) Now you can understand it - and you can also remember why you
could not understand it
5) Now it's easy - you understand it
6) You see no reason to document it since it's obvious

Round about 4) there is a small window of opportunity to explain to
other people how it works.
Once you get to 6) it's very difficult to remember what it felt like
at point 2) and consequently difficult
to write decent documentation.

/Joe

Masklinn

May 7, 2012, 7:58:32 AM

to Joe Armstrong, erlang-q...@erlang.org

On 7 mai 2012, at 13:47, Joe Armstrong <erl...@gmail.com> wrote:

> I think it works like this:
>
> 1) first you don't understand how the X works (X=Beam, JVM, X11,
> ... you name it)
> 2) You struggle - and think - google and have a hot bath
> 3) Eureka - bath flows over
> 4) Now you can understand it - and you can also remember why you
> could not understand it
> 5) Now it's easy - you understand it
> 6) You see no reason to document it since it's obvious

Pretty much the same process which prevents the completion and uptake of alternative git porcelains: by the time the author has a good enough knowledge to finish his implementation he has good enough knowledge to use the default cli without hindrance.

Miles Fidelman

May 7, 2012, 9:18:16 AM

to Erlang

At the very least, it's probably worth putting it somewhere w/ the rest of
the Erlang documentation, rather than leaving it buried on Dropbox!

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra

Michael Turner

May 7, 2012, 10:48:14 AM

to Masklinn, erlang-q...@erlang.org

> On 7 mai 2012, at 13:47, Joe Armstrong <erl...@gmail.com> wrote:
>
>> I think it works like this:
>>
>> 1) first you don't understand how the X works (X=Beam, JVM, X11,
>> ... you name it)
>> 2) You struggle - and think - google and have a hot bath
>> 3) Eureka - bath flows over

3.1) Because you're Archimedes, you have a very bright slave-boy
scribe. You explain it to him exultantly ...

3.2) ... then again in puzzlement over *his* puzzlement ....

3.3) ... then yet again in profound irritation over his obtuseness.

3.4) He finally gets it. But then he points out a subtle error in your
reasoning, one that had been tripping him up in his understanding.

3.5) You have him flogged for his impertinence.

3.6) He copy-edits and publishes the manuscript under your name. But
he also embezzles 10% of your book royalties to repatriate to his
war-widowed mother who is still living in the city-state that your
city-state conquered, resulting in his slavery.

We now return you to our regularly-scheduled historical narrative ....

-michael turner

Thomas Lindgren

May 7, 2012, 3:15:14 PM

to Michael Turner, erlang-q...@erlang.org

----- Original Message -----
> From: Michael Turner <michael.eu...@gmail.com>
> To: Thomas Lindgren <thomasl...@yahoo.com>
> Cc: Jonathan Coveney <jcov...@gmail.com>; "erlang-q...@erlang.org" <erlang-q...@erlang.org>
> Sent: Monday, May 7, 2012 11:27 AM
> Subject: Re: [erlang-questions] Is there a good source for documentation on BEAM?
>
> "Actually, I don't think such docs are all _that_ crucial -- who
> really needs to know, except a small number of VM implementors?"
>
> Aren't Erlang's chances of greater mindshare improved by making it
> easier to become a VM implementor? I doubt very much that Java would
> be where it is today had it not been for clear VM specification.
> That's not to say that Erlang should follow in all of Java's
> footsteps, even if it could. But I have to say I was boggled to
> learn that you can't find out what the VM opcodes mean without reading
> the source (and maybe not even then, if the source contains bugs
> vis-a-vis some idealized machine model.)

Well, we should ask why we need them.

There has been a substantial number of non-BEAM Erlang implementations already, so I'm
not convinced detailed BEAM docs is the key property* to spread Erlang.
Indeed, requiring detailed docs of every change of BEAM seems likely to slow innovation down instead.

If the motive is education, I think someone interested in compilers and virtual machine architectures
would have little trouble with BEAM as such. In a real sense, BEAM is just a vehicle to express compiler optimizations for a
restricted part of ERTS (the sequential execution part, basically). The interesting choices and optimizations are found by examining
the whole implementation. But again, a tutorial could be useful.

Another argument might be that BEAM should be specified in detail in order to be a suitable binary format for distribution,
which is essentially what the JVM instruction set has become. (The commercial implementations convert it into internal
formats and optimize the hell out of those.) My druthers would in that case be to define something less detailed
and less changeable to serve in this role.

Okay, that's all I can come up with.

(* I'd prefer to have the marketing muscle of Sun instead!)

Jonathan Coveney

May 7, 2012, 4:22:40 PM

to Thomas Lindgren, erlang-q...@erlang.org

I love the thoughtfulness of this listserv.

I think a tutorial would be invaluable. I think the sorts of people who get into erlang may be like me... some functional background, a lot of non-functional background, you see erlang and want to get a sense of how the opcodes (and by extension the internals) work. You can trawl the source (and I will), but I definitely think that if someone were up to it, the community would be well served by a tutorial on the implementation details.[1]

[1] where the devil is, so they say

Thanks for the responses!

Jon


Richard O'Keefe

May 7, 2012, 5:57:34 PM

to Joe Armstrong, Erlang

On 7/05/2012, at 8:47 PM, Joe Armstrong wrote:

> ---------- Forwarded message ----------
> From: Joe Armstrong <erl...@gmail.com>
> Date: Mon, May 7, 2012 at 10:46 AM
> Subject: Re: [erlang-questions] Is there a good source for
> documentation on BEAM?
> To: Jonathan Coveney <jcov...@gmail.com>
>
>
> Hi,
>
> I did start writing a description but it's not very complete.
>
> This is on my list of things-to-do-one-day-when-you-get-time
>
> See http://dl.dropbox.com/u/4764922/beam.pdf
>
> If there is any interest I could up the priority :-)
>
> /Joe

I would have expected it to be a matter of personal and
professional pride that
One reason that several of my EEPs and so on never got model
implementations is that the BEAM instructions have less than
a 10th of the documentation that the WAM instructions had at
Quintus. It does seem clear that the Erlang/OTP team are
much better programmers than Quintus were, because whenever
we forgot to document some detail of an instruction, someone
was sure to get it wrong and new builds would start crashing
horribly. It's also a reason I've never contributed at a low
level to SWI Prolog.

Two of the best documented abstract instruction sets I've seen
are Icon (there's a whole book about it) and Lua (there's a
document someone wrote with nearly a page per instruction; if
I could bring myself to _care_ about Lua I could start writing
extensions for it tomorrow). Since Hassan Aït-Kaci wrote his
little book about the WAM, that counts as well documented too,
except of course that I doubt anyone actually follows it all
that closely; Quintus certainly departed from it. (And yes, of
course there's the Java VM book too; I don't suppose it's a
coïncidence that Tim Lindholm worked at Quintus and at times on
the WAM.)

Looking at that document of yours, Joe,
(a) Out of 8 pages, only 5 are actually about the BEAM.
(b) We learn only at the end of those 5 pages that there are
really TWO different things called BEAM:
- the 'high level abstract instructions' generated
by the compiler, which is pretty stable, and
- the 'internal form' meant for execution, generated
by the loader, which 'has changed many times'.
No *wonder* I failed to learn about the BEAM by reading the
emulator code and trying to match what I saw there against
the "instructions" I saw in the .S files.
(c) The document begins in the middle. The beginning is
the *architecture*. Given the architecture, a lot of the
instructions become much easier to understand.
(d) The document is very obviously a draft, so there's not
much more it's fair to say at the moment.

With there being two "BEAM"s, there need to be two manuals.
"External BEAM", quite full, for people working on tools that
generate, parse, or otherwise manipulate that kind of code.
"Internal BEAM", taking an external BEAM background and
architectural understanding for granted, that tells people
working on the loader, emulator, HiPE, &c what they need to
know.

Since the Erlang/OTP team and HiPE teams have evidently
managed without the "Internal BEAM" document, the
"External BEAM" document is the priority.

Richard O'Keefe

May 7, 2012, 9:07:13 PM

to Thomas Lindgren, erlang-q...@erlang.org

On 8/05/2012, at 7:15 AM, Thomas Lindgren wrote:
> There has been a substantial number of non-BEAM Erlang implementations already, so I'm
> not convinced detailed BEAM docs is the key property* to spread Erlang.

And how many of those non-BEAM implementations still exist?
Does GERL? Is E2S still maintained? How much of OTP can it handle?

> Indeed, requiring detailed docs of every change of BEAM seems likely to slow innovation down instead.

I not only *don't* believe that, I *can't* believe that.
Joe has informed us that there are TWO levels of BEAM,
one of which has been very stable, and one of which has
changed many times.

I don't even believe your claim if made about the low level
much changed "BEAM", but let's suppose it true for the sake
of argument. If the high level of BEAM has remained pretty
stable for quite a while, how would documenting it have
slowed innovation down?

I can *prove* that the absence of documentation has definitely
slowed innovation down. There are several of my EEPs where I
*would* have provided model implementations had I been able to.

Heck, the frames proposal was in more or less its present
shape *years* ago. It was only last month when it finally
dawned on me that I could make measurements by constructing
my *own* micro-BEAM which I *could* understand.

>
> If the motive is education, I think someone interested in compilers and virtual machine architectures
> would have little trouble with BEAM as such.

I have an interest in compilers and VMs. I worked professionally on Quintus
Prolog and the real WAM (not the one in the papers or Aït-Kaci's book). And
trying to figure out the BEAM was such a slog that to be honest, I said to
myself "the hell with it, if they don't *WANT* me to understand the BEAM,
I'm not going to waste any more of my time trying to penetrate the obscurity".

There are three key software engineering lessons:
- if it isn't tested, it doesn't work;
- if it isn't documented, it doesn't work;
- tests are one kind of documentation but not enough.

For example, at Quintus, shortly after David Warren left, a compiler bug
was reported. The whole compiler had two comments in it. One was a
copyright notice, and the other was a commented out bit of code saying
"this doesn't work". All variable names were either one letter or one
letter and one digit.

To fix the bug, I had to document the data structures used by the compiler,
including running lots of tests through it to make sure my documentation
was correct. Whenever I figured out what a variable meant, I gave it a
longer name. Once I understood the data structures, I was able to revise
them and make the compiler about 20% faster. That was good, because the
bug turned out to be in the one part of the compiler I could never
understand. Adding some extra code to check if the bug had happened and
deoptimise in that case increased compile time by 10%, but overall the
compiler was now faster.

I am still quite angry that the information I needed to fix that bug
properly was *KNOWN* to the author, who didn't bother writing it down.
I didn't even need to know how the thing worked; what I needed to know
was what exactly it was supposed to *do*.

> In a real sense, BEAM is just a vehicle to express compiler optimizations for a
> restricted part of ERTS (the sequential execution part, basically).

No, compiler optimisations are expressed in the executable code of the
compiler. BEAM lets you express the *results* of such optimisations,
which is a different thing. It's just like the Quintus compiler: I could
figure out in that case what the *results* were, but the actual process
remained obscure. (More precisely, what the 'invariants' were.)

The thing is, the compiler module in question was about 2 kSLOC, and it took
me two full *weeks* to figure out what the author already *knew*. That was
not a good use of my time.

I've already spent about 4 full days equivalent trying to figure BEAM out,
and with nobody paying me to do that, it's just not worth it, especially
because somebody already KNOWS and just can't be bothered TELLING.

Yes, I'm shouting. "We don't need it" and "you don't need it" are utterly
different propositions, and too many people in too many areas of life fail
to realise that.

> Another argument might be that BEAM should be specified in detail in order to be a suitable binary format for distribution,
> which is essentially what the JVM instruction set has become.

I suggested many years ago that Erlang should take a leaf out of Kistler's
book (or PhD thesis). The "Juice" system for Oberon compiled source files
to abstract syntax trees, then cleverly compressed the ASTs and used them
as the binary distribution form. They came in smaller than .class files
and had no presuppositions about the target hardware (not even primitive
size and alignment if I recall correctly). The cost of decompressing and
generating native code was low, to the point where it was faster to
dynamically load Juice files than their equivalent of .so/.dll files, and
the generated code actually ran faster because the code generator knew
more about the environment of the target, including existing code. (I
don't know if the Juice runtime did cross-module inlining, but it would
have been possible.)

Michael Turner

May 8, 2012, 1:35:01 AM

to Richard O'Keefe, Erlang

"(b) We learn only at the end of those 5 pages that there are
really TWO different things called BEAM:
- the 'high level abstract instructions' generated
by the compiler, which is pretty stable, and
- the 'internal form' meant for execution, generated
by the loader, which 'has changed many times'."

Then there's the recommendation I read somewhere (I believe by a HiPE
implementor) that it makes more sense to target Core Erlang. So, in a
way, there are three levels you could target, two of them plausibly
called "BEAM".

I got interested in this issue when I noticed that ECLiPSe Prolog
*used* to have working multicore parallelism, but then ... not. Wow:
one of the better constraint programming packages, and it's not
multicore. That seems like an opportunity going to waste. I also
noticed that there's a Prolog implementation in Erlang.

ECLiPSe provides a lot of power. It seems to be languishing anyway.
ECLiPSe-over-BEAM might solve a lot of problems of viability for
ECLiPSe, by reducing the workload on its maintainers while potentially
increasing its user base. And I'm sure Ericsson could find uses for
ECLiPSe in telecom network design and optimization, which (if there
were ECLiPSe-over-BEAM) could further increase Erlang's value within
the company.

Just to give one possible example.

-michael turner

Richard O'Keefe

May 8, 2012, 2:05:12 AM

to Michael Turner, Erlang

On 8/05/2012, at 5:35 PM, Michael Turner wrote:

> "(b) We learn only at the end of those 5 pages that there are
> really TWO different things called BEAM:
> - the 'high level abstract instructions' generated
> by the compiler, which is pretty stable, and
> - the 'internal form' meant for execution, generated
> by the loader, which 'has changed many times'."
>
> Then there's the recommendation I read somewhere (I believe by a HiPE
> implementor) that it makes more sense to target Core Erlang. So, in a
> way, there are three levels you could target, two of them plausibly
> called "BEAM".

If you want to compile an extended syntax into something that is
already supported in Core Erlang, no worries. For example,
implementing list comprehensions with out-of-line code, as
currently done, could be done that way.

If you want to compile list comprehensions into inline code,
I believe it can be done using high-level BEAM, but you can't
do it through Core Erlang.

If you want to compile something like frames, you need to
- extend HiPE if you want the new feature to go fast *and*
- extend low level BEAM *and*
- extend high level BEAM *and*
- extend Core Erlang *and*
- extend source Erlang (probably the easiest step) *and*
- extend the AST form (not too hard) *and*
- extend the tools that process the AST form *and*
- last but very much not least, you need to
extend the documentation.

The more levels you have the harder it gets, unless the design
is very modular. Here I will say that Quintus never really
got this as organised as they should have. Adding a new
instruction required editing upwards of a dozen files.

The nearest I've seen to a well-structured emulator was the
Icon one, where the various C files that were used were generated
from an annotated master.

Jonathan Coveney

May 8, 2012, 2:52:19 PM

to Richard O'Keefe, Erlang

I think that any time you find yourself justifying not documenting things, it's probably a mistake :) I know it'd be a lot of someone's time (I read the pdf in dropbox and found it very helpful, and think that a document like that that covered more of the opcodes in the context of more complicated functions would be extremely helpful). I agree that the External BEAM is probably the thing to focus on.

I think that more documentation around this would be good in many respects...

1. Documenting these sorts of things frames understanding for others, which can lead to more eyes on the implementation, which is always good

2. It'll make it easier for people to write custom VM's, which is only a good thing for Erlang. For the JVM, for example, this is a huge benefit and as Java the language dies, the ecosystem of budding JVM languages will no doubt go strong

3. It'll make it easier for people to compile things to BEAM, which only goes to show that Erlang's underbelly is general purpose and useful for building robust, fault-tolerant software

I realize that it'd be a lot of someone's time, but I want to say that it'd probably be a really good endeavor for the community. I think there are a lot of smart people that aren't full time VM or language architects who still can probably do interesting things if they have some guidance on what is going on with BEAM.

Jon


August Schwartzwald

May 8, 2012, 3:46:25 PM

to erlang-q...@erlang.org

I've also been looking for information about how Erlang works internally
for some time now and I found this document very helpful. I'm looking
forward to the complete version whenever it is done.

If you want comments/feedback on the document I can gladly help with that
too.

Richard O'Keefe

May 8, 2012, 7:58:44 PM

to Jonathan Coveney, Erlang

Let me illustrate the Icon approach by showing you a fragment of the
micro-BEAM I wrote to get the performance numbers in the frames proposal.
(The whole thing is fragmentary.)

...
@i
max src, snd, dst
@d
dst := max(src, snd)

This computes the maximum using the micro-Erlang term ordering.
If src and snd are tagged immediate integers the comparison is
done inline; the compare() function is called otherwise.
@c
T = @src;
U = @snd;
@dst = cmp(T, >, U) ? T : U;
@step;
@e
...
@i
check_record src, size, const
@d
Type test.

Fail unless src is tagged as a pointer to a tuple or frame,
the first word it points to is size, and the second is the
const (which must be an atom, but we don't check that).
Used for record matching.
@c
T = @src;
if (!is_tuple(T)) @fail "is_record"; else
if (FIELD(T, TUP_TAG, 0) != @size) @fail "is_record"; else
if (FIELD(T, TUP_TAG, 1) != @const) @fail "is_record"; else
@step;
@e
...
There is a preprocessor written in AWK that turns this into
several C files. One of them is the emulator cases. For
the check_record instruction you get

#line 75 "frame.master"
case CHECK_RECORD:
#line 76 "frame.master"
T = reg[(int)P[1]];
#line 77 "frame.master"
if (!is_tuple(T))
P = failure, operation = "is_record"; else
#line 78 "frame.master"
if (FIELD(T, TUP_TAG, 0) != 4)
P = failure, operation = "is_record"; else
#line 79 "frame.master"
if (FIELD(T, TUP_TAG, 1) != P[3])
P = failure, operation = "is_record"; else
#line 80 "frame.master"
P += 4;
break;

where I've broken the long lines (the preprocessor doesn't).
The #line directives are optional.

@i introduces an instruction; the next line is a template
for it saying what the operands are.
@d introduces the description for people.
@c introduces the code. In it, various built-in @macros
are expanded.

One advantage of doing it this way is that by using
@step to update the PC I *cannot* get the offset wrong;
the preprocessor counted the operands and their sizes
for me. Similarly, what I write has *no* operand numbers;
the preprocessor counted those, and supplies all necessary
casts as well. I can shuffle operands around (in @i)
without revising the code (in @c), and have.
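
The macro expansion described above can be sketched in a few lines of Python. This is not the AWK preprocessor from the message, which is not shown in the thread; it is a minimal illustration under stated assumptions: every operand occupies one word after the opcode, operands named src, snd and dst are registers, every other operand is read directly as the word P[n], and the function name expand_instruction is made up for this example.

```python
import re

# Assumption: operands with these names are register numbers; any other
# operand is an immediate word in the instruction stream.
REGISTER_OPERANDS = {"src", "snd", "dst"}


def expand_instruction(template, code_lines):
    """Expand @operand, @fail and @step macros for one instruction."""
    name, _, operand_text = template.partition(" ")
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    # P[0] holds the opcode, so operand n lives in P[n].
    slots = {op: n for n, op in enumerate(operands, start=1)}
    width = 1 + len(operands)

    def expand_macro(match):
        macro = match.group(1)
        if macro == "step":
            # The step size is derived from the template, never typed by hand.
            return f"P += {width}"
        slot = slots[macro]  # KeyError flags a macro that is not an operand
        if macro in REGISTER_OPERANDS:
            return f"reg[(int)P[{slot}]]"
        return f"P[{slot}]"

    out = [f"case {name.upper()}:"]
    for line in code_lines:
        # @fail takes a string argument, so it is handled before the others.
        line = re.sub(r'@fail\s+("[^"]*")', r"P = failure, operation = \1", line)
        out.append(re.sub(r"@(\w+)", expand_macro, line))
    out.append("break;")
    return out


check_record = expand_instruction(
    "check_record src, size, const",
    [
        "T = @src;",
        'if (!is_tuple(T)) @fail "is_record"; else',
        'if (FIELD(T, TUP_TAG, 0) != @size) @fail "is_record"; else',
        'if (FIELD(T, TUP_TAG, 1) != @const) @fail "is_record"; else',
        "@step;",
    ],
)
print("\n".join(check_record))
```

Because the operand positions come from the template line alone, reordering the operands there changes every P[n] and the final step size together, which is the class of error the message says the preprocessor eliminated.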

It wouldn't be too hard to write another preprocessor that
built some kind of documentation (HTML would probably be
easiest) out of this, but since this was an experiment,
it didn't seem worth while.
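
As a rough idea of what such a documentation pass might look like, here is a hedged Python sketch. It assumes the master file contains only the @i, @d, @c and @e markers shown above, each alone on its line, and the function names parse_master and to_html are invented for this example; nothing like it exists in the experiment described.

```python
import html


def parse_master(text):
    """Split master-file text into one dict per instruction."""
    instructions = []
    current = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "@i":
            current = {"template": "", "description": [], "code": []}
            section = "template"
        elif current is None:
            continue  # text outside any instruction is ignored
        elif line == "@d":
            section = "description"
        elif line == "@c":
            section = "code"
        elif line == "@e":
            instructions.append(current)
            current = None
        elif section == "template":
            if line:
                current["template"] = line
        else:
            current[section].append(line)
    return instructions


def to_html(instructions):
    """Emit a heading, the operand template and the description for each."""
    parts = []
    for ins in instructions:
        name = ins["template"].split(" ", 1)[0]
        text = " ".join(line for line in ins["description"] if line)
        parts.append(f"<h2>{html.escape(name)}</h2>")
        parts.append(f"<pre>{html.escape(ins['template'])}</pre>")
        parts.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(parts)


MASTER = """\
@i
max src, snd, dst
@d
dst := max(src, snd)

This computes the maximum using the micro-Erlang term ordering.
@c
T = @src;
U = @snd;
@dst = cmp(T, >, U) ? T : U;
@step;
@e
"""

docs = parse_master(MASTER)
page = to_html(docs)
print(page)
```

The same parsed structure could feed both the C generator and the documentation, so the description and the code cannot drift into separate files.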

Why did I write the preprocessor?
Well, to be honest, the first draft didn't use one.
I got a bit sick of debugging, and wrote the preprocessor
(based on vague memories of Icon) to eliminate a class of
errors. It turned out to be _easier_ to develop a
documented emulator than an undocumented one.

Jeff Schultz

May 8, 2012, 8:21:08 PM

to Erlang

On Tue, May 08, 2012 at 11:52:19AM -0700, Jonathan Coveney wrote:

[plain, good sense elided]

> I think that more documentation around this would be good in many
> respects...

> 1. Documenting these sorts of things frames understanding for others, which
> can lead to more eyes on the implementation, which is always good
> 2. It'll make it easier for people to write custom VM's, which is only a
> good thing for Erlang. For the JVM, for example, this is a huge benefit and
> as Java the language dies, the ecosystem of budding JVM languages will no
> doubt go strong
> 3. It'll make it easier for people to compile things to BEAM, which only
> goes to show that Erlang's underbelly is general purpose and useful for
> building robust, fault-tolerant software

And to add one more:

4. Documenting, and publishing, your internals makes them
patent-system visible prior art. This protects you from some
Johnny-come-lately company, or individual, who patents something
you've been using for many years. Source code, even publicly
available source code like Erlang, is not generally valid prior art in
patent terms. Just because you've been doing, say, memory management,
one particular way for a decade, won't block a patent covering exactly
that.

And yes, it does happen.

Jeff Schultz

Thomas Lindgren

May 9, 2012, 2:03:27 PM

to Richard O'Keefe, erlang-q...@erlang.org

----- Original Message -----
> From: Richard O'Keefe <o...@cs.otago.ac.nz>
> To: Thomas Lindgren <thomasl...@yahoo.com>
> Cc: Michael Turner <michael.eu...@gmail.com>; "erlang-q...@erlang.org" <erlang-q...@erlang.org>
> Sent: Tuesday, May 8, 2012 3:07 AM
> Subject: Re: [erlang-questions] Is there a good source for documentation on BEAM?
>
>

> On 8/05/2012, at 7:15 AM, Thomas Lindgren wrote:
>> There has been a substantial number of non-BEAM Erlang implementations already, so I'm
>> not convinced detailed BEAM docs is the key property* to spread Erlang.
>
> And how many of those non-BEAM implementations still exist?
> Does GERL? Is E2S still maintained? How much of OTP can it handle?

This, to my mind, says more about the (lack of) need for a second source implementation than any inherent
problems with learning BEAM. If you want to try your hand, quite a bit of the complexity is not in handling BEAM
as such but in reimplementing ERTS: writing the BIFs, SMP, memory management, etc.

>> Indeed, requiring detailed docs of every change of BEAM seems likely to slow innovation down instead.
>
> I not only *don't* believe that, I *can't* believe that.
> Joe has informed us that there are TWO levels of BEAM,
> one of which has been very stable, and one of which has
> changed many times.
>
> I don't even believe your claim if made about the low level
> much changed "BEAM", but let's suppose it true for the sake
> of argument. If the high level of BEAM has remained pretty
> stable for quite a while, how would documenting it have
> slowed innovation down?

Note that BEAM files are not guaranteed to be compatible across releases, and they do change incompatibly
every now and then. (Not very often, to be sure. I recall it happening twice.) Check the mailing list for some discussions.

The "sub-BEAM" implementation can change more rapidly, of course. I assume implementors there can do
platform specific things like inline expanding instructions into native, mapping VM registers to native registers,
constructing superinstructions, etc. (I seem to recall all of these being tried at one time or another.)
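
To illustrate what constructing a superinstruction means, here is a toy Python sketch of a loader pass that fuses two adjacent generic instructions into one internal instruction. The opcode names, the instruction encoding and the fusion table are all illustrative assumptions, not taken from any BEAM release or from the real loader.

```python
# Hypothetical fusion table: (first opcode, second opcode) -> fused opcode.
FUSIONS = {("is_tuple", "test_arity"): "is_tuple_of_arity"}


def load(generic):
    """Translate generic instructions into a toy internal form.

    Each instruction is an (opcode, args) pair with args as a tuple.
    """
    internal = []
    i = 0
    while i < len(generic):
        op, args = generic[i]
        if i + 1 < len(generic):
            next_op, next_args = generic[i + 1]
            fused = FUSIONS.get((op, next_op))
            # Fuse only when both tests inspect the same register.
            if fused is not None and args[0] == next_args[0]:
                internal.append((fused, next_args))
                i += 2
                continue
        internal.append((op, args))
        i += 1
    return internal


generic_code = [
    ("is_tuple", ("x0",)),
    ("test_arity", ("x0", 2)),
    ("move", ("x1", "x0")),
]
internal_code = load(generic_code)
print(internal_code)
```

The generic form on disk stays the same while the fusion table can change from release to release, which is one way the internal form can evolve faster than the file format.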

As for what I see would cause a slowdown: the attention of the key hackers would be spent on writing this
documentation (and then maintaining it, I assume). Perhaps people will start depending on documented details
of implementation, explicitly or implicitly. Major changes would also mean major internal docs rewrites.

See below for one option.

> ... [pace of innovation, see below on kickstarter for my comment]

>>

>> If the motive is education, I think someone interested in compilers and virtual machine architectures
>> would have little trouble with BEAM as such.
>
> I have an interest in compilers and VMs. I worked professionally on Quintus
> Prolog and the real WAM (not the one in the papers or Aït-Kaci's book). And
> trying to figure out the BEAM was such a slog that to be honest, I said to
> myself "the hell with it, if they don't *WANT* me to understand the BEAM,
> I'm not going to waste any more of my time trying to penetrate the obscurity".

At that level of knowledge, I assume the BEAM instruction set in itself is no big hurdle.
If you want to learn the internals beyond that, what level of detail are you looking for?

>> In a real sense, BEAM is just a vehicle to express compiler optimizations
>> for a restricted part of ERTS (the sequential execution part, basically).
>
> No, compiler optimisations are expressed in the executable code of the
> compiler. BEAM lets you express the *results* of such optimisations,
> which is a different thing. It's just like the Quintus compiler: I could
> figure out in that case what the *results* were, but the actual process
> remained obscure. (More precisely, what the 'invariants' were.)

Here is how I see it: the BEAM instruction set was chosen for the purpose of expressing various optimizations, and has since been used to express them.
Consider a simple example: targeting BEAM vs JAM (a stack machine used previously to implement Erlang).
In order to optimize register use on JAM, you first have to translate the JAM code to a new intermediate language (and then probably never
try to translate it back to JAM), while BEAM (like its uncle WAM) expresses registers explicitly and so makes such optimizations straightforward.

> ...

> Yes, I'm shouting. "We don't need it" and "you don't need it" are utterly
> different propositions, and too many people in too many areas of life fail
> to realise that.

(To avoid any confusion, let me add that I last worked at Ericsson CSLAB in 1998. So I'm hardly an OTP insider.)

So perhaps the right approach is to do a kickstarter to fund someone writing a deep dive Erlang/OTP internals book?
Complexity: roughly the level of writing a Linux kernel book, at a quick guess. Perhaps a bit easier.

>> Another argument might be that BEAM should be specified in detail in order
>> to be a suitable binary format for distribution,
>> which is essentially what the JVM instruction set has become.
>
> I suggested many years ago that Erlang should take a leaf out of Kistler's
> book (or PhD thesis). The "Juice" system for Oberon compiled source files
> to abstract syntax trees, then cleverly compressed the ASTs and used them
> as the binary distribution form. They came in smaller than .class files
> and had no presuppositions about the target hardware (not even primitive
> size and alignment if I recall correctly). The cost of decompressing and
> generating native code was low, to the point where it was faster to
> dynamically load Juice files than their equivalent of .so/.dll files, and
> the generated code actually ran faster because the code generator knew
> more about the environment of the target, including existing code. (I
> don't know if the Juice runtime did cross-module inlining, but it would
> have been possible.)

Not a bad idea.

Best regards,
Thomas

Thomas Lindgren

May 9, 2012, 2:07:25 PM

to Jeff Schultz, Erlang

----- Original Message -----
> From: Jeff Schultz <j...@csse.unimelb.edu.au>
>...
> 4. Documenting, and publishing, your internals makes them
> patent-system visible prior art. This protects you from some
> Johnny-come-lately company, or individual, who patents something
> you've been using for many years. Source code, even publicly
> available source code like Erlang, is not generally valid prior art in
> patent terms. Just because you've been doing, say, memory management,
> one particular way for a decade, won't block a patent covering exactly
> that.
>
> And yes, it does happen.

Not to open a can of worms, but perhaps OTP should spend their time patenting their
technology instead in that case? I'm sure Ericsson AB would like that, since they then
may be able to crush their competitors into a finer pulp.

Best regards,
Thomas

Richard O'Keefe

May 10, 2012, 2:10:24 AM

to Thomas Lindgren, erlang-q...@erlang.org

(1) We were told that BEAM documentation isn't needed because
there are other Erlang implementations.
(2) I ask whether any of those other implementations ever kept
up with The Real Thing. (By the way, as far as I know,
none of them ever supported bit syntax, and my recent
attempt to install GERL failed miserably.)
(3) Suddenly we are told that the abandoning of those other
things just *proves* that we don't need BEAM documentation.

?

>
>
>>> Indeed, requiring detailed docs of every change of BEAM seems likely to
>>> slow innovation down instead.
>>

[Me]

>> I not only *don't* believe that, I *can't* believe that.
>> Joe has informed us that there are TWO levels of BEAM,
>> one of which has been very stable, and one of which has
>> changed many times.
>>
>> I don't even believe your claim if made about the low level
>> much changed "BEAM", but let's suppose it true for the sake
>> of argument. If the high level of BEAM has remained pretty
>> stable for quite a while, how would documenting it have
>> slowed innovation down?
>

[Thomas Lindgren]

> Note that BEAM files are not guaranteed to be compatible across releases, and they do change incompatibly
> every now and then. (Not very often, to be sure. I recall it happening twice.) Check the mailing list for some discussions.

If BEAM were completely stable, it might be reasonable to expect
anyone who cares to figure out BEAM for themselves, once and for
all. The more it changes, THE MORE IT NEEDS TO BE DOCUMENTED.

I repeat my claim about the fragmentary emulator I wrote:
the better the documentation was, the *FASTER* I could write
and rewrite it. Having the tables needed for the disassembler
(which exists) and the assembler (which doesn't yet, but will)
automatically generated from the same file that the emulator
switch is generated from means they are consistent *all the time*
without me having to check. Switching a numeric operand from
scaled (use @size) to unscaled (use @number) or back with the
instruction description where the preprocessor can see it means
that an incomplete edit (changing some occurrences but not
others) will be caught *before the C compiler is run* let alone
before run time.
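
A minimal sketch of that idea, in Python rather than the C of the emulator described above. The spec format, the three instructions, and the names SPEC, parse_spec and disassemble are all made up for illustration; this is not the actual preprocessor. One description file carries the opcode, the operand kinds and the documentation, and anything that decodes instructions reads the same table, so a misspelt operand kind is rejected when the table is built rather than at run time.

```python
WORD = 8  # assumed word size in bytes, used only to show "size" scaling

# Hypothetical instruction descriptions: opcode, name, operand kinds ; doc.
SPEC = """
1 move reg reg    ; copy the first register into the second
2 addi reg number ; add an unscaled integer to a register
3 alloc size      ; reserve this many words (scaled operand)
"""

# How each operand kind is rendered; also the list of kinds that exist.
KINDS = {
    "reg": lambda v: f"x{v}",
    "number": lambda v: str(v),
    "size": lambda v: f"{v * WORD} bytes",
}

def parse_spec(text):
    """Build {opcode: (name, kinds, doc)}, rejecting unknown operand kinds."""
    table = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        head, _, doc = line.partition(";")
        opcode, name, *kinds = head.split()
        for kind in kinds:
            if kind not in KINDS:
                raise ValueError(f"{name}: unknown operand kind {kind!r}")
        table[int(opcode)] = (name, tuple(kinds), doc.strip())
    return table

def disassemble(code, table):
    """Decode a flat list of integers using only what the table says."""
    out, pc = [], 0
    while pc < len(code):
        name, kinds, _doc = table[code[pc]]
        operands = code[pc + 1 : pc + 1 + len(kinds)]
        out.append(" ".join([name] + [KINDS[k](v) for k, v in zip(kinds, operands)]))
        pc += 1 + len(kinds)
    return out

TABLE = parse_spec(SPEC)
print(disassemble([1, 0, 1, 3, 2, 2, 1, 5], TABLE))
```

Changing `alloc size` to `alloc number` in SPEC changes how every consumer of TABLE treats that operand at once, which is the consistency property being claimed.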

>
> As for what I see would cause a slowdown: the attention of the key hackers would be spent on writing this
> documentation (and then maintaining it, I assume).

In my fragmentary emulator, there are roughly equal lines of
documentation and code, except that thanks to the preprocessor,
the lines of code are simpler and more often correct than they
would have been without. *This* hacker, at least, found it
took *less* time to write documentation+code than to just write
code.

It's not hugely detailed documentation: whether operands are
raw or tagged, whether a tag check is *intentionally* omitted,
whether some other instruction is expected to set up a context
in some register, the name of a related instruction where the
details can be found, sometimes what possibly surprising source
forms an instruction was meant for. It DOESN'T have to be huge
to be of benefit, and I would expect 'key hackers' to be doing it for
their *own* sake, never mind anyone else's.

I can understand the documentation being stripped out before
release; what I'm having trouble with is the idea of it never
having existed, and the idea that not having it makes life in
some unimaginable way easier for the developers.

> Perhaps people will start depending on documented details
> of implementation, explicitly or implicitly. Major changes would also mean major internal docs rewrites.

Well, yes, BUT a preprocessor that helps you get the original code
right will also make it easier to make major changes right.

>
> At that level of knowledge, I assume the BEAM instruction set in itself is no big hurdle.
> If you want to learn the internals beyond that, what level of detail are you looking for?

Getting a *rough* idea of the compiler's output is no big deal.
Understanding it well enough to generate correct code myself *is*.

For one example of the kind of understanding you can get from
documentation, I was puzzled because I couldn't see the instructions
I knew must be there to nil out X registers that ceased to be live.
Joe's little document made it clear that
(a) there are a lot more X registers allowed in Erlang than in
Quintus Prolog;
(b) maintaining them is more expensive than in Quintus Prolog;
(c) the nilling instructions I expected don't exist;
(d) there is a (temporary) space leak: if register K is live
at an allocation point, all registers <= K are assumed to
be live.
With hindsight, I can now see (most of) that in the .S files.
But it really wasn't obvious.

One important detail is the layout of Erlang stack frames.

Michael Turner

May 10, 2012, 2:22:38 AM

to Thomas Lindgren, erlang-q...@erlang.org

"As for what I see would cause a slowdown: the attention of the key
hackers would be spent on writing this
documentation (and then maintaining it, I assume)."

Perhaps better: volunteers could document it (on a relatively
controlled wiki, for example). Then the "key hackers" could mention
any needed corrections.

As for maintenance, you say yourself (in a later e-mail) that you can
only remember significant changes happening twice. Documenting such
infrequent changes doesn't exactly sound like some grinding daily
burden for already-overworked Ericsson programmers. If they have to
propose these changes in writing anyway (at least in internal e-mail),
sounds like most of the documentation work gets done before the
changes are made.

-michael turner

Tim Watson

May 10, 2012, 3:38:00 AM

to Thomas Lindgren, erlang-q...@erlang.org

> So perhaps the right approach is to do a kickstarter to fund someone writing a deep dive Erlang/OTP internals book?
> Complexity: roughly the level of writing a Linux kernel book, at a quick guess. Perhaps a bit easier.

That would earn a vital spot on every Erlang programmer's bookshelf for sure.

Joe Armstrong

May 10, 2012, 3:59:04 AM

to Michael Turner, erlang-q...@erlang.org

What a long discussion ...

I have a few comments.

I have started writing a 2nd edition of "Programming Erlang" - I have a
dumping ground for potential chapters in books that
one-day-some-time-if-i-get-time I might write. One of these is called "beam".

A question was asked about how the beam worked - so I thought it's a bit silly me
sitting on this - I'll post it off since it might help.

I don't actually know how the Beam works - I have a vague idea - but
fortunately I can just wander down the corridor and ask Björn who does
know how it works. I also have a strong aversion to reading code - I like
to know how stuff is supposed to work without reading the code to find out.

(I hate reading code - as soon as I read code - I get sidetracked by
wondering "why was it written this way" and often get a strong desire to
rewrite it - I once wondered how PDF worked and that was a complete
disaster - 3 months down the drain and ErlGuten was the result - and
all I really wanted to do was figure out why the kerning in an open office
slide was manifestly wrong. Any sensible person would have said "don't ask")

I know of only two people who have figured out how the beam works
*without* asking Björn: Fredrik Svan and Kresten Krab Thorup, and I am
deeply impressed that they managed to do this. I asked both Fredrik and
Kresten how they did this - they both said "I reverse engineered the code" -
they both had good reasons: Fredrik made a javascript Erlang in the browser
thing, and Kresten made Erjang.

Now there are two levels at which one could describe the Beam - level one
is the relationship between Erlang code and the beam instructions -
this is what I described. Actually running erlc -S and guesswork gets you
pretty far - this is what I did - I only had to play my "ask Björn card" a
few times (there's some stuff about marking which registers have to be
garbaged, which is not guessable).

At this level of abstraction we can completely ignore memory management, most of
garbage collection, how process stacks and heaps are organised, how multicores
are locked etc.

To describe the next level - we suddenly jump from a one chapter
description to an entire book. This is a book that is tricky to write - I
guess no one person knows all the answers. It's also a book that few (I
suspect) would read. Fredrik and Kresten didn't have to understand much of
the beam memory management. Fredrik used whatever GC and object
representation javascript uses and Kresten used whatever the JVM did - so it
wasn't really relevant.

Realistically the only thing that might get written is a piffed up version of
what I've distributed - but I would be reluctant to include it in the next
edition of Programming Erlang, I can use the space for content
with wider appeal.

I'll try and make a better version of what I've distributed and
put it up on the main web site... but this is a low priority task

Cheers

/Joe

Björn Gustavsson

May 10, 2012, 4:28:24 AM

to Richard O'Keefe, erlang-q...@erlang.org

On Thu, May 10, 2012 at 8:10 AM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
[...]

> Joe's little document made it clear that
> (a) there are a lot more X registers allowed in Erlang than in
> Quintus Prolog;
> (b) maintaining them is more expensive than in Quintus Prolog;
> (c) the nilling instructions I expected don't exist;
> (d) there is a (temporary) space leak: if register K is live
> at an allocation point, all registers <= K are assumed to
> be live.

There is not a temporary space leak in practice, because the compiler
will insert an instruction that will clear (set to NIL) each dead register
below K before any allocation point.

--
Björn Gustavsson, Erlang/OTP, Ericsson AB

Vlad Dumitrescu

May 10, 2012, 4:28:28 AM

to Joe Armstrong, erlang-q...@erlang.org

Hi,

On Thu, May 10, 2012 at 9:59 AM, Joe Armstrong <erl...@gmail.com> wrote:
> Now there are two levels at which one could describe the Beam - level one
> is the relationship between erlang code and the beam instructions -
> this is what I described.
>

> To describe the next level - we suddenly jump from a one chapter
> description to an entire book. This is a book that is tricky to write -
> I guess no one person knows all the answers.

This is not entirely on topic, but close enough (and probably only
Björn could know the answer; I have a feeling that reading the code
would require many years of study :-)): I have been wondering how
tightly the runtime is tied to the BEAM instruction set.

More precisely, would it be possible (at least conceptually) to
separate the scheduler + process manager + message passing
infrastructure from the instruction executor?

One common thing is the internal data format (because the runtime
creates for example exit messages) so there are restrictions imposed
by that, but it would be kind of cool to be able to plug in an
interpreter for a different instruction set, in case one wants to use
something else than Erlang and this other thing can't be compiled to
Erlang, Erlang core or BEAM.

best regards,
Vlad

Matthias Lang

May 10, 2012, 2:15:46 PM

to Jonathan Coveney, erlang-q...@erlang.org

On Monday, May 07, Jonathan Coveney wrote:

> This question seems to come up now and again, and it's surprising to me
> that a crucial part of the documentation isn't better documented.

Of historical interest only:

http://www.cs-lab.org/historical_beam_instruction_set.html

Then there's this partial beam emulator in Erlang which gives some insight:

https://github.com/tonyrog/beam

> always ask the guys at Erjang how they went about it,

Please do. AFAICT he's succeeded very well, so his opinion on how much
documentation would have helped would be worth hearing.

Matt

Erik Søe Sørensen

May 10, 2012, 4:05:12 PM

to Joe Armstrong, Erlang

I began writing about the instruction set and possibly binary format at some point, but got sidetracked by trying to explain "why isn't this documented, and is that a good or a bad thing?"

(I have a bit of text and graphics, though, which perhaps I ought to publish.)

One interesting thing came out of that effort, though: I noticed that

- there is no tail-call version of the call_fun (function object application) instruction

- even so, TCO still works in beam as it should

- the reason for this is that a tail-call version exists in the *internal* version of the instruction set, which is introduced by rewriting the sequence "call_fun; deallocate; return" (iirc)

- Erjang had, of course, missed this the first time around, so I corrected it once I'd found out.

I've been wondering whether this special case was caught in your description of beam...

Of course, what I should have been wondering is whether the JS interpreter had got it right :-)

That kind of irregularity is probably not unrelated to the unpublishedness of the format.

On the other hand, this is one gotcha that affects not only would-be producers of beam code (probably the harder part to get right, when you don't know which invariants you have to maintain in the produced code). It also affects beam consumers (in some ways the easier part, at least when you can ignore everything GC-related, as was the case with Erjang), which have to do the same kind of rewrite.
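
To make the rewrite concrete, here is a rough sketch in Python of the peephole step a consumer would need. It assumes the sequence really is "call_fun; deallocate; return" as recalled above (iirc). The tuple representation and the fused instruction name tail_call_fun are invented for the example; they are not the loader's actual internal names.

```python
def rewrite_tail_calls(instrs):
    """Fuse ('call_fun', arity), ('deallocate', n), ('return',) into one
    hypothetical ('tail_call_fun', arity, n) instruction.

    Any call_fun that is not immediately followed by deallocate and
    return is left alone, since it is not in tail position.
    """
    out, i = [], 0
    while i < len(instrs):
        window = instrs[i : i + 3]
        if (len(window) == 3
                and window[0][0] == "call_fun"
                and window[1][0] == "deallocate"
                and window[2] == ("return",)):
            out.append(("tail_call_fun", window[0][1], window[1][1]))
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out

body = [("move", "x1", "x0"), ("call_fun", 1), ("deallocate", 0), ("return",)]
print(rewrite_tail_calls(body))
```

A consumer that skips this step still runs correct programs, but every fun applied in tail position grows the stack, which is exactly the kind of bug that only shows up in long-running loops.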

2012/5/7 Joe Armstrong <erl...@gmail.com>

---------- Forwarded message ----------
From: Joe Armstrong <erl...@gmail.com>

Date: Mon, May 7, 2012 at 10:46 AM
Subject: Re: [erlang-questions] Is there a good source for
documentation on BEAM?

Richard O'Keefe

May 10, 2012, 6:37:32 PM

to Michael Turner, erlang-q...@erlang.org

On 10/05/2012, at 6:22 PM, Michael Turner wrote:

> "As for what I see would cause a slowdown: the attention of the key
> hackers would be spent on writing this
> documentation (and then maintaining it, I assume)."
>
> Perhaps better: volunteers could document it (on a relatively
> controlled wiki, for example). Then the "key hackers" could mention
> any needed corrections.

This is certainly possible, and it would be a lot better than nothing.
The problem is of course that it goes backwards:
volunteers can document what *is* there, not what was *meant* to
be there, and cannot document why certain things that *aren't*
there shouldn't be.

Peitho: Here you are, Socrates, your very own orrery.
Socrates: But Peitho, where is the description of how it works?
Peitho: When you've figured it out, why don't you write that?
Socrates: [censored]

Richard O'Keefe

May 10, 2012, 6:42:01 PM

to Björn Gustavsson, erlang-q...@erlang.org

On 10/05/2012, at 8:28 PM, Björn Gustavsson wrote:

> On Thu, May 10, 2012 at 8:10 AM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
> [...]
>> Joe's little document made it clear that
>> (a) there are a lot more X registers allowed in Erlang than in
>> Quintus Prolog;
>> (b) maintaining them is more expensive than in Quintus Prolog;
>> (c) the nilling instructions I expected don't exist;
>> (d) there is a (temporary) space leak: if register K is live
>> at an allocation point, all registers <= K are assumed to
>> be live.
>
> There is not a temporary space leak in practice, because the compiler
> will insert an instruction that will clear (set to NIL) each dead register
> below K before any allocation point.

And *THAT* is precisely the kind of BEAM documentation that
volunteers will not be able to reconstruct without a disproportionate
amount of effort:

This operand gives the number of the highest live
X register, so that if a garbage collection is needed,
the collector knows which registers to trace from.
If any X registers with smaller numbers are dead at
this point, the compiler MUST ensure that they contain
immediate values, by nilling them if necessary.
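
As a sketch of what that paragraph would let a code generator do, here is the rule in Python. It follows the wording above (the operand names the highest live X register) and does not claim to match the real operand encoding. The ("move", "nil", "xN") tuples are made-up notation for "set register N to NIL".

```python
def nil_dead_registers(live):
    """Given the set of X register numbers that are really live at an
    allocation point, return (k, fixups).

    k is the highest live register, i.e. the operand value.
    fixups are the instructions a compiler must emit first so that every
    dead register below k holds an immediate value.
    """
    if not live:
        return None, []
    k = max(live)
    fixups = [("move", "nil", f"x{r}") for r in range(k) if r not in live]
    return k, fixups

def gc_roots(xregs, k):
    """What a collector following this convention traces: x0 up to xk."""
    return xregs[: k + 1]

# x0 and x3 are live, so x1 and x2 must be nilled before allocating.
print(nil_dead_registers({0, 3}))
```

Without the fixups the collector would follow whatever stale pointers x1 and x2 happen to hold, which is the temporary space leak discussed earlier in the thread.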

Anthony Ramine

May 14, 2012, 9:02:17 AM

to Richard O'Keefe, Erlang

Couldn't some of the bootstrap Perl scripts like beam_makeops and make_tables be
rewritten and documented in Erlang? I think it would make things more obvious if
they were not obscure Perl scripts without comments. Furthermore it would make
Erlang/OTP eat more of its own dog food.

The only thing that would need to be changed with regard to the bootstrap itself
is that their output would have to be versioned just as the erts/preloaded/ BEAM
files. A new command should also be added to otp_build to update them.

Some of these generated files are:

beam_tr_funcs.h
beam_pred_funcs.h
beam_hot.h
beam_cold.h
beam_opcodes.c
beam_opcodes.h
beam_opcodes.erl
beam_opcodes.hrl
erl_am.c
erl_bif_table.c
erl_bif_wrap.c
erl_pbifs.c
erl_atom_table.h
erl_bif_table.h

There may be an obvious reason for them not to be generated by Erlang itself but
I'm not aware of it.

Regards.

--
Anthony Ramine

Bengt Kleberg

May 14, 2012, 9:08:12 AM

to Erlang

Greetings,

Perhaps there is a chicken-and-egg problem with requiring Erlang to
generate files used to build Erlang?

bengt

Anthony Ramine

May 14, 2012, 9:15:44 AM

to bengt....@ericsson.com, Erlang

There is already a chicken-and-egg problem, that's why there are some BEAM files
in the Git repository, look in erts/preloaded/ebin.

Nothing prevents us from generating these C files from some Erlang code and
versioning them in Git too. This way the system would be bootstrapped from the
previously generated files in the repository.

--
Anthony Ramine

Björn-Egil Dahlberg

May 14, 2012, 11:54:37 AM

to Anthony Ramine, Erlang

It could probably be done. That does not mean it should be done.

The preloaded files during development are a pain as it is. Please do not add to it =D

// Björn-Egil

2012/5/14 Anthony Ramine <n.o...@gmail.com>

Kenneth Lundin

May 21, 2012, 9:27:52 AM

to Tom Parker, erlang-q...@erlang.org

The reason for not having documentation of the BEAM instructions is just a matter of prioritization, nothing else.

Given a limited amount of resources we have not focused much on the internal design documentation, since we have
enough designers knowing this.

I don't think a documentation of these things is so far away now (but no promise).

I also want to comment regarding stability and compatibility of the .beam file format and thus instruction set.

We always keep backwards compatibility with 2 major releases. This means that for example an R15B based system
can load and run .beam files produced with an R13B based system. In practice this means 3-4 years of compatibility.

The other way around, i.e. loading .beam files produced with R15B on an older system, is not supported.
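
Stated as a rule, and only as an illustrative sketch in Python: releases are reduced to plain integers (13 for R13B, 15 for R15B), which is an assumption made for the example, and the function name is invented. It describes what is promised, not what a particular emulator will actually accept or reject.

```python
SUPPORTED_GAP = 2  # "2 major releases", as stated above

def load_is_supported(producer, loader):
    """True when a .beam file compiled on release `producer` is promised
    to load on release `loader`: same release, or up to two majors newer."""
    return 0 <= loader - producer <= SUPPORTED_GAP

print(load_is_supported(13, 15))  # R13B file on an R15B system
print(load_is_supported(15, 13))  # R15B file on an older system
```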

/Kenneth Erlang/OTP, Ericsson

Peer Stritzinger

May 23, 2012, 5:51:57 AM

to Kenneth Lundin, erlang-q...@erlang.org

The sparse documentation of Erlang internals can also be seen as an
opportunity. I'm not sure if this needs to be done by the OTP
team at Ericsson.

The first opportunity is for a 1-3 day tutorial about Erlang internals
during the Erlang conferences (probably as part of Erlang University).
I would book a training like this as soon as it is offered alongside a
conference I can attend (Are you listening, Erlang Solutions?)

The second opportunity is for a book author. Perfect would be a book
about Erlang internals like TCP/IP Illustrated Vol 2. The
Implementation (http://www.amazon.com/TCP-IP-Illustrated-Vol-Implementation/dp/020163354X)
where basically every line of code of the BSD TCP/IP implementation is
explained.

Probably a more realistic goal would be something along the lines of
the Design and Implementation of the *BSD Operating System e.g.:
(http://www.amazon.com/Implementation-Operating-paperback-Addison-Wesley-Systems/dp/0132317923,
http://www.amazon.com/Design-Implementation-FreeBSD-Operating-System/dp/0201702452)

I think the persons who are teaching the tutorial/writing the book
need not necessarily be members of the OTP team but should rather be
great teachers/writers. They would need good access to the people that
have the internals knowledge however.

With this we can have Erlang internals knowledge (I'm not only talking
about beam) more spread and the OTP team can concentrate on making
their great new releases.

Cheers
Peer Stritzinger

> enough designers knowing this.
>
> I don't think a documentation of these things is so far away now (but no
> promise).
>
> I also want to comment regarding stability and compatibility of the .beam
> file format and thus instruction set.
> We always keep backwards compatibility with 2 major releases. This means
> that for example an R15B based system
> can load and run .beam files produced with an R13B based system. In practice
> this means 3-4 years of compatibility,
> The other way around, i.e loading .beam files produced with R15B on an older
> system is not supported.
>
> /Kenneth Erlang/OTP, Ericsson
>
>
>

Thomas Lindgren

May 28, 2012, 11:25:54 AM

to Richard O'Keefe, erlang-q...@erlang.org

----- Original Message -----
> From: Richard O'Keefe <o...@cs.otago.ac.nz>
> To: Thomas Lindgren <thomasl...@yahoo.com>
> Cc: Michael Turner <michael.eu...@gmail.com>; "erlang-q...@erlang.org" <erlang-q...@erlang.org>
> Sent: Thursday, May 10, 2012 8:10 AM
> Subject: Re: [erlang-questions] Is there a good source for documentation on BEAM?
>

> (1) We were told that BEAM documentation isn't needed because
> there are other Erlang implementations.
> (2) I ask whether any of those other implementations ever kept
> up with The Real Thing. (By the way, as far as I know,
> none of them ever supported bit syntax, and my recent
> attempt to install GERL failed miserably.)
> (3) Suddenly we are told that the abandoning of those other
> things just *proves* that we don't need BEAM documentation.
> ?

It looks like this discussion has terminated, and I think I've made whatever points I wanted to be made so I'll leave it at that. But since this seemed to be unclear, let me recapitulate:

BEAM docs are not needed to produce a second source implementation, as shown by several examples. (1)

Also, there has so far been little practical interest shown by Erlang users in such a second source. So implementation efforts may be in vain. (3)

My personal view, at least, is that most of the difficulty in "keeping up with The Real Thing", Erlang/OTP, would be not in reproducing BEAM but in writing a fully compatible implementation tracking the rest of the runtime, ERTS. (2)

So, is there a _practical_ case for doing these docs? In particular, will the effort result in useful external contributions that outweigh time spent? Not at all clear to me.

Hope it helps.

Best regards,
Thomas

Richard O'Keefe

May 28, 2012, 6:08:45 PM

to Thomas Lindgren, erlang-q...@erlang.org

On 29/05/2012, at 3:25 AM, Thomas Lindgren wrote:
> BEAM docs are not needed to produce a second source implementation, as shown by several examples. (1)

People were not asking for BEAM documentation in order to produce
a second source implementation. That's a red herring. People
ask for BEAM documentation in order to understand and perhaps
revise the main implementation.

One thing that Open Source is about is about community contributions
and it is awfully hard to contribute to something that is
under-documented.

>
> Also, there has so far been little practical interest shown by Erlang users in such a second source. So implementation efforts may be in vain. (3)

Again, the red herring. This is *NOT* about second implementations.
All of the additional implementations I've ever heard of used their
own back end, *not* BEAM, and would *not* have received any great
benefit from BEAM being documented.

There is detailed documentation available for Lua and Icon and the
WAM and several other VMs around, so there's not even any great
advantage in learning about VMs from BEAM.

> My personal view, at least, is that most of the difficulty in "keeping up with The Real Thing", Erlang/OTP, would be not in reproducing BEAM but in writing a fully compatible implementation tracking the rest of the runtime, ERTS. (2)

Well, not _just_ that, but that's a big part of it.

> So, is there a _practical_ case for doing these docs? In particular, will the effort result in useful external contributions that outweigh time spent? Not at all clear to me.

Bearing in mind that "BEAM documentation" and "second system implementation"
are about as irrelevant to each other as any two topics about the same
language can be, what _might_ we realistically expect from BEAM documentation?

(1) There are now several languages implemented on top of Erlang.
Some are interpreted in Erlang, some are compiled to Erlang ASTs,
and some are compiled to BEAM. Compiling to the BEAM is *much*
harder than it needs to be because of the current undocumentation.

This isn't *replacing* Erlang/OTP but *augmenting* it.

(2) I've proposed a number of minor but handy extensions to Erlang
syntax that lack a reference implementation because I have no idea
what the compiler should generate for them; these could use the
existing BEAM.

(3) There are existing things that could be compiled more efficiently.
I'm thinking here in particular of list comprehension. I'm not
sure if the improved translation can be done with the existing
BEAM or if minor extensions would be needed, because the
undocumentation for the BEAM does not make clear any range
limits or other restrictions on BEAM instructions.

(4) There are even bigger changes, like frames, where even estimating
the scope of the changes is hard because of the undocumentation.

But above all you are making an assumption which I utterly reject,
namely that documentation is a COST and ONLY a cost, that producing
documentation provides no DIRECT benefits to the people writing the
documentation.

On the contrary,
- you may find defects as you document
- you may be able to structure the documentation so that
test cases can automatically be extracted
- you may in the very act of explaining a limitation see
how it can be overcome
- you may be able to extract parts of the implementation
automatically from the documentation
and I could go on.
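
The second point can be shown with an off-the-shelf tool. The sketch below uses Python's standard doctest module, purely as an illustration of the technique; it has nothing to do with how the BEAM is built. SCALE_DOC, scale and check_documentation are invented for the example. The documentation text contains worked examples, and the checker extracts and runs them.

```python
import doctest

# Documentation for a made-up helper, with examples written inline.
SCALE_DOC = """
scale(n, word) converts a scaled operand (a count of words) to bytes.

>>> scale(3, 8)
24
>>> scale(0, 8)
0
"""

def scale(n, word):
    return n * word

def check_documentation(doc, namespace):
    """Extract every example from doc, run it, return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(doc, dict(namespace), "doc", "<doc>", 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted

print(check_documentation(SCALE_DOC, {"scale": scale}))
```

If scale is later changed without updating SCALE_DOC, the first element of the result becomes non-zero, so the documentation cannot drift silently.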

There's a slogan I learned from a business textbook:
find the indispensable man and fire him!
There was something I found unsettling: we were told in this
thread that there wasn't any need for documentation because
the Erlang/OTP maintainers had enough people who _knew_ this
stuff.

One of my colleagues here was managing a software project
once; a key employee went on holiday and was murdered in a
far country. A former student of mine was starting up a
new company with me as a consultant, and being a worse
driver than he thought, drove at speed into a tree. The
tree survived. One of my former colleagues, a very
intelligent and likable guy, was cycling down a Melbourne
street and got knocked over by a hit-and-run driver. The
result was head injury and someone who could dress himself
but couldn't program if his life depended on it. When I
was at Quintus, the founder who wrote the compiler and was
the only person who really understood it quit and was never
heard from again.

Fergus O'Brien supervised an MSc on "Organisational
Forgetting" and the fact of organisational forgetting has
haunted the fringes of my mind ever since.

People die. People get head injuries. People quit.
Documents get lost. (If you ever find an architecture-
of-the-ICL-2900 manual, I'd like to see it.) Even if
things _are_ written down, organisations forget _where_.
If they aren't written down, they WILL be forgotten
sooner or later.

People
Well, my colleague Andrew Trotman once had a key employee

We

Joe Armstrong

May 30, 2012, 4:45:10 AM

to Richard O'Keefe, erlang-q...@erlang.org

Brilliant! and well argued. Thank you Richard. This last argument is
excellent brain-food.

<aside> I like the way many Erlang discussion threads turn into
meta-discussions about the underlying problems .. great stuff </aside>

So "stuff" should be well documented precisely because one day the creator
of the stuff will get bored or die and the encompassing organisation
will collectively
forget how it works.

I've noticed the "some other guy knows this stuff" phenomenon a lot -
I've been chasing a particular
problem for months. I always get the "I don't know but X knows" - I
ask X and they refer to X1 etc.
But X1 does not know ... I'm running out of leads. It appears that
*nobody* knows.

Fortunately the BEAM is not yet there. I know who knows - and when I
ask them they *do* know
so I just hope the entire OTP group aren't on the same plane to an
Erlang conference in a far away
land ...

I conclude it should be documented.

The next question is "priority" ... in the real world we juggle
priorities. The case *is* made that
we should document the BEAM but is this more or less important than
implementing frames etc?
As always, a tricky question...

/Joe

>
> People
> Well, my colleague Andrew Trotman once had a key employee

and ... ??

James Hague

May 30, 2012, 10:04:41 AM

to erlang-questions

When there was some documentation long ago, the bulk of it was about
the dozens and dozens of BEAM opcodes, and that's not actually the
important part.

What *really* matters are all the undocumented assumptions in the
system. What are the rules about floating point registers? Which BIFs
can cause garbage collection and how are they treated differently than
non-GC BIFs? When should registers be marked as unused? When are
destructive tuple updates allowed? What inefficient-looking sequences
are improved by the BEAM loader?

I can learn BEAM by disassembling code (anyone who is interested can
use http://prog21.dadgum.com/127.html as a starting point), but I
can't learn the underlying rules and philosophies of the VM.

BEAM documentation can be as simple as:
* One-line-per-instruction description of opcodes.
* Couple of pages of VM architecture and rules.
* Two HOWTO docs: adding a BIF and adding a VM instruction.
* List of transformations done by the loader.
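
A minimal sketch of the first item, assuming a hypothetical plain-text
table format of "NUMBER NAME/ARITY DESCRIPTION" per line. The format, the
helper names parse_spec and describe, and the wording of the descriptions
are assumptions made for illustration; the three entries should be checked
against the OTP sources before anyone relies on them. The point is only
that a one-line-per-instruction table can serve as both readable
documentation and machine-readable data.

```python
# Hypothetical one-line-per-instruction table. The entries below are
# illustrative only: numbers, arities and descriptions must be verified
# against the OTP sources, not taken from this sketch.
SPEC = """\
1 label/1          Mark a jump target local to the module.
2 func_info/3      Record module, function and arity at a function's entry.
3 int_code_end/0   Mark the end of the code.
"""


def parse_spec(text):
    """Parse 'NUMBER NAME/ARITY DESCRIPTION' lines into a dict keyed by opcode number."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        number, signature, description = line.split(None, 2)
        name, arity = signature.split("/")
        table[int(number)] = {"name": name, "arity": int(arity), "doc": description}
    return table


def describe(table, opcode):
    """Return the one-line description for an opcode, or flag it as undocumented."""
    entry = table.get(opcode)
    if entry is None:
        return f"opcode {opcode}: undocumented"
    return f"{entry['name']}/{entry['arity']}: {entry['doc']}"


OPCODES = parse_spec(SPEC)
print(describe(OPCODES, 2))
print(describe(OPCODES, 99))
```

A lookup that reports "undocumented" for a missing entry makes gaps in
the table visible instead of silent, which is the failure this thread is
worried about.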

james

May 30, 2012, 5:39:22 PM

to Joe Armstrong, erlang-q...@erlang.org

> If they aren't written down, they WILL be forgotten
> sooner or later.

I would also like to suggest that the act of writing down a design
forces a degree of rigour that is hard to achieve otherwise. I have
several times thought that I had considered a design from 'every angle',
only to find that when I *write it down*, some of the angles were
imperial and some were metric, and some were just smoke and mirrors.

When something is written down, the handwaving stops and the engineering
starts.

OvermindDL1

May 30, 2012, 7:01:03 PM

to ja...@mansionfamily.plus.com, erlang-q...@erlang.org

On Wed, May 30, 2012 at 3:39 PM, james <ja...@mansionfamily.plus.com> wrote:
> I would also like to suggest that the act of writing down a design forces a
> degree of rigour that is hard to achieve otherwise. I have several times
> thought that I had considered a design from 'every angle', only to find that
> when I *write it down*, some of the angles were imperial and some were
> metric, and some were just smoke and mirrors.
>
> When something is written down, the handwaving stops and the engineering
> starts.

I *completely* agree with this. At my work we *have* to write down
our systems, the APIs, interfaces, backend, everything, as we code
it. It really does help to solidify how the system should work, and it
helps us to consider angles that we would not have considered before.
Even if we never use the documentation again, the fact that we are forced
to write it in *detail* and keep it up to date really is a
great boon to the quality of code. If you do not find problems in
your code when writing documentation, then likely your documentation
is not detailed enough.

Stu Bailey

May 30, 2012, 7:42:22 PM

to Joe Armstrong, erlang-q...@erlang.org

Excellent! This should be codified into Armstrong's Law of Technology
Obfuscation!

On Mon, May 7, 2012 at 4:47 AM, Joe Armstrong <erl...@gmail.com> wrote:
> I think it works like this:
>
> 1) first you don't understand how the X works (X=Beam, JVM, X11,
> ... you name it)
> 2) You struggle - and think - google and have a hot bath
> 3) Eureka - bath flows over
> 4) Now you can understand it - and you can also remember why you
> could not understand it
> 5) Now it's easy you understand it
> 6) You see no reason to document it since it's obvious
>
> Round about 4) there is a small window of opportunity to explain to
> other people how it works.
> Once you get to 6) it's very difficult to remember what it felt like
> at point 2) and consequently difficult
> to write decent documentation.
>
> /Joe
>
>
> On Mon, May 7, 2012 at 11:27 AM, Michael Turner
> <michael.eu...@gmail.com> wrote:
>> "Actually, I don't think such docs are all _that_ crucial -- who
>> really needs to know, except a small number of VM implementors?"
>>
>> Aren't Erlang's chances of greater mindshare improved by making it
>> easier to become a VM implementor? I doubt very much that Java would
>> be where it is today had it not been for clear VM specification.
>> That's not to say that Erlang should follow in all of Java's
>> footsteps, even if it could. But I have to say I was a boggled to
>> learn that you can't find out what the VM opcodes mean without reading
>> the source (and maybe not even then, if the source contains bugs
>> vis-a-vis some idealized machine model.)
>>
>> -michael turner
>>
>>
>>
>> On Mon, May 7, 2012 at 5:46 PM, Thomas Lindgren
>> <thomasl...@yahoo.com> wrote:
>>>
>>>
>>>
>>>>________________________________
>>>> From: Jonathan Coveney <jcov...@gmail.com>
>>>>To: erlang-q...@erlang.org
>>>>Sent: Monday, May 7, 2012 8:39 AM
>>>>Subject: [erlang-questions] Is there a good source for documentation on BEAM?
>>>>
>>>>
>>>>This question seems to come up now and again, and it's surprising to me that a crucial part of the documentation isn't better documented. Is there a reason that it is the case? Is the reason that there is no VM spec to give the devs the flexibility to change the intermediate layer without having to worry about backwards compatibility to the degree that Java does?
>>>
>>>
>>> Actually, I don't think such docs are all _that_ crucial -- who really needs to know, except a small number of VM implementors? (And they should read the source to get at all the goodies.) But perhaps someone on the list might be moved to do a tutorial presentation on an Erlang Factory or something?
>>>
>>> (By the way, I too assume not doing it is to avoid getting bogged down into minutiae.)
>>>
>>> If you want to learn more about some of the intellectual roots, try these:
>>> http://wambook.sourceforge.net/
>>>
>>> http://dl.acm.org/citation.cfm?id=188051
>>>
>>>
>>> Best regards,
>>> Thomas

Joe Armstrong

Jun 1, 2012, 3:03:02 AM

to ja...@mansionfamily.plus.com, erlang-q...@erlang.org

On Wed, May 30, 2012 at 11:39 PM, james <ja...@mansionfamily.plus.com> wrote:
>> If they aren't written down, they WILL be forgotten
>> sooner or later.
>
>
> I would also like to suggest that the act of writing down a design forces a
> degree of rigour that is hard to achieve otherwise. I have several times
> thought that I had considered a design from 'every angle', only to find that
> when I *write it down*, some of the angles were imperial and some were
> metric, and some were just smoke and mirrors.

I agree 100% - when I get *really stuck* - I write out in clear
English what I want my program to do.

I often start programming without a clear idea of what my program is to do - I
"think" I have a clear idea but as the program evolves I find that the
original idea was
unclear. Writing down in another language (ie English, instead of
Erlang or C or whatever) forces
a cognitive shift in my brain - I suspect that actually *different*
parts of the brain are involved.
I can also feel the ideas moving around in my head - feel is too
strong a word here.

There is also another strange phenomenon - until a program is
completed, the problem
is not really solved. But as soon as the program is completed in its
entirety, a strange thing
happens - then, and not sooner, I realise that a better solution was possible.

Why is this? - I suspect a different part of the brain is involved.
Some part of the brain
says "move on problem completed" and as soon as that happens the "move
on" part of the brain
realises that the solution you have just arrived at was flawed - so
throw it all away and start again.

Knuth, he of the wise ways, says this rewriting process should be
repeated seven times.

In organisations this causes problems. Just about when the system is
to be delivered
you fix the final bug and then realize that it was
all wrong and needs a total rewrite, and you start the rewrite the day
before delivery ...
I have never met a project manager on the planet who understands this
(Project managers are from Venus,
Programmers are from Mars)

I also believe in working in a distraction-free environment (no phone,
email, twitters etc) you have to
listen very carefully to catch the small fleeting thoughts in your
brain. When I solve sudoku I
get instant flashes where I see where numbers are to be placed - but
these are fleeting and easy to miss.
I suspect my right-brain instantly sees a solution and tries to tell
my left-brain, but since I wasn't listening I missed it.

Programming is actually a form of "applied thinking" where you can
actually test the results of the
thought process. Since all the real work takes place inside the brain,
a brain-friendly environment is
essential. I once worked in an open-plan office, after a while I
noticed my (programmer) productivity
dropped to zero and that all my programs were written at home.

I could wax on but I'm supposed to be writing a book, but have been
distracted by the Erlang mailing
list ...

Cheers

/Joe

Michael Turner

Jun 1, 2012, 3:31:13 AM

to Joe Armstrong, erlang-q...@erlang.org

Joe writes: "... you fix the final bug and then realize that it was
all wrong and needs a total rewrite, and you start the rewrite the day
before delivery ...."

A friend of mine, former coder who went on to manage coders, once met
me for a beer and moaned that he'd spent the day "fighting a sudden
outbreak of Truth and Beauty." ;-)

This is actually an advantage of generating as much of your
implementation as possible from your documentation (as ROK suggests
above): it gets you more of a Separation of Concerns between where you
want your Truth (the spec) and where you need the Beauty (the code.)

With this point in mind, I started

http://beam.referata.com/wiki/Main_Page

If Semantic Mediawiki isn't powerful enough to generate at least the
*makings* of a VM implementation from a rigorous (but still reasonably
human-readable) spec, I don't know what is.

The above main page is, of course, wrong. In almost every possible
way. But anybody can help out in fixing it ....

-michael turner

Robert Virding

Jun 1, 2012, 12:34:58 PM

to Stu Bailey, erlang-q...@erlang.org

Or to make it more concise:

When you don't understand something you can't write the documentation, while when you understand you see no need to write the documentation. So you never write the documentation.

Robert

Fred Hebert

Jun 1, 2012, 5:25:52 PM

to Robert Virding, Erlang

That depends. As a personal rule, I see 3 general levels of documentation:

1. Beginner guide/tutorial
2. Reference Manual
3. Architecture

1. The first type is the one that scratches the surface, gives a few usage examples, and generally helps newcomers and users.

It explains why you'd use the library/module/thing, and directs in how to do so. It holds the user's hands so they can quickly solve whatever problem they have.

The raw version of this is either reading test cases or Open Source code that uses the product.

2. The reference manual is a listing made to help those who already know the item. It details everything but gives few explanations. It can be used to deepen one's knowledge, but won't do any hand holding, or very little.

EDocs and the general javadoc-style stuff fit this well.

The raw version of it is 'read the source, Luke'. It works for experienced users, and is of nearly no use to others.

3. Architecture is the doc that explains why the app was built the way it is, a view of how it works from 10,000 feet. It shows how to understand the app were you to dive into its source.

This kind of doc explains the rationale of some choices made, and is especially helpful to developers or contributors to your code, so that they do not undo future plans, respect trade-off decisions you made, and know where to dive in to extend things without causing headaches to anyone.

I know of no raw version of this; it's experience and in-depth knowledge of the product's life. Writing this doc is often vital to raise the quality and relevance of contributions received.

----

In my opinion, you don't have great documentation until you have covered all 3 aspects. You can have poor, decent, or good docs without all of them, but greatness requires covering all bases.

A self-proclaimed complete book, in-depth tutorial, or course, should touch all 3, for example.

Understanding your code well and making it self-explanatory sadly rarely helps with more than one out of 3 points.

On Jun 1, 2012 12:35 PM, "Robert Virding" <robert....@erlang-solutions.com> wrote:
