[erlang-questions] Trace-Driven Development

Henning Diedrich

Jun 2, 2012, 3:02:10 PM
to Erlang Questions
If there is any interest, I would like to discuss tracing and whether it may be more valuable for the actor model than unit tests.

A couple of days ago, during Jesper Louis Andersen's very enlightening debugging and tracing tutorial at Ericsson, Joe Armstrong asked whether showing visual traces first may be a good approach to teaching Erlang, book-side.

I would like to propose again that trace-driven development may possibly be the way to teach andalso program Erlang.

In the instance, we were talking about visual traces, like these:

http://www.erlang.org/doc/apps/et/coffee_order.png

I mean it in the slightly unrealistic way that most people probably appreciate test-driven programming as a great idea but seldom execute it as dogmatically as preached.

But I realize that what I actually do when programming a new app, or when looking for truly nasty bugs, is almost always pretty 'trace-driven'. Writing traces into the program has become a routine step for me when coding moderately complicated stuff that uses supervisors, monitors and multiple processes that come to life at various times throughout the lifetime of an Erlang (OTP) application, and most of all during start-up. And I have always found surprises about which process comes to life when. I think Jesper related a similar story of how they found tons of bugs that no one had realized were there when they started using tracing (I am not sure any more).

I believe that the main aggravating factor is that you don't construct an OTP application from scratch that often. You'll have to look things up again every time. My guess is that this is true for the vast majority of Erlang programmers. Of course, for the ultimate expert, it may be hard to grasp what I am even talking about because it's all so 'obvious'.

But what I really find myself currently doing, before writing tests, is writing traces. I also haven't found out how to plan and write meaningful stateful tests up front, before I have even architected the application.

My concrete order is: write first, get it to compile, put in traces, try to run, debug.

By now I am so sure that I will need the traces that I no longer wait until I am lost searching for a bug, not even knowing what happened before the crash, which parts of the program may have silently died, and which parts may have been waiting for which other part.

These things will be obvious for expert Erlang programmers who have set up applications time and again. But few people using Erlang for productivity reasons will ever come into that position. And for finding bugs: that's exactly the moment when reality doesn't follow your fantasy and a reality check in the form of a trace is one of the first tools to turn to. Is it not?

So trace-driven development sounds like it could be a useful recommendation. And if it is, a new best practice of waterfalling your application into existence could maybe emerge, similar to what Joe describes in his book (if I remember well) how he almost always starts out.

I learned a lot in Jesper's tutorial. For example that because Erlang and OTP keep growing, there are now three distinct mechanisms, with respective modules and function calls, to trace and debug. And despite me using traces a lot, I hadn't used any of them, for the wrong reasons it turned out. (And for productivity, a simple io:format can beat getting tied up into using more powerful but less simple minded options.)

So I am asking this out of honest ignorance:

Instead of tracing, is there a way to write meaningful, concise tests that are meant for, or at least good for, checking the sequence of process communications? In other words, are there better alternatives to following traces with the naked eye? I do not mean a heavy construct, or an also-possible use, but a test package made for that and/or a useful practice that someone is in fact applying.

And it need not be visual traces like the image linked to above. A ping-pong output in the terminal may do (if not skewed by delays, re-sequencing or contention in io:format).

For trace-driven development, what concrete procedure, with which concrete calls and parameters to erl, can be a best practice for starting out? I am sure it's 'obvious' for many, but at least for me it is not. What could be the hello world case for a trace-driven approach?
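
One possible hello world, as a minimal sketch only: it uses the dbg module from OTP's runtime_tools, and the module name trace_hello and the ping/pong messages are made up for illustration. It starts the default tracer, traces the messages of one process, and exchanges a single ping-pong.

```erlang
-module(trace_hello).
-export([run/0]).

%% Hypothetical hello world for tracing: one ping-pong, traced with dbg.
run() ->
    dbg:tracer(),                     % default tracer, prints trace events
    Pong = spawn(fun pong/0),
    {ok, _} = dbg:p(Pong, [m]),       % m = messages sent and received by Pong
    Pong ! {ping, self()},
    Result = receive pong -> ok after 1000 -> timeout end,
    timer:sleep(100),                 % give the tracer a moment to print
    dbg:stop(),
    Result.

%% The traced process: answers a single ping and exits.
pong() ->
    receive
        {ping, From} -> From ! pong
    end.
```

From the shell, c(trace_hello) followed by trace_hello:run() should print one line for Pong receiving the ping and one for Pong sending the pong; the exact formatting depends on the OTP release.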

And finally, how do you approach setting up a new app, finding a bug -- how much tracing does everyone use in these things? And is anyone also using io:format instead of the OTP goodness, if so, for a good reason?

Best,
Henning



Michael Turner

Jun 4, 2012, 12:52:37 AM
to Henning Diedrich, Erlang Questions
I'm a little embarrassed to be doing a "reply all" on this message,
because I'm (still) such a stop-start Erlang newbie. What compensates
for the mortification, however, is passages like the following, which
suggest I'm hardly the only one who should be embarrassed:

"Erlang tracing is a seething pile of pain that involves reasonably
complex knowledge of clever ports, tracing return formats, and
specialized tracing MatchSpecs (which are really their own special
kind of hell). The tracing mechanism is very powerful indeed, but it
can be hard to grasp."

Obviously, that kind of statement has no place in the official
documentation of a professional product. Oh, except that's precisely
where I found it:

http://www.erlang.org/doc/apps/et/et_intro.html#id62156

Now, if the formal documentation hosted by Ericsson seems determined
to frighten me away, I'll oblige Ericsson and run off. That "hell"
paragraph immediately followed two paragraphs that I couldn't make
much sense of, so it pretty much validated my confusion and dismay at
that point, while warning me that it was only going to get worse.

[I wrote 5 more lines of rant here, then deleted them in the interests
of being diplomatic.]

I started using seq_trace. It has its own documentation problems, of
course. For example: since seq_trace is an implementation of Lamport
clocks, you should *say* somewhere (like, in the first paragraph,
maybe?) that it's an implementation of Lamport clocks. Don't make it
sound like your own invention. That's dishonest. And don't make people
infer it. That's wasting people's time. Just say it. It only takes a
few seconds to type "seq_trace implements Lamport clocks."
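
For readers who have not met the term, the textbook Lamport clock rule fits in a few lines. This is a sketch of the rule as it is usually described in the literature, not of seq_trace's actual implementation; the module and function names are illustrative.

```erlang
-module(lamport).
-export([tick/1, stamp/1, merge/2]).

%% A clock is a non-negative integer kept by each process.

%% Local event: advance the clock by one.
tick(Clock) -> Clock + 1.

%% Send event: advance the clock and use the new value as the
%% timestamp carried by the outgoing message.
%% Returns {NewClock, Timestamp}.
stamp(Clock) ->
    New = tick(Clock),
    {New, New}.

%% Receive event: move past both the local clock and the
%% timestamp found on the incoming message.
merge(Clock, Timestamp) -> max(Clock, Timestamp) + 1.
```

With these rules, if one event causally precedes another, its timestamp is smaller; the converse does not hold, which is why the timestamps describe only a partial ordering.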

Still, I found seq_trace relatively simple and usable, and I'm now
doing unit testing on a module as I develop it further, based on
collecting and filtering seq_trace results with a small amount of code
I wrote myself. At some point, I expect to finally put the horse in
front of the cart and do *trace*-driven development. But before then,
I should make a decision: do I keep building ever more sophisticated
match filtering on top of seq_trace, undoubtedly reinventing wheel
after wheel, or do I bite the bullet and plunge into what Jayson
Vantuyl describes as "hell"?

It's a discouraging choice.

-michael turner
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Michael Truog

Jun 4, 2012, 1:21:54 AM
to Michael Turner, Erlang Questions
I think your link is slightly wrong:
http://www.erlang.org/doc/apps/et/et_tutorial.html#id58003

Ulf Wiger

Jun 4, 2012, 2:52:46 AM
to Michael Turner, Erlang Questions
On 4 Jun 2012, at 06:52, Michael Turner <michael.eu...@gmail.com> wrote:

> I'm a little embarrassed to be doing a "reply all" on this message,
> because I'm (still) such a stop-start Erlang newbie. What compensates
> for the mortification, however, is passages like the following, which
> suggest I'm hardly the only one who should be embarrassed:
>
> "Erlang tracing is a seething pile of pain that involves reasonably
> complex knowledge of clever ports, tracing return formats, and
> specialized tracing MatchSpecs (which are really their own special
> kind of hell). The tracing mechanism is very powerful indeed, but it
> can be hard to grasp."
>
> Obviously, that kind of statement has no place in the official
> documentation of a professional product.

You are absolutely right about that.

> I started using seq_trace. It has its own documentation problems, of
> course. For example: since seq_trace is an implementation of Lamport
> clocks, you should *say* somewhere (like, in the first paragraph,
> maybe?) that it's an implementation of Lamport clocks. Don't make it
> sound like your own invention.

Actually, I believe it is more correct to say that it's based on the "forlopp trace" that exists in Ericsson's AXE switches (using the proprietary language PLEX). Whether forlopp trace was based on Lamport clocks, I couldn't say. I don't know when it was introduced in the AXE (Lamport published his paper in 1978, the same year that the first digital AXE was taken into service), or to what extent it was informed by Lamport's work.

Anyway, not noting in the docs that sequence trace mimics AXE's forlopp trace seems forgivable, since very few people would know what that is. :)

See e.g. http://es.scribd.com/mobile/doc/83784121, starting at page 109.

BR,
Ulf W

Joe Armstrong

Jun 4, 2012, 3:49:01 AM
to Henning Diedrich, Erlang Questions
On Sat, Jun 2, 2012 at 9:02 PM, Henning Diedrich <hd2...@eonblast.com> wrote:
> Should there be any interest I would like to discuss tracing and if it may
> be more valuable for the actor model than unit tests.
>
> A couple of days ago, during Jesper Louis Andersen's very enlightening
> debugging and tracing tutorial at Ericsson, Joe Armstrong asked whether
> showing visual traces first may be a good approach to teaching Erlang,
> book-side.
>
> I would like to propose again that trace-driven development may possibly be
> the way to teach andalso program Erlang.
>
> In the instance, we were talking about visual traces, like these:
>
> http://www.erlang.org/doc/apps/et/coffee_order.png

These are great.

If I have a complex protocol to understand I often draw these (on
paper) but then throw them away after the code has been written (the
throwing them away bit is wrong, I know).
The problem is that "published code" that ends up in books and git
repositories does not contain all the design notes used to create the
code, nor is there a clue as to how the code was created.

(I'm thinking about ways to bundle "research" and code together so
that *both* get published)

Message sequence charts (MSCs) are parts of SDL and UML and there are many tools
(which I don't use) to make them - a quick google turned up this
http://en.wikipedia.org/wiki/MscGen which is food for thought.

Using the trace BIFs it should be easy (famous last words) to make a
checker that checks if the observed messages correspond to allowed
flows in the MSC.

This strikes me as an abstraction-lift over unit tests ...

>
> I mean it in the slightly unrealistic way that most people probably
> appreciate test-driven programming as a great idea but seldom execute it as
> dogmatically as preached.
>
> But I realize that what I actually do when programming a new app, or when
> looking for truly nasty bugs, is almost always pretty 'trace-driven'.
> Writing traces into the program has become a routine step for me when coding
> medium complicated stuff that uses supervisors, monitors and multiple
> process that come to live at various times throughout the lifetime of an
> Erlang (OTP) application -- and most of all, during the start up. And I have
> always found surprises, which process comes to live when. I think Jesper
> related a similar story how they found tons of bugs that no-one had realized
> were there when using tracing (not sure any more.)
>
> I believe that the main aggravating factor is that you don't construct an
> OTP application from scratch that often. You'll have to look things up again
> every next time. My guess is that this is true for the vast majority of
> Erlang programmers. Of course, for the ultimate expert, it may be hard to
> grasp what I am even talking about because it's all so 'obvious'.
>

I for one have to look things up a lot of the time - the difference is
that I know exactly where to look.


> But what I really find myself currently doing, before writing tests, is
> writing traces. I also haven't found out how to plan out and write
> meaningful stateful tests up front, before I even made it to architect the
> application.

That's what I'd do - I'd write the traces first.

Possibly http://www.mcternan.me.uk/mscgen/ could be integrated with
this workflow.

I'm not sure about testing the state. I just want to see what is in the
messages. Good question. You'd have to make a mini-language to describe
message sequences, something like TCL expect:

[{send,client,server,hello},      %% client sends a hello message to server
 {'receive',client,server,ack}    %% client receives an ack from server
]                                 %% ('receive' is quoted: reserved word)

For such a spec one could generate a PNG and test code :-)
(There is a library for making PNGs well hidden in the erlang distribution
code:which(egd) will find it for you)
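
A minimal sketch of the checker idea, under stated assumptions: it uses the erlang:trace/3 BIF with only the send flag, so only {send, From, To, Msg} entries of a spec are checked; the module name msc_check, the demo processes, and the reduction of each message to its tag are all invented for illustration. Order is compared per sender only, because trace messages from different processes are not guaranteed to reach the tracer in a global order.

```erlang
-module(msc_check).
-export([demo/0, check/2]).

%% check(Spec, Observed) -> ok | {mismatch, [{Sender, Expected, Got}]}
%% Both arguments are lists of {send, From, To, Msg} tuples.
check(Spec, Observed) ->
    Senders = lists:usort([F || {send, F, _, _} <- Spec ++ Observed]),
    Bad = [{F, by(F, Spec), by(F, Observed)}
           || F <- Senders, by(F, Spec) =/= by(F, Observed)],
    case Bad of
        [] -> ok;
        _  -> {mismatch, Bad}
    end.

%% All send events of one sender, in their original order.
by(From, Events) ->
    [E || {send, F, _, _} = E <- Events, F =:= From].

%% Hypothetical client/server pair, checked against a two-step spec.
demo() ->
    Server = spawn(fun() -> receive {hello, From} -> From ! ack end end),
    Client = spawn(fun() ->
                       receive go -> ok end,
                       Server ! {hello, self()},
                       receive ack -> ok end
                   end),
    Names = [{Client, client}, {Server, server}],
    %% The calling process becomes the tracer and receives
    %% {trace, Pid, send, Msg, To} for each message a tracee sends.
    lists:foreach(fun({P, _}) -> erlang:trace(P, true, [send]) end, Names),
    Client ! go,
    Observed = collect(Names, []),
    Spec = [{send, client, server, hello},
            {send, server, client, ack}],
    check(Spec, Observed).

%% Gather trace messages until none has arrived for 200 ms.
collect(Names, Acc) ->
    receive
        {trace, From, send, Msg, To} ->
            Event = {send, name(From, Names), name(To, Names), tag(Msg)},
            collect(Names, [Event | Acc])
    after 200 ->
        lists:reverse(Acc)
    end.

%% Map a pid to its symbolic name; unknown pids are kept as they are.
name(Pid, Names) -> proplists:get_value(Pid, Names, Pid).

%% Reduce {hello, Pid} to hello so it can be compared with the spec.
tag(Msg) when is_tuple(Msg), tuple_size(Msg) > 0 -> element(1, Msg);
tag(Msg) -> Msg.
```

Calling msc_check:demo() returns ok when the observed sends match the spec; a deviation comes back as {mismatch, ...} listing, per sender, what was expected and what was seen.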

(aside: there are lots of good things in the erlang distribution which
nobody knows about, like egd.erl -- one of life's mysteries is why these
are not documented or promoted - I suspect lack of time and "not core
business")

>
> And finally, how do you approach setting up a new app, finding a bug -- how
> much tracing does everyone use in these things? And is anyone also using
> io:format instead of the OTP goodness, if so, for a good reason?

Absolutely - I use io:format all the time. I put the io:formats next
to send and receive statements.

The good reason was "I didn't know about the trace bifs"
(modified truth) - I knew about them but hadn't realized how good they
were until I attended Jesper's (excellent) tutorial. I'd relegated
traces to an obscure appendix in my book (Appendix E.3) where nobody
could find it.

In the next edition (which I'm working on *now*) I will introduce
tracing far, far earlier.


Cheers

/Joe

>
> Best,
> Henning

Michael Turner

Jun 4, 2012, 3:52:08 AM
to Ulf Wiger, Erlang Questions
"(Lamport published his paper in 1978, the same year that the first
digital AXE was taken into service), ...."

Leaving publication delays aside: as with so many ideas that get named
after a single person, Lamport clocks (in the sense of the
counter-pairs used in seq_trace) are at least partially credited to
the earlier efforts of others, by Lamport himself, in the original
paper, and here:

http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks

In particular, he credits the authors of an RFC submitted in 1975:

http://merlot.tools.ietf.org/pdf/rfc677.pdf

> See e.g. http://es.scribd.com/mobile/doc/83784121, starting at page 109.

Yes, I've been pointed to this claim before in a previous "actually
Ericsson invented that" discussion. Looking at the section closely,
however, I don't see how either Forlopp Identity Registers or Forlopp
Identity itself correspond to Lamport clocks. All I see is something
that vaguely corresponds to seq_trace's "contamination" algorithm,
together with an "error intensity count" that doesn't seem to have
anything to do with Lamport clocks.

Regardless of who first invented this technique, I think my point
stands: the seq_trace documentation should call what it does by its
proper name. Hardly anybody knows what forlopp means, whereas
Lamport's paper won the "2000 PODC Influential Paper Award (later
renamed the Edsger W. Dijkstra Prize in Distributed Computing)," and
the "ACM SIGOPS Hall of Fame Award in 2007."

So there's a classic distributed-systems paper that gives the
technique its name, yet somehow, even though Kenneth Lundin agreed,
all the way back in 2007, that seq_trace implements Lamport clocks

http://erlang.org/pipermail/erlang-questions/2007-May/026827.html

this fact somehow hasn't made it into the documentation. Everybody has
to rediscover it. This is not just a waste of their time, there's also
the opportunity cost for Erlang/OTP (and Ericsson) in people *not*
discovering it because it's not named as such at www.erlang.org.
Tracing tools building on seq_trace, exploiting the formal properties
that were elucidated by Lamport and elaborated on by others, might
considerably improve on what's available now. But as far as I can
tell, seq_trace isn't the foundation for *anything* in OTP.

-michael turner

Torben Hoffmann

Jun 4, 2012, 4:01:33 AM
to Joe Armstrong, Erlang Questions


On 04/06/2012 09:49, Joe Armstrong wrote:
> <snip>
>
> Message sequence charts (MSCs) are parts of SDL and UML and there are many tools
> (which I don't use) to make them - a quick google turned up this
> http://en.wikipedia.org/wiki/MscGen which is food for thought.
If you want to create MSCs I recommend using other tools than MscGen.
I have tried MscGen and I think it is not as good as PlantUML -
http://plantuml.sourceforge.net/

Warning: I have not tried the latest version of MscGen, so my knowledge
might be outdated.

Cheers,
__
/orben

--
http://www.linkedin.com/in/torbenhoffmann

Richard Carlsson

Jun 4, 2012, 4:10:20 AM
to erlang-q...@erlang.org
IMO, Mats Cronqvist's excellent "redbug" interface to the tracing
mechanism ought to be productified (or at least documented...) and
included in the standard OTP distribution. That would make Erlang's
tracing mechanisms something that a beginner could use from day one,
rather than being considered an advanced technique. Today it lives in
obscurity as part of his "eper" tools:

https://github.com/massemanet/eper

/Richard

PS. As I'm the one who actually meets him every day at work, I could
take it upon myself to pester him until it gets properly documented, but
I think OTP should start taking a look at redbug and give their opinion
on what might be needed for including it in the distribution.

Gleb Peregud

Jun 4, 2012, 4:29:39 AM
to Richard Carlsson, erlang-q...@erlang.org
On Mon, Jun 4, 2012 at 10:10 AM, Richard Carlsson
<carlsson...@gmail.com> wrote:
> IMO, Mats Cronqvist's excellent "redbug" interface to the tracing mechanism
> ought to be productified (or at least documented...) and included in the
> standard OTP distribution.

+10000

Ulf Wiger

Jun 4, 2012, 6:07:16 AM
to Michael Turner, Erlang Questions

On 4 Jun 2012, at 09:52, Michael Turner wrote:

> So there's a classic distributed-systems paper that gives the
> technique its name, yet somehow, even though Kenneth Lundin agreed,
> all the way back in 2007, that seq_trace implements Lamport clocks
>
> http://erlang.org/pipermail/erlang-questions/2007-May/026827.html
>
> this fact somehow hasn't made it into the documentation. Everybody has
> to rediscover it. This is not just a waste of their time, there's also
> the opportunity cost for Erlang/OTP (and Ericsson) in people *not*
> discovering it because it's not named as such at www.erlang.org.

First of all, I didn't dispute that seq_trace implements Lamport clocks,
only that the requirements for seq_trace had different origins. In a
similar fashion, one might claim that Erlang was inspired by Tony
Hoare's CSP, but that would be rewriting history, even though the pieces
mostly fit. There was a lot of work done on concurrency algorithms in the
'70s and '80s; the telecoms industry (and Ericsson) started designing
software-controlled phone switches in the '60s. It's very hard to untangle
after the fact who inspired whom (but it's certainly a fascinating exercise
- for one thing, Bjarne and co at one point visited Niklaus Wirth and came
to the conclusion that many of the things they had been working on were
manifest in his Modula-2… smart people working on the same type of
problems will sometimes independently arrive at very similar conclusions.)

My main reason for responding was that you accused the OTP team of being
"dishonest" in not mentioning where their ideas came from. I maintain that
they are not; only that many of the inputs that *actually* informed the
implementation were either confidential or proprietary enough to be of
no interest to the people reading the manuals.

A question is of course how many people are helped by the seq_trace
documentation mentioning the relation to Lamport clocks. Some might,
but others couldn't care less; they just want to know how to use the
functionality. We have had examples of OTP man pages in the past that
have gone into great detail about algorithm choices, resulting only in
terrifying anyone who came there just to learn how to use the API.

This is of course the big challenge when writing product manuals.
Many things that are of academic interest must be left out of the
manual if it ends up confusing the reader. In this case, I'm pretty sure
it could be worked in without harming readability, but it is of course
perfectly possible to use seq_trace without understanding, or even
being aware of the existence of, Lamport clocks.

The OTP team is known to accept patches to the documentation,
so please feel free to contribute to a more helpful way to describe
the tracing support. I'm sure it would be universally appreciated.

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com

Gustav Simonsson

Jun 4, 2012, 7:22:54 AM
to erlang-q...@erlang.org
+1

Especially under the "Advanced examples" section there is less need for
brevity, since users clicking on that section are likely prepared for
multiple, possibly long, examples of using the Event Tracer.

// Gustav Simonsson

>
> BR,
> Ulf W
>
> Ulf Wiger, Co-founder& Developer Advocate, Feuerlabs Inc.

Joe Armstrong

Jun 4, 2012, 8:40:33 AM
to Torben Hoffmann, Erlang Questions
On Mon, Jun 4, 2012 at 10:01 AM, Torben Hoffmann
<torben...@gmail.com> wrote:
>
>
> On 04/06/2012 09:49, Joe Armstrong wrote:
>>
>> <snip>
>>
>>
>> Message sequence charts (MSCs) are parts of SDL and UML and there are many
>> tools
>> (which I don't use)  to make them - a quick google turned up this
>> http://en.wikipedia.org/wiki/MscGen which is food for thought.
>
> If you want to create MSCs I recommend using other tools than MscGen.
> I have tried MscGen and I think that is not so good as PlantUML -
> http://plantuml.sourceforge.net/
>

I just tried PlantUML - very nice - worked out of the box :-)

/Joe

Eric Moritz

Jun 4, 2012, 9:24:32 AM
to Erlang Questions
Does anyone have any resources (blog post, talk slides, etc) explaining Trace Driven Development?  I have seen quite a few references to it lately but my Google search efforts come up short when I attempt to find an explanation of it.

Thanks,
Eric Moritz.

Eric Moritz

Jun 4, 2012, 10:06:57 AM
to Erlang Questions
Disregard my previous question.  I read the head of the thread and realized that trace driven development is something that Henning Diedrich coined.

Eric. 

Michael Turner

Jun 4, 2012, 11:10:45 AM
to Ulf Wiger, Erlang Questions
On Mon, Jun 4, 2012 at 7:07 PM, Ulf Wiger <u...@feuerlabs.com> wrote:

>> this fact somehow hasn't made it into the documentation. Everybody has
>> to rediscover it. This is not just a waste of their time, there's also
>> the opportunity cost for Erlang/OTP (and Ericsson) in people *not*
>> discovering it because it's not named as such at www.erlang.org.
>
> First of all, I didn't dispute that seq_trace implements Lamport clocks,

And I never said you disputed it. If there's a dispute, it's whether
AXE did. I can't see that.

> only that the requirements for seq_trace had different origins.

That would be a more credible claim if the counters in seq_trace
weren't computed in a manner *identical* to the usual descriptions of
them found in the literature starting over 30 years ago.

> ... smart people working on the same type of
> problems will sometimes independently arrive at very similar conclusions.)

The copyright on the manual starts with the year 2001. Over two
decades after Lamport famously wrote, seq_trace implements exactly
what's typically described, and the AXE document -- which doesn't
supply nearly enough detail to verify independent invention -- is the
best you can offer as evidence of independent invention?

> My main reason for responding was that you accused the OTP team of being
> "dishonest" in not mentioning where their ideas came from.

"Ideas" is *your* plural, not mine. But look at the facts: there was
an obvious thing to call it -- obvious to anyone who's studied
concurrent programming, and I think you all have. One thing it's never
called in the literature is "sequential tracing."

As for dishonesty: If somebody had this algorithm described to them,
was assigned to write the documentation, and did so in a hurry,
without checking with their informant about origins, that makes them
more sloppy than dishonest.

But not taking the trouble to even *briefly* credit the foundational
work by a famous computer scientist, in the official documentation, in
2007, after the question arose on the list and was answered in the
affirmative? That's getting *really* sloppy.

> ... I maintain that
> they are not; only that many of the inputs that *actually* informed the
> implementation were either confidential or proprietary enough to be of
> no interest to the people reading the manuals.

This is an algorithm made famous among people concerned with
concurrent programming in the late 70s and in seq_trace it's
implemented identically to the canonical implementation. And I don't
have evidence of any implementation in Erlang/OTP before 2001. An idea
this widespread and remarked upon -- and by the year 2000, celebrated
as a landmark paper -- but somehow people missed that they had
reinvented a wheel? I don't think so. At best, they missed how sloppy
their documentation was. And is.

> A question is of course how many people are helped by the seq_trace
> documentation mentioning the relation to Lamport clocks. Some might,
> but others couldn't care less; they just want to know how to use the
> functionality.

OK, so now the defense is "who cares who invented it?" There is reason
to care. When there's decades worth of literature referring to its
uses, *everybody* should care. Anything else is an encouragement to
reinvent wheels.

> We have had examples of OTP man pages in the past that
> have gone into great detail about algorithm choices, resulting only in
> terrifying anyone who came there just to learn how to use the API.

What a strawman argument. I asked: how long does it take to add
"seq_trace implements Lamport clocks"? Nowhere have I suggested that
Ericsson's documentation on seq_trace be a comprehensive treatment of
the subject. Don't put words in my mouth. I only ask for this: Credit
where it's due, and give people a starting point for further
investigation (or for the more savvy among those doing concurrent
programming, to orient them quickly to what seq_trace does.) In the
same four words.

> This is of course the big challenge when writing product manuals.

Writing a 4-word sentence giving credit where it's due is no big
writing challenge. And in this case, it's an ethical imperative.

> Many things that are of academic interest must be left out of the
> manual if it ends up confusing the reader.

In your last response, you said you weren't even sure whether
seq_trace implemented Lamport clocks. A glance by anyone familiar with
them can verify that it does. Despite your previous level of ignorance
about Lamport clocks, it now sounds like you know for a fact that they
are only of academic interest. Where is THAT certainty coming from? A
comprehensive literature survey done in a few hours?

> ... In this case, I'm pretty sure
> it could be worked in without harming readability, but it is of course
> perfectly possible to use seq_trace without understanding, or even
> being aware of the existence of, Lamport clocks.

As it happens, so far, I haven't used the Lamport relative timestamp
feature. But I'm coming up to the point where the fact that they help
describe a partial ordering of events is an essential part of my
testing. I *will* be looking to the literature to see what's been done
on this subject. Who wouldn't, given that people have over 30 years of
experience with it now, and must have written about it?

> The OTP team is known to accept patches to the documentation,
> so please feel free to contribute to a more helpful way to describe
> the tracing support. I'm sure it would be universally appreciated.

The one time I tried to get a patch into anything in Erlang (something
that was causing a build under FreeBSD to segfault), it didn't make it
into the next release -- even though other people helped get it into
shape for submission. Pardon me for being a little leery of the
process.

I hereby suggest, for whomever it's easier: add the line "seq_trace
implements Lamport clocks." (Alternatively, "Lamport timestamps.")
Kenneth Lundin already admitted it back in 2007, on this list. It's
long past time to get it in there.

And I suggest that you NOT take the opportunity to add any supposed
"history" about how AXE implemented Lamport clocks until you can
demonstrate that it did. From what I can see in the documentation you
pointed to, AXE did not.

-michael turner

Robert Virding

Jun 4, 2012, 12:00:28 PM
to Michael Turner, Erlang Questions
I really think there are two issues here: what they are, or what they are commonly known as; and the generating idea behind them. I can't comment on seq_trace, as I have absolutely no idea how they actually got the idea to do it the way it is, whether from Lamport or from "forlopp trace" in AXE. How you should describe them is another matter. If the idea came from "forlopp trace" then you should say that, and then maybe add a comment that it is similar to Lamport clocks.

Another example is when people say that Erlang implements the actor model. Well perhaps, but when we designed Erlang it was definitely not the actor model which influenced us as we had not heard about the actor model at that time. Even though it had been invented much earlier. We worked from our ideas of how such a system could/should be implemented. With maybe a little CSP in it, at least that is where the '!' came from, and a long defunct '?'. The actor model first came when people told us we had implemented it. So if we don't mention the actor model we are not being dishonest, but truthful.

Robert

Ulf Wiger

Jun 4, 2012, 1:56:47 PM
to Michael Turner, Erlang Questions

On 4 Jun 2012, at 17:10, Michael Turner wrote:
>
> The copyright on the manual starts with the year 2001. Over two
> decades after Lamport famously wrote, seq_trace implements exactly
> what's typically described, and the AXE document -- which doesn't
> supply nearly enough detail to verify independent invention -- is the
> best you can offer as evidence of independent invention?

Well, FWIW, it was introduced before 2001. I'm guessing 1998
or even 1997 (OTP R4, I believe).

Listen, I tried to politely point out to you that the *requirements*
for sequence trace in Erlang grew from Ericsson developers'
experience using "forlopp tracing" in AXE. Much like Erlang itself
was created - as Joe has described it - as an effort to "make something
like AXE, only better".

This is also why it is called "sequential tracing" (let's call this an
educated guess, as I was not part of the naming discussion,
although I *was* in the loop when the requirement came up).

The people who asked for the feature, and helped finance the
development of it, did not ask for Lamport clocks - they asked for
something like "forlopp tracing" in AXE. "Forlopp" is a bastardization
of the Swedish word "förlopp", which means "sequence". In the
world of AXE, it was a way to associate events to a "transaction",
which could be restarted if something went wrong.

Sequence tracing was therefore a perfectly logical thing to call
it in Erlang, as it is one of the correct ways to translate "forlopp
trace" to real English.

There was no deceit, sloppiness or arrogance behind it.

I have told you that the details of AXE's implementation are
proprietary. I provided one link to a document that *was* publicly
available, and that mentions "forlopp tracing". Take that as a
form of existence proof of "sequence tracing" in AXE.

As for "independent verification", am I to understand that I'm being
chided for not revealing secret Ericsson design details in a public
forum just so you can independently verify my claims?

(However, ordering may not have been a terribly difficult problem
to manage in AXE, as it was a single-CPU system, never distributed.
When the OTP team implemented similar support, obviously they
had to make something that worked in a distributed system).

>> A question is of course how many people are helped by the seq_trace
>> documentation mentioning the relation to Lamport clocks. Some might,
>> but others couldn't care less; they just want to know how to use the
>> functionality.
>
> OK, so now the defense is "who cares who invented it?" There is reason
> to care. When there's decades worth of literature referring to its
> uses, *everybody* should care. Anything else is an encouragement to
> reinvent wheels.

I did agree that the reference to Lamport clocks could be added to
the documentation, since it is not without interest. The one thing
I don't agree with is that the OTP team deserves infamy for not
having done so already.

When publishing research, it is extremely bad form not to mention
prior art. When writing a user guide, one has to consider whether
describing details of the implementation actually helps the user,
*and* whether those details are something one wants to commit
to as part of the interface.

I hope you can appreciate the difference:

- Kenneth agreeing on a mailing list that Lamport clocks are used
is a service to the community, but not a commitment to stay with
this design choice indefinitely (not that I think there are that many
other good ways to do it).

- Inserting details about the implementation in the Reference Manual
elevates that information to become part of the interface.

Another, perhaps more appropriate, place to insert the reference is
as a comment in the source code. It isn't there, however.

Granted, the interface *as described* pretty much commits to Lamport
clocks. It does seem reasonable to mention that, although I'm still
not convinced that many will find it that helpful.

> As it happens, so far, I haven't used the Lamport's relative timestamp
> feature. But I'm coming up to the point where the fact that they help
> describe a partial ordering of events is an essential part of my
> testing. I *will* be looking to the literature to see what's been done
> on this subject. Who wouldn't, given that people have over 30 years of
> experience with it now, and must have written about it?

Perhaps this thesis can be of interest then?

http://www.student.nada.kth.se/~nstahle/staahle_niklas.pdf

(Interestingly, the thesis project was run at Ericsson, on real-time
'transaction' or 'sequence' tracing. While it mentions Lamport and
a few other techniques, it fails to mention that both the AXE and
Erlang - both established Ericsson technologies at the time -
already supported this feature. ;-)

>> The OTP team is known to accept patches to the documentation,
>> so please feel free to contribute to a more helpful way to describe
>> the tracing support. I'm sure it would be universally appreciated.
>
> The one time I tried to get a patch into anything in Erlang (something
> that was causing a build under FreeBSD to segfault), it didn't make it
> into the next release -- even though other people helped get it into
> shape for submission. Pardon me for being a little leery of the
> process.

Patching can be pretty hairy, but note that patching the documentation
is much easier than patching the emulator. ;-)

> And I suggest that you NOT take the opportunity to add any supposed
> "history" about how AXE implemented Lamport clocks until you can
> demonstrate that it did. From what I can see in the documentation you
> pointed to, AXE did not.

Actually, there is some mention of it in the seq_trace docs already:

"A possible output from the system's sequential_tracer (inspired by AXE-10 and MD-110) could look like:

17:<0.30.0> Info {0,1} WITH
"**** Trace Started ****"
17:<0.31.0> Received {0,2} FROM <0.30.0> WITH
{<0.30.0>,the_message}
17:<0.31.0> Info {2,3} WITH
"We are here now"
17:<0.30.0> Received {2,4} FROM <0.31.0> WITH
{ack,{received,the_message}}"

(MD-110 is another proprietary, PLEX-based, Ericsson system).

More Ericsson history than that is hardly needed, but does
somewhat strengthen my claim that these systems were the main
source of inspiration.
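For readers who have not used seq_trace, the following is a minimal sketch of a two-process exchange that should yield the same kinds of events as the quoted output: a print, a receive, a print, a receive. The module name `seq_trace_demo`, the collector process, and the `run/0` entry point are assumptions made for illustration. Only the 'receive' and print flags are set, which is why the documented output shows no send lines. The events are sorted by the current serial, so the result does not depend on the order in which trace messages happen to arrive.

```erlang
-module(seq_trace_demo).
-export([run/0]).

-define(LABEL, 17).

%% Runs one request/ack exchange under a sequential trace and returns
%% the collected events as [{Kind, {Previous, Current}}], ordered by
%% the current serial.
run() ->
    Parent = self(),
    Collector = spawn(fun() -> collect([]) end),
    OldTracer = seq_trace:set_system_tracer(Collector),
    %% Spawn the peer before any token is set on this process.
    Server = spawn(fun() -> server() end),
    seq_trace:set_token(label, ?LABEL),
    seq_trace:set_token('receive', true),
    seq_trace:set_token(print, true),
    seq_trace:print(?LABEL, "**** Trace Started ****"),
    Server ! {Parent, the_message},
    receive {ack, _} -> ok end,
    %% Drop the token so the bookkeeping below is not traced.
    seq_trace:set_token([]),
    %% Wait until pending trace messages have reached the tracer.
    Ref = erlang:trace_delivered(all),
    receive {trace_delivered, _, Ref} -> ok end,
    Collector ! {flush, Parent},
    Events = receive {events, Es} -> Es end,
    seq_trace:set_system_tracer(OldTracer),
    lists:sort(fun({_, {_, C1}}, {_, {_, C2}}) -> C1 =< C2 end, Events).

server() ->
    receive
        {From, Msg} ->
            %% The token arrived with the message, so this print is traced.
            seq_trace:print(?LABEL, "We are here now"),
            From ! {ack, {received, Msg}}
    end.

%% System tracer: keeps the kind and serial of every event labelled ?LABEL.
collect(Acc) ->
    receive
        {seq_trace, ?LABEL, Info} ->
            collect([{element(1, Info), element(2, Info)} | Acc]);
        {flush, From} ->
            From ! {events, Acc}
    end.
```

Calling `seq_trace_demo:run()` returns a list of `{Kind, Serial}` pairs, where each serial is the `{Previous, Current}` pair that appears in the quoted trace lines.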

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com



eigenfunction

Jun 4, 2012, 3:25:16 PM
to erlang-q...@erlang.org
Thank you very much for PlantUML.

On Jun 4, 10:01 am, Torben Hoffmann <torben.leh...@gmail.com> wrote:
> On 04/06/2012 09:49, Joe Armstrong wrote:> <snip>
>
> > Message sequence charts (MSCs) are parts of SDL and UML and there are many tools
> > (which I don't use)  to make them - a quick google turned up this
> > http://en.wikipedia.org/wiki/MscGen which is food for thought.
>
> If you want to create MSCs I recommend using other tools than MscGen.
> I have tried MscGen and I think it is not as good as PlantUML - http://plantuml.sourceforge.net/
>
> Warning: I have not tried the latest version of MscGen, so my knowledge
> might be outdated.
>
> Cheers,
> __
> /orben
>
> --http://www.linkedin.com/in/torbenhoffmann
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questi...@erlang.org
> http://erlang.org/mailman/listinfo/erlang-questions

Michael Turner

Jun 5, 2012, 2:01:53 AM
to Ulf Wiger, Erlang Questions
On Tue, Jun 5, 2012 at 2:56 AM, Ulf Wiger <u...@feuerlabs.com> wrote:

> Listen, I tried to politely point out to you that the *requirements*
> for sequence trace in Erlang grew from Ericsson developers'
> experience using "forlopp tracing" in AXE.

What you actually wrote:

"Anyway, not noting in the docs that sequence trace mimicks AXE's
forlopp trace seems forgiveable."

But there's nothing in the AXE documentation you pointed me to that
shows AXE forlopp trace doing Lamport clocks, which is the relevant
point.

What you actually wrote:

"... many of the inputs that *actually* informed the implementation
were either confidential or proprietary enough to be of no interest to
the people reading the manuals."

Yeah, but there's nothing proprietary or confidential about Lamport
clocks, described in a publication that's cited *directly* in
publications by about half-a-dozen Erlang contributors in the 1990s.
Since that Lamport publication is cited in over 2000 other
publications, including one on the internals of Mnesia, it's hard to
see how it can be deemed, in advance, "of no interest to the people
reading the manuals." *I* read the manual on seq_trace. My reaction? I
expressed considerable confusion, here:

http://erlang.org/pipermail/erlang-questions/2011-June/059235.html

and here

https://groups.google.com/group/erlang-programming/browse_thread/thread/3633ef0d84d63bf3/ef39b31f33011f8c

That was before it finally dawned on me: "Lamport clocks! Why the hell
don't they say so!? Lots has been written about Lamport clocks, and
undoubtedly somewhere something has been written more clearly and more
usefully than what I'm looking at here. Obviously, if I want to get
the most out of seq_trace's trace tokens, reading the seq_trace
documentation isn't going to get me much farther. I'll have to go to
the open literature."

> Much like Erlang itself
> was created - as Joe has described it - as an effort to "make something
> like AXE, only better".

Yeah, the same Joe Armstrong who's co-author of a book about Erlang
whose second edition in 1993 cites Leslie Lamport's paper defining
Lamport clocks.

> This is also why it is called "sequential tracing" (let's call this an
> educated guess, as I was not part of the naming discussion,
> although I *was* in the loop when the requirement came up).

If you want to call everything that seq_trace implements "sequential
tracing," OK, fine. But if you've got Lamport clocks in that
implementation, and in the API, for God's sake, call them Lamport
clocks. Say so in the beginning. Lots of people know what to do with
these. Not giving those people a way to get their usual handle on them
means you're leaving talent on the table. Telling them early in the
documentation would inspire confidence in documentation that's so rife
with basic copyediting errors as to inspire very little confidence
indeed.

> The people who asked for the feature, and helped finance the
> development of it, did not ask for Lamport clocks - they asked for
> something like "forlopp tracing" in AXE. "Forlopp" is a bastardization
> of the Swedish word "förlopp", which means "sequence". In the
> world of AXE, it was a way to associate events to a "transaction",
> which could be restarted if something went wrong.

Seq_trace is not "the feature" -- it's a collection of features. It
came out of a milieu in which I've identified half-a-dozen people who,
if their publications in the 1990s are any indication, know perfectly
well what Lamport clocks are.

> Sequence tracing was therefore a perfectly logical thing to call
> it in Erlang, as it is one of the correct ways to translate "forlopp
> trace" to real English.
>
> There was no deceit, sloppiness or arrogance behind it.

I can see that there might be "no deceit," but only by way of
sloppiness. And the sloppiness is pretty obvious.

As for arrogance, well, when you gesture vaguely in the direction of
AXE documentation that contains no evidence of any independent
invention of Lamport clocks, in support of the claim "we invented that
too", and you do it apparently without actually closely comparing
the relevant sections of that document, the seq_trace documentation,
and what Lamport wrote.... OK, I'll be kind, and just call that "yet
more sloppiness," not arrogance.

> I have told you that the details of AXE's implementation are
> proprietary. I provided one link to a document that *was* publicly
> available, and that mentions "forlopp tracing". Take that as a
> form of existence proof of "sequence tracing" in AXE.

I see a tracing facility there. I don't see Lamport clocks. How many
times do I have to say this before you'll actually take a look for
yourself?

> As for "independent verification", am I to understand that I'm being
> chided for not revealing secret Ericsson design details in a public
> forum just so you can independently verify my claims?

There's nothing secret about Lamport clocks. Kenneth Lundin said
seq_trace implements Lamport clocks, in 2007. Mnesia uses Lamport
clocks, as reported in 1999. The first book about Erlang (2nd ed.
1993) refers to the 1978 paper in which Lamport clocks were defined.
Where is the excuse for *not* saying that seq_trace implements Lamport
clocks?

[snip]

> I did agree that the reference to Lamport clocks could be added to
> the documentation, since it is not without interest. The one thing
> I don't agree with is that the OTP team deserves infamy for not
> having done so already.

I don't believe the entire team deserves infamy. I DO believe that
whoever's responsible for documenting seq_trace has been, at best,
very lax.

As for any actual motivation to obscure origins, the best I can come
up with is one that is still very compelling indeed: having a
grotesquely swollen patent pool, for purposes of intimidation and/or
deterrence. After all, it's much easier to get a patent attorney to
write one of those not-very-inventive software patents if you say,
"well, there really IS no prior art on this one, except our own, of
course. Which is proprietary." At its extremes, you get disgusted
engineering teams pulling pranks like the one James Gosling described:
a draft patent describing the electrical power switch.

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=44&f=G&l=50&co1=AND&d=PTXT&s1=%22gosling,+james%22.INNM.&OS=IN/%22gosling,+james%22&RS=IN/%22gosling,+james%22

http://www.theregister.co.uk/2010/08/17/golsing_on_sun_goofy_patent_contestas/

> When publishing research, it is extremely bad form not to mention
> prior art. When writing a user guide, one has to consider whether
> describing details of the implementation actually helps the user,
> *and* whether those details are something one wants to commit
> to as part of the interface.
>
> I hope you can appreciate the difference:

Oh, I definitely can. When the documentation isn't very good, a
reference in it to published research behind its API can at least
encourage confidence.

I found the seq_trace documentation pretty confusing. I pressed on
anyway, because I didn't like my alternative: other trace facilities
that are (literally) described in the company's official documentation
as hellish. Then I saw how seq_trace generates trace tokens, which
made it sound like some serious thought had been put into seq_trace --
if only I could figure out the implications of the algorithm. After a
couple of unanswered questions about this aspect of seq_trace, on this
list, I finally did figure it out: I was looking at Lamport clocks.

Knowing that seq_trace implements Lamport clocks would have helped me.
Maybe lots of people never even get as far as I got with seq_trace
before shrugging and deciding it's something like gs was and is --
basically a moribund, abandoned line of approach to tracing in
Erlang/OTP, even if the documentation doesn't come right out and say
so. (In the case of gs, getting the documentation to come right out
and say so required prodding from me, after I discovered I was at a
dead end with it.)

That's a waste. An easily remedied waste.

How long does something have to stay in beta? I'm finding other
versions of the seq_trace documentation with copyrights going back to
1991. If, as you say, the requirements come out of the late 90s,
that's like FIFTEEN YEARS IN BETA.

> - Kenneth agreeing on a mailing list that Lamport clocks are used
>  is a service to the community, but not a commitment to stay with
>  this design choice indefinitely (not that I think there are that many
>  other good ways to do it).

You mean if I write to seq_trace as a spec that uses a pretty standard
(and in fact famous) way of generating trace tokens, it might change
out from under me? May I ask, why? To what? Or would that be inquiring
of Ericsson's trade secrets?

> - Inserting details about the implementation in the Reference Manual
>  elevates that information to become part of the interface.

In the case of seq_trace implementing Lamport clocks, that sounds like
an excellent idea to me, given the vintage of the idea and how, in the
test of time, that idea has withstood over three decades.

> Another, perhaps more appropriate, place to insert the reference is
> as a comment in the source code. It isn't there, however.

Gosh, yet another place where you save people some comprehension time
by just writing four words. I wonder how many more?

> Granted, the interface *as described* pretty much commits to Lamport
> clocks. It does seem reasonable to mention that, although I'm still
> not convinced that many will find it that helpful.

Of course not. After all, somebody would have to read all 2000+ papers
that cite Lamport's paper, before they could be *really* sure.

[snip]
> Perhaps this thesis can be of interest then?
>
> http://www.student.nada.kth.se/~nstahle/staahle_niklas.pdf
>
> (Interestingly, the thesis project was run at Ericsson, on real-time
> 'transaction' or 'sequence' tracing. While it mentions Lamport and
> a few other techniques, it fails to mention that both the AXE and
> Erlang - both established Ericsson technologies at the time -
> already supported this feature. ;-)

Yeah, when you don't identify the wheel by name, making it hard to
find in a search, people reinvent that wheel. How amazing!

I've also seen a professor's assignment online: he starts with simple
trace framework in Erlang, and the student is then asked to implement
Lamport clocks. No mention of seq_trace. I guess it's because that
professor searched on "Lamport clocks" in the Erlang documentation and
said, "Hm, that's odd: no Lamport clocks."

Nice job of hiding gold. Next step: cast doubt on whether gold is
actually all that valuable, when compared to something unstated that
you might invent one of these days.

>>> The OTP team is known to accept patches to the documentation,
>>> so please feel free to contribute to a more helpful way to describe
>>> the tracing support. I'm sure it would be universally appreciated.
>>
>> The one time I tried to get a patch into anything in Erlang (something
>> that was causing a build under FreeBSD to segfault), it didn't make it
>> into the next release -- even though other people helped get it into
>> shape for submission. Pardon me for being a little leery of the
>> process.
>
> Patching can be pretty hairy, but note that patching the documentation
> is much easier than patching the emulator. ;-)

I'm sure it *can* be hairy, but mine was a single-character change, as
I recall, and it only asked that a C auto array have a dimension of 1
rather than zero (a write to that array unsurprisingly caused the stack
to be trashed on the platform I was trying to build on, which made
using source-level debugging a non-starter, which made debugging a
total pain in the ass.)

>> And I suggest that you NOT take the opportunity to add any supposed
>> "history" about how AXE implemented Lamport clocks until you can
>> demonstrate that it did. From what I can see in the documentation you
>> pointed to, AXE did not.
>
> Actually, there is some mention of it in the seq_trace docs already:

[snip]

It shows traces "inspired" by AXE, but without the sequence tokens
that would suggest that AXE implemented Lamport clocks -- IF it ever
did.

> More Ericsson history than that is hardly needed, ...

I'd say that history is entirely dispensable. I'd even throw out the
AXE traces, if you could use the space to instead say something about
how useful the trace tokens can be.

> ... but does
> somewhat strengthen my claim that these systems were the main
> source of inspiration.

We're not talking about whether AXE was "the main source of
inspiration". We're talking about whether the seq_trace documentation
(especially given how confusing it is to begin with) should mention
that it implements Lamport clocks, since, after all, (a) it does, and
(b) lots of people who do concurrent programming know what Lamport
clocks are already -- including about half-a-dozen in the Erlang group
(at least if what they cite in their publications is any indication.)

-michael turner

Anders Wei

Jun 5, 2012, 2:04:45 AM
to Henning Diedrich, Erlang Questions

I really think it’s a great idea.

For daily Erlang programming, I always collect call traces to get familiar with modules that I don't know yet.

Actually I wrote a small tool to generate call flows, but it has poor multi-process support, so it doesn't require Lamport clocks.

It just collects the dbg:tracer output and prints it out in another way; it's simple but it works fine :-)

It works like this:

Michael Turner

Jun 5, 2012, 3:47:02 AM
to Ulf Wiger, Erlang Questions
Correction: I wrote that the seq_trace documentation "shows traces
"inspired" by AXE, but without the sequence tokens
that would suggest that AXE implemented Lamport clocks -- IF it ever did."

In fact, the seq_trace documentation does feature traces with the
sequence tokens as generated by Lamport's algorithm.

My point, however, stands: "inspired" is not "virtually identical",
and definitely not in this case. In the actual AXE documentation Ulf
points to

http://www.scribd.com/doc/83784121/AXE-System-Testing-1-Apz-212

the only counter mentioned for tracing is an "error intensity
counter." Lamport clocks generate timestamps based on *two* counters,
neither of them specific to error reporting.
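As a point of reference, the textbook update rule is small enough to sketch. In the usual formulation each process keeps a single counter: it is incremented for every local event and every send, and on receipt it is set to one more than the larger of the local counter and the stamp carried by the message. The seq_trace documentation describes the pair in a trace line as the previous and the current serial. The module and function names below are hypothetical, and this is not how seq_trace itself is written.

```erlang
-module(lamport_sketch).
-export([new/0, tick/1, send/1, recv/2, demo/0]).

%% A clock is just a non-negative integer.
new() -> 0.

%% Local event: advance the clock by one.
tick(Clock) -> Clock + 1.

%% Send: advance the clock and stamp the outgoing message with it.
%% Returns {Stamp, NewClock}.
send(Clock) ->
    New = Clock + 1,
    {New, New}.

%% Receive: jump past both the local clock and the message stamp.
recv(Clock, Stamp) -> max(Clock, Stamp) + 1.

%% Two simulated processes, a and b, exchanging one message each way.
demo() ->
    A0 = new(),
    B0 = new(),
    A1 = tick(A0),            % local event on a
    {S1, A2} = send(A1),      % a sends to b
    B1 = recv(B0, S1),        % b receives
    {S2, _B2} = send(B1),     % b replies
    A3 = recv(A2, S2),        % a receives the reply
    [{a_local, A1}, {a_send, S1}, {b_recv, B1}, {b_send, S2}, {a_recv, A3}].
```

`lamport_sketch:demo()` returns `[{a_local,1},{a_send,2},{b_recv,3},{b_send,4},{a_recv,5}]`: every event that causally follows another carries a larger number, which is the partial-ordering property under discussion.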

Ulf describes the customer's requirements for seq_trace as "like AXE
only better". If I may suggest: Ericsson improved on the AXE traces by
adding (perhaps among other things) Lamport clocks. There is ample
evidence, in the citations and direct references in at least two
open-literature publications by Erlang luminaries, that they were
quite aware of Lamport's work. Did they just forget all about it, when
writing seq_trace? It's possible, I suppose. Or perhaps an engineer
employed by the customer suggested the feature, without mentioning
Lamport? Who knows? All I know is: if that's what you're doing, that's
what you should call it.

And speaking of what to call things: I don't think you should still be
calling seq_trace "beta", if (as Ulf says) it originated ca 1997. I'd
do the interface differently, but the more important thing to me now
is stability and correctness.

-michael turner

Ulf Wiger

Jun 5, 2012, 3:52:24 AM
to Michael Turner, Erlang Questions

On 5 Jun 2012, at 08:01, Michael Turner wrote:

> We're not talking about whether AXE was "the main source of
> inspiration". We're talking about whether the seq_trace documentation
> (especially given how confusing it is to begin with) should mention
> that it implements Lamport clocks, since, after all, (a) it does, and
> (b) lots of people who do concurrent programming know what Lamport
> clocks are already -- including about half-a-dozen in the Erlang group
> (at least if what they cite in their publications is any indication.)

*You* are talking about that, and in the process, firing off rants
insulting people who have worked much harder than you to make
Erlang a solid platform, implying that they are lazy, possibly dishonest
and ignorant. I claim that seq_trace implements Lamport clocks "by
accident", and that it was not the original purpose, nor a complete
solution to the problem. It may be "all it does", but it's not all it should
do. I don't say that out of ignorance, but because I still recall many of
the discussions before, during and after the implementation of
seq_trace.

Meanwhile, you tried to submit a patch once, and since it went
badly, you can't be bothered to submit a suggestion for how to fix
this documentation bug that upsets you (but apparently not that
many others, judging by the lack of sympathetic uproar).

Still, you are making some good points. If you would dispense with
the sarcasm, chances are people would be more inclined to listen.
Also, if you would be less focused on defending your position, we
might actually be able to agree on something.

The reason I reply to you is not because I want to get back at you
for insulting me (I think even you could agree that I have a track
record of favoring constructive dialogue here), but because I think
this is an area that really needs improvement.

There is a clear process in place: either ask the OTP team nicely to
do this - understanding that they have a ton of higher-priority tasks
on their plates, or figure out how to build a documentation patch yourself.
The bar of acceptance is much lower for documentation patches.
Having patched the docs myself, my own experience is that the biggest
frustration is dealing with FOP - it runs out of heap and throws exceptions
every time I try to build the whole docs. That, and of course the problem
of writing good documentation in the first place.

I've made this suggestion to you before - I even suggested that your
documentation patch would be universally appreciated (provided it
improves on the current docs of course). You have declined. In my book,
this leaves you with the "ask nicely" option. That, or you can try to pay
someone money to make your itch their priority. Erlang Solutions actually
offers that service (and no, I work for neither them nor Ericsson).

But please consider this: seq_trace was never intended as a general
implementation of Lamport clocks. It is a tracing facility, and the purpose,
as I've been trying to tell you, was to allow for tracing selectively on e.g.
a call setup sequence in a phone switch*. You shouldn't rely on it for
any other purpose. The reference to "beta status" remains there, I would
guess, because people still find the API and descriptions confusing.
I think that confusion goes much deeper than just lacking a reference
to Lamport clocks. Maybe the API should really change into something
that no longer looks very much like Lamport clocks?

That seq_trace is completely independent of the built-in tracing is also
misleading. While you can run seq_trace without tracing enabled, the
match specifications for the trace BIFs have support for seq_trace.
The seq_trace functionality is also at least partly implemented in
erl_bif_trace.c

(Actually, not requiring tracing is a great feature, not least if it's to be
used for other purposes - which it shouldn't in its present state - but
one can turn it around and note that one of the biggest drawbacks
of erlang's tracing support is that only one tracer per process at a time
can be supported. This makes it hard to make broader use of tracing.)

It might well be that the best way forward would be to provide a clean
API for Lamport clocks - and seq_trace is pretty close to that already -
and then rework the sequence tracing support (which is badly needed,
and shamefully under-used) to make clear and intuitive use of it.
This could achieve two good things: making Lamport clocks available
in Erlang (not just by accident, but as a documented and supported
feature), and making sequence tracing more intuitive to use.

But now we're looking at a slightly larger and more difficult task.

I venture a guess that this is very close to what you've been trying
to say, but your refusal to accept that the seq_trace API was not
meant to implement Lamport clocks, and might well depart from them
if it makes it more fit for purpose, reduces the chances that your
suggestions for improvement will actually be accepted. You need to
accept the whole picture.

* Just to expand a bit on what the challenge is: in a phone switch,
or similar, operating under potentially high load and with fairly
extreme availability demands, running a sequence trace is not
primarily a question of ordering concurrent events - for practical
purposes, the trace timestamps are often sufficient for that -
the big problem is selectively triggering trace output at just
the right places and turning it off as soon as it is no longer needed.
In this respect, Lamport clocks are one brilliant and convenient
technique that can be built on to solve part of the problem. The
thesis I linked to lists a few other approaches.

(And no, the failure to mention Erlang's support for real-time
tracing in that thesis is more likely to be due to internal rivalry,
or simply lack of interest in technologies that they can't use
anyway - Erlang due to past policy issues and AXE since it's
a legacy system using a weird programming language).

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com



Ulf Wiger

Jun 5, 2012, 4:19:54 AM
to Michael Turner, Erlang Questions

On 5 Jun 2012, at 09:47, Michael Turner wrote:

> All I know is: if that's what you're doing, that's
> what you should call it.

An alternative, as I expanded on in my last email, which I sent just
before reading this one, is that perhaps they should be doing something
else instead. Seq_trace is not well understood for the purpose it was
intended for. It should perhaps be reworked entirely.

If so, it does seem like a good idea to change seq_trace to 'lamport',
make it clearly a generic implementation of Lamport clocks, for
whatever purpose.

This could be done today. As it affects the VM, it should be an EEP, I think.
The initial implementation of 'lamport' could be completely based on
seq_trace, but renaming functions and changing the documentation so
that it clearly references relevant papers and illustrates how it could be
used. It ought to be perfect for e.g. "model tracing", similar to what 'et'
does (another API that is woefully under-used since the documentation
turns people away). Code could be inserted as "executable comments"
basically indicating "we are now in <this state> in the model". With such
code in place, one could do quite sophisticated visualizations of a
running system.
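A rough sketch of what such "executable comments" might look like with today's seq_trace follows. The `?IN_STATE` macro, the label 42, the state names and the collector process are all invented for illustration. The marker is a plain `seq_trace:print/2` call, so it produces output only while the calling process carries a token with a matching label and the print flag set.

```erlang
-module(model_trace_sketch).
-export([run/0]).

-define(LABEL, 42).
%% The "executable comment": reports the current model state, but only
%% while a sequential trace labelled ?LABEL is active in this process.
-define(IN_STATE(State), seq_trace:print(?LABEL, {model_state, State})).

%% Walks through three model states under a trace and returns the
%% states that the system tracer saw, in order of arrival.
run() ->
    Collector = spawn(fun() -> collect([]) end),
    OldTracer = seq_trace:set_system_tracer(Collector),
    ?IN_STATE(not_traced),               % no token yet, so no output
    seq_trace:set_token(label, ?LABEL),
    seq_trace:set_token(print, true),
    ?IN_STATE(idle),
    ?IN_STATE(dialling),
    ?IN_STATE(connected),
    seq_trace:set_token([]),             % stop tracing this process
    %% Wait until pending trace messages have reached the tracer.
    Ref = erlang:trace_delivered(self()),
    receive {trace_delivered, _, Ref} -> ok end,
    Collector ! {flush, self()},
    States = receive {states, Ss} -> Ss end,
    seq_trace:set_system_tracer(OldTracer),
    States.

%% System tracer: remembers the model states carried by print events.
collect(Acc) ->
    receive
        {seq_trace, ?LABEL, {print, _Serial, _From, _, {model_state, State}}} ->
            collect([State | Acc]);
        {flush, From} ->
            From ! {states, lists:reverse(Acc)}
    end.
```

A visualization tool could consume the same events instead of the collector shown here. How 'et' would be wired to them is left open, as in the discussion above.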

It doesn't seem like such a module ought to have a system_tracer()
function. Rather, tracing on Lamport clock events should then be
more intuitively integrated into the tracing BIFs (halfway there already).

Actually, 'et' handles seq_trace events and processes them for use in
the visualization. However, the documentation doesn't make the
connection. The seq_trace events are included in the type signatures,
but never mentioned elsewhere.

This is interesting. It seems as if 'et' could rely entirely on seq_trace.
Instead, it more or less mandates global tracing. Why?

> And speaking of what to call things: I don't think you should still be
> calling seq_trace "beta", if (as Ulf says) it originated ca 1997. I'd
> do the interface differently, but the more important thing to me now
> is stability and correctness.

I agree this is a problem, like with parameterized modules. You shouldn't
have beta or unsupported features lingering for years. Either make them
supported or remove them and possibly provide something better.

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com



Michael Turner

Jun 5, 2012, 6:03:02 AM
to Ulf Wiger, Erlang Questions
Ulf, you're saying *I'm* off-topic, in how I'm responding to you? Have
you already forgotten how YOU joined this conversation? Here it is
again:

Me:
> I started using seq_trace. It has its own documentation problems, of
> course. For example: since seq_trace is an implementation of Lamport
> clocks, you should *say* somewhere (like, in the first paragraph,
> maybe?) that it's an implementation of Lamport clocks. Don't make it
> sound like your own invention.

You:
"Actually, I believe it is more correct to say that it's based on the
"forlopp trace" thst exists in Ericsson's AXE switches (using the
proprietary language PLEX). Whether forlopp trace was based on Lamport
clocks, I couldn't say. I don't know when it was introduced in the AXE
(Lamport published his paper in 1978, the same year that the first
digital AXE was taken into service), and to what extent it was
informed by Lamport's work."

But now, when I say this:
>> We're talking about whether the seq_trace documentation
>> (especially given how confusing it is to begin with) should mention
>> that it implements Lamport clocks

You respond with this:

"*You* are talking about that, ...."

ALL of the apparent topic drift on this particular point, since your
first response to me, is from you. I've tried to stay on the point: if
you're using Lamport clocks, exposing them in an API, *admitting* (as
Kenneth Lundin did, on this list, in 2007) that you're using them,
then the documentation should say so. To save people time, if nothing
else. But especially so that people who are looking for a Lamport
clock implementation in Erlang will be able to find it easily in
searches.

> I claim that seq_trace implements Lamport clocks "by
> accident", and that it was not the original purpose, nor a complete
> solution to the problem.

(a) You have no evidence of this,

but worse --

(b) all of the evidence I've been able to find is on the other side:
open-literature publications by members of the Erlang group, showing
they are familiar, perhaps even intimate, with Lamport's work.

As to the "original purpose" of "accidentally" implementing Lamport
clocks in seq_trace, what, pray tell, WAS the original purpose of an
"accidental" implementation of them, if it wasn't basically the same
as Lamport's purpose? Just to have some intriguing pairs of numbers to
look at, in otherwise-boring traces?

> or figure out how to build a documentation patch yourself.

If in fact you represent the Ericsson point of view on this issue, I'd
be wasting my time: it'll be rejected for the reason that "we invented
that independently." I'd like to know what the Ericsson point of view
is, before I try something that might be futile.

> Also, if you would be less focused on defending your position, we
> might actually be able to agree on something.

If you would be less focused on defending a position that I'm not sure
the Erlang group actually takes, we might actually be able to agree on
something. So far, only Robert Virding has spoken up on this issue. I
pointed out to him that he's coauthor on a 1993 publication that cites
Lamport's paper -- long before the 1997-8 timeframe you give for
seq_trace requirements acquisition and implementation. He hasn't
responded since.

> ... dealing with FOP - it runs out of heap and throws exceptions
> every time I try to build the whole docs.

Yes. If I want to make a four-word change, I might have to spend
all day getting set up. That was my last experience with trying to get
the Erlang/OTP documentation toolchain working (admittedly on a
slightly unusual platform.)

I'm used to wikis: learn a little markup, and you can fix any problem
almost instantly. Right now, there's someone in Ericsson who could
make my proposed four-word change almost instantly, since they've got
it all set up already.

> [purpose] was to allow for tracing selectively on e.g.
> a call setup sequence in a phone switch*. You shouldn't rely on it for
> any other purpose.

*Boink* The beta-warning in the seq_trace documentation says:

"... the programming interface still might undergo minor adjustments
(possibly incompatible) based on feedback from users."

What little feedback I can find in public suggests that people are
only too happy to have discovered that seq_trace implements Lamport
clocks. And yet you think I shouldn't "rely" on their continued
presence, because dropping them (or changing them to something else
incompatible) is a possible "minor adjustment" to the "interface"? My
idea of a "minor adjustment" would be to define SeqTraceInfo (and
maybe the timestamp) as a record rather than a bare tuple. I would
suggest some such change, except it's really not that important
compared to backward compatibility for other users of seq_trace out
there.

> your refusal to accept that the seq_trace API was not
> meant to implement Lamport clocks, and might well depart from them

If you mean "might well depart" in the *future*, how is that a "minor
interface adjustment"? SerialInfo - the timestamp for Lamport logical
clocks - is all over the API. No, that's a "major algorithm change."

If you mean "might well depart" *now*, why did Kenneth Lundin say it
implements them? If he's wrong, why did nobody in Ericsson correct
him? Lamport clocks are Lamport clocks, regardless of "intent."

> That seq_trace is completely independent of the built-in tracing is also
> misleading.

WTF? Where did I say it was "completely independent"? Where did anyone
say it was? And it seems pretty "built-in" to me. I mean, how do you
do seq_trace (including the Lamport clock counter increments) without
a hack to Send? That's not very far above bare metal.

> ... one of the biggest drawbacks
> of erlang's tracing support is that only one tracer per process at a time
> can be supported.

*Boink*. seq_trace is *part* of "erlang's tracing support". In what
way is it limited to "one tracer *per* process at a time"? It's
limited to one *tracer process* at a time, but, if I'm reading the
documentation right (*sigh*), it can selectively trace an arbitrary
number of tokens through any number of processes they might propagate
into. If you want individual tracer processes on (say) a per-token
basis, or on a per-process basis, I guess you could just use your
single tracer process as a dispatcher for the others. A few more lines
of code, no biggie.
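
A rough sketch of that dispatcher idea, written in Python rather than Erlang so it stays language-neutral. Everything here is an assumption made for illustration: the names Dispatcher, subscribe and dispatch are made up, and the event tuples are only loosely modelled on the {seq_trace, Label, SeqTraceInfo} messages a system tracer receives. It is not seq_trace's API.

```python
from collections import defaultdict


class Dispatcher:
    """Stand-in for the single tracer process: it sees every trace event
    and forwards each one to the handlers registered for its label."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, label, handler):
        # Register a per-label "tracer" (any callable taking the event info).
        self._handlers[label].append(handler)

    def dispatch(self, event):
        # event is (tag, label, info); the tag is ignored in this sketch.
        _tag, label, info = event
        handlers = self._handlers.get(label, [])
        for handler in handlers:
            handler(info)
        return len(handlers)  # how many handlers saw the event


# Usage: two hypothetical per-token tracers behind one dispatcher.
calls, billing = [], []
d = Dispatcher()
d.subscribe(17, calls.append)
d.subscribe(42, billing.append)
d.dispatch(("seq_trace", 17, ("send", (0, 1), "p1", "p2", "setup")))
d.dispatch(("seq_trace", 42, ("send", (0, 1), "p3", "p4", "charge")))
```

In Erlang the handlers would presumably be pids and the forwarding a plain send, but the routing logic would be about this small.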

> (And no, the failure to mention Erlang's support for real-time
> tracing in that thesis is more likely to be due to internal rivalry,
> or simply lack of interest in technologies that they can't use
> anyway - Erlang due to past policy issues and AXE since it's
> a legacy system using a weird programming language).

*Double boink*. The thesis was submitted in late 2008. I see Kenneth
Lundin affirming, in mid-2007, that seq_trace implements Lamport
clocks, for an Erlang/OTP that had been open sourced for years by
then. You're saying it's possible that the author of that thesis might
not have been able to find out about seq_trace (or Lamport clocks), or
was not able to use seq_trace (or its Lamport clocks), because of
"internal rivalry" or "past policy issues"???

Things must be a lot weirder in there than I ever suspected. And yet
I'm supposed to be confident that if I submit a patch to the seq_trace
documentation informing users that it implements Lamport clocks, it's
very likely to be taken up?

Please tell me you really meant something else by that last, but just
totally garbled it.

-michael turner

Michael Turner

Jun 5, 2012, 6:42:27 AM6/5/12
to Ulf Wiger, Erlang Questions
Ulf, when I write "seq_trace implements Lamport clocks", please try to
read it as you would "This ANSI Standard C compiler implements IEEE
arithmetic." That doesn't mean "this ANSI Standard C compiler *is*
IEEE arithmetic."

If people want to *add* new API elements to seq_trace, fine. I could
go for a different way of calling stuff, and of representing the
relevant data structures. Not important, but nice.

If people want to add new features to seq_trace, fine. Maybe it could use some.

Riak uses vector clocks, which are based on scalar (Lamport) clocks.
They can be problematic, since vector clocks involve n-length vectors,
where n is the number of processes concerned. Still, if somebody
wanted to add *that* to seq_trace, I'd be cool with it.
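
To make the n-length point concrete, here is a minimal textbook-style vector clock sketch in Python. It assumes a fixed set of n processes indexed 0..n-1, and the function names are made up for illustration; it is not Riak's implementation and has nothing to do with seq_trace's internals.

```python
def vc_new(n):
    """A fresh clock for a system of n processes: one counter each."""
    return [0] * n


def vc_tick(clock, i):
    """Local event or send at process i: bump only its own entry."""
    out = list(clock)
    out[i] += 1
    return out


def vc_merge(local, received, i):
    """Receive at process i: element-wise max, then bump its own entry."""
    merged = [max(a, b) for a, b in zip(local, received)]
    merged[i] += 1
    return merged


def happened_before(a, b):
    """True if a causally precedes b (a <= b everywhere, and a != b)."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Usage with three processes: p0 sends to p1, while p2 acts independently.
p0 = vc_tick(vc_new(3), 0)
p1 = vc_merge(vc_new(3), p0, 1)
p2 = vc_tick(vc_new(3), 2)
```

Every message has to carry all n counters, which is the cost mentioned above; a scalar clock carries one number but cannot detect concurrency the way `concurrent` does here.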

But changing "seq_trace" to "lamport" is

(a) semantically wrong, since seq_trace *implements* Lamport clocks
but is not *simply* Lamport clocks,

and

(b) pragmatically wrong, since it breaks any existing code that
depends on seq_trace, and also breaks anything out there that has
implemented a module called "lamport" independently.

By the way, I can't figure out why it's called "sequential tracing".
If somebody told me, "it has to be called 'X tracing', solve for X,"
I'd say, "X = 'parallel'", not "X = 'sequential'." Does the
"sequential" refer to the fact that the (single) tracer process
receives a stream of events? OK, but ... that's not the important
thing -- precisely because those events didn't necessarily happen in
the order received, in *real* time. (Whatever that is - Lamport points
out that understanding of the theory of relativity gave him some
insights into this problem). That's the point of having logical clocks
like Lamport's - to help sort out chronology -- and, you hope --
causality -- to the extent possible when you can't rely 100% on
real-time clocks.
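
For anyone who hasn't met logical clocks, a minimal sketch of the textbook scalar (Lamport) clock rule, in Python. This is the generic rule (take the max of local and received, then add one), not a transcription of the counter-update algorithm in the seq_trace manual, and the class and method names are made up for illustration.

```python
class LamportClock:
    """Textbook scalar logical clock for one process."""

    def __init__(self):
        self.time = 0

    def send(self):
        # Sending counts as an event: advance, then stamp the message.
        self.time += 1
        return self.time

    def receive(self, stamp):
        # Jump past whatever the sender had seen, then count the receive.
        self.time = max(self.time, stamp) + 1
        return self.time


# Usage: a request/reply between processes "a" and "b".
a, b = LamportClock(), LamportClock()
t1 = a.send()        # a sends request
t2 = b.receive(t1)   # b receives request
t3 = b.send()        # b sends reply
t4 = a.receive(t3)   # a receives reply

# Suppose the tracer happened to receive the events in this scrambled order:
arrived = [
    ("b", t3, "send reply"),
    ("a", t1, "send request"),
    ("a", t4, "recv reply"),
    ("b", t2, "recv request"),
]
# Sorting on the logical stamp (process name as tie-breaker) recovers an
# order consistent with causality, with no real-time clock involved.
ordered = [what for _proc, _stamp, what in sorted(arrived, key=lambda e: (e[1], e[0]))]
```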

Which brings up another point, already raised by Scott Fritchie, here:

http://erlang.org/pipermail/erlang-questions/2007-May/026822.html

and not adequately addressed in Kenneth Lundin's reply, here:

http://erlang.org/pipermail/erlang-questions/2007-May/026827.html

Scott writes of seq_trace's real-time
(seconds/milliseconds/microseconds) timestamp:

"Inviso uses the (optional) timestamp, but that's the erlang:now()
value, and even an NTP time sync may not be good enough for a busy
system to avoid bogus event ordering. I have enough problems with NTP
on our lab machines -- it shouldn't be hard, but apparently it is,
because their NTP daemons aren't running 1 time in 8.

"Don't get me started about time drift in Linux virtual machines,
VMware and Xen both. {sigh}"

Yes. And don't get ME started about a terrestrial node drifting out of
sync with one that's orbiting at velocities where relativistic effects
start to add up.

Scott makes good points, but the documentation for seq_trace carries
no cautionary notices about relying on the real-time timestamps it
reports. This seems an odd omission to me, since seq_trace would seem
to be especially useful in cases where real-time clocks are
unreliable. You could even dispense with real-time timestamps in
seq_trace, and what's left would still have a substantial raison
d'etre.

-michael turner

Ulf Wiger

Jun 5, 2012, 7:56:53 AM6/5/12
to Michael Turner, Erlang Questions

On 5 Jun 2012, at 12:03, Michael Turner wrote:

> I've tried to stay on the point: if
> you're using Lamport clocks, exposing them in an API, *admitting* (as
> Kenneth Lundin did, on this list, in 2007) that you're using them,
> then the documentation should say so.

I have agreed that it wouldn't hurt to add that to the existing
documentation, but have also argued that one needs to
remember the *purpose* of seq_trace, and discuss whether
the current API is the right one, and what changes to the
documentation would best help users to make use of it.

It could well be that such a process would result in an API
documentation that *does not* expose Lamport clocks, or
(as I suggested), creates a separate component that exposes
Lamport clocks in a more obvious and generic way.

> To save people time, if nothing
> else. But especially so that people who are looking for a Lamport
> clock implementation in Erlang will be able to find it easily in
> searches.

If that is the purpose, then creating a separate 'lamport' module
would be a much better solution, obviously.

>
>> I claim that seq_trace implements Lamport clocks "by
>> accident", and that it was not the original purpose, nor a complete
>> solution to the problem.
>
> (a) You have no evidence of this,

I was there, remember?

With 'by accident' I mean that they could have solved it differently,
and then the API would not have exposed Lamport clocks, and
they would still have fulfilled the requirement.

I'm not saying they didn't know they were using Lamport clocks.
I'm saying it was not what the customer asked for, and the man
page, such as it is, reflects what the customer had ordered.

I guess a better way of putting it is that it was coincidental.

> As to the "original purpose" of "accidentally" implementing Lamport
> clocks in seq_trace, what, pray tell, WAS the original purpose of an
> "accidental" implementation of them, if it wasn't basically the same
> as Lamport's purpose? Just to have some intriguing pairs of numbers to
> look at, in otherwise-boring traces?

What the original purpose was is exactly what I have tried to
tell you. I won't repeat it here.

I was not, however, present when Leslie Lamport started thinking
about Lamport clocks, so I can't speak from my own experience.
But as you mentioned yourself, he traces it back to a different
problem:
http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks

"The origin of this paper was a note titled The Maintenance of
Duplicate Databases by Paul Johnson and Bob Thomas.
I believe their note introduced the idea of using message
timestamps in a distributed algorithm. […]
Because Thomas and Johnson didn't understand exactly what
they were doing, they didn't get the algorithm quite right; their
algorithm permitted anomalous behavior that essentially
violated causality. I quickly wrote a short note pointing this
out and correcting the algorithm.
[…]
"It didn't take me long to realize that an algorithm for totally ordering
events could be used to implement any distributed system.
A distributed system can be described as a particular sequential
state machine that is implemented with a network of processors.
The ability to totally order the input requests leads immediately
to an algorithm to implement an arbitrary state machine by a
network of processors, and hence to implement any distributed
system. So, I wrote this paper, which is about how to implement
an arbitrary distributed state machine. As an illustration, I used
the simplest example of a distributed system I could think of--a
distributed mutual exclusion algorithm."

My read on that: he didn't originally set out to solve the problem
of capturing sequence traces in a real-time system, but noted
after a while that his proposed solution was extremely general.

The OTP team could have set out to implement sequence
tracing, decided to do it using Lamport clocks, then realized that
the implementation could easily be generalized, and changed
the API and documentation accordingly.

This is not what happened. It could still happen.

>> or figure out how to build a documentation patch yourself.
>
> If in fact you represent the Ericsson point of view on this issue, I'd
> be wasting my time: it'll be rejected for the reason that "we invented
> that independently." I'd like to know what the Ericsson point of view
> is, before I try something that might be futile.

I don't represent Ericsson, and that is not at all what I have been saying.


> So far, only Robert Virding has spoken up on this issue. I
> pointed out to him that he's coauthor on a 1993 publication that cites
> Lamport's paper -- long before the 1997-8 timeframe you give for
> seq_trace requirements acquisition and implementation. He hasn't
> responded since.

What Robert wrote was that he was not part of the team that
implemented seq_trace. He also doesn't represent Ericsson
(anymore).

>> your refusal to accept that the seq_trace API was not
>> meant to implement Lamport clocks, and might well depart from them
>
> If you mean "might well depart" in the *future*, how is that a "minor
> interface adjustment"? SerialInfo - the timestamp for Lamport logical
> clocks - is all over the API. No, that's a "major algorithm change."

The man page speaks of minor adjustments. I argue that one should
perhaps consider a major overhaul. For B/W compatibility, it would
be better to introduce a new, better API.

From your other mail:

> But changing "seq_trace" to "lamport" is
>
> (a) semantically wrong, since seq_trace *implements* Lamport clocks
> but is not *simply* Lamport clocks,
>
> and
>
> (b) pragmatically wrong, since it breaks any existing code that
> depends on seq_trace, and also breaks anything out there that has
> implemented a module called "lamport" independently.

I didn't suggest making an incompatible change to seq_trace,
but basing the initial implementation of 'lamport' on the
seq_trace implementation.

The unprefixed namespace belongs to OTP. This matter has
been debated many times. It's not a very good setup, but
so it is.

> If you mean "might well depart" *now*, why did Kenneth Lundin say it
> implements them? If he's wrong, why did nobody in Ericsson correct
> him? Lamport clocks are Lamport clocks, regardless of "intent."

So they are, but that doesn't necessarily mean that their use should
be committed to the API and highlighted in the documentation.
Note - this is meant in general terms, as I didn't object to adding
a reference to Lamport in the current seq_trace documentation.

An example: the Erlang docs describe how the VM samples the
length of the receiver's message queue when sending a message,
and penalizes the sender with extra reductions if the queue length
exceeds a certain threshold.

This was a cool way to implement poor-man's flow control in a
single-core system, but in a many-core system, it's a pretty bad idea.
As it's been committed to the documentation, it is harder to change
now that it is arguably more of a burden than a feature.

This is an example of why it is so important to go back to the
original purpose, and ask, as Bjarne was wont to say "What's
the bloody problem?" What problem did we originally set out to
solve, and what changes might be needed to ensure that we
solve that problem well - and keep solving it well based on where
we're heading?

Some things are better not added to the reference manual.

>
>> That seq_trace is completely independent of the built-in tracing is also
>> misleading.
>
> WTF? Where did I say it was "completely independent"? Where did anyone
> say it was?

It's in the seq_trace man page. First paragraph of Description,
second sentence. I didn't claim you said it, but could have
been clearer about that - apologies.


>> ... one of the biggest drawbacks
>> of erlang's tracing support is that only one tracer per process at a time
>> can be supported.
>
> *Boink*. seq_trace is *part* of "erlang's tracing support". In what
> way is it limited to "one tracer *per* process at a time"?

Erlang's tracing support allows only one tracer (process)
per process. This is a well-known and documented limitation.

The seq_trace system_tracer allows only one per node.

A generic lamport clock implementation has no need for
such limitations. And while you could use seq_trace for
other purposes, these issues become distracting.

As it is, seq_trace is caught somewhere in the middle. It is a
fairly nice implementation of Lamport clocks, but not really
intended for, or entirely fit for, use as a generic Lamport clock
implementation. As a solution for sequence/transaction/forlopp
tracing, it is half-baked and slightly confusing. It kind of works,
but few people understand it well enough to use it. Even OTP
doesn't necessarily use it in all places where it would fit.

This indicates that the seq_trace API and docs could evolve
in either of two different directions - or be reworked, split,
and made more intuitive, addressing both issues at the
same time.


>> (And no, the failure to mention Erlang's support for real-time
>> tracing in that thesis is more likely to be due to internal rivalry,
>> or simply lack of interest in technologies that they can't use
>> anyway - Erlang due to past policy issues and AXE since it's
>> a legacy system using a weird programming language).
>
> *Double boink*. The thesis was submitted in late 2008. I see Kenneth
> Lundin affirming, in mid-2007, that seq_trace implements Lamport
> clocks, for an Erlang/OTP that had been open sourced for years by
> then. You're saying it's possible that the author of that thesis might
> not have been able to find out about seq_trace (or Lamport clocks), or
> was not able to use seq_trace (or its Lamport clocks), because of
> "internal rivalry" or "past policy issues"???

Not sure what you mean by "boink". I read it as an insult, but
perhaps it's just your way of expressing surprise?

> Things must be a lot weirder in there than I ever suspected.

Indeed. But that's definitely a side track. I just mentioned in
passing that you'd think a thesis about *real-time tracing*
(not Lamport clocks, although it mentions them, and other
techniques) should mention the two best exponents of
real-time tracing in the company. Again note the bigger
picture here - *tracing*. Global ordering of events is one
challenge. There are others.

> And yet I'm supposed to be confident that if I submit a patch to
> the seq_trace documentation informing users that it implements
> Lamport clocks, it's very likely to be taken up?

If it improves the documentation, yes, of course.
This was seconded by Gustav Simonsson, who works in the OTP
team. He even suggested where to put it.

The OTP team is by no means hampered by any policy (esp.
*past* policy) decisions not to mention Erlang.

Ulf Wiger

Jun 5, 2012, 8:08:33 AM6/5/12
to Michael Turner, Erlang Questions

On 5 Jun 2012, at 12:42, Michael Turner wrote:

> Ulf, when I write "seq_trace implements Lamport clocks", please try to
> read it as you would "This ANSI Standard C compiler implements IEEE
> arithmetic." That doesn't mean "this ANSI Standard C compiler *is*
> IEEE arithmetic."

In that particular example, adding it to the documentation is an important
service to the user. Indeed, it might even be part of the requirements for
the compiler.

> By the way, I can't figure out why it's called "sequential tracing".
> If somebody told me, "it has to be called 'X tracing', solve for X,"
> I'd say, "X = 'parallel'", not "X = 'sequential'." Does the
> "sequential" refer to the fact that the (single) tracer process
> receives a stream of events?

It is actually wrong, IMHO. It should be 'sequence tracing'.

The problem was how to dynamically trigger a trace of e.g.
one single call setup flow in a system that handled hundreds
of calls per second. What has usually been done is that you
turn on tracing on just about everything, then push a single
call through the system. Obviously, this doesn't work in a live
system, so users (used to similar tracing support in the AXE)
asked for a way to trace on a single sequence of events,
as it triggered various activity in the system.
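
The idea can be sketched as a toy simulation in Python: a message may carry a token, anything sent as a consequence of handling that message inherits the token, and only token-carrying messages are reported. The function name, the handler table and the process names are all made up for illustration; this is a sketch of the concept, not of how the Erlang VM or the AXE implements it.

```python
from collections import deque


def run(initial, handlers):
    """Deliver messages until the queue is empty.

    initial:  list of (to, payload, token); token is None when untraced.
    handlers: process name -> function(payload) -> list of (to, payload)
              describing the messages that process sends in response.
    Returns the trace: one (token, to, payload) per traced delivery.
    """
    queue = deque(initial)
    trace = []
    while queue:
        to, payload, token = queue.popleft()
        if token is not None:
            trace.append((token, to, payload))
        for next_to, next_payload in handlers[to](payload):
            # Messages sent while handling a message inherit its token.
            queue.append((next_to, next_payload, token))
    return trace


# Usage: two calls pass through the same processes, only one is traced.
handlers = {
    "switch": lambda p: [("router", p), ("billing", p)],
    "router": lambda p: [],
    "billing": lambda p: [],
}
trace = run(
    [("switch", "call-1", "T"), ("switch", "call-2", None)],
    handlers,
)
```

The untraced call costs one `is not None` check per delivery in this toy, which is the property that matters on a live system.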

The huge, overshadowing problem for people needing to trace
on a live system, is that you have to be really careful that whatever
trace you turn on doesn't kill the system you are trying to study.

This is why any such tracing support needs to be extremely well
thought out, and intuitive for the people who are expected to use
it. We are not there today, and the docs need improving, but those
who improve them need to understand what the intent of the component is
in the first place.

> Scott makes good points, but the documentation for seq_trace carries
> no cautionary notices about relying on the real-time timestamps it
> reports. This seems an odd omission to me, since seq_trace would seem
> to be especially useful in cases where real-time clocks are
> unreliable. You could even dispense with real-time timestamps in
> seq_trace, and what's left would still have a substantial raison
> d'etre.


Agreed, including the "don't get me started…" part. ;-)

Michael Turner

Jun 5, 2012, 11:14:56 AM6/5/12
to Ulf Wiger, Erlang Questions
On Tue, Jun 5, 2012 at 8:56 PM, Ulf Wiger <u...@feuerlabs.com> wrote:
>
> On 5 Jun 2012, at 12:03, Michael Turner wrote:
>
>> I've tried to stay on the point: if
>> you're using Lamport clocks, exposing them in an API, *admitting* (as
>> Kenneth Lundin did, on this list, in 2007) that you're using them,
>> then the documentation should say so.
>
> I have agreed that it wouldn't hurt to add that to the existing
> documentation, ...

Actually, you've said that it could hurt -- by committing to a
decision that you think might be premature. (Somehow. After 15 years.)

> but have also argued that one needs to
> remember the *purpose* of seq_trace, ...

I can't remember any purpose of seq_trace that's not described in the
documentation for it. And that documentation says, right there:

"Sequential tracing makes it possible to trace all messages resulting
from one initial message."

There's nothing in Lamport clocks that's in conflict with that purpose.

At one point it says:

"In the following sections Sequential Tracing and its most fundamental
concepts are described."

And in the first of those following sections, it says:

"The purpose [of the trace token component, Serial, with its Previous
and Current] is to uniquely identify each traced event within a trace
sequence and to order the messages chronologically and in the
different branches if any."

Then it goes on to describe "the algorithm" for updating those
counters. Not "an algorithm." The algorithm.

If Lamport clocks are not part and parcel of the purpose of seq_trace
according to the reference manual, then I guess you're privy to some
*secret* purpose of seq_trace.

> ... and discuss whether
> the current API is the right one, and what changes to the
> documentation would best help users to make use of it.

That need has existed for a while. *Failing* to describe seq_trace as
implementing Lamport clocks can only have worked against that need. It
means lots of people who might otherwise have had intelligent things
to say about the current API and how the documentation could be
improved have not even had seq_trace come to their attention. And
that's a lot of people. It's potentially everyone interested in Erlang
who has also had the relevant education. Is there a textbook out there
on distributed systems and MIMD parallel processing that *doesn't*
bring up Lamport clocks?

> It could well be that such a process would result in an API
> documentation that *does not* expose Lamport clocks, or
> (as I suggested), creates a separate component that exposes
> Lamport clocks in a more obvious and generic way.

Hiding useful functionality? Changing the API to break existing code?
It may well be that the sky can be turned a nice shade of green, then
yellow.

You want an "obvious and generic way" to "expose" Lamport clocks?
seq_trace already does it (except in being so subtle as to not name
them as such in the documentation.) You don't have to use those
Lamport clocks in seq_trace if you don't want to.

And I can't imagine a way to use Lamport clocks without also doing
tracing -- that's practically what they are for. Unless by "tracing"
you (narrowly) mean "written output to puzzle over for debugging
purposes."

MY purpose is testing (the "trace-driven development" of this thread,
if anything), and when my tests pass, after the tracer process has
seen messaging behavior that conforms to my spec, I don't want to see
*anything* on the output. seq_trace as it stands can do that for me.
Why should I complain?

>> To save people time, if nothing
>> else. But especially so that people who are looking for a Lamport
>> clock implementation in Erlang will be able to find it easily in
>> searches.
>
> If that is the purpose, then creating a separate 'lamport' module
> would be a much better solution, obviously.

Oh, yeah, obviously. Look, Lamport clocks exist to trace the behavior
of processes, and seq_trace can be used without using its Lamport
clocks. Why is separation better than just maintaining backward
compatibility, at this point?

>>
>>> I claim that seq_trace implements Lamport clocks "by
>>> accident", and that it was not the original purpose, nor a complete
>>> solution to the problem.
>>
>> (a) You have no evidence of this,
>
> I was there, remember?

No, I don't remember. How can I? I wasn't there. A claim that you
independently re-invented something by accident, after years of
exposure to co-workers who clearly know what that thing is (if their
book is any indication -- and you must have read that book) does not
qualify as evidence. And since below you contradict yourself, saying
first that Lamport clock behavior was a customer requirement, then
saying the OTP team had discretion over whether to implement them, I don't
have much reason to trust your memory.

> With 'by accident' I mean that they could have solved it differently,
> and then the API would not have exposed Lamport clocks, and
> they would still have fulfilled the requirement.

Without exposure of the Lamport clocks in the seq_trace API, there's
no reason to implement them in seq_trace in the first place. Don't you
understand what they do?

> I'm not saying they didn't know they were using Lamport clocks.
> I'm saying it was not what the customer asked for, and the man
> page, such as it is, reflects what the customer had ordered.

You can't have it both ways, Ulf. The man page "reflects" Lamport
clocks, so you're saying the customer was asking for Lamport clocks in
their tracer (whether they called them that or not). Lamport clocks
are cited in the first book on Erlang. Lamport clocks are part of the
implementation of Mnesia. So you're saying that the customer ordered a
certain behavior, and nobody in the Erlang group recognized that the
customer was asking for Lamport clocks? Making it an "accident"? What
sense does that make?

> I guess a better way of putting it is that it was coincidental.

Color me incredulous.

>> As to the "original purpose" of "accidentally" implementing Lamport
>> clocks in seq_trace, what, pray tell, WAS the original purpose of an
>> "accidental" implementation of them, if it wasn't basically the same
>> as Lamport's purpose? Just to have some intriguing pairs of numbers to
>> look at, in otherwise-boring traces?
>
> What the original purpose was is exactly what I have tried to
> tell you. I won't repeat it here.

You have not told me what the purpose of putting Lamport clocks in seq_trace was.

The purpose? As far as any reasonable reader should be concerned, the
document *defines* seq_trace in terms of Lamport clocks -- see my
excerpts above. That makes Lamport clocks integral to its purpose. At
worst, the reader should be prepared for changes to the *interface*,
not the implementation. You seem to think that seq_trace could have
hidden Lamport clocks, when in fact hiding them would only have
defeated their purpose in a tracing package. This makes no sense at
all.

[snip long quote from Lamport]:
> My read on that: he didn't originally set out to solve the problem
> of capturing sequence traces in a real-time system, but noted
> after a while that his proposed solution was extremely general.

So what? I'm not crediting Lamport with seq_trace, much less with AXE's
forlopp trace. I'm only seeking credit for Lamport in the seq_trace
documentation. Which he deserves. And which we all deserve, since it
makes it easier to find his work in Erlang if you already know his
work, and easier (upon reading this fact in the seq_trace
documentation) to find other people's work where it uses Lamport
clocks for various practical purposes, results that might be
implemented in Erlang, redounding to the benefit and glory of
Erlang/OTP in the process. How does anybody lose? I don't get it.

> The OTP team could have set out to implement sequence
> tracing, decided to do it using Lamport clocks, then realized that
> the implementation could easily be generalized, and changed
> the API and documentation accordingly.

First you say (above) that Lamport clocks were a customer requirement.
Now you're saying the OTP team had discretion in this matter. It can't
be both.

> This is not what happened. It could still happen.

And I could skate across hell -- when it freezes over.

[snip comments about Virding's contribution to this debate.]

> The man page speaks of minor adjustments. I argue that one should
> perhaps consider a major overhaul. For B/W compatibility, it would
> be better to introduce a new, better API.

Well, you're free to fork Erlang/OTP and try to sell people on the
result. As it is, not having Lamport clocks mentioned explicitly in
the seq_trace documentation means that there's basically no customer
base to address anyway, since hardly anybody ever found out they were
in there.

> I didn't suggest making an incompatible change to seq_trace,

You've repeatedly suggested it might be desirable. Even in this e-mail.

> but basing the initial implementation of 'lamport' on the
> seq_trace implementation.

But if Lamport clocks subsequently disappear from the seq_trace
implementation, as you seem to think should happen, you've created
backwards incompatibility. So what's the difference?

>> If you mean "might well depart" *now*, why did Kenneth Lundin say it
>> implements them? If he's wrong, why did nobody in Ericsson correct
>> him? Lamport clocks are Lamport clocks, regardless of "intent."
>
> So they are, but that doesn't necessarily mean that their use should
> be committed to the API and highlighted in the documentation.

As far as any reasonable reader should be concerned, the document
*defines* seq_trace in terms of Lamport clocks -- see my excerpts
above. At worst, the reader should be prepared for changes to the
*interface*, not the implementation.

> Note - this is meant in general terms, as I didn't object to adding
> a reference to Lamport in the current seq_trace documentation.

Yes, you did. You openly feared it would overcommit Erlang/OTP to
Lamport clocks in the implementation of seq_trace.

> An example: the Erlang docs describe how the VM samples the
> length of the receiver's message queue when sending a message,
> and penalizes the sender with extra reductions if the queue length
> exceeds a certain threshold.....
> As it's been committed to the documentation, it is harder to change
> now that it is arguably more of a burden than a feature.

If Erlang/OTP has overcommitted itself on one point, that still says
nothing about whether seq_trace also has. I don't see where it does,
and I've been using it for months. If you don't want to use
seq_trace's Lamport clocks, you don't have to. (I don't -yet.) You
will pay for it only in a counter-increment and some copying of those
counters on each trace call -- computational costs that are completely
overwhelmed, I'm sure, by everything else required to do any tracing
at all. As for using Lamport clocks independent of seq_trace *as a
debugging tool*, I see no reason why people can't, nor much reason why
they should be bothered by the fact that they are using a package
originally intended for debug traces -- it won't pose any significant
added burden on them, either computationally or in coding keystrokes,
over having a separate implementation. (If you can even *have* a
separate implementation of Lamport clocks that doesn't basically
replicate almost everything seq_trace does.)

> This is an example of why it is so important to go back to the
> original purpose, and ask, as Bjarne was wont to say "What's
> the bloody problem?" What problem did we originally set out to
> solve, and what changes might be needed to ensure that we
> solve that problem well - and keep solving it well based on where
> we're heading?

MY problem was that I needed to record a message traffic pattern, and
a way to reasonably order those messages in order to establish whether
that pattern is canonical for my purposes. seq_trace does that for me.
I bet it could also do that job for Riak's vector clocks (if it
doesn't already -- and if it does, that's yet another argument for "it
ain't broke, so don't fix it.")

> Some things are better not added to the reference manual.

Give me an argument that this is such a case. A concrete argument, not
a handwaving one.

>> WTF? Where did I say it was "completely independent"? Where did anyone
>> say it was?
>
> It's in the seq_trace man page. First paragraph of Description,
> second sentence. I didn't claim you said it, but could have
> been clearer about that - apologies.

Yes, if you want to report a bug against the documentation, go ahead.
But it might actually be true enough -- i.e., that you could remove
other tracing APIs from Erlang and seq_trace would still work just
fine. "Completely independent" might be bad writing, but not
necessarily *technically* inaccurate.

>>> ... one of the biggest drawbacks
>>> of erlang's tracing support is that only one tracer per process at a time
>>> can be supported.
>>
>> *Boink*. seq_trace is *part* of "erlang's tracing support". In what
>> way is it limited to "one tracer *per* process at a time"?
>
> Erlang's tracing support allows only one tracer (process)
> per process. This is a well-known and documented limitation.

I wouldn't know, since the documentation scared me off. As already noted.

> The seq_trace system_tracer allows only one per node.

An unfortunate limitation.

> A generic lamport clock implementation has no need for
> such limitations.

Gosh, could it be that if the documentation for seq_trace had always
said it implemented Lamport clocks, this shortcoming would have come
to light much sooner and been remedied long ago?

> And while you could use seq_trace for
> other purposes, these issues become distracting.

I guess it depends on how distractable you are. I find the relative
simplicity of seq_trace a source of consolation: for now, it's keeping
me out of something Jason described as "a special kind of hell." And I
find the existence of 2000+ publications citing Lamport's paper
encouraging as well: I can probably use seq_trace to solve a wide
variety of testing problems.

> As it is, seq_trace is caught somewhere in the middle. It is a
> fairly nice implementation of Lamport clocks, but not really
> intended for, or entirely fit for, use as a generic Lamport clock
> implementation. As a solution for sequence/transaction/forlopp
> tracing, it is half-baked and slightly confusing. It kind of works,
> but few people understand it well enough to use it. Even OTP
> doesn't necessarily use it in all places where it would fit.

Yep. Could the fact that it was never openly identified as
implementing Lamport clocks explain, in large part, why it has
remained obscure? "Oh look, behind the shed: a wheel like the one
we're working on now. Huh. What was it doing behind the shed?"

> This indicates that the seq_trace API and docs could evolve
> in either of two different directions - or be reworked, split,
> and made more intuitive, addressing both issues at the
> same time.

Oh, whatever. Just don't break the current API, OK?

[snip]
> Not sure what you mean by "boink". I read it as an insult, but
> perhaps it's just your way of expressing surprise?

Yes.

[snip]
>> And yet I'm supposed to be confident that if I submit a patch to
>> the seq_trace documentation informing users that it implements
>> Lamport clocks, it's very likely to be taken up?
>
> If it improves the documentation, yes, of course.

The issues here go straight to the question of what constitutes an
"improvement" in this case.

> This was seconded by Gustav Simonsson, who works in the OTP
> team. He even suggested where to put it.

Great. But what's the hangup with just *doing* it? Do you have to
first circulate memos among Ericsson's lawyers or something?

> The OTP team is by no means hampered by any policy (esp.
> *past* policy) decisions not to mention Erlang.

Well, I sure hope people are allowed to mention Erlang, if they work
on OTP. It could get awkward always having to say "that language"
instead.


-michael turner

Ulf Wiger

Jun 5, 2012, 11:46:47 AM
to Michael Turner, Erlang Questions

On 5 Jun 2012, at 17:14, Michael Turner wrote:

> No, I don't remember. How can I? I wasn't there.

Because I told you so previously in this thread.

Whatever factual points you may have raised in this discussion,
I've had enough of your sarcasm and insults. Feel free to
do with the information as you wish, and good luck either
submitting a patch on the documentation or getting someone
else to do it for you.

/Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com



Michael Turner

Jun 5, 2012, 12:43:04 PM
to Ulf Wiger, Erlang Questions
On Wed, Jun 6, 2012 at 12:46 AM, Ulf Wiger <u...@feuerlabs.com> wrote:

> Whatever factual points you may have raised in this discussion,
> I've had enough of your sarcasm and insults.

Nice timing, because I've had enough of the contradictions in what you write.

Speaking of which, is it really true, as you say, that seq_trace
doesn't work between nodes? In "Distribution", the documentation says:

"Sequential tracing between nodes is performed transparently .... In
order to be able to perform sequential tracing between distributed
Erlang nodes, the distribution protocol has been extended (in a
backward compatible way). An Erlang node which supports sequential
tracing can communicate with an older (OTP R3B) node but messages
passed within that node can of course not be traced."

Always helps to know what you're talking about, and have actual
evidence for what you claim. In this case, I have no evidence, not
having tried it.

-michael turner

Ulf Wiger

Jun 5, 2012, 12:55:30 PM
to Michael Turner, Erlang Questions

On 5 Jun 2012, at 18:43, Michael Turner wrote:

> Speaking of which, is it really true, as you say, that seq_trace
> doesn't work between nodes?

It's not true, and I didn't say so.

I wrote:

> However, ordering may not have been a terribly difficult problem
> to manage in AXE, as it was a single-CPU system, never distributed.
> When the OTP team implemented similar support, obviously they
> had to make something that worked in a distributed system

I wrote that the *AXE* was never distributed. As far as I know, it
still isn't. And I wrote that OTP *obviously* had to make something
that worked in a distributed system.

BR,
Ulf W
Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com



Henning Diedrich

Jun 5, 2012, 7:50:24 PM
to Erlang Questions
On 6/4/12 6:52 AM, Michael Turner wrote:
> I started using seq_trace ... an implementation of Lamport clocks. ... Just say it.

I would like to try a brief on what Lamport clocks are, based on [1] + [2].

I think that it's freakishly exciting because it obviously is an application of the Special Theory of Relativity, as Lamport mentions.

From that, I sense a promise of finally understanding something I have run up against ever since tackling Erlang: how to /think/ parallel. Special Relativity really might be the answer to that: it cancels out the physical clock that is unattainable as part of any solution at any rate. Giving it up, in essence skipping this reaching for an objective reference for events, may be the liberating move to cut the knot. In the process I also found a new favored quote: "Systems in which an event can happen before itself do not seem to be physically meaningful."


Lamport clocks are simple counters, one per process.

They help to order events occurring in parallel processes, with 'false positives' as price for the ease of the algorithm.

False positives would be events listed as having happened one after the other while "really" (causally) having occurred "at the same time".

This matters less because an order of events can never express with certainty which event /did/ affect which other event, but only which event /did not/ (a later event cannot have affected an earlier one). And this is not violated when concurrent events are ordered in sequence.

What is lost is only the accurate reflection of possibility: ordered by Lamport clocks, some events that look like they possibly could have affected some other (later) events, in reality could not.

Lamport clocks work like this:

* Every process has a counter.
* Before any 'event' it increments it.
* Also before sending a message, it increments it.
* The process sends its counter value with every message.
* The receiving process sets its counter to the greater of its own value and the received value.
* It then increments the counter.
* It assigns the resulting value to the event of receiving.
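The steps above can be sketched as a few pure functions. This is only an illustration of the rules as listed, not how seq_trace implements them; the module name lamport_sketch and its function names are made up for this example.

```erlang
-module(lamport_sketch).
-export([new/0, tick/1, send/1, recv/2]).

%% A clock is a plain integer counter, owned by exactly one process.
new() -> 0.

%% Local event: increment the counter; the new value stamps the event.
tick(Clock) -> Clock + 1.

%% Sending: increment first, then attach the new value to the message.
%% Returns {NewClock, StampToAttach}.
send(Clock) ->
    Stamp = Clock + 1,
    {Stamp, Stamp}.

%% Receiving: take the greater of the local counter and the received
%% stamp, then increment. The result stamps the receive event.
recv(Clock, Stamp) -> max(Clock, Stamp) + 1.
```

For instance, a process whose counter is 2 and which receives a message stamped 5 ends up at 6, while a process at 3 receiving a stamp of 1 ends up at 4. Two events in different processes that never exchange messages can still get different stamps, which is the 'false positive' ordering mentioned above.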


I found Lamport's original paper [2] much more accessible than usual for such papers, and the illustrations he gives (Fig. 1 - 3) are easy to grasp and illuminating. It's really mostly about the 3 1/2 first pages.

Writes Lamport: "Acknowledgment. The use of timestamps to order operations, and the concept of anomalous behavior are due to Paul Johnson and Robert Thomas."

Best,
Henning

[1] http://en.wikipedia.org/wiki/Lamport_timestamps
[2] http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf

Henning Diedrich

Jun 5, 2012, 8:40:55 PM
to Erlang Questions
I think from the pyre, this must be salvaged as the actual point I was going for:


On 6/4/12 6:52 AM, Michael Turner wrote:
> I'm now
> doing unit testing on a module as I develop it further, based on
> collecting and filtering seq_trace results

Is this a common approach, who else does this and how?

Using self-written stuff based on seq_trace, or et_collector?

> I should make a decision: do I keep building ever more sophisticated
> match filtering on top of seq_trace, undoubtedly reinventing wheel
> after wheel, or do I bite the bullet and plunge into what Jayson
> Vantuyl describes as "hell"?

One Open Source loving word regarding the 'hell' quote:

> I'm hardly the only one who should be embarrassed:
>
> "Erlang tracing is a seething pile of pain that involves reasonably
> complex knowledge of clever ports, tracing return formats, and
> specialized tracing MatchSpecs (which are really their own special
> kind of hell). The tracing mechanism is very powerful indeed, but it
> can be hard to grasp."
>
> Obviously, that kind of statement has no place in the official
> documentation of a professional product.

Honest statements like the one you are citing may be the actual luxury and strong value of Open Source efforts, which are not directly a commercial product. Possible only because they /are/ not a product. For one user, fresh air like this only creates trust and allows to (even very precisely) set expectations and alertness to shortfalls in both documentation and package.

That is not to paint the situation rosy, it would be lovely if it was better. But I appreciate the honesty, very much.

Best,
Henning

Michael Truog

Jun 5, 2012, 10:47:26 PM
to Henning Diedrich, Erlang Questions
On 06/05/2012 05:40 PM, Henning Diedrich wrote:

> One Open Source loving word regarding the 'hell' quote:
>
>> I'm hardly the only one who should be embarrassed:
>>
>> "Erlang tracing is a seething pile of pain that involves reasonably
>> complex knowledge of clever ports, tracing return formats, and
>> specialized tracing MatchSpecs (which are really their own special
>> kind of hell). The tracing mechanism is very powerful indeed, but it
>> can be hard to grasp."
>>
>> Obviously, that kind of statement has no place in the official
>> documentation of a professional product.
>
> Honest statements like the one you are citing may be the actual luxury and strong value of Open Source efforts, which are not directly a commercial product. Possible only because they /are/ not a product. For one user, fresh air like this only creates trust and allows to (even very precisely) set expectations and alertness to shortfalls in both documentation and package.
>
> That is not to paint the situation rosy, it would be lovely if it was better. But I appreciate the honesty, very much.


I agree that honesty in the documentation is much more beneficial, due to it being Open Source, rather than the duplicity that might otherwise be present in corporate documentation ("Politics-Oriented Software Development": Documentation, http://www.kuro5hin.org/story/2005/1/28/32622/4244).

The issues mentioned previously in the email thread seem to indicate that Trace-Driven Development would require more documentation and details, especially documentation that is immediately relevant to the beginner.  Putting redbug into OTP seems like it would help reduce the learning curve, just since it is the popular approach to Erlang tracing and has existed for some time.  The goal with such changes would be to make tracing simpler in a way that discourages the natural programmer reaction to information discovery during interactive debugging (of systems that are not yet live), which is inserting print/logging statements.  The tendency to use print statements is a habit from other languages and is often the common denominator when debugging, so making tracing as simple as a print statement (documentation-wise and usability-wise) seems like a sensible goal.

- Michael

Michael Turner

Jun 6, 2012, 1:05:14 AM
to Ulf Wiger, Erlang Questions
Ulf, I misinterpreted this:

> The seq_trace system_tracer allows only one [tracer (process)] per node.
>
> A generic lamport clock implementation has no need for
> such limitations.

I read this as claiming that seq_trace tracing is limited to a node,
which is (now, after a good night's sleep) rather obviously an
overstretched interpretation, at best.

That said, however, the seq_trace documentation seems to say
conflicting things on this point, and I assumed you were working from
where it says this:

"The system tracer will only receive those trace events that occur
locally within the Erlang node. To get the whole picture of a
sequential trace that involves processes on several Erlang nodes, the
output from the system tracer on each involved node must be merged
(off line)."

If the documentation *also* says the following, later --

"Sequential tracing between nodes is performed transparently. [C-node
part snipped] In order to be able to perform sequential tracing
between distributed Erlang nodes, the distribution protocol has been
extended (in a backward compatible way). An Erlang node which supports
sequential tracing can communicate with an older (OTP R3B) node but
messages passed within that node can of course not be traced."

... then it seems very contradictory to me. I haven't tried
multi-node work with seq_trace yet, so I don't quite get any of this.
Maybe the two statements can be reconciled (with some fact that's
currently unstated?)

My big concern right now: I'd like to build on seq_trace, but maybe
it's a foundation of shifting sands? It's happened to me before. I
started building a graphics interface last year on the assumption that
gs was going to be around for a while. Then I noticed I seemed to be
taking a long walk on an unfinished pier. I inquired. It turned out
that support for it had been abandoned. Without notice in the
documentation of deprecation. (That notice has since been added.)

-michael turner



> I wrote that the *AXE* was never distributed. As far as I know, it
> still isn't.

Well, then: since the logical clocks of Lamport's scheme address the
problem of *physical* clock skew between systems, if AXE was never
distributed, then it never needed to implement Lamport clocks, and
thus it never implemented the full functionality of seq_trace. So can
we finally lay that claim of simultaneous independent invention to
rest? It seems to have become some kind of assumption among Erlang
insiders, but there's no evidence for it, and (apparently) no
motivation at the time to have done it.

-michael turner

Michael Turner

Jun 6, 2012, 2:02:30 AM
to Henning Diedrich, Erlang Questions
On Wed, Jun 6, 2012 at 9:40 AM, Henning Diedrich <hd2...@eonblast.com> wrote:
> I think from the pyre, this must be salvaged as the actual point I was going
> for:
>
>
> On 6/4/12 6:52 AM, Michael Turner wrote:
>
>> I'm now
>> doing unit testing on a module as I develop it further, based on
>> collecting and filtering seq_trace results
>
>
> Is this a common approach, who else does this and how?

I'd like to know. I'm guessing that one of the big problems I've had
with understanding Ulf here is that, for him, tracing is, by
definition, a way to collect data about *anomalous* behavior. To me,
tracing is "selectively (and only on occasion) collecting data about
behavior." Period. You can do whatever you want with that data. The
behavior doesn't have to be pathological. In fact, you can use the
data as some assurance of correctness - the "occasion" can be running
a test suite. Which is to say, the "trace-driven development" of this
thread.

Is it problematic? Of course it is. I accept the possibility that some
of my seq_trace-enabled tests could pass with flying colors only
because the delays incurred by recording the trace tokens suppress
some race condition that actually does arise in production runs. But,
hey, that's testing for you: it can only prove the presence of errors,
not their absence.
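A minimal sketch of what such a trace-based test could look like, using only documented seq_trace calls (set_system_tracer/1, set_token/2). The module name, the label 42, and the echo/client processes are invented for illustration; this is not the test code discussed in this thread.

```erlang
-module(trace_test_sketch).
-export([run/0]).

-define(LABEL, 42).   % arbitrary label chosen for this sketch

run() ->
    %% The calling process acts as the system tracer for the test.
    Old = seq_trace:set_system_tracer(self()),
    Echo = spawn(fun() ->
                     receive {Pid, Msg} -> Pid ! {echo, Msg} end
                 end),
    Client = spawn(fun() ->
                       %% Start a sequential trace that records sends.
                       seq_trace:set_token(label, ?LABEL),
                       seq_trace:set_token(send, true),
                       Echo ! {self(), ping},
                       receive {echo, ping} -> ok end
                   end),
    Events = collect(2, []),
    seq_trace:set_system_tracer(Old),
    %% Order by the serial counter, then drop it from each event.
    Pattern = [{F, T, M} || {_Serial, F, T, M} <- lists:sort(Events)],
    {Client, Echo, Pattern}.

%% Gather up to N send events for our label, giving up after a second.
collect(0, Acc) -> Acc;
collect(N, Acc) ->
    receive
        {seq_trace, ?LABEL, {send, {_Prev, Serial}, From, To, Msg}} ->
            collect(N - 1, [{Serial, From, To, Msg} | Acc])
    after 1000 ->
        Acc
    end.
```

A test would then compare the returned pattern against the message exchange it expects (here, the client's ping followed by the echo's reply). As noted above, a passing run only says the pattern was observed with tracing on.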

On your point about documentation (which is basically a good one,
don't get me wrong):

Jason:
> "Erlang tracing is a seething pile of pain that involves reasonably
> complex knowledge of clever ports, tracing return formats, and
> specialized tracing MatchSpecs (which are really their own special
> kind of hell). The tracing mechanism is very powerful indeed, but it
> can be hard to grasp."

Me:
> Obviously, that kind of statement has no place in the official
> documentation of a professional product.

(Which Ulf agrees with!)

You write:
> Honest statements like the one you are citing may be the actual luxury and
> strong value of Open Source efforts, which are not directly a commercial
> product. Possible only because they /are/ not a product. For one user, fresh
> air like this only creates trust and allows to (even very precisely) set
> expectations and alertness to shortfalls in both documentation and package.
>
> That is not to paint the situation rosy, it would be lovely if it was
> better. But I appreciate the honesty, very much.

There are three issues here:
(1) how honest one should be in a given context,
(2) in what style one should couch that honesty,
(3) whether Erlang is "directly" a commercial product.

Last one first: we hear a lot in this forum about how Erlang/OTP falls
short in one way or another because of "priorities". These priorities
are those of a company, Ericsson, that has ways of making money off of
Erlang/OTP. Erlang/OTP is thus an open source effort *and* a
commercial product. You can say it's not "directly" commercial,
because they don't sell Erlang/OTP distros. But an economist would
say, "It's just the degenerate case of the shelf price being zero."
There are plenty of software products where the non-zero shelf-price
is negligible compared to all the other costs to the user -- support
contracts, buying add-ins, employing consultants to configure the
thing, and to reconfigure it, and so on.

Being a product means Erlang/OTP gets the best of both worlds in some
ways, but also the worst of both worlds in other ways. This thread
dissolved into acrimony between me and Ulf, and I believe largely
because seq_trace has been somewhat the victim of a "worst of both
worlds" scenario, of a kind I can only speculate about since the
disambiguating details are behind the corporate veil.

How that relates to the first two issues:

I appreciated Jason's wording when it was in the context of his blog.
Unfortunately, when those words are put in the documentation for a
corporation's product, they convey the message "Wow, we got some heavy
shit here -- but you need it, so you better buy one of our training
courses run either by us or some of our alumni, to help get you
through it." I.e., it comes across as neither appropriate in style nor
exactly ethical in the promotion of services. (Pretty bad advertising
either way.)

On a blog, I can take Jason's "special kind of hell" with a grain of
salt, and with real amusement. But on what amounts to a corporate
website? As long as we're extolling honesty here, let's say it: isn't
Ericsson paying the bills for erlang.org? Passages like these, in
*that* context, leave me feeling confused and disturbed. (But they
also diverted me to seq_trace, which it turns out is quite the nugget
of gold. So it's not all bad.)

While we're on appropriate style, let me point out the kind of thing I
prefer. Steve Johnson, in his yacc paper, wrote this:

"... it is better that the keywords be reserved; that is, be forbidden
for use as variable names. There are powerful stylistic reasons for
preferring this, anyway."

and, in a version long since gone, he added a footnote:

* He says, weakly.

It made me sad to see that footnote disappear, as AT&T Unix distros
became ever more official in the early 80s. The footnote had
epitomized his quietly self-deprecating wit. Removing it left him
sounding like some kind of pontificator he never was (at least as far
as I could tell from his talks at U.C. Berkeley back then, which were
always standing-room-only). Official documentation has to be official.
But it doesn't have to be overbearing. The best stuff always reminds
you, every few pages, that there's a human being back there somewhere,
who only wants to help.

By all means, write vividly and well. Especially on blogs. But if
there's a blog entry that describes an aspect of your product as "a
special kind of hell," that's a signal that you need to improve your
documentation (and maybe your product), not simply echo the sentiment.
Oh, yes, I know: "priorities." *Sigh*.

-michael turner

Ulf Wiger

Jun 6, 2012, 2:29:11 AM
to Michael Turner, Erlang Questions

On 6 Jun 2012, at 07:05, Michael Turner wrote:

> Ulf, I misinterpreted this:
>
>> The seq_trace system_tracer allows only one [tracer (process)] per node.
>>
>> A generic lamport clock implementation has no need for
>> such limitations.
>
> I read this as claiming that seq_trace tracing is limited to a node,
> which is (now, after a good night's sleep) rather obviously an
> overstretched interpretation, at best.

In Erlang/OTP parlance, the tracer is a process that receives the
trace output generated when tracing is enabled. In the normal
trace support, you can have 0 or 1 tracers per process. The trace
messages generated can very well be the result of messages
received from, or sent to, other nodes.

In order to collect and merge trace output, you can use ttb.
TTB interfaces with 'dbg', and both (as well as 'et') are able to
process seq_trace output.

TTB does have some utility functions to help with sequence
tracing, but when it merges trace output, it only looks at the
timestamp (even for seq_trace output).
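As an illustration of the alternative, here is a rough sketch of merging per-node seq_trace output by serial counter rather than by timestamp. It assumes each event is a SeqTraceInfo tuple of the documented shape {Type, {PrevSerial, ThisSerial}, From, To, Msg} and that all events belong to the same trace token; the module name seq_merge_sketch is made up, and this is not what ttb does.

```erlang
-module(seq_merge_sketch).
-export([merge/1]).

%% PerNodeLogs is a list of event lists, one per node.
%% The result is a single list ordered by the token's serial counter.
merge(PerNodeLogs) ->
    Events = lists:append(PerNodeLogs),
    lists:sort(fun(A, B) -> serial(A) =< serial(B) end, Events).

%% Extract the counter value a process put on the outgoing token.
serial({_Type, {_Prev, ThisSerial}, _From, _To, _Msg}) -> ThisSerial.
```

Events with equal or unrelated serials are still placed in some order, so the merged list is one consistent ordering, not proof of what happened first in wall-clock terms.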

Since seq_trace tokens are passed to the receiving process
in a message send, it has to work across nodes, but the *trace
output*, for both sequence trace and normal trace, is only
emitted to the respective local tracer(s) on each node.

(The trace BIFs can also use an IP port or file descriptor
as a tracer, but the principle is the same).

> That said, however, the seq_trace documentation seems to say
> conflicting things on this point, and I assumed you were working from
> where it says this:
>
> "The system tracer will only receive those trace events that occur
> locally within the Erlang node. To get the whole picture of a
> sequential trace that involves processes on several Erlang nodes, the
> output from the system tracer on each involved node must be merged
> (off line)."

There are two levels to seq_trace:

- the exchanging of tokens to maintain the counters
- the collection of possibly emitted trace data

The former works across nodes. The latter assumes the presence
of some multi-node collector and merge function. OTP offers ttb,
which handles seq_trace output, but not completely (see above).

The thing about seq_trace that is slightly different from normal trace
is that it invites the active cooperation of the processes themselves
(e.g. calling functions like seq_trace:print() and seq_trace:set_token()).

The "infection process" works transparently, once initiated. It can
also be initiated through trace patterns, making it entirely
transparent to the processes being traced.
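A small sketch of the cooperative style, with a process starting a trace and emitting an event via seq_trace:print/2 while the caller acts as system tracer. The module name, the label 17, and the message text are arbitrary choices for this example.

```erlang
-module(seq_trace_sketch).
-export([run/0]).

run() ->
    %% Register the calling process as the node's system tracer,
    %% remembering the previous one so it can be restored.
    Old = seq_trace:set_system_tracer(self()),
    Worker = spawn(fun() ->
                       %% Start a sequential trace in this process.
                       seq_trace:set_token(label, 17),
                       seq_trace:set_token(print, true),
                       %% Emit a trace event from within the code.
                       seq_trace:print(17, "worker started")
                   end),
    Result = receive
                 {seq_trace, 17, {print, _Serial, Worker, _, Info}} ->
                     {ok, Info}
             after 1000 ->
                 timeout
             end,
    seq_trace:set_system_tracer(Old),
    Result.
```

Only the worker touches the token; the tracer just receives {seq_trace, Label, SeqTraceInfo} messages, which is the second of the two levels described above.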

Actually, I have on several occasions raised the suggestion that
there should be a function to emit a trace event from within the
code. Today, that's normally done by calling some empty function
(which is also how 'et' does it). For some reason, I - and apparently
others - have overlooked that seq_trace in fact has such a function.

Michael Turner

Jun 6, 2012, 2:53:53 AM
to Michael Truog, Erlang Questions
Michael Truog:
"The tendency to use print statements is a habit from other languages
and is often the common denominator when debugging, so making tracing
as simple as a print statement (documentation-wise and usability-wise)
seems like a sensible goal."

A very sensible goal.

Early on in using seq_trace, I mixed the styles, but it was annoying
because my io:format() calls resulted in message passing in the io
module, which got traced by seq_trace when tracing was on, and my
trace collections were often trashed with distracting stuff about io.
Clearly, I needed some way to filter my traces, or I had to give up
print-statement-style tracing. But because I'd been scared off of
trace-matching, and didn't want to give up prints, I decided instead
to write something like this, in my improvised layer over seq_trace
called "ts":

format(Fmt, Args) ->
    Save = seq_trace:set_token([]), % turn off tracing, if it's on
    io:format(Fmt, Args),
    seq_trace:set_token(Save). % turn tracing back on again, if it was on

Wherever I had an io:format call in my code, I replaced the "io" with "ts".
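The same save-and-restore idea can be generalized to any call, with the token restored even if the wrapped call raises. This is a sketch only; the module name ts_sketch and the function without_trace/1 are hypothetical and not part of the "ts" layer described here.

```erlang
-module(ts_sketch).
-export([without_trace/1, format/2]).

%% Run Fun with the sequential trace token cleared, then put the
%% previous token back, whether Fun returns or throws.
without_trace(Fun) when is_function(Fun, 0) ->
    Saved = seq_trace:set_token([]),
    try
        Fun()
    after
        seq_trace:set_token(Saved)
    end.

%% Drop-in replacement for io:format/2 that stays out of the trace.
format(Fmt, Args) ->
    without_trace(fun() -> io:format(Fmt, Args) end).
```

Calling ts_sketch:format("~p~n", [Value]) then behaves like io:format/2, minus the io message traffic in the collected trace.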

This is just a stopgap. My ts can't wrap everything in OTP. It shows
only what a relative Erlang newbie (i.e., me) would do with a
non-mainstream tracer package (seq_trace) while grappling with a mix
of styles. And that I mixed styles is probably important too: I think
newbies will want to cling to the old life-preserver while they grope
for the ladder. Unless I'm out in left field yet again.

-michael turner

Ulf Wiger

Jun 6, 2012, 2:55:55 AM
to Michael Turner, Erlang Questions

On 6 Jun 2012, at 08:02, Michael Turner wrote:

> I'd like to know. I'm guessing that one of the big problems I've had
> with understanding Ulf here is that, for him, tracing is, by
> definition, a way to collect data about *anomalous* behavior. To me,
> tracing is "selectively (and only on occasion) collecting data about
> behavior." Period. You can do whatever you want with that data. The
> behavior doesn't have to be pathological. In fact, you can use the
> data as some assurance of correctness - the "occasion" can be running
> a test suite. Which is to say, the "trace-driven development" of this
> thread.

No, I don't think that.

For sure, tracing is indispensable for debugging - not least since
you can trace on exceptions - but it is equally important for
profiling, for example.

There are several uses that one could imagine for permanent
service in a live system: event triggers, memory monitoring,
etc. Unfortunately, turning on such tracers would mean that
the processes being thus monitored couldn't be traced for
purposes of debugging.

This makes the (one tracer per process) limitation more
limiting than it may at first seem, and forces most people to
reserve tracing for debugging and profiling purposes during
testing.

That Lamport clocks are useful for other things was illustrated
even by Lamport in his original paper, as he used them to
solve the mutual exclusion problem. In his later musings,
he noted that some people thought the paper was *only*
about implementing mutexes:

> Many computer scientists claim to have read it. But I have
> rarely encountered anyone who was aware that the paper
> said anything about state machines. People seem to think
> that it is about either the causality relation on events in a
> distributed system, or the distributed mutual exclusion
> problem. People have insisted that there is nothing about
> state machines in the paper. I've even had to go back and
> reread it to convince myself that I really did remember what
> I had written. (http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks)

It would be absolutely brilliant to have native and *general*
support for Lamport clocks in the Erlang VM. You think they
already exist, in seq_trace, whereas I think Lamport clocks
are only coincidentally exposed there as part of a solution
to selectively observe sequences of events in a running
system, subject to the usual limitations of the tracing sub-
system - limitations that in practice render them near-useless
for other purposes, even if OTP were to approve of such
uses, which they don't.

(In the email where Kenneth admitted that seq_trace
implemented Lamport clocks, he also wrote that you shouldn't
use them for any other purpose than that described
in the seq_trace docs.)

Changing this would require a strategic decision and some
deep thinking from the OTP team.

Granted, for *your* intended purpose, they are absolutely
fine. It's entirely in line with what they were first made for.

(In fact, Quviq's QuickCheck relies on Lamport's "happens
before" relation to reduce the state space during
random testing of concurrent Erlang code. They don't
use seq_trace, though - nor, normally, the built-in tracing.)

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com



Tim Watson

Jun 6, 2012, 4:31:11 AM
to Henning Diedrich, Erlang Questions

On 6 Jun 2012, at 01:40, Henning Diedrich wrote:

> I think from the pyre, this must be salvaged as the actual point I was going for:
>
> On 6/4/12 6:52 AM, Michael Turner wrote:
>>
>> I'm now
>> doing unit testing on a module as I develop it further, based on
>> collecting and filtering seq_trace results
>
> Is this a common approach, who else does this and how?
>

I do the 'collect intermediate data for later verification' thing a fair bit, sometimes using seq_trace, but not always.

> Using self-written stuff based on seq_trace, or et_collector?
>

I had been doing self-written stuff: common_test configuration that gets 'auto applied' during init_per_suite/init_per_testcase. I have also used the dbg module (and redbug from time to time, after discovering it by accident when using eper to look at node monitoring approaches) to simply log useful trace information during test runs, which makes it much easier to diagnose why a test is failing.

I have thought many times that the (various?) tracing facilities could make for the foundations of a nice testing framework. I am now going to play with et_collector - an application which I'd not paid any attention to TBH - and see where all the different pieces end up.

BTW, if anyone has built, or plans to build, stable and general-purpose testing tools based on the trace facilities, I will be *VERY* interested. I've been working on some code instrumentation tooling that allows for comments in production code that get transformed into actual function calls when compiled with -Dtest or whatever. One of the several uses I have for this (apart from introducing artificial time delays and the like) is to make trace calls, so et:trace_me is a good target for this kind of thing. I had planned to implement the trace calls with some additional meta-variables so you can capture process/call state easily, for example:

%% NB: DO NOT REMOVE - this *comment* is used to generate code during test runs
%%
%% BUG: arbitrary delays in shutdown notification can lead to cluster failures - bug #99741
%%
%% @trace [pid, stack]
%% @delay [random, 1000, 1000000, ms]

gen_server:reply(ReplyTo, shutting_down),
....

I have not finished thinking about how the trace facilities can/should be used as a general purpose mechanism to verify behaviour yet. Using 'ordinary' tracing is subject to data not making it to the tracer process IIRC, so seq_trace is presumably the way to go if you want to verify things. Setting trace tokens and the like can be easily done with instrumentation, minimising the need to litter production code with 'test-only' expressions. This could be done with annotations, *special* comments or configuration driven build-time transforms.
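
As a rough sketch of what 'set a token, collect, verify afterwards' could look like with plain seq_trace: the module name seq_trace_sketch, the label 42 and the ping/pong processes are all invented for illustration, and the 200 ms settle time is an arbitrary assumption because trace delivery is asynchronous. Only the seq_trace calls themselves (set_system_tracer/1, set_token/2, set_token/1) are taken from the OTP documentation.

```erlang
-module(seq_trace_sketch).
-export([run/0]).

%% Runs one ping/pong exchange with a trace token set and returns the
%% seq_trace events that reached the calling process.
run() ->
    Collector = self(),
    %% The system tracer gets every seq_trace event on this node;
    %% here it simply forwards them to the calling process.
    Tracer = spawn(fun() -> forward(Collector) end),
    OldTracer = seq_trace:set_system_tracer(Tracer),
    Echo = spawn(fun() -> receive {ping, From} -> From ! pong end end),
    spawn(fun() ->
                  %% Only this process sets the token; from here on it
                  %% travels with the messages of the exchange.
                  seq_trace:set_token(label, 42),
                  seq_trace:set_token(send, true),
                  seq_trace:set_token('receive', true),
                  Echo ! {ping, self()},
                  receive pong -> ok end,
                  %% Drop the token so the 'done' message is not traced.
                  seq_trace:set_token([]),
                  Collector ! done
          end),
    receive done -> ok after 1000 -> exit(timeout) end,
    Events = collect([]),
    %% Restore whatever system tracer was installed before (or false).
    seq_trace:set_system_tracer(OldTracer),
    exit(Tracer, kill),
    Events.

%% Forward every message received to the given process.
forward(To) ->
    receive Event -> To ! Event, forward(To) end.

%% Gather events until none has arrived for 200 ms.
collect(Acc) ->
    receive
        {seq_trace, _Label, _Info} = Event -> collect([Event | Acc])
    after 200 ->
        lists:reverse(Acc)
    end.
```

A test case could then assert on the returned list, for example that every event carries the expected label and that a send of {ping, _} appears before the send of pong. Instrumentation would only need to inject the three set_token calls, which keeps the production code free of test-only expressions.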

Further thoughts about *this* topic would be most interesting IMHO.

>> <snip>

Ulf Wiger

Jun 6, 2012, 4:31:57 AM6/6/12
to Michael Turner, Erlang Questions

On 6 Jun 2012, at 08:02, Michael Turner wrote:

> There are three issues here:
> (1) how honest one should be in a given context,

Indeed, and perhaps also in what *context* that honesty
even applies and can be considered correct.

Erlang tracing may be a "seething pile of pain" from one
perspective, yet such a statement can easily convey the
idea that tracing in Erlang is inferior to tracing in other
language environments.

Technology constantly evolves, of course, and this
could eventually become true (a bit like Erlang's error
messages were considered very helpful until Python
came along and set a new standard - eventually
forcing Erlang to improve too).

> This thread dissolved into acrimony between me and Ulf,
> and I believe largely because seq_trace has been
> somewhat the victim of a "worst of both
> worlds" scenario, of a kind I can only speculate about
> since the disambiguating details are behind the
> corporate veil.

Discussions in a public forum do not need to dissolve
into acrimony just because there is disagreement.
I tried to offer *some* insight into the background of
seq_trace, but you need to respect the fact that I am
not at liberty to discuss Ericsson-internal project details,
*especially* since I no longer work for Ericsson.

But I can share part of the blame. Offering you a public
link mentioning forlopp tracing in AXE was not intended
to mean "look, here is proof of X", but mostly as a kind
of alibi for myself - since there exists documentation on
the net mentioning these Ericsson-proprietary aspects,
I consider myself justified in mentioning them, without
violating my confidentiality obligations to my former
employer. I wasn't clear about this intent, and you read
something else into it.

Nor did I ever try to claim that Ericsson 'invented' either
tracing, forlopp tracing or Lamport clocks. I have observed,
though, in other contexts, that some of the early work at
Ericsson that led, among other things, to Erlang, actually
happened in *parallel* with much of the seminal work
by Dijkstra, Lamport, Hoare et al. Bjarne Däcker, Mike
Williams and others were discussing these things internally
already in the '70s (I have seen some cute stencil reports,
clearly typewriter-written with traditional glue used to
insert pictures, '70s discussions about the importance of
selective message reception, long before Erlang came
about). The designs of the AXE, laid down in the
early '70s, were to some (probably fairly large) extent
based on the experiences of building the AKE switch
in the '60s. Surely they were also tracking closely what
happened at e.g. Bell Labs, since they were pushing the
envelope at the same time. Ericsson was actually a
bit player until the AXE came to dominate the switch
market, and its designs had a *huge* influence internally.

http://www.ericssonhistory.com/templates/Ericsson/Article.aspx?id=2095&ArticleID=1384&CatID=362&epslanguage=EN

http://books.google.se/books?id=07NmhqkOqwsC&pg=PR14&lpg=PR14#v=onepage&q&f=false (pg 233)

Of course, even though the AXE *control system* was
single-CPU, it was a multi-processor system, and telephony
systems have formed distributed systems, using signaling
protocols, since at least the '60s - see the above book,
page 451, for example. I don't know if they did tracing across
processors or across switches even. I have no documents
describing AXE tracing to that detail, and I have not worked
on the AXE myself (although many of my colleagues at the
time had).

One technology that I long wondered if Ericsson had
invented was the Specification and Description Language,
SDL. It was so pervasive at Ericsson, and seemed to have
been used from the beginning of time. On page 267 in the
above book, the origins are traced to Kawashima 1971,
but LM Ericsson was represented in the early standardization
work at the time, and was certainly one of the early adopters,
possibly also shaping parts of the standard.

For this reason, I've adopted the habit of not excluding
the possibility that certain things were 'known' inside
large telecom companies (at least among a select few)
before they were made known to the world, perhaps by
some other institution. But to find out, you typically have
to sit down with some old guy and talk to them - maybe
they even have some old stencils archived that they
can pull out. You won't find out through normal
citation searches.

This was all before the Web. Sorting out who informed
whom this many years afterwards is not easy, and when
building products in a proprietary environment, one is
usually not in the habit of doing so (although one should).
I would not expect industrial programmers in the 90s
to be much helped by mentions of Lamport clocks, however
technically accurate. Many Ericsson programmers at the
time, though, *would* be helped by examples comparing
to AXE and MD110.

Again, this is just background. Obviously, in the age of
the Web, Open Source and NoSQL, the documentation needs
to evolve, become more transparent and relevant to the
programmers of today. For this purpose, I think the OTP
documentation process is too formal. Blogs, wikis, etc.
are an excellent complement, but community contributions
to the OTP documentation are also a good way forward.

Erlang does straddle two worlds - one that is pretty fanatical
about transparency, and one that is almost exactly the
opposite. Much of Erlang's documentation evolved to serve
the latter. The discussion about the "special kind of hell" quote
illustrates what tends to happen when perspectives collide.

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com


