[erlang-questions] what is the "race condition bug in core Erlang" mentioned by @damienkatz?


Anton Lebedevich

Jan 11, 2013, 2:32:07 AM
to erlang-q...@erlang.org
Hello.

In the article
http://damienkatz.net/2013/01/the_unreasonable_effectiveness_of_c.html
he writes about a "race condition bug in core Erlang".

I wonder whether that bug is known, reported, or fixed. Is there a more
detailed description of it?

Regards,
Anton Lebedevich.
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Jeremy Ong

Jan 11, 2013, 2:41:29 AM
to Anton Lebedevich, Erlang
To quote the section of interest:

"At Couchbase we recently spent easily 2+ man/months dealing with a crash in the Erlang VM. We wasted a ton of time tracking down something that was in the core Erlang implementation, never sure what was happening or why, thinking perhaps the flaw was something in our own plug-in C code, hoping it was something we could find and fix. It wasn't, it was a race condition bug in core Erlang. We only found the problem via code inspection of Erlang. This is a fundamental problem in any language that abstracts away too much of the computer."

I'm also interested in what he was doing to come across this "bug."

Dmitry Demeshchuk

Jan 11, 2013, 3:23:03 AM
to Anton Lebedevich, erlang-questions
Sorry for my whole answer being too sarcastic.

He asked that on twitter several months ago: https://twitter.com/damienkatz/statuses/247816369863798785

I checked whether he actually asked anything on the mailing list (here or on erlang-bugs), but he didn't. And there doesn't seem to be a thread related to the bug he mentioned on either list. Quite a healthy approach to serious bugs, huh?



As a matter of a small off-topic, that's a weird article written by a person who's making a weird product. And, while my vision of Couchbase may be way too subjective, the post has obvious flaws:

1. No links. Well, one link to a page with strange benchmarks, two links to Couchbase site and one link to the author's twitter. Wonderful.

2. No actual proof of the statements made. "C is great", "C is high-level enough", "C has extremely professional community", and so on. Don't get me wrong, I like C, it's really great for some purposes. But there are a lot of tasks that C is not suited for (web services, admin scripts, applications with complicated business logic, and so on). High-level? Come on, it's almost a cross-processor assembler, still highly dependent on your processor architecture and OS. Professional community? Is that why so many programs have tons of memory leaks and segfaults?

3. He mentioned some "bug in core Erlang". Why no references to the mailing list, again? No explanation of what exactly was happening. Did they contact the OTP team at all, or were they just trying to fix it themselves?
--
Best regards,
Dmitry Demeshchuk

Henning Diedrich

Jan 11, 2013, 3:43:36 AM
to Anton Lebedevich, erlang-q...@erlang.org
I love how languages can be love affairs, etc.

A race condition in core Erlang, I am sure Damien will share his find.

In the meantime maybe it's worth looking at the political circumstances.

Some might note not only that you fall out of love and then you're irrationally deeply disappointed. You'll find all the feeling of understanding was an illusion in the first place. And sometimes you're even right. But that CouchDB surfed the Erlang hype, a while ago Damien was able to close a deal, and, for some reason I don't think anyone quite understood, announced that he'll reprogram it all in C.

Maybe it was an astounding proposition to program a transactional, local (!) database in the age of Big Data in a language that happens to be transactional by nature but is really made for distribution, and it's not too surprising when that premise is now abandoned. CouchDB is great for certain things, I have no doubt about that, how else could it be so successful.

But maybe one could ask, with the distribution layer of Couchbase coming from Membase [1] (which means it would still be Erlang?) but the local storage being in C (coming from memcached I believe), was there simply a necessity in play because C would be a better fit with the rest of the local part of Membase? Like after renaming things, the CouchDB principle would be reprogrammed, to replace or amend the memcached parts in Membase, to become Couchbase, so it had to be in C? And dealing only with the local storage parts, for a database, which was probably the task … I am not sure that's a natural fit for Erlang.

You wouldn't think someone could be talking himself publicly into loving his partner in a forced marriage?

Me for instance, I love C. Erlang always makes me feel stupid. Who wants that.

Henning


[1] old: http://blog.couchbase.com/why-membase-uses-erlang

Raoul Duke

Jan 11, 2013, 1:34:50 PM
to Erlang
boy, i hope they submitted a patch.

Filipe David Manana

Jan 11, 2013, 1:42:04 PM
to Raoul Duke, Erlang
On Fri, Jan 11, 2013 at 6:34 PM, Raoul Duke <rao...@gmail.com> wrote:
> boy, i hope they submitted a patch.

As Aliaksey said, they did. The commits are:

https://github.com/erlang/otp/commit/98c745ac9b3f7a74f15b99c9292204c115dcf322
(vm crash, master-pu branch)

https://github.com/erlang/otp/commit/bcbd925da0544249bd31ffab04bd65bdbdc3d10f
(just a file descriptor leak, master branch)




--
Filipe David Manana,

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

Raoul Duke

Jan 11, 2013, 1:50:15 PM
to Erlang
> As Aliaksey said, they did. The commits are:

thanks! somehow i failed to see that big explanatory email at all. i
guess i somehow deleted it w/out noticing it in the thread in my mail
reader or something.

Max Lapshin

Jan 11, 2013, 2:11:15 PM
to Filipe Manana, Erlang
Ok, my question is: why do they think that fixing the same bug in a C program written around libuv will be easier?
I've written a video streaming server in Objective-C and I can tell you how hard it is to find bugs in a single-threaded callback-style program.

So, this is a hard-to-find bug. When you fix all the easy-to-find bugs, only hard bugs are left. Erlang has all the easy bugs fixed, so only hard bugs remain. This is why we will hear more and more that some horrible bug inside the Erlang VM has been found and fixed.


Frankly speaking, I don't understand complaints about Erlang performance when half of your system is written in C. The fast path is coded in C, and Erlang has excellent capabilities for that.
I've used this approach to:
1) capture UDP packets quickly with my own hand-crafted gen_udp: https://github.com/erlyvideo/flussonic/blob/master/apps/mpegts/c_src/mpegts_udp.c
2) use direct mmap access (which is impossible in Java): https://github.com/erlyvideo/flussonic/blob/master/apps/flussonic/c_src/mmap.c
3) write USB video capture: https://github.com/erlyvideo/uvc/blob/master/c_src/uvc.c
4) write my own database for storing ticks: https://github.com/maxlapshin/stockdb/blob/master/c_src/stockdb_format.c

So I really don't understand the problem.

You are writing a database server and you use Erlang's prim_file? Sorry, are you really sure that you are writing a database server, or are you just playing with a shiny new toy?
Was it a problem to open very simple erlang sources and find that prim_file is designed for non-blocking, not for speed?

For example, when I understood that Erlang's gen_udp cannot accept 10K messages per second, I rewrote it in C in a couple of days. If I were writing a database server with high disk I/O requirements, I would definitely use blocking but fast direct file access.

But I don't see any of this reasoning in Damien's post, I see only "created cultures that focus on the wrong things". Yes, of course C created a culture which is focused only on the right things.

Raoul Duke

Jan 11, 2013, 2:16:37 PM
to Erlang
On Fri, Jan 11, 2013 at 11:11 AM, Max Lapshin <max.l...@gmail.com> wrote:
> So, this is a hard-to-find bug. When you fix all the easy-to-find bugs,
> only hard bugs are left. Erlang has all the easy bugs fixed, so only hard
> bugs remain. This is why we will hear more and more that some horrible bug
> inside the Erlang VM has been found and fixed.

(not to disagree, just wondering out loud.) these kinds of bugs always
make me wonder if they are there and hard because the design isn't
helping as much as it should. when i hear "race condition" i think
"gosh why aren't we all using dataflow?" for example, not that it
might have any bearing on this particular case. (of course the
dataflow core would have to be bug free ha ha.)

> Was it a problem to open very simple erlang sources and find that prim_file is designed for non-blocking, not for speed?

i thought the issue was an actual bug, not a design choice?

Max Bourinov

Jan 12, 2013, 4:57:35 AM
to Max Lapshin, Erlang
+1

Best regards,
Max


