[erlang-questions] Messing with heart. Port and NIF, which one is better?

Gokul Evuri

Feb 13, 2013, 2:03:34 AM

to erlang-q...@erlang.org

Hello,

I am a beginner-level Erlang user and I have two questions.

First, is there a way to save my processes if the VM crashes?

And the second question:

Is there any good argument for using a NIF instead of creating a connected process for a port?

P.S. I know that a bug in a NIF might crash the VM, unlike a port process, which can be restarted when it crashes.

Thank You,

--
Gokul Reddy Evuri,

Garrett Smith

Feb 13, 2013, 10:01:15 AM

to Gokul Evuri, erlang-q...@erlang.org

Hi Gokul,

On Wed, Feb 13, 2013 at 1:03 AM, Gokul Evuri <chandu....@gmail.com> wrote:
> Hello,
> I am beginner-level erlang user and i got two questions here,
> is there a way to save my processes if the VM crashes?

Welcome!

If you use gen_server (or the simplified equivalent e2_service -- see
http://e2project.org) you have a clear life cycle for your process:

- init is used to define the initial state for the process when it starts

- handle_xxx can be used to modify the state in response to various messages

If you want to "persist your processes", you're talking about a)
persisting the process state as it changes and b) providing a way to
load the last known state at process start.

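E.g.

    %% load_state_from_disk/1, save_state_to_disk/1 and
    %% do_something_with_msg/2 are placeholders you would write yourself.
    init(InitArgs) ->
        {ok, load_state_from_disk(InitArgs)}.

    handle_msg(Msg, _From, State) ->
        NewState = do_something_with_msg(Msg, State),
        save_state_to_disk(NewState),
        {reply, ok, NewState}.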
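
To make the pattern above concrete, here is a minimal sketch using a plain gen_server. The module name persist_counter, the counter state, and the choice of term_to_binary plus file:write_file for storage are all assumptions made for illustration; nothing here is prescribed by gen_server or e2.

```erlang
-module(persist_counter).
-behaviour(gen_server).

-export([start_link/1, add/2, stop/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Path is the (hypothetical) file holding the last saved state.
start_link(Path) ->
    gen_server:start_link(?MODULE, Path, []).

add(Pid, N) ->
    gen_server:call(Pid, {add, N}).

stop(Pid) ->
    gen_server:stop(Pid).

%% Load the last known state at process start, or begin at 0
%% when no state file exists yet.
init(Path) ->
    Count = case file:read_file(Path) of
                {ok, Bin} -> binary_to_term(Bin);
                {error, enoent} -> 0
            end,
    {ok, {Path, Count}}.

%% Write the new state to the file before replying to the caller.
handle_call({add, N}, _From, {Path, Count}) ->
    NewCount = Count + N,
    ok = file:write_file(Path, term_to_binary(NewCount)),
    {reply, NewCount, {Path, NewCount}}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Restarting the server with the same Path picks up where the previous instance left off. A more careful version would probably write to a temporary file and rename it, so that a crash in the middle of a write cannot leave a truncated state file.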

> And the second question,
> Is there any good argument to use NIF instead of creating a connected
> process for a port.

The NIF interface is appropriate for defining simple functions in C.
There are lots of 3rd party libraries where NIFs are used to plug in
long-running, multi-threaded facilities, but this seems misguided to
me.

If there's even a small chance that your C program will crash, use a C
port. Don't assume that the overhead of serializing messages over
stdio is going to ruin your application performance. The safety of an
external port is profoundly valuable.

To rant slightly, it's surprisingly common to see people readily
accept the "speed" trade off (use a NIF) over "safety" (use an
external C port). This is an unfortunate trend. I use various NIF
based libraries in production and routinely deal with Erlang crashes
as a result (I plan to rewrite the more troubling ones as C ports). So
what goes very fast (or so the thinking goes) suddenly goes to *zero*
when the VM crashes -- and stays at zero until everything is restarted
and initialized. This can be devastating in the very case where
speed/throughput is most important -- very high load.

Those are cases where a slightly slower interface can be paid off by a
*much* faster recovery on error. The benefit goes up if you can
distribute your work across multiple C ports -- the death of any one
will only impact 1/N of your system.

As an additional point, an external C port lets you write your C code
using the Very Good Pattern of crash-early. I.e. if you don't like a
particular state in your C process (e.g. assertion failure) call
exit() and die! If you're forced to be defensive because you're afraid
of killing the entire VM, you're going *backward* to the days of
bug-hiding and mysterious behavior. Why bother with Erlang in the
first place?
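
As a rough sketch of the recovery argument, the process below owns an external port and simply reopens it when the external program exits. The module name port_owner, the 2-byte length framing, and the 5 second timeout are assumptions for illustration; the external program is whatever executable path you pass in.

```erlang
-module(port_owner).
-export([start/1, call/2, stop/1]).

%% Cmd is the path to the external program. It is assumed to read and
%% write messages framed with a 2-byte length header ({packet, 2}).
start(Cmd) ->
    spawn(fun() -> loop(Cmd, open(Cmd)) end).

call(Pid, Data) ->
    Ref = make_ref(),
    Pid ! {call, self(), Ref, Data},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

open(Cmd) ->
    open_port({spawn_executable, Cmd}, [{packet, 2}, binary, exit_status]).

loop(Cmd, Port) ->
    receive
        {call, From, Ref, Data} ->
            %% Sending with ! does not raise if the port has already closed.
            Port ! {self(), {command, Data}},
            receive
                {Port, {data, Reply}} ->
                    From ! {Ref, {ok, Reply}},
                    loop(Cmd, Port);
                {Port, {exit_status, Status}} ->
                    %% The program died: report it and start a fresh one.
                    From ! {Ref, {error, {exited, Status}}},
                    loop(Cmd, open(Cmd))
            end;
        {Port, {exit_status, _Status}} ->
            %% The program died while idle: just start a fresh one.
            loop(Cmd, open(Cmd));
        stop ->
            port_close(Port)
    end.
```

A crash in the C program then costs one failed call and one process restart rather than the whole VM. Starting N such owners and spreading requests across them would give the 1/N isolation described above.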

All that said, if your extension in C is a simple side-effect-free
function or is otherwise "safe" -- a NIF will give you a simpler path
to implement it and avoids the overhead of a system process.

Garrett
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Scott Lystig Fritchie

Feb 14, 2013, 5:30:50 AM

to Garrett Smith, Gokul Evuri, erlang-q...@erlang.org

I'm starting a new'ish thread to mention a bit of experience that Basho
has had with NIFs in Riak.

Garrett Smith <g...@rre.tt> wrote:

>> And the second question, Is there any good argument to use NIF
>> instead of creating a connected process for a port.

gs> The NIF interface is appropriate for defining simple functions in C.
gs> There are lots of 3rd party libraries where NIFs are used to plugin
gs> in long running, multi-threaded facilities, but this seems misguided
gs> to me.

"Simple functions in C" is a tricky matter ... and it has gotten trickier
with the Erlang/OTP releases R15 and R16.

In R14 and earlier, it wasn't necessarily a horrible thing if you had C
code (or C++ or Fortran or ...) that executed in NIF context for half a
second or more. If your NIF was executing for that long, you knew that
you were interfering with the Erlang scheduler Pthread that was
executing your NIF's C/C++/Fortran/whatever code. That can cause some
weird delays in executing other Erlang processes, but for some apps,
that's OK.

However, with R15, the internal guts of the Erlang process scheduler
Pthreads have changed. Now, if you have a NIF that executes for even a
few milliseconds, the scheduler algorithm can get confused. Instead of
blocking an Erlang scheduler Pthread, you both block that Pthread *and*
you might cause some other scheduler Pthreads to decide incorrectly to
go to sleep (because there aren't enough runnable Erlang processes to
bother trying to schedule). Your 8/16/24 CPU core box can find itself
down to only 3 or 2 active Erlang scheduler Pthreads when there really
is more than 2-3 cores' worth of work waiting.

So, suddenly your "simple functions in C" are now "simple functions in C
that must finish execution in about 1 millisecond or less". If your C
code might take longer than that, then you must use some kind of thread
pool to transfer the long-running work away from the Erlang scheduler
Pthread. Not simple at all, alas.

-Scott

Michael Truog

Feb 14, 2013, 5:52:17 AM

to Scott Lystig Fritchie, Gokul Evuri, erlang-q...@erlang.org

These problems are what NIF native processes will solve, right? The only other alternative would be to use the async thread pool within a port driver, which may not help the schedulers and is obsoleted by native processes (not to mention the job queue per thread situation which can block on long jobs).

Rickard Green

Feb 14, 2013, 6:52:36 AM

to Erlang Questions

Native code (drivers and NIFs) has always been expected to execute for very short periods of time. The major difference is that it is more clearly documented today.

The number of problems you might run into if you run native code that does not behave well has increased, though. This is, however, not new to R15. This has been the case since R11, due to optimizations of the SMP runtime system. One such optimization was multiple run queues, introduced in R13. Regarding scheduling, R12 to R13 is where the major difference is.

In some cases we could try to fix problems caused by native code that does not behave well. This would, however, very often cause a performance penalty that always has to be paid, and it would also prevent us from implementing a lot of optimizations. In my opinion this would just be plain wrong. The VM has never been intended for scheduling of arbitrary native code. A NIF or a driver is supposed to be aware of the VM and help it, not break it.

> These problems are what NIF native processes will solve, right?

No, but "dirty schedulers" were supposed to ease implementation of things like this. Note that you already have all the primitives you need today, for example threads for NIFs and drivers.

Regards,
Rickard Green, Erlang/OTP, Ericsson AB

Max Lapshin

Feb 14, 2013, 7:05:33 AM

to Michael Truog, Gokul Evuri, Erlang-Questions Questions

Can enif_consume_timeslice help to mark a NIF-blocked pthread as still active?

Sverker Eriksson

Feb 14, 2013, 11:16:45 AM

to Max Lapshin, Erlang-Questions Questions

Max Lapshin wrote:
> Can enif_consume_timeslice help to mark nif-blocked pthread still active?
>

Creating custom threads is one way to do lengthy native work while being
nice to the VM.

Another way is to divide your work into smaller pieces and do repeated
calls from Erlang to your NIF until the work is done. This is where
enif_consume_timeslice (new in R16B) can be useful. You give the runtime
system an estimate of how much of the scheduling timeslice (about 1ms)
you have consumed, and enif_consume_timeslice tells you whether your
timeslice is exhausted.
Note that this is still co-operative scheduling. enif_consume_timeslice
does NOT do any preemptive scheduling itself; you need to voluntarily
return from your NIF in order to yield the scheduler thread to do other
work.
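
The Erlang side of that "repeated calls" approach might look like the sketch below. The module name chunked, the chunk size, and the {done, ...} / {continue, ...} return convention are assumptions for illustration. sum_chunk/3 is a pure-Erlang stand-in for the NIF: a real NIF would do a bounded amount of work in C, consult enif_consume_timeslice, and return a continuation once its timeslice is used up.

```erlang
-module(chunked).
-export([sum/1]).

%% Number of elements handled per call; an arbitrary illustrative value.
-define(CHUNK, 1000).

%% Drive the work to completion with repeated short calls, so the
%% scheduler regains control between chunks.
sum(List) ->
    loop(List, 0).

loop(Rest, Acc) ->
    case sum_chunk(Rest, Acc, ?CHUNK) of
        {done, Total} -> Total;
        {continue, Rest1, Acc1} -> loop(Rest1, Acc1)
    end.

%% Stand-in for the NIF: processes at most N elements, then returns
%% the remaining input and the partial result.
sum_chunk([], Acc, _N) -> {done, Acc};
sum_chunk(Rest, Acc, 0) -> {continue, Rest, Acc};
sum_chunk([H | T], Acc, N) -> sum_chunk(T, Acc + H, N - 1).
```

The caller just sees chunked:sum(List). The chunking is an internal detail, and each individual call into the native code stays short.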

For drivers, erl_drv_consume_timeslice will also be available in R16B.

/Sverker, Erlang/OTP, Ericsson

Garrett Smith

Feb 14, 2013, 12:21:05 PM

to Scott Lystig Fritchie, Gokul Evuri, erlang-q...@erlang.org

Thanks for highlighting this Scott.

Sean Cribbs went into some of these details last night at the Chicago
Riak meetup.

I imagine this has serious implications for the 0MQ bindings, which
are NIF implemented. I'm currently running everything under R14, so am
apparently insulated, but this overall sounds quite bad.

Have you seen this behavior in port drivers?

Garrett

Max Lapshin

Feb 14, 2013, 1:08:00 PM

to Garrett Smith, Gokul Evuri, Erlang-Questions Questions

How do you write receivers in a NIF?

Janos Hary

Feb 14, 2013, 2:11:20 PM

to erlang-q...@erlang.org

I'm rewriting a large C++ project in Erlang. Some third party libraries have
to remain in C/C++. When I chose the new implementation language for the
project a major factor was the effort needed to interface it with legacy
code. I evaluated many other languages and have to say Erlang is the winner
in this field (as well as in many others).

The libraries I'm using do REALLY long running tasks (robotic control ~30sec,
optical media burning ~30min). When I used them in a C++ GUI program I had
to put them on background threads. I also needed to solve messaging between
background and GUI thread to present progress messages. Interfacing a C++
code with another C++ code is just as tedious, if not more, than writing a
NIF driver. I don't expect more support from my programming environment than
Erlang provides. Some of our (programmers) tasks have certain complexity,
and they cannot be solved always with the simplest tools. That's why we are
well paid professionals :)

Crashing the VM from a NIF also concerns me, and I am thinking of running
this functionality in a different node. So I would keep the simplicity of the
NIF code and use Erlang's built-in distribution features to run it in a
separate VM.
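
A minimal sketch of that idea, assuming the NIF-backed module is loaded on a second node and called through rpc. The wrapper module remote_nif and its return convention are hypothetical; only rpc:call/4 and its {badrpc, Reason} result are standard.

```erlang
-module(remote_nif).
-export([call/4]).

%% Run Mod:Fun(Args) on Node, where the NIF library is assumed to be
%% loaded. If that node's VM goes down, the caller gets an error tuple
%% (typically {error, nodedown}) instead of crashing along with it.
call(Node, Mod, Fun, Args) ->
    case rpc:call(Node, Mod, Fun, Args) of
        {badrpc, Reason} -> {error, Reason};
        Result -> {ok, Result}
    end.
```

The calling node can then decide how to restart the worker node and retry, much as it would for a crashed port program.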

Janos

Michael Truog

Feb 14, 2013, 2:28:36 PM

to Garrett Smith, Gokul Evuri, erlang-q...@erlang.org

The erlzmq2 NIF uses a separate thread for the receive, and enif_send is used to deliver the incoming data, with locks in between. So I don't see why the impact would be serious: a separate thread is used, and the NIF functions do not block in the C code.

Michael Truog

Feb 14, 2013, 2:39:02 PM

to Rickard Green, Erlang Questions

I understand we already have the thread primitives for NIFs and drivers. It just bothers me that when you create your own thread pool, you put the burden on the operating system kernel scheduler, causing CPU contention. It seems like the Erlang VM would have more insight into how to schedule the NIFs, even if they are misbehaving, as long as reductions are bumped properly, or execution time is used as a way of extrapolating a reduction count that makes sense to the VM. One way of simplifying this might be to have a yield function that blocking NIF functions call frequently, such that the yield function handles reporting a reduction count which impacts scheduling. You know more about the problems than I do; I am just voicing concern about having to depend on the kernel scheduler.

Thanks,
Michael

Rickard Green

Feb 14, 2013, 6:14:48 PM

to Erlang Questions

I don't see any real benefits of this approach. When you really need threading, an enif_yield() function would not do you any good. That is the case when linking against third-party libraries doing lengthy work (that you aren't able to modify), or when doing blocking I/O. Threading should preferably only be used as a last resort. If you can divide your lengthy work into multiple chunks and do multiple calls, this is always preferred and much better. Implementing an enif_yield() that works on all platforms is also not as easy as it may seem. It can of course be implemented, but this is not a prioritized feature.