[erlang-questions] Process Dictionary vs Proplist in Web Frameworks


Ngoc Dao

Oct 28, 2009, 7:46:41 AM
to erlang-questions
Hi,

Web frameworks are normally designed as layers (web server ->
middleware -> front controller -> controller -> action -> view ->
etc.). Data needs to be passed from one layer to another. There are 2
ways to pass:
1. Proplist (environment variables)
2. Process dictionary

The 2nd way:
* Is simple and natural in Erlang because normally one HTTP request is
processed by one process.
* Makes application code which uses the framework appear to be clean,
because the application developer does not have to manually pass an ugly
proplist around and around.

I want to ask about the (memory, CPU etc.) overhead of process
dictionary, compared to proplist. Which way should be used in a web
framework?

Thanks.
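
[Editor's sketch] The two options in the question can be put side by side. This is only an illustration under stated assumptions: the module name env_passing and the functions render_explicit/1 and render_implicit/0 are hypothetical and do not come from any framework mentioned in this thread.

```erlang
-module(env_passing).
-export([render_explicit/1, render_implicit/0, demo/0]).

%% Option 1: the environment is an ordinary argument (a proplist).
render_explicit(Env) ->
    Title = proplists:get_value(title, Env, "untitled"),
    "<h1>" ++ Title ++ "</h1>".

%% Option 2: the environment is read from the calling process's dictionary.
render_implicit() ->
    Title = case get(title) of
                undefined -> "untitled";
                T -> T
            end,
    "<h1>" ++ Title ++ "</h1>".

%% Runs both styles in the current process and returns both results.
demo() ->
    Explicit = render_explicit([{title, "Hello"}]),
    put(title, "Hello"),
    Implicit = render_implicit(),
    erase(title),
    {Explicit, Implicit}.
```

Both calls produce the same string; the difference is that render_implicit/0 only works in the process where put/2 was called.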

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Max Lapshin

Oct 28, 2009, 7:49:47 AM
to Ngoc Dao, erlang-questions
Even stateful languages such as Ruby pass all request information
explicitly in a hash table. There is absolutely no use in using
mutable dictionaries.

Ngoc Dao

Oct 28, 2009, 8:01:32 AM
to Max Lapshin, erlang-questions
In Rails you write in action:
@my_var = ...

Then later to take out the variable in view:
<%= @my_var %>

I think it is not explicit passing.

Attila Rajmund Nohl

Oct 28, 2009, 8:05:18 AM
to erlang-questions
2009/10/28, Ngoc Dao <ngocda...@gmail.com>:

> Hi,
>
> Web frameworks are normally designed as layers (web server ->
> middleware -> front controller -> controller -> action -> view ->
> etc.). Data needs to be passed from one layer to another. There are 2
> ways to pass:
> 1. Proplist (environment variables)
> 2. Process dictionary
>
> The 2nd way:
> * Is Simple and natural in Erlang because normally one HTTP request is
> processed by one process.

I don't have experience with web frameworks, but some Erlang libraries
use a number of processes "behind the scenes", and callbacks might be
executed in a quite different process context, so the process
dictionary is less than useful in these cases.

> * Makes application code which uses the framework appear to be clean,
> because application developer does not have to manually pass an ugly
> proplist arround and arround.

It's the erlang way to carry that kind of stuff around, just think
about the always-present State variable in gen_* callbacks.

Rapsey

Oct 28, 2009, 8:08:05 AM
to erlang-q...@erlang.org
What you wish to use is up to you. Process dictionary will definitely be
faster, but it makes the code more difficult to manage and debug.


Sergej

Davide Marquês

Oct 28, 2009, 9:23:51 AM
to Rapsey, erlang-q...@erlang.org
>
> What you wish to use is up to you. Process dictionary will definitely be
> faster, but it makes the code more difficult to manage and debug.
>
+1!!

The process dictionary might seem appealing at first, but if you use it
you'll be unable to reason about the data flow just by looking at the
connections across your functions/modules. Debugging stops being a matter of
just looking at your code and starts requiring that you reason about the
side effects that past code interactions might have produced in the process's
current state. Good luck with that. ;)

:Davide

James Hague

Oct 28, 2009, 12:21:46 PM
to erlang-questions
> Even such stateful languages like Ruby use explicit passing of all
> request information in hash table. There is absolutely no use in
> using some mutable dictionaries.

If the data remains constant once the request is parsed, then pass it
around as a dictionary / gb_tree / etc. (or even just a list of {Key,
Value} tuples, which has the fastest look-up time for shortish lists
if you use lists:keyfind).
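
[Editor's sketch] A lookup over a list of {Key, Value} tuples with lists:keyfind/3 might look like the following. The module name kv_lookup and its lookup functions are hypothetical helpers introduced for illustration only.

```erlang
-module(kv_lookup).
-export([lookup/2, lookup/3]).

%% Look up Key in a list of {Key, Value} tuples; undefined when absent.
lookup(Key, List) ->
    lookup(Key, List, undefined).

%% Same, with a caller-supplied default.
lookup(Key, List, Default) ->
    %% lists:keyfind/3 returns the first tuple whose 1st element
    %% compares equal to Key, or false when there is none.
    case lists:keyfind(Key, 1, List) of
        {_, Value} -> Value;
        false -> Default
    end.
```

For example, kv_lookup:lookup(method, [{method, "GET"}, {path, "/"}]) returns "GET".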

Now in the case where you need to be making random reads AND WRITES to
data like this, across a number of functions, then I can totally
understand using the process dictionary. When I run into situations
like that I wish I could just write a particular module in Python and
be done with it.

Steve Davis

Oct 28, 2009, 6:18:21 PM
to erlang-q...@erlang.org
I believe the art of using the PD is to stick rigorously to "write
once".

/s

Ngoc Dao

Oct 29, 2009, 1:26:25 AM
to Steve Davis, erlang-q...@erlang.org
Yes, because of "write once", I think the trade-off of using process
dictionary for web is OK.

For proplist, is there a trick (macro?) to add syntactic sugar to put and get?

List2 = [{Key, Value} | List]
and
proplists:get_value(Key, List)
are somewhat verbose.
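
[Editor's sketch] One possible answer to the macro question, assuming nothing beyond the standard preprocessor: two small macros wrapping the expressions quoted above. The macro names PUT and GET and the module env_sugar are hypothetical.

```erlang
-module(env_sugar).
-export([demo/0]).

%% Hypothetical sugar: prepend a pair, and read a value back.
-define(PUT(Key, Value, Env), [{Key, Value} | Env]).
-define(GET(Key, Env), proplists:get_value(Key, Env)).

demo() ->
    Env0 = [],
    Env1 = ?PUT(title, "Home", Env0),
    Env2 = ?PUT(user, "ngoc", Env1),
    %% A missing key yields undefined, as with proplists:get_value/2.
    {?GET(title, Env2), ?GET(user, Env2), ?GET(missing, Env2)}.
```

The environment still has to be threaded through every call; the macros only shorten the expressions.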

Geoff Cant

Oct 29, 2009, 7:39:59 AM
to Ngoc Dao, erlang-q...@erlang.org
Ngoc Dao <ngocda...@gmail.com> writes:

> Hi,
>
> Web frameworks are normally designed as layers (web server ->
> middleware -> front controller -> controller -> action -> view ->
> etc.). Data needs to be passed from one layer to another. There are 2
> ways to pass:
> 1. Proplist (environment variables)
> 2. Process dictionary
>
> The 2nd way:
> * Is Simple and natural in Erlang because normally one HTTP request is
> processed by one process.
> * Makes application code which uses the framework appear to be clean,
> because application developer does not have to manually pass an ugly
> proplist arround and arround.
> I want to ask about the (memory, CPU etc.) overhead of process
> dictionary, compared to proplist. Which way should be used in a web
> framework?

I would strongly advise against using the process dictionary to pass
data between parts of an Erlang web framework.[1]

You say that "normally one HTTP request is processed by one process" -
by using the process dictionary you require that this is the case for
all code that uses your framework. By making this decision, you are
tying the hands of the users of your framework. They can no longer
choose the process model that suits their problem and must use a one
process per request design. In erlang it's quite common to hand requests
off to other processes, possibly on other nodes, for execution to
balance load, to move computation closer to needed resources, to turn
synchronous tasks into asynchronous ones, to alter process memory
profile, to isolate failures and so on. The use
of the process dictionary precludes all these approaches.

You also say that using the process dictionary "makes application code
which uses the framework appear to be clean". From the point of view of
the maintenance programmer, nothing could be further from the
truth.

Good erlang code does not use the process dictionary. Erlang programmers
usually only have to think about the function body and arguments to work
out what it's going to do. Sprinkling 'get' and 'put' through the code
means that an erlang programmer trying to understand your code now has
to read all the code to figure out why something is happening. The order
in which functions are called becomes important. The behaviour of
functions in other modules becomes important because now there's a
back-channel to propagate bugs, er, state between parts of the code.


As a (curmudgeonly) future web framework user, I would almost certainly
not choose a framework based on the use of non-erlangy features[2] such
as the process dictionary firstly because the code would be more difficult to
understand when someone would need to maintain it and secondly because
the process dictionary would prevent me from using a different process
model if I needed to.

Using a proplist or 'dict' or some opaque datastructure and an API
module is the natural, erlangy way to solve your problem.
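
[Editor's sketch] An "opaque data structure and an API module" could be as small as the following. The module name req_env and its functions are hypothetical; the list-based representation is just one choice and could be swapped for a dict without changing callers.

```erlang
-module(req_env).
-export([new/0, store/3, lookup/2, lookup/3]).
-export_type([env/0]).

%% Callers should treat env() as opaque and go through this module only.
-opaque env() :: [{term(), term()}].

-spec new() -> env().
new() -> [].

%% Insert or replace the value stored under Key.
-spec store(term(), term(), env()) -> env().
store(Key, Value, Env) ->
    lists:keystore(Key, 1, Env, {Key, Value}).

-spec lookup(term(), env()) -> term().
lookup(Key, Env) ->
    lookup(Key, Env, undefined).

-spec lookup(term(), env(), term()) -> term().
lookup(Key, Env, Default) ->
    case lists:keyfind(Key, 1, Env) of
        {_, Value} -> Value;
        false -> Default
    end.
```

Because the value is an ordinary term, it can be sent to another process or node along with the request.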

Good luck with your framework,
--
Geoff Cant

[1] More generally, I would strongly advise against using the process
dictionary.
[2] Ditto for parameterized modules and hierarchical module names[3]
[3] I'm already guilty of this, but promise not to do it again.

Jayson Vantuyl

Oct 29, 2009, 4:50:40 PM
to Ngoc Dao, erlang-questions
I'd say no, for a few reasons.

1. It makes it difficult if you ever do need to split stuff up into
multiple processes. As soon as you need a "helper" process, all of
your data is inaccessible.
2. It makes it difficult to debug via tracing. You can trace
function calls, but tracing changes to the process dictionary is a bit
more hairy.
3. If you want to "hide" the proplist, just pass around some sort of
"request" record. Then you don't see it unless you access the
record. Or use macros and message a "request" process for the data.
4. If you use Mnesia, transactions can be retried, so you may end up
with your mutations happening multiple times if anything happens
inside of a transaction.
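
[Editor's sketch] Point 3 above, a "request" record that hides the individual fields, might be sketched like this. The record fields and the module name req_record are assumptions made for illustration.

```erlang
-module(req_record).
-export([new/2, with_user/2, path/1, user/1]).

%% Hypothetical request record; only this module needs to know its fields.
-record(request, {method, path, user = anonymous, params = []}).

new(Method, Path) ->
    #request{method = Method, path = Path}.

%% "Updating" returns a new record; the old one is unchanged.
with_user(User, Req) ->
    Req#request{user = User}.

path(#request{path = Path}) -> Path.

user(#request{user = User}) -> User.
```

Functions that do not care about the request simply pass the record along untouched.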

I don't deny that Erlang needs some better syntax and conventions for
passing data around (preferably via namespace magic like Ruby or
Python), but the process dictionary can break quite a few
assumptions and quite a few useful patterns. This could be fine
if you're the only one writing the code, but the process
dictionary, in the wrong hands, can make many an Erlanger curse your
name.

--
Jayson Vantuyl
kag...@souja.net

Steve Davis

Oct 29, 2009, 5:13:31 PM
to erlang-q...@erlang.org

On Oct 29, 6:39 am, Geoff Cant <n...@erlang.geek.nz> wrote:

> Good erlang code does not use the process dictionary.

Note that both gen_server and wx libraries make use of the PD.

/s

Steve Vinoski

Oct 29, 2009, 6:12:42 PM
to Steve Davis, erlang-q...@erlang.org
On Thu, Oct 29, 2009 at 5:13 PM, Steve Davis <steven.cha...@gmail.com> wrote:

>
>
> On Oct 29, 6:39 am, Geoff Cant <n...@erlang.geek.nz> wrote:
>
> > Good erlang code does not use the process dictionary.
>
> Note that both gen_server and wx libraries make use of the PD.


As does Yaws. The fact that a new process spawned by a request handler
process can't access the Yaws data stored in the request handler process's
dictionary sometimes comes up as an issue, but all in all it's not a
frequent problem. Still, Klacke or I will soon be adding a function to Yaws
to copy the necessary data into the process dictionary of a new process to
allow users to avoid this problem, but again, from what I've seen the
problem doesn't come up all that often in practice.
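
[Editor's sketch] Copying dictionary entries into a newly spawned process can be done with the get/0 and put/2 BIFs. This is not the Yaws function mentioned above; spawn_with_pd/1 and the module pd_clone are hypothetical, and a real helper would probably copy only selected keys rather than everything.

```erlang
-module(pd_clone).
-export([spawn_with_pd/1, demo/0]).

%% Spawn Fun in a new process whose dictionary starts as a copy of
%% the caller's dictionary at the time of the call.
spawn_with_pd(Fun) ->
    Entries = get(),  % get/0 returns the whole dictionary as [{Key, Value}]
    spawn(fun() ->
                  lists:foreach(fun({K, V}) -> put(K, V) end, Entries),
                  Fun()
          end).

demo() ->
    put({app, title}, "Home"),
    Parent = self(),
    spawn_with_pd(fun() -> Parent ! {title, get({app, title})} end),
    receive
        {title, Title} -> Title
    after 1000 ->
        timeout
    end.
```

The copy is a snapshot: later changes in either process are not seen by the other.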

--steve

Geoff Cant

Oct 29, 2009, 8:34:31 PM
to Steve Davis, erlang-q...@erlang.org
Steve Davis <steven.cha...@gmail.com> writes:

> On Oct 29, 6:39 am, Geoff Cant <n...@erlang.geek.nz> wrote:
>
>> Good erlang code does not use the process dictionary.
>
> Note that both gen_server and wx libraries make use of the PD.
>

I think you'll find that gen_server makes almost no use of the process
dictionary. The gen_server code itself makes only one reference to the
process dictionary - to pass the pid of the parent process between the
proc_lib:init stage and the gen_server:enter_loop stage through
arbitrary intervening user code.

proc_lib only uses the process dictionary to store the initial_call and
parent pids of the process. The initial_call is only used in exit reports
and the ancestor/parent pid information is used mainly in exit reports,
though also occasionally by appmon to draw the process ancestry tree and
httpc_manager to implement the is_inets_manager() function.
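
[Editor's sketch] The proc_lib entries described above can be observed with erlang:process_info/2. The module name pd_inspect is hypothetical; the keys '$ancestors' and '$initial_call' are the ones proc_lib stores.

```erlang
-module(pd_inspect).
-export([demo/0, init/1]).

%% Entry point for the proc_lib-started process.
init(Parent) ->
    %% Acknowledge start-up so proc_lib:start/3 returns in the parent.
    proc_lib:init_ack(Parent, {ok, self()}),
    receive stop -> ok end.

demo() ->
    {ok, Pid} = proc_lib:start(?MODULE, init, [self()]),
    %% Read the other process's dictionary as a list of {Key, Value}.
    {dictionary, Dict} = erlang:process_info(Pid, dictionary),
    Pid ! stop,
    {proplists:is_defined('$ancestors', Dict),
     proplists:is_defined('$initial_call', Dict)}.
```

Using proc_lib:start/3 with init_ack avoids inspecting the process before proc_lib has written its entries.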

I don't think the minimal use of the process dictionary in either of
these cases undermines the case that good erlang code does not use the
process dictionary. Occasionally you might have to, to work around
historic code that you can't refactor for backwards compatibility
reasons, or for some particular debugging cases. But that doesn't fall
into my definition of good - just the best possible given the
constraints. New erlang code should strive to avoid the process
dictionary if at all possible.


In the case of 'wx', the process dictionary is used to pass around a
#wx_env{} structure - I'm not quite sure why this wasn't just a
parameter instead. Maybe there's something else (like port ownership?)
that makes tying the structure to the process in the process dict make
sense? I'm kinda curious about this one - if there's a good argument to
be made for using the process dictionary, I'll amend or retract my
anti-process-dictionary bigotry :)

Cheers,
--
Geoff Cant

Ngoc Dao

Oct 29, 2009, 10:06:02 PM
to Steve Vinoski, erlang-q...@erlang.org
Steve,

So Yaws will stick with process dictionary?

Actually I am creating this:
http://github.com/ngocdaothanh/ale

It only adds some routing rules and MVC conventions to make Yaws
easier to use. A typical Ale application would have a front controller
behind Yaws' back. Behind Yaws' back, either process dictionary or
proplist can be used, and I want to decide which one to use.

Ngoc.


On Fri, Oct 30, 2009 at 7:12 AM, Steve Vinoski <vin...@gmail.com> wrote:
> As does Yaws. The fact that a new process spawned by a request handler
> process can't access the Yaws data stored in the request handler process's
> dictionary sometimes comes up as an issue, but all in all it's not a
> frequent problem. Still, Klacke or I will soon be adding a function to Yaws
> to copy the necessary data into the process dictionary of a new process to
> allow users to avoid this problem, but again, from what I've seen the
> problem doesn't come up all that often in practice.
>
> --steve


Steve Vinoski

Oct 29, 2009, 11:23:26 PM
to Ngoc Dao, erlang-q...@erlang.org
On Thu, Oct 29, 2009 at 10:06 PM, Ngoc Dao <ngocda...@gmail.com> wrote:

> Steve,
>
> So Yaws will stick with process dictionary?
>

To the best of my knowledge, yes.


> Actually I am creating this:
> http://github.com/ngocdaothanh/ale
>
> It only adds some routing rules and MVC conventions to make Yaws
> easier to use.


What parts are difficult to use?


> A typical Ale application would have a front controller
> behind Yaws' back. Behind Yaws' back, either process dictionary or
> proplist can be used, and I want to decide which one to use.
>

Klacke of course has the final say when it comes to Yaws, but I'm pretty
certain there are no plans to move it away from using the process
dictionary.

--steve

Ngoc Dao

Oct 30, 2009, 12:51:21 AM
to Steve Vinoski, erlang-q...@erlang.org
> What parts are difficult to use?

If Yaws aims at the application level, then it lacks the feel of a web
application framework like Sinatra, Merb, Rails etc.

If Yaws aims at the web server level, then everything is OK:
* The .yaws file part is like Apache CGI
* The appmods part is like Java Servlet
* Based on the powerful bare bone Yaws provides, higher level
framework like Nitrogen can be *easily* constructed

Ngoc


Max Lapshin

Oct 30, 2009, 1:39:04 AM
to Ngoc Dao, Steve Vinoski, erlang-q...@erlang.org
Damn, I've been programming Rails for many years and I really don't see any
reason to use global hash tables!
Explicitly passed data is a very, very convenient and predictable way to
glue SEPARATED layers.

The use of the PD in gen_server is its own internal business; I haven't
seen this PD from outside and I'm not going
to think about it. But using the PD to pass variables from the controller
layer to the view layer in Erlang is a VERY, VERY bad
way of programming. It is a sufficient reason not to use such software
at all, because it is a sign of very bad quality in the other code.

Explicit passing of data generated in controller, to view is:
a) clear to understand
b) clear to hook and modify, cache, etc.
c) testable
d) separable (you may move templates from Erlang to another application server)

PD is:
a) unclear, what is required and what is passed to template
b) unmodifiable and unhookable. Business logic migrates to templates
c) very, very hard to test

Rapsey

Oct 30, 2009, 1:41:25 AM
to erlang-q...@erlang.org
Speaking of nitrogen, they use the process dictionary quite a bit. I think
the PD is kind of like goto. If you use it wisely, you won't have any
problems. If you abuse it, you're making a huge mess.


Sergej

Ngoc Dao

Oct 30, 2009, 2:49:18 AM
to Max Lapshin, Steve Vinoski, erlang-q...@erlang.org
Max,

I agree that using PD arbitrarily is bad. But using PD restrictively
is controllable.

A web processing process has some unique properties:
* It is normally short-lived, you want as high req/s as possible right?
* PD are normally propagated one-way. If you want to spawn a new
process, you can clone the PD.

Take Ale for an example, see:
http://github.com/ngocdaothanh/ale/blob/master/src/ale.erl

* Things in PD are set only ONCE (well, app_add_head and app_add_js
are accumulative).
* Keys in PD are normally literal. You can easily track back where a
value is set.
* Keys in PD are namespaced, i.e: {app, title}. Things of Yaws, things
of Ale, things of the app in PD are separated.
* When you want to put a thing in PD from a controller, you use
ale:app(Key, Value). When you want to get the thing out from a view,
you use ale:app(Key). You don't spam the PD arbitrarily, you use the
designated API.

This way, PD is like environment variables of a Linux shell session.
The environment is not for sharing mutable things, it is used for
setting things.
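
[Editor's sketch] A write-once, namespaced wrapper over the process dictionary, in the spirit of the rules listed above, could look like this. It is not the code from ale.erl; the module pd_once and the already_set error are assumptions made for illustration.

```erlang
-module(pd_once).
-export([app/1, app/2]).

%% Store Value under the namespaced key {app, Key}, but only once.
%% A second write to the same key raises an error instead of
%% silently overwriting. (A stored value of undefined is
%% indistinguishable from "not set" in this simple version.)
app(Key, Value) ->
    case get({app, Key}) of
        undefined ->
            put({app, Key}, Value),
            ok;
        _Existing ->
            erlang:error({already_set, Key})
    end.

%% Read the value back; undefined if it was never set.
app(Key) ->
    get({app, Key}).
```

Enforcing the write-once rule inside the API means a violation fails loudly at the second write rather than showing up later as a wrong value in a view.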

Ngoc

Max Lapshin

Oct 30, 2009, 2:51:42 AM
to Ngoc Dao, Steve Vinoski, erlang-q...@erlang.org
This bad practice prevents any clean way of injecting into the chain
of HTML rendering.
When all the data required for rendering is passed explicitly, it is
possible to intercept it in filters
and change it.
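
[Editor's sketch] With an explicit environment, a filter is just a function from one environment to the next, and a chain of filters is a fold. The module filter_chain and the two example filters are hypothetical.

```erlang
-module(filter_chain).
-export([run/2, demo/0]).

%% Apply each filter in order, threading the environment through.
run(Filters, Env) ->
    lists:foldl(fun(Filter, Acc) -> Filter(Acc) end, Env, Filters).

demo() ->
    %% Filter 1: replace the title with its upper-case form.
    Upcase = fun(Env) ->
                     Title = proplists:get_value(title, Env, ""),
                     lists:keystore(title, 1, Env,
                                    {title, string:uppercase(Title)})
             end,
    %% Filter 2: add a new entry.
    AddLang = fun(Env) -> [{lang, "en"} | Env] end,
    run([Upcase, AddLang], [{title, "home"}]).
```

Each filter can be tested on its own by calling it with a hand-built environment.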

Ngoc Dao

Oct 30, 2009, 3:07:16 AM
to Max Lapshin, Steve Vinoski, erlang-q...@erlang.org
In order to intercept, I think the explicit thing you need to know is
the processing flow. For example:
browser -> yaws -> front controller -> app-wide filter ->
controller-wide filter -> action -> view -> layout -> front controller
-> yaws -> browser

The environment may be transferred implicitly inside the PD along the way.

I don't know if this is relevant to this discussion, but I am creating
this app which uses Ale which uses Yaws which uses PD:
http://github.com/ngocdaothanh/khale

Nitrogen is another example of the (successful?) use of PD.

Ngoc

Max Lapshin

Oct 30, 2009, 3:09:25 AM
to Ngoc Dao, Steve Vinoski, erlang-q...@erlang.org
On Fri, Oct 30, 2009 at 10:07 AM, Ngoc Dao <ngocda...@gmail.com> wrote:
> In order to intercept, I think the explicit thing you need to know is
> the processing flow. For example:
> browser -> yaws -> front controller -> app-wide filter ->
> controller-wide filter -> action -> view -> layout -> front controller
> -> yaws -> browser

Damn, you look at mutable Ruby and try to implement the same thing in
immutable Erlang.
after_filter in Rails can modify anything, including internal hidden
instance variables.
What can you do with set-once variables in Erlang?

Dan Gudmundsson

Oct 30, 2009, 3:14:45 AM
to Geoff Cant, erlang-q...@erlang.org
On Fri, Oct 30, 2009 at 1:34 AM, Geoff Cant <n...@erlang.geek.nz> wrote:
>
> In the case of 'wx', the process dictionary is used to pass around a
> #wx_env{} structure - I'm not quite sure why this wasn't just a
> parameter instead. Maybe there's something else (like port ownership?)
> that makes tying the structure to the process in the process dict make
> sense? I'm kinda curious about this one - if there's a good argument to
> be made for using the process dictionary, I'll amend or retract my
> anti-process-dictionary bigotry :)

My argument for having the 'env' in the process dictionary was that it
is static,
and that most of the calls in wx are prefixed with a 'TheObject' as
well, so I didn't
want the user to keep sending both the 'env' and 'This' in all calls:
wxFrame:setTitle(Env, TheObject, "My window Title"),
vs C++
TheObject->setTitle("My Window Title"),

I could have put the 'env' in each object reference, but that would
have increased every
object reference in Erlang by two words. Also, for static calls which
don't work on an
object I would then have to add an 'env' parameter, which isn't there
in the C++ lib.

/Dan

Ngoc Dao

Oct 30, 2009, 3:32:00 AM
to Max Lapshin, Steve Vinoski, erlang-q...@erlang.org
> What can you do with set-once variables in Erlang?

For example:
* In the controller you set-once an article in PD:
http://github.com/ngocdaothanh/khale/blob/master/removable/article/c_article.erl
* In the view you take out the article and render:
http://github.com/ngocdaothanh/khale/blob/master/removable/article/v_article_show.erl

The philosophy behind "set-once" is that you need to pass things in
only one way, from a layer to layers behind it. For this purpose PD is
a perfect medium, and its implicitness would make the code less
verbose. As a framework developer, I think it would make app
developers happy because their code would be less verbose.

When you want to start Eshell, would you want to type "erl" or the
full path to "erl"?

Ngoc

Max Lapshin

Oct 30, 2009, 3:38:36 AM
to Ngoc Dao, Steve Vinoski, erlang-q...@erlang.org
You are not listening to me and not answering my questions.
HOW are you going to implement filters that modify the data that is
going to be rendered, with PD?

Ngoc Dao

Oct 30, 2009, 3:50:09 AM
to Max Lapshin, Steve Vinoski, erlang-q...@erlang.org
I have already answered:
* See the flow
* Use the designated API

I feel that you want to implicitly say that you find no use in PD and
it should be removed from the next version of Erlang.

Attila Rajmund Nohl

Nov 1, 2009, 7:13:39 AM
to erlang-questions
2009/10/29, Geoff Cant <n...@erlang.geek.nz>:
>[...] Erlang programmers

> usually only have to think about the function body and arguments to work
> out what its going to do. Sprinkling 'get' and 'put' through the code
> means that an erlang programmer trying to understand your code now has
> to read all the code to figure out why something is happening. The order
> in which functions are called becomes important. The behaviour of
> functions in other modules becomes important because now there's a
> back-channel to propagate bugs, er, state between parts of the code.

This is not related to the process dictionary. If the programmer
implements e.g. a handle_call in gen_server 'A' (or anything that's
called from that handle_call), he has to make sure that he doesn't
call gen_server 'B' if there's a chance that 'B' called 'A' first -
otherwise there would be a deadlock. In this case also a lot of other
code gets important and it's fairly common that a handle_call (or a
function called from handle_call) gets implemented...

Like it or not, the circumstances in which a function is called
matter - this is a limitation in Erlang's "functional
languageness", but actually this is necessary to do anything useful
with the language.

The process dictionary could be great for environment-like variables,
which are only set once, but used in very many places and it's very
inconvenient to pass around one more parameter. They don't show up in
function traces - but they do show up in the output of
erlang:process_info(), where e.g. the gen_server state is not shown,
even though it would be dead useful.

Tamas Nagy

Nov 4, 2009, 3:49:00 AM
to Attila Rajmund Nohl, Tamas Nagy, erlang-questions
Hi Attila,

Well if you would like to see the gen_server's state you can always do
a sys:get_status/1.
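
[Editor's sketch] Both inspection routes mentioned in the last two messages can be tried on a tiny gen_server. The module status_demo and the started_by key are hypothetical. The status term returned by sys:get_status/1 carries the process dictionary as the first element of its final list; the server state is nested further inside that list, and its exact layout is deliberately not matched here.

```erlang
-module(status_demo).
-behaviour(gen_server).
-export([demo/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Minimal gen_server: the state is whatever was passed to start/3.
init(State) ->
    put(started_by, demo),   % an environment-like, set-once entry
    {ok, State}.

handle_call(_Request, _From, State) -> {reply, ok, State}.

handle_cast(_Msg, State) -> {noreply, State}.

demo() ->
    {ok, Pid} = gen_server:start(?MODULE, 42, []),
    %% Route 1: read the dictionary directly.
    {dictionary, Dict} = erlang:process_info(Pid, dictionary),
    %% Route 2: ask the process for its status via the sys protocol.
    {status, Pid, {module, _}, [PDict | _]} = sys:get_status(Pid),
    ok = gen_server:stop(Pid),
    {proplists:get_value(started_by, Dict),
     proplists:get_value(started_by, PDict)}.
```

Printing the full result of sys:get_status/1 in a shell shows where the state (42 here) ends up for the OTP release in use.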

Regards,
Tamas

Tamas Nagy
Erlang Training & Consulting
http://www.erlang-consulting.com
