Accessibility of variables using Rapache/brew.


Jouni Kallunki

Nov 11, 2009, 3:26:58 AM
to rapache
Hi all,

I've come across a puzzling problem trying to build a small web app
with RApache. It seems that if I define a variable in one R script it
is sometimes accessible in other scripts and sometimes it isn't. The
global environment accessible through globalenv() is not always the
same. Maybe there are multiple R processes running in the background?
It even happens that the variable defined in one script is unavailable
for some time and then appears again, so it doesn't seem to be related
to any cleaning of old variables or such.

I find the RApache / brew combination quite powerful and flexible, a
big thanks to Jeff Horner for the great work. Any ideas on solving
this problem or finding workarounds would really help me further.

Regards, Jouni Kallunki


Here's a small example that illustrates the issue:

I have two R-scripts which are run through brew:

/var/www/sandbox/test/one.r
-------------
...
This is file "one.r" <br>
<% a <- seq(1,10) %>
<% assign("a", a, envir=globalenv()) %>

ls() = <%= ls() %> <br>
ls( globalenv()) = <%= ls( globalenv()) %> <br>
This is a : <%= a%>
...

/var/www/sandbox/test/two.r
-------------
...
This is file "two.r" <br>
ls() = <%= ls() %> <br>
ls( globalenv()) = <%= ls( globalenv()) %> <br>

This is a : <%= a%> <br>
...

httpd.conf looks like this:
<Directory /var/www/sandbox/test>
SetHandler r-script
RHandler brew::brew
</Directory>

If I now load "http://localhost/sandbox/test/two.r" the variable "a"
is first accessible, but disappears after a while leading to an
error:

"RApache Warning/Error!!!
Error in cat(a) : object 'a' not found "

But after a while it might work fine again (without reloading the
page "one.r").

Jeffrey Horner

Nov 11, 2009, 10:35:18 AM
to rap...@googlegroups.com
Jouni Kallunki wrote on 11/11/2009 02:26 AM:
> Hi all,
>
> I've come across a puzzling problem trying to build a small web app
> with RApache. It seems that if I define a variable in one R script it
> is sometimes accessible in other scripts and sometimes it isn't. The
> global environment accessible through globalenv() is not always the
> same. Maybe there are multiple R processes running in the background?
> It even happens that the variable defined in one script is unavailable
> for some time and then appears again, so it doesn't seem to be related
> to any cleaning of old variables or such.

Hi Jouni,

What you are experiencing is normal. Your Apache web server runs
multiple child processes concurrently, and each has its own R run-time
embedded in it. If you have persistent data you'd like each request to
access, then it's best to use a database or something similar.

Jeff
--
http://biostat.mc.vanderbilt.edu/JeffreyHorner

Jouni Kallunki

Nov 12, 2009, 2:19:01 AM
to rapache
Hi Jeff,

Many thanks for your prompt answer. This clears up the issue for me. I
was suspecting something like this; apparently I should read more
about Apache.
I guess that the functions and libraries loaded at start time, as
defined in httpd.conf, are shared by all the R run-times? So this
would then be the place to load all general functions.

-Jouni

Jeffrey

Nov 12, 2009, 9:11:05 AM
to rapache


On Nov 12, 1:19 am, Jouni Kallunki <jou...@gmail.com> wrote:
> Hi Jeff,
>
> Many thanks for your prompt answer. This clears up the issue for me. I
> was suspecting something like this; apparently I should read more
> about Apache.
> I guess that the functions and libraries loaded at start time, as
> defined in httpd.conf, are shared by all the R run-times? So this
> would then be the place to load all general functions.

Yes, sort of. Here's hopefully a better explanation of Apache:

When Apache starts, it designates a parent process which then creates
a collection of child processes. Each process, including the parent,
goes through an initialization phase by parsing the config files and
executing the directives found within. Thus once you execute
'apachectl start' or '/etc/init.d/apache start' and it returns, Apache
is ready to accept and serve requests.

At this point, all of the R instances living within the Apache
processes have been initialized. But of course you know that none of
them share an environment. Each has its own global environment, so
stuffing data into one doesn't mean it's going to migrate to the
others. This is the behavior you've encountered.
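
The effect can be imitated inside a single R session. This is only a
sketch: the two environments below (child1 and child2 are made-up
names) stand in for the global environments of two Apache child
processes, which in reality live in separate operating-system
processes.

```r
# Stand-ins for the global environments of two Apache child processes.
child1 <- new.env()
child2 <- new.env()

# A request for one.r happens to be served by child 1.
assign("a", seq(1, 10), envir = child1)

# A later request for two.r may be served by either child.
# inherits = FALSE restricts the lookup to the environment itself.
seen_by_child1 <- exists("a", envir = child1, inherits = FALSE)
seen_by_child2 <- exists("a", envir = child2, inherits = FALSE)

print(seen_by_child1)  # TRUE
print(seen_by_child2)  # FALSE
```

Whether two.r finds "a" therefore depends on which child happens to
pick up the request.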

Here's the wrinkle. A child process will die off after it has served
so many requests, and it's the parent's job to start a new child
process. And that child process of course goes through the same
initialization phase, parsing the config files and executing the
directives. Completely new process with no previous state from the
dead child.

So, it's important to have in your head an idea of Apache processes
coming and going, and you never know which ones they are; that's the
parent's job. And it's your job as the programmer to solve the problem
of saving important data to an appropriate place outside of Apache and
having a mechanism to access that data concurrently with other
requests.
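
One simple way to keep data outside of Apache is a file that every
child process can read. The following is a minimal sketch, not a
recommended design: write_shared, read_shared and the file location
are hypothetical names, and the rename step is only expected to be
atomic on POSIX systems when the temporary file and the target are on
the same filesystem. If several requests write at once, the last
writer wins.

```r
# Replace the shared file in one step so that a concurrent reader sees
# either the old contents or the new contents.
write_shared <- function(value, path) {
  # Create the temporary file next to the target so the rename stays on
  # one filesystem.
  tmp <- tempfile(pattern = "shared-", tmpdir = dirname(path))
  saveRDS(value, tmp)
  ok <- file.rename(tmp, path)
  if (!ok) unlink(tmp)
  invisible(ok)
}

# Read the shared value, or return a default if nothing was stored yet.
read_shared <- function(path, default = NULL) {
  if (!file.exists(path)) return(default)
  readRDS(path)
}

# Hypothetical location. tempdir() itself is private to each R process,
# but its parent directory (usually /tmp on Linux) is visible to all.
store <- file.path(dirname(tempdir()), "rapache_shared_a.rds")

write_shared(seq(1, 10), store)  # what one.r might do
a <- read_shared(store)          # what two.r might do
print(a)
```

A database gives the same separation with proper locking and is
probably the better choice once several requests update the data.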

Of course this is a well-known problem, and other web programming
systems like PHP, Ruby on Rails, Django, and so forth have solved
it with a myriad of solutions. It's just that I haven't found a
really good solution yet for rApache, and I'm sort of hoping the
community of R developers who write web applications with rApache will
grow and arrive at an agreeable solution.

Jeff

Jouni Kallunki

Nov 13, 2009, 1:38:42 AM
to rapache
Hi Jeff,

Thanks for the clarifying answer. I think I got the idea now, and I
have fixed this in my app. Actually, this wasn't a big problem, at
least in my current program: reloading the data doesn't take too much
time as the set is small. Dealing with a large set would then require
a bit more thinking about the design to make an efficient app.
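
For a larger set, one option is to reload only once per child process
rather than on every request. A rough sketch, assuming the cache
environment is created once when the R process starts (not by the
page script itself); cache, get_data and load_small_set are made-up
names for illustration.

```r
# Created once per R process; each Apache child would have its own copy.
cache <- new.env()

# Return the cached value, calling loader() only the first time it is
# needed in this process.
get_data <- function(loader) {
  if (!exists("data", envir = cache, inherits = FALSE)) {
    assign("data", loader(), envir = cache)
  }
  get("data", envir = cache, inherits = FALSE)
}

# Hypothetical loader that counts how often it actually runs.
loads <- 0
load_small_set <- function() {
  loads <<- loads + 1
  seq(1, 10)
}

first  <- get_data(load_small_set)
second <- get_data(load_small_set)
print(loads)  # 1: the second call was served from the cache
```

Each new child process still pays the loading cost once, and the
cached copy can go stale if the underlying data changes.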


Jouni

Gergely Daróczi

Jan 5, 2011, 9:09:26 AM
to Jouni Kallunki, rap...@googlegroups.com
Hi,

I think that in such a situation the worker MPM might be a good
solution. See: http://httpd.apache.org/docs/2.0/mod/worker.html
I'm not sure, but by setting the number of server processes to 1 and
setting the ThreadsPerChild directive high enough, you should get the
same R environment every time.
I do not know how this Apache server would behave under heavy load,
though.
To deal with these difficulties, two web server instances could
be required (one for rApache, one for other services).
I hope this could work.

Gergely

Jeffrey Horner

Jan 5, 2011, 10:39:33 AM
to rap...@googlegroups.com, Jouni Kallunki
On Wed, Jan 5, 2011 at 8:09 AM, Gergely Daróczi <ger...@snowl.net> wrote:
> Hi,
>
> I think that in such a situation the worker MPM might be a good
> solution. See: http://httpd.apache.org/docs/2.0/mod/worker.html
> I'm not sure, but by setting the number of server processes to 1 and
> setting the ThreadsPerChild directive high enough, you should get the
> same R environment every time.
> I do not know how this Apache server would behave under heavy load,
> though.
> To deal with these difficulties, two web server instances could
> be required (one for rApache, one for other services).
> I hope this could work.

Yes, this would work, but there's one wrinkle: rApache implements a
global interpreter lock (GIL) when it detects that it's running in the
worker MPM. It has to because R is not threaded.

Jeff


--
http://biostat.mc.vanderbilt.edu/JeffreyHorner
