I Enjoyed the Meeting Tonight, Starting Over in R


Bill

Sep 29, 2010, 11:32:57 PM
to Houston R users
I'm sitting here waiting on a data download, so I thought that in the
meantime I could answer at least one of the questions that came up at the
meeting tonight (I'll try to answer the others tomorrow). But before
I do that, I just wanted to say that I enjoyed our discussions
tonight. It's great to put faces to the names on a message
board.

The "start over with R" comment that I made tonight came from a
discussion that Ross Ihaka (one of the originators of R) had
recently. Here are some references:

http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/

The link above has the 5-line function (if you count the "}" as a
line) that may or may not return a value assigned outside of the
function. For example:

f = function() {
  if (runif(1) > .5)
    x = 10
  x
}

y <- 7
for (i in 1:10) {
  y <- f()
  cat("y=", y, " \n")
}

Is y 7 or 10? As you might guess, variations of this situation can
easily show up (usually, right before a deadline).



More links to R issues:

http://www.stat.auckland.ac.nz/~ihaka/
http://www.stat.auckland.ac.nz/~ihaka/downloads/Compstat-2008.pdf
http://xianblog.wordpress.com/2010/09/06/insane/
http://www.stat.columbia.edu/~cook/movabletype/archives/2010/09/ross_ihaka_to_r.html

Here are some links to Patrick Burns's sites:
http://www.burns-stat.com/
http://www.portfolioprobe.com/blog/

Here is Burns's R Inferno that was discussed:
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf



Bill

ed.goodwin

Sep 30, 2010, 11:51:23 AM
to Houston R users
Interestingly enough, I'm running v2.11.1 and I get an error when I
try to run that code.

y= 10
y= 10
Error in f() : object 'x' not found

I'm not quite sure why that is (makes no sense in light of the
discussion we were having yesterday). Anyone else getting this?


Hadley Wickham

Sep 30, 2010, 12:03:41 PM
to ed.goodwin, Houston R users
I think this slightly modified example shows what's going on a little better:

f <- function() {
  x <- 10
  if (runif(1) > .5) x else y
}

y <- 5
f()

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Bill

Sep 30, 2010, 12:52:30 PM
to Houston R users
Ed,

Oops, I was playing around with x before I did the cut/paste. That
y<-7 should have been x<-7

f = function() {
  if (runif(1) > .5)
    x = 10
  x
}

x <- 7
for (i in 1:10) {
  y <- f()
  cat("y=", y, " \n")
}



Bill

Roberto Bertolusso

Sep 30, 2010, 1:50:15 PM
to Bill, Houston R users
Hi Bill,

as Ed pointed out, the function will trigger an error each time runif(1)
returns a value <= .5, as x does not get assigned and, therefore, is a
non-existent object. This is actually a very good thing, because it
means R is doing the right thing. To prove your point, you need to make
sure x is defined before calling the function, as below

> f =function() {
+ if (runif(1) > .5)
+ x = 10
+ x
+ }
> y<-7
> x=4
> for (i in 1:10) {
+ y<-f()
+ cat("y=", y, " \n")
+ }
y= 4
y= 10
y= 10
y= 4
y= 10
y= 4
y= 4
y= 4
y= 4
y= 4

This way the function returns the global value when x is not
assigned inside the function (runif is <= .5) and the local value when
it is. You are right about this double standard in R, but you have to
admit that this short function is carelessly coded. You should make sure
that x has a value (for example, by initializing x at the beginning of
the function, or with an else clause). In fact this function, provided
nobody has assigned a value to the "global" x before calling it, should
trigger the error if the "if" does not succeed (no global value to
return). So you solve your problem not by using unique names for your
variables inside functions and elsewhere in the code, but by making sure
those variables are *always* (not randomly) initialized inside your
function (even to an NA - not available - value). Note that with this
simple modification (x=NA), the function never returns 4.

> f =function() {
+ x=NA
+ if (runif(1) > .5)
+ x = 10
+ x
+ }
>
> y<-7
> x=4
> for (i in 1:10) {
+ y<-f()
+ cat("y=", y, " \n")
+ }
y= 10
y= NA
y= NA
y= 10
y= NA
y= NA
y= 10
y= NA
y= 10
y= NA
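
The else-clause variant mentioned above could be sketched as follows. This is only an illustration, not code from the thread: the name f_else is made up here, and the global x = 4 is kept from the transcript to show that it is never picked up.

```r
# Hypothetical variant of f: x is bound on every path through the function,
# so the lookup never falls through to a global x.
f_else <- function() {
  if (runif(1) > .5) {
    x <- 10
  } else {
    x <- NA
  }
  x
}

x <- 4                               # global x, as in the transcript above
results <- replicate(20, f_else())   # call the function 20 times
print(results)                       # only 10 and NA values, never 4
```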

Consider this code:
> f =function() {
+ a = r + 3
+ a
+ }
> f()
Error in f() : object 'r' not found

"r" is not assigned a value inside the function, but I try to assign
"a" the value of r + 3, which triggers an error. If I now pre-define r
(globally) before calling the function:
> r=2
> f()
[1] 5

it works. I find it very difficult to blame R for this.

Best
Roberto

Bill

Sep 30, 2010, 2:20:18 PM
to Houston R users
Roberto,

Great analysis. And, I basically agree with what you said.
However, as we discussed last night, I have a problem with the fact
that R will allow a global variable into a local scope without the
programmer specifically calling for it. It is dangerous for me to
assume that each programmer "forced" each variable in a function to
"local". So, I typically look through their function to list their
variables, looking for problems. As you said, I could run their
function in a clean environment without global variables to see if it
produces an error, but I hadn't thought of that before. I'll try
that in the future.
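
One way to try the clean-environment check described above, as a rough sketch: re-point a copy of the function at the base environment so that nothing in the workspace is visible to it. The name f_clean and the explicit stats::runif (needed because the stats package is not on the lookup path from baseenv()) are choices made for this illustration.

```r
f <- function() {
  if (stats::runif(1) > .5)
    x = 10
  x
}

x <- 7                              # workspace variable that normally hides the problem

f_clean <- f
environment(f_clean) <- baseenv()   # free variables now resolve in base only

set.seed(1)
outcome <- vapply(1:50, function(i) {
  tryCatch({ f_clean(); "ok" },
           error = function(e) conditionMessage(e))
}, character(1))

print(table(outcome))               # tabulates "ok" against the lookup error
```

If every call comes back "ok" over many repetitions, the function is probably not leaning on the workspace; any lookup error points at a variable worth inspecting.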


Bill

Hadley Wickham

Sep 30, 2010, 2:48:42 PM
to Bill, Houston R users
> Great analysis.   And, I basically agree with what you said.
> However, as we discussed last night, I have a problem with the fact
> that R will allow a global variable into a local scope without the
> programmer specifically calling for it.    It is dangerous for me to
> assume that each programmer "forced" each variable in a function to
> "local".    So, I typically look through their function to list their
> variables, looking for problems.   Just as you said, I could run their
> function in a clean environment without global variables to see if it
> provides an error, but I hadn't thought of that before....I'll try
> that in the future.

I don't think it's a weakness of R - I think it's a strength. This
paper has lots of examples showing how useful lexical scope can be: R.
Gentleman and R. Ihaka. Lexical scope and statistical computing.
Journal of Computational and Graphical Statistics, 9: 491–508, 2000.

Hadley

Roberto Bertolusso

Sep 30, 2010, 2:57:56 PM
to Bill, Houston R users
You are right, Bill, about R not protecting against functions badly
programmed by others. Fortunately, the most popular packages have been
out long enough to have been debugged by the community.
I think this kind of error should pop up early (it seems very plausible
that someone in the vast community will always have a clean environment
when using the functions, get the error, and report it). In our case,
Ed had one and reported it to the list! :)
With very new packages, of course, you never know...

Best
Roberto

Hadley Wickham

Sep 30, 2010, 2:59:26 PM
to Roberto Bertolusso, Bill, Houston R users
> I think this kind of errors should pop-up early (it seems very plausible
> that always someone of the vast community will have a clean environment
> when using the functions and get the error, and report it). In our case,
> Ed had one and reported it to the list! :)
> Very new packages, of course, you never know...

For quite some time now, one of the automated tests run as part of the
package checking process has caught this. See checkUsage() in the
codetools package if you want to do it yourself.
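
A minimal sketch of calling checkUsage() by hand, assuming the codetools package is installed (it ships with standard R distributions). The function g and its undefined variable undefined_rate are invented for this illustration, and the exact wording of the report may differ between versions.

```r
library(codetools)

# Hypothetical function that uses a variable it never defines.
g <- function() {
  a <- undefined_rate + 3
  a
}

# Collect the messages instead of printing them, so they can be inspected.
notes <- character()
checkUsage(g, name = "g", report = function(msg) notes <<- c(notes, msg))
print(notes)   # expected to mention undefined_rate
```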

Roberto Bertolusso

Sep 30, 2010, 3:16:40 PM
to Hadley Wickham, Bill, Houston R users
Better yet! It's even taken care of automatically! So, Bill, you can relax
on this!

Best
Roberto

Bill

Sep 30, 2010, 3:34:34 PM
to Houston R users
Hadley,

We haven't met, but I have run across your name many times.
Hopefully, we'll meet in the future. Your ggplot() package is
great.

I'll look at the "Ihaka paper".

I haven't used checkUsage() before. I just downloaded it as part of
the codetools package and tried it on a few functions, including the
"f" function above. I may be missing something, but I think the
"f" function above still might be a problem (x is set to global in
real time).


Bill

John Garvin

Oct 1, 2010, 3:14:19 PM
to Houston R users
I think we're talking about two different things. Hadley: I believe
Bill's complaint is not about lexical scope itself, but the fact that
a variable might be locally or globally scoped depending on _runtime
control flow_. (In the example, it's more than just the fact that
there are local and global variables; a random number actually
determines whether x is local or global.) In other words, scope is
lexical, but declaration is dynamic; if an assignment to x occurs,
then x is local; otherwise, x is global.
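
The point can be made deterministic by replacing the random draw with a flag. In this sketch (the function name where_is_x and its argument are made up for illustration), the same symbol x refers to a local or a global variable depending on which branch ran:

```r
x <- 7   # global x

where_is_x <- function(assign_local) {
  if (assign_local) x <- 10
  # Is there a binding for x in this call's own frame (no parent lookup)?
  is_local <- exists("x", envir = environment(), inherits = FALSE)
  list(value = x, is_local = is_local)
}

str(where_is_x(TRUE))    # value 10, is_local TRUE
str(where_is_x(FALSE))   # value 7,  is_local FALSE
x                        # still 7: the local assignment never touched the global
```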

As you might imagine, this presents a difficulty in compiling R. ;-)
In fact, a non-trivial part of RCC is devoted to figuring out whether
variable uses are local, global, or ambiguous.

It's interesting to note how Python gets around this problem. In
Python, if there's an assignment to x anywhere in a function, then
it's a local variable throughout the function, whether the assignment
actually happens or not. Translate Bill's example to Python--

import random

def f():
    if random.random() > 0.5:
        x = 10
    return x

x = 7
for i in range(10):
    f()

--and whenever the assignment is skipped you get an UnboundLocalError.

John

John Garvin

Oct 1, 2010, 3:21:22 PM
to Houston R users
Oh yeah, I forgot to add: thank you for pointing out checkUsage; it
looks very interesting. It looks like it returns with no complaints on
Bill's example, though, even with all=TRUE.

John

Bill

Oct 2, 2010, 3:33:28 PM
to Houston R users
Hadley,

John is right when he says there are two issues here. And, I'm
complaining about both of them.

By far, my biggest complaint is the creation of more than one
environment at run time. As John said, how do you compile (and
efficiently debug) that? My other complaint is about programmers
using global variables inside a function without specifically calling
that out. I agree with Roberto above when he said "....but you have
to admit that this short function is carelessly coded.....".

The article you suggested has some good examples of what I'm whining
about:

http://www.stat.auckland.ac.nz/~ihaka/downloads/lexical.pdf

The first few examples show the problem, however on page 5 there is an
actual function (boot) that might be borrowed and used in the real
world:

boot <- function(x, statistic, bootreps) {
  n <- length(x)
  sapply(1:bootreps,
         function(dummy)
           statistic(sample(x, n, replace = TRUE)))
}

As the article says, ".....When sapply is invoked it will evaluate its
arguments in the environment of the calling function....". It then
says, "....Now, when the anonymous function is evaluated we encounter
the symbols statistic, x, and n which are free variables.....". I
realize that these free (do you call them locally global?) variables
are only free in their environment, however this is a case where a
function is DESIGNED to use variables that are global in the local
scope of the function. Back to Roberto's comment "....but you have to
admit that this short function is carelessly coded....". I normally
wouldn't call the above code "carelessly coded", however after
agreeing with that comment the first time, for the exact same reason
it's really hard for me not to agree with it a second time. To boil
my argument down to a few words: lexical elegance doesn't buy
the groceries.
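
To see which n and x the anonymous function inside boot actually picks up, here is a small check (a sketch only; the global n and the data in obs are invented for the illustration). Passing length as the statistic reports the size of each resample, which reveals the n that was used:

```r
boot <- function(x, statistic, bootreps) {
  n <- length(x)
  sapply(1:bootreps,
         function(dummy) statistic(sample(x, n, replace = TRUE)))
}

n <- 1                           # a global n that could be picked up by mistake
obs <- c(2, 4, 6, 8, 10)         # made-up data

sizes <- boot(obs, length, 20)   # size of each of 20 resamples
print(unique(sizes))             # 5: n came from boot's own frame, not the global n

set.seed(42)
means <- boot(obs, mean, 200)    # ordinary use: 200 bootstrap means
print(range(means))
```

In this case the free variables resolve in the frame of the enclosing call to boot, so a same-named global does not interfere as long as boot itself defines them.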

Don't get me wrong. I like R very much. I'm just whining about my
lack of control over someone else's code. I realize that sounds
more than a little pushy, however, unless we are all aware of this
extra freedom in R, using the same old scoping assumptions from C,
Basic, Fortran, etc, can get us all into strange situations.

As I said at the meeting, when I borrow a function from someone else,
I go looking for variable names. After I generate that list, I make
it a point to not use those names, just in case. My tradeoff is the
risk of higher debug times versus the convenience of preferred
variable names. The money always wins. I appreciate you pointing
out that checkUsage() function. It will come in handy as I scan
borrowed functions. By the way, I only borrow a "borrowed function"
if the author gives me permission to use it.
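
The name-scanning step described above can be partly automated with base R. A rough sketch: all.vars() lists the variable names appearing in a function body, and intersect() compares them with the names planned for the calling code. The vector my_names is a made-up example.

```r
f <- function() {
  if (runif(1) > .5)
    x = 10
  x
}

# Variable names used in the body (function names such as runif are excluded).
used <- all.vars(body(f))
print(used)

# Hypothetical names I intend to use in my own script.
my_names <- c("x", "y", "i")
clashes <- intersect(used, my_names)
print(clashes)   # names worth renaming or double-checking before calling f
```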

One last comment about global variables being used inside functions.
My experience has been that when a global variable shows up in a
borrowed function, it is usually a remnant from testing and debugging,
not an intentional use of the global variable. It may be an old flag,
an intermediate calculation, or whatever. The programmer simply
missed it when he was cleaning things up after testing. A related
example is what happened on this board. My code in the first entry
for this thread had a "y<-7" remnant that was the result of me
screwing around with various ideas. That line started off as "x<-7",
and after I executed the code the first time, x had a value of 7. I
then played around with all sorts of stuff until I decided enough was
enough, I needed to post this thing and move on. Well, I killed off
all of my playing around (but I overlooked that "y<-7"), and ran one
last execution of the code (which showed that everything was
fine......x was still 7). We all saw the results of this
situation.

Anyway, enough whining.

Bill