The number of requests for R help

Martin Holt

Mar 13, 2009, 9:43:18 AM
to MedS...@googlegroups.com
I've seen a number of R devotees praise R to the hilt, especially because
it's open source, and I've heard a lot about and am experiencing its steep
learning curve. Part of the way that I've gone about learning R has been to
join its mailing list. I get a digest every day. A quick survey of this
shows 130-150 items, questions and answers, a day. That seems a high number
to me, but I don't want to criticise the responders. And I'm sure that the
number of R users is rising all the time. Mind you, I've also heard that the
list is not accommodating of simpler questions.

What's the point of all this ? Is R as good as the people who like it say it
is ? Is there any way of getting an objective opinion ? I've read that it is
questioned whether the FDA or MHRA would accept analyses done with it, but I
think devotees would say that that's because it's so flexible. Is there a log
with R as there is with SPSS and SAS, e.g. ? IMO, it's much more
programming-focussed (not surprising considering its roots), and so whether
you fit with it or not depends on you. Fine, but is it
reliable/validatable......what sort of development and validation goes into
its packages.....have you ever experienced a significant bug - 130-150 items
a day ? I would just like to be reassured that the sense that I have about R
(that it is worth the effort to learn it) is the right one. Any comments ?

BW,
Martin

Gary Collins

Mar 13, 2009, 10:10:02 AM
to MedStats
Hi Martin,

No apologies, but I'm an R user so will undoubtedly have biased views.

I've used R since about 1997/98, on a unix machine when I started my
PhD, after using SPLUS for the preceding 3-4 years, and I can honestly
say I've never experienced a major bug which has caused me
problems or concerns. You are as likely to find a bug in SAS or
STATA as in R; the difference is that if one is spotted and
reported in R, it'll be fixed and a patch released almost immediately.
I'm not sure whether SAS or STATA could act that quickly.

The volume of email on the R mailing list is high, but it is mainly
queries about how to do something, not bugs. The list can be
unfriendly, if you don't follow the posting guidelines which are clear
- and are there to help the person who raises a query get the best
response - if one asks a question with limited information or
unreproducible code then you're wasting most people's time. They will
answer "simple" questions, but they do expect you to have done a
little bit of homework before asking anything (i.e. searching the
documentation, help files or email archives as most likely someone
would've asked it before).

What you've read about FDA and MHRA and R is incorrect and generally
misleading. It is often raised by non-R users with preconceptions about
open source software (Linux is open source; would one query that?), and
I'm sure others will come up with evidence and information on this.
If you search the R archives, there are several threads which discuss
R and regulatory issues in considerable detail. And if you have a
quick search on the web, you'll find various presentations given by FDA
statisticians at the annual R conference, although these do not officially
endorse R, as the FDA doesn't endorse any statistical software.

Also, there is a guidance document on regulatory compliance and validation
issues, covering the use of R in clinical trials, on the R website.

Hope my biased view helps.

Gary
------------------------------------
Dr Gary S Collins
Medical Statistician
Centre for Statistics in Medicine
Wolfson College Annexe
University of Oxford
Linton Road
Oxford, OX2 6UD

Tel: +44 (0)1865 284418
Fax: +44 (0)1865 284424
www.csm-oxford.org.uk
------------------------------------

BXC (Bendix Carstensen)

Mar 13, 2009, 10:12:10 AM
to MedS...@googlegroups.com
Here is a brief shot at an answer from a long term SAS and R-user:

> -----Original Message-----
> From: MedS...@googlegroups.com
> [mailto:MedS...@googlegroups.com] On Behalf Of Martin Holt
> Sent: 13. marts 2009 14:43
> To: MedS...@googlegroups.com
> Subject: {MEDSTATS} The number of requests for R help
>
>
> I've seen a number of R devotees praise R to the hilt,
> especially because it's open source, and I've heard a lot
> about and am experiencing its steep learning curve. Part of
> the way that I've gone about learning R has been to join its
> mailing list. I get a digest every day. A quick survey of
> this shows 130 -150 items, questions and answers, a day. That
> seems a high number to me, but I don't want to criticise the
> responders. And I'm sure that the number of R users is rising
> all the time. Mind you, I've also heard that the list is not
> accommodating for the more simple questions.
>
> What's the point of all this ? Is R as good as the people who
> like it say it is ?

Only if you get to like it...

> Is there any way of getting an objective opinion ?

No. Is there for Stata, SAS,....?

> I've read that it is questioned whether the FDA or
> MHRA would accept analyses done with it, but I think devotees
> would say that that's because its so flexible. Is there a log
> with R as there is with SPSS and SAS, eg. ?

You can write an R-program using either the built-in editor or Tinn-R or
Emacs or whatever you like. Once it runs, you can run it in BATCH
mode, which will give you commands (i.e. function calls) and output interspersed.
I find this appealing because you have what you have done right next to
the result. Pretty much the style you get from running a do-file in Stata,
but different from the annoying separation of log and output in SAS.
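
A minimal sketch of what "commands and output interspersed" looks like, without leaving the R session. It uses source() with echo = TRUE as a stand-in for BATCH mode (from a shell, `R CMD BATCH script.R` writes a similar transcript to script.Rout). The script contents and the names `script` and `log_lines` are made up for illustration.

```r
# Write a tiny, made-up script to a temporary file (hypothetical content)
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3)", "mean(x)"), script)

# echo = TRUE prints each command followed by its result;
# capture.output() collects that text so it can be kept as a log.
# local = new.env() keeps the script's variables out of the workspace.
log_lines <- capture.output(source(script, echo = TRUE, local = new.env()))
writeLines(log_lines)
```

The captured lines could equally be written to a file with writeLines(log_lines, "analysis.log") to keep alongside the results.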

One of the real advantages of R is that anywhere a variable name or a constant
is needed, you can stick in a function expression, also in the definition of graph
sizes etc. This makes the fine-tuning of graphs much easier as you can let R calculate
a lot of the consequences of changing graphics elements or data.

That is why the programming focus is such a simplification.
Just think how you will get a y-axis that stretches from 0 to max(y) in SAS!
And then change the dataset!
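
To make the y-axis remark concrete, here is a small sketch with made-up data (the vectors `x` and `y` are hypothetical). The axis limit is an expression evaluated on the data, so nothing needs retyping when the data change.

```r
# Hypothetical data, for illustration only
x <- 1:10
y <- c(3, 7, 4, 9, 12, 8, 15, 11, 14, 18)

# The y-axis runs from 0 to max(y); the limit is computed, not typed in
plot(x, y, ylim = c(0, max(y)), type = "b")

# Change the dataset: the identical call adapts on its own
y <- y * 10
y_limits <- c(0, max(y))
plot(x, y, ylim = y_limits, type = "b")
```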

> IMO, it's much
> more programming-focussed (not surprising considering its
> roots), and so whether you fit with it or not depends on you.
> Fine, but is it reliable/validatable......what sort of
> development and validation goes into its packages.....have
> you ever experienced a significant bug - 130-150 items a day
> ?

The packages are validated in the sense that they run and that the
documentation matches the definition. Whether the packages produce
correct results is another matter, which is not checked, except by the
user community. I am the maintainer of the Epi package and get to correct
a few bugs a year that people come across.
These are mostly bugs that make a particular function crash, however.

> I would just like to be reassured that the sense that I
> have about R (that it is worth the effort to learn it) is the
> right one. Any comments ?

It is the only program that allows you to make decent graphs in finite time.
And as YOU want them; in all other packages you are essentially forced to
accept what the package takes as decent, unless you put weeks into it.

> BW,
> Martin

BW
Bendix
_______________________________________________

Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
b...@steno.dk http://www.biostat.ku.dk/~bxc
www.steno.dk

Tobias Verbeke

Mar 13, 2009, 10:40:16 AM
to MedS...@googlegroups.com
On Fri, Mar 13, 2009 at 2:43 PM, Martin Holt <theh...@care4free.net> wrote:
>
> I've seen a number of R devotees praise R to the hilt, especially because
> it's open source, and I've heard a lot about and am experiencing its steep
> learning curve. Part of the way that I've gone about learning R has been to
> join its mailing list. I get a digest every day. A quick survey of this
> shows 130 -150 items, questions and answers, a day. That seems a high number
> to me, but I don't want to criticise the responders. And I'm sure that the
> number of R users is rising all the time. Mind you, I've also heard that the
> list is not accommodating for the more simple questions.

One can ask very simple questions and receive to-the-point answers
in no time, but one is supposed to consider recommendations of the
posting guide before posting.

> What's the point of all this ? Is R as good as the people who like it say it
> is ?

Yes. Ask Google, Pfizer, J&J, ...

> Is there any way of getting an objective opinion ? I've read that it is
> questioned whether the FDA or MHRA would accept analyses done with it, but I

Reviewers at the FDA receive R training, among other reasons because
submissions using R for analysis are increasing.

> think devotees would say that that's because its so flexible.

No. In my consulting work I have already been confronted with submission-related
R work at top 5 pharma companies.

> Is there a log
> with R as there is with SPSS and SAS, eg. ? IMO, it's much more
> programming-focussed (not surprising considering its roots), and so whether

It is true that it is both an environment for data analysis and a
full-featured programming language. It does not offer the bells and
whistles of other statistical packages, but inside a powerful integrated
development environment (IDE) such as Eclipse (StatET plug-in), one can
benefit from a significant productivity increase.

> you fit with it or not depends on you. Fine, but is it
> reliable/validatable......what sort of development and validation goes into
> its packages.....

Packages can only be published on CRAN if they obey strict formal
criteria that are checked by an R package checker, which is quite
a formidable quality assurance tool.

For the verification of correctness, you cannot avoid defining your
software tests yourself, but the beauty of R is that software testing
is at the heart of the package concept and comes with the package.

> have you ever experienced a significant bug - 130-150 items
> a day ?

I have recently experienced two bugs. One in a multiple comparison procedure
of SAS that will be fixed at an unstated release date of a future version,
another one in an R routine for GEE that was fixed two days after I reported
it to the maintainer.

CRAN contains 1700+ user-contributed packages that all make use
of the R base packages. I think this is quite a big test bed...

> I would just like to be reassured that the sense that I have about R
> (that it is worth the effort to learn it) is the right one. Any comments ?

To quote Dirk Eddelbuettel quoting David Kane:

If you don’t go with R now, you will someday.
– David Kane on r-sig-finance, 30 Nov 2004

Best,
Tobias

Marc Schwartz

Mar 13, 2009, 4:39:24 PM
to MedS...@googlegroups.com


Martin,

With respect to the e-mail list and the volume, you might want to read
this blog entry at REvolution Computing (one of the commercial R
vendors and with whom I have no formal association). It's just easier
to read it there than my original post to r-help since the plots are
on the page:

http://blog.revolution-computing.com/2009/01/comparing-mailing-list-traffic-for-r-sas-and-splus.html

Note that the vast majority of posts to r-help are for assistance and
responses to queries, they are not bug reports, which actually go to a
separate list. Yes, r-help is a high volume list and increasing over
time as the above blog post shows. In addition, there are several
special interest e-mail lists, which provide subject specific fora and
have offloaded some of the r-help volume, which would otherwise be
even higher. You can get a sense of the other lists here:

http://www.R-project.org/mail.html

The list is responsive to those who post reasonable questions,
especially if they provide sufficient information to aid the
respondents and if they demonstrate that they have made some attempt
to research an answer via the extensive resources provided. There is
an R Posting Guide here:

http://www.R-project.org/posting-guide.html

which provides insights into how to engage in posting in a fashion to
maximize the likelihood of getting a reasonable and timely response.
One needs to bear in mind that the list is supported by volunteers
(many of whom are academics) and that like most such lists for open
source software, it tends to be more 'direct' in terms of responses,
lacking the warm fuzzies that might permeate other commercial vendor
based lists. That being said it is a terrific community and IMHO,
second to none in terms of the quality of support provided, either for
a FOSS application or a commercial one. The folks that tend to get
chastised, which actually happens less these days, are folks who are
used to getting paid commercial support and may not have the
experience or the perspective of using voluntary support lists. Folks
who take the time to read the Posting Guide and develop a sense of the
community are not likely to have negative experiences.


With respect to the development of R, please read this document which
reviews R's software development life cycle and regulatory issues:

http://www.r-project.org/doc/R-FDA.pdf


For additional background on the regulatory issues, see:

https://stat.ethz.ch/pipermail/r-help/2009-January/184569.html

and:

http://www.linkedin.com/groupAnswers?discussionID=1600313&viewQuestionAndAnswers=&gid=77616


As has been discussed elsewhere, SPSS and other vendors, now including
SAS, have or are developing interfaces to R. They would not be doing
this if R's influence in the marketplace was not substantive and
growing.

Finally, yes, R can provide a log of code and output, either directly
within the R console, or via one of quite a few GUI based add-ons.
IMHO, the best approach is to use R within Emacs along with ESS (Emacs
Speaks Statistics), which provides an integrated environment for the
conduct of reproducible research. Using the combination of R and LaTeX
within "Sweave", one can create in effect, self-documenting programs
to generate complex reports containing text, tables and graphics. This
is the way in which I generate all clinical study reports. Frank has
some additional information and links here:

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatReport

If you are not a fan of Emacs, there are other editors which provide
syntax highlighting and some of which provide an integrated
interaction with R, enabling the use of the editor for coding and then
submission to an R process directly, rather than having to copy and
paste into an R console. Many of these are listed here:

http://www.sciviews.org/_rgui/

HTH,

Marc Schwartz

Martin Holt

Mar 13, 2009, 7:12:29 PM
to MedS...@googlegroups.com
Thanks to everyone for their comments, especially to Marc Schwartz...I need
to read more of the references that your references referenced !
I'm reassured.

Given my background as Quality Manager in the Medical Devices and
subsequently Pharmaceutical industry, I was most interested in Marc's
reference http://www.r-project.org/doc/R-FDA.pdf however.

If something was found to be seriously wrong with a product in, say,
Clinical Diagnostics, a "recall" was initiated....which might just be
limited to notifying all purchasers of the product of the fault. This was an
active process, not relying on customers to check particular sites (which
they might not know about anyway). I note that R has
http://bugs.r-project.org/cgi-bin/R , but here you would have to search for
your problem. It says in the document that this site is mirrored on one of
the email lists (r-devel) - are announcements made here specifying that a
bug has been found and where to find more details, etc ? I expect so, so
users would have to monitor that list...are newly identified bugs obvious ?
I also note that there are several hundred reports listed on
http://bugs.r-project.org/cgi-bin/R (excluding 8603 trashcan), but these
might be all reports received to-date (the site does not make that clear).

http://www.r-project.org/doc/R-FDA.pdf chooses 21 CFR Part 11 Compliance:
Electronic Records, Electronic Signatures as the regulation to deal with. I
don't have a copy of 21 CFR to hand, but I wonder if this is the most
appropriate section to be answering. Section 5.1 says, "Therefore, it is not
mandated that 21 CFR Part 11 is appropriate to data analysis software
systems that are not primarily intended for storage and transmission of
electronic medical records. It remains the responsibility of an individual
organization however to define the applicability of Part 11 and validation
to their systems. For readers who agree that Part 11 does not apply to these
types of systems, this document still serves the purpose of providing a high
degree of confidence that R can comply with these and other validation
regulations." It is these other validation regulations that I feel I might
be more interested in. There will be others out there who can put me right.
Then Section 7 focusses on the same part. Section 7.2 11.10(b) "The ability
to generate accurate and complete copies of records in both human readable
and electronic form suitable for inspection, review, and copying" says "The
R Foundation understands this item to mean that any records created or
maintained in the system must be accurate and complete. These records must
be available in both human readable and electronic form." Yet there is
nothing in that section that addresses "accurate".

Throughout Section 7, the sentence, "R is not intended to create, maintain,
modify or delete Part 11 relevant records but to perform calculations and
draw graphics." is repeated. This is used to negate
the applicability of these particular regulations (electronic records), but
where is the good stuff about how performing calculations and drawing
graphics is supported ?

There is feedback from users post-release. This takes me back to my first
point, "Is there somewhere where new bugs are listed free from other
extraneous postings, that don't require searching for ?"

The document makes the point that R sits within a host environment that must
be regulated to regulate R. And, as Marc has said, it is up to the
individual, but this is where I think FDA and MHRA might get a bit twitchy.
When adding x gm of salt to a solution you had to initial your recording of
x, get someone else to initial a check, and the same for the addition to the
solution. Not much left to the individual there !

I've read this so far, and I don't want to be negative, and it is clear that
SAS, SPSS, FDA are buying into R, and that the future is bright. I suppose
http://www.r-project.org/doc/R-FDA.pdf just had the opposite effect on me to
the one it intended.

Best Wishes,
Martin

David Winsemius

Mar 18, 2009, 4:34:22 PM
to MedStats


On Mar 13, 7:12 pm, "Martin Holt" <theho...@care4free.net> wrote:
> Thanks to everyone for their comments, especially to Marc Schwartz...I need
> to read more of the references that your references referenced !
> I'm reassured.
>
> Given my background as Quality Manager in the Medical Devices and
> subsequently Pharmaceutical industry, I was most interested in Marc's
> reference <http://www.r-project.org/doc/R-FDA.pdf> however.
snipped
> Note that R has <http://bugs.r-project.org/cgi-bin/R> , but here you would have to search for
> your problem. It says in the document that this site is mirrored on one of
> the email lists (r-devel) - are announcements made here specifying that a
> bug has been found and where to find more details, etc ? I expect so, so
> users would have to monitor that list...are newly identified bugs obvious ?
> I also note that there are several hundred reports listed on
> http://bugs.r-project.org/cgi-bin/R (excluding 8603 trashcan), but these
> might be all reports received to-date (the site does not make that clear).

The answers to those questions could in large part be answered by some
checking of the archives. If you were serious about investigating the
bug management process and had done some of the "homework" on what was
freely available and what was requested in the posting guides and then
wanted to ask further questions, the r-devel list is there to be
queried.

When you look at the bug reports you will see that they go back many
years. Some of the wishlist-fixed category back to (at least) version
1.5. The status as "fixed" has not been carefully maintained. For
example in the apparently not fixed "accuracy" section the problem
with asin is pretty clearly not a real bug. The arcsine of a number
that is effectively 1 should not be returned as a number. The problem
with pgamma has been fixed (I just tested it on a current version) but
the report has not been moved to accuracy-fixed. The difference
between R and SAS is that such collections are visible with R and
hidden behind a slow moving business interface for SAS.

--
David Winsemius, MD, a currently noisy newbie on r-help

Martin Holt

Mar 19, 2009, 2:55:41 PM
to MedS...@googlegroups.com
Searching the archives, checking the posting guide, ultimately asking
r-devel should throw up your bug...but see below.

I took a look at trashcan, curious what I'd find with there being over 8000.
I was impressed by the traceability, though the subject headings and
descriptions were generally such that you would have to go down to the level
of the messages to see if it was your problem or not. To be fair, the other
categories, eg models, were more specific.

The fact that the database contains old reports does not make it complete. I
happened across the word, "penis" so did a search on that. There were 142
records in trashcan and 1 in test. That these hadn't just been deleted gives
confidence in the completeness of the database.

I searched on "regression". Under "Models-Fixed" was 9290. Under notes was
"not true".
OP: "Doesn't nls function support subset? It seems not to work.
And, there are no information in the online help.
Has it sunk into oblivion?"
Prof Brian Ripley:"This is not the place to ask a question: do read the FAQ.
?nls shows that nls does have a 'subset' argument, and it does work.
Compare" and he gives an example. "I find your comments baffling: please
provide a reproducible example of
your claims."

Peter Dalgaard "Whatever gave you those ideas?

Yes, nls supports subset.
Yes, it does seem to work.
Yes, it is documented in the online help.
No, it has not sunk into anything."
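
For readers who want to check the claim about 'subset' themselves, here is a small sketch. It is not the example Prof Ripley gave in the bug report; the data frame `d`, the model and the starting values are made up for illustration. It fits the same model once via the 'subset' argument of nls and once on a manually subsetted data frame, then confirms that the estimates agree.

```r
# Made-up data: exponential decay plus a small deterministic wiggle,
# so the fit is not an artificial zero-residual problem
d <- data.frame(x = 1:20)
d$y <- 5 * exp(-0.3 * d$x) + 0.05 * sin(7 * d$x)

# Fit on the first ten observations via the 'subset' argument
fit_subset <- nls(y ~ a * exp(b * x), data = d,
                  start = list(a = 4, b = -0.2), subset = x <= 10)

# Same fit after subsetting the data frame by hand
fit_manual <- nls(y ~ a * exp(b * x), data = d[d$x <= 10, ],
                  start = list(a = 4, b = -0.2))

# Both routes should give the same estimates
stopifnot(isTRUE(all.equal(coef(fit_subset), coef(fit_manual))))
```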

I guess this is one of those postings where the OP hasn't followed the
posting guide. I think Prof Ripley's response addresses the problem well. I
know R and SAS are not used solely by the pharmaceutical industry, but it's a
standard that other users would like. In this industry, change control is
very important. The problem I have with Peter Dalgaard's response is that it
does not accept the possibility that the OP is having a problem using R in
his setting. The OP later comes back saying that his data contained
NAs......does that exonerate R ? And he found it out himself. I know we're
very lucky to have a network of very talented people providing R as
open-source software, and I know I can't argue from one example. But someone
recently praised R for sorting out a bug within 2 days (comparing it to
SAS), which is great...but was the fix tested ? With the problem data or data
like it ? Did it introduce any other problems ? These are basically change
control procedures, which in an organisation can be proceduralised and
signed off, but in a network of talented programmers who say, "this code
works for me"..maybe that's too harsh. I suppose that basically I am aware
of a lot of growth (change) occurring in R, which feels more like a network
than an organisation, and I wonder if the changes are under
control.....are the resources there to do so ?
Judging from its success....yes...but I don't think I'm being unreasonable,
am I ?

Best Wishes,
Martin Holt


Gary Collins

Mar 19, 2009, 4:30:01 PM
to MedS...@googlegroups.com
Hi Martin,

Not 100% sure where your argument is going here...but going back to
your original question, i.e. would the MHRA or FDA accept analyses done
in R.

First can I point you to the following presentation, done by an FDA
employee at the 2007 annual R conference - you might find it useful and
interesting.

http://www.r-project.org/conferences/useR-2007/program/presentations/soukup.pdf

But also, as far as I am aware, it's not solely R that has to be
validated but the process of using R. If your analysis requires
a Cox model, for example, then how can you be sure that the code "gives"
you the "correct" answer? Much of this will be outlined in the various SOPs
which your unit has in place and which you follow.

I've also been through an MHRA inspection a couple of years ago and was
asked about "validation". The questions concerned the
routines/programs one writes or uses, and how one can be confident in
the results obtained from a particular software package, be it SAS,
SPLUS, STATA, R or whatever. The focus is on ensuring that your code
produces reliable and consistent results, i.e. by running code against
known examples and ensuring that they match.
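
As a rough sketch of "running code against known examples", and not a description of any particular unit's SOP: the check below uses simple linear regression on R's built-in `cars` data rather than a Cox model, because the reference answer can be computed independently from the closed-form least-squares formulas. The names `slope_ref`, `intercept_ref` and `est` are made up for illustration.

```r
# Reference values computed independently from the closed-form OLS formulas
slope_ref <- cov(cars$speed, cars$dist) / var(cars$speed)
intercept_ref <- mean(cars$dist) - slope_ref * mean(cars$speed)

# The routine under test
fit <- lm(dist ~ speed, data = cars)
est <- unname(coef(fit))

# The check stops with an error if the two answers do not match
stopifnot(isTRUE(all.equal(est, c(intercept_ref, slope_ref))))
```

A script of such checks can be re-run after every software upgrade and its output filed with the validation records.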

I've also been through various early stages of FDA pre-IND and SPA
approval, where in the statistical analysis plans, it was clearly stated
that R would be used in the analysis, and this was never commented on by any
of the FDA reviewers.

Hope this helps

Gary

Gary Collins

Mar 19, 2009, 4:41:26 PM
to MedS...@googlegroups.com
In addition Martin, can I point you to a relatively recent email thread
on the R mail list which might be of interest.

http://tolstoy.newcastle.edu.au/R/e6/help/09/01/0889.html

and

http://blog.revolution-computing.com/2009/02/using-r-in-the-pharmaceutical-industry.html

Gary

Martin Holt

Mar 19, 2009, 5:41:51 PM
to MedS...@googlegroups.com
Thanks, Gary.

The answers to my concerns were referenced in the second of your links:

http://www.r-project.org/doc/R-FDA.pdf

6.1 Operational Overview

The development, release and maintenance of R is, broadly, a collaborative
process involving the R Development Core Team (hereafter referred to as R
Core). Members of R Core represent multiple statistical disciplines and are
based at academic, not-for-profit and industry-affiliated institutions on
multiple continents.

Most communications amongst the members of R Core take place electronically
via e-mail and similar means. A non-public e-mail list (r-core) provides a
common forum for discussions amongst the members of R Core. An archive of
the list is available to facilitate R Core in documenting and reviewing
these discussions, as they pertain to development decisions and related
issues.

R Core does meet, collectively and/or in smaller groups, with a level of
frequency dictated by multiple factors, including taking advantage of
regularly scheduled conferences where members of R Core may already be in
attendance. Such conferences include those that are specific to statistical
computing and R itself (http://www.r-project.org/conferences.html). These
routine communications and meetings ensure that the collaborative efforts
are appropriately coordinated and prioritized as ongoing development takes
place.

Reasonable software development and testing methodologies are employed by R
Core in order to maximize the accuracy, reliability and consistency of R's
performance. While some aspects of R's development are handled
collaboratively, others are handled by members of the team with specific
interests and expertise in focused areas.

Importantly, as R is released under the terms of the GPL, all of the source
code underlying R, whether it be in R, C or FORTRAN, is available for peer
review by all members of the R user community. Thus, all of the
functionality embodied within R is subject to continuous critique and
improvement relative to its accuracy, reliability and consistency.

The size of the R user community (difficult to define precisely, because
there are no sales transactions, but conservatively estimated as being in
the tens of thousands, with some independent estimates in the hundreds of
thousands), provides for extensive review of source code and testing in
"real world" settings outside the confines of the formalized testing
performed by R Core. This is a key distinction, related to product quality,
between R and similar software that is only available to end users in a
binary, executable format. In conjunction with detailed documentation and
references provided to end users, the size of the R user community, all
having full access to the source code, enables a superior ability to
anticipate and verify R's performance and the results produced by R.

Additional documentation regarding the activities of R Core as they pertain
to development, goals and related activities, including coding guidelines,
are available for review:
- R Developer Page (http://developer.r-project.org/)
- R Internals - A Guide to the Internal Structures of R and Coding Standards
  for the R Core Team (http://cran.r-project.org/doc/manuals/R-ints.html)

This reference had already been given by Marc Schwartz, a co-author, in a
previous message. Sorry, Marc.
Enough.

BW,
Martin

Felix_B

Mar 25, 2009, 7:04:50 AM
to MedStats
After all the positive comments I would like to raise some concern
about some of the non-standard R packages. I have twice experienced
a serious error in R packages (not a bug, an error in the
algorithm). The authors of the first one did not reply; the authors
of the second one said they knew about it but did not have the time to
fix it. I wonder how many of the packages, especially those written by
PhD students and postdocs, which I am sure work for their project, really
work correctly in all situations? Not all authors work on their packages as
hard and as well as e.g. D. Bates does on his lme4 (GLMM) package, and even
he and its users still discover bugs and flaws. I do not want to criticise R;
I am using it and I believe that the core packages are as valid as those of
commercial software (or better) but, as I said, I have doubts about
some of the rarely used ones.
Best wishes, Felix

Neil Shephard

Mar 25, 2009, 7:12:57 AM
to MedS...@googlegroups.com
On Wed, Mar 25, 2009 at 11:04 AM, Felix_B <Felix...@gmx.de> wrote:
>
> After all the positive comments I would like to raise some concern
> about some of the non-standard R packages. I have twice experienced
> a serious error in R packages (not a bug, an error in the
> algorithm). The authors of the first one did not reply; the authors
> of the second one said they knew about it but did not have the time to
> fix it. I wonder how many of the packages, especially those written by
> PhD students and postdocs, which I am sure work for their project, really
> work correctly in all situations? Not all authors work on their packages as
> hard and as well as e.g. D. Bates does on his lme4 (GLMM) package, and even
> he and its users still discover bugs and flaws. I do not want to criticise R;
> I am using it and I believe that the core packages are as valid as those of
> commercial software (or better) but, as I said, I have doubts about
> some of the rarely used ones.

The beauty of R packages over commercial alternatives though is that
_you_ can access the code that underlies the routines you are using
and if you find them to be inaccurate you can correct them, so really
it's a rather moot point whether package authors actively maintain
packages.

In fact I believe over the years that a number of packages have
"changed hands" from one maintainer to another (although I've no hard
data to support this claim), and old outdated packages occasionally
get put out to graze
(http://www.stats.bris.ac.uk/R/src/contrib/Orphaned/)

Neil
--
"The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data." ~ John Tukey (1986), "Sunset salvo". The American
Statistician 40(1).

Email - nshe...@gmail.com
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/

Felix_B

Mar 25, 2009, 10:59:50 AM
to MedStats
But in this case R is only suitable for "real" statisticians, who are
able to check all the routines before they use them...
Or can you report inaccurate routines to some experts?
I do not want to condemn R, just to point out some potential
problems.
Felix

Peter Flom

Mar 25, 2009, 11:07:46 AM
to MedStats
Felix_B <Felix...@gmx.de> wrote

>But in this case R is only suitable for "real" statisticians, who are
>able to check all the routines before they use them...
>Or can you report inaccurate routines to some experts?
>I do not want to condemn R just to point towards some potential
>problems


This is too broad.

It is true that some packages in R may not be right - that is the price you pay
for having the latest material available. This also happens with statistics books,
after all - errors slip in, sometimes typos, sometimes major errors.

The base packages are very thoroughly checked.

For the more cutting-edge packages, one should keep in mind who wrote the package,
just as one would do with a book.


Peter

Peter L. Flom, PhD
Statistical Consultant
www DOT peterflomconsulting DOT com

Mitchell Maltenfort

Mar 25, 2009, 11:09:05 AM
to MedS...@googlegroups.com
Seems to be an existing list of "recommended" packages that the
upstream R team prefers.

http://cran.r-project.org/bin/linux/debian/

https://launchpad.net/ubuntu/jaunty/+package/r-recommended

Seems to me if you stick with those packages, you should be in good shape.

On Wed, Mar 25, 2009 at 10:59 AM, Felix_B <Felix...@gmx.de> wrote:
>
> But in this case R is only suitable for "real" statisticians, who are
> able to check all the routines before they use them...
> Or can you report inaccurate routines to some experts?
> I do not want to condemn R just to point towards some potential
> problems
> Felix

--
Due to the recession, requests for instant gratification will be
deferred until arrears in scheduled gratification have been satisfied.

Neil Shephard

Mar 25, 2009, 11:16:24 AM
to MedS...@googlegroups.com
On Wed, Mar 25, 2009 at 3:09 PM, Mitchell Maltenfort <mma...@gmail.com> wrote:
>
> Seems to be an existing list of "recommended" packages that the
> upstream R team prefers.
>
> http://cran.r-project.org/bin/linux/debian/
>
> https://launchpad.net/ubuntu/jaunty/+package/r-recommended
>
> Seems to me if you stick with those packages, you should be in good shape.

I believe they are simply the packages for which pre-compiled versions
are available and maintained for Debian-based distributions (of which
the Ubuntu family is one), because Debian is a binary-based
distribution (in contrast to something like Gentoo, which is
source-based). It likely

The closest to "official" packages for R would likely be 'base' and
others listed under
http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Add_002dOn-Packages
although note the caveats about re-organisation of these.

Neil



Steve Simon

Mar 25, 2009, 11:24:15 AM
to MedS...@googlegroups.com
Felix_B wrote:

> After all the positive comments I would like to raise some concern
> about some of the non-standard R packages. I experienced it twice that
> there was a serious error in R packages (not a bug, an error in the
> algorithm). The authors of the first one did not reply, the authors
> of the second one said they know about it but do not have the time to
> fix it. I wonder how many esp. of the PhD/postdoc written packages,
> which I am sure work for their project, are really working correctly
> in all situations? Not all authors work on their packages as hard and
> as well as, e.g., D. Bates does on his lme4 (GLMM) package, and even he
> and his users still discover bugs and flaws. I do not want to criticise
> R; I am using it, and I believe that the core packages are as valid as
> those of commercial software (or better), but as I said, I have doubts
> about some of the rarely used ones.

This is an interesting point, and one that I have thought a lot about.

It is a given that no statistical package is perfect. Anecdotes about
algorithmic errors in R need to be counterbalanced against similar
reports in other major packages like SAS and SPSS. Does R have more
errors because it is not developed by a major corporation? Does it have
fewer because the open source license gives you the ability to inspect
the code?

I don't know. We could use a nice empirical study, but it would cost too
much in labor and time to collect this data. I also suspect that SAS,
SPSS, and the other major commercial companies would be unwilling to
share some of the data needed.

I knew a statistician who tried to run the same method on two different
packages to make sure that there were no errors in the algorithms of
either package. An admirable trait, I'm sure, but only practical in a
limited number of circumstances.

In a perfect world, there would be systematic testing by independent
parties of the validity of the algorithms used by R, SAS, SPSS (dare I
say Excel) and so forth, but most of the testing is done haphazardly.

Another nice thing would be if there were more benchmark data sets with
known results available. When a new program comes out, test it with all
the benchmarks. NIST has some nice benchmarks, but they are limited to
some of the more common statistical algorithms.

http://www.itl.nist.gov/div898/strd/
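
A toy illustration of the benchmark idea, offered only as a sketch. Python is used purely for convenience; the three data points are made up so that the correct sample variance is exactly 1 (they are not taken from the NIST datasets), and all function names are hypothetical. The one-pass "textbook" formula stands in for a routine with a numerical flaw, the two-pass formula for a sound one.

```python
def naive_variance(xs):
    """One-pass textbook formula: (sum(x^2) - n*mean^2) / (n - 1).

    Prone to catastrophic cancellation when the mean is large
    relative to the spread of the data.
    """
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: sum of squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def passes_benchmark(routine, data, known_answer, tol=1e-6):
    """True if the routine reproduces the known answer within tol."""
    return abs(routine(data) - known_answer) <= tol


# Made-up benchmark: deviations from the mean are -1, 0, +1,
# so the sample variance is exactly 1 by construction.
data = [1e9 + 1, 1e9 + 2, 1e9 + 3]
known_answer = 1.0

for routine in (naive_variance, two_pass_variance):
    verdict = "pass" if passes_benchmark(routine, data, known_answer) else "FAIL"
    print(f"{routine.__name__}: {verdict}")
```

The point is only that a data set with a known answer can expose a flaw that ordinary use would not reveal; a real benchmark suite would cover many data sets and procedures.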

The U.S. Food and Drug Administration has guidelines on software
validation that are probably highly relevant

http://www.fda.gov/cdrh/comp/guidance/938.html

though I have never had time to review this document. Perhaps others on
this list can comment on this. I believe there was a paper about this at
the R conference in Ames last year, but I don't have a citation.

Barring a formal approach to validating statistical algorithms, any
discussion of whether R is more or less reliable than the major
commercial packages is going to be based more on emotion and personal
prejudices than real facts.
--
Steve Simon, Standard Disclaimer.
Sign up for my brand new newsletter,
The Monthly Mean, at www.pmean.com/news

Neil Shephard

Mar 25, 2009, 11:41:28 AM
to MedS...@googlegroups.com
On Wed, Mar 25, 2009 at 3:24 PM, Steve Simon <n...@pmean.com> wrote:
>
> It is a given that no statistical package is perfect. Anecdotes about
> algorithmic errors in R need to be counterbalanced against similar reports
> in other major packages like SAS and SPSS.

Kellie B. Keeling and Robert J. Pavur, A comparative study of the
reliability of nine statistical software packages. Computational
Statistics & Data Analysis 51:3811-3831

^^ Compares Excel (2000/XP), Excel 2003, JMP 5.0, Minitab 14.0, R
1.9.1, SAS 9.1, SPSS 12.0, Stata 8.1 and StatCrunch 3.0

Greg Snow

Mar 25, 2009, 12:24:30 PM
to MedStats
There are task views on CRAN that give recommended packages for
specific topics.

There is also http://www.crantastic.org/ which is a place where users
can rate and review individual packages. It stalled a bit for a
while, but recently there has been increased interest in ramping it up
again.

You can also always post to R-help and ask if anyone has had problems
with a package, or express your concern about a package that future
potential users can find when searching the archives (assuming that
they will actually search the archive).

Personally, while I have been using R/S for quite a while, when using
a new package/technique I will usually simulate some data so that I
know the "truth", then analyse that data to verify that the routine is
doing what I think (reveals problems with the package, or with my
understanding, or my typos, or ...).
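
A minimal sketch of that simulate-then-check habit, written in Python only for illustration. The true parameter values, the sample size, the seed and the function names are all arbitrary choices made for this example, and `fit_line` is a stand-in for whatever package routine is actually being checked.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Stand-in for the routine under test; in practice this would be
    the function from the package being checked.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def simulate(n, intercept, slope, noise_sd, seed):
    """Generate data from a model whose parameters we chose ourselves."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


# The "truth" is known because we set it (values are arbitrary).
TRUE_INTERCEPT, TRUE_SLOPE = 2.0, 0.5

xs, ys = simulate(n=5000, intercept=TRUE_INTERCEPT, slope=TRUE_SLOPE,
                  noise_sd=1.0, seed=2009)
est_intercept, est_slope = fit_line(xs, ys)

print(f"intercept: true {TRUE_INTERCEPT}, estimated {est_intercept:.3f}")
print(f"slope:     true {TRUE_SLOPE}, estimated {est_slope:.3f}")
```

If the estimates land far from the values that generated the data, either the routine or one's understanding of it needs a second look.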

Martin Holt

Mar 25, 2009, 12:48:22 PM
to MedS...@googlegroups.com
Steve Simon wrote (below)

> Barring a formal approach to validating statistical algorithms, any
> discussion of whether R is more or less reliable than the major commercial
> packages is going to be based more on emotion and personal prejudices than
> real facts.

I think that is largely true. FDA and MHRA inspectors, however, will have
more objective experience. If they were to visit Felix' institution and see
the problems he has had, they would not be impressed that one R-author did
not reply whilst another said that he did not have the time to deal with it.
Nor would they be satisfied that one day someone out there with the skills
to put the two packages right would do so. These issues relate to my earlier
email (19 Mar 18:56). Would SAS/SPSS handle it better ? I think that's a
matter of experience and reputation, rather than "emotion and personal
prejudices." Which is why I started this thread.

Peter Flom compares writing R to writing books, and says that one should be
as careful with some R packages as one would be with some books, depending
on the author. I'm sure R would not be happy to see that said ! Books are
not inspected; how you use R and how they deal with you is inspected. This
is more of an argument for more critical checking of the "less reputable" R
packages.......but here, R would say that it is up to the user to do this.
Which takes you back to Felix' position.

Is it acceptable (to regulatory authorities) for software 'companies' to
say, "It is up to the user to validate how he uses our product, for every
procedure that he uses, with every database that he uses." ? If so, then
maybe Steve Simon is right, it might come down to "emotion and personal
prejudices".

Incidentally, I quoted the same FDA link as Steve, on software validation.
What interested me is that it dates from 2002: seven years old.

Best Wishes,


Martin

Brett Magill

Mar 25, 2009, 1:51:44 PM
to MedS...@googlegroups.com
On Wed, Mar 25, 2009 at 11:48 AM, Martin Holt <theh...@care4free.net> wrote:
>
> Steve Simon wrote (below)
>
>> Barring a formal approach to validating statistical algorithms, any
>> discussion of whether R is more or less reliable than the major commercial
>> packages is going to be based more on emotion and personal prejudices than
>> real facts.
>
> I think that is largely true. FDA and MHRA inspectors, however, will have
> more objective experience. If they were to visit Felix' institution and see
> the problems he has had, they would not be impressed that one R-author did
> not reply whilst another said that he did not have the time to deal with it.
> Nor would they be satisfied that one day someone out there with the skills
> to put the two packages right would do so. These issues relate to my earlier
> email (19 Mar 18:56). Would SAS/SPSS handle it better ? I think that's a
> matter of experience and reputation, rather than "emotion and personal
> prejudices." Which is why I started this thread.

This is no different than using a SAS macro that is user-contributed.
Buyer beware. It may or may not be correct and you may or may not get
support. Perhaps the distinction between base packages and user
packages is more blurred in R. User-developed and supported packages
are certainly more prolific in R (and in S-PLUS too, BTW) because the
R language makes it easier than, say, SAS or SPSS to develop new
analytic procedures that are not available in the base package.

Brett Magill

Mar 25, 2009, 2:17:50 PM
to MedS...@googlegroups.com

Your criticism is not unlike going here:

http://mayoresearch.mayo.edu/mayo/research/biostat/sasmacros.cfm

and downloading the available SAS macros and then complaining about
SAS when one of those macros is erroneous, has a bug, or is not well
supported.

R has a set of base packages and a set of contributed packages (see
http://cran.r-project.org/) (which include a set of recommended
packages that are installed by default). Some of the contributed
packages are of very high quality and some may not be.

Felix_B

Mar 25, 2009, 6:01:21 PM
to MedStats
I think there is a difference. In general, I can trust the official SAS/
Stata/Genstat/... programmes and know that I need to be careful with
user-written macros. And if there is a bug in SAS, I know that I
can report it to them and they will try to fix it asap.



Brett Magill

Mar 26, 2009, 12:25:50 AM
to MedS...@googlegroups.com
On Wed, Mar 25, 2009 at 5:01 PM, Felix_B <Felix...@gmx.de> wrote:
>
> I think there is a difference. In general, I can trust the official SAS/
> Stata/Genstat/... programmes and know that I need to be careful with
> user-written macros. And if there is a bug in SAS, I know that I
> can report it to them and they will try to fix it asap.

Again, not unlike R base and recommended.

Bernardo Rangel Tura

Mar 26, 2009, 5:44:39 AM
to MedS...@googlegroups.com
On Wed, 2009-03-25 at 15:01 -0700, Felix_B wrote:
> I think there is a difference. In general, I can trust the official SAS/
> Stata/Genstat/... programmes and know that I need to be careful with
> user-written macros. And if there is a bug in SAS, I know that I
> can report it to them and they will try to fix it asap.

Well, if the discussion is about bugs, that is not true.

All software in the world has bugs, including R/SAS/Stata/Genstat/etc.

The other day I found a bug in the round function of Stata: this function
does not follow the IEEE 754-2008 standard (round to nearest, ties to even).

This can bias the estimate of a parameter, because every number of the
form x.5 is rounded to x+1. If Stata followed IEEE 754-2008, half of the
x.5 values would round to x and half would round to x+1.
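
A small sketch of the two tie-breaking rules, using Python's standard decimal module purely for illustration. It says nothing about how Stata itself behaves; the ten input values and the helper name `round_to_integer` are made up for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_integer(value, rule):
    """Round a Decimal to the nearest integer under the given tie rule."""
    return value.quantize(Decimal("1"), rounding=rule)


# Ten exact ties: 0.5, 1.5, ..., 9.5 (their true sum is 50).
ties = [Decimal(k) + Decimal("0.5") for k in range(10)]

true_sum = sum(ties)
half_up_sum = sum(round_to_integer(t, ROUND_HALF_UP) for t in ties)
half_even_sum = sum(round_to_integer(t, ROUND_HALF_EVEN) for t in ties)

print("true sum:           ", true_sum)
print("ties away from zero:", half_up_sum)
print("ties to even:       ", half_even_sum)
```

With every tie rounded up, the rounded total is 55 against a true total of 50, an overshoot of 0.5 per value. With ties to even the upward and downward errors cancel in this example and the total stays at 50.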


--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil

Martin Holt

Mar 26, 2009, 6:35:11 AM
to MedS...@googlegroups.com
This discussion has been off-track in one way for some time. Some people
have sought to validate R by comparison with other softwares. This is not
validation. This is a *Medical* Statistics forum: life and QoL are at stake.
So it is important that each package meets the standard required, first and
foremost. Then you can compare them. Other industries are also faced with a
similar challenge. This increases the customer base who have need of such a
standard. It may seem that I am targeting R: I am, in the sense of trying
to establish that it meets the standard required in the regulatory field of
Medical Statistics.

I'm sorry for the length of this posting. It's mostly quotes, because I
don't want to inadvertently misrepresent anyone.

The silence coming back from members re Regulatory Standards in this field
has been deafening. The best (only ?) link was provided by Marc Schwartz:
http://www.r-project.org/doc/R-FDA.pdf . This includes in the title
"Guidance Document" and is largely based on 21 CFR Part 11. The document
itself expresses unease about the appropriateness of this standard. This is
echoed in
http://www.fda.gov/OHRMS/DOCKETS/98fr/04d-0440-gdl0002.pdf
"Guidance for Industry - Computerized Systems Used in Clinical
Investigations (2007)", which is as far as I can tell the latest FDA Paper
on the subject:
"In March 1997, FDA issued 21 CFR part 11, which provides criteria for
acceptance by FDA, under certain circumstances, of electronic records,
electronic signatures, and handwritten signatures executed to electronic
records as equivalent to paper records and handwritten signatures executed
on paper. After the effective date of 21 CFR part 11, significant concerns
regarding the interpretation and implementation of part 11 were raised by
both FDA and industry. As a result, we decided to reexamine 21 CFR part 11
with the possibility of proposing additional rulemaking, and exercising
enforcement discretion regarding enforcement of certain part 11
requirements in the interim." (SAS also used 21 CFR part 11).

FDA's "Guidance for Industry - Computerized Systems Used in Clinical
Investigations (2007)" is just that, another guidance document. It uses the
word "should" throughout, and actually defines "should" as "FDA's guidance
documents, including this guidance, do not establish legally enforceable
responsibilities. Instead, guidances describe the Agency's current thinking
on a topic and should be viewed only as recommendations, unless specific
regulatory or statutory requirements are cited. The use of the word should
in Agency guidances means that something is suggested or recommended, but
not required." The previous 2002 document is in a similar vein.

The upshot of all this is that the only FDA regulatory standard in force
that might be appropriate at the moment is 21 CFR part 11, recognised by the
industry and FDA as not being the best. This would explain the lack of
feedback on this point, and why people are validating by comparison.

What are we left with ? http://www.r-project.org/ under "What is R?" starts
with GNU GPLv2. Under FAQs, R-FAQ, 1.1 is
"1.1 Legalese
This document is copyright © 1998-2009 by Kurt Hornik.

This document is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.

This document is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.

A copy of the GNU General Public License is available via WWW at

http://www.gnu.org/copyleft/gpl.html. "

and the above link takes you to GNU GPLv3 (not 2), in which it states,
"Distributors that provide Installation Information under GPLv3 are not
required to provide "support service" for the product. What kind of "support
service" do you mean?
This includes the kind of service many device manufacturers provide to help
you install, use, or troubleshoot the product. If a device relies on access
to web services or similar technology to function properly, those should
normally still be available to modified versions, subject to the terms in
section 6 regarding access to a network." Clearly, R do provide support, a
lot of it.....I'd need a lawyer but I wonder where that puts them.
It also gives guidance to those who want to modify code, and, in view of the
proposed links with SAS/SPSS, on how the licence works in this way....but that's
by the by.

In http://www.r-project.org/doc/R-FDA.pdf , under "2 The Scope......." it
lists Base-R and Recommended Packages, and then goes on to say, "This
document is NOT in any fashion, applicable to other R-related software and
add-on packages made available via other parties, such as users or even
members of the R Development Core Team, who may, from time to time, make
their software available via the Comprehensive R Archive Network (CRAN) or
other software distribution repositories and vehicles." So even though the
contributed packages are listed on the R-website, and they 'make the claim'
that they are tested (with no obvious disclaimer other than the umbrella one
shown above), they are not supported although....correct me if I'm wrong...
that's not there to be seen. This is what Brett says in his email comparing
these packages to picking SAS code off the net: they are equally
non-supported....but that's no way to argue validity....as Felix later
pointed out.

Final point (thank God): in http://www.r-project.org/doc/R-FDA.pdf the R
doc, it says,
"It is important to note that there is a significant obligation on the part
of the end-user's organization to define, create, implement and enforce R
installation, validation and utilization related Standard Operating
Procedures (SOPs) within the end-user's environment. These SOPs should
define appropriate and reasonable quality control processes to manage
end-user related risk within the applicable regulatory framework. The
details and content of any such SOPs are beyond the scope of this document."
The FDA state that the following SOPs *should* be present :
http://www.fda.gov/OHRMS/DOCKETS/98fr/04d-0440-gdl0002.pdf , Appendix A
"Standard operating procedures (SOPs) and documentation pertinent to the use
of a computerized system should be made available for use by appropriate
study personnel at the clinical site or remotely and for inspection by FDA.
The SOPs should include, but are not limited to, the following processes.
- System setup/installation (including the description and specific use of
  software, hardware, and physical environment and the relationship)
- System operating manual
- Validation and functionality testing
- Data collection and handling (including data archiving, audit trails, and
  risk assessment)
- System maintenance (including system decommissioning)
- System security measures
- Change control
- Data backup, recovery, and contingency plans
- Alternative recording methods (in the case of system unavailability)
- Computer user training
- Roles and responsibilities of sponsors, clinical sites and other parties
  with respect to the use of computerized systems in the clinical trials"
But remember, "should", and they also say,

"This guidance represents the Food and Drug Administration's (FDA's) current
thinking on this topic. It does not create or confer any rights for or on
any person and does not operate to bind FDA or the public. You can use an
alternative approach if the approach satisfies the requirements of the
applicable statutes and regulations. If you want to discuss an alternative
approach, contact the FDA staff responsible for implementing this guidance.
If you cannot identify the appropriate FDA staff, call the appropriate
number listed on the title page of this guidance."

I'll try to summarise. No one (including FDA) is happy with the only FDA
Standard currently in force (21 CFR Part 11), and R's doc addressing that
was a Guidance doc, as is the latest FDA doc (2007). The one legal doc is GNU
GPLv2 (or is it 3?) for R. Having said that, inspectors would expect
companies to pay attention to the guidelines and be seen to be moving
towards them. But that does not help with my goal which was to establish
that the different softwares met "the standard", before comparing them.
Without a standard, it does seem to come down to what Steve Simon said,

"Barring a formal approach to validating statistical algorithms, any
discussion of whether R is more or less reliable than the major commercial
packages is going to be based more on emotion and personal
prejudices than real facts."

The lives and QoL of those people affected by (Medical) Statistics are in
the hands of those using the software. They will need to be sure that it is
fit for purpose....they will need to find appropriate databases and test
it...and they will need to be given time to do this by their supervisors.
FDA will want SOPs (examples listed above), and this is stated in R's
Guidance regulatory document. Statisticians will then need to follow these
SOPs. All of this requires time and the requisite skills. Having worked in
industry (inspected) and the NHS (not inspected), I must admit to being
concerned. Time (especially) and the requisite skills are not easily found.

Best Wishes,
Martin Holt

