[R] R in the NY Times


Zaslavsky, Alan M.

Jan 7, 2009, 8:10:20 AM
to r-h...@r-project.org
This article is accompanied by nice pictures of Robert and Ross.

Data Analysts Captivated by Power of R
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html

January 7, 2009
Data Analysts Captivated by R’s Power
By ASHLEE VANCE

To some people R is just the 18th letter of the alphabet. To others, it’s the rating on racy movies, a measure of an attic’s insulation or what pirates in movies say.

R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.

But R has also quickly found a following because statisticians, engineers and scientists without computer programming skills find it easy to use.

“R is really important to the point that it’s hard to overvalue it,” said Daryl Pregibon, a research scientist at Google, which uses the software widely. “It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems.”

It is also free. R is an open-source program, and its popularity reflects a shift in the type of software used inside corporations. Open-source software is free for anyone to use and modify. I.B.M., Hewlett-Packard and Dell make billions of dollars a year selling servers that run the open-source Linux operating system, which competes with Windows from Microsoft. Most Web sites are displayed using an open-source application called Apache, and companies increasingly rely on the open-source MySQL database to store their critical information. Many people view the end results of all this technology via the Firefox Web browser, also open-source software.

R is similar to other programming languages, like C, Java and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.

Some people familiar with R describe it as a supercharged version of Microsoft’s Excel spreadsheet software that can help illuminate data trends more clearly than is possible by entering information into rows and columns.

What makes R so useful — and helps explain its quick acceptance — is that statisticians, engineers and scientists can improve the software’s code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs and mining techniques to dig deeper into databases.

Close to 1,600 different packages reside on just one of the many Web sites devoted to R, and the number of packages has grown exponentially. One package, called BiodiversityR, offers a graphical interface aimed at making calculations of environmental trends easier.

Another package, called Emu, analyzes speech patterns, while GenABEL is used to study the human genome.

The financial services community has demonstrated a particular affinity for R; dozens of packages exist for derivatives analysis alone.

“The great beauty of R is that you can modify it to do all sorts of things,” said Hal Varian, chief economist at Google. “And you have a lot of prepackaged stuff that’s already available, so you’re standing on the shoulders of giants.”

R first appeared in 1996, when the statistics professors Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand released the code as a free software package.

According to them, the notion of devising something like R sprang up during a hallway conversation. They both wanted technology better suited for their statistics students, who needed to analyze data and produce graphical models of the information. Most comparable software had been designed by computer scientists and proved hard to use.

Lacking deep computer science training, the professors considered their coding efforts more of an academic game than anything else. Nonetheless, starting in about 1991, they worked on R full time. “We were pretty much inseparable for five or six years,” Mr. Gentleman said. “One person would do the typing and one person would do the thinking.”

Some statisticians who took an early look at the software considered it rough around the edges. But despite its shortcomings, R immediately gained a following with people who saw the possibilities in customizing the free software.

John M. Chambers, a former Bell Labs researcher who is now a consulting professor of statistics at Stanford University, was an early champion. At Bell Labs, Mr. Chambers had helped develop S, another statistics software project, which was meant to give researchers of all stripes an accessible data analysis tool. It was, however, not an open-source project.

The software failed to generate broad interest and ultimately the rights to S ended up in the hands of Tibco Software. Now R is surpassing what Mr. Chambers had imagined possible with S.

“The diversity and excitement around what all of these people are doing is great,” Mr. Chambers said.

While it is difficult to calculate exactly how many people use R, those most familiar with the software estimate that close to 250,000 people work with it regularly. The popularity of R at universities could threaten SAS Institute, the privately held business software company that specializes in data analysis software. SAS, with more than $2 billion in annual revenue, has been the preferred tool of scholars and corporate managers.

“R has really become the second language for people coming out of grad school now, and there’s an amazing amount of code being written for it,” said Max Kuhn, associate director of nonclinical statistics at Pfizer. “You can look on the SAS message boards and see there is a proportional downturn in traffic.”

SAS says it has noticed R’s rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks.

“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

But while SAS plays down R’s corporate appeal, companies like Google and Pfizer say they use the software for just about anything they can. Google, for example, taps R for help understanding trends in ad pricing and for illuminating patterns in the search data it collects. Pfizer has created customized packages for R to let its scientists manipulate their own data during nonclinical drug studies rather than send the information off to a statistician.

The co-creators of R express satisfaction that such companies profit from the fruits of their labor and that of hundreds of volunteers.

Mr. Ihaka continues to teach statistics at the University of Auckland and wants to create more advanced software. Mr. Gentleman is applying R-based software, called Bioconductor, in work he is doing on computational biology at the Fred Hutchinson Cancer Research Center in Seattle.

“R is a real demonstration of the power of collaboration, and I don’t think you could construct something like this any other way,” Mr. Ihaka said. “We could have chosen to be commercial, and we would have sold five copies of the software.”

Copyright 2009 The New York Times Company

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bill Pikounis

Jan 7, 2009, 8:53:02 AM
to r-h...@r-project.org
Pardon my exuberance, but this is simply awesome. What a treat to find
on the front web page of the NY Times this morning under Technology. I
think the article is very well written by the author, and I think it
captures top highlights of why the software and community are so
special.

Continued high gratitude to all of R-core and the R community for its
unique accomplishments. Every bit of praise is well-earned and
deserved.

I have continuously claimed to colleagues (primarily pharma industry)
for the past 8 years or so that R is the most exciting thing going on in
the area of statistics.

Thanks,
Bill

####################

Bill Pikounis
Statistician

On Wed, Jan 7, 2009 at 08:10, Zaslavsky, Alan M.
<zasl...@hcp.med.harvard.edu> wrote:
> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R
> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>
> January 7, 2009
> Data Analysts Captivated by R's Power
> By ASHLEE VANCE
>


Frank E Harrell Jr

Jan 7, 2009, 9:00:28 AM
to Zaslavsky, Alan M., r-h...@r-project.org
This is great to see. It's interesting that SAS Institute feels that
non-peer-reviewed software with hidden implementations of analytic
methods that cannot be reproduced by others should be trusted when
building aircraft engines.

Frank


--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University

Frank E Harrell Jr

Jan 7, 2009, 9:25:31 AM
to Bill Pikounis, r-h...@r-project.org
Bill Pikounis wrote:
> Pardon my exuberance, but this is simply awesome. What a treat to find
> on the front web page of the NY Times this morning under Technology. I
> think the article is very well written by the author, and I think it
> captures top highlights of why the software and community are so
> special.
>
> Continued high gratitude to all of R-core and the R community for its
> unique accomplishments. Every bit of praise is well-earned and
> deserved.
>
> I have continuously claimed to colleagues (primarily pharma industry)
> for the past 8 years or so that R is the most exciting thing going on in
> the area of statistics.
>
> Thanks,
> Bill

Amen to that, and in addition, R is now the top tool for everyday
analysis, not just a research statistician's tool.

Frank

>
> ####################
>
> Bill Pikounis
> Statistician
>
>
>
> On Wed, Jan 7, 2009 at 08:10, Zaslavsky, Alan M.
> <zasl...@hcp.med.harvard.edu> wrote:
>> This article is accompanied by nice pictures of Robert and Ross.
>>
>> Data Analysts Captivated by Power of R
>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>
>> January 7, 2009
>> Data Analysts Captivated by R's Power
>> By ASHLEE VANCE
>>

--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University


Simon Pickett

Jan 7, 2009, 9:33:00 AM
to Frank E Harrell Jr, Bill Pikounis, r-h...@r-project.org
I would like to add that I would have spent many more years doing my PhD if
it weren't for R! All data management, statistics and graphics were conducted
using it. This is the direction my university and many more research
institutes appear to be heading.

It probably doesn't get said enough, and I am sure I speak for all young
researchers: I am very much in debt to all the kind souls who have helped me
and other newbies on this forum over the years.

Thanks very much R team.

Kevin E. Thorpe

Jan 7, 2009, 9:44:20 AM
to r-h...@r-project.org
Zaslavsky, Alan M. wrote:
> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R
> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>
>
> January 7, 2009 Data Analysts Captivated by R’s Power By ASHLEE VANCE
>
>
> SAS says it has noticed R’s rising popularity at universities,
> despite educational discounts on its own software, but it dismisses
> the technology as being of interest to a limited set of people
> working on very hard tasks.
>
> “I think it addresses a niche market for high-end data analysts that
> want free, readily available code," said Anne H. Milley, director of
> technology product marketing at SAS. She adds, “We have customers who
> build engines for aircraft. I am happy they are not using freeware
> when I get on a jet.”
>

Thanks for posting. Does anyone else find the statement by SAS to be
humorous yet arrogant and short-sighted?

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin....@utoronto.ca Tel: 416.864.5776 Fax: 416.864.6057

Marc Schwartz

Jan 7, 2009, 9:50:09 AM
to Kevin E. Thorpe, r-h...@r-project.org
on 01/07/2009 08:44 AM Kevin E. Thorpe wrote:
> Zaslavsky, Alan M. wrote:
>> This article is accompanied by nice pictures of Robert and Ross.
>>
>> Data Analysts Captivated by Power of R
>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>
>>
>>
>> January 7, 2009 Data Analysts Captivated by R’s Power By ASHLEE VANCE
>>
>>
>> SAS says it has noticed R’s rising popularity at universities,
>> despite educational discounts on its own software, but it dismisses
>> the technology as being of interest to a limited set of people
>> working on very hard tasks.
>>
>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code," said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humourous yet arrogant and short-sighted?
>
> Kevin

It is an ignorant comment by a marketing person who has been spoon fed
her lines...it is also a comment being made from a very defensive and
insecure posture.

Congrats to R Core and the R Community. This is yet another sign of R's
growth and maturity.

Regards,

Marc Schwartz

Tony Breyal

Jan 7, 2009, 9:39:59 AM
to r-h...@r-project.org
Thank you for posting this, I found it a very enjoyable read!

I am curious, is there an archive of 'R in the Media' or 'R in the
Press' articles somewhere? It would be interesting to see how the
perception of R has changed/evolved over time relative to other
packages.

Cheers,
Tony Breyal


On 7 Jan, 13:10, "Zaslavsky, Alan M." <zasla...@hcp.med.harvard.edu> wrote:
> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R
> http://www.nytimes.com/2009/01/07/technology/business-computing/07pro...

Rubén Roa-Ureta

Jan 7, 2009, 10:00:28 AM
to r-h...@r-project.org
Zaslavsky, Alan M. wrote:
> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R
> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>
Thanks for the heads up. The R morale is going through the roof!
I've given three courses on R since the second half of 2007 here in
Chile (geostatistics, Fisheries Libraries for R, and generalized linear
models) and all my three audiences (professionals working in academia,
government, and private research institutions) were very much impressed
by the power of R. I spent as much time on R itself as on the
statistical topics, since students wanted to learn data management and
graphics once they started to grasp the basic elements.
R creators, Core Team, package creators and maintainers, and experts on
the list, thanks so much for such a great work and such an open
attitude. You lead by example.
Rubén

Jeffrey J. Hallman

Jan 7, 2009, 10:44:14 AM
to r-h...@stat.math.ethz.ch
The article quotes John Chambers, but it doesn't mention that R started out as
an implementation of the S language. I don't suppose Insightful is too happy
about that.

The SAS spokesman quoted in the article is clearly whistling past the graveyard.
--
Jeff

Darin A. England

Jan 7, 2009, 10:45:54 AM
to r-h...@r-project.org
On Wed, Jan 07, 2009 at 08:00:28AM -0600, Frank E Harrell Jr wrote:
> This is great to see. It's interesting that SAS Institute feels that
> non-peer-reviewed software with hidden implementations of analytic
> methods that cannot be reproduced by others should be trusted when
> building aircraft engines.
>
> Frank

Unfortunately, that type of FUD issued by the SAS marketing person still
works. I see it at my employer (a large healthcare company.) It's a
battle to change a culture, but ironically the recession helps.
People are now taking notice of the obscene licensing fees for SAS.

Darin

Duncan Murdoch

Jan 7, 2009, 10:17:49 AM
to Kevin E. Thorpe, r-h...@r-project.org
On 1/7/2009 9:44 AM, Kevin E. Thorpe wrote:
> Zaslavsky, Alan M. wrote:
>> This article is accompanied by nice pictures of Robert and Ross.
>>
>> Data Analysts Captivated by Power of R
>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>
>>
>> January 7, 2009 Data Analysts Captivated by R’s Power By ASHLEE VANCE
>>
>>
>> SAS says it has noticed R’s rising popularity at universities,
>> despite educational discounts on its own software, but it dismisses
>> the technology as being of interest to a limited set of people
>> working on very hard tasks.
>>
>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code," said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humourous yet arrogant and short-sighted?

To me it just seemed like a "blast from the past".

Duncan Murdoch

Peter Dalgaard

Jan 7, 2009, 11:03:09 AM
to Jeffrey J. Hallman, r-h...@stat.math.ethz.ch
Jeffrey J. Hallman wrote:
> The article quotes John Chambers, but it doesn't mention that R started out as
> an implementation of the S language. I don't suppose Insightful is too happy
> about that.

You mean Tibco...

The statement that S "failed to generate broad interest" is also a bit
misleading. I believe S-PLUS had more than 100000 users in its day,
although it may be true that its success was mainly in the academic
world. Obviously the pool of people who knew S from the preceding decade
was very important for the early development of R.

--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dal...@biostat.ku.dk) FAX: (+45) 35327907

Max Kuhn

Jan 7, 2009, 10:29:32 AM
to marc_s...@comcast.net, r-h...@r-project.org
> "You can look on the SAS message boards and see there is a proportional downturn in traffic."

I think that I actually made this statement about both the SAS and
Splus traffic...

I wasn't really trying to be critical of SAS. I was trying to get
across that SAS focused their resources on features that had nothing
to do with *statistical analysis* (e.g. data warehousing etc.)

--

Max

David M Smith

Jan 7, 2009, 11:22:02 AM
to Tony Breyal, r-h...@r-project.org
On Wed, Jan 7, 2009 at 6:39 AM, Tony Breyal <tony....@googlemail.com> wrote:
> Thank you for posting this, I found it a very enjoyable read!
>
> I am curious, is there an archive of 'R in the Media' or 'R in the
> Press' articles somewhere? It would be interesting to see how the
> perception of R has changed/evolved over time relative to other
> packages.

That's a great idea, and I just created an "Rmedia" category on the
REvolutions R blog to track exactly such articles. You can find it
here:

http://blog.revolution-computing.com/rmedia/

If anyone knows of any other mainstream articles about R available
online please let me know, and I'll do a round-up post in that section
to make sure they're captured.

By the way, we're writing about R and issues related to R daily at:

http://blog.revolution-computing.com

# David Smith

--
David M Smith <da...@revolution-computing.com>
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Seattle, USA)

Bryan Hanson

Jan 7, 2009, 11:26:44 AM
to r-h...@r-project.org
I believe the SAS person shot themselves in the foot in more ways than
one. In my mind, the reason you would pay, as Frank said, for


> non-peer-reviewed software with hidden implementations of analytic
> methods that cannot be reproduced by others

Would be so that you can sue them later when a software problem in the
designing of the engine makes your plane fall out of the sky!

Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA


>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code,” said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humourous yet arrogant and short-sighted?
>
> Kevin


Marc Schwartz

Jan 7, 2009, 11:56:53 AM
to Bryan Hanson, r-h...@r-project.org
I would also point out that the use of the term "freeware" as opposed to
"FOSS" by the SAS rep, comes off as being unprofessional and
deliberately condescending...

The author of the article, to his credit, was pretty consistent in using
open source terminology.

Regards,

Marc

on 01/07/2009 10:26 AM Bryan Hanson wrote:
> I believe the SAS person shot themselves in the foot more in more ways than
> one. In my mind, the reason you would pay, as Frank said, for
>
>> non-peer-reviewed software with hidden implementations of analytic
>> methods that cannot be reproduced by others
>
> Would be so that you can sue them later when a software problem in the
> designing of the engine makes your plane fall out of the sky!


Andrew Choens

Jan 7, 2009, 1:01:55 PM
to Darin A. England, r-h...@r-project.org

> Unfortunately, that type of FUD issued by the SAS marketing person still
> works. I see it at my employer (a large healthcare company.) It's a
> battle to change a culture, but ironically the recession helps.
> People are now taking notice of the obscene licensing fees for SAS.
>
> Darin

I agree. I work for a consulting firm (human services) and my boss
prefers us to use SPSS, rather than R. It's painful. I have version 11
installed on my Windows laptop. Next year, the license expires!

For someone coming from a SPSS background, R is a little mind-blowing,
simply because it is so much more powerful. But, perseverance pays off.
Once I master Sweave and such, I'll be able to churn out reports much
more quickly than I ever could with SPSS.

I do wish the author of the article had included comments from SPSS, in
addition to the humorous FUD from the SAS spokesperson. Newer versions
of SPSS actually have the option of using R for data analysis, in
addition to the SPSS engine. It would have been interesting to compare
the corporate responses of the two companies.

--
Insert something humorous here. :-)

Erik Iverson

Jan 7, 2009, 1:03:19 PM
to marc_s...@comcast.net, r-h...@r-project.org
I pointed a friend of mine toward the article, to which he replied:

"I hope that they run SAS on Solaris too, god only knows how tainted the
syscalls are in that linux freeware."

Of course, now Solaris is 'freeware', too, so I suppose that according to
SAS, running SAS on Windows is the best way to be sure you're getting the
right answers.

Ajay ohri

Jan 7, 2009, 1:29:48 PM
to David M Smith, r-h...@r-project.org, Tony Breyal
you can use google alerts to track media coverage of R using some keywords

regards,

ajay


Ted Harding

Jan 7, 2009, 1:30:48 PM
to Erik Iverson, r-h...@r-project.org, marc_s...@comcast.net
On 07-Jan-09 18:03:19, Erik Iverson wrote:
> I pointed a friend of mine toward the article, to which he replied:
>
> "I hope that they run SAS on Solaris too, god only knows how tainted
> the syscalls are in that linux freeware."
>
> Of course, now Solaris is 'freeware', too, so I suppose that according
> to SAS, running SAS on Windows is the best way to be sure you're
> getting the right answers.

I'm not so sure about that. Since the article described R as
"a supercharged version of Microsoft's Excel", surely people
should run R on Windows and be *ab*so*lute*ly* sure of getting
the right answers (and supercharged to boot)????
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jan-09 Time: 18:30:39
------------------------------ XFMail ------------------------------

Barry Rowlingson

Jan 7, 2009, 1:35:29 PM
to Darin A. England, r-h...@r-project.org
2009/1/7 Darin A. England <eng...@cs.umn.edu>:

> Unfortunately, that type of FUD issued by the SAS marketing person still
> works. I see it at my employer (a large healthcare company.)

I see it here, at a university. Quote: "We couldn't possibly do our
analysis using some software we've just downloaded from a web site"
*facepalm*

> It's a
> battle to change a culture, but ironically the recession helps.
> People are now taking notice of the obscene licensing fees for SAS.

They'll just keep increasing their educational discount, or as we
say, "the first hit is free"...

BaRRy

Tony Breyal

Jan 7, 2009, 2:51:56 PM
to r-h...@r-project.org
Google Alerts are great, but unfortunately the brevity of R's name is
the main problem, I think.

Though, thinking about it, I suppose if one could work out the 'best'
keywords to use, it might be possible to avoid too many misclassified
results, e.g.,

http://news.google.com/news?hl=en&ned=us&nolr=1&q=r+open+source+programming+language&btnG=Search

or something like that. Will be keeping an eye on David's page from
time to time though, just in case he catches anything :-)

lovely to see R getting the attention it so rightly deserves.


On 7 Jan, 18:29, "Ajay ohri" <ohri2...@gmail.com> wrote:
> you can use google alerts to track media coverage of R using some keywords
>
> regards,
>
> ajay
>
> On Wed, Jan 7, 2009 at 9:52 PM, David M Smith <da...@revolution-computing.com> wrote:
> > On Wed, Jan 7, 2009 at 6:39 AM, Tony Breyal <tony.bre...@googlemail.com>

Wacek Kusnierczyk

Jan 7, 2009, 3:03:55 PM
to Kevin E. Thorpe, R help
Kevin E. Thorpe wrote:
> Zaslavsky, Alan M. wrote:
>> SAS says it has noticed R’s rising popularity at universities,
>> despite educational discounts on its own software, but it dismisses
>> the technology as being of interest to a limited set of people
>> working on very hard tasks.
>>
>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code," said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humourous yet arrogant and short-sighted?

there must be something wrong with me, but i can't find anything
'humorous yet arrogant and short-sighted' in the idea that engines for
aircraft be built with software that does not advertise itself with
'ABSOLUTELY NO WARRANTY.'


vQ

Marc Schwartz

Jan 7, 2009, 3:07:51 PM
to Max Kuhn, r-h...@r-project.org
on 01/07/2009 09:29 AM Max Kuhn wrote:
>> "You can look on the SAS message boards and see there is a proportional downturn in traffic."
>
> I think that I actually made this statement about both the SAS and
> Splus traffic...
>
> I wasn't really trying to be critical of SAS. I was trying to get
> across that SAS focused their resources on features that had nothing
> to do with *statistical analysis* (e.g. data warehousing etc.)


Presuming that the Google Groups archive of SAS-L is reasonably complete:

http://groups.google.com/group/comp.soft-sys.sas/about

The monthly posting frequency data since 1993 is:

Posts <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
1720L, 1826L, 1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L,
357L), Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L,
1537L, 1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA), Mar = c(NA,
658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L, 2010L, 1755L,
2188L, 629L, 1711L, 1728L, 2712L, NA), Apr = c(NA, 681L, 792L,
1052L, 1315L, 784L, 1077L, 1163L, 1467L, 2199L, 1757L, 1826L,
2169L, 2796L, 2766L, 2789L, NA), May = c(NA, 712L, 945L, 1163L,
1212L, 448L, 778L, 1963L, 1735L, 2373L, 1863L, 1836L, 2283L,
3147L, 2974L, 2025L, NA), Jun = c(NA, 751L, 1002L, 999L, 1127L,
813L, 540L, 1615L, 1905L, 2133L, 1701L, 2606L, 2407L, 2723L,
2691L, 2368L, NA), Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L,
476L, 1572L, 2027L, 2445L, 1926L, 1843L, 2061L, 761L, 2435L,
2607L, NA), Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L,
1696L, 1976L, 1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L,
NA), Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L,
1439L, 1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L,
1592L, 2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA
), Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L,
1636L, 1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA
), Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L,
1424L, 1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA
)), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
row.names = c("1993",
"1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001",
"2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"
))

> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
1994  546  511  658  681  712  751  763  975  703  805  752  666
1995  548  734  963  792  945 1002  775  969  745  691  773  765
1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
1998  894 1068  945  784  448  813  896  823  894 1129  733  492
1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA


One can then review the annual posting frequency via:

pdf("SAS-L.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
beside = TRUE,
cex.names = 0.6, main = "SAS-L Traffic",
cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
line = 2, cex = 0.5)

dev.off()


There would appear to be marked increases in 2000 and again in 2006.
However, it has been flat for the past 3 calendar years. No decline yet,
but it will happen in due course...
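
The size of the two jumps can be checked directly from the table. This is only a sketch: the four rows below are copied from the SAS-L table above, and the object names (sas, annual, jump_2000, jump_2006) are chosen for illustration.

```r
# Monthly SAS-L counts for the years around each jump (copied from the table)
sas <- rbind(
  "1999" = c(514, 493, 659, 1077, 778, 540, 476, 612, 1351, 1708, 1720, 1595),
  "2000" = c(1720, 1519, 1177, 1163, 1963, 1615, 1572, 1696, 1491, 1669, 1490, 1298),
  "2005" = c(2122, 1960, 629, 2169, 2283, 2407, 2061, 1793, 1365, 1427, 1518, 1524),
  "2006" = c(2722, 1645, 1711, 2796, 3147, 2723, 761, 2027, 2714, 2983, 2848, 2374)
)
colnames(sas) <- month.abb

# Annual totals, the same quantity the barplot displays
annual <- rowSums(sas, na.rm = TRUE)

# Ratio of each jump year to the year before it
jump_2000 <- annual[["2000"]] / annual[["1999"]]
jump_2006 <- annual[["2006"]] / annual[["2005"]]

print(annual)
print(round(c("2000 vs 1999" = jump_2000, "2006 vs 2005" = jump_2006), 2))
```

A ratio above 1 means the annual total grew relative to the previous year.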

No comparable posting data table exists for S-News as far as I can find,
so I wrote a quick program to read the S-News archive pages here:

http://www.biostat.wustl.edu/archives/html/s-news/

and get monthly posting counts, using the 'Thread' based html pages,
where each monthly embedded post link has a URL of the form:

http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html


Thus, the program I used is:

TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
Posts <- numeric(length(TD))

for (i in seq(along = TD))
{
URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
TD[i], "/threads.html", sep = "")

cat(URL, "\n")

if (!inherits(try(con <- readLines(URL)), "try-error"))
{
Posts[i] <- length(grep("msg.*\\.html", con))
rm(con)
} else {
Posts[i] <- NA
}
}


Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1998:2009
colnames(Posts) <- month.abb
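
The counting and reshaping steps in the program above can be tried offline. This is a minimal sketch: the HTML lines are made up for illustration (the real archive pages will differ), and page, n_posts, counts and m are hypothetical names.

```r
# Hypothetical lines from a thread index page; real pages will differ
page <- c(
  "<html><body><ul>",
  "<li><a href=\"msg00000.html\">First question</a></li>",
  "<li><a href=\"msg00001.html\">Re: First question</a></li>",
  "<li><a href=\"index.html\">Back to index</a></li>",
  "</ul></body></html>"
)

# grep() returns the indices of matching lines, so this counts lines,
# not individual links; it assumes one post link per line
n_posts <- length(grep("msg.*\\.html", page))

# Filling by row puts 12 consecutive months into each year's row
counts <- 1:24   # stand-in for two years of monthly counts
m <- matrix(counts, ncol = 12, byrow = TRUE,
            dimnames = list(as.character(2001:2002), month.abb))

print(n_posts)
print(m)
```

If a page ever put several post links on one line, the count would be too low, so it is worth checking the page source before trusting the numbers. The full program above, run against the real archive, is what produced the data below.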

That gives you:

Posts <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")))


> Posts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1998 NA 273 378 293 330 243 219 209 191 241 181 141
1999 210 173 313 300 334 254 284 270 300 253 300 194
2000 264 313 285 264 306 247 245 302 204 251 261 176
2001 246 232 252 300 331 282 258 260 260 229 232 194
2002 230 255 242 228 219 248 230 207 221 280 228 177
2003 189 179 218 196 189 217 221 187 186 295 197 142
2004 197 230 257 151 164 175 154 187 195 150 176 176
2005 174 161 193 182 174 109 159 144 107 98 82 84
2006 109 87 99 123 107 96 84 97 68 73 53 20
2007 51 59 74 48 46 34 47 39 35 70 56 41
2008 48 63 58 47 31 27 40 28 41 30 27 36
2009 5 NA NA NA NA NA NA NA NA NA NA NA


Which can then be graphed by:

pdf("S-News.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
beside = TRUE,
cex.names = 0.6, main = "S-News Traffic",
cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
line = 2, cex = 0.5)

dev.off()

The consistent decline in posting frequency since 1999 is notable. The
temporal association with the introduction of R is perhaps profound.

As long as I am on the subject, I figured that I would do the same for
R-Help. The downside is that readLines() (really url() ) does not
support https:, so I took a somewhat different approach, using wget:


TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
Posts <- numeric(length(TD))

for (i in seq(along = TD))
{
URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
TD[i], "/thread.html", sep = "")

cat(URL, "\n")

CMD <- paste("wget", URL)
system(CMD)

if (file.exists("thread.html"))
{
con <- readLines("thread.html")
Posts[i] <- length(grep("[0-9]+\\.html", con))
rm(con)
unlink("thread.html")
} else {
Posts[i] <- NA
}
}

Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1997:2009
colnames(Posts) <- month.abb
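
The URL construction and the counting step can be checked without touching the network. This is a sketch under assumptions: count_posts() is a hypothetical helper, and the temporary file holds made-up content standing in for a page fetched by wget.

```r
# month.name is recycled against the 12 copies of each year
TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
urls <- paste("https://stat.ethz.ch/pipermail/r-help/", TD,
              "/thread.html", sep = "")

# Hypothetical helper: count post links in a downloaded index file,
# returning NA when the download did not produce a file
count_posts <- function(path) {
  if (!file.exists(path)) return(NA_integer_)
  length(grep("[0-9]+\\.html", readLines(path)))
}

# Stand-in for a downloaded page (made-up content)
tmp <- tempfile(fileext = ".html")
writeLines(c("<li><a href=\"000123.html\">A post</a>",
             "<li><a href=\"000124.html\">A reply</a>",
             "<a href=\"thread.html\">thread</a>"), tmp)
n <- count_posts(tmp)
unlink(tmp)

print(head(urls, 2))
print(n)
```

The pattern requires digits immediately before ".html", which is why the link to thread.html itself is not counted. Run against the real archive, the wget loop above produced the data below.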


This gives you:

Posts <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
"1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))


> Posts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1997 NA NA NA 92 36 47 41 37 40 76 61 57
1998 135 79 114 101 90 105 110 64 94 96 184 105
1999 226 145 195 189 161 186 184 148 203 231 318 221
2000 205 355 377 377 504 418 293 356 434 418 433 422
2001 558 583 651 470 552 550 615 562 678 657 825 530
2002 884 697 880 965 1057 926 918 824 705 1055 1038 742
2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
2009 462 NA NA NA NA NA NA NA NA NA NA NA


Which again can be graphed as:

pdf("R-Help.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
beside = TRUE,
cex.names = 0.6, main = "R-Help Traffic",
cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
line = 2, cex = 0.5)

dev.off()


Now....there's a healthy growth curve.... :-)

Note that the annual traffic volume for 2008 on R-Help exceeds that on
SAS-L.
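
The comparison can be verified from the 2008 rows of the two tables. The values below are copied from those rows; the object names are chosen for illustration.

```r
# 2008 monthly counts copied from the SAS-L and R-Help tables above
sas_2008 <- c(2305, 2255, 2712, 2789, 2025, 2368,
              2607, 2584, 2554, 2434, 1984, 1921)
r_2008   <- c(2490, 2583, 2740, 2487, 2517, 2774,
              3268, 2813, 2990, 3037, 2730, 2399)

totals <- c("SAS-L" = sum(sas_2008), "R-Help" = sum(r_2008))
print(totals)   # SAS-L 28538, R-Help 32828
```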

For convenience, I am attaching each of the 3 plots.

Regards,

Marc Schwartz

SAS-L.pdf
S-News.pdf
R-Help.pdf

Spencer Graves

Jan 7, 2009, 3:19:09 PM
to Wacek Kusnierczyk, R help
What kind of warranty does SAS offer? I haven't read their EULA
recently, but if an airplane fell out of the sky because of a bug in SAS
code, I'd be surprised if SAS was eager to pay damages!

Spencer

Duncan Murdoch

Jan 7, 2009, 3:23:45 PM
to Wacek Kusnierczyk, R help
On 1/7/2009 3:03 PM, Wacek Kusnierczyk wrote:
> Kevin E. Thorpe wrote:
>> Zaslavsky, Alan M. wrote:
>>> SAS says it has noticed R’s rising popularity at universities,
>>> despite educational discounts on its own software, but it dismisses
>>> the technology as being of interest to a limited set of people
>>> working on very hard tasks.
>>>
>>> “I think it addresses a niche market for high-end data analysts that
>>> want free, readily available code," said Anne H. Milley, director of
>>> technology product marketing at SAS. She adds, “We have customers who
>>> build engines for aircraft. I am happy they are not using freeware
>>> when I get on a jet.”
>>>
>>
>> Thanks for posting. Does anyone else find the statement by SAS to be
> >> humorous yet arrogant and short-sighted?
>
> there must be something wrong with me, but i can't find anything
> 'humorous yet arrogant and short-sighted' in the idea that engines for
> aircraft be built with software that does not advertise itself with
> 'ABSOLUTELY NO WARRANTY.'

Yes, everyone knows that the lack of warranty should be hidden in the
fine print, and say something like this:

"Institute warrants that the media on which SAS/C OnlineDoc is furnished
will be free from defects in material and workmanship under normal use
for a period of ninety (90) days from the date of delivery of SAS/C
OnlineDoc. Licensee’s exclusive remedy for breach of this warranty shall
be replacement of the defective media by the Institute. Institute and
its licensors disclaim all other warranties, express or implied,
including, but not limited to, any implied warranties of merchantability
and/or fitness for a particular purpose whether alleged to arise by law,
by reason of custom or usage in the trade, or by course of dealing. "

(Sorry, I couldn't find SAS/Stat's lack of warranty. I found this one
at
http://support.sas.com/documentation/onlinedoc/sasc/doc700/html/common/agreement.htm)

Duncan Murdoch

Mitchell Maltenfort

Jan 7, 2009, 3:23:10 PM
to R help
On Wed, Jan 7, 2009 at 3:19 PM, Spencer Graves <spencer...@pdf.com> wrote:
> What kind of warranty does SAS offer? I haven't read their EULA recently,
> but if an airplane fell out of the sky because of a bug in SAS code, I'd be
> surprised if SAS was eager to pay damages!
>
> Spencer
>
>


And that's an issue that always comes up on Linux v. Microsoft -- just
because you pay money for it doesn't mean you're buying meaningful
guarantees.
--
Due to the recession, requests for instant gratification will be
deferred until arrears in scheduled gratification have been satisfied.

Thomas Adams

Jan 7, 2009, 3:25:10 PM
to Wacek Kusnierczyk, R help
Wacek,

One would hope that if someone were to use software to "build engines
for aircraft", that said person would sufficiently test the software to
have confidence in it, whether it had a "Warranty" or not — at least
that's my mode of operation…

Cheers!
Tom


--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL: thomas...@noaa.gov

VOICE: 937-383-0528
FAX: 937-383-0033

Douglas Bates

Jan 7, 2009, 3:57:43 PM
to marc_s...@comcast.net, r-h...@r-project.org
On Wed, Jan 7, 2009 at 8:50 AM, Marc Schwartz <marc_s...@comcast.net> wrote:
> on 01/07/2009 08:44 AM Kevin E. Thorpe wrote:
>> Zaslavsky, Alan M. wrote:
>>> This article is accompanied by nice pictures of Robert and Ross.
>>>
>>> Data Analysts Captivated by Power of R
>>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>>
>>>
>>>
>>> January 7, 2009 Data Analysts Captivated by R's Power By ASHLEE VANCE
>>>
>>>
>>> SAS says it has noticed R's rising popularity at universities,
>>> despite educational discounts on its own software, but it dismisses
>>> the technology as being of interest to a limited set of people
>>> working on very hard tasks.
>>>
>>> "I think it addresses a niche market for high-end data analysts that
>>> want free, readily available code," said Anne H. Milley, director of
>>> technology product marketing at SAS. She adds, "We have customers who
>>> build engines for aircraft. I am happy they are not using freeware
>>> when I get on a jet."
>>>
>>
>> Thanks for posting. Does anyone else find the statement by SAS to be
>> humorous yet arrogant and short-sighted?
>>
>> Kevin

> It is an ignorant comment by a marketing person who has been spoon fed
> her lines...it is also a comment being made from a very defensive and
> insecure posture.

To some extent, but we should also realize that open source software is
a nonsensical idea to those in the commercial software business. It
just doesn't fit into their world view.

As part of the 40th anniversary of Technometrics there will be a
discussion article on "The Future of Statistical Computing" by Leland
Wilkinson in the Nov. 2008 issue. (I say "will be" because I don't
see it on the web site yet.) Lee is the creator of Systat and is now
associated with SPSS, Inc. which bought Systat. I am one of the
discussants and I agreed with most of what Lee had to say except with
regard to the role of open source software. Lee looked at the market
share of SAS, SPSS, Stata, S-PLUS, Minitab, etc. in statistical
software and based his projections on that. He had some ball park
figure for the "market share" of R and concluded that it wouldn't
really be important. My response was that this misses the point. R
is a community, not a "product" in the traditional software sense. I
referred to Eric Raymond's essay "The Cathedral and the Bazaar", which
I think is still relevant in contrasting the views of those in the
commercial software and the open source software communities.

> Congrats to R Core and the R Community. This is yet another sign of R's
> growth and maturity.


Gabor Grothendieck

Jan 7, 2009, 6:24:57 PM
to r-h...@r-project.org
Here are the same message/post counts
for each of S, SAS, R:
- reworked into a 3 column ts class time series
- with Jan 2009 removed since it's not complete
- leading and trailing NA rows removed

At the end we plot the raw data as well as the time
series of totals and show loess smooths for each.

By running the code below we see that:
- the sum of the three seems to be rising at a constant rate
- S is declining
- SAS and R are rising
- R is rising the fastest, though it has completed its phase
of highest growth, which ended around 2004

tt3 <- structure(c(15, 458, 330, 219, 472, 517, 546, 511, 658, 681,
712, 751, 763, 975, 703, 805, 752, 666, 548, 734, 963, 792, 945,
1002, 775, 969, 745, 691, 773, 765, 853, 1024, 805, 1052, 1163,
999, 1184, 1053, 1176, 1197, 911, 844, 1007, 1150, 1108, 1315,
1212, 1127, 1074, 692, 947, 900, 853, 677, 894, 1068, 945, 784,
448, 813, 896, 823, 894, 1129, 733, 492, 514, 493, 659, 1077,
778, 540, 476, 612, 1351, 1708, 1720, 1595, 1720, 1519, 1177,
1163, 1963, 1615, 1572, 1696, 1491, 1669, 1490, 1298, 1826, 1537,
1915, 1467, 1735, 1905, 2027, 1976, 1439, 1592, 1636, 1424, 1941,
1845, 2010, 2199, 2373, 2133, 2445, 1492, 1864, 2133, 1663, 1520,
1832, 1846, 1755, 1757, 1863, 1701, 1926, 1689, 1646, 1832, 1545,
1445, 1636, 1652, 2188, 1826, 1836, 2606, 1843, 2143, 1784, 1712,
1786, 2148, 2122, 1960, 629, 2169, 2283, 2407, 2061, 1793, 1365,
1427, 1518, 1524, 2722, 1645, 1711, 2796, 3147, 2723, 761, 2027,
2714, 2983, 2848, 2374, 2750, 926, 1728, 2766, 2974, 2691, 2435,
2592, 1868, 2320, 2112, 1948, 2305, 2255, 2712, 2789, 2025, 2368,
2607, 2584, 2554, 2434, 1984, 1921, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
273, 378, 293, 330, 243, 219, 209, 191, 241, 181, 141, 210, 173,
313, 300, 334, 254, 284, 270, 300, 253, 300, 194, 264, 313, 285,
264, 306, 247, 245, 302, 204, 251, 261, 176, 246, 232, 252, 300,
331, 282, 258, 260, 260, 229, 232, 194, 230, 255, 242, 228, 219,
248, 230, 207, 221, 280, 228, 177, 189, 179, 218, 196, 189, 217,
221, 187, 186, 295, 197, 142, 197, 230, 257, 151, 164, 175, 154,
187, 195, 150, 176, 176, 174, 161, 193, 182, 174, 109, 159, 144,
107, 98, 82, 84, 109, 87, 99, 123, 107, 96, 84, 97, 68, 73, 53,
20, 51, 59, 74, 48, 46, 34, 47, 39, 35, 70, 56, 41, 48, 63, 58,
47, 31, 27, 40, 28, 41, 30, 27, 36, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 92, 36, 47, 41, 37, 40, 76, 61, 57, 135,
79, 114, 101, 90, 105, 110, 64, 94, 96, 184, 105, 226, 145, 195,
189, 161, 186, 184, 148, 203, 231, 318, 221, 205, 355, 377, 377,
504, 418, 293, 356, 434, 418, 433, 422, 558, 583, 651, 470, 552,
550, 615, 562, 678, 657, 825, 530, 884, 697, 880, 965, 1057,
926, 918, 824, 705, 1055, 1038, 742, 1017, 1137, 1203, 1488,
1268, 1319, 1344, 1210, 1443, 1567, 1605, 1158, 1116, 1580, 1946,
1657, 1561, 1714, 1618, 1493, 1534, 1712, 1895, 1481, 1746, 1724,
1703, 2057, 1887, 2056, 1872, 1777, 1709, 1810, 1907, 1508, 2075,
1920, 2270, 1818, 2029, 1811, 1785, 1898, 1902, 2328, 2127, 1450,
1714, 1907, 2191, 2145, 2210, 2307, 2138, 2241, 2028, 2708, 2594,
2028, 2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037,
2730, 2399), .Dim = c(186L, 3L), .Dimnames = list(NULL, c("SAS",
"S", "R")), .Tsp = c(1993.5, 2008.91666666667, 12), class = c("mts",
"ts"))

tt4 <- cbind(tt3, rowSums(tt3))
colnames(tt4) <- c(colnames(tt3), "Sum")
ts.plot(tt4, col = 1:4)
grid()
legend("topleft", colnames(tt4), lty = 1, col = 1:4)

library(dyn)
for(i in 1:4) lines(fitted(dyn$loess(tt4[, i] ~ time(tt4))), col = i)
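
For anyone wanting to rebuild such a series from one of the year-by-month tables, here is a minimal sketch. It is not the code used above: it swaps in base R's loess() on a plain data frame instead of dyn$loess, and the names m, x, x_trim, d and sm are chosen for illustration. The values are the 2007 and 2008 rows of the S-News table.

```r
# Two rows (2007, 2008) of the S-News table, one row per year
m <- matrix(c(51, 59, 74, 48, 46, 34, 47, 39, 35, 70, 56, 41,
              48, 63, 58, 47, 31, 27, 40, 28, 41, 30, 27, 36),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("2007", "2008"), month.abb))

# t() followed by c() reads the matrix row by row, i.e. in calendar order
x <- ts(c(t(m)), start = c(2007, 1), frequency = 12)

# window() trims the series, e.g. to drop an incomplete final month
x_trim <- window(x, end = c(2008, 11))

# Base-R loess smooth of the counts against time
d <- data.frame(t = as.numeric(time(x)), y = as.numeric(x))
fit <- loess(y ~ t, data = d)
sm <- ts(fitted(fit), start = start(x), frequency = frequency(x))

print(x)
print(round(sm, 1))
```

After ts.plot(x), a call to lines(sm) overlays the smooth in the same way as the loop above.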

hadley wickham

Jan 7, 2009, 7:13:20 PM
to Gabor Grothendieck, r-h...@r-project.org
Here's a couple of similar plots created with ggplot2. I chose to
turn the data into a data frame with an explicit date column. Using a
log scale somewhat stabilises the variability.

## SAS-L traffic
sas <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,

## s-news traffic


s <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")))

r <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
"1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))

library(reshape)
sas <- melt(as.matrix(sas), na.rm = TRUE)
r <- melt(r, na.rm = TRUE)
s <- melt(s, na.rm = TRUE)
names(r) <- names(s) <- names(sas) <- c("year", "month", "count")

sas$software <- "sas"
s$software <- "s"
r$software <- "r"
all <- rbind(sas, s, r)
all$date <- with(all,
as.Date(paste(year, month, 15, sep = "-"), "%Y-%b-%d"))


library(ggplot2)
qplot(date, count, data = all, geom = "line", colour = software) +
  geom_smooth(se = FALSE, size = 1)
last_plot() + scale_y_log10(breaks = 10^(1:3), labels = 10^(1:3))

library(plyr)  # ddply() comes from plyr
yearly <- ddply(all, .(year, software), function(df) c(count = sum(df$count)))
qplot(year, count, data = yearly, geom = "line", colour = software)
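
The reshaping step can also be done without the reshape package. The following is a sketch of a base-R alternative, not the method used above: it relies on as.data.frame(as.table()) in place of melt(), and builds the date from month.abb so that it does not depend on the locale's month names. The small matrix holds the first three months of 1998 and 1999 from the S-News table; m and long are hypothetical names.

```r
# Jan-Mar of 1998 and 1999 from the S-News table (filled column by column)
m <- matrix(c(NA, 210, 273, 173, 378, 313), nrow = 2,
            dimnames = list(c("1998", "1999"), c("Jan", "Feb", "Mar")))

# Wide matrix -> long data frame with one row per year/month cell
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("year", "month", "count")

# Drop missing months, then build an explicit mid-month date column
long <- long[!is.na(long$count), ]
long$year <- as.integer(long$year)
long$date <- as.Date(sprintf("%d-%02d-15", long$year,
                             match(long$month, month.abb)))
long <- long[order(long$date), ]

print(long)
```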


Hadley

--
http://had.co.nz/

Spencer Graves

Jan 7, 2009, 6:53:03 PM
to Gabor Grothendieck, r-h...@r-project.org
Thanks, Gabor, Marc, Max:

The image is even more striking (and more accurately reflects
reality, I believe) if you add "log='y'" to "ts.plot".
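
A small sketch of what the log axis does, using two made-up series (A grows 5% per month, B grows 1% per month) rather than the list data. Here the option is passed through the documented gpars argument of ts.plot(), and the plot is written to a temporary PDF; x and out are hypothetical names.

```r
# Two hypothetical series with constant percentage growth
x <- ts(cbind(A = 100 * 1.05^(0:35), B = 1000 * 1.01^(0:35)),
        start = c(2006, 1), frequency = 12)

out <- tempfile(fileext = ".pdf")
pdf(out, height = 4, width = 7)

# On a log y axis, constant percentage growth shows up as a straight line
ts.plot(x, gpars = list(log = "y", col = 1:2))
legend("topleft", colnames(x), lty = 1, col = 1:2)

dev.off()
```

On the raw scale the larger series dominates the picture; on the log scale the slopes are comparable as growth rates.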

Best Wishes,
Spencer

Gabor Grothendieck

Jan 7, 2009, 7:52:17 PM
to Spencer Graves, r-h...@r-project.org
I did try the log version as well prior to posting. Although it would
seem to exaggerate the difference, to me the insights from plotting the
raw data with loess (i.e. constant growth rate of the total, piecewise
constant growth of R) come through best.

Marc Schwartz

Jan 7, 2009, 9:47:51 PM
to Gabor Grothendieck, r-h...@r-project.org, Spencer Graves
> Here's a couple of similar plots created with ggplot2. I chose to
> turn the data into a data frame with an explicit date column. Using a
> log scale somewhat stabilises the variability.
>
> ## SAS-L traffic
> sas <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
> ## s-news traffic

> s <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
> 5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
> 313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
> 264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
> 331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
> 248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
> 221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
> 187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
> 107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
> 70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
> NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
> 12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
> "2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
> "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
> "Nov", "Dec")))
>
> r <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,

> 2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
> 1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
> 1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
> 1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
> 1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
> 926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
> 615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
> 356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
> 203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
> 96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
> NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
> 2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
> 2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
> "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
> "2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
> "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))
>
> library(reshape)
> sas <- melt(as.matrix(sas), na.rm = TRUE)
> r <- melt(r, na.rm = TRUE)
> s <- melt(s, na.rm = TRUE)
> names(r) <- names(s) <- names(sas) <- c("year", "month", "count")
>
> sas$software <- "sas"
> s$software <- "s"
> r$software <- "r"
> all <- rbind(sas, s, r)
> all$date <- with(all,
> as.Date(paste(year, month, 15, sep = "-"), "%Y-%b-%d"))
>
>
> library(ggplot2)
> qplot(date, count, data = all, geom = "line", colour = software) +
> geom_smooth(se = F, size = 1)
> last_plot() + scale_y_log10(breaks = 10^(1:3), labels = 10^(1:3))
>
> yearly <- ddply(all, .(year, software), function(df) c(count = sum(df$count)))
> qplot(year, count, data = yearly, geom = "line", colour = software)


Hadley,

You might want to remove the 2009 data from each of the three lists
given that the January data is not yet complete.

The result of including the January 2009 data in your plots is that the
growth trajectory for the smoothed curves for SAS-L and R-Help appears to
be leveling or even declining, when at least for R-Help, that is not the
case. The S-News curve is not affected significantly, given the already
declining counts.

The effect of the 2009 data is most noticeable in the log scale plot.

Thus:

all <- subset(all, year < 2009)

# Linear scale


qplot(date, count, data = all, geom = "line", colour = software) +
  geom_smooth(se = FALSE, size = 1)


# Log scale


last_plot() + scale_y_log10(breaks = 10^(1:3), labels = 10^(1:3))


HTH,

hadley wickham

Jan 7, 2009, 9:58:30 PM
to marc_s...@comcast.net, r-h...@r-project.org, Spencer Graves
> You might want to remove the 2009 data from each of the three lists
> given that the January data is not yet complete.
>
> The result of including the January 2009 data in your plots is that the
> growth trajectory for the smoothed curves for SAS-L and R-Help appears to
> be leveling or even declining, when at least for R-Help, that is not the
> case. The S-News curve is not affected significantly, given the already
> declining counts.
>
> The effect of the 2009 data is most noticeable in the log scale plot.
>
> Thus:
>
> all <- subset(all, year < 2009)

Good point - thanks for the fix!

Hadley

--
http://had.co.nz/

Gabor Grothendieck

Jan 7, 2009, 10:00:14 PM
to hadley wickham, marc_s...@comcast.net, r-h...@r-project.org, Spencer Graves
Note that the mts object I posted already had Jan 2009 removed and also
had the NA rows removed.

Dirk Eddelbuettel

Jan 7, 2009, 10:26:58 PM
to Gabor Grothendieck, r-h...@r-project.org

On 7 January 2009 at 18:24, Gabor Grothendieck wrote:
| By running the code below we see that the:
| - sum of the three seems to be rising at a constant rate
| - S is declining
| - SAS and R are rising
| - R is rising the fastest, though it has completed its phase
| of highest growth, which ended around 2004

I wonder whether we need to account for traffic on all the additional r-sig-*
mailing lists ?

Of the handful that I follow, some seem to have taken traffic from r-help.
This could account for (at least parts of) the apparent traffic growth
slowdown since 2004 as many of these added lists appeared only in the last
few years.

Dirk

--
Three out of two people have difficulties with fractions.

Gabor Grothendieck

Jan 7, 2009, 10:47:58 PM
to Dirk Eddelbuettel, r-h...@r-project.org
On Wed, Jan 7, 2009 at 10:26 PM, Dirk Eddelbuettel <e...@debian.org> wrote:
>
> On 7 January 2009 at 18:24, Gabor Grothendieck wrote:
> | By running the code below we see that the:
> | - sum of the three seems to be rising at a constant rate
> | - S is declining
> | - SAS and R are rising
> | - R is rising the fastest, though it has completed its phase
> | of highest growth, which ended around 2004
>
> I wonder whether we need to account for traffic on all the additional r-sig-*
> mailing lists ?
>
> Of the handful that I follow, some seem to have taken traffic from r-help.
> This could account for (at least parts of) the apparent traffic growth
> slowdown since 2004 as many of these added lists appeared only in the last
> few years.
>

Good observation. It would be interesting to combine the data from all
the lists to see what the effect is.

Marc Schwartz

Jan 8, 2009, 8:51:40 AM
to Gabor Grothendieck, r-h...@r-project.org, Dirk Eddelbuettel
on 01/07/2009 09:47 PM Gabor Grothendieck wrote:
> On Wed, Jan 7, 2009 at 10:26 PM, Dirk Eddelbuettel <e...@debian.org> wrote:
>> On 7 January 2009 at 18:24, Gabor Grothendieck wrote:
>> | By running the code below we see that the:
>> | - sum of the three seems to be rising at a constant rate
>> | - S is declining
>> | - SAS and R are rising
>> | - R is rising the fastest, though it has completed its phase
>> | of highest growth, which ended around 2004
>>
>> I wonder whether we need to account for traffic on all the additional r-sig-*
>> mailing lists ?
>>
>> Of the handful that I follow, some seem to have taken traffic from r-help.
>> This could account for (at least parts of) the apparent traffic growth
>> slowdown since 2004 as many of these added lists appeared only in the last
>> few years.
>>
>
> Good observation. It would be interesting to combine the data from all
> the lists to see what the effect is.

Agreed.

You can use the basic framework of the R-Help code that I posted
yesterday to do that.

The key gotcha is that some of the list archives have the posts stored
on a per calendar quarter basis, not monthly. At least one has a mix.
This seems to be somewhat dependent upon list volume, though that is not
a consistent factor.

Thus, you would have to review each archive individually and adjust the
archive URLs in the code accordingly.

You would also see the impact on the subsequent aggregation of the data,
since the monthly time series based analyses (as opposed to yearly) will
have to be adjusted, given the differing granularity of the data.
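
One way to handle the mixed granularity is to bring the monthly lists down to quarterly totals before combining them. This is only a sketch of that idea, using the 2008 S-News row as the monthly input; monthly and quarterly are hypothetical names, and aggregate() here is the ts method from the stats package.

```r
# 2008 monthly counts from the S-News table
monthly <- ts(c(48, 63, 58, 47, 31, 27, 40, 28, 41, 30, 27, 36),
              start = c(2008, 1), frequency = 12)

# Sum each block of three months into one quarterly value
quarterly <- aggregate(monthly, nfrequency = 4, FUN = sum)

print(quarterly)
```

The quarterly series can then be combined with counts from archives that are only stored per quarter, at the cost of losing the monthly detail.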

HTH,

Marc

Max Kuhn

Jan 8, 2009, 10:16:50 AM
to r-h...@r-project.org

Doran, Harold

Jan 8, 2009, 11:09:56 AM
to r-h...@r-project.org
The open-source mentality is invaluable, as most on this list know. That
is what keeps the R evolution progressing at a pace that SAS cannot keep
up with.

On a side note (a very side note), I am a zealot for an exercise program
called CrossFit. CrossFit has adopted the same open-source mentality as
found in the Linux model and has grown into the most valuable fitness
and strength training program on the planet. There is an online journal
(called CrossFit Journal)
http://library.crossfit.com/free/pdf/CrossFitJournal-Budding_Retrospective.pdf
that lists the three components of the Linux open-source model:

The Linux development model:
* Release early and often
* Delegate everything you can
* Be open to the point of promiscuity

CrossFit then followed with its own open-source principles:

The CrossFit development model:
* Release early and often
- Daily!
* Delegate everything you can
- Meet the experts from the realms of climbing, lifting, swimming,
gymnastics, fighting, you name it.
* Be open to the point of promiscuity
- Read the WOD weblog comments.
- Check out the discussion board.
- See photos of athletes puking!

The point being, it is not the program itself that is amazing, but the
people that have made serious contributions to it that make it so. In
the same vein, R is only a representation of the many, many valuable
talented people who are constantly adding to its functionality because
of its open-source nature. That is, R itself is good, useful etc. But,
it is the people that add to it and help it grow as a scientific tool
that keep it as the lingua franca.

Stas Kolenikov

Jan 8, 2009, 11:42:07 AM
to r-h...@r-project.org
On 1/7/09, Gabor Grothendieck <ggroth...@gmail.com> wrote:
> Here is the same number of messages/posts data
> for each of S, SAS, R:
> - reworked into a 3 column ts class time series
> - with Jan 2009 removed since it's not complete
> - leading and trailing NA rows removed

My software of choice is Stata, so here are compatible data from
statalist (using
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/):

## Statalist traffic
stata <- structure(c(
654,574,781, 848, 714, 823,1063,1057,
701,625,909, 799, 941,1052,1013,1269,
868,690,937,1155,1040,1113,1125,1252,
640,649,899, 898,1013,1161, 991,1325,
622,697,726,1102, 818,1077,1111,1374,
684,548,651, 876, 964, 963,1125,1078,
717,588,943, 923, 885, 892, 986,1200,
728,575,605, 901,1010,1011,1224,1396,
627,605,712, 807,1098, 951, 939,1446,
844,790,970, 940,1001,1283,1231,1509,
776,644,870, 928,1094, 928, 999,1340,
603,512,670, 824, 794, 951, 739,1056
),
.Dim = c(8L, 12L),
.Dimnames = list(c("2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))

The list has existed since 1994 or 1996 or so, but the data are only
available from 2001. You'd probably be surprised to find out that,
based on the list summaries, the size of the Stata world is about half
that of SAS on the counts plot; and on the log scale, it shows linear
(which means exponential) growth throughout the range, while both SAS
and R have been slowing down in the last couple of years (with an
explanation already offered regarding the r-sig-* lists).
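
To make the log-scale claim easy to check, here is a minimal sketch in R.
It repeats the monthly counts so that it runs on its own, converts them to
a monthly ts object in the style Gabor used, and fits a straight line to
the log of the annual totals. The names stata_ts, annual, fit and growth
are only illustrative, and a straight-line fit to eight yearly points is
a rough check rather than a proper growth model.

```r
## Statalist monthly message counts, 2001-2008.
## Each line of data holds one month (Jan..Dec) across the eight years,
## so filling an 8 x 12 matrix column by column gives rows = years.
stata <- matrix(c(
  654,574,781, 848, 714, 823,1063,1057,
  701,625,909, 799, 941,1052,1013,1269,
  868,690,937,1155,1040,1113,1125,1252,
  640,649,899, 898,1013,1161, 991,1325,
  622,697,726,1102, 818,1077,1111,1374,
  684,548,651, 876, 964, 963,1125,1078,
  717,588,943, 923, 885, 892, 986,1200,
  728,575,605, 901,1010,1011,1224,1396,
  627,605,712, 807,1098, 951, 939,1446,
  844,790,970, 940,1001,1283,1231,1509,
  776,644,870, 928,1094, 928, 999,1340,
  603,512,670, 824, 794, 951, 739,1056
), nrow = 8, ncol = 12,
dimnames = list(as.character(2001:2008), month.abb))

## Monthly series: transpose so that c() walks through each year
## month by month (Jan 2001, Feb 2001, ..., Dec 2008).
stata_ts <- ts(c(t(stata)), start = c(2001, 1), frequency = 12)

## Annual totals and a straight-line fit on the log scale.
annual <- rowSums(stata)
year   <- as.numeric(names(annual))
fit    <- lm(log(annual) ~ year)

## A constant slope on the log scale corresponds to a constant
## yearly growth factor of exp(slope).
growth <- exp(coef(fit)[["year"]])
print(annual)
print(round(growth, 3))

## Visual check in an interactive session:
## plot(stata_ts, log = "y", ylab = "Statalist posts per month")
```

If the points of log(annual) against year fall close to the fitted line,
that supports the "linear on the log scale" reading; systematic curvature
would argue against it.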

Of course, overall that's an incorrect comparison to begin with. The
support systems for all three packages are different: most (US)
universities will have dedicated and well-certified SAS gurus
answering most semicolon questions locally, while r-help would be the
first thing on my mind if I cannot get what I need in the docs. I
would thus expect traffic on r-help to be heavier relative to the
user base.

Another measure of interest might be the number of contributed
packages. The phrase for R is this: "Currently, the CRAN package
repository features 1633 objects including 1625 packages and 8 bundles
containing 34 packages, for a total of 1659 available packages." The
phrase for Stata is this: "Statistical Software Components,
Boston College Department of Economics: There are currently 1275 items
in this series, of which 1274 are downloadable"
(http://logec.repec.org/scripts/seriesstat.pl?item=repec:boc:bocode).
So programming activity in Stata is about 3/4 of that in R at their
face values (you would probably need to downplay both numbers for
obsolete packages, though). Whether SAS has a unified repository of
user contributed modules with direct counts available, I have no clue.
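
The "about 3/4" figure follows directly from the two counts quoted
above. A quick check in R, using only those two numbers (the variable
names are illustrative):

```r
## Contributed-package counts as quoted above (January 2009).
cran_packages <- 1659   # total available packages on CRAN
ssc_items     <- 1275   # items in the Boston College SSC series for Stata

## Ratio of Stata to R contributions, taken at face value.
ratio <- ssc_items / cran_packages
print(round(ratio, 2))
```

This takes both counts at face value; any discount for obsolete
packages would have to be applied to each side before comparing.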

A really good measure for R will be the total # of the downloads of
r-base for all platforms from all CRAN mirrors (and I would expect
that # can be found from the servers' logs). Given that it is so easy
to download everything nice and clean and up to date, I would doubt
anybody will be distributing CD-ROMs with R install files among
friends and colleagues. SAS (and Stata, and SPSS, and Minitab, and...)
should have their (internal) number of licenses sold (and yes those
come on the disks initially), but those are badly blurred by the
network licenses, and are commercial secrets, anyway.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

Andrew Choens

Jan 8, 2009, 2:12:10 PM1/8/09
to Stas Kolenikov, r-h...@r-project.org
On Thu, 2009-01-08 at 10:42 -0600, Stas Kolenikov wrote:
> A really good measure for R will be the total # of the downloads of
> r-base for all platforms from all CRAN mirrors (and I would expect
> that # can be found from the servers' logs). Given that it is so easy
> to download everything nice and clean and up to date, I would doubt
> anybody will be distributing CD-ROMs with R install files among
> friends and colleagues. SAS (and Stata, and SPSS, and Minitab, and...)
> should have their (internal) number of licenses sold (and yes those
> come on the disks initially), but those are badly blurred by the
> network licenses, and are commercial secrets, anyway.

The number of r-core downloads is definitely NOT representative of the
number of people using R. If you use R on Windows or OS X, you will
obviously download R from the mirrors. However, this methodology would
effectively ignore many users of R on Linux. I use R on a regular basis
and I have it installed on three separate systems, all running Ubuntu.
In all of these cases, I am downloading and installing r-core from the
Ubuntu Mirror in the USA, not from CRAN.

Of course, the number of Linux users is minuscule compared to the number
of Windows users, but I think it is safe to say the Linux users are, in
general, a more tech-savvy group than Windows users and are more likely
to be comfortable using R's interactive programming interface. I think
it is also fair to say that MANY (though not all) Linux users would be
uncomfortable installing SPSS or SAS or Stata onto their open-source
system and would prefer to use R. Thus, Linux users probably account for
a higher proportion of R's user-base than they do in the general
computing population. . . . although I do not claim to actually know
this proportion.

Ehh. Comparing the popularity of computer software is incredibly tricky
to do, especially when some of the software being compared is
open-source.


--
Insert something humorous here. :-)

Louis Bajuk-Yorgan

Jan 8, 2009, 1:28:11 PM1/8/09
to r-h...@r-project.org

As the product manager for S+, I'd like to comment as well. I think the
burgeoning interest in R demonstrates that there's demand for analytics
to solve real, business-critical problems in a broad spectrum of
companies and roles, and that some of the incumbent analytics offerings,
in particular SAS and SPSS, don't sufficiently meet the growing need for
analytics in many major companies.

S+ (now TIBCO Spotfire S+) is of course a commercial software package
based on the S language, which was a forerunner of R as mentioned in the
article, and has been widely adopted. It is currently used in a wide
variety of areas, including Life Sciences, Financial Services, and
Utilities, for applications such as speeding the analysis of clinical
trial data, optimizing portfolios, and assessing potential sites for
building wind farms.

I welcome, respect, and appreciate the vitality, creativity, and sheer
productivity of the R community, and the high quality of statistical
methods the community creates. And, because of the close historical ties
between the two products, it is generally easy to port most R statistics
into the commercial S+ environment, and we have worked to make that
easier in recent releases.

Once in S+, these analytic methods can be incorporated into intuitive
tools for business decision makers and deployed to automated
environments, using visual workflows, web-based applications (using
standard web services), Spotfire Guided Applications for dynamic visual
analysis, and scalable, event-driven architectures using TIBCO's IT
infrastructure. S+ also provides some unique offerings, such as the
ability to flexibly and efficiently analyze very large data sets.

In this way, I feel companies can maximize the value of their analytic
investments to make rapid business decisions, whether those analytics
are developed in R or S+.

Regards,
Lou Bajuk-Yorgan
Sr. Director, Product Management
TIBCO Spotfire Division
lba...@tibco.com

-----Original Message-----
From: r-help-...@r-project.org [mailto:r-help-...@r-project.org]
On Behalf Of Douglas Bates
Sent: Wednesday, January 07, 2009 12:58 PM
To: marc_s...@comcast.net
Cc: r-h...@r-project.org
Subject: Re: [R] R in the NY Times


Carlos J. Gil Bellosta

Jan 8, 2009, 2:26:45 PM1/8/09
to Stas Kolenikov, r-h...@r-project.org
On Thu, 2009-01-08 at 10:42 -0600, Stas Kolenikov wrote:
> A really good measure for R will be the total # of the downloads of
> r-base for all platforms from all CRAN mirrors (and I would expect
> that # can be found from the servers' logs).

Hello,

You overlook here that many of us download R directly from our Linux
distribution repositories.

Besides, given the free nature of R, some of us install it on several
computers, even, in my case, briefly on somebody else's computer if I
have an urgent task to solve. Of course, I would never do (or be able
to do) this with SAS...

So, the number of downloads from CRAN servers seems like a lousy proxy
for the total number of users of R.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

ohri...@gmail.com

Jan 8, 2009, 2:28:13 PM1/8/09
to Louis Bajuk-Yorgan, r-h...@r-project.org
Yes, I think R as a package can really learn from SAS and SPSS in
making the GUI more user friendly, even at the risk of dumbing down some
complexity.

Also, as a consultant I know that selling software requires a lot of
marketing follow-ups, which is why R has lagged behind in actual
implementation and marketing (who will go on site at a client and
implement?), despite being more robust and of course helping companies
save costs in these critical times.

If you market R more and even get a 10% share of the commercial
market, imagine how many jobs you save by cutting down the software
costs of employers.

Ajay
www.decisionstats.com


--
Regards,

Ajay Ohri
http://tinyurl.com/liajayohri

Rahul-A...@ubs.com

Jan 8, 2009, 2:34:53 PM1/8/09
to ohri...@gmail.com, lba...@tibco.com, r-h...@r-project.org
I believe R as a package has everything, and people with little
knowledge of programming can handle it quite easily. Moreover, even
someone with no programming knowledge can learn R without much effort.
I also believe that if people in the corporate world start using R
instead of other complex and very expensive software, then in this job
market we can save many jobs and can also save people.

Marc Schwartz

Jan 8, 2009, 2:52:14 PM1/8/09
to Andrew Choens, r-h...@r-project.org
on 01/08/2009 01:12 PM Andrew Choens wrote:
> On Thu, 2009-01-08 at 10:42 -0600, Stas Kolenikov wrote:
>> A really good measure for R will be the total # of the downloads of
>> r-base for all platforms from all CRAN mirrors (and I would expect
>> that # can be found from the servers' logs). Given that it is so easy
>> to download everything nice and clean and up to date, I would doubt
>> anybody will be distributing CD-ROMs with R install files among
>> friends and colleagues. SAS (and Stata, and SPSS, and Minitab, and...)
>> should have their (internal) number of licenses sold (and yes those
>> come on the disks initially), but those are badly blurred by the
>> network licenses, and are commercial secrets, anyway.
>
> The number of r-core downloads is definitely NOT representative of the
> number of people using R. If you use R on Windows or OS X, you will
> obviously download R from the mirrors. However, this methodology would
> effectively ignore many users of R on Linux. I use R on a regular basis
> and I have it installed on three separate systems, all running Ubuntu.
> In all of these cases, I am downloading and installing r-core from the
> Ubuntu Mirror in the USA, not from CRAN.

I would also note that R has been available via the Fedora yum repos for
some time, which, as with the Debian/Ubuntu repos, would be missed in
just counting CRAN downloads.

There are quite a few other Linux distributions that have a similar
infrastructure in place where R is available as an 'add-on' or where the
main distribution itself includes R.

Additionally, there are many folks who will build R from source code,
using the updated source tarballs via FTP or, as I do, by getting the
source code right from the R subversion repo. These too would not be
considered in a CRAN based count.

> Of course, the number of Linux users is minuscule compared to the number
> of Windows users, but I think it is safe to say the Linux users are, in
> general, a more tech-savvy group than Windows users and are more likely
> to be comfortable using R's interactive programming interface. I think
> it is also fair to say that MANY (though not all) Linux users would be
> uncomfortable installing SPSS or SAS or Stata onto their open-source
> system and would prefer to use R. Thus, Linux users probably account for
> a higher proportion of R's user-base than they do in the general
> computing population. . . . although I do not claim to actually know
> this proportion.
>
> Ehh. Comparing the popularity of computer software is incredibly tricky
> to do, especially when some of the software being compared is
> open-source.

Correct. Trying to extrapolate the number of users from any of these
measures is quite complex, if doable at all.

Even the posting frequencies I used yesterday need to be taken with a
grain of salt when trying to get a sense of growth.

As Dirk noted, the many R-SIG-* e-mail lists have offloaded some level
of traffic from R-Help, which may account for the rate of growth in the
R-Help posts declining somewhat since 2004 as Gabor pointed out, even
though the absolute number of annual posts continues to increase.

Reading the posts on SAS-L since yesterday via Google RSS, where the NYT
article was also posted, some have noted that SAS itself offers online
support forums (http://support.sas.com/forums/index.jspa). From a quick
review, it looks like the SAS.com forums date back to perhaps early
2006, thus possibly accounting for some of the leveling of the posts on
SAS-L recently.

HTH,

Marc Schwartz

Carlos J. Gil Bellosta

Jan 8, 2009, 3:28:07 PM1/8/09
to r-h...@r-project.org
On Thu, 2009-01-08 at 13:52 -0600, Marc Schwartz wrote:
> Reading the posts on SAS-L since yesterday via Google RSS, where the
> NYT article was also posted, some have noted that SAS itself offers
> online support forums (http://support.sas.com/forums/index.jspa).
> From a quick review, it looks like the SAS.com forums date back to
> perhaps early 2006, thus possibly accounting for some of the leveling
> of the posts on SAS-L recently.

Hello,

Not only that: the corporate intranet of SAS (sections of which are
sometimes opened to external consultants for certain products) also
contains forums with an uneven traffic flow. These will certainly absorb
part of the traffic that would otherwise hit lists like SAS-L.

In fact, in my five years' experience working (among other things) as a
SAS consultant, I have never posted to SAS-L. However, I have posted (or
had my requests posted by other SAS employees) on these forums.

Having said that, I should also add that R represents a threat to SAS
(which has not stood for Statistical Analysis System for a long time
now) in a business segment that I very much doubt accounts for more
than 5-10% of their revenue. They have to sell about 1000 licenses of
SAS/BASE and SAS/STAT in order to match the annual revenues from a
single license for a single "solution" in a single top tier bank.

It is quite amusing, though, to browse SAS's internal marketing
documentation (to which I had access some time ago) on "how to
compete" against R. The SAS salesperson's statement in the article seems
to have been extracted verbatim from it.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com


Tony Breyal

Jan 10, 2009, 8:11:52 AM1/10/09
to r-h...@r-project.org
“We have customers who build engines for aircraft. I am happy they are
not using freeware when I get on a jet.”

The lady who made this comment, Anne H. Milley, director of technology
product marketing at SAS, has written a response to try to clarify
what she meant (funnily enough, I got this link from a SAS mate of
mine who is now going to have a look into R for the first time):

http://blogs.sas.com/sascom/index.php?/archives/434-This-post-is-rated-R.html


[quote]
"As for open source and my airplane quote …

My remark reflects a key difference between R and SAS, that of
support, reliability, and validation. Customers value SAS for many
things, including our extensive testing, documentation, 24/7 support,
and training. In contrast, the quality of proliferating R packages is
varied and uneven, especially in complex analytical modules. Mistakes
in these packages can lead to misleading results, even for experienced
users.

The airplane comment was meant to point out this key difference. Not
to condemn open source. In fact, SAS values open-source software. Our
software runs on Linux. We use some open-source tools in development.
And we plan to embrace open source further in the future.

The world has many complex problems. We advocate approaches based on
science, on analysis to address these problems. Making more analytic
methods readily available is a good thing. From SAS; from R; from the
resourceful individuals who innovate with their tools of choice,
regardless of the source."
[end quote]

On 7 Jan, 14:50, Marc Schwartz <marc_schwa...@comcast.net> wrote:
> on 01/07/2009 08:44 AM Kevin E. Thorpe wrote:
>
>
>
> > Zaslavsky, Alan M. wrote:
> >> This article is accompanied by nice pictures of Robert and Ross.
>
> >> Data Analysts Captivated by Power of R
> >> http://www.nytimes.com/2009/01/07/technology/business-computing/07pro...
>
> >> January 7, 2009 Data Analysts Captivated by R’s Power By ASHLEE VANCE
>
> >> SAS says it has noticed R’s rising popularity at universities,
> >> despite educational discounts on its own software, but it dismisses
> >> the technology as being of interest to a limited set of people
> >> working on very hard tasks.
>
> >> “I think it addresses a niche market for high-end data analysts that
> >> want free, readily available code," said Anne H. Milley, director of
> >> technology product marketing at SAS. She adds, “We have customers who
> >> build engines for aircraft. I am happy they are not using freeware
> >> when I get on a jet.”
>
> > Thanks for posting. Does anyone else find the statement by SAS to be
> > humorous yet arrogant and short-sighted?
>
> > Kevin
>
> It is an ignorant comment by a marketing person who has been spoon-fed
> her lines...it is also a comment being made from a very defensive and
> insecure posture.
>
> Congrats to R Core and the R Community. This is yet another sign of R's
> growth and maturity.
>
> Regards,
>
> Marc Schwartz
>
> ______________________________________________
> R-h...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

Florian Lengyel

Jan 10, 2009, 1:16:59 PM1/10/09
to Tony Breyal, r-h...@r-project.org


Ms. Milley mischaracterizes her remark about the relative unreliability
of "freeware" as if she had employed the term "open source."

David A. Wheeler's "Why Open Source Software / Free Software (OSS/FS,
FLOSS, or FOSS)? Look at the Numbers!" provides quantitative measures
for evaluating open source software, including market share,
reliability, performance, scalability, security, and total cost of
ownership. With respect to the reliability of open source software,
Wheeler writes, "There are a lot of anecdotal stories that OSS/FS is
more reliable, but finally there is quantitative data confirming that
mature OSS/FS programs are often more reliable [than equivalent
proprietary software programs]." Wheeler lists among his sources the
Fuzz Report: http://pages.cs.wisc.edu/~bart/fuzz/fuzz.html

Barry Rowlingson

Jan 10, 2009, 1:31:39 PM1/10/09
to Tony Breyal, r-h...@r-project.org
2009/1/10 Tony Breyal <tony....@googlemail.com>:

> [SAS marketroid quote]


> "In fact, SAS values open-source software."

But clearly not enough to open-source SAS itself. It would seem that
SAS values _other_people's_ open source.

If SAS was open source and free, then SAS would collect on all the
other things "Customers value SAS for" - support, testing, training,
docs, etc etc. And there would be a lot more people using it.

Another quote: "We advocate approaches based on science" - closed
source is closed knowledge and is nearer alchemy than science. I may
use proprietary software for video editing or music production, but
when it comes to science, it's got to be open.

Barry

Ajay ohri

Jan 10, 2009, 1:52:29 PM1/10/09
to Barry Rowlingson, r-h...@r-project.org, Tony Breyal
more on the reasons R is bad for you
http://www.decisionstats.com/2009/01/top-ten-rrreasons-r-is-bad-for-you/


Bert Gunter

Jan 10, 2009, 2:31:03 PM1/10/09
to Barry Rowlingson, Tony Breyal, r-h...@stat.math.ethz.ch

I think the substance of the issue is that the more eyes on code, the fewer
the bugs (assuming a well-designed examination and debugging process is in
place, as is typical for large open source projects like R). By this
(obvious?) criterion, both the remarks about the dangers of proprietary code
and the greater unreliability of R's lesser-used specialty packages, which
by their nature tend to be less carefully perused, are valid.

Perhaps an argument is that certain code might not get written at all if it
were not proprietary. Device drivers might be an example. But possibly other
than that, it does seem like SAS needs to reconsider their marketing
strategy and advertising claims.

Anecdotal remark: I originally moved from S-Plus to R because R provided
**better** documentation and support, and had fewer bugs, which were more
rapidly fixed when found. One of my smarter "investment" choices.

Cheers to all,
Bert Gunter

-----Original Message-----
From: r-help-...@r-project.org [mailto:r-help-...@r-project.org] On
Behalf Of Barry Rowlingson
Sent: Saturday, January 10, 2009 10:32 AM
To: Tony Breyal
Cc: r-h...@r-project.org
Subject: Re: [R] R in the NY Times

Kingsford Jones

Jan 10, 2009, 2:50:36 PM1/10/09
to Tony Breyal, r-h...@r-project.org
The reactions to the NYT article have certainly made for some
interesting reading.

Here are some of the links:

http://overdetermined.net/site/content/new-york-times-article-r

http://jackman.stanford.edu/blog/?p=1053

http://ggorjan.blogspot.com/2009/01/new-york-times-on-r.html

several posts on Andrew Gelman's blog:
http://www.stat.columbia.edu/~gelman/blog/

http://www.reddit.com/r/programming/comments/7nwgq/the_new_york_times_notices_the_r_programming/

comments here: http://bits.blogs.nytimes.com/2009/01/08/r-you-ready-for-r/


It's too bad that SAS has reacted to the negative reactions to their
NYT quote with more FUD. The quote that Tony posted is just a
thinly-veiled jab at R (veiled by a disingenuous "we value open
source" veneer). Perhaps SAS is shooting themselves in the foot with
their reactions; aren't they making it harder if they should ever
decide the best thing to do is to embrace R and the philosophies
behind it? Four years ago, Marc Schwartz posted interesting comments
related to this:

http://tolstoy.newcastle.edu.au/R/help/04/12/9497.html


On another note, I wonder why in the various conversations there seems
to be pervasive views that a) the FDA won't accept work done in R, and
b) SAS is the only way to effectively handle data?


best,

Kingsford Jones

Johannes Huesing

Jan 10, 2009, 4:43:45 PM1/10/09
to r-h...@r-project.org
Bert Gunter <gunter...@gene.com> [Sat, Jan 10, 2009 at 08:31:03PM CET]:
[...]

> Perhaps an argument is that certain code might not get written at all if it
> were not proprietary. Device drivers might be an example.

Device drivers are not an example. Linux is ubiquitous _despite_ device
manufacturers being secretive about their protocols and interfaces. There's
a whole lot of people out there who seem to take pride, if not joy, in
reengineering. At the moment I am profiting immensely from the gpsbabel
tool, which translates readily between all different GPS-related formats,
closed or documented.

--
Johannes Hüsing
mailto:joha...@huesing.name
http://derwisch.wikidot.com

There is something fascinating about science. One gets such wholesale
returns of conjecture from such a trifling investment of fact.
(Mark Twain, "Life on the Mississippi")

Gabor Grothendieck

Jan 10, 2009, 5:04:49 PM1/10/09
to Bert Gunter, r-h...@stat.math.ethz.ch
There do exist device manufacturers who GPL their device drivers, e.g.

http://freshmeat.net/projects/wanpipe/?branch_id=73783&release_id=290741

Marc Schwartz

Jan 11, 2009, 12:08:48 PM1/11/09
to Kingsford Jones, r-h...@r-project.org, Tony Breyal
on 01/10/2009 01:50 PM Kingsford Jones wrote:
> The reactions to the NYT article have certainly made for some
> interesting reading.
>
> Here are some of the links:
>
> http://overdetermined.net/site/content/new-york-times-article-r
>
> http://jackman.stanford.edu/blog/?p=1053
>
> http://ggorjan.blogspot.com/2009/01/new-york-times-on-r.html
>
> several posts on Andrew Gelman's blog:
> http://www.stat.columbia.edu/~gelman/blog/
>
> http://www.reddit.com/r/programming/comments/7nwgq/the_new_york_times_notices_the_r_programming/
>
> comments here: http://bits.blogs.nytimes.com/2009/01/08/r-you-ready-for-r/
>
>
> It's too bad that SAS has reacted to the negative reactions to their
> NYT quote with more FUD. The quote that Tony posted is just a
> thinly-veiled jab at R (veiled by a disingenuous "we value open
> source" veneer). Perhaps SAS is shooting themselves in the foot with
> their reactions; aren't they making it harder if they should ever
> decide the best thing to do is to embrace R and the philosophies
> behind it? Four years ago, Marc Schwartz posted interesting comments
> related to this:
>
> http://tolstoy.newcastle.edu.au/R/help/04/12/9497.html


Thanks for pointing this out Kingsford. The books referenced there are
excellent for providing an understanding of the dynamics that have been
the subject of many of these threads here since the NYT article was
published.

There is a natural tension between leading edge adopters, the "main
stream" and the laggards. Moore's "Crossing the Chasm" provides good
insights into this tension and the acceptance of new products and
technology.

Grove's "Only the Paranoid Survive" shows how individual companies and
even entire industries (think banking and autos today) can suddenly face
an unexpected risk to their survival when they fail to comprehend
marketplace dynamics and take appropriate action.

Microsoft's missteps with Vista opened the door for Apple and Linux to
increase their respective market share, and for open source more
generally (e.g., Firefox).

BTW, readers might find this commentary of interest:

Commentary: Create a tech-friendly U.S. government
By Jimmy Wales and Andrea Weckerle

http://www.cnn.com/2009/TECH/01/07/wales.obama.cto/index.html


> On another note, I wonder why in the various conversations there seems
> to be pervasive views that a) the FDA won't accept work done in R, and
> b) SAS is the only way to effectively handle data?


I strongly believe that the comments regarding R and the FDA are overly
negative and pessimistic.

The hurdles to the use of R for clinical trials are shrinking. There has
been substantive activity over the past several years, both internally
at the FDA and within the R community to increase R's acceptance in this
domain.

At the Joint Statistical Meetings in 2006, Sue Bell from the FDA spoke
during a session with a presentation entitled Times 'R' A Changing: FDA
Perspectives on Use of "Open Source". A copy of this presentation is
available here:

http://www.fda.gov/cder/Offices/Biostatistics/Bell.pdf

In 2007, during an FDA committee meeting reviewing the safety profile of
Avandia (Rosiglitazone), the internal FDA meta-analysis performed by Joy
Mele, the FDA statistician, was done using R. A copy of this
presentation is available here:
http://www.fda.gov/ohrms/dockets/ac/07/slides/2007-4308s1-05-fda-mele.ppt

Given the high profile nature of drug safety issues today, that R was
used for this analysis by the FDA itself speaks volumes.

Also in 2007, at the annual R user meeting at Iowa State University, I
had the pleasure and privilege of Chairing a session on the use of R for
clinical trials. The speakers included Frank Harrell (well known to R
users here), Tony Rossini and David James (Novartis Pharmaceuticals) and
Mat Soukup (FDA statistician). Copies of our presentations are available
here, a little more than half way down the page:

http://user2007.org/program/

At that meeting, we also introduced a document that has since been
updated and formally approved by the R Foundation for Statistical
Computing. The document provides guidance for the use of R in the
regulated clinical trials domain, addresses R's compliance with the
relevant regulations (e.g., 21 CFR 11), and describes the development,
testing and quality processes in place for R, also known as the
Software Development Life Cycle.

That document is available here:

http://www.r-project.org/doc/R-FDA.pdf

I have heard directly from colleagues in industry that this document has
provided significant value in their internal discussions regarding
implementing the use of R within their respective environments and
assuaging many fears regarding R's use.

Additionally, presentations regarding the use of open source software
and R specifically for clinical trials have been made at DIA and other
industry meetings. This fall, there is a session on the use of R
scheduled for the FDA's Industry Statistics Workshop in Washington, D.C.

For those unfamiliar, I would also point to the members and financial
donors of the R Foundation for Statistical Computing and note the
plethora of large pharma companies and clinical research institutions:

http://www.r-project.org/foundation/memberlist.html

The use of R within this domain is increasing and will only continue to
progress as R's value becomes increasingly clear to even risk-averse
industry decision makers.


Regards,

Marc Schwartz
