
[R] R in the NY Times


Zaslavsky, Alan M.

Jan 7, 2009, 8:10:20 AM
to r-h...@r-project.org
This article is accompanied by nice pictures of Robert and Ross.

Data Analysts Captivated by Power of R
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html

January 7, 2009
Data Analysts Captivated by R’s Power
By ASHLEE VANCE

To some people R is just the 18th letter of the alphabet. To others, it’s the rating on racy movies, a measure of an attic’s insulation or what pirates in movies say.

R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.

But R has also quickly found a following because statisticians, engineers and scientists without computer programming skills find it easy to use.

“R is really important to the point that it’s hard to overvalue it,” said Daryl Pregibon, a research scientist at Google, which uses the software widely. “It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems.”

It is also free. R is an open-source program, and its popularity reflects a shift in the type of software used inside corporations. Open-source software is free for anyone to use and modify. I.B.M., Hewlett-Packard and Dell make billions of dollars a year selling servers that run the open-source Linux operating system, which competes with Windows from Microsoft. Most Web sites are displayed using an open-source application called Apache, and companies increasingly rely on the open-source MySQL database to store their critical information. Many people view the end results of all this technology via the Firefox Web browser, also open-source software.

R is similar to other programming languages, like C, Java and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.

Some people familiar with R describe it as a supercharged version of Microsoft’s Excel spreadsheet software that can help illuminate data trends more clearly than is possible by entering information into rows and columns.

What makes R so useful — and helps explain its quick acceptance — is that statisticians, engineers and scientists can improve the software’s code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs and mining techniques to dig deeper into databases.

Close to 1,600 different packages reside on just one of the many Web sites devoted to R, and the number of packages has grown exponentially. One package, called BiodiversityR, offers a graphical interface aimed at making calculations of environmental trends easier.

Another package, called Emu, analyzes speech patterns, while GenABEL is used to study the human genome.

The financial services community has demonstrated a particular affinity for R; dozens of packages exist for derivatives analysis alone.

“The great beauty of R is that you can modify it to do all sorts of things,” said Hal Varian, chief economist at Google. “And you have a lot of prepackaged stuff that’s already available, so you’re standing on the shoulders of giants.”

R first appeared in 1996, when the statistics professors Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand released the code as a free software package.

According to them, the notion of devising something like R sprang up during a hallway conversation. They both wanted technology better suited for their statistics students, who needed to analyze data and produce graphical models of the information. Most comparable software had been designed by computer scientists and proved hard to use.

Lacking deep computer science training, the professors considered their coding efforts more of an academic game than anything else. Nonetheless, starting in about 1991, they worked on R full time. “We were pretty much inseparable for five or six years,” Mr. Gentleman said. “One person would do the typing and one person would do the thinking.”

Some statisticians who took an early look at the software considered it rough around the edges. But despite its shortcomings, R immediately gained a following with people who saw the possibilities in customizing the free software.

John M. Chambers, a former Bell Labs researcher who is now a consulting professor of statistics at Stanford University, was an early champion. At Bell Labs, Mr. Chambers had helped develop S, another statistics software project, which was meant to give researchers of all stripes an accessible data analysis tool. It was, however, not an open-source project.

The software failed to generate broad interest and ultimately the rights to S ended up in the hands of Tibco Software. Now R is surpassing what Mr. Chambers had imagined possible with S.

“The diversity and excitement around what all of these people are doing is great,” Mr. Chambers said.

While it is difficult to calculate exactly how many people use R, those most familiar with the software estimate that close to 250,000 people work with it regularly. The popularity of R at universities could threaten SAS Institute, the privately held business software company that specializes in data analysis software. SAS, with more than $2 billion in annual revenue, has been the preferred tool of scholars and corporate managers.

“R has really become the second language for people coming out of grad school now, and there’s an amazing amount of code being written for it,” said Max Kuhn, associate director of nonclinical statistics at Pfizer. “You can look on the SAS message boards and see there is a proportional downturn in traffic.”

SAS says it has noticed R’s rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks.

“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

But while SAS plays down R’s corporate appeal, companies like Google and Pfizer say they use the software for just about anything they can. Google, for example, taps R for help understanding trends in ad pricing and for illuminating patterns in the search data it collects. Pfizer has created customized packages for R to let its scientists manipulate their own data during nonclinical drug studies rather than send the information off to a statistician.

The co-creators of R express satisfaction that such companies profit from the fruits of their labor and that of hundreds of volunteers.

Mr. Ihaka continues to teach statistics at the University of Auckland and wants to create more advanced software. Mr. Gentleman is applying R-based software, called Bioconductor, in work he is doing on computational biology at the Fred Hutchinson Cancer Research Center in Seattle.

“R is a real demonstration of the power of collaboration, and I don’t think you could construct something like this any other way,” Mr. Ihaka said. “We could have chosen to be commercial, and we would have sold five copies of the software.”

Copyright 2009 The New York Times Company

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bill Pikounis

Jan 7, 2009, 8:53:02 AM
to r-h...@r-project.org
Pardon my exuberance, but this is simply awesome. What a treat to find
on the front web page of the NY Times this morning under Technology. I
think the article is very well written by the author, and I think it
captures top highlights of why the software and community are so
special.

Continued high gratitude to all of R-core and the R community for its
unique accomplishments. Every bit of praise is well-earned and
deserved.

I have continuously claimed to colleagues (primarily pharma industry)
for the past 8 years or so that R is the most exciting thing going on in the
area of statistics.

Thanks,
Bill

####################

Bill Pikounis
Statistician

On Wed, Jan 7, 2009 at 08:10, Zaslavsky, Alan M.
<zasl...@hcp.med.harvard.edu> wrote:
> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R
> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>
> January 7, 2009
> Data Analysts Captivated by R's Power
> By ASHLEE VANCE
>


Frank E Harrell Jr

Jan 7, 2009, 9:00:28 AM
to Zaslavsky, Alan M., r-h...@r-project.org
This is great to see. It's interesting that SAS Institute feels that
non-peer-reviewed software with hidden implementations of analytic
methods that cannot be reproduced by others should be trusted when
building aircraft engines.

Frank


--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University

Frank E Harrell Jr

Jan 7, 2009, 9:25:31 AM
to Bill Pikounis, r-h...@r-project.org
Bill Pikounis wrote:
> Pardon my exuberance, but this is simply awesome. What a treat to find
> on the front web page of the NY Times this morning under Technology. I
> think the article is very well written by the author, and I think it
> captures top highlights of why the software and community are so
> special.
>
> Continued high gratitude to all of R-core and the R community for its
> unique accomplishments. Every bit of praise is well-earned and
> deserved.
>
> I have continuously claimed to colleagues (primarily pharma industry)
> for the past 8 years or so that R is the most exciting thing going on in the
> area of statistics.
>
> Thanks,
> Bill

Amen to that, and in addition, R is now the top tool for everyday
analysis, not just a research statistician's tool.

Frank

>
> ####################
>
> Bill Pikounis
> Statistician
>
>
>
> On Wed, Jan 7, 2009 at 08:10, Zaslavsky, Alan M.
> <zasl...@hcp.med.harvard.edu> wrote:
>> This article is accompanied by nice pictures of Robert and Ross.
>>
>> Data Analysts Captivated by Power of R
>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>
>> January 7, 2009
>> Data Analysts Captivated by R's Power
>> By ASHLEE VANCE
>>
>

--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University


Simon Pickett

Jan 7, 2009, 9:33:00 AM
to Frank E Harrell Jr, Bill Pikounis, r-h...@r-project.org
I would like to add that I would have spent many more years doing my PhD if
it weren't for R! All data management, statistics and graphics were conducted
using it. This is the direction my university and many more research
institutes appear to be heading.

It probably doesn't get said enough, and I am sure I speak for all young
researchers: I am very much in debt to all the kind souls who have helped me
and other newbies on this forum over the years.

Thanks very much R team.

Kevin E. Thorpe

Jan 7, 2009, 9:44:20 AM
to r-h...@r-project.org
Zaslavsky, Alan M. wrote:
> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R
> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>
>
> January 7, 2009 Data Analysts Captivated by R’s Power By ASHLEE VANCE
>
>
> SAS says it has noticed R’s rising popularity at universities,
> despite educational discounts on its own software, but it dismisses
> the technology as being of interest to a limited set of people
> working on very hard tasks.
>
> “I think it addresses a niche market for high-end data analysts that
> want free, readily available code," said Anne H. Milley, director of
> technology product marketing at SAS. She adds, “We have customers who
> build engines for aircraft. I am happy they are not using freeware
> when I get on a jet.”
>

Thanks for posting. Does anyone else find the statement by SAS to be
humorous yet arrogant and short-sighted?

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin....@utoronto.ca Tel: 416.864.5776 Fax: 416.864.6057

Marc Schwartz

Jan 7, 2009, 9:50:09 AM
to Kevin E. Thorpe, r-h...@r-project.org
on 01/07/2009 08:44 AM Kevin E. Thorpe wrote:
> Zaslavsky, Alan M. wrote:
>> This article is accompanied by nice pictures of Robert and Ross.
>>
>> Data Analysts Captivated by Power of R
>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>
>>
>>
>> January 7, 2009 Data Analysts Captivated by R’s Power By ASHLEE VANCE
>>
>>
>> SAS says it has noticed R’s rising popularity at universities,
>> despite educational discounts on its own software, but it dismisses
>> the technology as being of interest to a limited set of people
>> working on very hard tasks.
>>
>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code," said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humorous yet arrogant and short-sighted?
>
> Kevin

It is an ignorant comment by a marketing person who has been spoon-fed
her lines... it is also a comment being made from a very defensive and
insecure posture.

Congrats to R Core and the R Community. This is yet another sign of R's
growth and maturity.

Regards,

Marc Schwartz

Tony Breyal

Jan 7, 2009, 9:39:59 AM
to r-h...@r-project.org
Thank you for posting this, I found it a very enjoyable read!

I am curious, is there an archive of 'R in the Media' or 'R in the
Press' articles somewhere? It would be interesting to see how the
perception of R has changed/evolved over time relative to other
packages.

Cheers,
Tony Breyal


On 7 Jan, 13:10, "Zaslavsky, Alan M." <zasla...@hcp.med.harvard.edu>
wrote:


> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R

>  http://www.nytimes.com/2009/01/07/technology/business-computing/07pro...


Rubén Roa-Ureta

Jan 7, 2009, 10:00:28 AM
to r-h...@r-project.org
Zaslavsky, Alan M. wrote:
> This article is accompanied by nice pictures of Robert and Ross.
>
> Data Analysts Captivated by Power of R
> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>
Thanks for the heads up. The R morale is going through the roof!
I've given three courses on R since the second half of 2007 here in
Chile (geostatistics, Fisheries Libraries for R, and generalized linear
models) and all my three audiences (professionals working in academia,
government, and private research institutions) were very much impressed
by the power of R. I spent as much time on R itself as on the
statistical topics, since students wanted to learn data management and
graphics once they started to grasp the basic elements.
R creators, Core Team, package creators and maintainers, and experts on
the list, thanks so much for such great work and such an open
attitude. You lead by example.
Rubén

Jeffrey J. Hallman

Jan 7, 2009, 10:44:14 AM
to r-h...@stat.math.ethz.ch
The article quotes John Chambers, but it doesn't mention that R started out as
an implementation of the S language. I don't suppose Insightful is too happy
about that.

The SAS spokesman quoted in the article is clearly whistling past the graveyard.
--
Jeff

Darin A. England

Jan 7, 2009, 10:45:54 AM
to r-h...@r-project.org
On Wed, Jan 07, 2009 at 08:00:28AM -0600, Frank E Harrell Jr wrote:
> This is great to see. It's interesting that SAS Institute feels that
> non-peer-reviewed software with hidden implementations of analytic
> methods that cannot be reproduced by others should be trusted when
> building aircraft engines.
>
> Frank

Unfortunately, that type of FUD issued by the SAS marketing person still
works. I see it at my employer (a large healthcare company.) It's a
battle to change a culture, but ironically the recession helps.
People are now taking notice of the obscene licensing fees for SAS.

Darin

Duncan Murdoch

Jan 7, 2009, 10:17:49 AM
to Kevin E. Thorpe, r-h...@r-project.org
On 1/7/2009 9:44 AM, Kevin E. Thorpe wrote:
> Zaslavsky, Alan M. wrote:
>> This article is accompanied by nice pictures of Robert and Ross.
>>
>> Data Analysts Captivated by Power of R
>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>
>>
>> January 7, 2009 Data Analysts Captivated by R’s Power By ASHLEE VANCE
>>
>>
>> SAS says it has noticed R’s rising popularity at universities,
>> despite educational discounts on its own software, but it dismisses
>> the technology as being of interest to a limited set of people
>> working on very hard tasks.
>>
>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code," said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humorous yet arrogant and short-sighted?

To me it just seemed like a "blast from the past".

Duncan Murdoch

Peter Dalgaard

Jan 7, 2009, 11:03:09 AM
to Jeffrey J. Hallman, r-h...@stat.math.ethz.ch
Jeffrey J. Hallman wrote:
> The article quotes John Chambers, but it doesn't mention that R started out as
> an implementation of the S language. I don't suppose Insightful is too happy
> about that.

You mean Tibco...

The statement that S "failed to generate broad interest" is also a bit
misleading. I believe S-PLUS had more than 100000 users in its day,
although it may be true that its success was mainly in the academic
world. Obviously the pool of people who knew S from the preceding decade
was very important for the early development of R.

--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dal...@biostat.ku.dk) FAX: (+45) 35327907

Max Kuhn

Jan 7, 2009, 10:29:32 AM
to marc_s...@comcast.net, r-h...@r-project.org
> "You can look on the SAS message boards and see there is a proportional downturn in traffic."

I think that I actually made this statement about both the SAS and
Splus traffic...

I wasn't really trying to be critical of SAS. I was trying to get
across that SAS focused their resources on features that had nothing
to do with *statistical analysis* (e.g. data warehousing etc.)

--

Max

David M Smith

Jan 7, 2009, 11:22:02 AM
to Tony Breyal, r-h...@r-project.org
On Wed, Jan 7, 2009 at 6:39 AM, Tony Breyal <tony....@googlemail.com> wrote:
> Thank you for posting this, I found it a very enjoyable read!
>
> I am curious, is there an archive of 'R in the Media' or 'R in the
> Press' articles somewhere? It would be interesting to see how the
> perception of R has changed/evolved over time relative to other
> packages.

That's a great idea, and I just created an "Rmedia" category on the
REvolutions R blog to track exactly such articles. You can find it
here:

http://blog.revolution-computing.com/rmedia/

If anyone knows of any other mainstream articles about R available
online please let me know, and I'll do a round-up post in that section
to make sure they're captured.

By the way, we're writing about R and issues related to R daily at:

http://blog.revolution-computing.com

# David Smith

--
David M Smith <da...@revolution-computing.com>
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Seattle, USA)

Bryan Hanson

Jan 7, 2009, 11:26:44 AM
to r-h...@r-project.org
I believe the SAS person shot themselves in the foot in more ways than
one. In my mind, the reason you would pay, as Frank said, for


> non-peer-reviewed software with hidden implementations of analytic
> methods that cannot be reproduced by others

would be so that you can sue them later when a software problem in the
design of the engine makes your plane fall out of the sky!

Bryan
*************
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA


>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code,” said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humorous yet arrogant and short-sighted?
>
> Kevin


Marc Schwartz

Jan 7, 2009, 11:56:53 AM
to Bryan Hanson, r-h...@r-project.org
I would also point out that the use of the term "freeware" as opposed to
"FOSS" by the SAS rep comes off as unprofessional and
deliberately condescending...

The author of the article, to his credit, was pretty consistent in using
open source terminology.

Regards,

Marc

on 01/07/2009 10:26 AM Bryan Hanson wrote:
> I believe the SAS person shot themselves in the foot in more ways than
> one. In my mind, the reason you would pay, as Frank said, for
>
>> non-peer-reviewed software with hidden implementations of analytic
>> methods that cannot be reproduced by others
>
> Would be so that you can sue them later when a software problem in the
> designing of the engine makes your plane fall out of the sky!


Andrew Choens

Jan 7, 2009, 1:01:55 PM
to Darin A. England, r-h...@r-project.org

> Unfortunately, that type of FUD issued by the SAS marketing person still
> works. I see it at my employer (a large healthcare company.) It's a
> battle to change a culture, but ironically the recession helps.
> People are now taking notice of the obscene licensing fees for SAS.
>
> Darin

I agree. I work for a consulting firm (human services) and my boss
prefers us to use SPSS, rather than R. It's painful. I have version 11
installed on my Windows laptop. Next year, the license expires!

For someone coming from an SPSS background, R is a little mind-blowing,
simply because it is so much more powerful. But, perseverance pays off.
Once I master Sweave and such, I'll be able to churn out reports much
more quickly than I ever could with SPSS.

I do wish the author of the article had included comments from SPSS, in
addition to the humorous FUD from the SAS spokesperson. Newer versions
of SPSS actually have the option of using R for data analysis, in
addition to the SPSS engine. It would have been interesting to compare
the corporate responses of the two companies.

--
Insert something humorous here. :-)

Erik Iverson

Jan 7, 2009, 1:03:19 PM
to marc_s...@comcast.net, r-h...@r-project.org
I pointed a friend of mine toward the article, to which he replied:

"I hope that they run SAS on Solaris too, god only knows how tainted the
syscalls are in that linux freeware."

Of course, now Solaris is 'freeware', too, so I suppose that according to
SAS, running SAS on Windows is the best way to be sure you're getting the
right answers.

Ajay ohri

Jan 7, 2009, 1:29:48 PM
to David M Smith, r-h...@r-project.org, Tony Breyal
You can use Google Alerts to track media coverage of R using some keywords.

regards,

ajay


Ted Harding

Jan 7, 2009, 1:30:48 PM
to Erik Iverson, r-h...@r-project.org, marc_s...@comcast.net
On 07-Jan-09 18:03:19, Erik Iverson wrote:
> I pointed a friend of mine toward the article, to which he replied:
>
> "I hope that they run SAS on Solaris too, god only knows how tainted
> the syscalls are in that linux freeware."
>
> Of course, now Solaris is 'freeware', too, so I suppose that according
> to SAS, running SAS on Windows is the best way to be sure you're
> getting the right answers.

I'm not so sure about that. Since the article described R as
"a supercharged version of Microsoft's Excel", surely people
should run R on Windows and be *ab*so*lute*ly* sure of getting
the right answers (and supercharged to boot)????
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jan-09 Time: 18:30:39
------------------------------ XFMail ------------------------------

Barry Rowlingson

Jan 7, 2009, 1:35:29 PM
to Darin A. England, r-h...@r-project.org
2009/1/7 Darin A. England <eng...@cs.umn.edu>:

> Unfortunately, that type of FUD issued by the SAS marketing person still
> works. I see it at my employer (a large healthcare company.)

I see it here, at a university. Quote: "We couldn't possibly do our
analysis using some software we've just downloaded from a web site"
*facepalm*

> It's a
> battle to change a culture, but ironically the recession helps.
> People are now taking notice of the obscene licensing fees for SAS.

They'll just keep increasing their educational discount, or as we
say, "the first hit is free"...

BaRRy

Tony Breyal

Jan 7, 2009, 2:51:56 PM
to r-h...@r-project.org
Google Alerts are great, but unfortunately the brevity of R's name is
the main problem, I think.

Though, thinking about it, I suppose if one could work out the 'best'
keywords to use, it might be possible to not get too many
misclassified results, e.g.,

http://news.google.com/news?hl=en&ned=us&nolr=1&q=r+open+source+programming+language&btnG=Search

or something like that. Will be keeping an eye on David's page from
time to time though, just in case he catches anything :-)

Lovely to see R getting the attention it so rightly deserves.


On 7 Jan, 18:29, "Ajay ohri" <ohri2...@gmail.com> wrote:
> you can use google alerts to track media coverage of R using some keywords
>
> regards,
>
> ajay

Wacek Kusnierczyk

Jan 7, 2009, 3:03:55 PM
to Kevin E. Thorpe, R help
Kevin E. Thorpe wrote:
> Zaslavsky, Alan M. wrote:
>> SAS says it has noticed R’s rising popularity at universities,
>> despite educational discounts on its own software, but it dismisses
>> the technology as being of interest to a limited set of people
>> working on very hard tasks.
>>
>> “I think it addresses a niche market for high-end data analysts that
>> want free, readily available code," said Anne H. Milley, director of
>> technology product marketing at SAS. She adds, “We have customers who
>> build engines for aircraft. I am happy they are not using freeware
>> when I get on a jet.”
>>
>
> Thanks for posting. Does anyone else find the statement by SAS to be
> humorous yet arrogant and short-sighted?

there must be something wrong with me, but i can't find anything
'humorous yet arrogant and short-sighted' in the idea that engines for
aircraft be built with software that does not advertise itself with
'ABSOLUTELY NO WARRANTY.'


vQ

Marc Schwartz

Jan 7, 2009, 3:07:51 PM
to Max Kuhn, r-h...@r-project.org
on 01/07/2009 09:29 AM Max Kuhn wrote:
>> "You can look on the SAS message boards and see there is a proportional downturn in traffic."
>
> I think that I actually made this statement about both the SAS and
> Splus traffic...
>
> I wasn't really trying to be critical of SAS. I was trying to get
> across that SAS focused their resources on features that had nothing
> to do with *statistical analysis* (e.g. data warehousing etc.)


Presuming that the Google Groups archive of SAS-L is reasonably complete:

http://groups.google.com/group/comp.soft-sys.sas/about

The monthly posting frequency data since 1993 is:

Posts <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,
1720L, 1826L, 1941L, 1832L, 1636L, 2122L, 2722L, 2750L, 2305L,
357L), Feb = c(NA, 511L, 734L, 1024L, 1150L, 1068L, 493L, 1519L,
1537L, 1845L, 1846L, 1652L, 1960L, 1645L, 926L, 2255L, NA), Mar = c(NA,
658L, 963L, 805L, 1108L, 945L, 659L, 1177L, 1915L, 2010L, 1755L,
2188L, 629L, 1711L, 1728L, 2712L, NA), Apr = c(NA, 681L, 792L,
1052L, 1315L, 784L, 1077L, 1163L, 1467L, 2199L, 1757L, 1826L,
2169L, 2796L, 2766L, 2789L, NA), May = c(NA, 712L, 945L, 1163L,
1212L, 448L, 778L, 1963L, 1735L, 2373L, 1863L, 1836L, 2283L,
3147L, 2974L, 2025L, NA), Jun = c(NA, 751L, 1002L, 999L, 1127L,
813L, 540L, 1615L, 1905L, 2133L, 1701L, 2606L, 2407L, 2723L,
2691L, 2368L, NA), Jul = c(15L, 763L, 775L, 1184L, 1074L, 896L,
476L, 1572L, 2027L, 2445L, 1926L, 1843L, 2061L, 761L, 2435L,
2607L, NA), Aug = c(458L, 975L, 969L, 1053L, 692L, 823L, 612L,
1696L, 1976L, 1492L, 1689L, 2143L, 1793L, 2027L, 2592L, 2584L,
NA), Sep = c(330L, 703L, 745L, 1176L, 947L, 894L, 1351L, 1491L,
1439L, 1864L, 1646L, 1784L, 1365L, 2714L, 1868L, 2554L, NA),
Oct = c(219L, 805L, 691L, 1197L, 900L, 1129L, 1708L, 1669L,
1592L, 2133L, 1832L, 1712L, 1427L, 2983L, 2320L, 2434L, NA
), Nov = c(472L, 752L, 773L, 911L, 853L, 733L, 1720L, 1490L,
1636L, 1663L, 1545L, 1786L, 1518L, 2848L, 2112L, 1984L, NA
), Dec = c(517L, 666L, 765L, 844L, 677L, 492L, 1595L, 1298L,
1424L, 1520L, 1445L, 2148L, 1524L, 2374L, 1948L, 1921L, NA
)), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = "data.frame",
row.names = c("1993",
"1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001",
"2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"
))

> Posts
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1993   NA   NA   NA   NA   NA   NA   15  458  330  219  472  517
1994  546  511  658  681  712  751  763  975  703  805  752  666
1995  548  734  963  792  945 1002  775  969  745  691  773  765
1996  853 1024  805 1052 1163  999 1184 1053 1176 1197  911  844
1997 1007 1150 1108 1315 1212 1127 1074  692  947  900  853  677
1998  894 1068  945  784  448  813  896  823  894 1129  733  492
1999  514  493  659 1077  778  540  476  612 1351 1708 1720 1595
2000 1720 1519 1177 1163 1963 1615 1572 1696 1491 1669 1490 1298
2001 1826 1537 1915 1467 1735 1905 2027 1976 1439 1592 1636 1424
2002 1941 1845 2010 2199 2373 2133 2445 1492 1864 2133 1663 1520
2003 1832 1846 1755 1757 1863 1701 1926 1689 1646 1832 1545 1445
2004 1636 1652 2188 1826 1836 2606 1843 2143 1784 1712 1786 2148
2005 2122 1960  629 2169 2283 2407 2061 1793 1365 1427 1518 1524
2006 2722 1645 1711 2796 3147 2723  761 2027 2714 2983 2848 2374
2007 2750  926 1728 2766 2974 2691 2435 2592 1868 2320 2112 1948
2008 2305 2255 2712 2789 2025 2368 2607 2584 2554 2434 1984 1921
2009  357   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA


One can then review the annual posting frequency via:

pdf("SAS-L.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
beside = TRUE,
cex.names = 0.6, main = "SAS-L Traffic",
cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
line = 2, cex = 0.5)

dev.off()


There would appear to be marked increases in 2000 and again in 2006.
However, traffic has been flat for the past 3 calendar years. No decline yet,
but it will happen in due course...
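
The year-over-year change behind that reading can be checked directly. A minimal sketch, using only the 2005 to 2008 rows of the SAS-L table above (the object names `sas_recent`, `annual` and `pct_change` are illustrative and not part of the original code):

```r
# Rows 2005-2008 of the SAS-L table, copied from the printed output above.
sas_recent <- rbind(
  "2005" = c(2122, 1960,  629, 2169, 2283, 2407, 2061, 1793, 1365, 1427, 1518, 1524),
  "2006" = c(2722, 1645, 1711, 2796, 3147, 2723,  761, 2027, 2714, 2983, 2848, 2374),
  "2007" = c(2750,  926, 1728, 2766, 2974, 2691, 2435, 2592, 1868, 2320, 2112, 1948),
  "2008" = c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921)
)
colnames(sas_recent) <- month.abb

# Annual totals; na.rm = TRUE matters for partial years such as 1993 and 2009.
annual <- rowSums(sas_recent, na.rm = TRUE)

# Percentage change relative to the previous year.
pct_change <- round(100 * diff(annual) / head(annual, -1), 1)

print(annual)
print(pct_change)
```

On these four rows the total jumps between 2005 and 2006 and then moves by only a few percent per year, which is the flat pattern described above.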

No comparable posting data table exists for S-News as far as I can find,
so I wrote a quick program to read the S-News archive pages here:

http://www.biostat.wustl.edu/archives/html/s-news/

and get monthly posting counts, using the 'Thread' based html pages,
where each monthly embedded post link has a URL of the form:

http://www.biostat.wustl.edu/archives/html/s-news/YYYY-MM/msgXXXXX.html


Thus, the program I used is:

TD <- paste(rep(1998:2009, each = 12), sprintf("%02d", 1:12), sep = "-")
Posts <- numeric(length(TD))

for (i in seq_along(TD))
{
  URL <- paste("http://www.biostat.wustl.edu/archives/html/s-news/",
               TD[i], "/threads.html", sep = "")

  cat(URL, "\n")

  # try() keeps the loop going when a month's page is missing
  if (!inherits(try(con <- readLines(URL)), "try-error"))
  {
    Posts[i] <- length(grep("msg.*\\.html", con))
    rm(con)
  } else {
    Posts[i] <- NA
  }
}


Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1998:2009
colnames(Posts) <- month.abb
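
One caveat about the counting step: `length(grep(...))` counts matching lines, not matching links, so the totals assume one message link per line of `threads.html`. A minimal offline sketch of the difference, using made-up HTML lines (the `html` vector is hypothetical and not taken from the archive, and the pattern is a tightened variant of the one in the loop, since a greedy `.*` would merge several links on one line into a single match):

```r
html <- c(
  '<li><a href="msg00001.html">Question</a></li>',
  '<li><a href="msg00002.html">Re: Question</a> <a href="msg00003.html">Re: Question</a></li>',
  '<p>No message link on this line</p>'
)

pattern <- "msg[0-9]+\\.html"

# Lines containing at least one link (what the loop above counts).
n_lines <- length(grep(pattern, html))

# Individual links, counting several matches on one line separately.
# gregexpr() returns -1 for a line without a match, hence the h > 0 test.
hits <- gregexpr(pattern, html)
n_links <- sum(vapply(hits, function(h) sum(h > 0), integer(1)))

cat("lines:", n_lines, " links:", n_links, "\n")
```

If the two numbers agree on a real page, the line-based count is safe for that page.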

That gives you:

Posts <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")))


> Posts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1998 NA 273 378 293 330 243 219 209 191 241 181 141
1999 210 173 313 300 334 254 284 270 300 253 300 194
2000 264 313 285 264 306 247 245 302 204 251 261 176
2001 246 232 252 300 331 282 258 260 260 229 232 194
2002 230 255 242 228 219 248 230 207 221 280 228 177
2003 189 179 218 196 189 217 221 187 186 295 197 142
2004 197 230 257 151 164 175 154 187 195 150 176 176
2005 174 161 193 182 174 109 159 144 107 98 82 84
2006 109 87 99 123 107 96 84 97 68 73 53 20
2007 51 59 74 48 46 34 47 39 35 70 56 41
2008 48 63 58 47 31 27 40 28 41 30 27 36
2009 5 NA NA NA NA NA NA NA NA NA NA NA


Which can then be graphed by:

pdf("S-News.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
beside = TRUE,
cex.names = 0.6, main = "S-News Traffic",
cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
line = 2, cex = 0.5)

dev.off()

The consistent decline in posting frequency since 1999 is notable. The
temporal association with the introduction of R is perhaps profound.

As long as I am on the subject, I figured that I would do the same for
R-Help. The downside is that readLines() (really url() ) does not
support https:, so I took a somewhat different approach, using wget:


TD <- paste(rep(1997:2009, each = 12), month.name, sep = "-")
Posts <- numeric(length(TD))

for (i in seq_along(TD))
{
  URL <- paste("https://stat.ethz.ch/pipermail/r-help/",
               TD[i], "/thread.html", sep = "")

  cat(URL, "\n")

  # wget saves the page as thread.html in the working directory
  CMD <- paste("wget", URL)
  system(CMD)

  if (file.exists("thread.html"))
  {
    con <- readLines("thread.html")
    Posts[i] <- length(grep("[0-9]+\\.html", con))
    rm(con)
    # remove the file so that a failed download for the next month is detected
    unlink("thread.html")
  } else {
    Posts[i] <- NA
  }
}

Posts <- matrix(Posts, ncol = 12, byrow = TRUE)
rownames(Posts) <- 1997:2009
colnames(Posts) <- month.abb
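
The wget call can also be replaced by `download.file()` from the utils package. Whether it can fetch `https:` URLs depends on the R version and on how R was built, so that part is an assumption to check locally. A minimal sketch; the helper name `count_posts` is made up for illustration:

```r
# Download one archive page to a temporary file and count the lines that
# contain a message link. Returns NA when the download fails.
count_posts <- function(url, pattern = "[0-9]+\\.html") {
  tmp <- tempfile(fileext = ".html")
  on.exit(unlink(tmp))

  # download.file() returns 0 on success; treat errors and warnings as failure.
  ok <- tryCatch(
    download.file(url, tmp, quiet = TRUE) == 0,
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
  if (!ok) return(NA_integer_)

  length(grep(pattern, readLines(tmp, warn = FALSE)))
}

# Example (needs network access):
# count_posts("https://stat.ethz.ch/pipermail/r-help/2008-December/thread.html")
```

Using a temporary file avoids leaving a stale `thread.html` in the working directory between iterations.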


This gives you:

Posts <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
"1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))


> Posts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1997 NA NA NA 92 36 47 41 37 40 76 61 57
1998 135 79 114 101 90 105 110 64 94 96 184 105
1999 226 145 195 189 161 186 184 148 203 231 318 221
2000 205 355 377 377 504 418 293 356 434 418 433 422
2001 558 583 651 470 552 550 615 562 678 657 825 530
2002 884 697 880 965 1057 926 918 824 705 1055 1038 742
2003 1017 1137 1203 1488 1268 1319 1344 1210 1443 1567 1605 1158
2004 1116 1580 1946 1657 1561 1714 1618 1493 1534 1712 1895 1481
2005 1746 1724 1703 2057 1887 2056 1872 1777 1709 1810 1907 1508
2006 2075 1920 2270 1818 2029 1811 1785 1898 1902 2328 2127 1450
2007 1714 1907 2191 2145 2210 2307 2138 2241 2028 2708 2594 2028
2008 2490 2583 2740 2487 2517 2774 3268 2813 2990 3037 2730 2399
2009 462 NA NA NA NA NA NA NA NA NA NA NA


Which again can be graphed as:

pdf("R-Help.pdf", height = 4, width = 7)

mp <- barplot(rowSums(Posts, na.rm = TRUE),
beside = TRUE,
cex.names = 0.6, main = "R-Help Traffic",
cex.axis = 0.75, las = 1)

mtext(text = rowSums(Posts, na.rm = TRUE), at = mp, side = 1,
line = 2, cex = 0.5)

dev.off()


Now....there's a healthy growth curve.... :-)

Note that the annual traffic volume for 2008 on R-Help exceeds that on
SAS-L.
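
That comparison can be checked from the 2008 rows of the two tables above. A minimal sketch (the vector names are illustrative):

```r
# 2008 rows copied from the printed SAS-L and R-Help tables above.
sas_2008 <- c(2305, 2255, 2712, 2789, 2025, 2368, 2607, 2584, 2554, 2434, 1984, 1921)
r_2008   <- c(2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037, 2730, 2399)
names(sas_2008) <- names(r_2008) <- month.abb

totals <- c("SAS-L" = sum(sas_2008), "R-Help" = sum(r_2008))
print(totals)

# Months in which R-Help was the busier list.
print(r_2008 > sas_2008)
```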

For convenience, I am attaching each of the 3 plots.

Regards,

Marc Schwartz

SAS-L.pdf
S-News.pdf
R-Help.pdf

Spencer Graves

Jan 7, 2009, 3:19:09 PM
to Wacek Kusnierczyk, R help
What kind of warranty does SAS offer? I haven't read their EULA
recently, but if an airplane fell out of the sky because of a bug in SAS
code, I'd be surprised if SAS was eager to pay damages!

Spencer

Duncan Murdoch

Jan 7, 2009, 3:23:45 PM
to Wacek Kusnierczyk, R help
On 1/7/2009 3:03 PM, Wacek Kusnierczyk wrote:
> Kevin E. Thorpe wrote:
>> Zaslavsky, Alan M. wrote:
>>> SAS says it has noticed R’s rising popularity at universities,
>>> despite educational discounts on its own software, but it dismisses
>>> the technology as being of interest to a limited set of people
>>> working on very hard tasks.
>>>
>>> “I think it addresses a niche market for high-end data analysts that
>>> want free, readily available code,” said Anne H. Milley, director of
>>> technology product marketing at SAS. She adds, “We have customers who
>>> build engines for aircraft. I am happy they are not using freeware
>>> when I get on a jet.”
>>>
>>
>> Thanks for posting. Does anyone else find the statement by SAS to be
>> humorous yet arrogant and short-sighted?
>
> there must be something wrong with me, but i can't find anything
> 'humorous yet arrogant and short-sighted' in the idea that engines for
> aircraft be built with software that does not advertise itself with
> 'ABSOLUTELY NO WARRANTY.'

Yes, everyone knows that the lack of warranty should be hidden in the
fine print, and say something like this:

"Institute warrants that the media on which SAS/C OnlineDoc is furnished
will be free from defects in material and workmanship under normal use
for a period of ninety (90) days from the date of delivery of SAS/C
OnlineDoc. Licensee’s exclusive remedy for breach of this warranty shall
be replacement of the defective media by the Institute. Institute and
its licensors disclaim all other warranties, express or implied,
including, but not limited to, any implied warranties of merchantability
and/or fitness for a particular purpose whether alleged to arise by law,
by reason of custom or usage in the trade, or by course of dealing. "

(Sorry, I couldn't find SAS/Stat's lack of warranty. I found this one
at
http://support.sas.com/documentation/onlinedoc/sasc/doc700/html/common/agreement.htm)

Duncan Murdoch

Mitchell Maltenfort

Jan 7, 2009, 3:23:10 PM
to R help
On Wed, Jan 7, 2009 at 3:19 PM, Spencer Graves <spencer...@pdf.com> wrote:
> What kind of warranty does SAS offer? I haven't read their EULA recently,
> but if an airplane fell out of the sky because of a bug in SAS code, I'd be
> surprised if SAS was eager to pay damages!
>
> Spencer
>
>


And that's an issue that always comes up on Linux v. Microsoft -- just
because you pay money for it doesn't mean you're buying meaningful
guarantees.
--
Due to the recession, requests for instant gratification will be
deferred until arrears in scheduled gratification have been satisfied.

Thomas Adams

Jan 7, 2009, 3:25:10 PM
to Wacek Kusnierczyk, R help
Wacek,

One would hope that if someone were to use software to "build engines
for aircraft", that said person would sufficiently test the software to
have confidence in it, whether it had a "Warranty" or not — at least
that's my mode of operation…

Cheers!
Tom


--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL: thomas...@noaa.gov

VOICE: 937-383-0528
FAX: 937-383-0033

Douglas Bates

Jan 7, 2009, 3:57:43 PM
to marc_s...@comcast.net, r-h...@r-project.org
On Wed, Jan 7, 2009 at 8:50 AM, Marc Schwartz <marc_s...@comcast.net> wrote:
> on 01/07/2009 08:44 AM Kevin E. Thorpe wrote:
>> Zaslavsky, Alan M. wrote:
>>> This article is accompanied by nice pictures of Robert and Ross.
>>>
>>> Data Analysts Captivated by Power of R
>>> http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
>>>
>>>
>>>
>>> January 7, 2009 Data Analysts Captivated by R's Power By ASHLEE VANCE
>>>
>>>
>>> SAS says it has noticed R's rising popularity at universities,
>>> despite educational discounts on its own software, but it dismisses
>>> the technology as being of interest to a limited set of people
>>> working on very hard tasks.
>>>
>>> "I think it addresses a niche market for high-end data analysts that
>>> want free, readily available code," said Anne H. Milley, director of
>>> technology product marketing at SAS. She adds, "We have customers who
>>> build engines for aircraft. I am happy they are not using freeware
>>> when I get on a jet."
>>>
>>
>> Thanks for posting. Does anyone else find the statement by SAS to be
>> humorous yet arrogant and short-sighted?
>>
>> Kevin

> It is an ignorant comment by a marketing person who has been spoon fed
> her lines...it is also a comment being made from a very defensive and
> insecure posture.

To some extent but we should also realize that open source software is
a nonsensical idea to those in the commercial software business. It
just doesn't fit into their world view.

As part of the 40th anniversary of Technometrics there will be a
discussion article on "The Future of Statistical Computing" by Leland
Wilkinson in the Nov. 2008 issue. (I say "will be" because I don't
see it on the web site yet.) Lee is the creator of Systat and is now
associated with SPSS, Inc. which bought Systat. I am one of the
discussants and I agreed with most of what Lee had to say except with
regard to the role of open source software. Lee looked at the market
share of SAS, SPSS, Stata, S-PLUS, Minitab, etc. in statistical
software and based his projections on that. He had some ball park
figure for the "market share" of R and concluded that it wouldn't
really be important. My response was that this misses the point. R
is a community, not a "product" in the traditional software sense. I
referred to Eric Raymond's essay "The Cathedral and the Bazaar", which
I think is still relevant in contrasting the views of those in the
commercial software and the open source software communities.

> Congrats to R Core and the R Community. This is yet another sign of R's
> growth and maturity.


Gabor Grothendieck

Jan 7, 2009, 6:24:57 PM
to r-h...@r-project.org
Here are the same message counts for each of S, SAS and R:
- reworked into a 3-column ts class time series
- with Jan 2009 removed since it's not complete
- leading and trailing NA rows removed

At the end we plot the raw data as well as the time
series of totals and show loess smooths for each.

By running the code below we see that:
- the sum of the three seems to be rising at a constant rate
- S is declining
- SAS and R are rising
- R is rising the fastest, though it has completed its phase
of highest growth, which ended around 2004
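
The reshaping steps listed above (a row of monthly counts turned into a monthly series, then trimmed of NA months at either end) can be sketched in base R as follows. This is only an illustration on the 1998 S-News row from the table earlier in the thread, not the code used to build `tt3`; the helper `trim_na` is hypothetical:

```r
# 1998 row of the S-News table; the January count is NA.
s_1998 <- c(NA, 273, 378, 293, 330, 243, 219, 209, 191, 241, 181, 141)

# Monthly series: time 1998.0 is January 1998, 1998 + 1/12 is February, and so on.
x <- ts(s_1998, start = c(1998, 1), frequency = 12)

# Drop leading and trailing NA observations, keeping any NA in the interior.
trim_na <- function(x) {
  keep <- which(!is.na(x))
  window(x, start = time(x)[min(keep)], end = time(x)[max(keep)])
}

x_trimmed <- trim_na(x)
print(start(x_trimmed))
print(length(x_trimmed))
```

By the same time convention, the `.Tsp` start of 1993.5 in `tt3` corresponds to July 1993, the first month with a SAS-L count.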

tt3 <- structure(c(15, 458, 330, 219, 472, 517, 546, 511, 658, 681,
712, 751, 763, 975, 703, 805, 752, 666, 548, 734, 963, 792, 945,
1002, 775, 969, 745, 691, 773, 765, 853, 1024, 805, 1052, 1163,
999, 1184, 1053, 1176, 1197, 911, 844, 1007, 1150, 1108, 1315,
1212, 1127, 1074, 692, 947, 900, 853, 677, 894, 1068, 945, 784,
448, 813, 896, 823, 894, 1129, 733, 492, 514, 493, 659, 1077,
778, 540, 476, 612, 1351, 1708, 1720, 1595, 1720, 1519, 1177,
1163, 1963, 1615, 1572, 1696, 1491, 1669, 1490, 1298, 1826, 1537,
1915, 1467, 1735, 1905, 2027, 1976, 1439, 1592, 1636, 1424, 1941,
1845, 2010, 2199, 2373, 2133, 2445, 1492, 1864, 2133, 1663, 1520,
1832, 1846, 1755, 1757, 1863, 1701, 1926, 1689, 1646, 1832, 1545,
1445, 1636, 1652, 2188, 1826, 1836, 2606, 1843, 2143, 1784, 1712,
1786, 2148, 2122, 1960, 629, 2169, 2283, 2407, 2061, 1793, 1365,
1427, 1518, 1524, 2722, 1645, 1711, 2796, 3147, 2723, 761, 2027,
2714, 2983, 2848, 2374, 2750, 926, 1728, 2766, 2974, 2691, 2435,
2592, 1868, 2320, 2112, 1948, 2305, 2255, 2712, 2789, 2025, 2368,
2607, 2584, 2554, 2434, 1984, 1921, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
273, 378, 293, 330, 243, 219, 209, 191, 241, 181, 141, 210, 173,
313, 300, 334, 254, 284, 270, 300, 253, 300, 194, 264, 313, 285,
264, 306, 247, 245, 302, 204, 251, 261, 176, 246, 232, 252, 300,
331, 282, 258, 260, 260, 229, 232, 194, 230, 255, 242, 228, 219,
248, 230, 207, 221, 280, 228, 177, 189, 179, 218, 196, 189, 217,
221, 187, 186, 295, 197, 142, 197, 230, 257, 151, 164, 175, 154,
187, 195, 150, 176, 176, 174, 161, 193, 182, 174, 109, 159, 144,
107, 98, 82, 84, 109, 87, 99, 123, 107, 96, 84, 97, 68, 73, 53,
20, 51, 59, 74, 48, 46, 34, 47, 39, 35, 70, 56, 41, 48, 63, 58,
47, 31, 27, 40, 28, 41, 30, 27, 36, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 92, 36, 47, 41, 37, 40, 76, 61, 57, 135,
79, 114, 101, 90, 105, 110, 64, 94, 96, 184, 105, 226, 145, 195,
189, 161, 186, 184, 148, 203, 231, 318, 221, 205, 355, 377, 377,
504, 418, 293, 356, 434, 418, 433, 422, 558, 583, 651, 470, 552,
550, 615, 562, 678, 657, 825, 530, 884, 697, 880, 965, 1057,
926, 918, 824, 705, 1055, 1038, 742, 1017, 1137, 1203, 1488,
1268, 1319, 1344, 1210, 1443, 1567, 1605, 1158, 1116, 1580, 1946,
1657, 1561, 1714, 1618, 1493, 1534, 1712, 1895, 1481, 1746, 1724,
1703, 2057, 1887, 2056, 1872, 1777, 1709, 1810, 1907, 1508, 2075,
1920, 2270, 1818, 2029, 1811, 1785, 1898, 1902, 2328, 2127, 1450,
1714, 1907, 2191, 2145, 2210, 2307, 2138, 2241, 2028, 2708, 2594,
2028, 2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037,
2730, 2399), .Dim = c(186L, 3L), .Dimnames = list(NULL, c("SAS",
"S", "R")), .Tsp = c(1993.5, 2008.91666666667, 12), class = c("mts",
"ts"))

tt4 <- cbind(tt3, rowSums(tt3))
colnames(tt4) <- c(colnames(tt3), "Sum")
ts.plot(tt4, col = 1:4)
grid()
legend("topleft", colnames(tt4), lty = 1, col = 1:4)

library(dyn)
for(i in 1:4) lines(fitted(dyn$loess(tt4[, i] ~ time(tt4))), col = i)

hadley wickham

Jan 7, 2009, 7:13:20 PM
to Gabor Grothendieck, r-h...@r-project.org
Here's a couple of similar plots created with ggplot2. I chose to
turn the data into a data frame with an explicit date column. Using a
log scale somewhat stabilises the variability.

## SAS-L traffic
sas <- structure(list(Jan = c(NA, 546L, 548L, 853L, 1007L, 894L, 514L,

## s-news traffic


s <- structure(c(NA, 210, 264, 246, 230, 189, 197, 174, 109, 51, 48,
5, 273, 173, 313, 232, 255, 179, 230, 161, 87, 59, 63, NA, 378,
313, 285, 252, 242, 218, 257, 193, 99, 74, 58, NA, 293, 300,
264, 300, 228, 196, 151, 182, 123, 48, 47, NA, 330, 334, 306,
331, 219, 189, 164, 174, 107, 46, 31, NA, 243, 254, 247, 282,
248, 217, 175, 109, 96, 34, 27, NA, 219, 284, 245, 258, 230,
221, 154, 159, 84, 47, 40, NA, 209, 270, 302, 260, 207, 187,
187, 144, 97, 39, 28, NA, 191, 300, 204, 260, 221, 186, 195,
107, 68, 35, 41, NA, 241, 253, 251, 229, 280, 295, 150, 98, 73,
70, 30, NA, 181, 300, 261, 232, 228, 197, 176, 82, 53, 56, 27,
NA, 141, 194, 176, 194, 177, 142, 176, 84, 20, 41, 36, NA), .Dim = c(12L,
12L), .Dimnames = list(c("1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009"), c("Jan",
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")))

r <- structure(c(NA, 135, 226, 205, 558, 884, 1017, 1116, 1746,
2075, 1714, 2490, 462, NA, 79, 145, 355, 583, 697, 1137, 1580, 1724,
1920, 1907, 2583, NA, NA, 114, 195, 377, 651, 880, 1203, 1946,
1703, 2270, 2191, 2740, NA, 92, 101, 189, 377, 470, 965, 1488,
1657, 2057, 1818, 2145, 2487, NA, 36, 90, 161, 504, 552, 1057,
1268, 1561, 1887, 2029, 2210, 2517, NA, 47, 105, 186, 418, 550,
926, 1319, 1714, 2056, 1811, 2307, 2774, NA, 41, 110, 184, 293,
615, 918, 1344, 1618, 1872, 1785, 2138, 3268, NA, 37, 64, 148,
356, 562, 824, 1210, 1493, 1777, 1898, 2241, 2813, NA, 40, 94,
203, 434, 678, 705, 1443, 1534, 1709, 1902, 2028, 2990, NA, 76,
96, 231, 418, 657, 1055, 1567, 1712, 1810, 2328, 2708, 3037,
NA, 61, 184, 318, 433, 825, 1038, 1605, 1895, 1907, 2127, 2594,
2730, NA, 57, 105, 221, 422, 530, 742, 1158, 1481, 1508, 1450,
2028, 2399, NA), .Dim = c(13L, 12L), .Dimnames = list(c("1997",
"1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005",
"2006", "2007", "2008", "2009"), c("Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))

library(reshape)
sas <- melt(as.matrix(sas), na.rm = TRUE)
r <- melt(r, na.rm = TRUE)
s <- melt(s, na.rm = TRUE)
names(r) <- names(s) <- names(sas) <- c("year", "month", "count")

sas$software <- "sas"
s$software <- "s"
r$software <- "r"
all <- rbind(sas, s, r)
all$date <- with(all,
as.Date(paste(year, month, 15, sep = "-"), "%Y-%b-%d"))


library(ggplot2)
library(plyr)  # ddply() and .() used below come from plyr
qplot(date, count, data = all, geom = "line", colour = software) +
  geom_smooth(se = FALSE, size = 1)
last_plot() + scale_y_log10(breaks = 10^(1:3), labels = 10^(1:3))

yearly <- ddply(all, .(year, software), function(df) c(count = sum(df$count)))
qplot(year, count, data = yearly, geom = "line", colour = software)
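
One portability note on the date column: the `%b` format used above matches month abbreviations in the current locale, so it can return NA outside English locales. A minimal locale-independent sketch, assuming the `year` and `month` columns hold values like 2008 and "Jan" (the helper `make_date` is made up for illustration):

```r
make_date <- function(year, month) {
  # as.character() first, so that factor columns are handled too.
  yr  <- as.integer(as.character(year))
  mon <- match(as.character(month), month.abb)  # "Jan" -> 1, ..., "Dec" -> 12
  as.Date(sprintf("%d-%02d-15", yr, mon))
}

d <- make_date(c(2008, 2008, 2009), c("Jan", "Dec", "Jan"))
print(d)
```

With the data frame above this would be used as `all$date <- make_date(all$year, all$month)`.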


Hadley

--
http://had.co.nz/

Spencer Graves

Jan 7, 2009, 6:53:03 PM
to Gabor Grothendieck, r-h...@r-project.org
Thanks, Gabor, Marc, Max:

The image is even more striking (and more accurately reflects
reality, I believe) if you add "log='y'" to "ts.plot".
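
A minimal sketch of that suggestion, using only the 2008 rows of the S-News and R-Help tables from earlier in the thread so that it runs on its own (the object `tt` is illustrative; with the full data one would pass `tt4` instead). The graphical parameters are passed through the `gpars` argument of `ts.plot`:

```r
# 2008 rows copied from the S-News and R-Help tables earlier in the thread.
s_2008 <- c(48, 63, 58, 47, 31, 27, 40, 28, 41, 30, 27, 36)
r_2008 <- c(2490, 2583, 2740, 2487, 2517, 2774, 3268, 2813, 2990, 3037, 2730, 2399)

tt <- ts(cbind(S = s_2008, R = r_2008), start = c(2008, 1), frequency = 12)

# A log y-axis needs strictly positive counts; zeros would have to be handled first.
ts.plot(tt, gpars = list(log = "y", col = 1:2))
legend("right", colnames(tt), lty = 1, col = 1:2)
```

On a log scale equal vertical distances mean equal ratios, so a list with tens of posts per month and one with thousands can be compared on the same axes.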

Best Wishes,
Spencer

Gabor Grothendieck

Jan 7, 2009, 7:52:17 PM