Having trouble replicating a ggplot2 graphic


Ramnath Vaidyanathan

unread,
Mar 5, 2014, 4:40:50 PM3/5/14
to gg...@googlegroups.com
I am trying to recreate the following ggplot2 graphic

library(ggplot2)
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(aes(color = cut))

Going through the documentation for ggvis, I got to the following code, which produces an error

ggvis(diamonds, props(x = ~carat, y = ~price)) +
  layer_point(props(opacity := 0.5)) +
  layer_smooth(props(stroke = ~cut ))

The error message I am getting is this

Guessing method = gam
Guessing formula = y ~ s(x)
Error in rep(col, length.out = nrow(data)) : 
  attempt to replicate an object of type 'closure'
Error in rep(col, length.out = nrow(data)) : 
  attempt to replicate an object of type 'closure'

I was wondering if this is a bug, or if I am doing something wrong.

Thanks.
Ramnath

Winston Chang

unread,
Mar 5, 2014, 5:08:10 PM3/5/14
to Ramnath Vaidyanathan, gg...@googlegroups.com
Hi Ramnath -

Hm, that error message isn't very informative. Do you mind filing an issue on that?

In ggplot2, grouping is done automatically if a discrete variable in the data is mapped to an aesthetic. When calculating transforms (like smooth), the data is split apart into pieces, and the transform is applied to each piece.

In ggvis (as of now) the grouping doesn't happen automatically when a variable in the data is mapped to a property. So in the layer_smooth, it's trying to calculate a single model line from all the data, but also map the cut variable for each point to the stroke color -- and those two things won't work together.

Here are two ways of doing this. (I'm also taking a subset of the diamonds data set, since there are some to-be-addressed bottlenecks that cause slow rendering for ~53k points.)

data('diamonds', package='ggplot2')
d <- diamonds[sample(nrow(diamonds), 1000), ]


# In this version, the grouping on cut is done for all layers, although it's
# only visible for the layer_smooth
ggvis(d, by_group(cut), props(x = ~carat, y = ~price)) +
  layer_point(props(opacity := 0.5)) +
  layer_smooth(props(stroke = ~cut))

# In this version, the grouping on cut is done only for the layer_smooth
ggvis(d, props(x = ~carat, y = ~price)) +
  layer_point(props(opacity := 0.5)) +
  layer(
    by_group(cut),
    layer_smooth(props(stroke = ~cut))
  )

One thing we want to do for the future is to make the grouping simpler to use.

-Winston




Ramnath Vaidyanathan

unread,
Mar 5, 2014, 5:22:29 PM3/5/14
to gg...@googlegroups.com, Ramnath Vaidyanathan
Thanks Winston. That explanation makes a lot of sense. 

With a large dataset there are multiple bottlenecks to address. For starters, RJSONIO is dead slow when it comes to bigger data. rjson is very fast, but not as full-featured as RJSONIO. jsonlite is the most feature-complete, but performance is still an issue. So I presume one important thing to do would be to have a fast and feature-complete JSON transformer.

There are other issues that I will email you about.

I shall file a bug report on the error message.

Thanks for the answers.

Winston Chang

unread,
Mar 5, 2014, 5:45:59 PM3/5/14
to Ramnath Vaidyanathan, gg...@googlegroups.com
> With a large dataset there are multiple bottlenecks to address. For starters, RJSONIO is dead slow when it comes to bigger data. rjson is very fast, but not as full-featured as RJSONIO. jsonlite is the most feature-complete, but performance is still an issue. So I presume one important thing to do would be to have a fast and feature-complete JSON transformer.

Absolutely, RJSONIO is the biggest (smallest?) bottleneck right now. We've also experimented with rjson, but it has some different (and undesirable) behavior, so we decided to stick with RJSONIO for now. A fast, well-specced JSON converter is something we definitely want. It would affect many projects, including ggvis and Shiny.

-Winston

Ramnath Vaidyanathan

unread,
Mar 5, 2014, 8:53:10 PM3/5/14
to gg...@googlegroups.com, Ramnath Vaidyanathan, Jeroen Ooms
Jeroen Ooms has been focusing on the well-specced piece of the puzzle in jsonlite. With some Rcpp support, I can see jsonlite becoming the package we are all looking for. Maybe this could be a proposal for GSoC 2014?

Jeroen Ooms

unread,
Mar 5, 2014, 9:03:31 PM3/5/14
to Ramnath Vaidyanathan, gg...@googlegroups.com
Some comments:

- The latest version of jsonlite includes a more recent version of the libjson C++ library than the one that was included with RJSONIO. There were some critical bug fixes, but perhaps performance has changed as well.
- Would it be possible to specify which parts of RJSONIO/jsonlite are "dead slow"? Perhaps we can optimize some specific bottlenecks. The parsing shouldn't be too bad, but the JSON writer is pure R right now, so that can probably get better with some C++ code.

Ramnath Vaidyanathan

unread,
Mar 5, 2014, 9:46:38 PM3/5/14
to gg...@googlegroups.com, Ramnath Vaidyanathan
Jeroen, I don't think there is a specific part of jsonlite/RJSONIO that is slow. I just think rjson's speed comes purely from its implementation.

I checked the latest version of jsonlite, and while its performance is comparable with RJSONIO, it is about 5x slower than rjson. I think once the specs for jsonlite stabilize, it might make sense to rewrite the bottlenecks using Rcpp. Google Summer of Code 2014 might be a great opportunity to get a student to do this work. It pays a student for a three-month period, and I believe this is a very well-defined project. Here is a link to the GSoC R Google Group: https://groups.google.com/forum/#!forum/gsoc-r
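(For reference, a minimal sketch of the kind of comparison I ran — timings and function names are from the three packages' public APIs, but absolute numbers will vary by machine and package version, and this assumes ggplot2, RJSONIO, rjson, and jsonlite are all installed:)

```r
# Hypothetical benchmark sketch: serialize the ~54k-row diamonds data
# frame with each package's toJSON() and compare elapsed wall-clock time.
data("diamonds", package = "ggplot2")

elapsed <- function(expr) system.time(expr)[["elapsed"]]

times <- c(
  RJSONIO  = elapsed(RJSONIO::toJSON(diamonds)),
  rjson    = elapsed(rjson::toJSON(diamonds)),
  jsonlite = elapsed(jsonlite::toJSON(diamonds))
)
print(sort(times))
```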

Ramnath

Kenton Russell

unread,
Mar 6, 2014, 1:09:17 PM3/6/14
to gg...@googlegroups.com, Ramnath Vaidyanathan
I am very pleased with the performance improvements in jsonlite. toJSON on something like diamonds basically would not work on my machine in the first version. Now it's down to a very respectable 5 seconds -- about the 5x slower than rjson that Ramnath mentioned. Of course, faster is always better :)