histogram with scatter points

knguyen

Oct 7, 2009, 1:35:43 PM
to ggplot2
Hi all,

I just started using ggplot2 today. Please excuse me if this is
trivial.

> head(mydata)
degree.g.
1 5
2 8
3 5
4 5
5 5
6 5

> p <- ggplot(mydata, aes(x = degree.g.))

> p + geom_histogram() + scale_x_log10() + scale_y_log10()

Essentially, this gives me a histogram of a graph's degree
distribution on a log-log scale. But I would like a scatter plot
instead of a bar chart. What should I do?

Thanks

-k



Ista Zahn

Oct 7, 2009, 3:09:50 PM
to knguyen, ggplot2
I don't really understand what you're trying to do. What are you
wanting to plot on the y axis? For single-variable distributions the
common displays are histograms and frequency polygons. Scatter plots
typically show the joint distribution of two variables.

-Ista
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Mike Lawrence

Oct 7, 2009, 3:13:19 PM
to knguyen, ggplot2
use geom_point() instead of geom_histogram()
--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~

Mike Lawrence

Oct 7, 2009, 3:23:47 PM
to Ista Zahn, knguyen, ggplot2
ah, when I suggested geom_point() I assumed the 1:6 was the other
variable but possibly those are row numbers, so the question is indeed
confusing.

Ista Zahn

Oct 8, 2009, 3:52:30 PM
to Khanh Nguyen, ggplot2
Well I'm still not sure why you want to do this, but I think you want
something like

degree.dist <- sample(1:20, 1000, replace = T)

D <- as.data.frame(table(degree.dist))

D$degree.dist <- as.numeric(as.character(D$degree.dist))

qplot(degree.dist, Freq, data=D, geom = 'point') + scale_x_log10() +
scale_y_log10()

Again, I think something like

qplot(degree.dist, geom = 'freqpoly', binwidth=.125) + scale_x_log10() +
scale_y_log10()

makes more sense...

-Ista

On Thu, Oct 8, 2009 at 3:37 PM, Khanh Nguyen <kng...@cs.umb.edu> wrote:
> There was a typo,
>
>> qplot(degree(g), geom = 'histogram') + scale_x_log10() + scale_y_log10()
>
> should be
>
>> qplot(degree.dist, geom = 'histogram') + scale_x_log10() + scale_y_log10()
>
>
>
> On Thu, Oct 8, 2009 at 2:52 PM, knguyen <nguyen....@gmail.com> wrote:
>>
>> Thank you for all the responses. I am sorry for the confusion.
>>
>> Here is what I am trying to do. I want a scatter plot degree vs.
>> counts of a graph. (hence, the histogram of the degree distribution).
>>
>> An example:
>>
>> > degree.dist <- sample(1:20, 1000, replace = T) # sample degree distribution
>>
>> > qplot(degree(g), geom = 'histogram') + scale_x_log10() + scale_y_log10()
>>
>> the last step is what I need, but I'd like a scatter plot instead of a
>> bar chart. Is it possible?
>>
>> Thanks.
>>
>> -k

knguyen

Oct 8, 2009, 2:52:53 PM
to ggplot2

Harlan Harris

Oct 9, 2009, 8:04:38 AM
to ggplot2


On Oct 8, 2:52 pm, knguyen <nguyen.h.kh...@gmail.com> wrote:
> Thank you for all the responses. I am sorry for the confusion.
>
> Here is what I am trying to do. I want a scatter plot degree vs.
> counts of a graph. (hence, the histogram of the degree distribution).
>
> An example:
>
> > degree.dist <- sample(1:20, 1000, replace = T) # sample degree distribution
> > qplot(degree(g), geom = 'histogram') + scale_x_log10() + scale_y_log10()
>
> the last step is what I need, but I'd like a scatter plot instead of a
> bar chart. Is it possible?
>

Yes, it's possible. First, it's easiest if you use data.frames.
Second, it's easiest to not use qplot as a beginner. Really. Don't
believe Hadley on this one. :)

d <- data.frame(X=degree.dist)
ggplot(d) + geom_point(aes(x=X, y=..count..), stat="bin", binwidth=1)

So, geom_point says to plot with points, where the x coordinate comes
from the data and the y coordinate is computed by ggplot, in this
case by the same binning algorithm that histograms use. It
generates a new variable called ..count.. which you can use directly.
The binwidth argument is a parameter to the bin stat, and just makes
the points line up with the integers...
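
For anyone reading this with a recent ggplot2: the ..count.. notation
has since been superseded by after_stat(count). Below is a minimal
sketch of the same plot in that newer syntax, assuming ggplot2 >= 3.3.0.
The seed and the variable names p and binned are only for illustration
and are not from the thread.

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0, where after_stat() exists

set.seed(1)  # assumed seed, only so the sketch is reproducible
d <- data.frame(X = sample(1:20, 1000, replace = TRUE))

# Points whose y value is the per-bin count computed by the "bin" stat.
p <- ggplot(d) +
  geom_point(aes(x = X, y = after_stat(count)), stat = "bin", binwidth = 1)

# layer_data() returns the values the stat computed for this layer,
# which is a handy way to see what 'count' actually contains.
binned <- layer_data(p)
head(binned[, c("x", "count")])
```

Every observation lands in exactly one bin, so the counts in binned
add up to the number of rows in d.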

(The log scales didn't work for me right away, but I'm sure you can
figure that out with some trial and error!)
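
One likely reason the log scales misbehave here (an assumption, since
the post does not say what went wrong): ggplot2 applies scale
transformations before it computes statistics, so with scale_x_log10()
a binwidth of 1 means one unit of log10(X), i.e. one bin per decade
rather than one per integer. A hedged workaround is to count the
degrees first and only then plot the finished (degree, count) pairs on
log scales. The names tab and counts and the seed are illustrative.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is reproducible
degree.dist <- sample(1:20, 1000, replace = TRUE)

# Count occurrences of each degree before any scale transformation.
tab <- table(degree.dist)
counts <- data.frame(X = as.numeric(names(tab)), count = as.vector(tab))

# Drop empty degrees: log10(0) is -Inf and cannot be placed on a log axis.
counts <- counts[counts$count > 0, ]

p <- ggplot(counts, aes(x = X, y = count)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
print(p)
```

Because the counting happens outside ggplot, the log scales only
relabel the axes and no longer change how the data are binned.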

-Harlan
