Line plots with factor variable in x-axis


Santosh Mishra

Feb 11, 2010, 12:12:57 AM
to ggp...@googlegroups.com
Hello everyone,

I am a beginner in ggplot2, so this question may be very elementary. I have the following data frame called tempdf (created with a melt command). The first column is a factor variable of intervals.

               xvar1       variable     value
1  [-4.1579,-1.7641) Sample Logodds 0.4975000
2  [-1.7641,-1.4066) Sample Logodds 0.4850000
3  [-1.4066,-1.1812) Sample Logodds 0.5100000
4  [-1.1812,-1.0069) Sample Logodds 0.5150000
5  [-1.0069,-0.8502) Sample Logodds 0.5175000
6  [-0.8502,-0.7062) Sample Logodds 0.4675000
7  [-0.7062,-0.5819) Sample Logodds 0.5375000
8  [-0.5819,-0.4626) Sample Logodds 0.5300000
9  [-0.4626,-0.3482) Sample Logodds 0.5250000
10 [-0.3482,-0.2392) Sample Logodds 0.5000000
11 [-0.2392,-0.1340) Sample Logodds 0.4800000
12 [-0.1340,-0.0411) Sample Logodds 0.4375000
13 [-0.0411, 0.0498) Sample Logodds 0.5225000
14 [ 0.0498, 0.1448) Sample Logodds 0.4825000
15 [ 0.1448, 0.2440) Sample Logodds 0.4775000
16 [ 0.2440, 0.3548) Sample Logodds 0.4600000
17 [ 0.3548, 0.4671) Sample Logodds 0.5275000
18 [ 0.4671, 0.5742) Sample Logodds 0.5200000
19 [ 0.5742, 0.7049) Sample Logodds 0.5000000
20 [ 0.7049, 0.8386) Sample Logodds 0.5000000
21 [ 0.8386, 0.9975) Sample Logodds 0.4850000
22 [ 0.9975, 1.1776) Sample Logodds 0.5050000
23 [ 1.1776, 1.4058) Sample Logodds 0.4875000
24 [ 1.4058, 1.7387) Sample Logodds 0.4950000
25 [ 1.7387, 3.8853] Sample Logodds 0.5000000
26 [-4.1579,-1.7641)       Lower CI 0.4486816
27 [-1.7641,-1.4066)       Lower CI 0.4363220
28 [-1.4066,-1.1812)       Lower CI 0.4610714
29 [-1.1812,-1.0069)       Lower CI 0.4660357
30 [-1.0069,-0.8502)       Lower CI 0.4685197
31 [-0.8502,-0.7062)       Lower CI 0.4190693
32 [-0.7062,-0.5819)       Lower CI 0.4884351
33 [-0.5819,-0.4626)       Lower CI 0.4809578
34 [-0.4626,-0.3482)       Lower CI 0.4759789
35 [-0.3482,-0.2392)       Lower CI 0.4511572
36 [-0.2392,-0.1340)       Lower CI 0.4313866
37 [-0.1340,-0.0411)       Lower CI 0.3896318
38 [-0.0411, 0.0498)       Lower CI 0.4734913
39 [ 0.0498, 0.1448)       Lower CI 0.4338537
40 [ 0.1448, 0.2440)       Lower CI 0.4289207
41 [ 0.2440, 0.3548)       Lower CI 0.4116934
42 [ 0.3548, 0.4671)       Lower CI 0.4784677
43 [ 0.4671, 0.5742)       Lower CI 0.4710049
44 [ 0.5742, 0.7049)       Lower CI 0.4511572
45 [ 0.7049, 0.8386)       Lower CI 0.4511572
46 [ 0.8386, 0.9975)       Lower CI 0.4363220
47 [ 0.9975, 1.1776)       Lower CI 0.4561119
48 [ 1.1776, 1.4058)       Lower CI 0.4387915
49 [ 1.4058, 1.7387)       Lower CI 0.4462073
50 [ 1.7387, 3.8853]       Lower CI 0.4511572
51 [-4.1579,-1.7641)       Upper CI 0.5463661
52 [-1.7641,-1.4066)       Upper CI 0.5339643
53 [-1.4066,-1.1812)       Upper CI 0.5587378
54 [-1.1812,-1.0069)       Upper CI 0.5636780
55 [-1.0069,-0.8502)       Upper CI 0.5661463
56 [-0.8502,-0.7062)       Upper CI 0.5165510
57 [-0.7062,-0.5819)       Upper CI 0.5858492
58 [-0.5819,-0.4626)       Upper CI 0.5784697
59 [-0.4626,-0.3482)       Upper CI 0.5735440
60 [-0.3482,-0.2392)       Upper CI 0.5488428
61 [-0.2392,-0.1340)       Upper CI 0.5289951
62 [-0.1340,-0.0411)       Upper CI 0.4865611
63 [-0.0411, 0.0498)       Upper CI 0.5710793
64 [ 0.0498, 0.1448)       Upper CI 0.5314803
65 [ 0.1448, 0.2440)       Upper CI 0.5265087
66 [ 0.2440, 0.3548)       Upper CI 0.5090700
67 [ 0.3548, 0.4671)       Upper CI 0.5760074
68 [ 0.4671, 0.5742)       Upper CI 0.5686134
69 [ 0.5742, 0.7049)       Upper CI 0.5488428
70 [ 0.7049, 0.8386)       Upper CI 0.5488428
71 [ 0.8386, 0.9975)       Upper CI 0.5339643
72 [ 0.9975, 1.1776)       Upper CI 0.5537927
73 [ 1.1776, 1.4058)       Upper CI 0.5364471
74 [ 1.4058, 1.7387)       Upper CI 0.5438881
75 [ 1.7387, 3.8853]       Upper CI 0.5488428

Basically, I want to create the logodds plot with confidence intervals. When I run the following command
ggplot(tempdf,aes(xvar1,value,colour=variable))+geom_line()
it gives me a blank plot.
When I do
ggplot(tempdf,aes(as.numeric(xvar1),value,colour=variable))+geom_line()
I get the line plot, but the intervals don't appear on the x-axis (the midpoint of the interval appears, as it is generated by the cut2() function of Hmisc).
When I do
ggplot(tempdf,aes(xvar1,value,colour=variable))+geom_point()
I get the point plots, but I need continuous lines connecting the points.

A less significant supplementary question:
Is there a way to change the alignment of the x-axis values? (I want the interval labels to be perpendicular to the x-axis so that their text does not overlap.)
Thank you in advance.

Santosh Mishra

Dennis Murphy

Feb 11, 2010, 1:48:50 AM
to Santosh Mishra, ggp...@googlegroups.com
Hi:

Anyone who tries to read in this data set will have a problem because of the
spaces in 'Sample Logodds', 'Lower CI' and 'Upper CI'. These need to be quoted;
otherwise R will try to read them in as separate variables. That is fine
until it sees the numerical value at the end of a line and realizes that there are more
fields than variable names, at which point it gives an error message. Even after these are quoted,
there are spaces in the intervals that R will try to read as separate fields.
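
One way around both problems is to parse each printed row with a regular expression instead of relying on whitespace-separated fields. The following is only a minimal sketch under that assumption; it is not necessarily the massaging that was actually done, which the thread does not show. The three sample rows are copied from the data in the original post, and the names raw_lines, row_pattern and parts are made up for illustration.

```r
# Three rows copied from the printed data frame in the original post
# (row number, interval, variable, value). In practice raw_lines would
# hold all 75 rows.
raw_lines <- c(
  "1  [-4.1579,-1.7641) Sample Logodds 0.4975000",
  "38 [-0.0411, 0.0498)       Lower CI 0.4734913",
  "75 [ 1.7387, 3.8853]       Upper CI 0.5488428"
)

# Capture groups: (1) the bracketed interval, (2) the label, which may
# contain spaces, (3) the trailing number.
row_pattern <- "^\\d+\\s+(\\[.*?[\\])])\\s+(.*?)\\s+([0-9.]+)$"
parts <- regmatches(raw_lines, regexec(row_pattern, raw_lines, perl = TRUE))

# Element 1 of each match is the whole line; elements 2-4 are the captures.
sds <- data.frame(
  interval = sapply(parts, `[`, 2),
  variable = sapply(parts, `[`, 3),
  value    = as.numeric(sapply(parts, `[`, 4)),
  stringsAsFactors = FALSE
)

# Keep the bins in the order they appear rather than in alphabetical order
sds$interval <- factor(sds$interval, levels = unique(sds$interval))
```

Setting the factor levels explicitly is a design choice of this sketch: left alone, the interval labels would sort alphabetically, which does not match their numeric order.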

After some massaging, I got the data in. Your data is essentially in melted
form, so you need to cast it in order to separate the three variables. I called
your data set sds, with corresponding variable names interval, variable and value.

library(ggplot2)
library(reshape)  # cast() comes from the reshape package
sds2 <- cast(sds, interval ~ variable)
head(sds2)
           interval  Lower CI Sample Logodds  Upper CI
1  [-0.0411,0.0498) 0.4734913         0.5225 0.5710793
2 [-0.1340,-0.0411) 0.3896318         0.4375 0.4865611
3 [-0.2392,-0.1340) 0.4313866         0.4800 0.5289951
4 [-0.3482,-0.2392) 0.4511572         0.5000 0.5488428
5 [-0.4626,-0.3482) 0.4759789         0.5250 0.5735440
6 [-0.5819,-0.4626) 0.4809578         0.5300 0.5784697

We then need to change the variable names to remove the spaces:

names(sds2) <- c('interval', 'lowerCI', 'logOdds', 'upperCI')

And now, the plot:

p <- ggplot(sds2, aes(x = interval, y = logOdds, ymin = lowerCI,
                       ymax = upperCI))
p + geom_linerange() + geom_point() +
     opts(axis.text.x = theme_text(angle = 90, hjust = 1))
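
In later ggplot2 releases, opts() and theme_text() were replaced by theme() and element_text(). The following is a minimal sketch of the same plot in that newer syntax, assuming a current ggplot2 installation. The data frame toy is a hypothetical three-row subset of the cast values shown above, typed in by hand for illustration.

```r
library(ggplot2)

# Hand-typed subset of the cast data (three bins only), for illustration
toy <- data.frame(
  interval = c("[-0.2392,-0.1340)", "[-0.1340,-0.0411)", "[-0.0411,0.0498)"),
  lowerCI  = c(0.4313866, 0.3896318, 0.4734913),
  logOdds  = c(0.4800, 0.4375, 0.5225),
  upperCI  = c(0.5289951, 0.4865611, 0.5710793)
)

# Fix the level order so the bins are drawn left to right in numeric order
toy$interval <- factor(toy$interval, levels = toy$interval)

p <- ggplot(toy, aes(x = interval, y = logOdds,
                     ymin = lowerCI, ymax = upperCI)) +
  geom_linerange() +                      # vertical bar from lowerCI to upperCI
  geom_point() +                          # point estimate
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # vertical labels

print(p)
```

The theme() call also covers the supplementary question about x-axis labels: angle = 90 turns them perpendicular to the axis so the interval text does not overlap.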

Hope this is what you were after.
Dennis

--
You received this message because you are subscribed to the ggplot2 mailing list.
To post to this group, send email to ggp...@googlegroups.com
To unsubscribe from this group, send email to
ggplot2+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/ggplot2

hadley wickham

Feb 11, 2010, 9:21:01 AM
to Dennis Murphy, Santosh Mishra, ggp...@googlegroups.com
> p <- ggplot(sds2, aes(x = interval, y = logOdds, ymin = lowerCI,
>                        ymax = upperCI))
> p + geom_linerange() + geom_point() +
>      opts(axis.text.x = theme_text(angle = 90, hjust = 1))
>
> Hope this is what you were after.

I think Santosh might also have wanted

p + geom_line(aes(group = 1))

The important thing here is to manually specify the grouping. By
default ggplot2 uses the combination of all categorical variables in
the plot to group geoms. That doesn't work for this plot because you
get an individual line for each point. Manually specifying group = 1
indicates you want a single line connecting all the points.
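
To make the grouping visible, here is a small sketch that builds three versions of a line plot and inspects the group that ggplot2 computes for each row. It assumes a current ggplot2 installation that provides layer_data(); the data frame toy_long is a made-up melted data set (three bins, two series), not the real data.

```r
library(ggplot2)

# Hypothetical melted data: three interval bins, two series
bins <- c("[0,1)", "[1,2)", "[2,3]")
toy_long <- data.frame(
  interval = factor(rep(bins, times = 2), levels = bins),
  variable = rep(c("Lower CI", "Upper CI"), each = 3),
  value    = c(0.45, 0.43, 0.46, 0.55, 0.53, 0.56)
)

# Default: group is the combination of all discrete variables in the plot
# (interval x variable), so every point sits alone in its own group.
p_default <- ggplot(toy_long, aes(interval, value, colour = variable)) +
  geom_line()

# group = 1: every row is in the same group, giving one line through all points
p_single <- ggplot(toy_long, aes(interval, value, group = 1)) +
  geom_line()

# group = variable: one line per series, which suits the melted layout
p_series <- ggplot(toy_long, aes(interval, value, colour = variable,
                                 group = variable)) +
  geom_line()

# Number of distinct groups in each version
n_default <- length(unique(layer_data(p_default)$group))
n_single  <- length(unique(layer_data(p_single)$group))
n_series  <- length(unique(layer_data(p_series)$group))
```

The 1 in group = 1 is not special; it is just a constant, so every row gets the same group value. For melted data with several series, mapping group to the series variable is probably closer to what the original question asked for.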

Hadley

--
http://had.co.nz/

Maiko Sell

Apr 4, 2017, 9:04:26 PM
to ggplot2, djm...@gmail.com, santo...@gmail.com
I am just learning about this "group=1" stuff (and R in general) and would like to understand it better. Is there documentation on it? The thought that comes to mind is what happens if group=2, or 3, etc.?