Colsplit question: what pattern for concatenated names?

Tom W

Aug 6, 2011, 4:43:08 AM
to manip...@googlegroups.com
Hey all,

I'm stuck using colsplit for the simplest case: when the two variable names are concatenated.

Consider the following example, where we measure each state's growth (y) and interest rate (i) over time. We measure growth twice (in 90 and 92), whilst we measure interest rates only once (in 90).

rdf <- data.frame(state = factor(1:4),
                  y90 = c(1, 2, 1, 2),
                  y92 = c(2, 1, 2, 1),
                  i90 = c(3, 3, 3, 3))
> rdf

  state y90 y92 i90
1     1   1   2   3
2     2   2   1   3
3     3   1   2   3
4     4   2   1   3

Melting is simple, but the result has no separate indicator for year:

rdfm <- melt(rdf, id = 1)

> rdfm
   state variable value
1      1      y90     1
2      2      y90     2
3      3      y90     1
4      4      y90     2
5      1      y92     2
6      2      y92     1
7      3      y92     2
8      4      y92     1
9      1      i90     3
10     2      i90     3
11     3      i90     3
12     4      i90     3

Clearly, we need to split the variable column so that we have both a year and a variable-type indicator. That is:

   state year variable value
1      1   90        y     1
2      2   90        y     2

But the following

rdfm2 <- cbind(rdfm, colsplit(rdfm$variable, names = c("variable", "year")))

just gives this error:

Error in is.character(pattern) : 'pattern' is missing

What "pattern" does colsplit require for concatenated names?

Thanks!


Dennis Murphy

Aug 6, 2011, 5:59:58 AM
to manip...@googlegroups.com
Hi:

The problem is that colsplit() needs a character on which to split the
strings, just like the strsplit() function in base. It is much easier
to perform the column split if you add a character between the first
and second positions of the original strings:

library('reshape')

rdf <- data.frame(state = factor(1:4),
                  y_90 = c(1, 2, 1, 2),
                  y_92 = c(2, 1, 2, 1),
                  i_90 = c(3, 3, 3, 3))
rdfm <- melt(rdf, id = 1)
cbind(rdfm[, -2], colsplit(rdfm$variable, split = '_',
                           names = c('variable', 'year')))

The default split string is "", but since you have three characters in
the variable names, that's not what you want here.

HTH,
Dennis
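
If the data frame already exists with concatenated names, the separator can be added afterwards instead of retyping the columns. The following is only a sketch in base R, and it assumes every name to be split is made of letters followed by digits (the regular expression is my assumption, not something from the thread). Note also that the error in the question mentions `pattern`, which as far as I know is the argument name used by colsplit in reshape2, whereas `split` is the argument in reshape, so the argument name may need adjusting depending on which package is loaded.

```r
# Original wide data with concatenated names (from the question).
rdf <- data.frame(state = factor(1:4),
                  y90 = c(1, 2, 1, 2),
                  y92 = c(2, 1, 2, 1),
                  i90 = c(3, 3, 3, 3))

# Insert "_" between the leading letters and the trailing digits.
# Names that do not match the pattern (e.g. "state") are left unchanged.
names(rdf) <- sub("^([A-Za-z]+)([0-9]+)$", "\\1_\\2", names(rdf))

names(rdf)
```

After this, the melt and colsplit calls above can be applied to rdf as written.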

Hadley Wickham

Aug 6, 2011, 7:46:29 AM
to Dennis Murphy, manip...@googlegroups.com
And if you can't do that, use substr.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Dennis Murphy

Aug 6, 2011, 9:41:12 AM
to Hadley Wickham, manip...@googlegroups.com
Hi Hadley:

I'm not seeing it. What am I missing?

> cbind(rdfm, colsplit(rdfm$variable, split = substr(rdfm$variable, 1, 1),
+                      names = c('measure', 'year')))
  state variable value measure year
1     1      y90     1      NA   90
2     2      y90     2      NA   90
3     3      y90     1      NA   90
...

# I expected this not to work...
> cbind(rdfm, colsplit(rdfm$variable, split = substr(rdfm$variable, 1, 2),
+                      names = c('measure', 'year')))
  state variable value measure year
1     1      y90     1      NA    0
2     2      y90     2      NA    0
3     3      y90     1      NA    0
...

Dennis
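
The two outputs above are consistent with how base strsplit() treats its split string, assuming colsplit() splits the same way (as described earlier in the thread). The split string is a delimiter, so it is removed from the result rather than returned as a piece. A minimal illustration in base R:

```r
# strsplit() drops the delimiter itself, so splitting on the leading
# letter leaves an empty first piece and only the remainder as the second.
first_try  <- strsplit("y90", "y")[[1]]   # delimiter is "y"
second_try <- strsplit("y90", "y9")[[1]]  # delimiter is "y9"

print(first_try)
print(second_try)
```

The empty first piece is presumably what shows up as NA in the measure column, and the leftover "90" or "0" is what lands in year.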

Kohske Takahashi

Aug 6, 2011, 10:20:32 AM
to Dennis Murphy, Hadley Wickham, manip...@googlegroups.com
hi,

try this:

transform(rdfm, variable = substr(variable, 1, 1), year = substr(variable, 2, 100))
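
The fixed positions above rely on the measure name being exactly one character. If that might not hold, a regular expression can do the same split. This is a rough sketch in base R; the column name measure and the assumption that every name is non-digits followed by digits are mine, and the long data frame is built by hand so that the example runs without reshape.

```r
# Long-format data as melt() would produce it, built by hand here so the
# sketch runs without the reshape package.
rdfm <- data.frame(state    = factor(rep(1:4, times = 3)),
                   variable = rep(c("y90", "y92", "i90"), each = 4),
                   value    = c(1, 2, 1, 2, 2, 1, 2, 1, 3, 3, 3, 3),
                   stringsAsFactors = FALSE)

# as.character() guards against `variable` being a factor, as it is after melt().
v <- as.character(rdfm$variable)

rdfm$measure <- sub("[0-9]+$", "", v)               # drop the trailing digits
rdfm$year    <- as.integer(sub("^[^0-9]+", "", v))  # drop the leading non-digits

head(rdfm, 2)
```

Converting year with as.integer() also makes it usable for sorting or arithmetic, which the substr() version leaves as character.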

--
Kohske Takahashi <takahash...@gmail.com>

Research Center for Advanced Science and Technology,
The University of  Tokyo, Japan.
http://www.fennel.rcast.u-tokyo.ac.jp/profilee_ktakahashi.html
