Re: Long to wide format

Brandon Hurr

Aug 12, 2018, 10:05:50 PM

to Ian James, manipulatr

Ian

Could you attach an example dataset?

It’s hard to follow along with the code when there’s no input data.

B

On Sun, Aug 12, 2018 at 18:13 Ian James <ijam...@gmail.com> wrote:

I am trying to get my data from long to wide format. I will need to do this with multiple data sets of different lengths. I believe I can alter this script to do what I want, but my attempts have been unsuccessful. Can anyone explain how I can alter this script to convert data sets (all with 2 columns but a variable number of rows) into wide format for analysis in the geomorph R package? (My first data set is 2686 rows and 2 columns of landmark data.)

data <- read.table("Rafalt1outline.txt", header=FALSE)
# Import data

landmarks=30
# Number of landmarks

data$specimen=as.factor(unlist(lapply(seq_len(landmarks), function(x)
rep(seq_len(landmarks)[x],30))))
data$landmark=as.factor(rep(1:30,30))
# Create two factors to index the various x,y coordinates

library(reshape)
data_wide=reshape(data,
idvar = "specimen",
timevar = "landmark",
direction = "wide")
# Use reshape to convert from long to wide format

landmarknames=unlist(lapply(seq_len(landmarks), function(x)
c(paste("x.",x,sep=""),paste("y.",x,sep=""))))
# Create ad hoc names to use for indexing

data_wide=cbind(data_wide$specimen,data_wide$genus.1,data_wide[,landmarknames])
# Get only the useful columns

write.table(data_wide, "reformat.txt")
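For comparison, here is a minimal base-R sketch of what the script above is aiming for. It assumes each file holds one specimen with one landmark per row. Note that `reshape()` here is base R's `stats::reshape()`, not part of the reshape package, so `library(reshape)` isn't needed. As far as I can tell, the script breaks for two reasons. First, the specimen/landmark factors are hard-coded for 30 × 30 = 900 rows. Second, `read.table(header = FALSE)` names the columns `V1`/`V2`, so the wide columns would come out as `V1.1`, `V2.1`, ... rather than `x.1`, `y.1`, .... The helper name `outline_to_wide()` and its `specimen_id` argument are hypothetical.

```r
# Hypothetical helper: convert one 2-column outline (x, y per row) into a
# single wide row with base R's stats::reshape(); no extra package needed.
outline_to_wide <- function(coords, specimen_id = 1) {
  stopifnot(ncol(coords) == 2)
  long <- data.frame(
    specimen = specimen_id,            # assumption: one specimen per file
    landmark = seq_len(nrow(coords)),  # index built from the row count
    x = coords[[1]],
    y = coords[[2]]
  )
  wide <- reshape(long,
                  idvar = "specimen",
                  timevar = "landmark",
                  direction = "wide")  # columns: specimen, x.1, y.1, x.2, y.2, ...
  rownames(wide) <- NULL
  wide
}

# Usage with the file from the question:
# data <- read.table("Rafalt1outline.txt", header = FALSE)
# data_wide <- outline_to_wide(data)
```

Because the landmark index comes from `nrow()`, the same call should work for files with different numbers of rows.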

--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.
To post to this group, send email to manip...@googlegroups.com.
Visit this group at https://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.

Josh Kahn

Aug 13, 2018, 11:19:18 AM

to Brandon Hurr, Ian James, manipulatr

Consider using spread() and gather() from tidyr (part of the tidyverse). For some reason, I find them much easier than reshape().

Brandon Hurr

Aug 13, 2018, 3:54:44 PM

to Ian James, manipulatr

How many landmarks and specimen for this dataset?

30 specimens × 30 landmarks is only 900 rows, but this file has 2686, so your code doesn't work on this example. Mainly I'm after understanding what the output should look like so I know what I need to do to get there, but a working example would benefit everyone.
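The arithmetic behind this is that the specimen and landmark indices only line up if the row count is an exact multiple of the number of landmarks per specimen. 2686 is not a multiple of 30 (2686 = 89 × 30 + 16). Here is a small hypothetical helper, `make_index()`, that builds the two index columns and fails loudly otherwise:

```r
# Hypothetical helper: build specimen/landmark index columns for a long
# table, refusing if the rows can't be split into equal-sized specimens.
make_index <- function(n_rows, n_landmarks) {
  if (n_rows %% n_landmarks != 0) {
    stop(sprintf("%d rows is not a multiple of %d landmarks", n_rows, n_landmarks))
  }
  n_specimens <- n_rows %/% n_landmarks
  data.frame(
    specimen = rep(seq_len(n_specimens), each = n_landmarks),
    landmark = rep(seq_len(n_landmarks), times = n_specimens)
  )
}

idx <- make_index(900, 30)  # fine: 30 specimens of 30 landmarks
# make_index(2686, 30)      # error: 2686 rows is not a multiple of 30 landmarks
```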

B

On Mon, Aug 13, 2018 at 12:18 PM Ian James <ijam...@gmail.com> wrote:

I would be happy to.

Brandon Hurr

Aug 13, 2018, 4:22:38 PM

to Ian James, manipulatr

Alright, I just made up some data that fit your example. If you use the tidyverse (dplyr, tidyr, ggplot2), this should give you the same output as the code you supplied, given the same number of specimens and landmarks.

library(tidyverse)

# 30 specimens
specimen <- rep(1:30, each = 30)
# 30 landmarks
landmark <- rep(1:30, times = 30)
# make up some coordinate data
x <- rnorm(length(specimen))
y <- rnorm(length(specimen))
# assemble into a tibble/data.frame
df <- tibble(specimen, landmark, x, y)

# make a vector of names so you can sort the columns how you want them
num_landmarks <- 30
num_coords <- 2
coord_names <- c('x', 'y')
# paste them all together with a '.' separating them
move_cols <- paste(rep(coord_names, times = num_landmarks), rep(1:num_landmarks, each = num_coords), sep = ".")

df %>%
  gather(., key = "xy", value = "coord_value", -specimen, -landmark) %>% # make dataset even longer
  unite(., landmark.coord, c("xy", "landmark"), sep = ".") %>% # create labels for future spreading
  spread(., landmark.coord, coord_value) %>% # spread it all out
  select(specimen, move_cols) # reorder things to fit expected output
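If loading the tidyverse is not an option, here is a base-R sketch of the same long-to-wide step. This is a swapped-in technique, not the code above. For each specimen it stacks x over y and reads the result column by column, which interleaves the values as x.1, y.1, x.2, y.2, .... It assumes every specimen has the same set of landmarks, and `to_wide()` is a hypothetical name.

```r
# Toy data in the same shape as above: 30 specimens x 30 landmarks
set.seed(1)
specimen <- rep(1:30, each = 30)
landmark <- rep(1:30, times = 30)
df <- data.frame(specimen, landmark, x = rnorm(900), y = rnorm(900))

# Hypothetical helper: one row per specimen, columns x.1, y.1, x.2, y.2, ...
to_wide <- function(df) {
  rows <- lapply(split(df, df$specimen), function(d) {
    d <- d[order(d$landmark), ]  # put landmarks in order within a specimen
    as.vector(rbind(d$x, d$y))   # 2 x n matrix read column-wise: x1, y1, x2, y2, ...
  })
  m <- do.call(rbind, rows)      # assumes equal landmark counts per specimen
  n <- ncol(m) / 2
  colnames(m) <- paste(rep(c("x", "y"), times = n), rep(seq_len(n), each = 2), sep = ".")
  data.frame(specimen = as.integer(names(rows)), m, check.names = FALSE)
}

wide <- to_wide(df)
dim(wide)  # 30 61
```

In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), which do the same job.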

HTH,
B

Brandon Hurr

Aug 14, 2018, 3:33:57 PM

to Ian James, manipulatr

Ian,

What does str(Rafalt1outline) tell you?

I suspect you are giving it your original data.frame (with your coordinates). To store a data.frame inside a tibble column, you would need to wrap it in list(), but I don't think you want to do that at all.

Like this:

library(tidyverse)

# Import data
data <- read.table("~/Downloads/Rafalt1outline.txt", header=FALSE) %>%
  rename(x = V1, y = V2) # rename to x and y
# 1 specimen
specimen <- 1
# 2686 landmarks
landmark <- 1:2686
# assemble into a tibble/data.frame, bind the columns to align specimen/landmarks
df <- bind_cols(tibble(specimen, landmark), data)

# make a vector of names so you can sort the columns how you want them
num_landmarks <- 2686
num_coords <- 2
coord_names <- c('x', 'y')
# paste them all together with a '.' separating them
move_cols <- paste(rep(coord_names, times = num_landmarks), rep(1:num_landmarks, each = num_coords), sep = ".")

df %>%
  gather(., key = "xy", value = "coord_value", -specimen, -landmark) %>% # make dataset even longer
  unite(., landmark.coord, c("xy", "landmark"), sep = ".") %>% # create labels for future spreading
  spread(., landmark.coord, coord_value) %>% # spread it all out
  select(specimen, move_cols) # reorder things to fit expected output
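Because every row of this file belongs to a single specimen, the reshaping reduces to reading the coordinates row by row. Below is a base-R sketch that takes the counts from nrow() instead of hard-coding 2686, so it should also work for the other data sets with different lengths. `one_outline_wide()` is a hypothetical name, and the stand-in data frame replaces read.table() so the block runs on its own.

```r
# Hypothetical helper: one specimen's outline (n rows of x, y) -> 1 wide row
one_outline_wide <- function(coords, specimen_id = 1) {
  n <- nrow(coords)                           # number of landmarks, not hard-coded
  values <- as.vector(t(as.matrix(coords)))   # row by row: x1, y1, x2, y2, ...
  col_names <- paste(rep(c("x", "y"), times = n), rep(seq_len(n), each = 2), sep = ".")
  wide <- matrix(values, nrow = 1, dimnames = list(NULL, col_names))
  data.frame(specimen = specimen_id, wide, check.names = FALSE)
}

# Stand-in with the same shape as Rafalt1outline.txt (2686 rows, 2 columns)
outline <- data.frame(V1 = seq_len(2686), V2 = seq_len(2686) + 10000L)
wide <- one_outline_wide(outline)
dim(wide)  # 1 5373
```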

This output is super wide. Is this really what you need?

HTH,

B

On Tue, Aug 14, 2018 at 10:58 AM Ian James <ijam...@gmail.com> wrote:

Brandon,
Thank you. I really appreciate your help. When I attempt to use the code on the data set I provided, I get an error that says Error: Column `specimen` must be a 1d atomic vector or a list. Can you spot my issue?

library(tidyverse)

# 1 specimens
specimen <- Rafalt1outline
# 2686 landmarks
landmark <- rep(1:2686, times = 1)

# make up some coordinate data
x <- rnorm(length(specimen))
y <- rnorm(length(specimen))

# assemble into a tibble/data.frame
df <- tibble(specimen, landmark, x, y)

# make a vector of names so you can sort the columns how you want them

num_landmarks <- 2686

num_coords <- 2
coord_names <- c('x', 'y')
# paste them all together with a '.' separating them
move_cols <- paste(rep(coord_names, times = num_landmarks), rep(1: num_landmarks, each = num_coords), sep = ".")

df %>%
gather(., key = "xy", value = "coord_value", -specimen, -landmark) %>% #make dataset even longer
unite(., landmark.coord, c("xy", "landmark"), sep = ".") %>% # create labels for future spreading
spread(., landmark.coord, coord_value) %>% #spread it all out
select(specimen, move_cols) # reorder things to fit expected output
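Two things in this version are worth checking. First, `specimen <- Rafalt1outline` puts a whole data.frame where a single vector is expected, which is what the tibble() error is complaining about. Second, `length()` on a data.frame counts columns, not rows, so `rnorm(length(specimen))` makes only 2 made-up values. Here is a short illustration using a 3-row stand-in for the outline, with values taken from the str() output later in the thread:

```r
# 3-row stand-in for the 2686 x 2 outline read with header = FALSE
outline <- data.frame(V1 = c(2880L, 2879L, 2879L),
                      V2 = c(1437L, 1437L, 1442L))

length(outline)                  # 2: length() of a data.frame is its number of columns
nrow(outline)                    # 3: the number of landmarks is the number of rows
length(rnorm(length(outline)))   # 2, not 3: why made-up x/y come out too short

# Build the index from nrow() and use the real coordinates directly
long <- data.frame(specimen = 1,
                   landmark = seq_len(nrow(outline)),
                   x = outline$V1,
                   y = outline$V2)
```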

Brandon Hurr

Aug 14, 2018, 5:54:24 PM

to Ian James, manipulatr

Ian,

If you run the code I posted above, x and y are split into separate columns for each landmark. This gives you 1 row and 5373 columns: 2 × 2686 = 5372 coordinate columns, plus 1 column for specimen.

# A tibble: 1 x 5,373

For each location, you have an x and a y column. This is what I believe you were specifying in your original code.

As I suspected, you were trying to put your original 2686-row data.frame into a larger data frame as if it were a single column. I got around this issue by using dplyr::bind_cols(), which joins columns side by side if they are the same length. (They should be the same length, because we create the specimen and landmark columns with exactly that many values.)

Do we have what you need Ian, or is there another problem?

B

On Tue, Aug 14, 2018 at 2:31 PM Ian James <ijam...@gmail.com> wrote:

> str(Rafalt1outline)
'data.frame': 2686 obs. of 2 variables:
$ V1: int 2880 2879 2879 2878 2878 2877 2877 2876 2876 2875 ...
$ V2: int 1437 1437 1442 1442 1446 1446 1451 1451 1455 1455 ...

I'm trying to get it to 1 observation of 2686 variables.
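For reference, one observation of a 2686-landmark outline holds 2 × 2686 = 5372 coordinate values, because each landmark contributes an x and a y. geomorph's arrayspecs(A, p, k) converts a matrix of such rows into a p × k × n array. As I understand its documentation, it expects one row per specimen ordered x1, y1, x2, y2, ..., but that is worth double-checking. Here is a hypothetical base-R helper, `wide_row_to_matrix()`, that undoes the layout for one row as a sanity check that nothing was scrambled:

```r
# Hypothetical helper: turn one wide row (specimen, x.1, y.1, x.2, y.2, ...)
# back into a landmarks-by-coordinates matrix. Assumes columns are already
# in x.1, y.1, x.2, y.2, ... order.
wide_row_to_matrix <- function(wide_row) {
  coord_cols <- grepl("^[xy]\\.[0-9]+$", names(wide_row))  # skip the specimen column
  values <- unlist(wide_row[1, coord_cols, drop = FALSE], use.names = FALSE)
  matrix(values, ncol = 2, byrow = TRUE, dimnames = list(NULL, c("x", "y")))
}

# Toy wide row with 3 landmarks (values from the str() output above)
w <- data.frame(specimen = 1,
                x.1 = 2880, y.1 = 1437,
                x.2 = 2879, y.2 = 1437,
                x.3 = 2879, y.3 = 1442)
m <- wide_row_to_matrix(w)
dim(m)  # 3 2
```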
