Re: Long to wide format


Brandon Hurr

Aug 12, 2018, 22:05:50
to: Ian James, manipulatr
Ian

Could you attach an example dataset?

It’s hard to follow along with the code when there’s no input data.

B

On Sun, Aug 12, 2018 at 18:13 Ian James <ijam...@gmail.com> wrote:
I am trying to get my data from long to wide format. I will need to do this with multiple data sets of different lengths. I believe I can alter this script to do what I want, but my attempts have been unsuccessful. Can anyone explain how I can alter this script to convert data sets (all with 2 columns but variable numbers of rows) into wide format for analysis in the geomorph R package? My first data set (landmark data) has 2686 rows and 2 columns.




data <- read.table("Rafalt1outline.txt", header=FALSE) 
# Import data 

landmarks=30
# Number of landmarks 

data$specimen=as.factor(unlist(lapply(seq_len(landmarks), function(x) 
  rep(seq_len(landmarks)[x],30)))) 
data$landmark=as.factor(rep(1:30,30)) 
# Create two factors to index the various x,y coordinates 

library(reshape) 
data_wide=reshape(data, 
                  idvar = "specimen", 
                  timevar = "landmark", 
                  direction = "wide") 
# Use reshape to convert from long to wide format 

landmarknames=unlist(lapply(seq_len(landmarks), function(x) 
  c(paste("x.",x,sep=""),paste("y.",x,sep="")))) 
# Create ad hoc names to use for indexing 

data_wide=cbind(data_wide$specimen,data_wide$genus.1,data_wide[,landmarknames]) 
# Get only the useful columns 

write.table(data_wide, "reformat.txt")


Josh Kahn

Aug 13, 2018, 11:19:18
to: Brandon Hurr, Ian James, manipulatr
Consider using spread() and gather() in the tidyverse. For some reason, I find it much easier than reshape().

Brandon Hurr

Aug 13, 2018, 15:54:44
to: Ian James, manipulatr
How many landmarks and specimens are in this dataset?

30 landmarks times 30 specimens is only 900 rows, but this file has 2686, so your code can't work on this example. Mainly I'm after understanding what the output should look like so I know what I need to do to get there, but a working example would be beneficial for all.
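
A quick way to catch this kind of mismatch before reshaping is to compare the row count with landmarks times specimens. A minimal sketch, assuming a made-up stand-in for the imported file (the `data` object below is fabricated for illustration and is not the real Rafalt1outline.txt):

```r
# Hypothetical stand-in for the imported file: 2686 rows, 2 columns
data <- data.frame(V1 = seq_len(2686), V2 = seq_len(2686))

landmarks <- 30   # value hard-coded in the original script
specimens <- 30   # value hard-coded in the original script

# Number of rows the reshaping code implicitly expects
expected_rows <- landmarks * specimens

# TRUE only if the indexing factors would line up with the data
rows_match <- nrow(data) == expected_rows

if (!rows_match) {
  message("Expected ", expected_rows, " rows but found ", nrow(data))
}
```

If the check fails, the specimen and landmark factors cannot be attached to the data without recycling or an error, so the counts need to be settled first.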

B

On Mon, Aug 13, 2018 at 12:18 PM Ian James <ijam...@gmail.com> wrote:
I would be happy to.

Brandon Hurr

Aug 13, 2018, 16:22:38
to: Ian James, manipulatr
Alright, I just made up some data that fit your example. If you use the tidyverse (dplyr, tidyr, ggplot2) this should get you the same output as you had from the code you supplied given the same number of specimens and landmarks. 

library(tidyverse)

# 30 specimens
specimen <- rep(1:30, each = 30)
# 30 landmarks
landmark <- rep(1:30, times = 30)

# make up some coordinate data
x <- rnorm(length(specimen))
y <- rnorm(length(specimen))

# assemble into a tibble/data.frame
df <- tibble(specimen, landmark, x, y)

# make a vector of names so you can sort the columns how you want them
num_landmarks <- 30
num_coords <- 2
coord_names <- c('x', 'y')
# paste them all together with a '.' separating them
move_cols <- paste(
  rep(coord_names, times = num_landmarks),
  rep(1:num_landmarks, each = num_coords),
  sep = "."
)


df %>%
  gather(key = "xy", value = "coord_value", -specimen, -landmark) %>% # make dataset even longer
  unite(landmark.coord, c("xy", "landmark"), sep = ".") %>% # create labels for future spreading
  spread(landmark.coord, coord_value) %>% # spread it all out
  select(specimen, all_of(move_cols)) # reorder columns to fit expected output

HTH,
B

Brandon Hurr

Aug 14, 2018, 15:33:57
to: Ian James, manipulatr
Ian, 

What does str(Rafalt1outline) tell you?

I suspect you are giving it your original data.frame (with your coordinates). When you try and store that in a tibble you need to put it in with list(), but I don't think you want to do that at all. 

Like this:

library(tidyverse)

# Import data 
data <- read.table("~/Downloads/Rafalt1outline.txt", header = FALSE) %>%
  rename(x = V1, y = V2) # rename to x and y

# 1 specimen
specimen <- 1
# 2686 landmarks
landmark <- 1:2686

# assemble into a tibble/data.frame, bind the columns to align specimen/landmarks
df <- bind_cols(tibble(specimen, landmark), data)

# make a vector of names so you can sort the columns how you want them
num_landmarks <- 2686
num_coords <- 2
coord_names <- c('x', 'y')
# paste them all together with a '.' separating them
move_cols <- paste(
  rep(coord_names, times = num_landmarks),
  rep(1:num_landmarks, each = num_coords),
  sep = "."
)


df %>%
  gather(key = "xy", value = "coord_value", -specimen, -landmark) %>% # make dataset even longer
  unite(landmark.coord, c("xy", "landmark"), sep = ".") %>% # create labels for future spreading
  spread(landmark.coord, coord_value) %>% # spread it all out
  select(specimen, all_of(move_cols)) # reorder columns to fit expected output


This output is super wide. Is this really what you need?

HTH,
B


On Tue, Aug 14, 2018 at 10:58 AM Ian James <ijam...@gmail.com> wrote:
Brandon,
Thank you. I really appreciate your help. When I attempt to use the code on the data set I provided, I get this error: Error: Column `specimen` must be a 1d atomic vector or a list. Can you spot my issue?

library(tidyverse)

# 1 specimens
specimen <- Rafalt1outline
# 2686 landmarks
landmark <- rep(1:2686, times = 1)

# make up some coordinate data
x <- rnorm(length(specimen))
y <- rnorm(length(specimen))

# assemble into a tibble/data.frame
df <- tibble(specimen, landmark, x, y)

# make a vector of names so you can sort the columns how you want them
num_landmarks <- 2686
num_coords <- 2
coord_names <- c('x', 'y')
# paste them all together with a '.' separating them
move_cols <- paste(rep(coord_names, times = num_landmarks), rep(1: num_landmarks, each = num_coords), sep = ".")


df %>%
  gather(., key = "xy", value = "coord_value", -specimen, -landmark) %>% #make dataset even longer
  unite(., landmark.coord, c("xy", "landmark"), sep = ".") %>% # create labels for future spreading
  spread(., landmark.coord, coord_value) %>% #spread it all out
  select(specimen, move_cols) # reorder things to fit expected output




Brandon Hurr

Aug 14, 2018, 17:54:24
to: Ian James, manipulatr
Ian,

If you run the code I posted above, you are splitting x and y into separate columns per location. This gives you 1 row and 5373 columns (one of which is specimen):
# A tibble: 1 x 5,373

For each location, you have an x and a y column. This is what I believe you were specifying in your original code. 
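
For comparison, the same one-row layout can be sketched in base R. This is not the code posted above, just an alternative under the assumption that the rows are already in landmark order; the four coordinate pairs are a made-up toy sample, and the names `coords`, `values`, `col_names` and `wide` are only for illustration.

```r
# Toy outline with 4 landmarks (made-up values, one row per landmark)
coords <- data.frame(
  x = c(2880, 2879, 2879, 2878),
  y = c(1437, 1437, 1442, 1442)
)
n <- nrow(coords)

# t() turns the n x 2 matrix into 2 x n; as.vector() then reads it
# column by column, giving x1, y1, x2, y2, ...
values <- as.vector(t(as.matrix(coords)))

# Matching names: x.1, y.1, x.2, y.2, ...
col_names <- paste(rep(c("x", "y"), times = n),
                   rep(seq_len(n), each = 2),
                   sep = ".")

# A named vector transposed with t() becomes a 1-row matrix with column names
wide <- data.frame(specimen = 1, t(setNames(values, col_names)))
```

With n landmarks this gives 1 row and 2n + 1 columns, which matches the 5373 columns reported for 2686 landmarks.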

As I suspected, you were trying to put your original data 2686 times into a larger data frame. I got around this issue by using dplyr::bind_cols(), which binds columns together as long as they have the same number of rows. (Hopefully they do, because we are building the index columns to a specific length.)
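
A base R sketch of the same idea, using cbind() as a stand-in for dplyr::bind_cols() (the toy coordinates are made up). It also shows one further thing worth checking: length() of a data.frame is its number of columns, not its number of rows, so it is not a safe way to size the index vectors.

```r
# Toy stand-in for the imported coordinates (made-up values)
data <- data.frame(x = c(2880, 2879, 2879), y = c(1437, 1437, 1442))

# length() on a data.frame counts columns, so this is 2, not 3
n_by_length <- length(data)

# Index columns need one entry per row of the coordinate data;
# the single specimen id is recycled to that length
index <- data.frame(specimen = 1, landmark = seq_len(nrow(data)))

# Column-bind the index and the coordinates; both have nrow(data) rows
df <- cbind(index, data)
```

Sizing the index from nrow(data) rather than a hard-coded number means the same lines should carry over to files with a different number of rows.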

Do we have what you need, Ian, or is there another problem?

B





On Tue, Aug 14, 2018 at 2:31 PM Ian James <ijam...@gmail.com> wrote:
> str(Rafalt1outline)
'data.frame': 2686 obs. of  2 variables:
 $ V1: int  2880 2879 2879 2878 2878 2877 2877 2876 2876 2875 ...
 $ V2: int  1437 1437 1442 1442 1446 1446 1451 1451 1455 1455 ...



I'm trying to get it to 1 observation of 2686 variables.