Help to reshape a data frame


Marialuisa Villani

12 Apr 2016, 13:54:14
to manipulatr

Hello everyone, 


I'm a statistics and sociology student, and I would like to work with a network dataset. This is the dataset: http://snap.stanford.edu/data/cit-HepPh.html. It is a paper citation database.


The data are in edge-list form. I use this code to import them:


nodes <- read.table("cit-HepPh-dates.txt", header = TRUE, as.is = TRUE)

links <- read.table("Cit-HepPh.txt", header = TRUE, as.is = TRUE)



The nodes list is now a data frame, but some IDs are repeated, once for each year of citation.


I would like to use dcast in order to produce a column for each year of citation.


Can someone help me?


Thank you


Best Regards


Marialuisa


Cit-HepPh.txt
cit-HepPh-dates.txt

Brandon Hurr

12 Apr 2016, 14:43:54
to Marialuisa Villani, manipulatr
I'm not sure I understand what you want fully, but in case I do, here is an attempt:

library(dplyr)
library(tidyr)
library(lubridate)

nodes <- read.table("cit-HepPh-dates.txt", header = TRUE, as.is = TRUE)
links <- read.table("Cit-HepPh.txt", header = TRUE, as.is = TRUE)


nodes %>%
  mutate(Year = year(Date), rownum = row_number()) %>%
  spread(Year, ID) %>%
  glimpse()


Each year from 1992 to 2002 is now a column, but this matrix is really sparse and much larger than the original. It might help us understand better if we knew what sort of visualization or table you are going for in the end. 

B


Brandon Hurr

12 Apr 2016, 15:39:47
to Marialuisa Villani, manipulatr
I think this does that. 

nodes %>%
  mutate(Year = year(Date), rownum = row_number()) %>%
  spread(Year, Date) %>%
  head()



On Tue, Apr 12, 2016 at 12:31 PM, Marialuisa Villani <marialuis...@gmail.com> wrote:
Thanks, but I would like something like this:


ID        Date1        Date2        Date3
9203201   1992-02-24   1993-01-03
9301202   1992-03-08   1993-01-04

Etc.

Do you think it's possible?
--
PhD Marialuisa Villani
Ingénieur d'études Post-Doc
Laboratoire SAGE
Université de Strasbourg
Email: mvil...@unistra.fr
Tel:0033 -628503163
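
For reference, the layout requested in the quoted message (one row per ID, with Date1, Date2, ... columns) can be sketched by numbering the dates within each ID instead of spreading by year. This is only a minimal sketch under assumptions: the toy `nodes` table reuses the four ID/date pairs from the example, plus one hypothetical extra row to show how missing cells are filled, and the helper column name `occurrence` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the nodes table. The first four rows come from the example
# in the message; the last row is hypothetical and only there to show NA fill.
nodes <- data.frame(
  ID   = c(9203201, 9203201, 9301202, 9301202, 9301202),
  Date = c("1992-02-24", "1993-01-03", "1992-03-08", "1993-01-04", "1994-05-10"),
  stringsAsFactors = FALSE
)

wide <- nodes %>%
  arrange(ID, Date) %>%                                 # earliest date first within each ID
  group_by(ID) %>%
  mutate(occurrence = paste0("Date", row_number())) %>% # Date1, Date2, ... per ID
  ungroup() %>%
  spread(occurrence, Date)                              # one column per occurrence

print(wide)
```

IDs with fewer dates than the maximum get NA in the remaining columns. If an ID has ten or more dates, the columns will sort alphabetically (Date10 before Date2), so zero-padding the index may be worth considering.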

Marialuisa Villani

12 Apr 2016, 15:42:30
to Brandon Hurr, manipulatr
I get an object with 6 rows and 13 columns.

Brandon Hurr

12 Apr 2016, 15:43:42
to Marialuisa Villani, manipulatr
just delete "%>% head"

It's all there. 
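
To illustrate why only 6 rows show up: head() returns the first six rows by default, so it truncates whatever is piped into it. The 13 columns would be consistent with ID, rownum, and one column per year from 1992 to 2002. A minimal sketch on a toy data frame (the name `df` and its contents are placeholders, not the real data):

```r
# Toy data frame with 10 rows, standing in for the reshaped result
df <- data.frame(x = 1:10)

nrow(head(df))  # head() keeps only the first 6 rows by default
nrow(df)        # the full object still has all 10 rows
```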

Brandon Hurr

12 Apr 2016, 16:05:15
to Marialuisa Villani, manipulatr
Marialuisa, 

I'm sorry, I've used igraph twice in my life so I really don't know what you need. 

You have nodes. You said you wanted the nodes in the format below (years as columns holding the dates, and rows as IDs). I believe that is working fine. You may need to remove the rownum column, since it's just there to keep track of duplicates.


nodes %>%
  mutate(Year = year(Date), rownum = row_number()) %>%
  spread(Year, Date) %>%
  select(-rownum)

From there I really don't know what you mean by "transform the edge list into a node list to use in the igraph package".

If you can tell me what the node list should look like I might be able to help further, but perhaps someone else who uses igraph knows a better way?

B


On Tue, Apr 12, 2016 at 12:55 PM, Marialuisa Villani <marialuis...@gmail.com> wrote:
No, I need to transform the edge list into a node list to use in the igraph package.



2016-04-12 16:50 GMT-03:00 Brandon Hurr <brando...@gmail.com>:
If you print it out, is it what you want? 

Hard to say from over here. 

On Tue, Apr 12, 2016 at 12:48 PM, Marialuisa Villani <marialuis...@gmail.com> wrote:
This code? 

n <- nodes %>%
  mutate(Year = year(Date), rownum = row_number()) %>%
  spread(Year, Date)

Thank you so much for your help
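
On the igraph question left open in the thread, one possible route is igraph's graph_from_data_frame(), which takes an edge list plus an optional vertex table. The vertex table needs one row per vertex, which is why repeated IDs have to be collapsed first. The following is only a sketch under assumptions: the toy IDs, the column names `from`, `to`, and `first_date`, and the choice of keeping the earliest date per ID are all hypothetical, not taken from the real files.

```r
library(dplyr)
library(igraph)

# Hypothetical toy edge list (citing paper -> cited paper)
links <- data.frame(from = c(1, 2, 3), to = c(2, 3, 1))

# Hypothetical toy node table in which ID 1 appears twice
nodes <- data.frame(
  ID   = c(1, 1, 2, 3),
  Date = c("1992-02-24", "1993-01-03", "1992-03-08", "1993-01-04"),
  stringsAsFactors = FALSE
)

# Collapse to one row per ID, keeping the earliest date (an assumed choice);
# ISO-formatted date strings sort correctly as plain text
vertices <- nodes %>%
  group_by(ID) %>%
  summarise(first_date = min(Date)) %>%
  as.data.frame()

# The first column of `vertices` is used as the vertex name
g <- graph_from_data_frame(d = links, vertices = vertices, directed = TRUE)

vcount(g)  # number of vertices
ecount(g)  # number of edges
```

Every ID that appears in the edge list must also appear in the vertex table, so papers without a date would need to be added to `vertices` (or the `vertices` argument dropped) before building the graph.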