Data reshaping with conditions

34 views
Skip to first unread message

sri vathsan

unread,
Apr 18, 2016, 6:34:07 AM4/18/16
to manipulatr

I am trying to reshape the data but with some conditions. The data looks like below. similarly there will be more ID which are repeating

  S.No     Count  ID      Type  
    1         135   350      A   
    2         114   350      A  
    3         84    350      A  
    4         80    350      A  
    5         19    350      A  
    6          8    350      A  
    7         21    350      A  
    8         88    350      B  
    9        111    350      B  
    10        46    350      B  
    11       108    350      B 
    12       19     352      A
    13       117    352      A
    14       167    352      B

And I need to convert it like below.

S.No    ID  Type    Count_type B    Count_typeA_less than     Count_typeA 
                                       max(count_type B)      high than  
                                                          max(count_type B)  
1       350  B      88,111,46,108       84,80,19,8,21             135,114  
2       352  B      167                 19,117                        NA

Count_typeA_Less than max(count_type B) = count of type A which are less than maximum number of count_type B
Count_typeA_high than max(count_type B) = count of type A which are higher than the maximum number of count_type B

I tried with the dplyr and reshape. I could not achieve this.

Brandon Hurr

unread,
Apr 18, 2016, 8:52:20 AM4/18/16
to sri vathsan, manipulatr
A couple things:

1. It would be super helpful for you to send a dataset that we can copy and paste that is more than 1.2 units in length.
2. I think you need to break your problem down and you'll probably see the answer.

From what I can tell you need to use reshape2::dcast() or tidyr::spread() to get A and B next to each other.

Then one at a time filter the dataset for your conditions and use dplyr::summarise() to paste the numbers and create a single column.

Repeat for each condition (above and then below max).

Then use dplyr::full_join() to merge everything into one table.

If that doesn't make sense, send an example dataset for more help.

B

--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.
To post to this group, send email to manip...@googlegroups.com.
Visit this group at https://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.

sri

unread,
Apr 18, 2016, 10:42:49 AM4/18/16
to manipulatr, sriv...@gmail.com
Thank you so much for your reply, Brandon Hurr. For the sake of clarity, I attached two csv file, one with data nad the other which is expected output from the data.

Firstly, I tried with dplyr with the following code for the initial step to get the values for each column.
Secondly, I thought to transpose the columns which has the unique ID alone.

For the first step, the code is executed but higher and lower value of A is not coming. 

I hope I make my point clear. Please bare with the code, as I am new to coding.

Expected_output= data %>%
  group_by
(id, Type) %>%
  mutate
(Count_of_B = paste(unlist(count[Type=="B"]), collapse = ","))%>%
  mutate
(Max_of_count_B = ifelse(Type == "B", max(count[Type == "B"]),max(count[Type == "A"]))) %>%
  mutate
(count_type_A_lesser = ifelse (Type=="B",(paste(unlist(count[Type=="A"]) < Max_of_count_B[Type=="B"], collapse = ",")), "NA"))%>%
  mutate
(count_type_A_higher = ifelse(Type=="B",(paste(unlist(count[Type=="A"]) > Max_of_count_B[Type=="B"], collapse = ",")), "NA"))

data.csv
expected output.csv
Reply all
Reply to author
Forward
0 new messages