Re: [OpenRefine] Sum of two duplicate rows

1,084 views
Skip to first unread message

Tom Morris

unread,
Feb 27, 2013, 2:03:20 PM2/27/13
to openr...@googlegroups.com
On Wed, Feb 27, 2013 at 9:47 AM, Filip Podstavec <filip.p...@gmail.com> wrote:
Hello everybody,
 i'm sorry for my English(i'm from Czech Republic), but i have important question for my job.
I did in Google Refine keyword analysis and i used clustering to rename similar rows. But i have in next cells data about search count from adwords and i need sum up search count in duplicate rows now(duplicate in keyword, but different in search count).
Is this possible in Google Refine?

Hi Filip.  I'm sure your English is much better than my Czech!

If your data looks like:

Keyword  Count
keyword1 10
keyword1 15
keyword2 42
keyword2 51

what you  want to do is:

- Blank Down command on the Keyword column
- move the Keyword colum to the first (left) position if it's not already there
- Join Multi-valued Cells on the Count column
- Add a column called Total Count with the expression: forEach(value.split(','),v,v.toNumber()).sum()

This takes advantage of OpenRefine's concept of "records" which it forms based on the missing initial column values after the Blank Down operation.  You can switch between Record and Row display mode to get different effects depending on what you are trying to do.

Tom

Filip Podstavec

unread,
Mar 1, 2013, 3:30:31 AM3/1/13
to openr...@googlegroups.com
Thank you Tom! It works :-)
I owe you a beer :-)

Dne středa, 27. února 2013 20:03:20 UTC+1 Tom Morris napsal(a):

Tom Morris

unread,
Apr 11, 2013, 12:21:54 PM4/11/13
to openr...@googlegroups.com
On Thu, Apr 11, 2013 at 12:43 AM, gjb <gjbl...@gmail.com> wrote:

I found your answer to be super helpful as well.  Thanks!   To add to the OP's question:  Any idea on what to do if you have an additional column with data you need to keep/ be able to sort on?  That is, when you can't just use the blank down command because there are additional data you need to preserve / distinguish between.

I.e. if data looks like this:

Name Gender Count
Chris Male 200
Chris Female 156
Chris Male 27
Chris Female 176

Ideally, I want to be able to know for each unique name, what is the sum of that unique name, and what is the sum by each name/gender combo.

Based on the above example: Chris,559,Male, 227,Female,332

Does that make any sense?  Thanks in advance!

One thing you can do for cases like this is combine things for some processing and then split them back out again.  For example, you could combine given name and gender using an unlikely separator (e.g. '|'), sort, blank down, join multivalued cells, and sum, then split the name and gender back into separate columns.  To get the grand total, duplicate the count column, sort on given name, make sort permanent, blank down, join multivalued cells on new count column, and sum.

Hope that's not too much of a shorthand description for you!

Tom 

gjb

unread,
Apr 11, 2013, 9:35:34 PM4/11/13
to openr...@googlegroups.com
Hey Tom, 

Actually that's a huge help.  And it worked like a charm.  Thanks for setting me straight!
Reply all
Reply to author
Forward
0 new messages