How to sum values for each record

2,695 views
Skip to first unread message

Rahi Delvi

unread,
May 13, 2016, 10:51:46 AM5/13/16
to OpenRefine

Would someone know how to write a GREL expression to get the combined total for each record.

I have a dataset that looks like this. It’s okay if each row in the Combined Total column shows the sum.


Name Grant Amount Combined Total
A 1000
22326
B 10615
16250
C 3000
3000
D 78050
1500
E 4055
18979
F 5000
12000
13000
G 6000

John Little

unread,
May 13, 2016, 11:49:42 AM5/13/16
to openr...@googlegroups.com
Rahi:  I feel like I could investigate how to do this.  It would be more complicated than the average OpenRefine task.  On the other hand, this is a perfect use-case for a pivot table (Very simple with a pivot table, in fact.)   So, for example LibreOffice - Calc, or Excel, either will do pivot tables.  There are really good short youtube videos on pivot tables.  

Maybe some [OR] guru can answer you question, however.  

Regrets (for not answering the actual question),
--John

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Owen Stephens

unread,
May 13, 2016, 11:55:39 AM5/13/16
to OpenRefine
You can do this by switching to Records mode (towards top left of project). In your sample data posted you should find your rows group by the letter in the first column
Convert all numbers in the Grant Amount column to numbers by using:

value.toNumber()

Then in the Combined Total column use the GREL:

row.record.cells["Grant Amount"].value.sum()

You should find that each row in the record contains the total for that record

Owen 

John Little

unread,
May 13, 2016, 12:03:00 PM5/13/16
to openr...@googlegroups.com
Hah!  A guru did respond.  Turns out it's not so complicated (well, it's still a bit complicated to me, but the explanation makes good sense).  Thanks Owen.  

--

Owen Stephens

unread,
May 13, 2016, 12:05:04 PM5/13/16
to openr...@googlegroups.com
No problem - but generally I’d agree that OpenRefine isn’t the best tool for doing this kind of calculation - in general spreadsheets are much better at this kind of thing :)

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: ow...@ostephens.com
Telephone: 0121 288 6936

Rahi Delvi

unread,
May 13, 2016, 1:14:19 PM5/13/16
to OpenRefine
Thanks John, point noted.

Thank you Owen, that worked perfectly!

I also found an old post by Tom tackling the issue a different way: https://groups.google.com/d/msg/openrefine/WDhF6ZnF61o/Hdcc1O5hhJoJ

Rahi

Tom Morris

unread,
May 16, 2016, 10:33:30 PM5/16/16
to openr...@googlegroups.com
On Fri, May 13, 2016 at 1:14 PM, Rahi Delvi <a...@rahidelvi.ca> wrote:
Thanks John, point noted.

Thank you Owen, that worked perfectly!

I also found an old post by Tom tackling the issue a different way: https://groups.google.com/d/msg/openrefine/WDhF6ZnF61o/Hdcc1O5hhJoJ

Owen's way is more elegant, but, especially if you're starting from scratch, be sure that all of the rows for your records are grouped together. It's easy to assume they're sorted/grouped only to discover later that that wasn't the case. (I know, I've done it many times myself!).

Tom 

Rahi Delvi

unread,
May 17, 2016, 6:48:43 AM5/17/16
to OpenRefine
Got it. Thanks Tom.

agrawals...@gmail.com

unread,
Apr 12, 2020, 3:52:32 PM4/12/20
to OpenRefine
hey , this method helped me in getting the sum , i am working on Titanic data set.

Please guide if we want to take the average of sum , what function to write in GREL.

in your case suppose average of (Combined Total)

Jevon, Graham

unread,
Apr 13, 2020, 7:20:15 AM4/13/20
to openr...@googlegroups.com
There might be a more elegant one liner, but you could do this:

1. Add new column (e.. called "Count") with the value in each row being 1
2. Then sum that column in the same way you summed the combined total - GREL: row.record.cells["Count"].value.sum()
3. Then divide the combined total by the count row sum - GREL: cells["Combine Total"].value/cells["Sum of Count"].value

In the attached example I added 3 new columns to help illustrate the 3 steps. But you could do all 3 transformation steps in one single column.

Best wishes

Graham


From: openr...@googlegroups.com [openr...@googlegroups.com] on behalf of agrawals...@gmail.com [agrawals...@gmail.com]
Sent: Sunday, April 12, 2020 8:52 PM
To: OpenRefine
Subject: [OpenRefine] Re: How to sum values for each record

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

 
******************************************************************************************************************
Experience the British Library online at www.bl.uk
The British Library’s latest Annual Report and Accounts : www.bl.uk/aboutus/annrep/index.html
Help the British Library conserve the world's knowledge. Adopt a Book. www.bl.uk/adoptabook
The Library's St Pancras site is WiFi - enabled
*****************************************************************************************************************
The information contained in this e-mail is confidential and may be legally privileged. It is intended for the addressee(s) only. If you are not the intended recipient, please delete this e-mail and notify the postm...@bl.uk : The contents of this e-mail must not be disclosed or copied without the sender's consent.
The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of the British Library. The British Library does not take any responsibility for the views of the author.
*****************************************************************************************************************
Think before you print
Screenshot 2020-04-13 at 12.12.33.png

Antoine Beaubien

unread,
Apr 13, 2020, 8:47:19 PM4/13/20
to OpenRefine
Hi,

   for the average, you can do this is one shot:
with(row.record.cells.Col2.value, vRs, vRs.sum() / vRs.length())

Col2 is the name of the column.

Regards,
   Antoine

Thad Guidry

unread,
Apr 13, 2020, 10:05:56 PM4/13/20
to openr...@googlegroups.com
Name your new column name whatever you want.
If you already have green numbers in the Grant Amount column then just drop the toNumber()
Don't use one-liners...use lots of lines so that your head doesn't spin around and around. (that's why we made the GREL expression input box draggable!

forEach(
  row.record.cells["Grant Amount"].value, v,
  v.toNumber()
)
.sum()


image.png



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

Thad Guidry

unread,
Apr 13, 2020, 10:09:14 PM4/13/20
to openr...@googlegroups.com
Wait... I just realized your reviving a 4 year old thread!?!?!?

Antoine - Can you please stop that?   Your confusing the heck out of us and our community!


Thad Guidry

unread,
Apr 13, 2020, 10:11:51 PM4/13/20
to openr...@googlegroups.com
Nevermind.  It wasn't your fault.  I'll bow out now.  I guess if folks want to attempt to treat our email threads as forever knowledge, so be it.


Andrea Zanni

unread,
Jul 16, 2020, 5:01:52 AM7/16/20
to openr...@googlegroups.com
This is a great opportunity to ask again: will OR ever get the "pivot table" feature?
I use OR a lot for "exploratory" purposes, because Facets are great and just one click.
I get to see easily which values are more frequent, get a sense of my dataset.

Often, I want to sum up data attached to those values (the exact purpose of a pivot table, sorry I can't explain myself better).
I know this is a long awaited request (I filed myself an issue on Github 9 years ago!¹), but I wanted to ask again because maybe you can tell me why it's not in the works.

Thank you anyway (still love OR tho ;-)

Andrea


--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

Antonin Delpeuch (lists)

unread,
Jul 16, 2020, 5:14:02 AM7/16/20
to openr...@googlegroups.com
Hi Andrea,

One way you could increase the chances that this issue is worked on
would be to expand the issue with examples and pointers to describe how
this pivot works in other tools. Try to make it understandable to
someone who has never heard of pivot in the first place. Once this is
done, we might be able to tag the issue as "good first issue" or "good
second issue" depending on the estimated difficulty.

At the moment the issue description is so minimal that I would not
expect anyone to start working on it unless they have the exact same
need themselves.

Antonin
> <mailto:openrefine+...@googlegroups.com>.
> <https://groups.google.com/d/msgid/openrefine/CAChbWaMjBaY%2BHMCA-BbEE3dwOXqn%2BPPhk_p0fNoezYGfYC_KgQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "OpenRefine" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to openrefine+...@googlegroups.com
> <mailto:openrefine+...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/openrefine/CAC%3DVxybf8LxOLEEUqhNmg0QXcALggpRfMzy3bFxBefYuR49H1Q%40mail.gmail.com
> <https://groups.google.com/d/msgid/openrefine/CAC%3DVxybf8LxOLEEUqhNmg0QXcALggpRfMzy3bFxBefYuR49H1Q%40mail.gmail.com?utm_medium=email&utm_source=footer>.

Tom Morris

unread,
Jul 16, 2020, 12:00:57 PM7/16/20
to openr...@googlegroups.com
On Thu, Jul 16, 2020 at 5:01 AM Andrea Zanni <zanni.a...@gmail.com> wrote:
This is a great opportunity to ask again: will OR ever get the "pivot table" feature?

"This" being what opportunity?
 
I use OR a lot for "exploratory" purposes, because Facets are great and just one click.
I get to see easily which values are more frequent, get a sense of my dataset.

Often, I want to sum up data attached to those values (the exact purpose of a pivot table, sorry I can't explain myself better).
I know this is a long awaited request (I filed myself an issue on Github 9 years ago!¹), but I wanted to ask again because maybe you can tell me why it's not in the works. 

There's no need to wait. People are allowed, nay, encouraged, to contribute solutions too. There's no need to wait for someone else to do it for you. For a large feature like this, it would be a good idea to discuss your proposed design ahead of time, before investing lots of time in coding, to make sure that it will fit in well with OpenRefine.

It's also worth looking at what you use pivot tables for to see if there might be other ways to implement tools which would be a more natural fit in OpenRefine. From your description, it sounds like perhaps a simple column sum which respected the selected facets would meet your needs.

Tom

Andrea Zanni

unread,
Jul 16, 2020, 1:42:50 PM7/16/20
to openr...@googlegroups.com
I'm sorry if my post "offended" someone,
it was not my intention.

I use OR to explore and clean data: if I need to work with data, I kinda always need to clean or edit them.
OR is the best tool I know for this, and I've used it extensively for years.

Now, I use pivot tables to COUNT and SUM values.
In OR, a "Text Facet" already counts how many times a value occurs in the dataset.
What I personally miss is the possibility to SUM.

For example; I have a book dataset.
A
Ferrante ElenaE/o525069.05.00
BAA.VV.Feltrinelli287150.5
CCamilleri AndreaSellerio Editore Palermo1166963.28.00
DTotti Francesco, Condò PaoloRizzoli294105.02.00
ECavallo Francesca, Favilli ElenaMondadori709346.25.00
FDicker JoëlGuanda
260853.23.00
GCavallo Francesca, Favilli ElenaMondadori241957.07.00
HJaneczek HelenaGuanda2055373.05.00
IScurati AntonioBompiani1981190.34.00
LMe contro TeMondadori
1910402.45.00

First colum is TITLE, then AUTHOR, PUBLISHER, REVENUE
If I do "Text Facet" over the PUBLISHER column, I have all the occurrences of the publisher with the count value.
I'd love to "add a column", so to speak (ie REVENUE) and be able to SUM all the revenues.
I'm not sure if there is a more simple GREL option for this, maybe I missed it. 
And maybe it's just a personal need, but this happens to me all the time and I solve it quickly with a Pivot table on a spreadsheet. 

I'd love to have it on OR because, as I said before, OR is an "exploratory and cleaning" tool, and I learn and fix mistakes and error as I dive into the dataset. A pivot table only works with data that are already cleansed, thus I can do it at the end of my process.

Hope this helps.

Aubrey



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.

Owen Stephens

unread,
Jul 16, 2020, 2:20:54 PM7/16/20
to OpenRefine
The uses you describe (counting and summing) sound like they might fit into this issue https://github.com/OpenRefine/OpenRefine/issues/2001 which is a proposal to add a new type of "Statistical facet". You may also want to review the older discussion that relates to this proposal https://github.com/OpenRefine/OpenRefine/issues/1340

It would be great to see Issue 2001 turned into a real piece of work especially as the old Stats extension (https://github.com/OpenRefine/refine-stats) hasn't been migrated forward to the latest version of OpenRefine

Owen
Reply all
Reply to author
Forward
0 new messages