"Count" function on text facet


Andrea Zanni

Feb 23, 2018, 3:09:48 PM
to openr...@googlegroups.com
Hello everyone,
I have a question.

One of the best and most helpful features of OR is the "text facet", and the "Count" display on that facet. I use it all the time to explore my data and get the big picture on the dataset.

I wonder, though, whether it would be possible to create a new project/facet with that kind of sorting... It would be amazing to have a sort of new "Numeric facet" to explore the dataset in terms of the "popularity" of each value.

I always end up clicking the "n choices", copying the numbers and pasting them on Google Docs to create a chart and look at the distribution.

I also wonder if the "change" action could do something about it... I've never understood how to use it in the facet.

Thanks

Andrea

Owen Stephens

Feb 23, 2018, 3:34:01 PM
to OpenRefine
Hi Andrea,

Can you say a bit more about how you would want this to work?

At the moment you can sort the values in the facet either alphabetically, or by the 'count' number by clicking the 'Sort by' options: "name" or "count" at the top of the facet.

Are there other sort orders, or ways of ordering the values in the facet, that would help?
Or are you wanting to count different aspects of the values?

Thanks

Owen

Andrea Zanni

Feb 23, 2018, 4:18:14 PM
to openr...@googlegroups.com
On Fri, Feb 23, 2018 at 9:34 PM, Owen Stephens <ow...@ostephens.com> wrote:
> Hi Andrea,

Hi Owen, thanks for replying.

> At the moment you can sort the values in the facet either alphabetically, or by the 'count' number by clicking the 'Sort by' options: "name" or "count" at the top of the facet.

Yes, and I use this all the time. What I'd like, though, is to sort *the dataset* by the 'count' option: if I click 'Count', the values are sorted only in the facet.
I would like a "Numeric facet" based on the 'Count' number, so I can explore the whole dataset in a different way, selecting only sections of it based on the 'count' number.

I hope that's clearer.

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Antonin Delpeuch (lists)

Feb 24, 2018, 3:31:08 AM
to openr...@googlegroups.com
Hi Andrea,

You can use the "facetCount" GREL function:
https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#facetcountchoicevalue-string-facetexpression-string-columnname

This will let you create a new column that contains the number of
occurrences of each value. You can then facet again on that column.

Hope that helps,

Antonin
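
To see what such a column would contain, here is a minimal Python sketch of the same calculation done outside OpenRefine. Inside OpenRefine the expression would take a form like facetCount(value, "value", "Author"); the column name "Author", the helper name facet_counts and the sample values are assumptions made up for illustration.

```python
from collections import Counter


def facet_counts(column):
    """For each row, return how many rows in the column share that row's value."""
    totals = Counter(column)            # value -> number of occurrences
    return [totals[v] for v in column]  # one count per row, in row order


# Hypothetical "Author" column
authors = ["Eco", "Calvino", "Eco", "Levi", "Eco", "Calvino"]
counts = facet_counts(authors)

for author, n in zip(authors, counts):
    print(author, n)
```

Each row carries the count of its own value, so faceting or sorting on the new column selects rows by how common their value is.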

Andrea Zanni

Feb 24, 2018, 3:53:38 AM
to openr...@googlegroups.com
This is exactly what I needed, thanks Antonin.

Andrea



Owen Stephens

Feb 25, 2018, 9:03:59 AM
to OpenRefine
Note as well that if you use the same GREL expression but via "Facet->Custom Numeric Facet" you'll get those counts in a Numeric facet.

This ability to write custom facet expressions is what is also hidden behind the 'Change' button on the facet which you noted earlier. The only thing to look out for is that to get the Numeric Facet bar-chart/histogram style display you need to work from a Numeric facet and the result has to be a number. If you work from a text facet you'll get a Custom Text Facet (also available from the Facet menu) which will list the values rather than try to plot them in a graph.

(although I'm not personally really that keen on the graph interface so I often use a custom text facet even when I'm dealing with numbers!)

Owen
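
As a rough picture of what a numeric facet over those counts summarises, here is a small Python sketch that tallies how many rows fall under each count. It only approximates the idea: the real facet draws a histogram with adjustable ranges. The function name and sample column are hypothetical.

```python
from collections import Counter


def count_distribution(column):
    """Map each occurrence count to the number of rows whose value has that count."""
    totals = Counter(column)                         # value -> occurrences
    per_row = Counter(totals[v] for v in column)     # count -> number of rows
    return dict(sorted(per_row.items()))


# Hypothetical column
authors = ["Eco", "Calvino", "Eco", "Levi", "Eco", "Calvino"]
dist = count_distribution(authors)

for n, rows in dist.items():
    print(f"count {n}: {'#' * rows} ({rows} rows)")
```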

Thad Guidry

Feb 25, 2018, 11:12:25 AM
to openr...@googlegroups.com
Owen,

You probably don't see the usefulness of the Numeric Facet because you work with smaller datasets rather than hundreds of thousands of rows. :)
It depends on the amount of data, which is also why we provide a Numeric Log facet, for when it gets really dense.

Andrea Zanni

Feb 25, 2018, 1:04:40 PM
to openr...@googlegroups.com
Thanks everyone,
it's amazing to always discover new things about OR.
I have another question for you: is there a GREL function that gives me the "X choices" number shown in every "Text facet"?
Sometimes I want to know how many "choices" (values) I get for each facet.



Ettore Rizza

Feb 25, 2018, 1:32:01 PM
to OpenRefine
@Andrea Would you like a GREL formula that produces this number?



I'm afraid it's difficult. This would require the formula to list all the values in a column and count the length of the array of unique values, but OpenRefine does not have a column variable.
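
For comparison, outside OpenRefine the number itself is easy to compute once the whole column is available as a list. This is a minimal Python sketch under stated assumptions: the function name is made up, and blank cells are simply skipped, which may not match how the text facet treats them.

```python
def choice_count(column):
    """Count the distinct non-blank values in a column (the 'n choices' idea)."""
    # Assumption: None and empty strings are treated as blank and ignored.
    return len({v for v in column if v not in (None, "")})


# Hypothetical column
authors = ["Eco", "Calvino", "Eco", "Levi", "Eco", "Calvino"]
n_choices = choice_count(authors)
print(n_choices)
```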


Andrea Zanni

Feb 25, 2018, 1:38:07 PM
to openr...@googlegroups.com
Yes, exactly.
I think it would be a very good feature, because it's helpful whenever you have a Text facet in one column and want to check the value of each item *in another column*.

Andrea


Thad Guidry

Feb 25, 2018, 6:06:07 PM
to openr...@googlegroups.com
Andrea,

Most folks just use 2 or more Text Facets (one on each column) to filter and see values and their counts.

Andrea Zanni

Feb 26, 2018, 8:45:42 AM
to openr...@googlegroups.com
Yes, I use multiple text facets all the time.
But if I have >1000 values in column A, and I want to see how many values each of them has in column B, I have to check all the column A values manually.
A function would be nice in these cases.


Thad Guidry

Feb 26, 2018, 10:30:44 AM
to openr...@googlegroups.com
Well, :) if you have 1000 unique values, then you have 1000 unique values and you will have to somehow check each one of the 1000. But do you really? Perhaps you could perform some grouping on those 1000 unique values, such as finding prefixes, suffixes, substrings, patterns, etc. I don't know your data or the patterns that fit your column A. You'll have to discover those yourself. My point is that you can use many Facets to discover patterns in your column A.

Australian mother of pearl 
silver boxed necklace
gold ingot ring
antique turquoise fur-lined jewelry box
hat, wool, knit

Above, I would actually perform NER processing, or even just create a Facet and later add it as a new column, starting with the following GREL, which keeps only the last word of each value (the text after the final space). Later I might create a facet on the leading words and add a column for those too, to help me build some record rows later, so I don't lose my discovered patterns or subset values.

value.rpartition(" ")[-1]

Perhaps the first word usually contains a material or color...so I'd add a new column for that

value.partition(" ")[0]
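
Python's str.partition and str.rpartition split a string into three parts around the first or last separator, much like the GREL functions of the same name (assumed here to behave the same way when the separator is present), so a single index picks out one segment rather than a fixed number of words. The sketch below tries this on the sample values and adds a split-based variant for taking two words; all helper names are hypothetical.

```python
def last_word(value):
    """Text after the final space, i.e. the third part of rpartition."""
    return value.rpartition(" ")[-1]


def first_word(value):
    """Text before the first space, i.e. the first part of partition."""
    return value.partition(" ")[0]


def last_n_words(value, n=2):
    """Last n whitespace-separated words, joined back with single spaces."""
    return " ".join(value.split()[-n:])


def first_n_words(value, n=2):
    """First n whitespace-separated words, joined back with single spaces."""
    return " ".join(value.split()[:n])


items = [
    "Australian mother of pearl",
    "silver boxed necklace",
    "gold ingot ring",
    "antique turquoise fur-lined jewelry box",
    "hat, wool, knit",
]

for item in items:
    print(first_word(item), "|", last_word(item), "|", last_n_words(item))
```

Faceting on either derived value groups the items by their leading or trailing words, which is the kind of pattern discovery described above.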

Hopefully this gives you some insight on how I and others try to reduce the workload when you have many values to compare...by USING PATTERNS and trying to get the values into some groups or subsets of patterns through whitespace partitioning, NER processing with extensions, etc.

Andrea Zanni

Feb 26, 2018, 12:20:12 PM
to openr...@googlegroups.com
Thanks Thad for the insight.
It's not useful for my current work¹, but it is in general.
I will study this again soon ;-)


¹ I usually work with bibliographic metadata, so I want, for example, to know how many different publishers an author has been published by.
'Author' is my column A, 'Publisher' is my column B. There is no pattern here, just checking one by one.

Ettore Rizza

Feb 26, 2018, 1:07:07 PM
to OpenRefine

> I usually work with bibliographic metadata, so I want for example to know in how many different publishers an author has been published.
> 'Author' is my column A, 'Publisher' is my column B. There is no pattern here, just checking one by one.

OpenRefine is not the best tool when it comes to calculations spread across many columns. This kind of thing is much easier to do with a pivot table in a spreadsheet.

That said, for simple calculations, I do not bother to export my project. There is often a workaround to get the answer in OpenRefine. For your problem of authors and their number of different publishers, I think I would do it like this.



It's more complicated than in Excel, but it's faster (it takes at least two or three minutes to export to xls, open Excel, build the pivot table, and possibly reimport that table into Refine...).
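
The calculation under discussion, the number of distinct publishers per author, is what a pivot table would produce. As a point of comparison, here is a minimal Python sketch of it outside OpenRefine. It is not the OpenRefine workaround referred to above, and the function name and sample rows are made up for illustration.

```python
from collections import defaultdict


def publishers_per_author(rows):
    """Return {author: number of distinct publishers} from (author, publisher) rows."""
    seen = defaultdict(set)
    for author, publisher in rows:
        seen[author].add(publisher)   # a set ignores repeated publishers
    return {author: len(pubs) for author, pubs in seen.items()}


# Hypothetical (Author, Publisher) rows
rows = [
    ("Eco", "Bompiani"),
    ("Eco", "Einaudi"),
    ("Eco", "Bompiani"),
    ("Calvino", "Einaudi"),
    ("Calvino", "Mondadori"),
    ("Levi", "Einaudi"),
]
result = publishers_per_author(rows)

for author, n in result.items():
    print(author, n)
```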

Andrea Zanni

Feb 26, 2018, 1:16:49 PM
to openr...@googlegroups.com
Thanks Ettore!

Andrea

