Do we need an easier GREL Number formatter?

12 views
Skip to first unread message

Thad Guidry

unread,
Oct 21, 2022, 4:19:15 AM10/21/22
to openref...@googlegroups.com
Reviewing Issue #4006 I poked around on the internet and found a similar use case although opposing somewhat.

I'd like to point out that Trifacta Data Wrangler has support for also the opposite.  $1,234.56 to $1234.56  (removing commas etc. for easier number formatting in tools after export, like Excel or other analytical tools that can handle Math and Financials much better than OpenRefine):

> Richer Number Formatting

>You can now change number strings with rich formats such as $1,234.56 to $1234.56. This is especially useful if you are working with financial data, data from global regions, or manually entered data. With this capability, you can input all Excel and Google Sheet formats into NUMFORMAT, and use NUMBERVALUE to go from a column with a particular format to a number (float or integer). 

But it seems we also have support now through something like
"1234.56".toString("$ %(,.2f")


But if they are starting with all Strings in their column, and want to use a single menu or GREL function, we don't have it I think?  And it becomes super tedious for users with multiple faceting and transforms needed.
So I think we should also have a new GREL function like `.numValue()` to make this less tedious as they do: https://docs.trifacta.com/display/DP/NUMVALUE+Function

Shall I open a new issue for a new GREL function .numValue() or do we have something almost suitable to avoid a new function?
Thoughts?

ow...@ostephens.com

unread,
Oct 21, 2022, 4:42:03 AM10/21/22
to OpenRefine Development
Just checking my understanding of what you are suggesting. Are you proposing a short cut for the current:

value.toNumber().toString(format)

That's not how I read the Trifacta documentation (my interpretation of their documentation is that NUMVALUE is similar to our toNumber() and NUMFORMAT is similar to our toString(format) - which seems like exactly the same situation we have right now - you have to first convert something to a Number before you can then represent it as a string with particular formatting

Note that I'm in favour of easier number formatting and certainly in favour of support for formatting as per a specific locale or other common requirement like currency... but I'm not quite sure what you are proposing right now.

Owen

Just as an aside, the way I'd read it https://github.com/OpenRefine/OpenRefine/issues/4006 was only asking for the UI furniture (i.e. not the data) to observe locale based formatting - it would probably good to clarify exactly the scope of #4006 on the issue. For me, I would think we'd want to keep the display and formatting of data (and how we support that with GREL) separate to general display aspects for numbers appearing outside the data grid in the UI.

Thad Guidry

unread,
Oct 21, 2022, 9:27:07 AM10/21/22
to openref...@googlegroups.com
Thanks for responding Owen,
It's about locale based formatting.

$1,234.56
$1,000,009.99
($2,999.00)

where a user would want to quickly perform some transformation steps (but not in steps but rather a single operation):
1. remove the commas
2. apply parentheses around the number for negative amounts

Number formatting for globalization is found in many other applications through customized locale support.
The export from those applications might typically look like this:

$1234.56
$1000009.99
($2999.00)

The above works in the financial world most of the time, but some users might want to customize for easier Analytic tools and DB's such as:

-2999.00

I think having smart defaults that can be overridden and easily customized makes sense from within OpenRefine itself, rather than altering the system locale that OpenRefine runs on.
Users have messy data from around the world, and I'm seeing and hearing about our shortcomings more and more through Wikidata, Commons, and financial groups concerning Numbers and Currencies.

I just thought I'd bring it up for discussion on how we might make this much easier for folks to:
1. not have to change their system locale to work with data coming from another locale source.
2. improve or add a dialog for Number and Currency customization similar to Regional and Language dialogs found natively in OS's that GREL functions (or new functions) and menus could take advantage of users session preferences with locale handling.

Tom Morris

unread,
Oct 21, 2022, 3:57:54 PM10/21/22
to openref...@googlegroups.com
On Fri, Oct 21, 2022 at 9:27 AM Thad Guidry <thadg...@gmail.com> wrote:

It's about locale based formatting.

Formatting or parsing? The examples below sound like parsing (ie toNumber()), not formatting (toString()). Issue #507 (from 2011!) requested support for (U.S.) thousands separators in toNumber(). I've expanded/generalized it to address grouping and decimal separators for any locale. Java does *not* have support for accounting format negative numbers, so that would have to be coded separately if it's really needed. The formatting side (without locale support) was addressed in issue #816 (but might not be documented?).

Tom

Thad Guidry

unread,
Oct 21, 2022, 7:49:18 PM10/21/22
to openref...@googlegroups.com
Thanks Tom for adding notes to  issue #507 which I think will help.
The accounting format can be dealt with later perhaps as a GREL function enclose(“(“,”)”) or wrap() or maybe just use toString() and it’s syntax to wrap parentheses around negative numbers, I.e. numbers with a minus sign? Dunno

--
You received this message because you are subscribed to the Google Groups "OpenRefine Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine-dev/CAE9vqEHdyJXr-sqdoWXbCHL%2BcGZYrfwF0qWcWNihAvwWK_x4iQ%40mail.gmail.com.
--

ow...@ostephens.com

unread,
Oct 23, 2022, 5:37:52 AM10/23/22
to OpenRefine Development
On Sat, Oct 22, 2022 at 3:57 AM Tom Morris <tfmo...@gmail.com> wrote:
On Fri, Oct 21, 2022 at 9:27 AM Thad Guidry <thadg...@gmail.com> wrote:

It's about locale based formatting.

Formatting or parsing? The examples below sound like parsing (ie toNumber()), not formatting (toString()). Issue #507 (from 2011!) requested support for (U.S.) thousands separators in toNumber(). I've expanded/generalized it to address grouping and decimal separators for any locale. Java does *not* have support for accounting format negative numbers, so that would have to be coded separately if it's really needed. The formatting side (without locale support) was addressed in issue #816 (but might not be documented?).

Just to confirm that the formatting of numbers to strings is documented
It sounds like what Thad is requesting here is adding the ability for the user to specify a desired locale on both formatter and parser - is that correct Thad?
If this is a correct interpretation I'd support this for both numbers and dates

I'd suggest that accounting/currency issues might be worth more discussion and thought before going ahead - my instinct is that it might be worth us having a separate GREL function for formatting a number to an accounting and/or currency display (although I'm not sure this is a particularly common use case for OpenRefine)

Owen

Thad Guidry

unread,
Oct 23, 2022, 9:40:45 AM10/23/22
to openref...@googlegroups.com
Just to confirm that the formatting of numbers to strings is documented
It sounds like what Thad is requesting here is adding the ability for the user to specify a desired locale on both formatter and parser - is that correct Thad?

Yes, I think it’s needed on both since our users often have different source and target needs.

ow...@ostephens.com

unread,
Oct 25, 2022, 4:24:35 AM10/25/22
to OpenRefine Development
Thanks for confirming Thad

So it looks like https://github.com/OpenRefine/OpenRefine/issues/507 is the issue for the parser side. Is there already an issue for the formatting side or do we need to create one?

Owen

Thad Guidry

unread,
Oct 25, 2022, 9:51:09 AM10/25/22
to openref...@googlegroups.com
I'd like you to create a new one.  Because
https://github.com/OpenRefine/OpenRefine/issues/3102
and
Antonin's frontend localization issue https://github.com/OpenRefine/OpenRefine/issues/4006
are for aligning rendering, which we totally need to do also now that backend localization is pretty much done by Elroy from https://github.com/OpenRefine/OpenRefine/issues/2443.
But we need toNumber() ability for formatting side to be able to specify a locale.  (or toString() which I hate it's syntax and the reason for opening this thread to begin with, we need something easier to specify formatting a number by just giving a locale)

Thad Guidry

unread,
Nov 4, 2022, 1:43:58 AM11/4/22
to openref...@googlegroups.com
Tom and Owen,

It looks like toString() enhancement will not work after I did some research?  It seems Java's util.Formatter `toString()` doesn't support newer locale conventions found elsewhere in the JDK that are needed with a GREL number format enhancement.
So, I documented that while taking a stab at creating the new issue.
Reply all
Reply to author
Forward
0 new messages