Unicode - Numeric Values

John Souvestre

Apr 20, 2015, 12:39:54 PM
to golan...@googlegroups.com

I’m trying to get the numeric values (value property) for some non-ASCII code points which are decimal digits (as per unicode.IsDigit). I believe that in Java you would use Character.getNumericValue, and in .NET Char.GetNumericValue, for example. How can I do this in Go?

Thanks,

John

John Souvestre - New Orleans LA

Konstantin Khomoutov

Apr 20, 2015, 1:17:42 PM
to John Souvestre, golan...@googlegroups.com
On Mon, 20 Apr 2015 11:39:37 -0500
"John Souvestre" <jo...@sstar.com> wrote:

> I'm trying to get the numeric values (value property) for some
> non-ASCII code points which are decimal digits (as per
> Unicode.IsDigit). I believe that in Java you would use
> getNumericValue, and in .Net GetNumericValue, for example. How can I
> do this in Go?
> Thanks,

Go uses type `rune` to represent individual Unicode code points not
encoded using some specific encoding (like, say, UTF-8).

`rune` is defined to be `int32`, so just type-convert your rune value to
`int32` to get its "value property".

Konstantin Khomoutov

Apr 20, 2015, 1:20:52 PM
to John Souvestre, golan...@googlegroups.com
On Mon, 20 Apr 2015 11:39:37 -0500
"John Souvestre" <jo...@sstar.com> wrote:

> I'm trying to get the numeric values (value property) for some
> non-ASCII code points which are decimal digits (as per
> Unicode.IsDigit). I believe that in Java you would use
> getNumericValue, and in .Net GetNumericValue, for example. How can I
> do this in Go?

To demonstrate how runes work: http://play.golang.org/p/fpIsUD4TDx

John Souvestre

Apr 20, 2015, 1:39:48 PM
to Konstantin Khomoutov, golan...@googlegroups.com
> `rune` is defined to be `int32`, so just type-convert your rune value to
> `int32` to get its "value property".

I believe that you would just get the value of the code point, not the numeric
value which it carries. In other words, for an ASCII "0", U+0030, you would get
0x0030. But the "numeric value" for this "decimal digit" is 0.

Likewise, U+0660 (from one of the other 22 sets of decimal digits) should
result in 0.
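
To make the distinction concrete, here is a minimal sketch of what the suggested type conversion yields. The helper name `codePointValue` is hypothetical and exists only for illustration; it is not part of any Go package.

```go
package main

import "fmt"

// codePointValue is a hypothetical helper that performs the conversion
// suggested earlier in the thread: rune -> int32.
func codePointValue(r rune) int32 {
	return int32(r)
}

func main() {
	// U+0030 DIGIT ZERO and U+0660 ARABIC-INDIC DIGIT ZERO.
	for _, r := range "0\u0660" {
		// Both characters carry the numeric value 0, but the conversion
		// returns the code point itself (0x30 and 0x660).
		fmt.Printf("%U converts to %#x\n", r, codePointValue(r))
	}
}
```

The conversion identifies which character it is, not which number the character stands for.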

Jason Gade

Apr 20, 2015, 1:55:09 PM
to golan...@googlegroups.com, flat...@users.sourceforge.net
I don't think that's what he's looking for. http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#getNumericValue(char)

If someone gives him a Chinese, Tamil, or Roman numeral in Unicode, he wants to decode the value of the numeral, if I understand his question correctly.

And no, in a quick search I can't find how to do it. I think that strconv only handles the ASCII digits of the Arabic numeral system.

Doug Henderson

Apr 20, 2015, 2:10:37 PM
to golang-nuts
Hi,

Go does not seem to have the full Unicode data that we find in the Python unicodedata module.

You may have to scan each individual rune (or the []byte containing the UTF-8). IIRC, unicode.IsDigit checks for the Nd category only. I suspect that other numeric categories will not be accepted in numbers.
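
As a quick way to check the category point above, the following sketch compares `unicode.IsDigit` (category Nd) with `unicode.IsNumber` (any of Nd, Nl, No) on a few sample runes. The helper name `classify` is hypothetical, chosen only for this illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify is a hypothetical helper reporting whether r is a decimal digit
// (category Nd) and whether it is in any numeric category (Nd, Nl, or No).
func classify(r rune) (isDigit, isNumber bool) {
	return unicode.IsDigit(r), unicode.IsNumber(r)
}

func main() {
	// '5', ARABIC-INDIC DIGIT FIVE, ROMAN NUMERAL EIGHT, VULGAR FRACTION ONE HALF.
	for _, r := range "5\u0665\u2167\u00bd" {
		d, n := classify(r)
		fmt.Printf("%U IsDigit=%t IsNumber=%t\n", r, d, n)
	}
}
```

Only the first two runes are decimal digits; the Roman numeral and the fraction are numeric but fall outside Nd.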

Doug


--
Doug Henderson, Calgary, Alberta, Canada

John Souvestre

Apr 20, 2015, 2:19:00 PM
to Doug Henderson, golang-nuts

> IIRC, unicode.IsDigit checks for the Nd category only. I suspect that other numeric categories will not be accepted in numbers.

Yes, unicode.IsDigit does identify decimal digits for all of the 23 languages they exist in. These are the only ones I’m interested in. I’m not concerned with the non-decimal numbers.

Andy Balholm

Apr 20, 2015, 2:50:47 PM
to John Souvestre, Doug Henderson, golang-nuts
That sounds like something that would belong in the unicode package, but apparently no one has implemented it yet.

John Souvestre

Apr 20, 2015, 3:14:08 PM
to Andy Balholm, Doug Henderson, golang-nuts
> That sounds like something that would belong in the unicode package, but
> apparently no one has implemented it yet.

Yes, that would be nice! Meanwhile, I think I'll make a map and load the 230
digits into it.

Hmmm... I wonder if golang/text/collate might have a routine to do it?
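
The map idea mentioned above could be sketched roughly as follows, filling the map from the standard library's `unicode.Nd` range table instead of typing the digits in by hand. The function name `buildDigitMap` is hypothetical. The sketch rests on an assumption that should be verified against the Unicode data in use: every Nd range starts at a digit zero and consists of whole runs of ten ascending digits, so the offset within a range modulo 10 is the digit's value. The number of entries depends on the Unicode version shipped with the Go release.

```go
package main

import (
	"fmt"
	"unicode"
)

// buildDigitMap (hypothetical helper) maps every Nd code point in this Go
// release's Unicode tables to its decimal value.
// ASSUMPTION: each Nd range begins at a digit zero and holds whole runs of
// ten ascending digits, so (index within range) % 10 is the digit value.
func buildDigitMap() map[rune]int {
	m := make(map[rune]int)
	add := func(lo, hi, stride rune) {
		for r, i := lo, 0; r <= hi; r, i = r+stride, i+1 {
			m[r] = i % 10
		}
	}
	for _, r16 := range unicode.Nd.R16 {
		add(rune(r16.Lo), rune(r16.Hi), rune(r16.Stride))
	}
	for _, r32 := range unicode.Nd.R32 {
		add(rune(r32.Lo), rune(r32.Hi), rune(r32.Stride))
	}
	return m
}

func main() {
	digits := buildDigitMap()
	// '7', ARABIC-INDIC DIGIT THREE, DEVANAGARI DIGIT NINE.
	for _, r := range "7\u0663\u096f" {
		v, ok := digits[r]
		fmt.Printf("%U value=%d found=%t\n", r, v, ok)
	}
}
```

A lookup that misses (for example a letter) returns `ok == false`, which lets the caller tell "not a decimal digit" apart from the value 0.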

Nigel Tao

Apr 22, 2015, 12:29:52 AM
to John Souvestre, Marcel van Lohuizen, Andy Balholm, Doug Henderson, golang-nuts
On Tue, Apr 21, 2015 at 5:13 AM, John Souvestre <jo...@sstar.com> wrote:
> Hmmm... I wonder if golang/text/collate might have a routine to do it?

I ask mpvl (Marcel van Lohuizen, cc'd) all my Go i18n questions.