Hex to character/rune

Martin Gallagher

Jan 11, 2013, 1:03:57 PM
to golan...@googlegroups.com
Parsing the Unicode data file, I'm left with hex values for characters; for example, "0061" is lower case "a".

The problem is that I need to transform "0061" to "a" in Go so that I can perform some case and accent folding on characters.

fmt.Println("\u0061") behaves as expected, so I assumed I might get away with:

    h := "0061"
    fmt.Println(`\u` + h)

But this prints the literal text "\u0061" rather than "a".

Scott Lawrence

Jan 11, 2013, 1:09:56 PM
to Martin Gallagher, golan...@googlegroups.com
That's because "\u0061" is transformed to "a" (it's lowercase) at compile-time
by the parser - it's not a feature of Println.
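
To make the compile-time point concrete, here is a minimal sketch (escapeLengths is a hypothetical helper name, not something from the thread). It compares the byte length of an interpreted literal with that of a string joined at run time from a raw literal:

```go
package main

import "fmt"

// escapeLengths (hypothetical helper) returns the byte length of an
// interpreted string literal and of a string assembled at run time.
func escapeLengths(h string) (int, int) {
	compiled := "\u0061" // the compiler has already turned the escape into "a"
	joined := `\u` + h   // raw literal: the backslash and 'u' are ordinary bytes
	return len(compiled), len(joined)
}

func main() {
	c, j := escapeLengths("0061")
	fmt.Println(c, j) // 1 6
}
```

The second string is six plain bytes; nothing re-scans it for escapes when it is printed.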

hex.DecodeString() from encoding/hex should accomplish most of what you want:

    str, err := hex.DecodeString("6566")
    if err != nil {
        panic(err)
    }
    fmt.Printf("%s\n", string(str)) // ef

--
Scott Lawrence

go version go1.0.3
Linux baidar 3.6.11-1-ARCH #1 SMP PREEMPT Tue Dec 18 08:57:15 CET 2012 x86_64 GNU/Linux

minux

Jan 11, 2013, 1:12:35 PM
to Martin Gallagher, golan...@googlegroups.com
The conversion of "\uUUUU" happens at compile time, not at run time.
 
There are several ways to achieve this:
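
The examples that followed in the original message are not preserved here. A later reply in this thread implies they used strconv, so one plausible approach (a sketch under that assumption, not minux's actual code; hexToRune is a hypothetical name) is to parse the hex digits as a number and convert the result to a rune:

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// hexToRune (hypothetical helper) parses a hexadecimal code point such as
// "0061" and returns the corresponding rune.
func hexToRune(h string) (rune, error) {
	n, err := strconv.ParseUint(h, 16, 32)
	if err != nil {
		return 0, err
	}
	// Reject values beyond the last valid Unicode code point.
	if n > unicode.MaxRune {
		return 0, fmt.Errorf("code point %q out of range", h)
	}
	return rune(n), nil
}

func main() {
	r, err := hexToRune("0061")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(r)) // a
}
```

Because the input is treated as a code point rather than as raw bytes, this works for any code point up to U+10FFFF, not only for ASCII.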

Martin Gallagher

Jan 11, 2013, 1:28:00 PM
to golan...@googlegroups.com, Martin Gallagher
hex.DecodeString("6566") worked for lower ranges; minux's example worked for all cases without any issue.

Cheers!

Kevin Gillette

Jan 13, 2013, 3:59:56 AM
to golan...@googlegroups.com, Martin Gallagher
If by "lower ranges" you mean ASCII, then realize that if the input is UTF-8 encoded, you can use hex.Decode with a destination []byte and then convert that into a string (or use the bytes and unicode packages to deal with the data without needing to make an extra copy in memory). If the entire input can be accepted by hex.Decode without preprocessing on your part, using hex instead of strconv will also likely be faster.
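
A minimal sketch of that suggestion, assuming the hex digits spell out UTF-8 bytes rather than a code point number (decodeUTF8Hex is a hypothetical name):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeUTF8Hex (hypothetical helper) decodes hex digits that represent
// UTF-8 encoded bytes into a string, using a caller-allocated destination.
func decodeUTF8Hex(src string) (string, error) {
	dst := make([]byte, hex.DecodedLen(len(src)))
	n, err := hex.Decode(dst, []byte(src))
	if err != nil {
		return "", err
	}
	return string(dst[:n]), nil
}

func main() {
	// "c3a9" is the UTF-8 encoding of U+00E9; the code point hex "00e9"
	// would instead decode to the two raw bytes 0x00 and 0xE9.
	s, err := decodeUTF8Hex("c3a9")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // é
}
```

Note that the Unicode data file lists code point values, which coincide with UTF-8 bytes only in the ASCII range, so this route fits input that is already UTF-8 in hex form.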

Hraban Luyat

Jan 13, 2013, 10:36:29 AM
to golan...@googlegroups.com
You might find use for:


Maybe (hopefully) it helps.

Greetings,

Hraban

roger peppe

Jan 13, 2013, 12:25:15 PM
to minux, Martin Gallagher, golang-nuts
just for completeness (you wouldn't want to actually do it
this way, but it's closest to the original poster's solution):

    func hexToString(h string) (string, error) {
        // Wrap the digits in a quoted "\uXXXX" literal and let strconv
        // interpret the escape at run time.
        return strconv.Unquote(`"\u` + h + `"`)
    }

Martin Gallagher

Jan 13, 2013, 3:58:49 PM
to golan...@googlegroups.com, minux, Martin Gallagher
Performance isn't an issue, but I'm a bit of a premature optimiser anyway... so knowing the fastest solution would let me sleep easy!

Regarding what I'm trying to achieve - it's automatic creation of full Unicode support for the Sphinx full text search engine, basically mapping and normalising characters and creating charset tables for supplied Unicode block ranges.

I've just pushed the code if anyone's interested (sorry, it's in a rough state at the moment): https://github.com/Mutatio/sphinx-character-map/blob/master/characterMap.go

CJK is still pending (but support is easy enough to add), it's the normalisation of accented Latin / Greek / Other scripts that I want to master first. Also all other command-line features are missing, it's purely at the prototyping stage.

Cheers,
- Martin