regexp: invalid UTF-8

Carlos Cobo

Dec 9, 2012, 7:50:52 AM
to golan...@googlegroups.com
Hi there gophers,

I'm trying to remove a certain character pattern from a string.

The actual pattern is "(?i)\xa7[0-9a-fk-or]" but it doesn't work at all. I always get: error parsing regexp: invalid UTF-8: `�[0-9a-fk-or]`

That "\xa7" thingy is the "§" character, called the section sign.
It seems regexp doesn't translate my character.

I tried using:
  • "(?i)\u00a7[0-9a-fk-or]": \u00a7 instead of \xa7
  • "(?i)\uc2a7[0-9a-fk-or]": '§' == '\u00a7'
  • `(?i)\\u00a7[0-9a-fk-or]`: using \u00a7 fails (invalid escape sequence: `\u`)
  • And a bunch more, hoping something would magically work, but none of them did.

Here's the code:

Thanks,
Carlos

Paul Hankin

Dec 9, 2012, 8:30:59 AM
to golan...@googlegroups.com
\x inserts a raw byte into your string, whereas you want to insert the UTF-8 encoding of the character. See http://golang.org/ref/spec#String_literals

Using \u00a7 instead of \xa7 works. See: http://play.golang.org/p/Sq_C6qAUyq

-- 
Paul

Carlos Cobo

Dec 9, 2012, 8:38:23 AM
to golan...@googlegroups.com
Yeah I figured that out when I tried translating the Section sign to []byte and to []rune.

The solution you suggest doesn't remove the characters, but at least it doesn't complain about invalid UTF-8.

Copy-pasted from your code snippet, the last 2 lines:
Before: "\xa7e--------- \xa7fHelp:"...
After: "\xa7e--------- \xa7fHelp:"...

It should be:
Before: "\xa7e--------- \xa7fHelp:"...
After: "--------- Help:"...

Matt Harden

Dec 9, 2012, 11:26:51 AM
to golan...@googlegroups.com
Your test string is also invalid UTF-8. You should use \u00a7 instead of \xa7 in all strings.

Carlos Cobo

Dec 9, 2012, 11:51:20 AM
to golan...@googlegroups.com
I'll repeat it a second time.
I tried with both "\xa7" and "\u00a7". Neither of them works.

It seems the service producing these messages doesn't give a **** about UTF-8, so I'll have to do 2 passes: first to correct the bytes, then to remove them.
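
A rough sketch of those two passes, under the assumption (not confirmed anywhere in this thread) that the service sends ISO-8859-1 (Latin-1) bytes, where 0xA7 is "§". The names latin1ToUTF8 and clean are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var codeRE = regexp.MustCompile("(?i)\u00a7[0-9a-fk-or]")

// latin1ToUTF8 is a hypothetical helper for pass 1. It assumes the input is
// ISO-8859-1, where every byte value equals the Unicode code point it stands
// for, so each byte can be widened to a rune and re-encoded as UTF-8.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// clean runs pass 1 (fix the encoding) and then pass 2 (remove the codes).
func clean(raw []byte) string {
	return codeRE.ReplaceAllString(latin1ToUTF8(raw), "")
}

func main() {
	fmt.Printf("%q\n", clean([]byte("\xa7e--------- \xa7fHelp:"))) // "--------- Help:"
}
```

If the service actually uses some other single-byte encoding, pass 1 would need a different mapping.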

Peter

Dec 9, 2012, 12:23:40 PM
to golan...@googlegroups.com
Read up on string literals: http://golang.org/ref/spec#String_literals

There's some subtlety involved, but once you understand it you'll see it's quite consistent.
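
As an illustration of that subtlety (a sketch, not the playground snippet linked below): in an interpreted "..." literal the Go compiler processes the escape, while in a raw `...` literal the backslash reaches the regexp package untouched and is parsed by its own syntax. The helper names compileErr and matches are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileErr is a hypothetical helper that returns only the compile error.
func compileErr(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

// matches is a hypothetical helper; it panics if the pattern is invalid.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// Interpreted literal: the compiler emits the byte 0xA7, so the pattern
	// itself is invalid UTF-8 and regexp rejects it.
	fmt.Println(compileErr("(?i)\xa7[0-9a-fk-or]") != nil) // true

	// Interpreted literal: the compiler emits the UTF-8 encoding of U+00A7.
	fmt.Println(compileErr("(?i)\u00a7[0-9a-fk-or]") == nil) // true

	// Raw literal: regexp sees a backslash followed by u, which its syntax
	// does not support.
	fmt.Println(compileErr(`(?i)\u00a7[0-9a-fk-or]`) != nil) // true

	// Raw literal: regexp sees \xa7, which in regexp syntax is a hex
	// character code, i.e. the code point U+00A7, not a raw byte.
	fmt.Println(matches(`(?i)\xa7[0-9a-fk-or]`, "\u00a7e")) // true
}
```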

Have a look at http://play.golang.org/p/_rrMfmDKZh to make sure you can tell what's going on.

Hope this helps.