umlauts utf-8 problems with sequel pro


Thorsten Rock

Apr 18, 2011, 10:34:34 AM
to Sequel Pro
Hi,
I'm having a problem displaying umlauts in Sequel Pro.
My database is set to UTF-8 across the board, and so is the Ruby 1.9
I'm inserting the UTF-8 strings with.
When I read them out through the command line I get the correct umlauts in
the strings, but when I look within Sequel Pro, the umlauts are broken.

Broken as in:
Gekreuzte Möhrchen
instead of
Gekreuzte Möhrchen

I've looked for a solution to this for a while now, but with no success.
Any help will be greatly appreciated.

T

Rowan Beentje

Apr 18, 2011, 10:38:30 AM
to seque...@googlegroups.com
Hi Thorsten,

Ah... this sounds like somewhere along the line the data is getting transferred as Latin-1, I'm afraid. Just to confirm that, could you view the table, then go to the Database menu > View Using Encoding > UTF-8 Unicode via Latin 1?

Let me know and I'll start digging through the code, or explain as appropriate :)

Cheers,
Rowan

thorsten rock

Apr 18, 2011, 10:51:43 AM
to seque...@googlegroups.com
Ahhh.
Thanks for the quick reply Rowan,

Yes, that absolutely did the trick!
So it really is just a display thing in Sequel Pro, yes?

Thank you so much :)

cool

Thorsten

Rowan Beentje

Apr 18, 2011, 11:03:09 AM
to seque...@googlegroups.com, thorsten rock
Thorsten,

Ah - it's not quite a display thing in Sequel Pro then, I'm afraid. That specific encoding is just a compatibility view to work with slightly incorrect setups...

Basically, your characters in your Ruby app are UTF8, and the tables in the database are UTF8, but each UTF8 character stored in those tables has at some point been transferred across a Latin1 encoding. Here's an attempt to explain this from a previous time this has happened:

> ... this means that the tables may be set to UTF8, and the
> programs you've used so far to read or write data may use UTF8, but the
> connection between the program and the server has never been set to UTF8 -
> it therefore stays at the server default, usually Latin1. When your
> program writes the data to the server, this means it sends "é" in UTF8
> (C3A9 in hex). The connection however believes this is default (Latin1)
> text, and splits the unicode multibyte character into the default Latin1
> single-byte characters - "Ã©" (C3 and A9 in hex). The table then stores
> these characters as UTF8 data (in hex, C383 and C2A9 !).
>
> When reading the information again using the same setup, the same
> transformation happens backwards, so you get "é" out again, as intended.
> But a program using UTF8 in full reads out the actual table contents - the
> utf8-stored Ã©.
>
> Sequel Pro can emulate this setup using the UTF8-via-Latin1 encoding
> selection. However if you want to store the data as true UTF8, and never
> run through the encoding change, you'll need to correctly set up each
> connection. When trying this on the command line, use a "SET NAMES
> 'utf8';" after connection - this sets the client into UTF8 mode. If
> you're using a web application to perform a connection, there's usually an
> equivalent. For example, in PHP, after a mysql_connect you can run either
> mysql_set_charset('utf8') or perform a "SET NAMES" query manually.
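
The byte sequence in the quoted explanation can be reproduced with a short Python sketch. It is only an illustration: Python's latin-1 codec stands in for the connection's Latin1 handling (an assumption; MySQL's latin1 is closer to Windows-1252, which agrees with it for these two bytes), and the variable names are made up for the example.

```python
# Illustration only: Python codecs stand in for the MySQL connection charset.
original = "é"

# The program sends UTF8 bytes for the character.
sent_bytes = original.encode("utf-8")        # hex: c3a9

# A Latin1 connection reads each byte as a separate character.
misread = sent_bytes.decode("latin-1")       # two characters: Ã and ©

# The UTF8 table then stores those two characters as UTF8.
stored_bytes = misread.encode("utf-8")       # hex: c383c2a9

# Reading back through the same Latin1 connection reverses the steps.
same_setup = stored_bytes.decode("utf-8").encode("latin-1").decode("utf-8")

# A client that uses UTF8 throughout sees the actual table contents.
full_utf8_client = stored_bytes.decode("utf-8")

print(sent_bytes.hex(), stored_bytes.hex(), same_setup, full_utf8_client)
```

The last two variables correspond to the two views in this thread: the command-line read through the same setup, and Sequel Pro reading the table over a true UTF8 connection.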

So your command-line test probably never ran "SET NAMES 'utf8'" either, meaning it was Latin1 MySQL in a UTF8 terminal. The best thing would be to look at the MySQL gem you're using and tell it to set the connection to UTF8... you may then have to convert the data your app has already stored if you want it all to be "correct" UTF8.
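
As a rough idea of what converting already-stored data involves, here is a minimal Python sketch that undoes the transformation on a single string read over a true UTF8 connection. The function name repair_double_encoded is hypothetical, and the latin-1 codec is again an assumption standing in for the connection's Latin1; test on a copy of the data before trusting anything like it.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of UTF8-via-Latin1 double encoding (hypothetical helper).

    Strings that do not look double-encoded are returned unchanged.
    Caveat: a correct string that happens to form valid UTF8 when its
    characters are read as Latin1 bytes would be altered too.
    """
    try:
        # Turn each character back into the single byte it came from,
        # then read those bytes as the UTF8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either a character is outside Latin1, or the bytes are not
        # valid UTF8: assume the string was already correct.
        return text


broken = "Gekreuzte MÃ¶hrchen"
fixed = repair_double_encoded(broken)
print(fixed)
```

Applied to the broken value from the first message, this gives back the intended string, while a value that is already correct UTF8 passes through untouched.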

If none of this makes sense, let me know :)
Rowan

thorsten rock

Apr 18, 2011, 11:19:33 AM
to seque...@googlegroups.com, Rowan Beentje
Ahhh, yes that makes perfect sense.

Geez, look at all the places one needs to set the charset.
I found it, and now it works perfectly.
Thankfully I'm at the very beginning of my project and don't have any data to be converted yet :)

Thank you so much for your help!

Thorsten

Rowan Beentje

Apr 18, 2011, 11:38:26 AM
to seque...@googlegroups.com
Glad we could help! Thanks for updating us - at some point this will be added to a FAQ ;)