jdbc mysql: encoding utf-8 to latin1?

Alexander Burger

Apr 30, 2010, 8:02:04 PM

Hello,

I'm using Java with JDBC and MySQL.
MySQL uses the latin1 charset in nearly all cases.
That can't be changed, because other software still has to connect to that
database.

Java is using UTF-8 by default.

When I write to the database via JDBC, I get wrong characters for the umlauts.

From what I have read about this problem, JDBC should handle it on its own.

Can anybody help?
Thanks.

regards
Alex

John B. Matthews

May 1, 2010, 1:12:55 AM

In article <hrfr1s$sa2$1...@online.de>,
Alexander Burger <alexande...@yahoo.de> wrote:

> I'm using Java with JDBC and MySQL. MySQL uses the latin1 charset in
> nearly all cases. That can't be changed, because other software still
> has to connect to that database.

I'm not a MySQL user, but it looks like they equate latin1 and CP1252:

| latin1 | cp1252 West European | latin1_swedish_ci

<http://dev.mysql.com/doc/refman/5.5/en/charset-mysql.html>

> Java is using UTF-8 by default.

On my platform, Charset.defaultCharset().name() returns "UTF-8", but
the default "typically depends upon the locale and charset of the
underlying operating system." You might need to verify yours.

<http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html>

> When I write to the database via JDBC, I get wrong characters for
> the umlauts.

It looks like all but LATIN CAPITAL LETTER Y WITH DIAERESIS have an
ISO-8859-1 encoded equivalent [see code, below]:

<http://www.fileformat.info/info/unicode/char/0178/index.htm>

> From what I have read about this problem, JDBC should handle it on its own.

If you can't change the server's default character set, it looks like
you can use the JDBC URL: "To override the automatically detected
encoding on the client side, use the characterEncoding property in the
URL used to connect to the server."

<http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-charsets.html>

<code>
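import java.nio.charset.Charset;

public class UmlautDemo {
    public static void main(String[] args) throws Exception {
        // Report the JVM's default charset; it depends on the platform.
        System.out.println(Charset.defaultCharset().name());
        String s = "ÄËÏÖÜŸäëïöüÿ";
        // The same characters as escapes, independent of the source-file encoding.
        String t = "\u00C4\u00CB\u00CF\u00D6\u00DC\u0178"
            + "\u00E4\u00EB\u00EF\u00F6\u00FC\u00FF";
        System.out.println("s.equals(t) is " + s.equals(t));
        Charset latin1 = Charset.forName("ISO-8859-1");
        // Characters with no ISO-8859-1 mapping are encoded as '?' (0x3F).
        byte[] b = s.getBytes(latin1);
        for (int i = 0; i < b.length; i++) {
            System.out.printf("%X ", b[i]);
        }
        System.out.println();
        System.out.println(s);
        System.out.println(new String(b, latin1));
    }
}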
</code>

<console>
UTF-8
s.equals(t) is true
C4 CB CF D6 DC 3F E4 EB EF F6 FC FF
ÄËÏÖÜŸäëïöüÿ
ÄËÏÖÜ?äëïöüÿ
</console>
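
The point about LATIN CAPITAL LETTER Y WITH DIAERESIS can also be checked by asking each charset's encoder directly. The following is only a sketch and not part of the original post: the class name EncodableCheck and its helper method are made up for illustration, and it assumes the JVM provides windows-1252 (the Java name for CP1252), which is checked at run time.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodableCheck {

    /** Returns the characters of text that the named charset cannot encode. */
    static String unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // The same twelve characters as in the post, written as escapes so the
        // result does not depend on the encoding of this source file.
        String umlauts = "\u00C4\u00CB\u00CF\u00D6\u00DC\u0178"
                + "\u00E4\u00EB\u00EF\u00F6\u00FC\u00FF";
        for (String name : new String[] {"ISO-8859-1", "windows-1252", "UTF-8"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + " cannot encode: ["
                        + unencodable(umlauts, name) + "]");
            }
        }
    }
}
```

ISO-8859-1 has no slot for U+0178, while windows-1252 places it at 0x9F, so whether that one character survives depends on which of the two encodings the driver and the server actually use.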

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>

Alexander Burger

May 2, 2010, 7:28:02 PM

John B. Matthews wrote:


thank you for your answer.


>> Java is using UTF-8 by default.
>
> On my platform, Charset.defaultCharset().name() returns "UTF-8", but
> the default "typically depends upon the locale and charset of the
> underlying operating system." You might need to verify yours.

Well, yes, I got windows-1252. I have changed that,
and now I get UTF-8.

>> From what I have read about this problem, JDBC should handle it on its own.
>
> If you can't change the server's default character set, it looks like
> you can use the JDBC URL: "To override the automatically detected
> encoding on the client side, use the characterEncoding property in the
> URL used to connect to the server."
>
> <http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-charsets.html>

well, so I used characterEncoding=Cp1252
also characterEncoding=latin1

but nothing changes in the database table at all.
Every time, I get the same wrong characters for the same umlauts.
So it seems to make no difference whether I use characterEncoding or
not.
I checked with the debugger: the DriverManager accepted the value and I can
find it in the DriverManager. But it has no effect on the database table at all.

any idea?

thank you

regards
Alex

John B. Matthews

May 2, 2010, 9:42:54 PM

In article <hrl1q1$njb$1...@online.de>,
Alexander Burger <alexande...@yahoo.de> wrote:

> John B. Matthews wrote:
> [...]


> >> Java is using UTF-8 by default.
> >
> > On my platform, Charset.defaultCharset().name() returns "UTF-8",
> > but the default "typically depends upon the locale and charset of
> > the underlying operating system." You might need to verify yours.
>
> Well, yes, I got windows-1252. I have changed that,
> and now I get UTF-8.
>
> >> From what I have read about this problem, JDBC should handle it on its own.
> >
> > If you can't change the server's default character set, it looks like
> > you can use the JDBC URL: "To override the automatically detected
> > encoding on the client side, use the characterEncoding property in the
> > URL used to connect to the server."
> >
> > <http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-charsets.html>
>
> well, so I used characterEncoding=Cp1252

As you changed your client to UTF-8, I'd have thought to use UTF-8 in
the URL. A comment in the reference cited above suggests this:

jdbc:mysql://localhost/some_db?useUnicode=yes&characterEncoding=UTF-8

> also characterEncoding=latin1

"When specifying character encodings on the client side, Java-style
names should be used." Shouldn't that be "Cp1252"?
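
For reference, the advice about Java-style names can be sketched as follows. Nothing here is from the original posts: the class and method names are invented, and the host and database name are taken from the example URL above. The sketch only builds the URL and prints how the JVM itself resolves a few charset names. It does not open a connection, and which names Connector/J accepts should be checked against the driver documentation.

```java
import java.nio.charset.Charset;

public class JdbcUrlSketch {

    /** Builds a Connector/J style URL carrying the characterEncoding property. */
    static String mysqlUrl(String host, String database, String javaCharsetName) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=yes&characterEncoding=" + javaCharsetName;
    }

    /** Returns the JVM's canonical name for a charset alias, or null if unknown. */
    static String canonicalName(String alias) {
        return Charset.isSupported(alias) ? Charset.forName(alias).name() : null;
    }

    public static void main(String[] args) {
        // Hypothetical host and database, as in the example URL above.
        System.out.println(mysqlUrl("localhost", "some_db", "Cp1252"));
        // Show what the JVM makes of each name; a null means "not known here".
        for (String alias : new String[] {"Cp1252", "latin1", "UTF-8"}) {
            System.out.println(alias + " -> " + canonicalName(alias));
        }
    }
}
```

The same property could be passed in a java.util.Properties object to DriverManager.getConnection(url, info) instead of being appended to the URL.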

> but nothing changes in the database table at all. Every time, I get
> the same wrong characters for the same umlauts. So it seems to make no
> difference whether I use characterEncoding or not. I checked with the
> debugger: the DriverManager accepted the value and I can find it in the
> DriverManager. But it has no effect on the database table at all.

IIUC, your database is still "latin1". What happens when you set your
Windows client to "latin1" and characterEncoding=Cp1252?

> any idea?

You can always specify the encoding explicitly, as suggested in the
code example above and the second comment in the reference cited above.

Alexander Burger

May 3, 2010, 8:58:41 AM

John B. Matthews wrote:


> jdbc:mysql://localhost/some_db?useUnicode=yes&characterEncoding=UTF-8
>
>> also characterEncoding=latin1
>
> "When specifying character encodings on the client side, Java-style
> names should be used." Shouldn't that be "Cp1252"?

Thanks a lot for your answers.

Yes, that is true, but I just gave it a try.

>> but nothing changes in the database table at all. Every time, I get
>> the same wrong characters for the same umlauts. So it seems to make no
>> difference whether I use characterEncoding or not. I checked with the
>> debugger: the DriverManager accepted the value and I can find it in the
>> DriverManager. But it has no effect on the database table at all.
>
> IIUC, your database is still "latin1". What happens when you set your
> Windows client to "latin1" and characterEncoding=Cp1252?

nothing :-(

>> any idea?
>
> You can always specify the encoding explicitly, as suggested in the
> code example above and the second comment in the reference cited above.

well I tried:

Class.forName("com.mysql.jdbc.Driver").newInstance();

if (this.dbLoginTimeOut != null) {
    DriverManager.setLoginTimeout(this.dbLoginTimeOut.intValue());
}

//Properties info = new Properties();
//info.put("user", this.dbuser);
//info.put("password", this.dbpwd);
//info.put("charSet", this.strEnCodingToFromDB); //"utf-8");
//info.put("characterEncoding", this.strEnCodingToFromDB);

//conn = DriverManager.getConnection(this.dburl, info);

conn = DriverManager.getConnection(this.dburl, this.dbuser, this.dbpwd);

stmt = conn.createStatement();

//change chars
Charset latin1 = Charset.forName("ISO-8859-1");

byte[] b = strRequest.getBytes(latin1);
System.out.println("strRequest.getBytes(latin1) : ");

for (int i = 0; i < b.length; i++) {
    System.out.printf("%X ", b[i]);
}

System.out.println("new String(b, latin1) : " + new String(b, latin1));

resultSet = stmt.executeQuery(new String(b, latin1));

Is there something wrong? It has no effect at all on the database. The
System.out output is correct.
I can try whatever I want; the result in the database is always the same.
It seems to me that everything I do is ignored by JDBC or MySQL.
Could there be some other configuration that takes precedence?
Do I have to switch something ON or OFF in some configuration?
I'm using eclipse for developing.

thank you
regards
Alex

John B. Matthews

May 3, 2010, 5:30:52 PM

In article <hrmha1$8ri$1...@online.de>,
Alexander Burger <alexande...@yahoo.de> wrote:

Sorry, I couldn't follow your code fragment. I'm nonplussed.

> I'm using eclipse for developing.

You might verify the encoding specified in Eclipse > Preferences >
General > Workspace > Text file encoding.

Lew

May 3, 2010, 9:00:16 PM

Alexander Burger wrote:
> well I tried:
>
> Class.forName("com.mysql.jdbc.Driver").newInstance();

You don't need to create an instance of the driver, not even to simply throw
it away. The driver registration occurs during the class initialization
triggered by 'forName()'.
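
The mechanism can be seen without any JDBC driver at all. In this sketch, FakeDriver is a made-up stand-in for a driver class whose static initializer performs the registration; a real driver would be expected to call DriverManager.registerDriver() there. The other names are illustrative as well.

```java
public class ForNameDemo {

    // Set by FakeDriver's static initializer; stands in for driver registration.
    static boolean registered = false;

    /** Hypothetical stand-in for a JDBC driver class. */
    static class FakeDriver {
        static {
            registered = true;
        }
    }

    /** Loads and initializes FakeDriver by name, without creating an instance. */
    static boolean loadByName() throws ClassNotFoundException {
        // A class literal does not initialize the class; it only yields the name.
        String name = FakeDriver.class.getName();
        // Class.forName(String) loads the class and runs its static initializer.
        Class.forName(name);
        return registered;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("before forName: " + registered);
        System.out.println("after forName:  " + loadByName());
    }
}
```

No instance of FakeDriver is ever created, yet the flag is set once forName() returns. With JDBC 4.0 and later drivers, DriverManager can also locate drivers through the service-provider mechanism, so the explicit forName() call is often unnecessary as well.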

You should indent far less aggressively for Usenet posts, a maximum of four
spaces per indent level.

> if(this.dbLoginTimeOut != null){
> DriverManager.setLoginTimeout(this.dbLoginTimeOut.intValue() );
> }
>
> //Properties info = new Properties();
> //info.put("user", this.dbuser);
> //info.put("password", this.dbpwd);
> //info.put("charSet", this.strEnCodingToFromDB); //"utf-8");
> //info.put("characterEncoding",this.strEnCodingToFromDB);
>
> //conn = DriverManager. getConnection(this.dburl, info);
>
>
> conn =
> DriverManager.getConnection(this.dburl,this.dbuser,this.dbpwd);
>
> stmt = conn.createStatement();
>
> //change chars
> Charset latin1 = Charset.forName("ISO-8859-1");
> byte[] b = strRequest.getBytes(latin1);
> System.out.println("strRequest.getBytes(latin1) : ");
> for (int i = 0; i< b.length; i++) {
> System.out.printf("%X ", b[i]);
> }
> System.out.println("new String(b, latin1) : " + new String(b,
> latin1));
>
> resultSet = stmt.executeQuery(new String(b, latin1));

The issue isn't the character set of the query but of the data. Changing the
character set of the query will have no effect.
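
To see why, note that encoding a String to ISO-8859-1 bytes and decoding those bytes with the same charset simply reproduces the original String, so the driver receives exactly the text it would have received anyway. A minimal sketch, with class, method, and table names that are illustrative only:

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    /** Encodes text as ISO-8859-1 and decodes it again with the same charset. */
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical statement text; the escapes are a, o, u with diaeresis.
        String sql = "INSERT INTO t VALUES ('\u00E4\u00F6\u00FC')";
        // The round trip changes nothing for characters that exist in ISO-8859-1.
        System.out.println(roundTrip(sql).equals(sql));
    }
}
```

A Java String is a sequence of UTF-16 code units regardless of any charset. The charset only matters where text becomes bytes, which for JDBC happens inside the driver. The only thing the round trip can do is damage the text: characters without an ISO-8859-1 mapping come back as '?'.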

> Is there something wrong? It has no effect at all on the database. The
> System.out output is correct.

Exactly so.

--
Lew

Alexander Burger

May 4, 2010, 8:19:59 PM

John B. Matthews wrote:

> In article <hrmha1$8ri$1...@online.de>,
> Alexander Burger <alexande...@yahoo.de> wrote:
>
> Sorry, I couldn't follow your code fragment. I'm nonplussed.
>
>> I'm using eclipse for developing.
>
> You might verify the encoding specified in Eclipse > Preferences >
> General > Workspace > Text file encoding.
>

thanks a lot for your answers.
I found the problem.

Somebody told me I should switch from MySQL 5.4 beta to MySQL 5.1.
A lot of work, but no effect :-(

In the end I found the problem:
I'm using mysql via the console on Windows XP,
and the MS console uses the MS-DOS charset 850 (CP850).
Crazy. Not windows-1252, which would be close to latin1 or ISO-8859-1.
Oh, I love MS.
So the problem was this: when I inserted values via the console, everything
looked fine in the table, but not in the browser. When I inserted values via
the browser into the database, the console showed me wrong characters.
Now I'm using the MySQL tools to display the contents of the tables, no longer
the MS console.
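
The effect is easy to reproduce in Java. The sketch below is not from the original post and its names are illustrative. It takes the windows-1252 bytes of a few umlauts and decodes them as code page 850, which is roughly what happens when a CP850 console displays bytes that were stored as latin1. It assumes the JVM ships the IBM850 and windows-1252 charsets, which is not guaranteed, so it checks first.

```java
import java.nio.charset.Charset;

public class ConsoleMojibake {

    /** Encodes text with one charset and decodes the bytes with another. */
    static String reinterpret(String text, String storedAs, String shownAs) {
        byte[] bytes = text.getBytes(Charset.forName(storedAs));
        return new String(bytes, Charset.forName(shownAs));
    }

    public static void main(String[] args) {
        // a, o, u with diaeresis in lower and upper case, written as escapes.
        String umlauts = "\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC";
        if (Charset.isSupported("IBM850") && Charset.isSupported("windows-1252")) {
            String shown = reinterpret(umlauts, "windows-1252", "IBM850");
            System.out.println("stored as windows-1252, shown as CP850: " + shown);
            System.out.println("matches original: " + shown.equals(umlauts));
        } else {
            System.out.println("IBM850 or windows-1252 is not available in this JVM");
        }
    }
}
```

The stored bytes are fine in this scenario; only the display is wrong. On Windows, the console's active code page can be inspected with the chcp command.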

thank you all for help
regards
Alex

P.S.: Normally I work on Linux, but not for this project. I didn't
expect that.

Alexander Burger

May 4, 2010, 8:23:33 PM

Lew wrote:

> Alexander Burger wrote:
>> well I tried:
>>
>> Class.forName("com.mysql.jdbc.Driver").newInstance();
>
> You don't need to create an instance of the driver, not even to simply
> throw
> it away. The driver registration occurs during the class initialization
> triggered by 'forName()'.

ok, thank you. I will try it.

OK, I found the problem. See my other answer in this thread.

thank you
regards
Alex

John B. Matthews

May 4, 2010, 9:02:18 PM

In article <hrqdjf$fhn$1...@online.de>,
Alexander Burger <alexande...@yahoo.de> wrote:

> John B. Matthews wrote:
>
> > In article <hrmha1$8ri$1...@online.de>,
> > Alexander Burger <alexande...@yahoo.de> wrote:
> >
> > Sorry, I couldn't follow your code fragment. I'm nonplussed.
> >
> >> I'm using eclipse for developing.
> >
> > You might verify the encoding specified in Eclipse > Preferences >
> > General > Workspace > Text file encoding.
> >
>
> thanks a lot for your answers.
> I found the problem.
>
> Somebody told me I should switch from MySQL 5.4 beta to MySQL 5.1.
> A lot of work, but no effect :-(
>
> In the end I found the problem: I'm using mysql via the console on
> Windows XP, and the MS console uses the MS-DOS charset 850 (CP850).
> Crazy. Not windows-1252, which would be close to latin1 or ISO-8859-1.
> Oh, I love MS.
> So the problem was this: when I inserted values via the console,
> everything looked fine in the table, but not in the browser. When I
> inserted values via the browser into the database, the console showed
> me wrong characters. Now I'm using the MySQL tools to display the
> contents of the tables, no longer the MS console.

I'm glad you've made progress, and I appreciate your followup.

Regarding Eclipse preferences, mentioned above, I see that one can
change the default encoding for each project, which may offer some
convenience. A similar feature is available in NetBeans.

> P.S.: Normally I work on Linux, but not for this project. I didn't
> expect that.

Lew

May 4, 2010, 10:58:11 PM

John B. Matthews wrote:
> Regarding Eclipse preferences, mentioned above, I see that one can
> change the default encoding for each project, which may offer some
> convenience. A similar feature is available in NetBeans.

Eclipse and offspring also let you specify the encoding for specific artifacts
within a project, if they don't match the default.

--
Lew
