Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

NLS_LANG

175 views
Skip to first unread message

"Álvaro G. Vicario"

unread,
Jul 23, 2008, 8:29:20 AM7/23/08
to
I'm setting up Oracle libraries in a Windows Server box so two PHP web
sites can access a remote Oracle 10g database. I've downloaded the
Instant Client *.zip package and everything works fine so far, except
for some charset issues (Spanish letters do not show properly). Setting
NLS_LANG environmental variable seems to be the clue but I don't know
which value I should use. Given that data is in Spanish I have three
likely candidates:

SPANISH_SPAIN.WE8MSWIN1252 (the Windows codepage)
SPANISH_SPAIN.WE8ISO8859P1 (ISO-Latin-1, used in first site)
SPANISH_SPAIN.WE8ISO8859P15 (ISO-Latin-9, used in second site)

From the "Oracle Database Globalization Support Guide" I understand
that NLS_LANG makes Oracle perform a charset conversion. Since both
sites use different charsets and I can convert in my PHP app when
necessary, I think it'd be a good idea that Oracle does not make any
conversion. But I don't know how to find out the charset that database
tables are using natively. How could I find out? Does all this make any
sense?


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://bits.demogracia.com
-- Mi web de humor al baño María: http://www.demogracia.com
--

sybr...@hccnet.nl

unread,
Jul 23, 2008, 11:11:41 AM7/23/08
to

You could find out by querying NLS_DATABASE_PARAMETERS.
However, for Windows the default is MSWIN1252.
WE8ISO8859P1 does not contain the euro sign.

--

Sybrand Bakker
Senior Oracle DBA

"Álvaro G. Vicario"

unread,
Jul 24, 2008, 2:41:40 AM7/24/08
to
sybr...@hccnet.nl escribió:

>> SPANISH_SPAIN.WE8MSWIN1252 (the Windows codepage)
>> SPANISH_SPAIN.WE8ISO8859P1 (ISO-Latin-1, used in first site)
>> SPANISH_SPAIN.WE8ISO8859P15 (ISO-Latin-9, used in second site)
>>
>> From the "Oracle Database Globalization Support Guide" I understand
>> that NLS_LANG makes Oracle perform a charset conversion. Since both
>> sites use different charsets and I can convert in my PHP app when
>> necessary, I think it'd be a good idea that Oracle does not make any
>> conversion. But I don't know how to find out the charset that database
>> tables are using natively. How could I find out? Does all this make any
>> sense?
>
> You could find out by querying NLS_DATABASE_PARAMETERS.

A query to that table shows that NLS_CHARACTERSET is WE8ISO8859P1 so
I'll work on that base.

> However, for Windows the default is MSWIN1252.
> WE8ISO8859P1 does not contain the euro sign.

You are right (I've seen apps that use the currency symbol instead).
Luckily it's not really an issue in web sites since you can always use
the € entity--I just need to know what charset the data is actually
using.

Thank you very much.


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://bits.demogracia.com
-- Mi web de humor al baño María: http://www.demogracia.com

--

Laurenz Albe

unread,
Jul 24, 2008, 5:14:40 AM7/24/08
to
"Álvaro G. Vicario" <alvaroNOS...@demogracia.com> wrote:
>>> SPANISH_SPAIN.WE8MSWIN1252 (the Windows codepage)
>>> SPANISH_SPAIN.WE8ISO8859P1 (ISO-Latin-1, used in first site)
>>> SPANISH_SPAIN.WE8ISO8859P15 (ISO-Latin-9, used in second site)
>>>
>>> From the "Oracle Database Globalization Support Guide" I understand
>>> that NLS_LANG makes Oracle perform a charset conversion. Since both
>>> sites use different charsets and I can convert in my PHP app when
>>> necessary, I think it'd be a good idea that Oracle does not make any
>>> conversion. But I don't know how to find out the charset that database
>>> tables are using natively. How could I find out? Does all this make any
>>> sense?
>>
>> You could find out by querying NLS_DATABASE_PARAMETERS.
>
> A query to that table shows that NLS_CHARACTERSET is WE8ISO8859P1 so
> I'll work on that base.
>
>> However, for Windows the default is MSWIN1252.
>> WE8ISO8859P1 does not contain the euro sign.
>
> You are right (I've seen apps that use the currency symbol instead).
> Luckily it's not really an issue in web sites since you can always use
> the &euro; entity--I just need to know what charset the data is actually
> using.

If you really plan to use WE8ISO8859P1 on the Windows client, please write
your application so that it makes dead sure that only LATIN-1 characters
are entered into the database.

Oracle has a special "feature" in that it will not check your input for
validity if client and server character sets are the same.
This means that your client application can enter arbitrary junk into the
database that will be happily stored as is.

Let me give you an example.
If the PHP application decides to store an Euro sign in the database and
it happens to run in the Windows codepage, it will try to store a byte 0x80.
Oracle will happily store that byte, even though this byte does not make
any sense in LATIN-1. You will not notice that until you try to access the
database with a different client character set, e.g. from Java.
Then Oracle will return garbage instead of the Euro signs, and there is
no way to fix that.

But even if you plan to use a different character set on the Windows client
(which I personally would do, namely WE8MSWIN1252), you will not get error
messages if invalid characters are stored. Oracle will check the characters,
but if it detects - say - a Euro sign, it will not throw an error but
clandestinely convert the character to a question mark.


The best solution in my opinion would be to recreate the database using the
character set AL32UTF8 and on the client either use WE8MSWIN1252 or
AL32UTF8 (if your PHP code speaks UTF-8).

Yours,
Laurenz Albe

"Álvaro G. Vicario"

unread,
Jul 24, 2008, 8:10:37 AM7/24/08
to
Laurenz Albe escribió:

> If you really plan to use WE8ISO8859P1 on the Windows client, please write
> your application so that it makes dead sure that only LATIN-1 characters
> are entered into the database.
>
> Oracle has a special "feature" in that it will not check your input for
> validity if client and server character sets are the same.
> This means that your client application can enter arbitrary junk into the
> database that will be happily stored as is.

Oh my! That's scary...

> But even if you plan to use a different character set on the Windows client
> (which I personally would do, namely WE8MSWIN1252), you will not get error
> messages if invalid characters are stored. Oracle will check the characters,
> but if it detects - say - a Euro sign, it will not throw an error but
> clandestinely convert the character to a question mark.
>
>
> The best solution in my opinion would be to recreate the database using the
> character set AL32UTF8 and on the client either use WE8MSWIN1252 or
> AL32UTF8 (if your PHP code speaks UTF-8).

Well, the database server is quite out of my reach. The machine belongs
to a customer, the main database has been growing for years and there're
many other apps working on top of it. I couldn't even get any
information about charsets from their DB admin.

Is there any way to get an unmodified output from a certain table field
that I know contains (or it's supposed to contain) an Euro symbol so I
can check with a binary editor what numeric code it's actually using?

sybr...@hccnet.nl

unread,
Jul 24, 2008, 1:21:34 PM7/24/08
to
On Thu, 24 Jul 2008 14:10:37 +0200, "Álvaro G. Vicario"
<alvaroNOS...@demogracia.com> wrote:

>Is there any way to get an unmodified output from a certain table field
>that I know contains (or it's supposed to contain) an Euro symbol so I
>can check with a binary editor what numeric code it's actually using?

Earlier in this thread you specified the characterset of the database
is WE8ISO8859P1.
This characterset doesn't have the euro.

The P15 and MSWIN1252 have the euro at either 128 or 160.
However, there are other differences between those two charactersets.

"Álvaro G. Vicario"

unread,
Jul 25, 2008, 2:17:25 AM7/25/08
to
sybr...@hccnet.nl escribió:

>> Is there any way to get an unmodified output from a certain table field
>> that I know contains (or it's supposed to contain) an Euro symbol so I
>> can check with a binary editor what numeric code it's actually using?
>
> Earlier in this thread you specified the characterset of the database
> is WE8ISO8859P1.
> This characterset doesn't have the euro.

Despite that, it seems they did insert euros. In my web pages it
displays as "¿" but through the context it's obvious it's supposed to be
the euro symbol. So, no matter the actual charset, it seems their apps
treat the data as MSWIN1252 or WE8ISO8859P15.

I believe I could connect as WE8ISO8859P15 and let Oracle do the
conversion but if actual data does not use the charset it's supposed to,
the conversion will be meaningless. I've done further testing and I'd
dare say data is MSWIN1252: the only way I can see the euro symbol is
connecting as WE8ISO8859P1 and then doing a client-side conversion from
cp1252 to iso-8859-15 with Iconv.

Again, thank you for your hints.

Laurenz Albe

unread,
Jul 25, 2008, 3:00:30 AM7/25/08
to
"Álvaro G. Vicario" <alvaroNOS...@demogracia.com> wrote:
>>> Is there any way to get an unmodified output from a certain table field
>>> that I know contains (or it's supposed to contain) an Euro symbol so I
>>> can check with a binary editor what numeric code it's actually using?

You could use the DUMP() function to see what is actually stored in the
database.

>> Earlier in this thread you specified the characterset of the database
>> is WE8ISO8859P1.
>> This characterset doesn't have the euro.
>
> Despite that, it seems they did insert euros. In my web pages it
> displays as "¿" but through the context it's obvious it's supposed to be
> the euro symbol. So, no matter the actual charset, it seems their apps
> treat the data as MSWIN1252 or WE8ISO8859P15.
>
> I believe I could connect as WE8ISO8859P15 and let Oracle do the
> conversion but if actual data does not use the charset it's supposed to,
> the conversion will be meaningless. I've done further testing and I'd
> dare say data is MSWIN1252: the only way I can see the euro symbol is
> connecting as WE8ISO8859P1 and then doing a client-side conversion from
> cp1252 to iso-8859-15 with Iconv.

If you can get Euro signs from a LATIN-1 database, it is an indication
that you are a victim of the problem I mentioned upthread, namely that
Oracle doesn't check your input if client and server character set are
identical.

You probably have the Euros stored as hex 0x80 if the client is a Windows
machine (make sure with DUMP).

You will get the 0x80 back as long as client and server character set
stay the same, but there is no other setting for the client character set
that will deliver anything meaningful.

That is because you have bad data in your database, and they cannot be
converted from LATIN-1 to anything. Essentially it is a case of
user induced data corruption.

You can either
1) stick with LATIN-1 on both server and client and pray that nobody will
ever enter characters in a different encoding and that you will never
have to access the database from a non-Windows system.
2) if you are sure that all bogus characters are actually WE8MSWIN1252,
you could try to change the database character set.

Yours,
Laurenz Albe

joel garry

unread,
Jul 25, 2008, 2:03:05 PM7/25/08
to
On Jul 24, 11:17 pm, "Álvaro G. Vicario"

<alvaroNOSPAMTHA...@demogracia.com> wrote:
>
> I believe I could connect as WE8ISO8859P15 and let Oracle do the
> conversion but if actual data does not use the charset it's supposed to,
> the conversion will be meaningless. I've done further testing and I'd
> dare say data is MSWIN1252: the only way I can see the euro symbol is
> connecting as WE8ISO8859P1 and then doing a client-side conversion from
> cp1252 to iso-8859-15 with Iconv.
>
> Again, thank you for your hints.
>

Also see the docs for csscan to scope the problem. Some problems are
more easily fixed than others. If it turns out the only problem is
the Euro, it could be a relatively simple update or games lying to exp/
imp.

jg
--
@home.com is bogus.
That darn craigslist. http://ap.google.com/article/ALeqM5g1THE017WECuE2kGK_pKt14pefOgD924512O0

0 new messages