Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Difference between VARCHAR2(xx CHAR) and NVARCHAR2(xx)

264 views
Skip to first unread message

"Álvaro G. Vicario"

unread,
Dec 22, 2010, 6:54:56 AM12/22/10
to
I've just installed Oracle XE (Oracle Database 10g Express Edition
Universal). I've changed nothing so it's the using the character sets
specified in the default settings:

SELECT * FROM NLS_DATABASE_PARAMETERS;

NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16

Given that both CHAR and NCHAR data types seem to accept multi-byte
strings (UTF-8 and UTF-16), what is the exact difference between these
two column definitions?

VARCHAR2(10 CHAR)
NVARCHAR2(10)

I've found similar questions in many forums but they always explain CHAR
vs VARCHAR or mention that NVARCHAR2 accepts multi-byte strings.

My educated guess is VARCHAR2 is a legacy type that did not accept
multi-byte, NVARCHAR2 was added, then VARCHAR2 was enhanced to support
multi-byte... And also VARCHAR2 depends on the server version to be able
to store multi-byte while NVARCHAR2 is always available.

Am I right?


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com
-- Mi web de humor satinado: http://www.demogracia.com
--

ddf

unread,
Dec 22, 2010, 8:18:40 AM12/22/10
to
On Dec 22, 6:54 am, "Álvaro G. Vicario"

<alvaro.NOSPAMTH...@demogracia.com.invalid> wrote:
> I've just installed Oracle XE (Oracle Database 10g Express Edition
> Universal). I've changed nothing so it's the using the character sets
> specified in the default settings:
>
>      SELECT * FROM NLS_DATABASE_PARAMETERS;
>
>      NLS_CHARACTERSET       AL32UTF8
>      NLS_NCHAR_CHARACTERSET AL16UTF16
>
> Given that both CHAR and NCHAR data types seem to accept multi-byte
> strings (UTF-8 and UTF-16), what is the exact difference between these
> two column definitions?
>
>      VARCHAR2(10 CHAR)
>      NVARCHAR2(10)
>
> I've found similar questions in many forums but they always explain CHAR
> vs VARCHAR or mention that NVARCHAR2 accepts multi-byte strings.
>
> My educated guess is VARCHAR2 is a legacy type that did not accept
> multi-byte, NVARCHAR2 was added, then VARCHAR2 was enhanced to support
> multi-byte... And also VARCHAR2 depends on the server version to be able
> to store multi-byte while NVARCHAR2 is always available.
>
> Am I right?
>
> --
> --http://alvaro.es- lvaro G. Vicario - Burgos, Spain
> -- Mi sitio sobre programaci n web:http://borrame.com

> -- Mi web de humor satinado:http://www.demogracia.com
> --

I can't say whether you're 'right' or not, but I do know this:

UTF-16: Each character is either 2 or 4 bytes long.

UTF-8: Each character takes 1 to 4 bytes to store.

Thus any character in a UTF-16 implementation (your NVARCHAR2 type)
will occupy no less than 2 bytes of storage, where your VARCHAR2
characters (AL32UTF8) can be stored in as little as 1 byte which may
not be important to you now but could become so in the future. Both
can support multiple languages without character loss however the
database character set is also used to identify SQL and PL/SQL source
code. In order to do this, it must have either EBCDIC or 7-bit ASCII
as a subset, whichever is native to the platform; it is not possible
to use a fixed-width, multibyte character set as the database
character set such as AL16UTF16. Thus your AL32UTF8 character set
allows for 7-bit ASCII characters and doesn't interfere with object
names, user names, etc. Also the N-types (NVARCHAR2, NCHAR, NCLOB)
will accept and stored multi-byte characters regardless of the
database character set in use making it possible to support single-
byte and multi-byte languages.


David Fitzjarrell

"Álvaro G. Vicario"

unread,
Dec 27, 2010, 10:58:25 AM12/27/10
to
El 22/12/2010 14:18, ddf escribió/wrote:
> On Dec 22, 6:54 am, "Álvaro G. Vicario"
> <alvaro.NOSPAMTH...@demogracia.com.invalid> wrote:
>> I've just installed Oracle XE (Oracle Database 10g Express Edition
>> Universal). I've changed nothing so it's the using the character sets
>> specified in the default settings:
>>
>> SELECT * FROM NLS_DATABASE_PARAMETERS;
>>
>> NLS_CHARACTERSET AL32UTF8
>> NLS_NCHAR_CHARACTERSET AL16UTF16
>>
>> Given that both CHAR and NCHAR data types seem to accept multi-byte
>> strings (UTF-8 and UTF-16), what is the exact difference between these
>> two column definitions?
>>
>> VARCHAR2(10 CHAR)
>> NVARCHAR2(10)
>>
>> I've found similar questions in many forums but they always explain CHAR
>> vs VARCHAR or mention that NVARCHAR2 accepts multi-byte strings.
>>
>> My educated guess is VARCHAR2 is a legacy type that did not accept
>> multi-byte, NVARCHAR2 was added, then VARCHAR2 was enhanced to support
>> multi-byte... And also VARCHAR2 depends on the server version to be able
>> to store multi-byte while NVARCHAR2 is always available.
>>
>> Am I right?

> I can't say whether you're 'right' or not, but I do know this:


>
> UTF-16: Each character is either 2 or 4 bytes long.
>
> UTF-8: Each character takes 1 to 4 bytes to store.
>
> Thus any character in a UTF-16 implementation (your NVARCHAR2 type)
> will occupy no less than 2 bytes of storage, where your VARCHAR2
> characters (AL32UTF8) can be stored in as little as 1 byte which may
> not be important to you now but could become so in the future. Both
> can support multiple languages without character loss however the
> database character set is also used to identify SQL and PL/SQL source
> code. In order to do this, it must have either EBCDIC or 7-bit ASCII
> as a subset, whichever is native to the platform; it is not possible
> to use a fixed-width, multibyte character set as the database
> character set such as AL16UTF16. Thus your AL32UTF8 character set
> allows for 7-bit ASCII characters and doesn't interfere with object
> names, user names, etc.

I'm not sure I understand this. I assume you are talking about using
object names that contain international chars, such as:

CREATE TABLE "año" (...)

:-?


> Also the N-types (NVARCHAR2, NCHAR, NCLOB)
> will accept and stored multi-byte characters regardless of the
> database character set in use making it possible to support single-
> byte and multi-byte languages.

It makes sense.

Thank you!

--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://borrame.com

0 new messages