Regards
Abhijit
Are you inserting the strings as N'<the Japanese data>' ? If you leave
off the N, they will not be interpreted as Unicode. Also, make sure the ???
aren't there only because of the display font.
If neither of these works, post a CREATE TABLE statement and the
INSERT .. VALUES or INSERT INTO .. SELECT statements that are
not correctly inserting the Japanese characters.
Steve Kass
Drew University
Only using unicode datatype is sufficient to develop multilingual
application in sql server 2000? What about Locale, charset and sort
order? Do I need to specify these also? My application will show the
data both in English and Japanese depending upon the language
selection. For this what other things should I change?
Thanks and Regards
Abhijit
Steve Kass <sk...@drew.edu> wrote in message news:<b5i74s$t1f$1...@slb9.atl.mindspring.net>...
I use Courier New (Western) in QA, and I when I say
select nchar(12354)
I see something which definititely is a character different from ?. My
knowledge of Japanese is about non-existent, but when I check in the
character map in windows at U+3042 I find a very similar charcater,
and it is said to be Hiragana Letter A.
If this simple select prints the correct character, we can rule out
font issues. If your data displays correctly in the client, you
are probably not using Unicode, but some legacy DBCS.
> Only using unicode datatype is sufficient to develop multilingual
> application in sql server 2000? What about Locale, charset and sort
> order? Do I need to specify these also? My application will show the
> data both in English and Japanese depending upon the language
> selection. For this what other things should I change?
To diaplay data from a table, you might need different COLLATE clauses
for different users. But it might be better to push all such things
client-side.
--
Erland Sommarskog, SQL Server MVP, som...@algonet.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
You need (I think) to be aware that Unicode is essentially a character
set, and there are some different types of encodings that can be
used at the byte level for it. SQL Server supports some of the
encodings; anyway, it supports at least one (one of the UCS2 ones;
possibly it supports UTF-8 and/or one of the UTF-16 ones). (I doubt
very much that it supports UCS-4, or any encodings using non-Intel
byte orders; none of the Windows operating systems support these.)
> What about Locale, charset and sort
> order? Do I need to specify these also? My application will show the
> data both in English and Japanese depending upon the language
> selection. For this what other things should I change?
>
You definitely will care about character encoding issues in Japanese,
as there are some popular Japanese encodings which are not any of
the Unicode encodings.
As for collation, you may want to do that outside of SQL Server:
- SQL Server is incapable of providing desired collation order
at runtime (AFAIK); you have to pick only one collation order
and you can never change it at runtime (again, AFAIK).
(Personally I hope they fix this some day, as they must have
all the collation info available to SQL Server.)
- I don't know if SQL Server can collate the Unicode characters
that are outside of BMP-0 correctly (ie, the ones which require
more than two bytes to encode in any Unicode encoding). But,
Japanese may not use any of these characters (certainly English
does not :))
Important disclaimer: I am no expert in these matters, just an
amateur throwing in tidbits I've seen.
I'm not sure what you mean here. You can very well have a table:
CREATE TABLE authors (
au_id int NOT NULL,
se_name varchar(50) COLLATE Finnish_Swedish_CS_AS NOT NULL,
hu_name varchar(50) COLLATE Hungarian_CS_AS NOT NULL)
What you cannot do is to say:
SET COLLATION Lithuanian_CS_AS
go
SELECT * FROM pubs..authors ORDER BY au_lname
and have Yokomoto turning up between Hunter and Karsen.
This may seem like a serious restriction, but when you think of it, you
see that there are some fairly serious problems here.
Imagine this sequence:
CREATE TABLE test(a varchar(20) COLLATE Danish_Norwegian_CS_AS NOT NULL,
CONSTRAINT pk_test PRIMARY KEY (a))
go
INSERT test VALUES ('Amager')
INSERT test VALUES ('Kolding')
INSERT test VALUES ('Kærlighed')
INSERT test VALUES ('København')
INSERT test VALUES ('Kaare')
INSERT test VALUES ('Ålborg')
INSERT test VALUES ('Aaltonen')
go
CREATE PROCEDURE get_test_sp @a AS
SELECT * FROM test WHERE a = @a
go
EXEC get_test_sp 'Aaltonen'
go
SET COLLATION Finnish_Swedish_CS_AS
go
EXEC get_test_sp 'Aaltonen'
Would the query plan created with the first invocation of get_test_sp
be reusable for the second call? No, it would not, because the index
for the table is orgnanized according the Danish/Norwegian collation
(which imposes the order in he INSERT statements), and this order is
completely different than the one for Swedish/Finnish. Thus, SQL
Server would have to scan the table.
This could have grave performance consequences.
True, you get the same consequences if you say:
SELECT * FROM test WHERE a COLLATE Finnish_Swedish_CS_AS = @a
but then you have control over the situation as a programmer. If the
user can set collation dynamically, any well-designed database would
break down.
True, you could have some more advance scheme, where you create multiple
indexes for the same column, and SQL Server would automatically pick
the index that matches the run-time setting collation, but there are
still cans of worms. What if an inserted value into a unique index is
unique in the current collation, but non-unique in some other collation?
You seem to below --
<snip>
> What you cannot do is to say:
>
> SET COLLATION Lithuanian_CS_AS
> go
> SELECT * FROM pubs..authors ORDER BY au_lname
>
> and have Yokomoto turning up between Hunter and Karsen.
Right. This you cannot do (AFAIK). That is my point.
So you must propagate all the data up to the frontend
and do this in your own code.
<snip>
> This could have grave performance consequences.
My point is that the user doesn't have to worry
about whether it may be slow sometimes, because in
reality SQL Server will (AFAIK) never be able to
do it, and so he will have to push all this data
through his own system and do this all in his own
code (so I suppose it will always be slower than
any slowdown you imagine for SQL Server, because
now the data must first be entirely pulled out ?)
Where this really hurts is when you just want one
page of the data, and really don't want to
propagate all the data to the frontend. Moving
500 rows of data, rather than say 40 rows, obviously
is going to make you sad as you scale up.
You have never had a user screaming on the phone "the database is slow!!!",
have you?
I would expect that Microsoft are wary of adding features that will
run a high risk to give bad performance, because they have a reputation
to think of.
And, as I mentioned, there are more issues than just performance.
You would need to save different query plans for the same stored
procedure for different collations.
There is also the question what happens if you declare a column
to have a certain collation, and you then make a run-time setting.
Which collation setting applies? That of the column or the run-
time setting? What if you in the same query want to use different
collations?
> because in reality SQL Server will (AFAIK) never be able to do it, and
> so he will have to push all this data through his own system and do this
> all in his own code (so I suppose it will always be slower than any
> slowdown you imagine for SQL Server, because now the data must first be
> entirely pulled out ?)
All data needs to be read, but SQL Server does not have to convert
between collations, so you only have to byte-shuffle.
> Where this really hurts is when you just want one page of the data, and
> really don't want to propagate all the data to the frontend. Moving 500
> rows of data, rather than say 40 rows, obviously is going to make you
> sad as you scale up.
Reading 500 rows or 40 rows is not an issue in a LAN. For bigger
data sizes, it is not going to scale if you have table scan
ten million rows to get 40 rows anyway. So you need to add a
column for each collation that the users may use and then use
the right column for the chose collation. (That probably leads to
some dynamic SQL.) Of course, only the collations you have indexed
columns for are permitted choices.
I think that breaking that midsentence you missed the crucial part :)
I said you need not worry if it may be slow sometimes, if you are
guaranteed it can never happen, because SQL Server can never do it.
It is a point of logic; users do not call up and screem that a system
is running slow unless the system exists, so for systems that do not
exist, slowness does not become an issue, -- just logic, correct ?
>
> I would expect that Microsoft are wary of adding features that will
> run a high risk to give bad performance, because they have a reputation
> to think of.
Ok, yes, you really mistook the contrary to fact supposition for a claim
above, right ? It was only a contrary to fact hypothetical, so, no claim
was meant...
Well, I assume differently; I assume that a mere x10 factor can matter
(actually I have seen it matter, so this is not entirely unwarranted
assumption). Further, I assume there are some interesting cases in
between 500 rows and 10 million rows, and that most of my actual
cases may even fall in that range :)
I expect that if MS really moves their filesystem to MS SQL Server, that
this may matter even more. The Windows Explorer application is already
one of the most embarassingly slow applications when demoing a
windows maching to a Unix crowd (I've been burned by this -- why are
we sitting and waiting for this multi-gigahertz machine to show a
simple file listing, pray tell ??? Sure, I know that much of the answer
probably involves OLE, and perhaps poor caching, but, no answer really
suffices in that situation) -- if it further slows down for all
locales except only the original installation one, I expect that will
only make matters worse.