
Unicode to Shift-JIS conversion


Michael (michka) Kaplan [MS]

Sep 15, 2002, 2:46:13 AM
The code you have now calls WtoA to convert from UTF-16 to UTF-8, then
calls WtoA again, treating the UTF-8 string as if it were UTF-16
and converting it to Shift-JIS. This is a recipe for string corruption.

Why do you need to involve UTF-8 at all here? You can move from UTF-16 right
to Shift-JIS as far as I can see. In fact, you do not even need
WideCharToMultiByte -- try:

stOut = StrConv(stIn, vbFromUnicode, 1041)

and you will get the conversion you need.
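
A minimal sketch of that direct conversion, assuming code page 932 is installed on the machine. The procedure name ShowShiftJisBytes is hypothetical and not from this thread. Assigning the StrConv result to a Byte array keeps VB from treating the converted bytes as Unicode text again, and the hex dump makes them easy to compare with a file on disk.

```vb
' Hypothetical demo procedure (not from the thread).
Public Sub ShowShiftJisBytes()
    Dim stIn As String
    Dim abOut() As Byte
    Dim i As Long
    Dim stHex As String

    stIn = ChrW$(&H79C1)                    ' one UTF-16 code unit, U+79C1

    ' LCID 1041 (Japanese) selects code page 932 for the conversion.
    abOut = StrConv(stIn, vbFromUnicode, 1041)

    ' Build a hex dump of the converted bytes.
    For i = LBound(abOut) To UBound(abOut)
        stHex = stHex & Right$("0" & Hex$(abOut(i)), 2) & " "
    Next i

    Debug.Print Trim$(stHex)                ' U+79C1 should appear as 8E 84
End Sub
```

If the dump shows anything other than two bytes for this character, the conversion itself is the place to look; if it shows the expected bytes, the damage happens in a later step.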


--
MichKa

This posting is provided "AS IS" with
no warranties, and confers no rights.


"Jan Opitz" <nex...@SPAMgmx.de> wrote in message
news:O5csQIIXCHA.1912@tkmsftngp09...
> After days of searching the web and trying much of
> what has been written about WideToMulti conversion
> I am quite desperate.
>
> A TTS-Object in my app needs Shift-JIS input and
> I did not find a way to convert Unicode (e.g. something like chrw(&h79c1)
> in the example below) to Shift-JIS.
>
> When I feed the object with a Shift-JIS string (loaded binary from a file,
> which has been saved as Shift-JIS coded text with MS Word) it works fine.
>
> I remember Michka wrote
> 'But it is easy to call MultiByteToWideChar with code page 65001 to move
> 'from UTF-8 to UTF-16, and then call WideCharToMultiByte with code page 949
> 'to move from UTF-16 to code page 949. This will take care of the Korean
> 'Win9x case.
> in a recent posting to this NG. So this is what I tried finally (using his
> InCodePage.bas module and cp 932 instead of 949) - no success:
>
> What am I doing wrong?
>
> Jan Opitz
>
> (OS is w2000 sp3, Codepage 1252, locale 1031.)
>
>
> Code: -----------------------------------------------------------------
>
> Private Sub TTSspeak
>
> Dim Buffer() As Byte
> Dim sText as String
>
> Buffer = WToA(ChrW(&H79C1), 65001)
> Buffer = WToA(buffer, 932)
>
> 'same result with
> 'sText = WToA(ChrW(&H79C1), 65001)
> 'sText = WToA(sText, 932)
> 'or mixture - which may be nonsense
> 'buffer = WToA(ChrW(&H79C1), 65001)
> ' sText = buffer
> 'buffer = WToA(sText, 932)
>
>
> TTS.speak buffer, param
>
> 'or TTS.speak sText, param
>
> end sub
>
> Public Declare Function MultiByteToWideChar Lib "kernel32" _
> (ByVal codepage As Long, _
> ByVal dwFlags As Long, _
> ByVal lpMultiByteStr As Long, _
> ByVal cchMultiByte As Long, _
> ByVal lpWideCharStr As Long, _
> ByVal cchWideChar As Long) As Long
>
> Public Declare Function WideCharToMultiByte Lib "kernel32" _
> (ByVal codepage As Long, _
> ByVal dwFlags As Long, _
> ByVal lpWideCharStr As Long, _
> ByVal cchWideChar As Long, _
> ByVal lpMultiByteStr As Long, _
> ByVal cchMultiByte As Long, _
> ByVal lpDefaultChar As Long, _
> lpUsedDefaultChar As Long) As Long
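>
> ' GetACP is called by AToW and WToA below but was not declared in the
> ' posted excerpt.
> Public Declare Function GetACP Lib "kernel32" () As Long
>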
> ' AToW ANSI to UNICODE conversion, via a given codepage.
> Public Function AToW(ByVal st As String, Optional ByVal cpg As Long = -1, _
>     Optional lFlags As Long = 0) As String
> Dim stBuffer As String
> Dim cwch As Long
> Dim pwz As Long
> Dim pwzBuffer As Long
> If cpg = -1 Then cpg = GetACP()
> pwz = StrPtr(st)
> cwch = MultiByteToWideChar(cpg, lFlags, pwz, -1, 0&, 0&)
> stBuffer = String$(cwch + 1, vbNullChar)
> pwzBuffer = StrPtr(stBuffer)
> cwch = MultiByteToWideChar(cpg, lFlags, pwz, -1, pwzBuffer, _
>     Len(stBuffer))
> AToW = Left$(stBuffer, cwch - 1)
> End Function
>
> ' WToA UNICODE to ANSI conversion, via a given codepage
> Public Function WToA(ByVal st As String, Optional ByVal cpg As Long = -1, _
>     Optional lFlags As Long = 0) As String
> Dim stBuffer As String
> Dim cwch As Long
> Dim pwz As Long
> Dim pwzBuffer As Long
> Dim lpUsedDefaultChar As Long
>
> If cpg = -1 Then cpg = GetACP()
> pwz = StrPtr(st) ' ln(st)
> cwch = WideCharToMultiByte(cpg, lFlags, pwz, -1, 0&, 0&, ByVal 0&, _
>     ByVal 0&)
> stBuffer = String$(cwch + 1, vbNullChar)
> pwzBuffer = StrPtr(stBuffer)
> cwch = WideCharToMultiByte(cpg, lFlags, pwz, -1, pwzBuffer, _
>     Len(stBuffer), ByVal 0&, ByVal 0&)
> WToA = Left$(stBuffer, cwch - 1)
> End Function
>
>


Jan Opitz

Sep 15, 2002, 2:35:32 AM

Jan Opitz

Sep 15, 2002, 4:35:15 AM
Thank you for your immediate (!) answer on a Sunday.
I did try StrConv, but it failed to produce the format I need.

What else could be done to convert any string produced
by chrw(&h....) to Shift-JIS?

Jan

"Michael (michka) Kaplan [MS]" <mic...@online.microsoft.com> wrote in message news:u53fxNIXCHA.2452@tkmsftngp09...


Jan Opitz

Sep 15, 2002, 5:15:28 AM
Sorry, there is a typo in my posting.
This part of the code should of course read

> Buffer = WToA(ChrW(&H79C1), 65001)
> Buffer = AToW (buffer, 932)
'and also WtoA and AtoW with sText

but I had no success with these either.

Jan.


"Michael (michka) Kaplan [MS]" <mic...@online.microsoft.com> wrote in message news:u53fxNIXCHA.2452@tkmsftngp09...

Jan Opitz

Sep 15, 2002, 9:06:54 AM
Let me put it another way:

How can I - with VB - convert a string produced by chrw(xxxxx ) so that its
bytes are equal to that of a binary file made by saving the same
character as Shift-JIS encoded text with MS Word?

The latter works fine as input to my TTS object,
but I found no way to convert the ChrW(xxxxx) result to the
same character coding. StrConv() does not help and I had no
luck with WtoA.

Jan

Michael (michka) Kaplan [MS]

Sep 15, 2002, 11:18:33 AM
You need to take UTF-8 out of the code -- UTF-8 is not needed here.

Just use StrConv alone and if you call it properly you will get the right
bytes:

stOut = StrConv(stIn, vbFromUnicode, 1041)

Once you do this, if stIn is your Unicode string then stOut will be bytes in
code page 932.

Of course, if you are still having problems you will need to explain the
method you are using to look at the bytes -- perhaps they are being
converted/corrupted later?
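
One way to look at the bytes without VB converting them along the way is to read the file into a Byte array and dump it as hex. This is only a sketch: FileBytesAsHex is a hypothetical helper name, not something from this thread, and it does no error handling for a missing file.

```vb
' Hypothetical helper (not from the thread): returns the raw bytes of a file
' as a hex string, with no ANSI/Unicode conversion applied by VB.
Public Function FileBytesAsHex(ByVal stPath As String) As String
    Dim iFile As Integer
    Dim ab() As Byte
    Dim i As Long
    Dim stHex As String

    iFile = FreeFile
    Open stPath For Binary Access Read As #iFile
    If LOF(iFile) > 0 Then
        ReDim ab(0 To LOF(iFile) - 1)
        Get #iFile, , ab                ' Byte array: bytes arrive unchanged
        For i = LBound(ab) To UBound(ab)
            stHex = stHex & Right$("0" & Hex$(ab(i)), 2) & " "
        Next i
    End If
    Close #iFile

    FileBytesAsHex = Trim$(stHex)
End Function
```

Comparing this output with a hex dump of the StrConv result shows whether the two byte sequences really differ. Reading the same file into a String variable instead makes VB convert the bytes to Unicode using the system code page, so the two views of one file will not match.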


--
MichKa

This posting is provided "AS IS" with
no warranties, and confers no rights.

"Jan Opitz" <nex...@SPAMgmx.de> wrote in message

news:ejJzvgJXCHA.1828@tkmsftngp08...

Jan Opitz

Sep 15, 2002, 2:13:14 PM
MichKa, I came closer to the problem:
I checked the code which loads the Shift-JIS file:

It was
dim loadShift as string
FileNr = FreeFile
Open Path For Binary As #FileNr
loadShift = Space$(LOF(FileNr))
Get #FileNr, , loadShift
Close #FileNr

When I fed the TTS object with this string, it worked fine
(it was saved as Shift-JIS encoded text).

Now I changed the loadshift variable type to a byte variable and
SURPRISE: the values are different from those in the
string variable, but exactly the same values as I would find
in a string variable filled with the chrw(code) of the char in the
file. These values are NOT accepted by the TTS object.

So what I have to do is to try to convert the chrw(code) result
(and other text strings in the same format) in the
same way as VB converted the file data. But how???

Jan



Jan Opitz

Sep 15, 2002, 2:48:43 PM
MichKa, I came closer to the problem:
I checked the code which loads the Shift-JIS file:

It was
dim LoadShift as string

FileNr = FreeFile
Open Path For Binary As #FileNr
LoadShift = Space$(LOF(FileNr))
Get #FileNr, , LoadShift
Close #FileNr

Only when I feed the TTS object with this LoadShift string does it work
fine (the file was saved as Shift-JIS encoded text).

Now I changed the LoadShift variable type to a byte variable and
SURPRISE: the values are different from those in the
string variable, but exactly the same values as I would find
in a string variable filled with the chrw(code) of the char in the
file and converted with StrConv.
So StrConv(ChrW(code),VbFromUnicode,1041) would have
the same values as the byte variable LoadShift.

These values are NOT accepted by the TTS object.

So what I have to do is to try to convert the chrw(code) result
(and other text strings in the same format) in the
same way as VB converted the file data. But how???

Quite difficult to follow - I feel like doing circles in a labyrinth

Jan

Michael (michka) Kaplan [MS]

Sep 15, 2002, 8:56:24 PM
It sounds like the file may be in UTF-16 format? If the bytes are the same
as a UTF-16 string without conversion, that is what is happening here.
Shift-JIS is not involved, neither is UTF-8.

As for converting the files, you have not explained who is converting and
when.

--
MichKa

This posting is provided "AS IS" with
no warranties, and confers no rights.


"Jan Opitz" <nex...@SPAMgmx.de> wrote in message

news:uXaZHhOXCHA.2372@tkmsftngp12...

Jan Opitz

Sep 15, 2002, 10:40:40 PM
I finally got it: I applied StrConv to the chrw() string and then made the
result WideChar with AtoW. This is exactly what was done by VB when I loaded
the presumably Shift-JIS file to a string.

Thank you again!
Jan

Michael (michka) Kaplan [MS]

Sep 16, 2002, 3:10:57 AM
You do not need to use AtoW. Try using StrConv with the vbUnicode flag where
you were doing that.
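
A sketch of what that replacement might look like. It rests on an assumption the thread does not confirm: that the AtoW step was called without a code page, so it used the system ANSI code page through GetACP. The function name SjisBytesAsWideString is hypothetical.

```vb
' Hypothetical helper (not from the thread).
' Assumption: the AtoW call being replaced used the system ANSI code page.
Public Function SjisBytesAsWideString(ByVal stIn As String) As String
    Dim abSjis() As Byte

    ' Step 1: UTF-16 text to code page 932 bytes (LCID 1041 = Japanese).
    abSjis = StrConv(stIn, vbFromUnicode, 1041)

    ' Step 2: convert the bytes to a VB string. With the LCID omitted,
    ' StrConv uses the system locale, mirroring AtoW's default code page.
    SjisBytesAsWideString = StrConv(abSjis, vbUnicode)
End Function
```

Which locale the second step uses matters. On a system whose ANSI code page is 932 it returns the original text, while on a code page 1252 system each Shift-JIS byte becomes a separate character.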

You only need to use AtoW and WtoA when you need a code page like 65001 that
is not associated with a locale.


--
MichKa

This posting is provided "AS IS" with
no warranties, and confers no rights.


"Jan Opitz" <nex...@SPAMgmx.de> wrote in message

news:uLs7GnUXCHA.1748@tkmsftngp08...

Michael (michka) Kaplan [MS]

Sep 16, 2002, 3:12:24 AM
ALSO, please note that you basically converted something for no reason --
ChrW produces Unicode text. You essentially converted the Unicode text to
cp932 and then converted that text back to Unicode.

In other words, you did not have to do any of it!


--
MichKa

This posting is provided "AS IS" with
no warranties, and confers no rights.

"Jan Opitz" <nex...@SPAMgmx.de> wrote in message

news:uLs7GnUXCHA.1748@tkmsftngp08...

Jan Opitz

Sep 18, 2002, 1:03:20 AM
Sure, I will have to find out why it works (only?) when double-converted the way I described.
Also, I will have to find out why the same input of chrw(code) is accepted
without any conversion under a Japanese system default locale, while it is not under 1031.
I am just too ignorant of all the theoretical background.
Unfortunately, I am abroad and do not have your book here with me.
Thank you for your guidance.
Jan


"Michael (michka) Kaplan [MS]" <mic...@online.microsoft.com> wrote in message news:#YAUEBVXCHA.2416@tkmsftngp09...

Michael (michka) Kaplan [MS]

Sep 18, 2002, 11:28:51 AM
Well, for starters, how about trying real strings rather than single
characters.....

I think you will see what you are doing here if you call the AscW function
on the string after you have converted and then converted back -- you will
have the original character! :-)

But as a rule, AtoW and WtoA are *never* needed for anything but UTF-8 and
other code pages not covered as ACPs for various locales. The intrinsic
StrConv function is roughly twice as fast as any declare statement.
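
The AscW check can be sketched as below, assuming both conversions go through the same Japanese locale (LCID 1041) and that code page 932 is installed. RoundTripCheck is a hypothetical name, not from this thread.

```vb
' Hypothetical check (not from the thread).
Public Sub RoundTripCheck()
    Dim stIn As String
    Dim stBack As String

    stIn = ChrW$(&H79C1)

    ' UTF-16 -> code page 932 -> UTF-16, both steps through LCID 1041.
    stBack = StrConv(StrConv(stIn, vbFromUnicode, 1041), vbUnicode, 1041)

    ' Print the code point before and after, then whether the strings match.
    Debug.Print Hex$(AscW(stIn)), Hex$(AscW(stBack))
    Debug.Print (stIn = stBack)
End Sub
```

If the two code points match, the pair of conversions has changed nothing, which is the point being made about converting and then converting back.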


--
MichKa

This posting is provided "AS IS" with
no warranties, and confers no rights.

"Jan Opitz" <nex...@SPAMgmx.de> wrote in message

news:uCnJWOtXCHA.720@tkmsftngp12...
