Unicode test


Antonio Linares

Sep 30, 2011, 8:18:40 AM
to Harbour Developers
I am running this small test, but the message box does not show the
original string. Any hints? Thanks.

Function Main()

Local cUnicodeText := "RED TEA 华语/華語"

MsgInfoU( cUnicodeText )

RETURN NIL

#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>

HB_FUNC( MSGINFOU )
{
MessageBoxW( GetActiveWindow(), ( WCHAR * ) hb_parc( 1 ),
L"Information", 0x40 );
}

#pragma ENDDUMP

Antonio

wen....@gmail.com

Sep 30, 2011, 8:42:29 AM
to harbou...@googlegroups.com

You must save the program code in UTF-8 encoding.

--
WenSheng <wen....@gmail.com>

Lucas De Beltrán

Sep 30, 2011, 9:23:13 AM
to Harbour Developers
Yes, I am saving the file as UTF-8 from UEStudio, but without success.



Bacco

Sep 30, 2011, 10:42:54 AM
to harbou...@googlegroups.com
Try with

Request HB_CODEPAGE_UTF8
hb_cdpSelect( 'UTF8' )

(assuming you are using UTF-8; Unicode by itself does not imply any
particular encoding, it is just the character table)

Also, I don't remember whether the Windows API accepts UTF-8 directly.
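
To illustrate the difference between the character table and an encoding, here is a minimal portable C sketch that encodes the same code point in two ways. The helper names cp_to_utf8() and cp_to_utf16le() are hypothetical and not part of Harbour or the Windows API, and the sketch only covers the Basic Multilingual Plane (no surrogate pairs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one BMP code point as UTF-8.
   Returns the number of bytes written (1 to 3). */
size_t cp_to_utf8( uint32_t cp, unsigned char * out )
{
   /* This sketch handles the BMP only and rejects surrogate values. */
   assert( cp <= 0xFFFF && ( cp < 0xD800 || cp > 0xDFFF ) );

   if( cp < 0x80 )
   {
      out[ 0 ] = ( unsigned char ) cp;
      return 1;
   }
   if( cp < 0x800 )
   {
      out[ 0 ] = ( unsigned char ) ( 0xC0 | ( cp >> 6 ) );
      out[ 1 ] = ( unsigned char ) ( 0x80 | ( cp & 0x3F ) );
      return 2;
   }
   out[ 0 ] = ( unsigned char ) ( 0xE0 | ( cp >> 12 ) );
   out[ 1 ] = ( unsigned char ) ( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
   out[ 2 ] = ( unsigned char ) ( 0x80 | ( cp & 0x3F ) );
   return 3;
}

/* Hypothetical helper: encode the same code point as UTF-16LE.
   In the BMP this is always two bytes, low byte first. */
size_t cp_to_utf16le( uint32_t cp, unsigned char * out )
{
   assert( cp <= 0xFFFF && ( cp < 0xD800 || cp > 0xDFFF ) );

   out[ 0 ] = ( unsigned char ) ( cp & 0xFF );
   out[ 1 ] = ( unsigned char ) ( cp >> 8 );
   return 2;
}
```

For U+534E (the first Chinese character in the test string), the first helper yields the three bytes E5 8D 8E and the second yields the two bytes 4E 53. The character is the same; only the byte representation differs.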


Regards,
Bacco

Bacco

Sep 30, 2011, 10:44:30 AM
to harbou...@googlegroups.com
PS: There is also a UTF16LE codepage in the codepage folder that you could try.

Antonio Linares

Sep 30, 2011, 1:36:09 PM
to Harbour Developers
Please notice that in C language this code works fine (saved as
Unicode ASCII escaped):

WCHAR * cUniStr = L"RED TEA 华语/華語";

MessageBoxW( 0, cUniStr, L"Ok", 0 );

I wonder how to get the same result with Harbour. Here is the test:

Function Main()

local cUnicodeText := "RED TEA 华语/華語"

MsgInfoU( cUnicodeText )

TestOk()

RETURN NIL

#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>

HB_FUNC( MSGINFOU )
{
MessageBoxW( GetActiveWindow(), ( WCHAR * ) hb_parc( 1 ),
L"Information", 0x40 );
}

HB_FUNC( TESTOK )
{
WCHAR * cUniStr = L"RED TEA 华语/華語";

MessageBoxW( 0, cUniStr, L"Ok", 0 );
}

#pragma ENDDUMP

Antonio

Antonio Linares

Sep 30, 2011, 2:00:08 PM
to Harbour Developers
If I use a C function to declare the string, then this example works
fine. I wonder how to get the same effect without the need for the C
function, thanks.
(saved as Unicode ASCII escaped)

function Main()

local cUnicodeText := GetStr()

MsgInfoU( cUnicodeText )

return nil

#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>

HB_FUNC( MSGINFOU )
{
MessageBoxW( GetActiveWindow(), ( WCHAR * ) hb_parc( 1 ),
L"Information", 0x40 );
}

HB_FUNC( GETSTR )
{
hb_retclen( ( char * ) L"RED TEA 华语/華語", 26 );
}

#pragma ENDDUMP

Antonio

Bacco

Sep 30, 2011, 3:36:14 PM
to harbou...@googlegroups.com
Hi, Antonio

Have you tried the


Request HB_CODEPAGE_UTF8
hb_cdpSelect( 'UTF8' )

sent before?


Regards,
Bacco

Antonio Linares

Sep 30, 2011, 4:20:32 PM
to Harbour Developers
Bacco,

Yes, this is the test, and it does not work :-(

Request HB_CODEPAGE_UTF8

function Main()

local cUnicodeText := "RED TEA 华语/華語"

hb_cdpSelect( 'UTF8' )

Bacco

Sep 30, 2011, 4:39:21 PM
to harbou...@googlegroups.com
Hi, Antonio

Can you attach your sample? Even in my mail client your string on the
sample doesn't show correctly, so as attachment I believe I can do
better testing. Please, use some accented characters in the tests
saved as your desired encoding, and put some comment so we can know
what's happening.

I do use accented and special characters both in UTF8 and win1252 in
some applications, even converting them at some points with no
problem. Send your sample and I'll try to derive something using win32
api.


Regards,
Bacco.

Antonio Linares

Sep 30, 2011, 4:46:42 PM
to Harbour Developers
Bacco,

Here it is, saved as Unicode ASCII escaped. Please let me know if
you are able to build it. You should get a "RED TEA ..." result,
thanks!

request HB_CODEPAGE_UTF8

function Main()

local cUnicodeText := "RED TEA \u534E\u8BED/\u83EF\u8A9E"

hb_cdpSelect( 'UTF8' )

MsgInfoU( cUnicodeText )

MsgInfoU( GetStr() )

return nil

#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>

HB_FUNC( MSGINFOU )
{
MessageBoxW( GetActiveWindow(), ( WCHAR * ) hb_parc( 1 ),
L"Information", 0x40 );
}

HB_FUNC( GETSTR )
{
hb_retclen( ( char * ) L"RED TEA \u534E\u8BED/\u83EF\u8A9E", 26 );
}

#pragma ENDDUMP

Antonio

Bacco

Sep 30, 2011, 5:03:07 PM
to harbou...@googlegroups.com
Hi, Antonio

I would prefer an attachment so I can also check your encoding. Simply
providing the Unicode code points will not help; we need to know which
encoding you are using so we can start from the right place. Unicode is
usually encoded as UTF-8, UTF-16LE or UTF-16BE.

Also, reading the Win32 API documentation, I noticed that the W
(wide) functions expect 16-bit characters, so the original Harbour
string would probably need to be stored as wide characters. We are in
fact dealing with more than one problem at the same time.

Converting the Harbour string to UTF-16 before using it would probably
be much simpler.
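
As a rough illustration of what such a conversion involves, here is a portable C sketch that turns UTF-8 bytes into UTF-16LE bytes. The function name utf8_to_utf16le() is hypothetical; it is not a Harbour or Windows function. It is only a sketch: it rejects malformed or truncated sequences but does not reject overlong encodings, and it does not append a terminating wide NUL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF-8 bytes to UTF-16LE bytes.
   Returns the number of bytes written to dst, or (size_t) -1 on
   malformed input or when dst is too small. */
size_t utf8_to_utf16le( const unsigned char * src, size_t srclen,
                        unsigned char * dst, size_t dstcap )
{
   size_t i = 0, o = 0;

   while( i < srclen )
   {
      uint32_t cp;
      size_t extra, k;
      unsigned char c = src[ i++ ];

      /* The lead byte tells how many continuation bytes follow. */
      if( c < 0x80 )                  { cp = c;        extra = 0; }
      else if( ( c & 0xE0 ) == 0xC0 ) { cp = c & 0x1F; extra = 1; }
      else if( ( c & 0xF0 ) == 0xE0 ) { cp = c & 0x0F; extra = 2; }
      else if( ( c & 0xF8 ) == 0xF0 ) { cp = c & 0x07; extra = 3; }
      else
         return ( size_t ) -1;        /* invalid lead byte */

      if( extra > srclen - i )
         return ( size_t ) -1;        /* truncated sequence */

      for( k = 0; k < extra; k++ )
      {
         unsigned char cc = src[ i++ ];
         if( ( cc & 0xC0 ) != 0x80 )
            return ( size_t ) -1;     /* not a continuation byte */
         cp = ( cp << 6 ) | ( cc & 0x3F );
      }
      if( cp > 0x10FFFF || ( cp >= 0xD800 && cp <= 0xDFFF ) )
         return ( size_t ) -1;        /* outside Unicode or a surrogate */

      if( cp < 0x10000 )
      {
         /* BMP: one 16-bit unit, low byte first. */
         if( dstcap - o < 2 )
            return ( size_t ) -1;
         dst[ o++ ] = ( unsigned char ) ( cp & 0xFF );
         dst[ o++ ] = ( unsigned char ) ( cp >> 8 );
      }
      else
      {
         /* Above the BMP: a surrogate pair, each unit low byte first. */
         uint32_t v = cp - 0x10000;
         uint32_t hi = 0xD800 | ( v >> 10 );
         uint32_t lo = 0xDC00 | ( v & 0x3FF );
         if( dstcap - o < 4 )
            return ( size_t ) -1;
         dst[ o++ ] = ( unsigned char ) ( hi & 0xFF );
         dst[ o++ ] = ( unsigned char ) ( hi >> 8 );
         dst[ o++ ] = ( unsigned char ) ( lo & 0xFF );
         dst[ o++ ] = ( unsigned char ) ( lo >> 8 );
      }
   }
   return o;
}
```

A caller that wants to pass the result to a wide Windows function would also need to append two zero bytes as the terminator.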

Antonio Linares

Sep 30, 2011, 5:06:09 PM
to Harbour Developers
Bacco,

You can download it from here: http://harbour-and-xharbour-builds.googlecode.com/files/richard.prg

thanks!

Antonio

Bacco

Sep 30, 2011, 11:04:22 PM
to harbou...@googlegroups.com
Hi, Antonio

I tested it and it compiles fine, but it cannot work correctly as written.

First, when you define cUnicodeText, it is a normal 8-bit string,
which the Win32 wide functions do not recognize.
If you do
local cUnicodeText := "R"+Chr(0)+"E"+Chr(0)+"D"+Chr(0)+" "+Chr(0)+ ;
                      "T"+Chr(0)+"E"+Chr(0)+"A"+Chr(0)+" "+Chr(0)+ ;
                      Chr(0xAC)+Chr(0x20)

this part starts to work (of course this is not a solution, just a
test). Also, it seems to me that "\u" has no special meaning in
Harbour code, as it does in the C part. Note also that
Chr(0xAC)+Chr(0x20), which means U+20AC (the euro sign), works properly too.
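
The following portable C sketch shows why the Chr(0) padding matters. It reads a byte buffer the way a UTF-16LE consumer such as a wide Windows function would, two bytes per code unit with the low byte first. The function name bytes_as_utf16le() is hypothetical and only serves the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interpret raw bytes as UTF-16LE code units.
   Writes len / 2 units and returns that count; a trailing odd byte
   is ignored. */
size_t bytes_as_utf16le( const unsigned char * bytes, size_t len,
                         uint16_t * units )
{
   size_t n = len / 2, i;

   for( i = 0; i < n; i++ )
   {
      /* Low byte first, then high byte. */
      units[ i ] = ( uint16_t ) ( bytes[ 2 * i ] |
                                  ( bytes[ 2 * i + 1 ] << 8 ) );
   }
   return n;
}
```

Fed the four 8-bit bytes of "RED ", this produces the two units 0x4552 and 0x2044, which are unrelated characters. Fed "R", 0, "E", 0, 0xAC, 0x20, it produces 0x0052, 0x0045 and 0x20AC, that is "R", "E" and the euro sign.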

Can you tell me which encodings your text editor can save files in? If it
supports UTF-8, you can simply use this:

request HB_CODEPAGE_UTF8 // <- should be your editor format
request HB_CODEPAGE_UTF16LE // <- this is the win32 W format

function Main()
local cUnicodeText := "Type here the characters, without escaping. Á É Í Ó Ú á é í ó ú"

MsgInfoU( hb_translate( cUnicodeText,'UTF8','UTF16LE' ) )

MsgInfoU( GetStr() )

return nil
...

I'll get into the C part later.


Regards,
Bacco

Bacco

Sep 30, 2011, 11:09:45 PM
to harbou...@googlegroups.com
PS:
Use BOTH requests, to provide the necessary pages to the translate function.

request HB_CODEPAGE_UTF8
request HB_CODEPAGE_UTF16LE

wen....@gmail.com

Sep 30, 2011, 11:33:50 PM
to harbou...@googlegroups.com

The program source is saved as UTF-8, so the string that goes from the
.prg into the generated .c file is UTF-8, not UTF-16!

Windows Unicode is UTF-16, so you must write a UTF-8 to UTF-16
conversion function; then you can display the UTF-16 text with MessageBoxW.



--
WenSheng <wen....@gmail.com>

wen....@gmail.com

Oct 1, 2011, 12:26:14 AM
to harbou...@googlegroups.com

This sample is based on HWGUI.


Function Main()

Local cUnicodeText := ""

HB_SETUTF8()
cUnicodeText := "RED TEA 华语/華語"

MsgInfoU( cUnicodeText )

RETURN NIL

#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>
#include <hbapiitm.h>
#include "hbapicdp.h"

static int s_iVM_CP = CP_ACP; /* CP_OEMCP */
static const wchar_t s_wszConstStr[ 1 ] = { 0 };
extern HB_EXPORT void hb_strfree( void * hString );
#define HB_PARSTR( n, h, len ) hwg_wstrget( hb_param( n, HB_IT_ANY ), h, len )


const wchar_t * hwg_wstrget( PHB_ITEM pItem, void ** phStr, HB_SIZE * pnLen )
{
   const wchar_t * pStr;

   if( pItem && HB_IS_STRING( pItem ) )
   {
      HB_SIZE nLen = hb_itemGetCLen( pItem ), nDest = 0;
      const char * pszText = hb_itemGetCPtr( pItem );

      if( nLen )
         nDest = MultiByteToWideChar( s_iVM_CP, 0, pszText, nLen, NULL, 0 );

      if( nDest == 0 )
      {
         *phStr = ( void * ) s_wszConstStr;
         pStr = s_wszConstStr;
      }
      else
      {
         wchar_t * pResult = ( wchar_t * ) hb_xgrab( ( nDest + 1 ) * sizeof( wchar_t ) );

         pResult[ nDest ] = 0;
         nDest = MultiByteToWideChar( s_iVM_CP, 0, pszText, nLen, pResult, nDest );
         *phStr = ( void * ) pResult;
         pStr = pResult;
      }
      if( pnLen )
         *pnLen = nDest;
   }
   else
   {
      *phStr = NULL;
      pStr = NULL;
      if( pnLen )
         *pnLen = 0;
   }
   return pStr;
}

HB_FUNC( MSGINFOU )
{
   void * hText;
   MessageBoxW( GetActiveWindow(), HB_PARSTR( 1, &hText, NULL ), L"Information", 0x40 );
   hb_strfree( hText );
}

HB_FUNC( HB_SETUTF8 )
{
   s_iVM_CP = CP_UTF8;
}

#pragma ENDDUMP


--
WenSheng <wen....@gmail.com>

testu.zip

Bacco

Oct 1, 2011, 12:53:19 AM
to harbou...@googlegroups.com
Hi, WenSheng

Also give this code a try. It works fine using the Harbour
functions to convert codepages instead of an extra C conversion:

request HB_CODEPAGE_UTF8
request HB_CODEPAGE_UTF16LE

function Main()
local cUnicodeText := "Type here the characters, without escaping. Á É Í Ó Ú á é í ó ú"

MsgInfoU( hb_translate( cUnicodeText,'UTF8','UTF16LE' ) )


Regards,
Bacco.

wen....@gmail.com

Oct 1, 2011, 3:13:44 AM
to harbou...@googlegroups.com

That way correctly displays Traditional Chinese and Simplified Chinese characters.
But I would have to add hb_translate() calls throughout the code.


Przemysław Czerpak

Oct 1, 2011, 5:26:49 AM
to harbou...@googlegroups.com
On Sat, 01 Oct 2011, wen....@gmail.com wrote:

Hi,

> The sample is reference HWGUI.
> Function Main()
> Local cUnicodeText := ""
> HB_SETUTF8()

> cUnicodeText := "RED TEA 华语/華語"


> MsgInfoU( cUnicodeText )
> RETURN NIL
>
> #pragma BEGINDUMP
> #include <windows.h>
> #include <hbapi.h>
> #include <hbapiitm.h>
> #include "hbapicdp.h"
> static int s_iVM_CP = CP_ACP; /* CP_OEMCP */
> static const wchar_t s_wszConstStr[ 1 ] = { 0 };
> extern HB_EXPORT void hb_strfree( void * hString );
> #define HB_PARSTR( n, h, len ) hwg_wstrget( hb_param( n, HB_IT_ANY ), h, len )
>
> const wchar_t * hwg_wstrget( PHB_ITEM pItem, void ** phStr, HB_SIZE * pnLen )

I added the hwg_wstr*() functions to HWGUI only for compatibility with
xHarbour. Harbour has native support for a UNICODE string API inside the
HVM, so they are unnecessary here.
Harbour's own code uses the HB STR API, so it is full of examples; for
instance, look at WAPI_MESSAGEBOX() in contrib/hbwin/wapi_winuser.c. That
code works correctly whether or not it is compiled with the UNICODE macro,
with any character-based user encoding set by _SET_CODEPAGE or
hb_cdpSelect(), e.g. UTF8 or BIG5. Please also note that even when compiled
without the UNICODE macro it makes the necessary translations to the
codepage set by _SET_OSCODEPAGE. This means that Harbour does not need
functions like OEMTOANSI() or ANSITOOEM(), or any other CP modifications,
in user code. A correctly written user interface using the HB_STR*() API
works with any encoding, whether it is a CUI or a GUI library.
All Harbour code uses the HB STR API, and so do some 3rd-party GUI
libraries such as HWGUI. I hope that all other 3rd-party code will be
adapted to the new interface, which also gives some additional features:
it is an MT-safe API that allows simultaneous item assignment to be
introduced, and code using it will work correctly without any
modifications with future HVM releases that have native support for
UNICODE string items.

best regards,
Przemek
