I'm trying to convert a UTF-8 character sequence into wchar_t using the
MultiByteToWideChar API call with CP_UTF8 as the code page argument.
The resulting wchar_t data is not correct. Are there any special
requirements for using this API?
Also, is there a simple function that can do this, preferably using
std::codecvt? Examples are appreciated!
Regards,
Bill
Yes, providing some more details or some source code when you post to a
newsgroup :).
> And also, is there any simple function that can do that? Preferably using
> std::codecvt. Examples are appreciated!
I put some code together (attached) to convert between UTF-8 and TCHAR, to
be used by the standard locale functions. It's pretty evil.
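Here's an example of its use in loading a UTF-8 encoded text file:
std::basic_ifstream<TCHAR> str;
CharSetFacet *the_facet = new CharSetFacet(CP_UTF8);
// locale::empty() and _ADDFAC are specific to the Visual C++ library.
std::locale loc(std::locale::empty());
// Imbue before opening, so the facet is in place when reading starts.
str.imbue(std::_ADDFAC(loc, the_facet));
// T2CA comes from ATL and needs USES_CONVERSION in scope.
str.open(T2CA(m_filename.c_str()));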
--
Tim Robinson (MVP, Windows SDK)
http://www.themobius.co.uk/
class CharSetFacet : public std::codecvt<TCHAR, char, mbstate_t>
{
protected:
UINT m_cp;
CPINFO m_info;
typedef TCHAR from_type;
typedef char to_type;
public:
enum { id = 1 };
CharSetFacet(UINT cp)
{
m_cp = cp;
GetCPInfo(cp, &m_info);
}
protected:
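result do_in(state_type& _State,
const to_type *first1, const to_type *last1, const to_type *& next1,
from_type *first2, from_type *last2, from_type *& next2) const
{
// Report no progress unless a conversion below succeeds.
next1 = first1;
next2 = first2;
if (first1 == last1)
return ok;
// No room for output. (With a zero-length buffer MultiByteToWideChar
// would only report the required size.)
if (first2 == last2)
return partial;
#ifdef UNICODE
// Converts the whole input in one call, so a multibyte sequence split
// across two calls is not handled.
int written = MultiByteToWideChar(m_cp, 0, first1, (int) (last1 - first1),
first2, (int) (last2 - first2));
if (written == 0)
return GetLastError() == ERROR_INSUFFICIENT_BUFFER ? partial : error;
next1 = last1;
next2 = first2 + written;
return ok;
#else
// TCHAR is char here: copy only as much as both ranges can hold.
size_t count = last1 - first1;
if ((size_t) (last2 - first2) < count)
count = last2 - first2;
memcpy(first2, first1, count);
next1 = first1 + count;
next2 = first2 + count;
return next1 == last1 ? ok : partial;
#endif
}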
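// Matches the codecvt::do_out signature (const, with next1/next2 passed
// by reference) so that it overrides the base version.
virtual result do_out(state_type& state,
const from_type *first1, const from_type *last1,
const from_type *& next1,
to_type *first2, to_type *last2, to_type *& next2) const
{
// Report no progress unless a conversion below succeeds.
next1 = first1;
next2 = first2;
if (first1 == last1)
return ok;
if (first2 == last2)
return partial;
#ifdef UNICODE
int written = WideCharToMultiByte(m_cp, 0, first1, (int) (last1 - first1),
first2, (int) (last2 - first2), NULL, NULL);
if (written == 0)
return GetLastError() == ERROR_INSUFFICIENT_BUFFER ? partial : error;
next1 = last1;
next2 = first2 + written;
return ok;
#else
// TCHAR is char here: copy only as much as both ranges can hold.
size_t count = last1 - first1;
if ((size_t) (last2 - first2) < count)
count = last2 - first2;
memcpy(first2, first1, count);
next1 = first1 + count;
next2 = first2 + count;
return next1 == last1 ? ok : partial;
#endif
}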
virtual bool do_always_noconv() const throw()
{
#ifdef UNICODE
// char to wchar_t always needs widening, even for single-byte code pages.
return false;
#else
return true;
#endif
}
virtual int do_max_length() const throw()
{
return m_info.MaxCharSize;
}
virtual int do_encoding() const throw()
{
return 0;
}
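// Returns the number of bytes needed to encode [first1, last1) in the
// target code page; len2 is ignored in the UNICODE build. Note that the
// standard codecvt::do_length takes external (char) pointers instead.
virtual int do_length(state_type& state, const from_type *first1,
const from_type *last1, size_t len2) const throw()
{
#ifdef UNICODE
return WideCharToMultiByte(m_cp, 0, first1, (int) (last1 - first1),
NULL, 0, NULL, NULL);
#else
return (int) len2;
#endif
}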
};
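For comparison, here is a minimal portable sketch of the do_in contract that the stream classes rely on: the facet has to report how far it got through both buffers, and return partial when input is left over. Latin1Facet and make_latin1_locale are hypothetical names made up for this illustration, not part of CharSetFacet. ISO-8859-1 is assumed only because every byte maps to exactly one wchar_t, which keeps the multibyte logic out of the way. Only the input direction is shown.

```cpp
#include <cwchar>
#include <locale>

// Hypothetical facet for illustration only: widens ISO-8859-1 bytes
// one-to-one, so every input byte yields exactly one wchar_t.
class Latin1Facet : public std::codecvt<wchar_t, char, std::mbstate_t>
{
protected:
    result do_in(state_type&,
                 const extern_type* from, const extern_type* from_end,
                 const extern_type*& from_next,
                 intern_type* to, intern_type* to_end,
                 intern_type*& to_next) const override
    {
        while (from != from_end && to != to_end)
            *to++ = static_cast<intern_type>(static_cast<unsigned char>(*from++));
        // Report progress: the caller resumes from these positions.
        from_next = from;
        to_next = to;
        // partial: input is left over, call again with more output room.
        return from == from_end ? ok : partial;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 1; }   // one byte per character
    int do_max_length() const noexcept override { return 1; }
};

// The standard two-argument locale constructor replaces the codecvt facet.
// The locale takes ownership of the pointer.
inline std::locale make_latin1_locale()
{
    return std::locale(std::locale::classic(), new Latin1Facet);
}
```

A stream would pick the facet up through str.imbue(make_latin1_locale()), called before the file is opened.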
The problem is that
std::basic_istringstream<TCHAR> str
is actually defined as
std::basic_istringstream<wchar_t> str
when UNICODE is defined, which makes it impossible for str to .read()
non-wchar_t characters.
How can I feed char* characters into the stream?
Regards,
Bill
"Tim Robinson" <tim.at.gaat.f...@invalid.com> wrote in message news:<bthnnh$7ctcn$1...@ID-103400.news.uni-berlin.de>...
> Yes, providing some more details or some source code when you post to a
> newsgroup :).
>
If you really did want to read or write chars through str, you'd have to
convert them to UCS-2, then call methods on str, which would convert them to
UTF-8 using CharSetFacet.
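
A minimal portable sketch of that suggestion, assuming the char data is UTF-8: widen it first, then hand the result to the wide stream. utf8_to_wide and first_word are hypothetical helpers written for this illustration, not part of CharSetFacet or the Windows API. On Windows, MultiByteToWideChar with CP_UTF8 could do the decoding step instead.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into a std::wstring.
// Produces UTF-16 where wchar_t is 16 bits (Windows), UTF-32 elsewhere.
// Sketch only: it rejects truncated or malformed sequences but does not
// check for overlong encodings.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;   // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        i += extra + 1;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;   // encode as a UTF-16 surrogate pair
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Widen first, then let the wide stream work on wchar_t data only.
std::wstring first_word(const char* utf8)
{
    assert(utf8 != 0);
    std::wistringstream str(utf8_to_wide(utf8));
    std::wstring word;
    str >> word;
    return word;
}
```

The stream itself never sees a char; the conversion happens once, at the boundary.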
--
Tim Robinson (MVP, Windows SDK)
http://www.themobius.co.uk/
"Bill Holt" <mail...@21cn.com> wrote in message
news:74e0d43f.0401...@posting.google.com...