I'm working on a (raw C++) minimal library I've called “stdlib”, a
portable wrapper for the ordinary C++ standard library that
• sets up working Unicode based console i/o for the standard streams,
in particular so they'll work for international text in Windows,
• adds necessary defines for <math.h> to get M_PI etc,
• provides functionality-area headers, e.g. all of i/o,
etc.
I'm doing this first because I realized that it's more fundamental than
the Expressive C++ stuff.
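To make the <math.h> bullet concrete, here is a minimal sketch of the kind of define involved. It assumes the wrapper relies on Visual C++'s documented _USE_MATH_DEFINES macro; what stdlib actually does may differ, and the name tau is only illustrative.

```cpp
// Assumption: a wrapper header along these lines. Visual C++'s <math.h> only
// defines M_PI and friends when _USE_MATH_DEFINES is defined before the
// first inclusion; g++ typically defines them by default.
#ifndef _USE_MATH_DEFINES
#   define _USE_MATH_DEFINES
#endif
#include <math.h>

constexpr double tau = 2*M_PI;      // Illustrative use only.
```

The define has to come before the very first inclusion of <math.h> in the translation unit, which is one reason to put it in a wrapper header that client code includes first.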
For example, the following should work fine with general Unicode text in
Windows:
#include <stdlib/iostream.hpp>
#include <stdlib/string.hpp>
using namespace std;
auto main() -> int
{
cout << "Hi, what’s your name? ";
string name;
getline( cin, name );
cout << "Pleased to meet you, " << name << "!" << endl;
}
The header wrappers used here install custom iostream buffers that do
UTF-8 / UTF-16 conversion via std::codecvt_utf8_utf16<wchar_t>, and
access the Windows console Unicode API directly without dragging in the
<windows.h> header.
And it works nicely with Visual C++, but with g++ I get Chinese or
whatever it is (garbage?), showing in the console as just squares:
[C:\my\dev\libraries\stdlib\examples\hello_world]
> g++ hello_world.cpp && a
䠀椀Ⰰ 眀栀愀琀ᤠ猀 礀漀甀爀 渀愀洀攀㼀 my ÆØÅ-input
倀氀攀愀猀攀搀 琀漀 洀攀攀琀 礀漀甀Ⰰ 洀礀 였�씀ⴀ椀渀瀀甀琀℀
[C:\my\dev\libraries\stdlib\examples\hello_world]
> cl hello_world.cpp /Feb /wd4373 && b
hello_world.cpp
Hi, what’s your name? my ÆØÅ-input
Pleased to meet you, my ÆØÅ-input!
[C:\my\dev\libraries\stdlib\examples\hello_world]
> _
With wide text i/o the thing also works with g++, so it's not the
no-windows.h binding to the API that's at fault, hence my strong
assumption that it's the UTF-8 / UTF-16 conversion.
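The g++ output itself seems to support that assumption. Read as code points, its first three characters are U+4800, U+6900 and U+2C00, which are 'H', 'i' and ',' with the two bytes of each UTF-16 unit swapped. A minimal sketch of that check follows; byte_swapped is a hypothetical helper, not part of the library.

```cpp
#include <cstdint>      // std::uint16_t

// Hypothetical helper: swaps the two bytes of one UTF-16 code unit.
constexpr auto byte_swapped( std::uint16_t const unit )
    -> std::uint16_t
{
    return static_cast<std::uint16_t>( ((unit & 0xFF) << 8) | (unit >> 8) );
}

// The first three characters of the g++ output, as code points.
static_assert( byte_swapped( 0x4800 ) == 'H', "U+4800 is a byte-swapped 'H'" );
static_assert( byte_swapped( 0x6900 ) == 'i', "U+6900 is a byte-swapped 'i'" );
static_assert( byte_swapped( 0x2C00 ) == ',', "U+2C00 is a byte-swapped ','" );
```

If that reading is right, the converted text is correct apart from byte order, which would narrow the problem down to how the conversion facet is configured rather than to the buffer handling.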
The library is header only, at <url:
https://github.com/alf-p-steinbach/stdlib>. The conversion sources are
all in the “workarounds” folder. Possibly the bug resides (or the bugs
reside) in
“source/workarounds/impl/windows_console_io/Byte_to_wide_converter.hpp”,
shown below after the “Codecvt.hpp” header that it includes:
------------------------------------------------------------------------
#pragma once // Source encoding: utf-8 ∩
// #include <stdlib/workarounds/impl/windows_console_io/Codecvt.hpp>
// Copyright © 2017 Alf P. Steinbach, distributed under Boost license 1.0.
#include <codecvt> // std::codecvt_utf8_utf16
namespace stdlib{ namespace impl{ namespace windows_console_io{
using std::codecvt_utf8_utf16;
using Codecvt = codecvt_utf8_utf16<wchar_t>;
using Codecvt_result = decltype( Codecvt::ok );
using Codecvt_state = Codecvt::state_type;
}}} // namespace stdlib::impl::windows_console_io
------------------------------------------------------------------------
------------------------------------------------------------------------
#pragma once // Source encoding: utf-8 ∩
// #include <stdlib/workarounds/impl/windows_console_io/Byte_to_wide_converter.hpp>
// Copyright © 2017 Alf P. Steinbach, distributed under Boost license 1.0.
#include <assert.h> // assert
#include <algorithm> // std::copy
#include <iterator> // std::begin, std::end
#include <stdlib/workarounds/impl/Size.hpp> // Size
#include <stdlib/workarounds/impl/windows_console_io/Codecvt.hpp> // Codecvt, Codecvt_result
#include <stdlib/workarounds/impl/windows_console_io/constants.hpp> // ascii::del
namespace stdlib{ namespace impl{ namespace windows_console_io{
using std::begin;
using std::copy;
using std::end;
class Byte_to_wide_converter
{
public:
static Size constexpr in_buf_size = general_buffer_size;
private:
Codecvt codecvt_{};
Codecvt_state conversion_state_{}; // mb_state
char in_buf_[in_buf_size];
Size n_buffered_ = 0;
auto start_of_buffer() -> char* { return begin( in_buf_ ); }
auto put_position() -> char* { return begin( in_buf_ ) + n_buffered_; }
auto beyond_buffer() -> char const* { return end( in_buf_ ); }
public:
auto n_buffered() const -> Size { return n_buffered_; }
auto available_space() const -> Size { return in_buf_size - n_buffered_; }
void add( Size const n, char const* const chars )
{
assert( n <= available_space() );
copy( chars, chars + n, put_position() );
n_buffered_ += n;
}
auto convert_into( wchar_t* const result, Size const result_size )
-> Size
{
char const* p_next_in = start_of_buffer();
wchar_t* p_next_out = result;
for( ;; )
{
auto const p_start_in = p_next_in;
auto const p_start_out = p_next_out;
auto const result_code = static_cast<Codecvt_result>(
codecvt_.in(
conversion_state_,
p_start_in, put_position(), p_next_in, // begin, end, beyond processed
p_start_out, result + result_size, p_next_out // begin, end, beyond processed
) );
switch( result_code )
{
case Codecvt::ok:
case Codecvt::partial:
case Codecvt::noconv:
{
copy<char const*>( p_next_in, put_position(),
start_of_buffer() );
n_buffered_ = put_position() - p_next_in;
return p_next_out - result;
}
case Codecvt::error:
{
*p_next_out++ = static_cast<wchar_t>( ascii::del );
break; // p_next_in points past the offending byte.
}
default:
{
assert(( "Should never get here.", false ));
throw 0;
break;
}
}
}
}
};
}}} // namespace stdlib::impl::windows_console_io
------------------------------------------------------------------------
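One thing that may be worth trying, on the assumption that g++'s standard library applies big-endian byte order when the facet is given no mode: request little-endian explicitly via the third template parameter of std::codecvt_utf8_utf16. The sketch below is not from the library; Codecvt_le and utf16_from_utf8 are hypothetical names, and whether a given g++ version needs the explicit mode is something I haven't verified.

```cpp
#include <cassert>      // assert
#include <codecvt>      // std::codecvt_utf8_utf16, std::little_endian
#include <string>       // std::string, std::wstring

// Hypothetical alternative to the Codecvt alias above: the same facet, but with
// the byte order requested explicitly instead of relying on the default mode.
using Codecvt_le = std::codecvt_utf8_utf16<wchar_t, 0x10FFFF, std::little_endian>;

// Hypothetical helper: converts a complete UTF-8 string to UTF-16 code units.
// Returns an empty string if the facet reports anything other than `ok`.
inline auto utf16_from_utf8( std::string const& s )
    -> std::wstring
{
    Codecvt_le              codecvt;
    Codecvt_le::state_type  state{};
    // UTF-16 never needs more code units than UTF-8 needs bytes.
    std::wstring            result( s.size(), L'\0' );
    if( s.empty() ) { return result; }

    char const* p_next_in   = nullptr;
    wchar_t*    p_next_out  = nullptr;
    auto const code = codecvt.in(
        state,
        s.data(), s.data() + s.size(), p_next_in,               // begin, end, beyond processed
        &result[0], &result[0] + result.size(), p_next_out      // begin, end, beyond processed
        );
    if( code != Codecvt_le::ok ) { return std::wstring(); }
    assert( p_next_out <= &result[0] + result.size() );
    result.resize( p_next_out - &result[0] );
    return result;
}
```

Calling utf16_from_utf8( "Hi" ) and dumping the code units in hex with both compilers would show directly whether 'H' comes out as 0x0048 or as 0x4800.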
Maybe someone has encountered the same phenomenon? Or maybe someone just
by looking at it can see a glaringly obvious bug? I know I often fail to
see my own bugs, while I can spot others' bugs easily.
Hopefully!
Cheers!
- Alf