ANSICON vs Console2 vs cmd.exe vs chcp 65001 (was: Proposal to deprecate win32console)

Luis Lavena

Nov 7, 2010, 8:38:03 PM

to rubyin...@googlegroups.com

Hello,

In our previous thread, Bosko exposed an issue with Cyrillic text and the
Unicode (65001) codepage that indicates ANSICON needs more work.

Unable to reproduce it at first, I scratched my head until
a considerable amount of hair fell off.

Now, I believe I found what stopped us from having a direct
apples-to-apples comparison.

First, we had different codepages, different console managers and
different versions of Ruby. Not only that, but certain things affected
the codepage detection in my environment.

So:

1) Let's create a Unicode (UTF-8) File to use

https://github.com/luislavena/test-unicode

Look at sample.txt

2) Let's compare what Console2 and plain cmd.exe have to say about this.

Same font (Consolas), same codepage (1252), same result; see 1.png

3) See what Console2 and cmd.exe tell us under UTF-8 (chcp 65001) mode

See the attached 2.png; it seems Console2 is unable to handle UTF-8
encoding properly. This rules it out for the remaining tests.

4) What does Ruby know about our current encoding?

puts Encoding.default_external is telling me US-ASCII for both chcp
1252 and chcp 65001. See 3.png

It seemed that something was altering the results. Inspecting the
environment, I found LC_* variables (LC_ALL) set. Removing them
gave me the proper results:

chcp 1252: Windows-1252
chcp 65001: UTF-8

See 4.png
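
The check in step 4 can be repeated with a few lines of Ruby. This is only a minimal sketch and not part of the test-unicode repository; the helper name `locale_overrides` is made up for illustration, and which variables actually influence a given Ruby build may differ.

```ruby
# Hypothetical helper: collect LC_* variables (LC_ALL, LC_CTYPE, ...) that
# may override the encoding Ruby derives from the console codepage.
def locale_overrides(env = ENV)
  env.select { |name, _value| name.start_with?("LC_") }
end

if __FILE__ == $PROGRAM_NAME
  puts "default_external: #{Encoding.default_external}"
  locale_overrides.each { |name, value| puts "#{name}=#{value}" }
end
```

Run it once under chcp 1252 and once under chcp 65001; if any LC_* lines are printed, unset those variables and compare again.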

5) Get some colors (colors.rb)

So now using UTF-8 (chcp 65001) and win32console, you can see the
output in 5.png

Now, loading ansicon (ansicon -p), you can see the output in 6.png
which contains certain garbage.
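
For readers without the repository at hand, the idea of such a color test is roughly the following. This is not the actual colors.rb, only a minimal sketch; the `colorize` helper and the chosen color codes are assumptions.

```ruby
# Wrap text in an ANSI SGR color sequence and reset the color afterwards.
# 31 = red, 32 = green, 33 = yellow (standard SGR foreground codes).
def colorize(text, code)
  "\e[#{code}m#{text}\e[0m"
end

if __FILE__ == $PROGRAM_NAME
  puts colorize("This is ", 32) + colorize("a test ", 33) + colorize("line", 31)
end
```

With win32console or ANSICON active the segments show up colored; without either, cmd.exe prints the raw escape sequences.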

====

Conclusions:

* Console2 is great for Western/Latin characters, sucks for Unicode
until someone can give some love to it.

* ANSICON needs some encoding love. I emailed the author, and until he
sets up Git and a GitHub account to ease sharing, I took the liberty of
putting the code on GitHub: https://github.com/luislavena/ansicon

You will need to use TDM64 to be able to generate both 32-bit and 64-bit
executables. All my tests used 64-bit processes (except for
Ruby). Please double check that the same things happen under 32-bit.

* Always check your environment for any variable that might be
interfering when testing this kind of thing.

Finally, last but not least, encoding sucks.

IMPORTANT: moving forward

Until ANSICON gets fixed, win32console seems to get the job done.
Cucumber and RSpec will need to be adapted once ANSICON works, so...

I'm opening a bounty for getting ANSICON working with encodings. I'm
giving 100 USD (via PayPal) to whoever manages to adapt ANSICON to
behave properly under this test scenario (sample.txt and colors.rb).
Of course, it doesn't cover all the options, so please feel free to
chime in.

All the code should be put on GitHub as a fork of the repo, and all the
code changes will be sent to the author for inclusion in the
official build.

Cheers,
--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

1.png

2.png

3.png

4.png

5.png

6.png

Boško Ivanišević

Nov 8, 2010, 2:39:26 AM

to rubyin...@googlegroups.com

On Mon, Nov 8, 2010 at 2:38 AM, Luis Lavena <luisl...@gmail.com> wrote:

> Hello,
>
> In our previous thread, Bosko exposed an issue with cyrillic and
> Unicode (65001) codepage that indicates ANSICON needs more work.
>
> Unable to reproduce at the beginning started to scratch my head until
> some considerable amount of hair felt off.
>
> Now, I believe I found what stopped us from having a direct
> apples-to-apples comparison.
>
> First we had different codepages, different console managers and
> different versions of Ruby. Not only that, but certain things affected
> codepage under my environment.
>
> So:
>
> 1) Let's create a Unicode (UTF-8) File to use
>
> https://github.com/luislavena/test-unicode
>
> Look at sample.txt
>
> 2) Let's compare what Console versus direct cmd.exe have to say about this.
>
> Same font (Consolas) same codepage (1252), same result, see 1.png
>
> 3) See what Console vs cmd.exe tell us under UTF-8 (chcp 65001) mode
>
> See attached 2.png, seems Console is unable to handle properly UTF-8
> encoding. This discards its usage moving forward for the other tests.

I think Console is working the same as cmd.exe. See the attached Console_utf8.png. I'm not sure why these differences occur. I tried to reproduce the problem you have with Console, but without success.

> 4) What Ruby knows about our current encoding?
>
> puts Encoding.default_external is telling me US-ASCII for both chcp
> 1252 and chcp 65001. See 3.png
>
> Seems that something was altering the results. Inspecting let me look
> for LC_* variables (LC_ALL) set in the environment. Removal of them
> let me obtain the proper results:
>
> chcp 1252: Windows-1252
> chcp 65001: UTF-8
>
> See 4.png
>
> 5) Get some colors (colors.rb)
>
> So now using UTF-8 (chcp 65001) and win32console, you can see the
> output in 5.png
>
> Now, loading ansicon (ansicon -p), you can see the output in 6.png
> which contains certain garbage.
>
> ====
>
> Conclusions:
>
> * Console2 is great for Western/Latin characters, sucks for Unicode
> until someone can give some love to it.

See above.

> * ANSICON needs some encoding love, I emailed the author and until he
> setup Git and GitHub account to ease sharing, I took the liberty to
> put the code at GitHub: https://github.com/luislavena/ansicon
>
> You will need to use TDM64 to be able to generate both 32 and 64 bits
> executables. All my tests were using 64bits processes (except for
> Ruby). Please double check that same things happen under 32bits

I built ANSICON over the weekend and tried to figure out what is going on. I didn't make significant progress, but I believe I've narrowed down where the problem is. In ANSI.c, the function MyWriteConsoleA converts from multi-byte to wide characters (MultiByteToWideChar), and it seems that after this conversion ParseAndPrintString misinterprets the ANSI control characters and the end of the string, so it prints characters past the end of the buffer. I haven't found why this happens or how to solve it, so if anyone has any idea...

> * Always check your environment for any variable that might be
> interfering when testing this type of things.
>
> Finally, last but not least, encoding sucks.

I couldn't agree more :-)

> IMPORTANT: moving forward
>
> Until ANSICON gets fixed, win32console seems to get the job done.
> Cucumber and RSpec will need to be adapted once ANSICON works, so...

I agree again. Since win32console will stay until ANSICON gets fixed, maybe I should send Aslak an explanation of why Cucumber crashes when the code page is set to 65001.

Finally, I must say you made an excellent summary of all the problems. Great work as usual.

--
Regards,
Boško Ivanišević

Console_utf8.png

Vít Ondruch

Nov 8, 2010, 5:09:16 AM

to rubyin...@googlegroups.com

Hello Luis,

From my point of view, Console2 and ANSICON have the same issue.
They check the console codepage only once, at startup:

https://github.com/luislavena/ansicon/blob/master/ANSI.c#L1103
http://console.git.sourceforge.net/git/gitweb.cgi?p=console/console;a=blob;f=Console/ConsoleView.cpp;h=3e2ae892a2e1ae0b7ed49400481530661990c428;hb=HEAD#l228

By using CHCP you change the expected format of the incoming characters, and
obviously neither Console2 nor ANSICON can react to this change. Maybe it is
a design flaw on their part, since if they always operated in UTF-16 this
problem would never happen, but I am not sure whether that is a reasonable idea :)
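
The effect of a stale codepage can be reproduced in Ruby without any console. This is only an illustration of the mismatch, assuming the tool cached codepage 1252 while the program writes UTF-8; it is not code from Console2 or ANSICON, and `decode_as` is a made-up helper.

```ruby
# Decode raw bytes using the given encoding and return them as UTF-8 text.
def decode_as(bytes, encoding)
  bytes.dup.force_encoding(encoding).encode("UTF-8")
end

# Bytes produced by a program writing UTF-8 (after chcp 65001).
utf8_bytes = "Ésta es una línea".b

# A tool that still believes the console is in codepage 1252 turns every
# two-byte sequence into two separate characters.
puts decode_as(utf8_bytes, "Windows-1252")
# A tool that picks up the new codepage sees the text intact.
puts decode_as(utf8_bytes, "UTF-8")
```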

Vit

On 8.11.2010 2:38, Luis Lavena wrote:

Vít Ondruch

Nov 8, 2010, 5:20:07 AM

to rubyin...@googlegroups.com

BTW, this is also an interesting line:

http://console.git.sourceforge.net/git/gitweb.cgi?p=console/console;a=blob;f=ConsoleHook/ConsoleHandler.cpp;h=175d90efc9c752fa99caa9a36165226c0f0da4d2;hb=HEAD#l190

Is it the ANSI or the wide version? Shouldn't it always be wide, with the
rest of the application adjusted accordingly?

Vit

On 8.11.2010 11:09, Vít Ondruch wrote:

Luis Lavena

Nov 8, 2010, 7:31:35 AM

to rubyin...@googlegroups.com

2010/11/8 Boško Ivanišević <bosko.iv...@gmail.com>:

>
> I think Console is working same as cmd.exe. See attached Console_utf8.png.
> I'm not sure why these changes occur. I tried to reproduce problem you have
> with Console but without any success.

Would you mind telling me which version of Console2 you are using?
2.00.146 x64 is giving me these results.

Also, based on Vit's comments, it seems ANSICON and Console2 only pay
attention to the codepage when they start/initialize.

Would you mind telling me what the default codepages (ACP and OEM)
in your computer's registry are? Also, what does "chcp" tell you when you
first open the console?

Thank you.

Luis Lavena

Nov 8, 2010, 7:34:44 AM

to rubyin...@googlegroups.com

On Sun, Nov 7, 2010 at 10:38 PM, Luis Lavena <luisl...@gmail.com> wrote:
> Hello,
>
>

> * ANSICON needs some encoding love, I emailed the author and until he
> setup Git and GitHub account to ease sharing, I took the liberty to
> put the code at GitHub: https://github.com/luislavena/ansicon
>

Jason has just pushed the official ansicon repository:

https://github.com/adoxa/ansicon

I'm now going to kill mine and use his as the reference.

Boško Ivanišević

Nov 8, 2010, 9:10:12 AM

to rubyin...@googlegroups.com

On Mon, Nov 8, 2010 at 1:31 PM, Luis Lavena <luisl...@gmail.com> wrote:

> 2010/11/8 Boško Ivanišević <bosko.iv...@gmail.com>:
>
> >
> > I think Console is working same as cmd.exe. See attached Console_utf8.png.
> > I'm not sure why these changes occur. I tried to reproduce problem you have
> > with Console but without any success.
>
> Would you mind telling me which version of Console2 are you using?
> Because 2.00.146 x64 bits is giving me these results.

My version is 2.00.144 x86 and the OS is Windows 7 x64. That's obviously the reason why everything works in Console on my computer :-(

> Also, based on Vit's comments, seems ANSICON and Console only pay
> attention to codepage when they start/initialize.
>
> Would you mind telling me which is the default codepage (ACP and OEM)
> in the registry of your computer? Also, what "chcp" tell you once you
> first open the console.

Registry values are:

ACP = 1252
MACCP = 10000
OEMCP = 437
OEMHAL = vgaoem.fon

--
Regards,
Boško Ivanišević

Vít Ondruch

Nov 12, 2010, 1:39:39 PM

to rubyin...@googlegroups.com

FYI: https://github.com/adoxa/ansicon/issues#issue/1

Vit

On 8.11.2010 2:38, Luis Lavena wrote:

Vít Ondruch

Nov 12, 2010, 2:26:19 PM

to rubyin...@googlegroups.com

Using Console2 2.00.147

Boško Ivanišević

Nov 12, 2010, 2:37:43 PM

to rubyin...@googlegroups.com

On Fri, Nov 12, 2010 at 8:26 PM, Vít Ondruch <v.on...@gmail.com> wrote:

> Using Console2 2.00.147

On Windows 7 x64, using Console2 2.00.147 (x86 and x64) with the patch applied, I get a different result (attached). Do you have any additional settings, such as LC_ALL or similar?

--
Regards,
Boško Ivanišević

Console_utf8.png

Luis Lavena

Nov 12, 2010, 2:53:36 PM

to rubyin...@googlegroups.com

2010/11/12 Boško Ivanišević <bosko.iv...@gmail.com>:

I would suggest testing directly with cmd.exe (Command Prompt) and not
inside Console2, just in case.

Anyhow, I thought chcp needed to be run before ansicon was installed,
but in Vit's example it is run afterwards.

Vít Ondruch

Nov 12, 2010, 3:18:44 PM

to rubyin...@googlegroups.com

The order doesn't matter. On each call of the WriteConsoleA API function, the
current codepage is checked, and after that the conversion into UTF-16
takes place.
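
In Ruby terms, the per-call behaviour looks roughly like the sketch below. The codepage table and the `to_utf16` name are illustrative assumptions; ANSICON itself does this in C via MultiByteToWideChar, as mentioned earlier in the thread.

```ruby
# Map a few Windows codepage numbers to Ruby encodings (illustrative subset).
CODEPAGES = {
  437   => Encoding::IBM437,
  1252  => Encoding::Windows_1252,
  65001 => Encoding::UTF_8
}.freeze

# Decode the bytes with whatever codepage is active at the time of the call,
# then convert to UTF-16LE, the wide-character form the console works with.
def to_utf16(bytes, current_codepage)
  bytes.dup.force_encoding(CODEPAGES.fetch(current_codepage)).encode("UTF-16LE")
end
```

Because the codepage is looked up on every call, the same character arrives as the same UTF-16 code unit whether it was sent as one byte under 1252 or as two bytes under 65001.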

Vit

On 12.11.2010 20:53, Luis Lavena wrote:

Vít Ondruch

Nov 12, 2010, 3:19:57 PM

to rubyin...@googlegroups.com

BTW, you can always show the cmd.exe window associated with a
specific tab and work directly with it.

On 12.11.2010 20:53, Luis Lavena wrote:

Vít Ondruch

Nov 12, 2010, 3:22:10 PM

to rubyin...@googlegroups.com

Hi Boško,

This reminds me of the output of the original version. I introduced a bug there that prevented compilation with MinGW, so please make sure that you have pulled my latest version and that you are really using it.

Vit

On 12.11.2010 20:37, Boško Ivanišević wrote:

--
You received this message because you are subscribed to the Google Groups "RubyInstaller" group.
To post to this group, send email to rubyin...@googlegroups.com.
To unsubscribe from this group, send email to rubyinstalle...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyinstaller?hl=en.

Luis Lavena

Nov 12, 2010, 3:22:00 PM

to rubyin...@googlegroups.com

On Fri, Nov 12, 2010 at 5:19 PM, Vít Ondruch <v.on...@gmail.com> wrote:
> BTW you can always show the CMD.exe window which is associated with specific
> tab, and work directly with it.

I know, but as I showed in my previous post, Console2 was doing weird
things under chcp 65001.

Luis Lavena

Nov 12, 2010, 10:10:09 PM

to rubyin...@googlegroups.com

Hello, see the attached screenshots.

The 64-bit build of Vit's patch does not work. The 32-bit one works.

64.png

32.png

Vít Ondruch

Nov 12, 2010, 10:44:15 PM

to rubyin...@googlegroups.com

Hm, 64bit version built using MinGW does not work for me at all (it
cannot load the ansi64.dll, but it is due to some unicode settings), but
it works for me when I build using MS Visual Studio Express 2010. So I
believe it works. Just the build itself is a bit strange :/

Vit

On 13.11.2010 4:10, Luis Lavena wrote:

adoxa

Nov 13, 2010, 2:47:15 AM

to RubyInstaller

> So now using UTF-8 (chcp 65001) and win32console, you can see the
> output in 5.png

If you had tested without win32console you would have noticed
that the same junk occurs. For example, I changed the opening sequence
to "1234" and the closing sequence to "5678" and now it does this
(Ruby 1.9.2p0):

1234This is 56781234a test 56781234line5678
1234Ésta es 567881234una línea 567881234de prueba5678
1234Бошко 5678 56781234Иваниш 5678 567881234евић56785678
1234Это 56786781234тест 567856781234линии567856788
1234Αυτή είναι 5678αι 5678781234μια γραμμή 5678μή
5678781234δοκιμής5678ς567878

This is due to a discrepancy between what is sent (multiple UTF-8
bytes) and what is written (single characters). With a hacked ANSICON
to set the byte count, not the character count, it works as you'd
expect. This really needs to be addressed in Ruby, since ANSICON has
no choice but to follow the API.
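
The resend effect can be simulated in plain Ruby. This is a sketch under assumptions: `fake_console_write` stands in for a write call that reports characters instead of bytes, and `naive_write_all` stands in for a caller that resends whatever it believes was not written. Neither is Ruby's actual I/O code.

```ruby
# Stand-in for a console write that outputs everything it is given but
# reports the number of characters, not the number of bytes.
def fake_console_write(bytes, out)
  text = bytes.dup.force_encoding("UTF-8")
  out << text
  text.length
end

# Stand-in for a caller that treats the return value as a byte count and
# resends the "unwritten" tail until the counts add up.
def naive_write_all(string)
  bytes = string.b
  out = []
  offset = 0
  while offset < bytes.bytesize
    offset += fake_console_write(bytes.byteslice(offset..-1), out)
  end
  out.join
end

puts naive_write_all("1234Бошко 5678")  # prints "1234Бошко 5678 5678"
```

The string is 19 bytes but only 14 characters, so the caller resends the last 5 bytes (" 5678"), which matches the duplicated closing sequence in the third line of the output above.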

--
Jason.

Boško Ivanišević

Nov 13, 2010, 7:04:52 AM

to rubyin...@googlegroups.com

On Sat, Nov 13, 2010 at 8:47 AM, adoxa <jad...@yahoo.com.au> wrote:

> > So now using UTF-8 (chcp 65001) and win32console, you can see the
> > output in 5.png
>
> If you had have tested without win32console you would have noticed

I'm not sure if I understood what you mean. I have tested without win32console.

> that the same junk occurs. For example, I changed the opening sequence
> to "1234" and the closing sequence to "5678" and now it does this
> (Ruby 1.9.2p0):
>
> 1234This is 56781234a test 56781234line5678
> 1234Ésta es 567881234una línea 567881234de prueba5678
> 1234Бошко 5678 56781234Иваниш 5678 567881234евић56785678
> 1234Это 56786781234тест 567856781234линии567856788
> 1234Αυτή είναι 5678αι 5678781234μια γραμμή 5678μή
> 5678781234δοκιμής5678ς567878
>
> This is due to a discrepancy between what is sent (multiple UTF-8
> bytes) and what is written (single characters). With a hacked ANSICON
> to set the byte count, not the character count, it works as you'd
> expect. This really needs to be addressed in Ruby, since ANSICON has
> no choice but to follow the API.

Can you be a little more specific? What should be addressed in Ruby? English is not my native language, so maybe I missed something in your message. What I see in the result above is that you still have a double '8' in line 2. There is also a duplicated closing sequence '5678' in line 3. Similarly, in lines 4 and 5 parts of the closing sequence are repeated. I'm not saying it is only an ANSICON problem. It might be a combination of Ruby and ANSICON, but the fact is that colors are not displayed any more.

--
Regards,
Boško Ivanišević

Boško Ivanišević

Nov 13, 2010, 7:15:25 AM

to rubyin...@googlegroups.com

2010/11/13 Boško Ivanišević <bosko.iv...@gmail.com>

Something I forgot to write, which might help you narrow down where the problem is: in the previous version of ANSICON, the log files contained lines about hooking into the Ruby DLL. This version does not have them. I've added a DEBUGSTR in ParseAndPrintString, and here is the resulting log:

WriteConsoleW: 36 "Microsoft Windows [Version 6.1.7600]"

Parsing string with length (36)

WriteConsoleW: 2 "\r\n"

Parsing string with length (2)

Parsing string with length (63)

WriteConsoleW: 2 "\r\n"

Parsing string with length (2)

WriteConsoleW: 2 "\r\n"

Parsing string with length (2)

WriteConsoleW: 30 "D:\projects\ruby\test-unicode>"

Parsing string with length (30)

WriteConsoleW: 14 "ruby colors.rb"

Parsing string with length (14)

CreateProcessW: "C:\Ruby\192\bin\ruby.exe", "ruby colors.rb"

Hooking in NTDLL.DLL (LoadLibraryW)

WriteConsoleW: 2 "\r\n"

Parsing string with length (2)

WriteConsoleW: 30 "D:\projects\ruby\test-unicode>"

Parsing string with length (30)

As you can see, after the CreateProcessW call where Ruby is started, ParseAndPrintString is not called.

--
Regards,
Boško Ivanišević

Boško Ivanišević

Nov 13, 2010, 7:35:56 AM

to rubyin...@googlegroups.com

2010/11/13 Boško Ivanišević <bosko.iv...@gmail.com>

Just to have it here too: after applying commit cabaa575 to the ruby branch, everything is working. I've tested ANSICON in cmd.exe and in the x86 and x64 versions of Console2 2.0.0.147. Congratulations once again! This is a really amazing piece of software.

--
Regards,
Boško Ivanišević

Luis Lavena

Nov 13, 2010, 9:03:12 AM

to rubyin...@googlegroups.com

2010/11/13 Boško Ivanišević <bosko.iv...@gmail.com>:

>
> Just to have it here too. After applying commit cabaa575 to Ruby branch
> everything is working. I've tested ANSICON in cmd.exe and in x86 and x64 bit
> versions of Console2 2.0.0.147. Congratulations once again! This is really
> amazing pars of software.
>

I can confirm the results on both 64-bit and 32-bit cmd.exe.

I'll try to upgrade Console2 and see the results.

Jason: are you planning a new release of ansicon? I would like to
send you a pull request to make it easier for you to do releases :-)

Also, there are the 100 USD I offered to whoever fixed this issue,
which was you :-)

Please email me directly (you know the address) and send me your
Paypal email so I can transfer the money.

Vit and Bosko, thank you for your help and contributions in getting this to work.

adoxa

Nov 13, 2010, 9:26:49 AM

to RubyInstaller

On Nov 14, 12:03 am, Luis Lavena <luislav...@gmail.com> wrote:
> Jason: are you planning an new release of ansicon?

I don't like this method (as mentioned in the issue), so I won't
release it as it is. It is not really an ANSICON problem, but a Ruby
(or Windows API) problem. I could release the API-compatible version
on my site and let Ruby distribute its own incompatible version. Or I
could do the ANSICON_OVERRIDE (or maybe ANSICON_API) environment
variable to provide flexibility for other programs with the same
problem (without the variable you get the junk; do `set
ANSICON_API=ruby.exe;some_other_program.exe` and it goes away, without
worrying about breaking compatibility with anything else). Or maybe
something else?
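
To make the environment variable idea concrete, here is how such a lookup could work, written in Ruby purely for illustration. ANSICON_API is only a proposed name at this point, and `byte_count_mode?` is a hypothetical helper, not anything that exists in ANSICON.

```ruby
# Return true when the program is listed in the semicolon-separated
# ANSICON_API variable (hypothetical; the variable is only a proposal).
def byte_count_mode?(program_path, env = ENV)
  listed = env.fetch("ANSICON_API", "").split(";").map { |name| name.strip.downcase }
  # Take the file name after the last backslash or forward slash.
  program = program_path.split(/[\\\/]/).last.to_s.downcase
  listed.include?(program)
end
```

Programs not on the list would keep the API-compatible character count, so nothing else changes behaviour.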

--
Jason.

Luis Lavena

Nov 13, 2010, 9:30:19 AM

to rubyin...@googlegroups.com

Sorry, but I don't follow, you mean this issue?

https://github.com/adoxa/ansicon/issues#issue/1

As far as I can tell, 32-bit Ruby running under 64-bit ANSICON is not
working, correct?

Can you explain in more detail what the Ruby issues are? I can work on an
improvement for Ruby; after all, I got my commit rights a few weeks
ago.

Thank you.

Jason Hood

Nov 13, 2010, 10:03:13 AM

to rubyin...@googlegroups.com

On 14/11/2010 0:30, Luis Lavena wrote:
> As far I can tell, 32bits Ruby running under 64bits ANSICON is not
> working, correct?

I think Boško said it's working now (I forgot to change another LLA to
LLW for the issue 2 fix).

> Can you explain better what are the Ruby issues? I can work on an

The original problem is actually two problems. The trailing garbage
at the end of UTF-8 text was a fault of my ANSI to Unicode conversion -
that has been fixed and will be in 1.31. The "inline" garbage shown
by this thread is actually to do with the API call - it has nothing
to do with ANSICON at all. The problem is due to UTF-8 consisting of
multiple bytes of input, but only single characters as output. For
sample.txt, this means WriteFile (probably via fwrite, via printf or
whatever) is writing 209 bytes (in total), but only receiving 155 as
written. Ruby must be detecting that what was written doesn't match
what was sent, so it sends the remainder again, hence the garbage. The
Ruby branch of ANSICON simply sets written to sent, breaking API
compatibility, which is why I won't release it as is. Without ANSICON,
Ruby itself is still going to display garbage, so the real fix needs to
be in Ruby (or make ANSICON mandatory). Perhaps you could detect the
write is going to the console and just assume it all gets sent.
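
The last suggestion (treat a console write as fully sent) might look like this on the caller's side. A sketch only: `reporting_chars_write` is a stand-in for a write that reports characters rather than bytes, and the `console:` flag is a hypothetical parameter; Ruby's real I/O layer is not structured like this.

```ruby
# Stand-in for a write call that outputs all bytes but reports characters.
def reporting_chars_write(bytes, out)
  text = bytes.dup.force_encoding("UTF-8")
  out << text
  text.length
end

# When the target is a console, assume the whole buffer was consumed instead
# of trusting the reported count; otherwise resend the remainder as usual.
def write_all(string, console:)
  bytes = string.b
  out = []
  offset = 0
  while offset < bytes.bytesize
    reported = reporting_chars_write(bytes.byteslice(offset..-1), out)
    offset = console ? bytes.bytesize : offset + reported
  end
  out.join
end
```

With `console: true` the text is written exactly once; with `console: false` the tail is resent as in the garbage shown earlier in the thread.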

--
Jason.

Boško Ivanišević

Nov 13, 2010, 10:33:12 AM

to rubyin...@googlegroups.com

On Sat, Nov 13, 2010 at 3:30 PM, Luis Lavena <luisl...@gmail.com> wrote:

On Sat, Nov 13, 2010 at 11:26 AM, adoxa <jad...@yahoo.com.au> wrote:
> On Nov 14, 12:03 am, Luis Lavena <luislav...@gmail.com> wrote:
>> Jason: are you planning an new release of ansicon?
>
> I don't like this method (as mentioned in the issue), so I won't
> release it as it is. It is not really an ANSICON problem, but a Ruby
> (or Windows API) problem. I could release the API-compatible version
> on my site and let Ruby distribute its own incompatible version. Or I
> could do the ANSICON_OVERRIDE (or maybe ANSICON_API) environment
> variable to provide flexibility for other programs with the same
> problem (without the variable you get the junk; do `set
> ANSICON_API=ruby.exe;some_other_program.exe` and it goes away, without
> worrying about breaking compatibility with anything else). Or maybe
> something else?
>

> Sorry, but I don't follow, you mean this issue?
>
> https://github.com/adoxa/ansicon/issues#issue/1
>
> As far I can tell, 32bits Ruby running under 64bits ANSICON is not
> working, correct?
>
> Can you explain better what are the Ruby issues? I can work on an
> improvement for Ruby, after all, got my commits rights a few weeks
> ago.

I believe Jason is referring to commit https://github.com/adoxa/ansicon/commit/912a68b6a5f13ebc4dc8a544d7c8de233986e129 in the ruby branch. If https://github.com/adoxa/ansicon/commit/cabaa57578618d2fde10a5657726f4a6c43eccbf is cherry-picked on top of the ruby branch, ANSICON works with Ruby. And, as you can see, in the ruby branch he checks whether the module is 'ruby.exe' and, if it is, MyWriteFile returns the number of bytes instead of the number of characters.

--
Regards,
Boško Ivanišević

Luis Lavena

Nov 13, 2010, 10:37:11 AM

to rubyin...@googlegroups.com

2010/11/13 Boško Ivanišević <bosko.iv...@gmail.com>:

> I believe Jason thinks on
> commit https://github.com/adoxa/ansicon/commit/912a68b6a5f13ebc4dc8a544d7c8de233986e129 in
> ruby branch.
> If https://github.com/adoxa/ansicon/commit/cabaa57578618d2fde10a5657726f4a6c43eccbf is
> cherry picked on top of ruby branch ansicon works with Ruby. And, as you can
> see, in the ruby branch he is checking whether module is 'ruby.exe' and if
> it is number of bytes is returned as a result of method MyWriteFile instead
> of number of characters.

I understand that, but if you guys can work out an example that shows what
Ruby is doing wrong, a minimal example of the "ruby -ve" kind, I
can debug it and perhaps fix its assumption so the next release of Ruby
behaves properly.

I've done my testing with 1.9.2, but using 1.8.7 should be considered too, no?

Thank you.

PS: Sorry to be so stubborn, but I'm trying to get all the dots and lines right.

Jason Hood

Nov 13, 2010, 11:25:52 AM

to rubyin...@googlegroups.com

On 14/11/2010 1:37, Luis Lavena wrote:
>
> Ruby is doing wrong, minimal example of "ruby -ve" type of command I

Typing UTF-8 on the command line is problematic, so a file is the
way to go, which is what I've attached. WriteFile is being sent a
buffer of 63 bytes, eight of which are UTF-8 multibyte sequences for
four characters, so only 59 bytes are being written. 63 - 59 = 4, so
the last four characters are resent.
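
The arithmetic is easy to check in Ruby. The string below is made up (the attached utf-8.rb is not reproduced here); it is simply built to have the same shape: 63 bytes, of which eight encode four two-byte characters.

```ruby
# Difference between bytes sent and characters reported, i.e. how many
# bytes a caller comparing the two numbers would resend.
def resent_count(line)
  line.bytesize - line.length
end

# Hypothetical sample: 55 ASCII characters plus four Cyrillic letters
# (two bytes each in UTF-8).
line = "a" * 55 + "тест"
puts "#{line.bytesize} bytes sent, #{line.length} reported, #{resent_count(line)} resent"
# prints "63 bytes sent, 59 reported, 4 resent"
```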

--
Jason.

utf-8.rb

Boško Ivanišević

Nov 13, 2010, 11:32:25 AM

to rubyin...@googlegroups.com

On Sat, Nov 13, 2010 at 4:37 PM, Luis Lavena <luisl...@gmail.com> wrote:

2010/11/13 Boško Ivanišević <bosko.iv...@gmail.com>:

> I believe Jason thinks on
> commit https://github.com/adoxa/ansicon/commit/912a68b6a5f13ebc4dc8a544d7c8de233986e129 in
> ruby branch.
> If https://github.com/adoxa/ansicon/commit/cabaa57578618d2fde10a5657726f4a6c43eccbf is
> cherry picked on top of ruby branch ansicon works with Ruby. And, as you can
> see, in the ruby branch he is checking whether module is 'ruby.exe' and if
> it is number of bytes is returned as a result of method MyWriteFile instead
> of number of characters.

> Understand that, but if you guys can work on a example that shows what
> Ruby is doing wrong, minimal example of "ruby -ve" type of command I
> can debug it and perhaps fix it's assumption so next release of Ruby
> does properly.

I'm trying to figure out why Jason thinks Ruby is the source of the problem. What I see in the code is that MyWriteFile calls MyWriteConsoleA with the number of bytes to write. MyWriteConsoleA then calls ParseAndPrintString with the length obtained from the MultiByteToWideChar conversion, and ParseAndPrintString sets the number of characters written (lpNumberOfCharsWritten). Jason's fix for Ruby simply reports the number of bytes written instead of the number of characters. On the other hand, the WriteFile function in the Windows API sets its out parameter lpNumberOfBytesWritten to the number of bytes written, not the number of characters, so I don't think this 'Ruby fix' is actually doing anything wrong, although I might be mistaken.

> I've done my testing with 1.9.2, but using 1.8.7 should be considered too, no?

Ruby branch with additional commit works with Ruby 1.8.7 too.

--
Regards,
Boško Ivanišević

Boško Ivanišević

Nov 13, 2010, 11:50:24 AM

to rubyin...@googlegroups.com

That's because the number of bytes and the number of characters are being mixed up. In UTF-8, some characters are encoded as one byte and others as two or more (two for the non-ASCII letters in this test).

--
Regards,
Boško Ivanišević

Vít Ondruch

Nov 13, 2010, 12:53:38 PM

to rubyin...@googlegroups.com

From my point of view, this is a misunderstanding of the problem combined
with a badly designed Windows API. The parameter should be named
something like "numberOfInputBytesWritten" instead of
"lpNumberOfCharsWritten". The returned value should correspond to the
count of input bytes, i.e. returning the number of incoming characters
(bytes) consumed is the only correct behaviour.

In an ideal world, you would do something like:

1) Get a buffer
2) Convert it from an arbitrary encoding into UTF-16
3) Do some processing, probably output the characters
4) Convert the result back from UTF-16 to the original encoding
5) Now count the bytes

But this will not work anyway, since you have already removed the escape
characters from the original string! So once more, you have to think
of this parameter in terms of bytes of processed input, not the number of
output characters.
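
Leaving escape sequences aside, the five steps amount to mapping the number of characters written back to the number of input bytes they came from. A minimal Ruby sketch, assuming UTF-8 input; `bytes_consumed` is a hypothetical name, and as noted above this does not account for escape sequences stripped from the output.

```ruby
# Given the original input bytes and how many characters were output,
# return how many input bytes those characters occupied.
def bytes_consumed(input_bytes, chars_written)
  text = input_bytes.dup.force_encoding("UTF-8")
  text[0, chars_written].to_s.bytesize
end
```

For "Бошко 5678" (15 bytes, 10 characters), `bytes_consumed` returns 15 when all 10 characters were written, so a caller comparing it against the bytes sent would not resend anything.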

Vit

On 13.11.2010 16:03, Jason Hood wrote:

> On 14/11/2010 0:30, Luis Lavena wrote:
>> As far I can tell, 32bits Ruby running under 64bits ANSICON is not
>> working, correct?
>

> I think Boško said it's working now (I forgot to change another LLA to

Vít Ondruch

Nov 13, 2010, 6:34:09 PM

to rubyin...@googlegroups.com

I had time today to recheck yesterday's version, and it works like a charm. So I guess it was just some strange build issue on your side (probably ansicon was still loaded somewhere in memory and protected by the OS from being overwritten). You can check it once more from my utf-8_fix branch: https://github.com/voxik/ansicon/commits/utf-8_fix

Vit

On 12.11.2010 20:37, Boško Ivanišević wrote:

Vít Ondruch

Nov 13, 2010, 7:28:27 PM

to rubyin...@googlegroups.com

It is a build issue. Try it once more please. The dll is loaded
somewhere in memory. Search for it using Process Explorer.

Vit

On 13.11.2010 4:10, Luis Lavena wrote:

Vít Ondruch

Nov 13, 2010, 8:43:09 PM

to rubyin...@googlegroups.com

Now I understand why it worked for me yesterday and not for you! The
original x64\ANSI32.dll was never replaced by the updated version
from the x86 folder! While I was using cmd.exe and plain TDM and everything
worked, you used DevKit and MSYS. The line commented on here
https://github.com/adoxa/ansicon/commit/212adb101b0639d3f8dddcb3942854061d1abd08#commitcomment-193322
caused the issue.

This sucks... Sorry... too late, going to sleep.

Vit

On 14.11.2010 1:28, Vít Ondruch wrote:
