In our previous thread, Bosko exposed an issue with Cyrillic text and the
Unicode (65001) codepage that indicates ANSICON needs more work.
At first I was unable to reproduce it, and I scratched my head until a
considerable amount of hair fell off.
Now, I believe I found what stopped us from having a direct
apples-to-apples comparison.
First, we had different codepages, different console managers and
different versions of Ruby. On top of that, certain things in my
environment were affecting the codepage.
So:
1) Let's create a Unicode (UTF-8) file to use
https://github.com/luislavena/test-unicode
Look at sample.txt
2) Let's compare what Console and plain cmd.exe have to say about it.
Same font (Consolas), same codepage (1252), same result; see 1.png
3) See what Console and cmd.exe show under UTF-8 (chcp 65001) mode.
As the attached 2.png shows, Console seems unable to handle UTF-8
properly, so I dropped it from the remaining tests.
4) What does Ruby know about our current encoding?
puts Encoding.default_external was telling me US-ASCII under both chcp
1252 and chcp 65001. See 3.png
Something seemed to be altering the results. Inspecting the environment,
I found LC_* variables (LC_ALL) set. Removing them gave me the proper
results:
chcp 1252: Windows-1252
chcp 65001: UTF-8
See 4.png
5) Get some colors (colors.rb)
Now, using UTF-8 (chcp 65001) and win32console, you can see the
output in 5.png
Next, loading ansicon (ansicon -p), you can see the output in 6.png,
which contains some garbage.
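The check from step 4 can be scripted so it is easy to repeat under each chcp setting. This is a minimal sketch: it only reports what Ruby sees and lists any LC_* variables, it changes nothing. The helper name locale_overrides is hypothetical, and the assumption that LC_* variables can override what chcp selects comes from the observation in step 4, not from Ruby's documentation.

```ruby
# Print the encoding Ruby assumes for console I/O, then list any LC_*
# variables that may be overriding the codepage selected with chcp
# (as observed in this thread with LC_ALL).
def locale_overrides(env = ENV)
  # Hypothetical helper: collect LC_* entries from a Hash-like environment.
  env.to_h.select { |name, _value| name.start_with?("LC_") }
end

puts "Encoding.default_external: #{Encoding.default_external}"

overrides = locale_overrides
if overrides.empty?
  puts "No LC_* variables set"
else
  overrides.each { |name, value| puts "#{name}=#{value} (try removing it and re-running)" }
end
```

Run it once after chcp 1252 and once after chcp 65001; with no LC_* variables present, the results reported above were Windows-1252 and UTF-8 respectively.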
====
Conclusions:
* Console2 is great for Western/Latin characters but sucks for Unicode
until someone gives it some love.
* ANSICON needs some encoding love. I emailed the author, and until he
sets up Git and a GitHub account to ease sharing, I took the liberty of
putting the code on GitHub: https://github.com/luislavena/ansicon
You will need TDM64 to build both the 32-bit and 64-bit
executables. All my tests used 64-bit processes (except for
Ruby). Please double-check that the same things happen under 32 bits.
* Always check your environment for any variable that might be
interfering when testing this kind of thing.
Finally, last but not least, encoding sucks.
IMPORTANT: moving forward
Until ANSICON gets fixed, win32console seems to get the job done.
Cucumber and RSpec will need to be adapted once ANSICON works, so...
I'm opening a bounty for getting ANSICON to work with encodings. I'm
giving 100 USD (via PayPal) to whoever manages to adapt ANSICON to
behave properly under this test scenario (sample.txt and colors.rb).
Of course, it doesn't cover all the cases, so please feel free to
chime in.
All the code should be put on GitHub as a fork of the repo, and all
code changes will be sent to the author to improve the
official build.
Cheers,
--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry
From my point of view, Console2 and ANSICON have the same issue.
They check the console codepage only at startup:
https://github.com/luislavena/ansicon/blob/master/ANSI.c#L1103
http://console.git.sourceforge.net/git/gitweb.cgi?p=console/console;a=blob;f=Console/ConsoleView.cpp;h=3e2ae892a2e1ae0b7ed49400481530661990c428;hb=HEAD#l228
With CHCP you change the expected format of the incoming characters, and
obviously neither Console2 nor ANSICON can react to this change. Maybe it's
a design flaw on their side, since if they always operated in UTF-16 this
problem would never happen, but I am not sure whether that is a reasonable idea :)
Vit
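A small Ruby illustration of the mismatch Vit describes, assuming a console tool that cached codepage 1252 at startup while Ruby, after chcp 65001, writes UTF-8. This is not ANSICON's or Console2's code; it only reproduces the decoding mistake with Ruby's built-in transcoding, using text from the sample.

```ruby
# Bytes Ruby emits after `chcp 65001`: UTF-8.
utf8_bytes = "Ésta es".encode(Encoding::UTF_8).b

# A tool that read the codepage once at startup (1252) decodes with it,
# so each byte of a multibyte UTF-8 sequence becomes its own character.
stale = utf8_bytes.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)

# A tool that honours the current codepage (65001) decodes correctly.
current = utf8_bytes.dup.force_encoding(Encoding::UTF_8)

puts stale
puts current
```

The two-byte "É" (C3 89) comes out as "Ã‰" in the stale case, which is the kind of corruption you get when the codepage is only read at initialization.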
On 8.11.2010 2:38, Luis Lavena wrote:
Is it the ANSI or the wide version? Shouldn't it always be wide, with the
rest of the application adjusted accordingly?
Vit
On 8.11.2010 11:09, Vít Ondruch wrote:
Would you mind telling me which version of Console2 you are using?
Because 2.00.146 x64 is giving me these results.
Also, based on Vit's comments, it seems ANSICON and Console only pay
attention to the codepage when they start/initialize.
Would you mind telling me the default codepages (ACP and OEM)
in your computer's registry? Also, what does "chcp" tell you when you
first open the console?
Thank you.
Jason has just pushed the official repository of ansicon:
https://github.com/adoxa/ansicon
I'm now going to kill mine and use his as the reference.
2010/11/8 Boško Ivanišević <bosko.iv...@gmail.com>:
> I think Console is working the same as cmd.exe. See attached Console_utf8.png.
> I'm not sure why these changes occur. I tried to reproduce the problem you have
> with Console but without any success.
Using Console2 2.00.147
I would suggest testing directly with cmd.exe (Command Prompt) and not
inside Console2, just in case.
Anyhow, I thought chcp needed to run before installing
ansicon, but in Vit's example it is done after.
Vit
On 12.11.2010 20:53, Luis Lavena wrote:
--
You received this message because you are subscribed to the Google Groups "RubyInstaller" group.
To post to this group, send email to rubyin...@googlegroups.com.
To unsubscribe from this group, send email to rubyinstalle...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyinstaller?hl=en.
I know, but as I showed in my previous post, Console2 was doing weird
things with chcp 65001.
Vit
On 13.11.2010 4:10, Luis Lavena wrote:
> So now using UTF-8 (chcp 65001) and win32console, you can see the
> output in 5.png
If you had tested without win32console, you would have noticed
that the same junk occurs. For example, I changed the opening sequence
to "1234" and the closing sequence to "5678", and now it does this
(Ruby 1.9.2p0):
1234This is 56781234a test 56781234line5678
1234Ésta es 567881234una línea 567881234de prueba5678
1234Бошко 5678 56781234Иваниш 5678 567881234евић56785678
1234Это 56786781234тест 567856781234линии567856788
1234Αυτή είναι 5678αι 5678781234μια γραμμή 5678μή
5678781234δοκιμής5678ς567878
This is due to a discrepancy between what is sent (multiple UTF-8
bytes) and what is reported as written (single characters). With ANSICON
hacked to report the byte count rather than the character count, it works
as you'd expect. This really needs to be addressed in Ruby, since ANSICON
has no choice but to follow the API.
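The mechanism Jason describes can be simulated without Windows. This is a sketch under stated assumptions: CharCountingConsole and ByteCountingConsole are hypothetical stand-ins for a WriteFile that reports characters versus one that reports bytes, and write_all is a hypothetical caller that trusts the reported count and resends the rest, which is the behaviour suspected on Ruby's side. None of this is Ruby's or ANSICON's actual code.

```ruby
# Stand-in for a console write that reports characters written, not bytes.
class CharCountingConsole
  attr_reader :screen

  def initialize
    @screen = String.new(encoding: Encoding::UTF_8)
  end

  # Receives raw bytes, "displays" them, and returns the count the caller trusts.
  def write(bytes)
    text = bytes.dup.force_encoding(Encoding::UTF_8)
    @screen << text
    reported_count(text)
  end

  private

  def reported_count(text)
    text.length # characters, not bytes: the discrepancy
  end
end

# Same console, but reporting bytes (what the hacked ANSICON does).
class ByteCountingConsole < CharCountingConsole
  private

  def reported_count(text)
    text.bytesize
  end
end

# Trust the reported count and resend whatever seems unwritten.
def write_all(console, data)
  bytes = data.b
  sent = 0
  sent += console.write(bytes.byteslice(sent..-1)) while sent < bytes.bytesize
  console.screen
end

puts write_all(CharCountingConsole.new, "abécd")
puts write_all(ByteCountingConsole.new, "abécd")
```

"abécd" is 6 bytes but 5 characters, so the character-counting console makes the caller resend the last byte and the screen shows "abécdd"; the byte-counting one shows "abécd". The repeated "5678" fragments in the output above are the same effect on longer lines.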
I can confirm the results on both 64-bit and 32-bit cmd.exe.
I'll try upgrading Console2 and see the results.
Jason: are you planning a new release of ansicon? I would like to
send you a pull request to make releases easier for you :-)
Also, there are the 100 USD I offered to whoever was able to fix this
issue, which was you :-)
Please email me directly (you know the address) and send me your
PayPal email so I can transfer the money.
Vit and Bosko, thank you for your help and contributions getting this to work.
Sorry, but I don't follow; do you mean this issue?
https://github.com/adoxa/ansicon/issues#issue/1
As far as I can tell, 32-bit Ruby running under 64-bit ANSICON is not
working, correct?
Can you explain better what the Ruby issues are? I can work on an
improvement for Ruby; after all, I got my commit rights a few weeks
ago.
Thank you.
I think Boško said it's working now (I forgot to change another LLA to
LLW for the issue 2 fix).
> Can you explain better what the Ruby issues are?
The original problem is actually two problems. The trailing garbage
at the end of UTF-8 text was a fault in my ANSI-to-Unicode conversion;
that has been fixed and will be in 1.31. The "inline" garbage shown
in this thread actually comes from the API call and has nothing
to do with ANSICON at all. The problem is that UTF-8 consists of
multiple bytes of input but only single characters of output. For
sample.txt, this means WriteFile (probably via fwrite, via printf or
whatever) is writing 209 bytes in total, but only receiving 155 as
written. Ruby must be detecting that what was written doesn't match
what was sent, so it sends the remainder again, hence the garbage. The
ruby branch of ANSICON simply sets written to sent, which breaks API
compatibility, so I won't release it as is. Without ANSICON,
Ruby itself is still going to display garbage, so the real fix needs to
be in Ruby (or ANSICON has to be made mandatory). Perhaps you could detect
that the write is going to the console and just assume it all gets sent.
--
Jason.
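Jason's closing suggestion, sketched in Ruby under assumptions: bytes_consumed is a hypothetical helper for a write loop, and how the console is detected is left to the caller (IO#tty? is one possible signal, not something confirmed in this thread). The Cyrillic line is taken from the sample output above.

```ruby
# Hypothetical helper: how many bytes a write loop should treat as done.
# When the target is a console, assume the whole buffer was sent, because
# the reported count may be in characters rather than bytes.
def bytes_consumed(buffer, reported, console:)
  console ? buffer.bytesize : reported
end

line = "Бошко Иванишевић" # Cyrillic text from the sample output
puts "bytes: #{line.bytesize}, characters: #{line.length}"

# Trusting a character count would leave this many bytes to be resent.
puts "resent if trusted: #{line.bytesize - line.length}"

# In a real loop, console: might come from $stdout.tty? (an assumption).
puts "consumed on console: #{bytes_consumed(line, line.length, console: true)}"
```

Every Cyrillic letter here takes two bytes in UTF-8, so the line is 31 bytes but 16 characters; trusting the character count would resend 15 bytes, while assuming a full write on the console resends nothing.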
On Sat, Nov 13, 2010 at 11:26 AM, adoxa <jad...@yahoo.com.au> wrote:
> On Nov 14, 12:03 am, Luis Lavena <luislav...@gmail.com> wrote:
>> Jason: are you planning a new release of ansicon?
>
> I don't like this method (as mentioned in the issue), so I won't
> release it as it is. It is not really an ANSICON problem, but a Ruby
> (or Windows API) problem. I could release the API-compatible version
> on my site and let Ruby distribute its own incompatible version. Or I
> could do the ANSICON_OVERRIDE (or maybe ANSICON_API) environment
> variable to provide flexibility for other programs with the same
> problem (without the variable you get the junk; do `set
> ANSICON_API=ruby.exe;some_other_program.exe` and it goes away, without
> worrying about breaking compatibility with anything else). Or maybe
> something else?
>
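The ANSICON_API variable in the quoted message is only a proposal, so what follows is a hypothetical Ruby sketch of the matching logic it implies: a semicolon-separated list of program names that get the byte count instead of the character count. The helper name byte_count_mode? is invented for illustration, and case-insensitive matching is an assumption based on Windows file names being case-insensitive.

```ruby
# Hypothetical: decide whether a program opted in to byte-count reporting
# through the proposed ANSICON_API variable ("ruby.exe;some_other_program.exe").
def byte_count_mode?(program, env = ENV)
  env.fetch("ANSICON_API", "")
     .split(";")
     .map(&:strip)
     .any? { |name| name.casecmp?(program) }
end

env = { "ANSICON_API" => "ruby.exe;some_other_program.exe" }
puts byte_count_mode?("ruby.exe", env) # => true
puts byte_count_mode?("cmd.exe", env)  # => false
```

Programs not listed keep the API-compatible behaviour, which is the point of the proposal: the junk only goes away for the programs that ask for it.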
I understand that, but if you guys can put together an example that shows
what Ruby is doing wrong (a minimal "ruby -ve" type of command), I
can debug it and perhaps fix its assumption so the next release of Ruby
does it properly.
I've done my testing with 1.9.2, but 1.8.7 should be considered too, no?
Thank you.
PS: Sorry to be so stubborn, but I'm trying to get all the dots and lines right.
Typing UTF-8 on the command line is problematic, so a file is the
way to go, which is what I've attached. WriteFile is being sent a
buffer of 63 bytes, eight of which are UTF-8 multibyte sequences for
four characters, so only 59 are reported as written. 63 - 59 = 4, so
the last four characters are resent.
--
Jason.
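Jason's numbers, worked through under the reading that the eight multibyte bytes encode four two-byte characters, so each of those characters is counted once instead of twice:

```latex
\begin{aligned}
\text{bytes sent} &= 63 \\
\text{characters reported} &= 63 - 8 + 4 = 59 \\
\text{bytes resent} &= 63 - 59 = 4
\end{aligned}
```

The difference equals the number of "extra" bytes in the multibyte sequences, which is why the tail of the buffer goes out a second time.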
2010/11/13 Boško Ivanišević <bosko.iv...@gmail.com>:
> I believe Jason is thinking of
> commit https://github.com/adoxa/ansicon/commit/912a68b6a5f13ebc4dc8a544d7c8de233986e129 in the
> ruby branch.
> If https://github.com/adoxa/ansicon/commit/cabaa57578618d2fde10a5657726f4a6c43eccbf is
> cherry-picked on top of the ruby branch, ansicon works with Ruby. And, as you can
> see, in the ruby branch he checks whether the module is 'ruby.exe' and, if
> it is, MyWriteFile returns the number of bytes as its result instead
> of the number of characters.
In an ideal world, you would do something like:
1) Get a buffer
2) Convert it from an arbitrary encoding into UTF-16
3) Do some processing, probably output the characters
4) Do the reverse transformation from UTF-16 back to the original encoding
5) Now count the bytes
But this will not work anyway, since you have already removed the escape
characters from the original string! So, once more, you have to think
about this parameter purely in bytes of processed input, not in number of
output characters.
Vit
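Vit's point can be sketched in Ruby under a few assumptions: the input is UTF-8, the only escape sequences are SGR colour codes of the form ESC [ ... m, and process_buffer is a hypothetical stand-in for what a console filter like ANSICON does. It shows why the count worth returning is the number of input bytes consumed: once the escapes are stripped, neither the displayed character count nor the UTF-16 length gets you back to it.

```ruby
# Assumed: only SGR sequences such as "\e[32m" and "\e[0m" appear.
SGR = /\e\[[0-9;]*m/

# Hypothetical filter: strip escapes, convert to UTF-16 for display,
# and report the different counts side by side.
def process_buffer(buffer)
  text = buffer.dup.force_encoding(Encoding::UTF_8)
  visible = text.gsub(SGR, "")
  wide = visible.encode(Encoding::UTF_16LE) # what a wide console API would get
  {
    input_bytes: buffer.bytesize,    # what the caller sent and should get back
    displayed_chars: visible.length, # what a character count reports
    utf16_units: wide.bytesize / 2   # UTF-16 code units, 2 bytes each
  }
end

p process_buffer("\e[32mБошко\e[0m")
```

Here the caller sent 19 bytes (5 + 10 + 4), but only 5 characters reach the screen, so anything other than the input byte count would trigger a resend.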
On 13.11.2010 16:03, Jason Hood wrote:
Vit
On 13.11.2010 4:10, Luis Lavena wrote:
This sucks... Sorry... too late, going to sleep.
Vit
On 14.11.2010 1:28, Vít Ondruch wrote: