Per program encoding setting


br.rena...@gmail.com

Nov 11, 2012, 8:58:06 PM
to mintty-...@googlegroups.com
Hi,

First of all, mintty is awesome, thanks. I have some small issues once in a while, but the one that comes to mind right now is output encoding.

I leave the locale and charset configurations empty, and it works nicely; I think it uses Windows-1252 or ISO-8859-1. However, I would like programs that output in other encodings to display correctly too. For example, cmd.exe uses CP850, and Unix-like programs commonly use UTF-8. (Actually, I don't know exactly how MSYS utilities treat encoding, since I don't have sample output with non-ASCII characters to show you, but either way, if MSYS ever uses UTF-8 somewhere, that won't work in mintty automatically.)

I'm thinking of a simple approach based on directories and files, for example:

* Any program from C:\Windows is considered to output ISO-8859-1
* Any program from C:\SomeUTF8Program is considered as UTF-8
* Output of C:\Windows\System32\cmd.exe will be CP850

These wouldn't be hard-coded rules. Maybe there are some basics mintty could include by default (for example, the cmd.exe rule), but we users should be able to customize the settings to our needs. I think it would be awesome and not hard to implement.

By the way, I know this should probably be a bug report, but I just want to let you know that if ~/.minttyrc is hidden, settings can be read but not saved. It is hidden in my installation because C:/Users is mounted at /home, and I don't like those hidden files showing up in my profile. Not a big issue: I can run "attrib -h ~/.minttyrc" and restore the attribute after saving settings. The error message is "permission denied", which is not accurate, but as I said, toggling the hidden flag fixes the problem.

Thanks in advance!

Andy Koppe

Nov 17, 2012, 9:26:54 AM
to mintty
On Nov 12, 1:58 am, br.renatosilva wrote:
> First of all, mintty is awesome, thanks. I have some small issues once in a
> while, but the one that comes to mind right now is output encoding.
>
> I leave the locale and charset configurations empty, and it works nicely, I
> think it uses Windows-1252 or ISO-8859-1.

In that case, if a locale is set in the environment (through LC_ALL,
LC_CTYPE or LANG), mintty uses that. Otherwise, it uses the platform
default. For Cygwin 1.7, that's UTF-8, whereas for 1.5 and MSYS, it's
the system's "ANSI" codepage, which indeed is CP1252 for Western
European languages.
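
As a rough illustration of that lookup order, here is a minimal sketch in bash. The function name `effective_ctype` and the `platform-default` label are made up for this example; it only mirrors the LC_ALL, LC_CTYPE, LANG precedence described above and is not mintty's actual code.

```shell
#!/bin/bash
# Print the locale setting a terminal would pick up from the environment,
# checking LC_ALL first, then LC_CTYPE, then LANG.
# Empty variables are treated the same as unset ones.
# "platform-default" is a placeholder label (an assumption for this sketch),
# standing in for UTF-8 on Cygwin 1.7 or the ANSI code page on Cygwin 1.5/MSYS.
effective_ctype() {
  local value="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
  printf '%s\n' "${value:-platform-default}"
}

effective_ctype
```

Running it in a shell where none of the three variables is set prints the placeholder, which corresponds to the "platform default" case.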

> However, I wanted programs with
> different encoding output working nicely too. For example, cmd.exe uses
> CP850, and unix-like programs commonly use UTF-8 (actually, I don't know
> how MSYS utilities treat encoding exactly, since I don't have a sample
> output with non-ASCII chars to show you, but either way, if MSYS ever uses
> UTF-8 somewhere, that won't work in MinTTY automatically).

MSYS and Cygwin 1.5 have very little locale/encoding support.
Basically, they assume singlebyte encodings and default to the
system's ANSI codepage. For Cygwin 1.5, there's a $CYGWIN option for
switching to the OEM codepage, which means CP850 in your case. Not
sure whether MSYS has that option too. (Cygwin 1.7 has full locale
support, supporting lots of encodings.)

> I'm thinking of a simple approach based on directories and files, for
> example:
>
> * Any program from C:\Windows is considered to output ISO-8859-1
> * Any program from C:\SomeUTF8Program is considered as UTF-8
> * Output of C:\Windows\System32\cmd.exe will be CP850

I'm afraid that's not feasible, because mintty doesn't know what
program each bit of output comes from. It's not mintty that processes
your commands, but the shell invoked inside it (usually bash).

However, you can switch the codepage used by cmd.exe and other "DOS"
programs using the 'chcp' command, e.g.:

$ /Windows/System32/chcp 1252
Active code page: 1252


> By the way, I know this should be a bug report maybe

Yep.

> but I just want to
> inform that if ~/.minttyrc is hidden, we can only read settings not save.
> It is hidden in my installation because C:/Users is mounted at /home, and I
> don't like those hidden files showing up in my profile. Not a big issue, I
> can "attrib -h ~/.minttyrc", and rollback after saving settings, but that's
> it. The error message is permission denied, not accurate but as I said
> toggling the hide flag fixes the problem.

This doesn't happen with Cygwin 1.7, so I expect it's an MSYS oddity.

Andy

Renato Silva

Nov 18, 2012, 11:28:43 AM
to mintty-...@googlegroups.com
On Saturday, November 17, 2012 at 12:26:55 UTC-2, Andy Koppe wrote:
> On Nov 12, 1:58 am, br.renatosilva wrote:
> > First of all, mintty is awesome, thanks. I have some small issues once in a
> > while, but the one that comes to mind right now is output encoding.
> >
> > I leave the locale and charset configurations empty, and it works nicely, I
> > think it uses Windows-1252 or ISO-8859-1.
>
> In that case, if a locale is set in the environment (through LC_ALL,
> LC_CTYPE or LANG), mintty uses that. Otherwise, it uses the platform
> default. For Cygwin 1.7, that's UTF-8, whereas for 1.5 and MSYS, it's
> the system's "ANSI" codepage, which indeed is CP1252 for Western
> European languages.
>
> > However, I wanted programs with
> > different encoding output working nicely too. For example, cmd.exe uses
> > CP850, and unix-like programs commonly use UTF-8 (actually, I don't know
> > how MSYS utilities treat encoding exactly, since I don't have a sample
> > output with non-ASCII chars to show you, but either way, if MSYS ever uses
> > UTF-8 somewhere, that won't work in MinTTY automatically).
>
> MSYS and Cygwin 1.5 have very little locale/encoding support.
> Basically, they assume singlebyte encodings and default to the
> system's ANSI codepage. For Cygwin 1.5, there's a $CYGWIN option for
> switching to the OEM codepage, which means CP850 in your case. Not
> sure whether MSYS has that option too. (Cygwin 1.7 has full locale
> support, supporting lots of encodings.)


Thanks for the info. My problem is more about handling multiple encodings in one terminal than about the default encoding of MSYS itself, but yes, it would be nice if MSYS supported UTF-8, which is better than Windows-1252/Latin-1.
 
> > I'm thinking of a simple approach based on directories and files, for
> > example:
> >
> > * Any program from C:\Windows is considered to output ISO-8859-1
> > * Any program from C:\SomeUTF8Program is considered as UTF-8
> > * Output of C:\Windows\System32\cmd.exe will be CP850
>
> I'm afraid that's not feasible, because mintty doesn't know what
> program each bit of output comes from. It's not mintty that processes
> your commands, but the shell invoked inside it (usually bash).


Ah sorry, how could I forget that. So yes, that's rather a suggestion for bash.
 
> However, you can switch the codepage used by cmd.exe and other "DOS"
> programs using the 'chcp' command, e.g.:
>
> $ /Windows/System32/chcp 1252
> Active code page: 1252


I can't remember why I'm using the default CP850 in cmd.exe, because with 1252 I could use both bash and cmd.exe from mintty. Maybe some commands only work with CP850, but I'm not sure. Either way, I have modified the Windows registry to permanently change the code page of cmd.exe to 1252, and I'll see how it behaves over time.
 

> > By the way, I know this should be a bug report maybe
>
> Yep.
>
> > but I just want to
> > inform that if ~/.minttyrc is hidden, we can only read settings not save.
> > It is hidden in my installation because C:/Users is mounted at /home, and I
> > don't like those hidden files showing up in my profile. Not a big issue, I
> > can "attrib -h ~/.minttyrc", and rollback after saving settings, but that's
> > it. The error message is permission denied, not accurate but as I said
> > toggling the hide flag fixes the problem.
>
> This doesn't happen with Cygwin 1.7, so I expect it's an MSYS oddity.

Maybe it's something inherited from Cygwin 1.3. If anyone else can reproduce the problem in MSYS, please let me know.

Thanks!

Renato Silva

Nov 20, 2012, 10:55:36 PM
to mintty-...@googlegroups.com

Follow-up to my previous post. I tried changing the code page of cmd.exe to 1252 using some sort of autorun registry value, but that doesn't work because ping, for example, still outputs in CP850. There is another registry path that I believe applies the code page more broadly, and so might work for ping and other cases, but I'm pretty sure it caused Windows to freeze on startup.

Therefore I wonder if there's some other way to achieve that. In the meantime, or if it turns out not to be possible, I would still need to know what encoding each program is about to output. However, I think wrapper scripts using iconv can do the trick. For example, a script named ping, placed ahead of the original ping from system32 in the PATH, would look like:

#!/bin/bash
# Run the real ping.exe and convert its output from CP850 to CP1252.
ping.exe "$@" | iconv -f cp850 -t cp1252


It is a problem for interactive programs using something other than CP1252 (or whatever I adopt as the default encoding), but it would work at least for some programs. In sum, yes, that's a bash problem, not mintty's, but if one gets annoyed enough, one can actually implement something like what I described originally.
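
To make the iconv step concrete, here is a small sketch of what the conversion does to a single byte. The helper name `oem_to_ansi` is just something picked for this example, and it assumes the installed iconv knows the cp850 and cp1252 names (glibc and GNU libiconv do).

```shell
#!/bin/bash
# Convert a CP850 (OEM) byte stream on stdin to CP1252 (ANSI) on stdout.
# "oem_to_ansi" is a hypothetical helper name used only in this sketch.
oem_to_ansi() {
  iconv -f cp850 -t cp1252
}

# The letter "ç" is byte 0x87 (octal 207) in CP850 and byte 0xE7 in CP1252.
# od shows the converted byte in hex.
printf '\207' | oem_to_ansi | od -An -tx1
```

Plain ASCII bytes are identical in both code pages, so only accented letters and similar characters are actually changed.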

Renato Silva

Nov 23, 2012, 12:55:54 PM
to Lista MinTTY
2012/11/21 Renato Silva <br.rena...@gmail.com>

> It is a problem for interactive programs using something other than cp1252 (or whatever I adopt as default encoding), but it would work at least for some programs. In sum, yes that's a bash problem not mintty's, but if one gets annoyed enough, they actually can implement such a thing like I described originally.

I just tested with cmd.exe in interactive mode and it worked! I thought the pipe would only flush the whole output after program exit, but no it works just fine. We just need to redirect standard output. I have generalized the above ping solution like this:

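#!/bin/bash
# Run the .exe named after this script (or the symlink used to call it),
# merge stderr into stdout, and convert the output from CP850 to CP1252.
"$(basename "$0").exe" "$@" 2>&1 | iconv -f cp850 -t cp1252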


I have created symlinks to this script called cmd, ping, and ipconfig. In theory, the command prompt autorun value in the registry could be used to change the code page of cmd to 1252 instead, but the header with the version and copyright notice is still printed in CP850 (before the autorun is executed), so I have opted for the symlink. Overall, it's working just fine so far, multiple encodings living in peace together in mintty :)
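
For anyone wanting to reproduce the symlink setup, a minimal sketch follows. The file name `oem-wrapper`, the function name `link_wrappers`, and the `~/bin` location are assumptions for illustration; the only real requirement is that the directory comes before system32 in PATH.

```shell
#!/bin/bash
# Create one symlink per wrapped program, all pointing at the same wrapper
# script, so that "$0" inside the wrapper tells it which .exe to run.
# Assumed layout (hypothetical): the wrapper is saved as "$bin_dir/oem-wrapper".
link_wrappers() {
  local bin_dir="$1"
  shift
  local name
  for name in "$@"; do
    # -f replaces an existing link so the function can be re-run safely.
    ln -sf "$bin_dir/oem-wrapper" "$bin_dir/$name"
  done
}

# Example call (not executed here): link_wrappers "$HOME/bin" cmd ping ipconfig
```

Adding another program later is then just one more name in the call, with no change to the wrapper itself.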

Renato Silva

Nov 23, 2012, 5:15:22 PM
to Lista MinTTY
2012/11/23 Renato Silva <br.rena...@gmail.com>

> We just need to redirect standard output

Sorry, I mean standard error (2>&1).