Elixir on Windows: Problems with special characters


Torben Dohrn

Jan 2, 2016, 3:55:37 PM
to elixir-lang-talk
Hi everybody,

I'm currently trying to learn a little bit of Elixir and was wondering: Why are there problems with special characters when Elixir is invoked from a Windows shell?

For example if I start iex.bat (the Windows equivalent to iex) and try to write "Hallö" to the console I get the following result.

iex(1)> IO.puts "Hallö"
** (UnicodeConversionError) invalid encoding starting at <<148, 34, 10>>
    (elixir) lib/string.ex:1418: String.to_char_list/1

A similar (maybe related) problem occurs when starting a Phoenix app and hitting response times faster than 1 millisecond:
[info] Sent 200 in 1000┬Ás
The μ sign gets translated into gibberish characters.
-----------

What I have looked at so far:

iex.bat has problems with special characters on Windows, iex.bat --werl does not:
If I start iex.bat and try to print the o-umlaut ö and the microsecond sign μ, I get this:

iex.bat
Interactive Elixir (1.1.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> IO.puts "ö"
** (UnicodeConversionError) invalid encoding starting at <<148, 34, 10>>
    (elixir) lib/string.ex:1418: String.to_char_list/1
iex(1)> IO.puts "μ"
** (UnicodeConversionError) invalid encoding starting at <<230, 34, 10>>
    (elixir) lib/string.ex:1418: String.to_char_list/1
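
The bytes in these two errors can be checked directly. The following is only a sketch: it assumes the console was using a single-byte OEM code page such as CP850 (the thread does not say which one), in which ö is the byte 148 and µ is the byte 230, and that iex then tried to read those bytes as UTF-8. The variable names are made up for illustration.

```elixir
# Bytes reported in the two UnicodeConversionError messages above.
umlaut_input = <<148, 34, 10>>
micro_input = <<230, 34, 10>>

# Neither sequence is valid UTF-8: 148 is a continuation byte with no lead
# byte, and 230 opens a three-byte sequence but is followed by 34 (a quote).
IO.inspect(String.valid?(umlaut_input)) # false
IO.inspect(String.valid?(micro_input))  # false

# For comparison, the UTF-8 encodings that would have been accepted.
IO.inspect(:binary.bin_to_list("\u00F6")) # [195, 182]  (ö)
IO.inspect(:binary.bin_to_list("\u00B5")) # [194, 181]  (µ, micro sign)
```

If that assumption holds, the error comes from the mismatch between the console encoding and UTF-8 rather than from the characters themselves.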

Because the Elixir introduction mentions it, I also tested iex.bat --werl:
The werl.exe window that opens accepts both characters but cannot display the μ sign (see attached image). The μ sign is drawn as a black box on both input and output, which suggests the glyph is missing from the font.

iex.bat --werl
Interactive Elixir (1.1.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> IO.puts "ö"
ö
:ok
iex(2)> IO.puts "μ"
μ
:ok

The problem is not limited to iex.bat but happens in apps as well (if invoked from PowerShell or cmd):
I then created an Elixir app with the following function:
defmodule MicroSecondTest do
  def sayMicro do
    "μ"
  end
end
The result wasn't good either.

iex.bat -S mix
iex(1)> MicroSecondTest.sayMicro
"╬╝"


Windows consoles are generally able to display the special characters:
I assumed this was a console problem, but both PowerShell and the Windows cmd tool can display the characters (at least as values of variables).

Powershell:
PS C:\Users\Torben> $test="μ"
PS C:\Users\Torben> $test
μ

cmd:
C:\Users\Torben>set test="μ"

C:\Users\Torben>echo %test%
"μ"


The erl shell can show the special chars as well:
Eshell V7.2.1  (abort with ^G)
1> io:fwrite("ö").
öok
2> io:fwrite("μ").
µok

Summary:
All tools on Windows seem to be capable of working with the "μ" sign, yet String.to_char_list/1 raises errors (and Phoenix fails to write the "μ" sign as well, but doesn't raise errors).
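
The two garbled outputs above (┬Á in the Phoenix log line and ╬╝ from sayMicro) are consistent with UTF-8 bytes being displayed through a single-byte OEM code page. The sketch below assumes CP850 and hard-codes only the four table entries it needs; the `cp850` map and the `render` function are illustrative helpers, not part of Elixir. Under that assumption the Greek letter mu (U+03BC) comes out as ╬╝ and the micro sign (U+00B5) comes out as ┬Á.

```elixir
# Partial CP850 table: only the four bytes needed for this illustration.
cp850 = %{0xC2 => "┬", 0xB5 => "Á", 0xCE => "╬", 0xBC => "╝"}

# Show each byte of a UTF-8 string as the glyph CP850 assigns to that byte.
render = fn utf8 ->
  utf8
  |> :binary.bin_to_list()
  |> Enum.map_join(&Map.fetch!(cp850, &1))
end

IO.inspect(:binary.bin_to_list("\u00B5")) # [194, 181]  micro sign
IO.inspect(:binary.bin_to_list("\u03BC")) # [206, 188]  Greek small letter mu

IO.puts(render.("\u00B5")) # ┬Á
IO.puts(render.("\u03BC")) # ╬╝
```

This would also explain why the two outputs differ: they start from two different characters that merely look alike.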

Does anyone have an idea where and how to look further?

werl.PNG

Theron Boerner

Jan 2, 2016, 4:05:12 PM
to elixir-lang-talk
This is probably an endianness or encoding issue. Is PowerShell set to UTF-8?


José Valim

Jan 2, 2016, 4:11:19 PM
to elixir-l...@googlegroups.com
Thanks for the detailed report!

If you are running on Windows, there is a chance your terminal does not use UTF-8 by default. You can change the encoding of your current session by running "chcp 65001" before entering iex.

Hopefully PowerShell allows you to set the encoding to UTF-8 globally, so you don't need to do it for every session. You also need to make sure whatever font you are using includes those characters.


José Valim
Skype: jv.ptec
Founder and Director of R&D

Torben Dohrn

Jan 2, 2016, 4:50:36 PM
to elixir-lang-talk, jose....@plataformatec.com.br
Thanks for the answer.

The change of the code page did the trick for two out of three of my issues:
  • Phoenix now correctly shows [info] Sent 200 in 0µs (with the microsecond sign) in both PowerShell and cmd
  • My micro_second_test application now correctly shows the sign as well (and, for what it's worth, also the o-umlaut ö)
Unfortunately, iex.bat is now broken. Starting iex.bat (in PowerShell or cmd) and entering IO.puts("µ") now waits for further input from the terminal and shows nothing. For non-UTF-8 characters the behavior is unchanged.

Regarding the global setting of the code page: I will put the command (as soon as the bug I mentioned is fixed) in my "C:\Users\<username>\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1" file. This file is executed every time a PowerShell session is opened.

Tallak Tveide

Jan 3, 2016, 10:52:18 AM
to elixir-lang-talk
Try starting iex with the --werl parameter, to run in a window.

Onorio Catenacci

Jan 3, 2016, 11:42:07 AM
to elixir-lang-talk
Which version of Windows are you working with? I will see if I can reproduce the issue.

Torben Dohrn

Jan 3, 2016, 12:21:33 PM
to elixir-lang-talk
Windows 10 Pro, all language settings on German (if this is of interest)
Powershell Version is 5.0.10586
Codepage set to chcp 65001 as José suggested

Torben Dohrn

Jan 3, 2016, 12:30:33 PM
to elixir-lang-talk
iex in the werl window does work and I have no problems there.

Still, I would love to see iex working in PowerShell.

Onorio Catenacci

Jan 3, 2016, 12:43:42 PM
to elixir-lang-talk
I will see if I can reproduce the behavior. If you can, you might try checking Elixir 1.2 to see if it has the same issue.

Torben Dohrn

Jan 3, 2016, 12:56:36 PM
to elixir-lang-talk
If only the choco packages would be updated to 1.2.0... :-)
No, seriously. Thank you for maintaining the choco packages! Choco is immensely useful and having Elixir there is awesome. Thank you!

The behavior is exactly the same in 1.2.0 (just checked).

Onorio Catenacci

Jan 3, 2016, 2:34:21 PM
to elixir-lang-talk
LOL.  Glad to see someone's paying attention :)

Actually I had a little free time so I updated the CNG package to 1.2.0.  I've modified the package to put the Elixir batch files in the path which should, once and for all, fix the issues people have with starting the various Elixir commands at a Windows Console prompt.

--
Onorio

Onorio Catenacci

Jan 4, 2016, 8:59:02 AM
to elixir-lang-talk
For whatever it's worth, I can also reproduce this issue.  

I can also reproduce this issue with this Erlang code: 

:io.put_chars('µ')

Interestingly enough, when I don't do the chcp 65001 before I try typing this into the terminal, I get this error message:

** (UnicodeConversionError) invalid encoding starting at <<230, 34, 41, 10>>
    (elixir) lib/string.ex:1634: String.to_char_list/1

(I get a slightly different error message with the :io.put_chars call by the way:

** (UnicodeConversionError) invalid encoding starting at <<230, 39, 41, 10>>
    (elixir) lib/string.ex:1634: String.to_char_list/1

Note the different second number in the binary.)

µ is an interesting character anyway. It's actually in the ASCII set as an extended character (see here: http://www.theasciicode.com.ar/extended-ascii-code/lowercase-letter-mu-micro-sign-micron-ascii-code-230.html). I wonder if that may be what's causing this issue. Perhaps Windows is trying to get the µ from the extended ASCII set as opposed to the Unicode character set.
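
One possible reading of the two binaries, offered only as a sketch: the leading 230 is the byte that OEM code pages 437 and 850 assign to µ, which fits the extended-character guess, and everything after it is plain ASCII, presumably whatever was typed after the µ. The variable names below are made up for illustration.

```elixir
# Tails of the two error binaries, without the leading byte 230.
after_string = <<34, 41, 10>>   # from the IO.puts call
after_charlist = <<39, 41, 10>> # from the :io.put_chars call

# Both tails are printable ASCII: a closing quote, a closing paren, a newline.
IO.inspect(after_string)   # "\")\n"
IO.inspect(after_charlist) # "')\n"
```

So the differing second number would simply be the closing quote character: 34 for a double-quoted string and 39 for a single-quoted char list.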

--
Onorio

