Using Serial Monitor with MIDI

Kees Bot

Aug 28, 2013, 11:41:00 AM
to devel...@arduino.cc
Is there a way to use the Serial Monitor at MIDI speed? (31250 baud)

Tom Igoe

Aug 28, 2013, 1:22:36 PM
to Kees Bot, devel...@arduino.cc
No, the serial library on which it's based doesn't support it. Few do. Since MIDI's a binary protocol, the ASCII-based serial monitor wouldn't do you much good anyway. Better to use something dedicated to MIDI. On OSX, the Audio MIDI interface is useful.

Paul Stoffregen

Aug 28, 2013, 1:50:42 PM
to Kees Bot, Arduino Developers
On 08/28/2013 08:41 AM, Kees Bot wrote:
> Is there a way to use the Serial Monitor at MIDI speed? (31250 baud)

void setup() {
  Serial.begin(115200);  // link to the Serial Monitor
  Serial1.begin(31250);  // MIDI input; obviously, run this on a board that has Serial1
}

void loop() {
  if (Serial1.available()) {
    byte b = Serial1.read();
    if (b >= 128) Serial.println();  // begin each MIDI message on a new line
    if (b < 16) Serial.print("0");   // print every byte as 2 hex digits
    Serial.print(b, HEX);
  }
}

I made this up just now, only to illustrate the point, so I didn't test
this or even check it compiles without error. That point, by the way,
is Arduino gives you lots of tools you can use to pretty easily build
creative solutions, rather than needing a specific pre-made feature.
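
To see what the sketch above would print for a given byte stream without any hardware, the same formatting rule can be sketched in plain C++. This is only an illustration: `formatMidiHex` is a hypothetical helper, not part of any Arduino library, and it uses a plain `'\n'` where `Serial.println()` would send a carriage return and line feed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns the text the sketch would send to the
// Serial Monitor for a run of MIDI bytes.
// Assumption: '\n' stands in for the "\r\n" that Serial.println() emits.
std::string formatMidiHex(const std::vector<std::uint8_t>& midiBytes) {
  static const char kDigits[] = "0123456789ABCDEF";
  std::string out;
  for (std::uint8_t b : midiBytes) {
    if (b >= 128) out += '\n';   // status byte (high bit set): start a new line
    out += kDigits[b >> 4];      // high nibble, "0" when b < 16
    out += kDigits[b & 0x0F];    // low nibble
  }
  return out;
}
```

A note-on followed by a note-off (90 3C 7F, then 80 3C 00) comes out as two lines, `903C7F` and `803C00`, each preceded by a line break.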

Kees Bot

Aug 28, 2013, 6:09:43 PM
to devel...@arduino.cc, Kees Bot
thx, knew there was a logical explanation for




Kees Bot

Aug 28, 2013, 6:13:21 PM
to devel...@arduino.cc, Kees Bot, Arduino Developers
Paul,

I tried this on a Mega, but it seems that you cannot set the individual ports to different baud rates.




Tom Igoe

Aug 28, 2013, 6:36:58 PM
to Kees Bot, devel...@arduino.cc, Arduino Developers
I've set the different ports to different baud rates before -- Cristian, did something change with 1.5? Try with 115200 on one and something standard like 9600 on the other. The serial monitor won't read the 31250.
T.

Paul Stoffregen

Aug 28, 2013, 7:04:24 PM
to devel...@arduino.cc
On 08/28/2013 03:36 PM, Tom Igoe wrote:
> I've set the different ports to different baud rates before -- Cristian, did something change with 1.5? Try with 115200 on one and something standard like 9600 on the other. The serial monitor won't read the 31250.

Maybe it wasn't obvious... the code below is meant to have a MIDI serial signal connected to the RX pin on Serial1.  It receives each MIDI byte and retransmits it in ASCII HEX (at more than double the baud rate, to allow the 2X data size) to Serial, which is meant to be connected to the Arduino Serial Monitor.

I'm pretty sure different baud rates do indeed work.
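
The "more than double the baud rate" reasoning can be checked with a little arithmetic. The sketch below is a rough estimate under stated assumptions: 8N1 framing (10 bits on the wire per character) on both links, and a two-character line ending from `Serial.println()` at the start of each message. The function names are made up for illustration.

```cpp
#include <cassert>

// Characters per second a UART can carry, assuming 8N1 framing
// (1 start bit + 8 data bits + 1 stop bit = 10 bits per character).
long charsPerSecond(long baud) { return baud / 10; }

// Output rate, in characters per second, needed when a saturated MIDI link
// carries messages of `messageBytes` bytes and each message is printed as
// 2 hex digits per byte plus a 2-character line ending ("\r\n", assumed).
long hexOutputRate(long midiBaud, int messageBytes) {
  assert(messageBytes > 0);
  long inputRate = charsPerSecond(midiBaud);
  return inputRate * (2L * messageBytes + 2) / messageBytes;
}
```

With typical three-byte messages, a saturated 31250 baud input needs about 8333 characters per second, comfortably below the 11520 that 115200 baud provides. Only a sustained stream of one-byte messages (12500 characters per second) would exceed it.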


Matthew Ford

Aug 30, 2013, 1:56:11 AM
to devel...@arduino.cc
I have been working on displaying non-ASCII chars in my Android app, pfodApp,
so that users can control Arduino devices using their own language.
The basic approach is to store the UTF-8 bytes in an array and send them
to the pfodApp for display.

However, while testing I noticed that I can paste Italian accented
characters directly into the IDE in a " " (string) and they are
correctly sent and displayed by the pfodApp, which is expecting UTF-8.

My question is: what multi-language support is already provided by the IDE
and the compiler?
Does it handle Chinese, Greek, etc.?

matthew


Matthew Ford

Aug 31, 2013, 9:31:07 AM
to devel...@arduino.cc
No comments on this so far.
Could it be that the IDE saves the sketch in UTF-8 format and that as a
result the gcc compiler gets the UTF-8 bytes for string constants?

matthew

Loren M. Lang

Aug 31, 2013, 9:47:54 AM
to devel...@arduino.cc



IIRC, gcc defaults to interpreting source files using the encoding for the locale it's in. That is going to be UTF-8 for Mac OS and modern Linux desktops. The other half is what encoding the text editor uses. It most likely is also UTF-8.


Cristian Maglie

Aug 31, 2013, 9:59:12 AM
to devel...@arduino.cc, matthe...@forward.com.au
On Saturday, 31 August 2013 15:31:07, Matthew Ford wrote:
> No comments on this so far.
> Could it be that the IDE saves the sketch in UTF-8 format and that as a
> result the gcc compiler gets the UTF-8 bytes for string constants?

Hi Matthew,

the IDE uses a JEditTextArea whose methods getText(..) and setText(..) use a
Java String to retrieve/store the editor content.
Since Java strings are UTF8, I would say yes, the compiler receives a UTF-8
file as input.

But, and this is what matters, what I didn't know is whether the g++ compiler
can handle such string literals correctly. I did a quick search, and it seems
that the compiler behaviour is undefined for non-ASCII strings. C++11 added a
"u8" prefix to force UTF8 literals like:

const char str[] = u8"Test String";

but it didn't work on avr-gcc, even when I tried the compiler option
"-std=c++0x".

http://stackoverflow.com/questions/13748068/gcc-utf-8-string-literal-compile-error
http://stackoverflow.com/questions/13444930/is-the-u8-string-literal-necessary-in-c11
http://stackoverflow.com/questions/14679717/c11-example-of-difference-between-ordinary-string-literal-and-utf-8-string-li

C

Loren M. Lang

Aug 31, 2013, 10:26:53 AM
to Cristian Maglie, matthe...@forward.com.au, devel...@arduino.cc


On Aug 31, 2013 6:59 AM, "Cristian Maglie" <c.ma...@bug.st> wrote:
>
> On Saturday, 31 August 2013 15:31:07, Matthew Ford wrote:
> > No comments on this so far.
> > Could it be that the IDE saves the sketch in UTF-8 format and that as a
> > result the gcc compiler gets the UTF-8 bytes for string constants?

I am less well versed with C++, but I know how C89 does it. There is a source encoding and a machine (binary) encoding for string contents. They are not always the same. Source files are parsed as characters in some source encoding (originally in either an ASCII or an EBCDIC encoding variant) and any non-control character excluding backslash and double quote can be included in a string literal verbatim and will be converted to the machine encoding when compiled to an object file (machine form). For standard GCC on modern Linux systems, the machine encoding of any string literal is UTF-8, but the source file encoding is selected based on the current locale. For portability, it's common to specify non-ASCII literals as an escape like \u2182. This removes any dependence on the current locale or compiler. I expect AVR GCC has the same defaults. Both encodings can be changed with command-line options to GCC.
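
To make the "machine encoding" concrete, here is a minimal sketch of how a code point such as the one named by \u2182 turns into bytes when the machine encoding is UTF-8. `encodeUtf8` is a hypothetical helper written for illustration, not something GCC or Arduino exposes, and it assumes a valid code point (no surrogates, at most 0x10FFFF).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes one Unicode code point as UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp) {
  assert(cp <= 0x10FFFF);
  std::string out;
  if (cp < 0x80) {                      // 1 byte: plain ASCII
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                              // 4 bytes: 11110xxx then three 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

Under that assumption, U+2182 becomes the three bytes E2 86 82 and the degree sign U+00B0 becomes C2 B0, which is what a UTF-8 receiver expects to see on the wire.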

Note, this is completely different from wide string literals in C, which are written with a capital L in front of the quotation marks and use the base type wchar_t. That is usually a 32-bit integer (16-bit on some platforms, such as Windows), as opposed to the 8-bit char used for normal C strings. With GCC, wide strings default to UTF-32 or UTF-16, whichever matches the width of wchar_t.
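
A small sketch of the narrow versus wide distinction, using the text "°F". The function names are made up for illustration, and the narrow literal spells the degree sign as its two UTF-8 bytes so the result does not depend on the source file's encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// "°F" as a narrow string: the degree sign is written as its two UTF-8
// bytes (octal escapes for 0xC2 0xB0), so the string holds 3 chars.
std::size_t narrowLength() { return std::strlen("\302\260F"); }

// The same text as a wide string: one wchar_t per character here, 2 in total.
// sizeof(wchar_t) is platform-dependent, so nothing below relies on it.
std::size_t wideLength() { return std::wcslen(L"\u00B0F"); }
```

The narrow form is what `Serial.print()` would send byte by byte; the wide form stores one element per character but is not what a byte-oriented serial link or web client expects.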


Matthew Ford

Aug 31, 2013, 7:58:28 PM
to devel...@arduino.cc
Thanks, everyone, for the info.
I did not really think it was as simple as I proposed.

Some notes

>>the IDE uses a JEditTextArea whose methods getText(..) and setText(..) uses a java String to retrieve/store the editor content.
>>Since java strings are UTF8, I would say yes, the compiler receive an UTF-8 file as input.

The Java spec says
"The Java programming language represents text in sequences of 16-bit
code units, using the UTF-16 encoding."
so internally to the java program the encoding is NOT UTF-8 so it would
seem that it depends on how the source file is written and then read by
the C++ compiler.

Another reply mentioned

>>IIRC, gcc defaults to interpreting source files using the encoding for the
>>locale it's in. That is going to be UTF-8 for Mac OS and modern Linux desktops.

So to summarise, the sorry state of internationalisation of Arduino is:
a) IF the IDE was changed to ALWAYS write and read sketches in UTF-8
encoding, then non-ASCII strings could be entered in the IDE and saved
in UTF-8.
b) HOWEVER, as noted above, in order to get the gcc compiler to reliably
read these bytes we would need to ensure that the C++ compiler ALWAYS
reads in the file as a UTF-8 encoded file.

Part a) is under control of the IDE developers.
Part b) may be a little more difficult to implement. (any ideas?)

At the moment, while gcc reads in the default encoding of the platform,
we cannot just change the IDE to always read and write in UTF-8.

So the advice to users seems to be
"Putting non-ASCII chars in your strings may just work, depending on
your OS and locale and that of the machine where your sketch will be run.
On the other hand, it may not."

For those cases that don't "just work" I will be putting up a Java program
to convert non-ASCII chars to UTF-8 (in octal) for insertion into IDE
(and C++) strings.
Note: UTF-8 in hex is not reliable due to the odd way C++ interprets it
(it looks at more than just 2 chars after \x).
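
The octal conversion described above can be sketched in a few lines of C++. `toOctalEscapes` is a hypothetical helper, not the Java program mentioned in this message, and it assumes the input is already UTF-8 and contains no backslashes or double quotes that would need their own escaping.

```cpp
#include <cassert>
#include <string>

// Rewrites every non-ASCII byte of a UTF-8 string as a three-digit octal
// escape so the text can be pasted into a C/C++ string literal.
std::string toOctalEscapes(const std::string& utf8) {
  std::string out;
  for (unsigned char c : utf8) {
    if (c < 0x80) {
      out += static_cast<char>(c);                      // ASCII passes through
    } else {
      out += '\\';
      out += static_cast<char>('0' + ((c >> 6) & 7));   // always 3 digits, so a
      out += static_cast<char>('0' + ((c >> 3) & 7));   // following digit is never
      out += static_cast<char>('0' + (c & 7));          // absorbed into the escape
    }
  }
  return out;
}
```

For "98.6°F" this yields `98.6\302\260F`. The hex spelling `\xC2\xB0F` would not work, because the trailing `F` is a valid hex digit and is read as part of the `\xB0F` escape; an octal escape stops after at most three digits.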

thanks
matthew

Loren M. Lang

Aug 31, 2013, 8:35:49 PM
to matthe...@forward.com.au, devel...@arduino.cc
On Sat, Aug 31, 2013 at 4:58 PM, Matthew Ford <matthe...@forward.com.au> wrote:
> Thanks every one for the info.
> I did not really think it was as simple as I proposed.
>
> Some notes
>
>> the IDE uses a JEditTextArea whose methods getText(..) and setText(..) uses a java String to retrieve/store the editor content.
>> Since java strings are UTF8, I would say yes, the compiler receive an UTF-8 file as input.
>
> The Java spec says
> "The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding."
> so internally to the java program the encoding is NOT UTF-8 so it would seem that it depends on how the source file is written and then read by the C++ compiler.

Yes, all JVMs use UTF-16 for string manipulation internally, but that does not imply that file or terminal I/O is also UTF-16. The important thing to understand is that the Arduino IDE, which is written in Java, can handle the full Unicode character set. It doesn't matter whether the encoding internally is UTF-8, UTF-16, or UTF-32, it can store, retrieve, and manipulate strings that might contain any character from the full repertoire of Unicode.

The actual encoding that C/C++/Wiring source files are saved in is UTF-8 which I have confirmed using Arduino 1.0.5. Here is my sample sketch:

void setup()
{
  Serial.begin(9600);
}

void loop()
{
  Serial.println("98.6°F");
}

I saved the sketch as UTF.ino using the Arduino IDE 1.0.5. If I cat the file in a terminal, it will be displayed according to the terminal's locale, which happens to be UTF-8 on my system. It is displayed correctly. If it were UTF-16, the degree sign would not be displayed correctly. If I use the file command, I get "UTF.ino: UTF-8 Unicode text". I don't think GCC supports having source files in non-ASCII-compatible encodings like UTF-16. UTF-8 is ASCII-compatible because byte values in the range of 0x00 to 0x7f have the same meaning in ASCII and UTF-8.

So, in summary, the Arduino IDE, at least on Mac and Linux (and probably Windows), does support non-Latin characters, but they will be saved to the HEX file in UTF-8 and sent back out the serial port in that form. With string literals, you will probably get the expected behavior (if the expected behavior is UTF-8), but there are issues if you try using non-ASCII characters with character constants (single quotes), but that is an issue inherent with C/C++, not Arduino or GCC.
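
One way to see why character constants are awkward: in UTF-8 a single non-ASCII character occupies more than one char. The sketch below counts characters (code points) rather than bytes; `countCodePoints` is a hypothetical helper and assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
std::size_t countCodePoints(const std::string& utf8) {
  std::size_t count = 0;
  for (unsigned char c : utf8) {
    if ((c & 0xC0) != 0x80) ++count;
  }
  return count;
}
```

For the sample string "98.6°F" stored as UTF-8, the string holds 7 bytes but only 6 characters, because the degree sign takes two bytes and so cannot fit in a single 8-bit char.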
 

Matthew Ford

Aug 31, 2013, 9:56:28 PM
to Loren M. Lang, devel...@arduino.cc
Hi Loren,

>>The actual encoding that C/C++/Wiring source files are saved in is UTF-8
>>which I have confirmed using Arduino 1.0.5. Here is my sample sketch:

Java will not save in UTF-8 by default. By default Java saves in the
'default' local encoding.
If we want the IDE to ALWAYS save in UTF-8, it needs to be explicitly
specified as the encoding in the source code of the IDE.
I did a quick check of the IDE source code and could not find any case
where anything other than the 'default' character encoding is used.
If you run the IDE on an OS that does not have UTF-8 as the default
encoding you will not get UTF-8 files.
gcc will still work because it, also, uses the 'default' encoding to
read the files.
BUT if you download the sketch to another OS all bets are off.

>>but that is an issue inherent with C/C++, not Arduino or GCC.

To an IDE user the underlying C/C++ IS Arduino. They have no knowledge
of the magic that makes their sketch into a loadable file, and should
not be expected to.
That is the main selling point of Arduino. Arduino hides the C/C++, and
micro assembler, under an IDE with a simplified programming interface
and set of programming statements.

Testing back to your own terminal is not a valid test as the sketch is
being run on the same OS/locale that created it.
If I downloaded that sketch (or a library) on some other machine, what
would happen?

Internationalisation means a consistent experience over various OS/locales.

I think this issue of using non-ASCII characters for display purposes,
such as when coding HTTP servers, needs to be more precisely defined or
documented.
Given that Arduino originated in Italy, which uses some accented chars
that are outside the 7-bit ASCII range, I am surprised this was not done
from the start.
Can we do something about it now, even if just documenting the current
state?

matthew

Loren M. Lang

Sep 1, 2013, 2:59:01 AM
to matthe...@forward.com.au, devel...@arduino.cc
On Sat, Aug 31, 2013 at 6:56 PM, Matthew Ford <matthe...@forward.com.au> wrote:
> Hi Loren,
>
>> The actual encoding that C/C++/Wiring source files are saved in is UTF-8
>> which I have confirmed using Arduino 1.0.5. Here is my sample sketch:
>
> Java will not save in UTF-8 by default. By default Java saves in the 'default' local encoding.
> If we want the IDE to ALWAYS save in UTF-8, it needs to be explicitly specified as the encoding in the source code of the IDE.
> I did a quick check of the IDE source code and could not find any case where anything other than the 'default' character encoding is used.
> If you run the IDE on an OS that does not have UTF-8 as the default encoding you will not get UTF-8 files.
> gcc will still work because it, also, uses the 'default' encoding to read the files.
> BUT if you download the sketch to another OS all bets are off.

True, I forgot how the encoding handling of strings works in Java, but at least it matches the behavior of GCC, which also assumes the current locale's encoding when reading source files. I remember the nuisance of adding the UTF-8 encoding explicitly to String.getBytes() and the PrintWriter and reader classes when I had to save an XML file. Now, all calls to getBytes() or Writer initialization must be wrapped in a try..catch for UnsupportedEncodingException. I think Java might even mandate support for UTF-8 on all JREs, but the try..catch is still needed.

If we decide we should lock the encoding to UTF-8, it would come down to making sure the appropriate calls to the Reader/Writer classes and/or String.getBytes() include an explicit encoding and, on the other side, specifying -finput-charset=utf-8 to avr-gcc so it interprets the file with the correct encoding. We should also verify that the Serial Monitor is using UTF-8 as well. This should ensure full Unicode support from source input in the IDE to final output in the Serial Monitor.



>> but that is an issue inherent with C/C++, not Arduino or GCC.
>
> To an IDE user the underlying C/C++ IS Arduino. They have no knowledge of the magic that makes their sketch into a loadable file, and should not be expected to.
> That is the main selling point of Arduino. Arduino hides the C/C++, and micro assembler, under an IDE with a simplified programming interface and set of programming statements.

I was merely stating that the issue with character constants is inherent in the underlying language used by Arduino/Wiring and can't be fixed at the Arduino level. If a user does try to utilize a character constant of a non-ASCII character and we do fix the machine encoding to UTF-8, it will not work because of how C/C++ operate. I'm not saying that the Arduino user needs to understand why, but they will need to take a different approach to the problem if this ever comes up.
 

> Testing back to your own terminal is not a valid test as the sketch is being run on the same OS/locale that created it.
> If I downloaded that sketch (or a library) on some other machine, what would happen?
>
> Internationalisation means a consistent experience over various OS/locales.

I wasn't trying to do an extensive i18n test suite, just a quick confirmation that the source was not saved in UTF-16 as stated previously.
 

> I think this issue of using non-ASCII characters for display purposes, such as when coding HTTP servers, needs to be more precisely defined or documented.

The HTTP layer is the first layer where character encoding becomes an issue, but the stock Arduino libraries don't include the HTTP layer. That layer is entirely handled in the sketch and all the code for it is provided in the Ethernet library examples. For the WebServer example sketch, the quickest fix to getting proper Unicode support is to add an explicit encoding to the Content-Type header like this:

client.println("Content-Type: text/html; charset=utf-8");

Since the machine encoding for the sketch is already using UTF-8, that's all it takes for the example to send out a proper UTF-8 encoded web page.
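
As a rough illustration of what ends up on the wire, here is a plain C++ sketch that assembles such a response as one string. `buildHtmlResponse` is a hypothetical helper (the WebServer example prints each header line with `client.println()` instead), and the Content-Length header is an addition for illustration, not something the example sketch sends.

```cpp
#include <cassert>
#include <string>

// Builds a minimal HTTP/1.1 response that declares its body as UTF-8.
std::string buildHtmlResponse(const std::string& utf8Body) {
  std::string response;
  response += "HTTP/1.1 200 OK\r\n";
  response += "Content-Type: text/html; charset=utf-8\r\n";
  // Content-Length counts bytes, not characters.
  response += "Content-Length: " + std::to_string(utf8Body.size()) + "\r\n";
  response += "Connection: close\r\n";
  response += "\r\n";
  response += utf8Body;
  return response;
}
```

For the body "98.6°F" the length header reads 7, not 6, because the degree sign is two bytes in UTF-8. The charset declaration only helps if the bytes in the string really are UTF-8.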

> Given that Arduino originated in Italy, which uses some accented chars that are outside the 7-bit ASCII range, I am surprised this was not done from the start.

My impression is that any modern Linux, Mac, or Windows system will typically have its system locale using the UTF-8 encoding, so relying on the default behavior just happens to work, even with accented characters or the Euro sign. That doesn't mean that we should rely on default behavior, but that problems caused by relying on it don't pop up very often.

Matthew Ford

Sep 1, 2013, 8:52:07 PM
to Loren M. Lang, devel...@arduino.cc
As discussed,
I would like to request a change to the IDE and gcc command lines to
consistently use UTF-8 for all OSes and OS versions.

I believe the changes come down to
i) setting UTF-8 as the encoding for ALL IDE reads and writes of sketch and
translated files
ii) adding -finput-charset=utf-8 to avr-gcc so it interprets the file with the
correct encoding
iii) ensuring the Serial Monitor also uses UTF-8 encoding

With these minor changes Arduino would be internationalized and
consistent across all platforms and languages.

matthew

p.s.
Currently this is needed for web servers. Although, as Loren mentions,
you can tell the other end to expect UTF-8 bytes in the web page,

client.println("Content-Type: text/html; charset=utf-8");

this does not ensure that the text sent by println is
in fact encoded as UTF-8.
The bytes sent depend on the default encoding of the OS when the sketch
is saved and compiled and subsequently loaded to the board.
The changes requested above would ensure that the bytes are always saved
and compiled and sent in UTF-8 format as expected.

Dennis German

Sep 1, 2013, 9:45:14 PM
to devel...@arduino.cc
It has become apparent from this thread this is a problem that spans multiple areas:

1) The editing of the source files in the Arduino IDE (or other source editor)
2) The storing of the source, including OS file system issues
3) The handling by the GCC compiler and
4) Finally the handling by the receiver of the characters
(be it an HTTP client, the serial monitor, an LCD display, another serial display device or a file)

Not addressing any single one of these will produce unpredictable results,
and some of them are outside the sphere of influence of the Arduino project.

Perhaps the best solution is to publish a web document which discusses this.
A warning suggesting restricting to using characters of the very limited range of the 95 printable ASCII characters
(see https://en.wikipedia.org/wiki/Ascii)
might be included as a footnote in the Serial.print documentation with a link to that document.


Regarding web page rendering, the list of "Character entity references" includes many "symbols",
including degree-sign, bowtie, Omega,
and frequently used grave and other accented letters,
using the ampersand-name-semicolon syntax.
See http://dev.w3.org/html5/html-author/charref .
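
A sketch of how text could be made pure ASCII for a web page along these lines. Named references such as `&deg;` need a lookup table, so this illustration assumes the numeric form (`&#176;`) instead; `toNumericEntities` is a hypothetical helper and assumes its input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Replaces every non-ASCII character in a UTF-8 string with a decimal
// numeric character reference, leaving ASCII untouched.
std::string toNumericEntities(const std::string& utf8) {
  std::string out;
  std::size_t i = 0;
  while (i < utf8.size()) {
    unsigned char lead = static_cast<unsigned char>(utf8[i]);
    if (lead < 0x80) {                       // ASCII: copy as-is
      out += static_cast<char>(lead);
      ++i;
      continue;
    }
    // Number of continuation bytes implied by the lead byte.
    std::size_t extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    std::uint32_t cp = lead & (0x3F >> extra);
    for (std::size_t k = 1; k <= extra && i + k < utf8.size(); ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    }
    out += "&#" + std::to_string(cp) + ";";
    i += extra + 1;
  }
  return out;
}
```

With this, "98.6°F" becomes `98.6&#176;F`, which renders correctly whatever encoding the page declares, at the cost of longer strings in flash.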

Dennis German

Matthew Ford

Sep 1, 2013, 10:11:39 PM
to devel...@arduino.cc

>> Perhaps the best solution is to publish a web document which discusses this.
>> A warning suggesting restricting to using characters of the very limited range of the 95 printable ASCII characters
>> (see https://en.wikipedia.org/wiki/Ascii)
>> might be included as a footnote in the Serial.print documentation with a link to that document.

That would be an improvement.
However, in my opinion, fixing points 1, 2 and 3 of Dennis's list would provide a
consistent, known interface for all external devices, such as UTF-8
encoded web pages,
that would still be fully compatible with ASCII and would also handle
all languages (point 1 would only cover editing in the IDE, which is
where most users do their work).

Also it would provide consistent results for all downloaded sketches
(and libraries) regardless of the OS they are downloaded to and regardless
of the languages used in them.

So far no one seems to be suggesting any encoding standard other
than UTF-8, so I think we should standardise on that and remove any
ambiguity of results.

Certainly my Android app for controlling Arduino devices, pfodApp, has
standardised on UTF-8 for displaying messages and menus in the user's
own language, which is why I started this thread.

matthew

Paul Stoffregen

Sep 1, 2013, 10:37:39 PM
to devel...@arduino.cc
Since at least Arduino 0022, the IDE's editor has written files with
UTF8 encoding, avr-gcc has successfully compiled code with UTF8 within
string constants, and the serial monitor has properly displayed those
characters when they're sent with Serial.print().

There very likely are places where UTF8 support could be improved,
but many of the areas mentioned below already work. If you're going to
write proposals that involve (other people) doing a lot of work, the
very least you could do is actually use the IDE with an Arduino board to
identify the specific places where work is actually needed.

Matthew Ford

Sep 1, 2013, 11:22:01 PM
to devel...@arduino.cc
As mentioned previously I have looked at the IDE java code that reads
and writes files.
All the java code, as far as I can see, just uses 'default' encoding.

Here is a note on the Windows 7 encoding problems faced by us poor souls who
use Windows:
http://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7

Are you saying that the IDE start-up sets some environment variable
that changes the default encoding to UTF-8?

The suggestion is basically to explicitly set the encoding to UTF-8
when reading and writing files from the IDE (including the
Serial Monitor) and to change the
gcc command string to specify UTF-8 as the input file encoding.

I would be happy to assist with these changes.

matthew

Ben Combee

Sep 1, 2013, 11:40:55 PM
to matthe...@forward.com.au, Arduino Developers
I just opened up the Arduino 1.0.5 IDE here on my Windows 7 machine, pasted in some mixed English and Korean text, saved that out to disk, then loaded it into Sublime Text.  The text on disk is definitely in UTF-8 format, verified by looking at it in a hex editor and by trying to load it in various UTF-16 and code page encodings.
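
The same kind of check can be automated. Below is a minimal sketch of a UTF-8 validity test over raw file bytes; `isValidUtf8` is a hypothetical helper, not something the IDE provides. Passing it does not prove a file was written as UTF-8 (pure ASCII passes too), but files saved in Latin-1, a Windows code page or UTF-16 with non-ASCII text will normally fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the bytes form well-formed UTF-8: correct lead and
// continuation bytes, no overlong forms, no surrogates, at most U+10FFFF.
bool isValidUtf8(const std::vector<std::uint8_t>& bytes) {
  std::size_t i = 0;
  while (i < bytes.size()) {
    std::uint8_t lead = bytes[i];
    std::size_t extra = 0;
    std::uint32_t cp = 0;
    std::uint32_t smallest = 0;  // smallest code point allowed for this length
    if (lead < 0x80) { ++i; continue; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; smallest = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; smallest = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; smallest = 0x10000; }
    else return false;                                  // stray continuation or invalid lead
    if (i + extra >= bytes.size()) return false;        // sequence cut off at end of data
    for (std::size_t k = 1; k <= extra; ++k) {
      std::uint8_t b = bytes[i + k];
      if ((b & 0xC0) != 0x80) return false;             // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < smallest || cp > 0x10FFFF) return false;   // overlong or out of range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // UTF-16 surrogate
    i += extra + 1;
  }
  return true;
}
```

The bytes 39 38 C2 B0 ("98°" in UTF-8) pass, while a Latin-1 degree sign (a lone B0) or the UTF-16LE form (B0 00) is rejected.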



Matthew Ford

Sep 1, 2013, 11:59:53 PM
to devel...@arduino.cc
Of course we are only interested in Arduino IDE (and the gcc it invokes)

My question is, then, where does the defaulting to UTF-8 happen for the
Arduino IDE?
It is not a Java default.

What makes the IDE think the 'default' OS file encoding is UTF-8, and
how can we make sure this always happens?

matthew

On 2/09/2013 1:44 PM, Loren M. Lang wrote:
> That question doesn't specifically apply to the Arduino IDE, but a general
> problem with text files and text editors everywhere. On top of that,
> Windows supports three different encodings depending on the APIs an app
> chooses to use labeled OEM, ANSI, and Unicode. My experience has been that
> the Arduino IDE defaults to UTF-8 on Windows as well. It might be different
> if used on Windows 95/98/ME, but I think the behavior in NT/2K/XP/Vista/7/8
> is UTF-8.
>
>
> On Sun, Sep 1, 2013 at 8:22 PM, Matthew Ford <matthe...@forward.com.au>wrote:
>
>> As mentioned previously I have looked at the IDE java code that reads and
>> writes files.
>> All the java code, as far as I can see, just uses 'default' encoding.
>>
>> Here is a note on Window 7 encoding problems faced by us poor souls that
>> use windows
>> http://superuser.com/**questions/239810/setting-utf8-**
>> as-default-character-encoding-**in-windows-7<http://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7>
>>
>> Are you saying that the IDE start up sets some environmental variable that
>> changes the default encoding to UTF-8?
>>
>> The suggestion is basically to set explicitly set the encoding as UTF-8,
>> when reading and writing files from the IDE (including the SerialMonitor)
>> and changing the
>> gcc command string specify utf-8 as the input file encoding.
>>
>> I would be happy to assist with these changes.
>>
>> matthew
>>
>>
>> On 2/09/2013 12:37 PM, Paul Stoffregen wrote:
>>
>>> Since at least Arduino 0022, the IDE's editor has written files with UTF8
>>> encoding, avr-gcc has successfully compiled code with UTF8 within string
>>> constants, and the serial monitor has properly displayed those characters
>>> when they're sent with Serial.print().
>>>
>>> There very likely are places where UTF8 support could be improved, but
>>> many of the areas mentioned below already work. If you're going to write
>>> proposals that involve (other people) doing a lot of work, the very least
>>> you could do is actually use the IDE with an Arduino board to identify the
>>> specific places where work is actually needed.
>>>
>>>
>>>
>>> On 09/01/2013 07:11 PM, Matthew Ford wrote:
>>>
>>>>> Perhaps the best solution is to publish a web document which discusses
>>>>> this.
>>>>> A warning suggesting restricting output to the very
>>>>> limited range of the 95 printable ASCII characters
>>>>> (see https://en.wikipedia.org/wiki/Ascii)
>>>>> might be included as a footnote in the Serial.print documentation with
>>>>> a link to that document.
>>>>>
>>>>>
>>>>> Regarding web page rendering, the list of "Character entity references"
>>>>> includes many "symbols"
>>>>> including degree-sign, bowtie, Omega,
>>>>> frequently used grave, accented etc
>>>>> using the ampersand-name-semicolon syntax.
>>>>> See http://dev.w3.org/html5/html-author/charref.
>>>>>
>>>>> Dennis German
>>>>>
>>>>>

Cristian Maglie

Sep 2, 2013, 4:46:57 AM9/2/13
to devel...@arduino.cc
On Monday 2 September 2013 05:59:53, Matthew Ford wrote:
> Of course we are only interested in Arduino IDE (and the gcc it invokes)
>
> My question is, then, where does the defaulting to UTF-8 happen for the
> Arduino IDE?
> It is not a Java default.
>
> What makes the IDE think the 'default' OS file encoding is UTF-8, and
> how can we make sure this always happens?

Looking more carefully at the PApplet.saveStrings() function, I see that it
uses a helper method to create a PrintWriter with the encoding forced to UTF-8:

static public PrintWriter createWriter(OutputStream output) {
  try {
    OutputStreamWriter osw = new OutputStreamWriter(output, "UTF-8");
    return new PrintWriter(osw);
  } catch (UnsupportedEncodingException e) { } // not gonna happen
  return null;
}

That's why the resulting file is always UTF-8, regardless of any system settings.
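
A minimal standalone sketch of this effect, not the Processing source itself: a writer built with an explicit charset produces the same bytes whatever the platform default happens to be. The class name EncodingDemo and the method encode are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Processing or the Arduino IDE.
final class EncodingDemo {

    // Encode text through a PrintWriter whose charset is set explicitly,
    // so the platform default encoding plays no part in the result.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, charset));
        writer.print(text);
        writer.close(); // flushes the encoder into the byte stream
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "25\u00B0C"; // "25" + degree sign + "C"
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 length:   " + utf8.length);   // 5, degree sign is C2 B0
        System.out.println("Latin-1 length: " + latin1.length); // 4, degree sign is B0
        System.out.println("Platform default: " + Charset.defaultCharset());
    }
}
```

A writer created without a charset argument would follow the platform default instead, which is where the differences between systems come from.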

C

Matthew Ford

Sep 2, 2013, 6:32:19 PM9/2/13
to devel...@arduino.cc
Thanks Cristian, found that; I should have looked harder.

Now just one remaining point. The IDE needs to ensure that the gcc compiler reads the file as UTF-8.

I have made some modifications to the gcc command line in the IDE (in Compiler.java) and did not see anything there that forces an encoding other than the locale's when reading the file to compile, but I found this note in the GCC docs:

-finput-charset=charset
Set the input character set, used for translation from the character set of the input file to the source character set used by GCC. If the locale does not specify, or GCC cannot get this information from the locale, the default is UTF-8. This can be overridden by either the locale or this command line option. Currently the command line option takes precedence if there's a conflict. charset can be any encoding supported by the system's iconv library routine. 


So it still seems that the IDE should be changed to explicitly add

-finput-charset=UTF-8



to the gcc command line, to ensure that the locale does not override the UTF-8 encoding of the IDE source files.

Can this very minor change be done in the next version, please?
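
A rough sketch of what the requested change amounts to, assuming the compiler command is assembled as a list of strings. This is not the IDE's actual Compiler.java; the class GccCommandDemo, the method withUtf8Input and the example command line are all made up for illustration. Only the -finput-charset option itself comes from the GCC docs quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the real IDE builds its command line differently.
final class GccCommandDemo {
    static final String INPUT_CHARSET_FLAG = "-finput-charset=UTF-8";

    // Returns a copy of the command with the flag placed right after the
    // compiler executable, unless an input charset is already specified.
    static List<String> withUtf8Input(List<String> baseCommand) {
        List<String> cmd = new ArrayList<String>(baseCommand);
        for (String arg : cmd) {
            if (arg.startsWith("-finput-charset=")) {
                return cmd; // respect an existing choice
            }
        }
        cmd.add(Math.min(1, cmd.size()), INPUT_CHARSET_FLAG);
        return cmd;
    }

    public static void main(String[] args) {
        // Example command only; the file names are placeholders.
        List<String> base = Arrays.asList(
                "avr-g++", "-c", "-Os", "-mmcu=atmega328p", "sketch.cpp", "-o", "sketch.o");
        System.out.println(String.join(" ", withUtf8Input(base)));
    }
}
```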

matthew

Paul Stoffregen

Sep 2, 2013, 7:10:19 PM9/2/13
to devel...@arduino.cc
Matthew, it's great you're looking to contribute to Arduino.  There certainly are a number of places where real work needs to be done to improve UTF8 support.  I really do hope you will apply yourself to making some good contributions.

But to be a bit frank, you're not off to a very good start here, now with 9 messages that effectively are wasting everyone's time (most importantly: Cristian's time to dig up the specific place where UTF-8 encoding is forced) proposing to "fix" things that already work.

Please, I beg of you, before transmitting another word and again consuming the attention of hundreds of people who read this mail list, please put some of your own time to using the Arduino IDE with non-ascii characters on an actual Arduino board.  You'll discover UTF8 already works nicely in many places.  Some places, perhaps the LiquidCrystal library, do not support UTF8 but probably could.  Obviously good patches or pull requests are the best way to contribute, but even just finding and documenting the places where UTF8 does not actually work would be a good starting contribution to Arduino.

Please, before you hit "reply", actually use the Arduino IDE and a real Arduino board, so your 10th message on this thread can be well informed.

Weddington, Eric

Sep 2, 2013, 7:52:12 PM9/2/13
to matthe...@forward.com.au, devel...@arduino.cc


> -----Original Message-----
> From: Matthew Ford [mailto:matthe...@forward.com.au]
> Sent: Monday, September 02, 2013 4:32 PM
> Cc: devel...@arduino.cc
> Subject: Re: [Developers] Handling Non-ASCII chars -- conclusion --
> Minor change request, add -finput-charset=UTF-8 to gcc cmd line
>
>
> -finput-charset=charset
> Set the input character set, used for translation from the character
> set of the input file to the source character set used by GCC. If the
> locale does not specify, or GCC cannot get this information from the
> locale, the default is UTF-8. This can be overridden by either the
> locale or this command line option. Currently the command line option
> takes precedence if there's a conflict. charset can be any encoding
> supported by the system's iconv library routine.
>
>
> So it still seems that the IDE should be changed to explicitly add
>
>
> -finput-charset=UTF-8

AFAIK, the compiler is built with no locale information, so there should be no conflict. Hence that switch should not be needed.

But I'm happy to be proven wrong, too.

Eric Weddington


Matthew Ford

Sep 4, 2013, 5:38:02 PM9/4/13
to devel...@arduino.cc
Testing with an Uno and IDE V1.0.3 finds that
non-ASCII text compiles and loads on my OS (Windows 7)
and is sent correctly to the serial port (tested via a Bluetooth
connection to pfodApp).

The SerialMonitor does not handle UTF-8 in either direction.
Notes in Serial.java advise the caller to convert first, but
SerialMonitor does not do the necessary conversion.

So the change request is to add UTF-8 conversion for SerialMonitor I/O
so that sketch developers can test their non-ASCII output.
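
A minimal sketch of the kind of conversion meant here, on the receiving side. It is not code from the IDE's Serial.java or SerialMonitor; the class Utf8StreamDecoder and its feed method are hypothetical. The point it tries to show is that serial bytes arrive in arbitrary chunks, so a multi-byte UTF-8 character can be split across two reads and the decoder has to carry the incomplete tail over to the next chunk.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not taken from the Arduino IDE sources.
final class Utf8StreamDecoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Trailing bytes of a multi-byte sequence that has not fully arrived yet.
    private byte[] pending = new byte[0];

    // Feed one chunk of raw serial bytes; returns the text decodable so far.
    String feed(byte[] chunk) {
        ByteBuffer in = ByteBuffer.allocate(pending.length + chunk.length);
        in.put(pending).put(chunk);
        in.flip();
        // A UTF-8 decoder never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        decoder.decode(in, out, false); // false: more input may follow
        pending = new byte[in.remaining()];
        in.get(pending); // keep the incomplete tail for the next call
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // "25" + degree sign + "C" in UTF-8, split inside the two-byte degree sign.
        byte[] first = {'2', '5', (byte) 0xC2};
        byte[] second = {(byte) 0xB0, 'C'};
        Utf8StreamDecoder d = new Utf8StreamDecoder();
        String text = d.feed(first) + d.feed(second);
        System.out.println(text.equals("25\u00B0C")); // true
    }
}
```

Decoding each chunk on its own with new String(chunk, StandardCharsets.UTF_8) would instead turn the two halves of the split character into replacement characters.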

matthew
(small screen shots attached)
nonASCIIstring.jpg
serialMonitorOUt.jpg

Matthew Ford

Sep 5, 2013, 9:55:27 PM9/5/13
to Rob Tillaart, devel...@arduino.cc
Hi Rob,
UTF-8 string handling is way beyond what I had in mind.

I was just interested in sending text containing non-ASCII chars.
If I really wanted to check what came back, I would just do a byte
compare against a known byte array.

I tried the SerialMonitor on MacOS, same problem. Since the
SerialMonitor is the main (only) debugging tool, I think it is important
for it to also support UTF-8 in the same way the IDE editor and C++
compiler do.

Here is a page describing my understanding of the current state of UTF-8
support:
http://www.forward.com.au/pfod/ArduinoProgramming/Languages/index.html

matthew

On 6/09/2013 2:56 AM, Rob Tillaart wrote:
> Can you post this on the forum please? I think it should be discussed by a
> wider audience. On one hand there is the need for UTF8, and on the other hand
> the fact that this needs new classes, e.g. UTF8-string, including UTF8-based
> functions etc. Not trivial. Given the restricted memory of the Arduino
> (UNO) this should become a 3rd party library first and prove its use before
> it can be adopted in the core.
>
> So please start a discussion on the forum (1) to check the need and if
> needed (2) to trigger volunteers to code
>
> my 2 cents,
> Rob
>
>
>
> On Wed, Sep 4, 2013 at 11:38 PM, Matthew Ford
> <matthe...@forward.com.au> wrote:
>