Encoding issue in Eclipse since v6.x

Nicolas Micoud

Jan 21, 2019, 4:11:35 AM

to iDempiere

Encoding issue in Eclipse

Since I migrated my sources from v5.1 to v6.x, I have noticed an encoding issue: Eclipse replaces accented characters with "?".

After some searching, I found that it is caused by a change made in <package>/.settings/org.eclipse.core.resources.prefs

The offending line is

encoding/<project>=UTF-8

If I remove this line from those config files, the errors are gone.

So I am wondering whether that modification should be made in the iDempiere repository or only in "my" repo.

Thanks,

Nicolas

Carlos Antonio Ruiz Gomez

Jan 21, 2019, 6:55:58 AM

to idem...@googlegroups.com

Hi Nicolás, it sounds like your Java sources are not encoded in UTF-8.

I know that on Ubuntu you can convert to UTF-8 using iconv; I don't know about Windows.
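
Where iconv is not at hand, the same kind of conversion can be approximated in a few lines of Python. This is only a minimal sketch: it assumes the legacy sources are ISO-8859-1, which the thread does not confirm (on Windows they could just as well be windows-1252), and the function name `convert_to_utf8` and its `source_encoding` parameter are hypothetical.

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "iso-8859-1") -> bool:
    """Re-encode one file to UTF-8 in place; return True if its bytes changed."""
    raw = path.read_bytes()
    try:
        # Already valid UTF-8 (this includes pure ASCII): leave it untouched.
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # ASSUMPTION: the file was written in source_encoding.
    text = raw.decode(source_encoding)
    # Working on bytes keeps the existing line delimiters exactly as they are.
    path.write_bytes(text.encode("utf-8"))
    return True
```

Calling it on every `*.java` file under a source folder (for example with `Path.rglob`) converts only the files that are not already valid UTF-8, so it can be run repeatedly.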

Regards,

Carlos Ruiz

On 21/01/19 at 10:11, Nicolas Micoud wrote:

Nicolas Micoud

Jan 21, 2019, 7:18:54 AM

to iDempiere

I agree, but I haven't changed anything between v5.1 and v6.

Just to be sure: from v6 on, are the sources now in UTF-8?
I think it is this commit: https://bitbucket.org/idempiere/idempiere/commits/a3cef71ac2728005b3dcc045da5198380c27cf54
If so, I will convert all new and customized classes to UTF-8 little by little, or use iconv on an Ubuntu system.

Thanks,

Nicolas

Carlos Antonio Ruiz Gomez

Jan 21, 2019, 7:57:02 AM

to idem...@googlegroups.com

Hi Nicolas, I think we have not officially defined the standard preferences, but things would be easier if all developers worked with the same ones.

In Eclipse > Window > Preferences > General > Workspace
Text file encoding: UTF-8

Text file line delimiter: Unix

We also have a problem with mixed line delimiters in our sources; things would be easier if we converted all sources to Unix delimiters.

Patches often cannot be applied with command-line Mercurial because the file contains a mix of delimiters.

Count today:

4455 java sources

70 'with CRLF, LF line terminators'

66 with 'UTF-8 Unicode text' <- it is not a good idea to have non-ASCII (UTF-8) characters in core

Does it sound OK to make a commit converting those 70 to Unix delimiters?
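
A count like the one above, and the proposed conversion, can be sketched in Python as follows. This is an illustration only, not the procedure actually used for the numbers in this thread: the function names are hypothetical, and a lone CR (old Mac style) is deliberately ignored.

```python
from collections import Counter
from pathlib import Path


def line_ending_kind(data: bytes) -> str:
    """Classify line terminators as 'CRLF', 'LF', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # LF bytes not preceded by CR
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf_only:
        return "LF"
    return "none"


def to_unix(data: bytes) -> bytes:
    """Replace every CRLF with LF; a lone LF stays as it is."""
    return data.replace(b"\r\n", b"\n")


def count_kinds(root: Path) -> Counter:
    """Tally the line-ending kind of every .java file below root."""
    return Counter(line_ending_kind(p.read_bytes()) for p in root.rglob("*.java"))
```

Reading and writing bytes rather than text avoids any implicit newline translation by the runtime, which would otherwise hide the very difference being measured.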

Regards,

Carlos Ruiz

On 21/01/19 at 13:18, Nicolas Micoud wrote:

Nicolas Micoud

Jan 21, 2019, 8:46:38 AM

to iDempiere

You're right, we should all use the same settings.

The commit is OK for me.

But maybe that should be added to the wiki?

Nicolas Micoud

Jan 23, 2019, 10:48:47 AM

to iDempiere

Hello Carlos,

I've started to work on this.
It's possible to use iconv on Windows through Cygwin, and I was able to convert several classes to UTF-8. Work in progress :)

But... I ran some tests with Cygwin on org.adempiere.base\src\org\compiere\process.

Here's the output of the command file *.java --mime

Nico@nicofixe ~
$ file process/*.java --mime
process/ClientProcess.java:        text/plain; charset=us-ascii
process/CreateForeignKey.java:     text/plain; charset=us-ascii
process/CreateTableIndex.java:     text/plain; charset=us-ascii
process/DatabaseViewDrop.java:     text/plain; charset=us-ascii
process/DatabaseViewValidate.java: text/plain; charset=us-ascii
process/DocAction.java:            text/plain; charset=us-ascii
process/DocActionEventData.java:   text/x-java; charset=us-ascii
process/DocActionTemplate.java:    text/plain; charset=us-ascii
process/DocOptions.java:           text/plain; charset=us-ascii
process/DocumentEngine.java:       text/plain; charset=us-ascii
process/FactReconcile.java:        text/x-java; charset=us-ascii
process/FactReconciliation.java:   text/x-java; charset=us-ascii
process/PosKeyGenerate.java:       text/x-java; charset=us-ascii
process/ProcessCall.java:          text/plain; charset=us-ascii
process/ProcessInfo.java:          text/plain; charset=us-ascii
process/ProcessInfoLog.java:       text/plain; charset=us-ascii
process/ProcessInfoParameter.java: text/plain; charset=us-ascii
process/ProcessInfoUtil.java:      text/plain; charset=us-ascii
process/ProjectClose.java:         text/plain; charset=us-ascii
process/RemoteMergeDataVO.java:    text/plain; charset=us-ascii
process/RemoteSetupVO.java:        text/plain; charset=us-ascii
process/RemoteUpdateVO.java:       text/plain; charset=us-ascii
process/ServerProcessCtl.java:     text/plain; charset=us-ascii
process/StateEngine.java:          text/plain; charset=us-ascii
process/SvrProcess.java:           text/plain; charset=us-ascii
process/TableIndexDrop.java:       text/plain; charset=us-ascii
process/TableIndexValidate.java:   text/plain; charset=us-ascii

I would have expected UTF-8 instead of ascii.
But perhaps that is because the sources were downloaded on Windows, then copied elsewhere, and then analysed by Cygwin?

WDYT?

Carlos Antonio Ruiz Gomez

Jan 23, 2019, 11:25:57 AM

to idem...@googlegroups.com

Hi Nicolas, I think us-ascii is a subset of UTF-8, so your finding is not a problem.

But when you find sources with charset utf-8, that is usually a problem; those are better converted with native2ascii to Unicode-escaped (\uXXXX) characters.

I've seen some programs with comments in other languages, and that's a problem: as a standard, we encourage comments to always be in English.

BTW, I made a wrong count yesterday:

from 4455 java sources

2015 have CRLF line terminators

So 45% of the sources is a really big change to our code, and it will also affect checking the history :(

Regards,

Carlos Ruiz

On 23/01/19 at 16:48, Nicolas Micoud wrote:

Nicolas Micoud

Jan 24, 2019, 2:56:51 AM

to iDempiere

Hello Carlos,

I'm not sure I understand :-/

AFAIU, sources should be in UTF-8 (or us-ascii, as it is a subset of UTF-8)?
But you also said "when you find sources with charset utf-8 - that's usually a problem".
nb: ATM, all my comments are in French (with lots of accented characters)

So, should sources be in UTF-8 or not?

Thanks,

Carlos Antonio Ruiz Gomez

Jan 24, 2019, 8:50:17 AM

to idem...@googlegroups.com

Hi Nicolas,

When the file command shows us-ascii, it means that the whole file is composed only of the 128 common ASCII characters.

This is the ideal case, as basically all editors, printers, etc. can interpret this character set without problems.

Now, when you execute the file command and it shows UTF-8, it means that the file contains characters beyond those 128 common characters, encoded as UTF-8.
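
The distinction can be reproduced with a small Python check. This is a rough sketch of the idea only, not how the file command is implemented; the function name and the three labels it returns are hypothetical.

```python
def guess_charset(data: bytes) -> str:
    """Return 'us-ascii', 'utf-8' or 'other' for the given file content."""
    try:
        data.decode("ascii")
        return "us-ascii"  # only the 128 ASCII characters are present
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # non-ASCII bytes that form valid UTF-8 sequences
    except UnicodeDecodeError:
        return "other"  # some legacy 8-bit encoding, e.g. ISO-8859-1
```

Because every ASCII file is also valid UTF-8, the ASCII test has to come first; otherwise pure ASCII sources would all be reported as UTF-8.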

That becomes a problem when distributing the sources: for example, if I write my code in UTF-8 but your Eclipse is configured with ISO-8859, it will show garbage in place of the non-ASCII characters.

The opposite also happens: when somebody writes using ISO-8859 (for example), people reading the code in an Eclipse configured as UTF-8 will see garbage.
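
Both directions are easy to reproduce in Python. The comment text below is made up for the illustration, and ISO-8859-1 stands in for "ISO-8859"; how a given editor renders the undecodable bytes (a "?" or a replacement character) depends on the editor.

```python
comment = "// défaut"

# Written as UTF-8, read as ISO-8859-1:
# the two bytes of "é" show up as two separate characters.
utf8_read_as_latin1 = comment.encode("utf-8").decode("iso-8859-1")

# Written as ISO-8859-1, read as UTF-8:
# the single byte 0xE9 is not valid UTF-8 on its own, so a lenient
# decoder substitutes the replacement character U+FFFD.
latin1_read_as_utf8 = comment.encode("iso-8859-1").decode("utf-8", errors="replace")

print(utf8_read_as_latin1)
print(latin1_read_as_utf8)
```

A pure ASCII line goes through both round trips unchanged, which is why files reported as us-ascii never show this problem.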

BTW, I guess that at the origin of the project, when ADempiere started and copied the source from Compiere, some (if not all) of the UTF-8 core files got corrupted.

Look, for example, at line 285 of the file org.adempiere.base/src/org/compiere/model/MCurrency.java

It seems to have been corrupted since revision 5 of the project.

Or it was probably already broken in Compiere, because I see the same problem in their sources.

That's the problem created when developers don't all agree on one character set :-)

Also, for core, comments in non-English languages are heavily discouraged (not to say forbidden). This is partly because of this character set issue, but mostly because the maintainers must be able to read the comments, and that's only possible by establishing a common language.

Now, for your customizations, comments with UTF-8 are OK, as long as everybody reads with the same character set. But in some cases properties and variable contents are better translated to Unicode-escaped notation, which is the work that native2ascii does. That's bad for comments because it makes them unreadable, but it is good for variable contents and properties files, as it guarantees that characters are handled correctly at execution time.
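
The kind of transformation native2ascii performs can be sketched in Python as below. This is an approximation written for illustration, not the tool's actual implementation: the function name is hypothetical, and the lowercase hex digits are an arbitrary choice (Java accepts either case in such escapes).

```python
def to_unicode_escapes(text: str) -> str:
    """Replace every non-ASCII character with Java-style Unicode escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is copied through unchanged
            continue
        # Encode as UTF-16 so that a character outside the Basic
        # Multilingual Plane becomes a surrogate pair, i.e. two escapes.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(to_unicode_escapes("message=Créé"))
```

The result contains only ASCII characters, so the file reads the same whatever encoding the editor is configured with, at the cost of readability.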

Regards,

Carlos Ruiz

On 24/01/19 at 8:56, Nicolas Micoud wrote:

Nicolas Micoud

Jan 24, 2019, 10:23:56 AM

to iDempiere

Hello Carlos,

Thanks for the clear explanation :)

So, I will update all my Eclipse installations to use UTF-8 and update my sources to match.
nb: I always use ASCII characters to define variables, but comments are easier to read in "accented" French :) ; the same goes for Spanish, I guess.