OCE non-ASCII filename support under MinGW


cirilo....@gmail.com

Feb 28, 2017, 9:59:04 AM
to oce-dev
Hi folks,

 In the past year I added STEP export and viewing support to the free KiCad electronics CAD software. The earliest version of OCE that we use is 0.17.2.  Under MinGW, OCE does not treat the filenames as UTF-8 despite the OCCT team's claim to support non-ASCII filenames in the underlying version of OCCT.  This is due (in part) to the fact that MinGW does not (and will not) expose the Microsoft extensions to the STL which allow OCE/OCCT to work as claimed when built with the MSVC toolchain.

 I believe that it will not be too painful to fix this by using gcc-specific extensions to the STL and using compiler directives to ensure that the code is only inserted when absolutely required.
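 A rough sketch of what such a guarded change could look like, assuming libstdc++'s `__gnu_cxx::stdio_filebuf` extension and the `_wfopen()` function from the Microsoft C runtime as exposed by MinGW-w64. The class name `Utf8IFStream` is hypothetical and is not taken from OCE or OCCT; the MinGW branch has not been checked against any OCE release.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#if defined(__MINGW32__)
#include <istream>
#include <memory>
#include <windows.h>            // MultiByteToWideChar, CP_UTF8
#include <ext/stdio_filebuf.h>  // __gnu_cxx::stdio_filebuf (libstdc++ extension)

// Hypothetical helper: an input stream whose file name is given as UTF-8.
// MinGW only: convert the name to UTF-16 and open it with _wfopen().
class Utf8IFStream : public std::istream {
public:
    explicit Utf8IFStream(const std::string& utf8_path)
        : std::istream(static_cast<std::streambuf*>(nullptr)), file_(nullptr) {
        // First call asks for the required length (includes the terminator).
        const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                          utf8_path.c_str(), -1, nullptr, 0);
        if (n <= 0) { setstate(std::ios_base::failbit); return; }
        std::wstring wide(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8_path.c_str(), -1, &wide[0], n);

        file_ = _wfopen(wide.c_str(), L"rb");
        if (!file_) { setstate(std::ios_base::failbit); return; }

        // The FILE* constructor does not take ownership of the FILE*.
        buf_.reset(new __gnu_cxx::stdio_filebuf<char>(
            file_, std::ios_base::in | std::ios_base::binary));
        rdbuf(buf_.get());  // also clears the error state set by the null buffer
    }

    ~Utf8IFStream() {
        buf_.reset();
        if (file_) std::fclose(file_);
    }

private:
    std::FILE* file_;
    std::unique_ptr<__gnu_cxx::stdio_filebuf<char> > buf_;
};

#else
#include <fstream>

// Everywhere else the plain standard stream is used unchanged.
class Utf8IFStream : public std::ifstream {
public:
    explicit Utf8IFStream(const std::string& utf8_path)
        : std::ifstream(utf8_path.c_str(),
                        std::ios_base::in | std::ios_base::binary) {}
};
#endif
```

 Code that currently constructs a `std::ifstream` from a `char*` name would construct a `Utf8IFStream` instead; outside MinGW nothing changes.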

 Is there a reasonable chance that such changes would be accepted to OCE if I were to work on this, or is it something I should take up with OCCT?

cheers,
Cirilo

benjami...@compositence.de

Mar 10, 2017, 3:01:15 PM
to oce-dev
Hi Cirilo,

I have been contributing to both OCE and OCCT. My impression is that both projects accept contributions and are even willing, to a certain degree, to help you with them.

Since I am drifting towards OCCT right now and I am also very, very much interested in UTF-8 support for OCCT with MinGW compilers, I would suggest contributing to OCCT. I had already registered this issue as 0027585 there (https://tracker.dev.opencascade.org/view.php?id=27585), but the issue was later turned into a bugfix for the MSVC code and marked as fixed. When contributing code, I guess a new issue should be registered. I have been intending to work on that topic in the course of this spring, but I cannot promise that I will find the opportunity. I would be glad to hear from you if you really do work on MinGW UTF-8 support.

Benjamin

cirilo....@gmail.com

Mar 15, 2017, 4:50:23 AM
to oce-dev
Thanks Benjamin,

  I had a look at the source and it is fairly easy to follow and overall is well structured. I implemented a hack (against OCE 0.17.3) to support non-ASCII characters within MinGW - the hack is intrusive and breaks everything other than a MinGW build:


 In my opinion, working around the problem in various project sources is not the best solution in this case. I think the best solution would be to patch the GNU STL implementation so that std::ifstream, std::ofstream and std::fstream under MinGW would interpret char* as UTF-8, convert to UTF-16, and use the Microsoft C runtime's _wopen().  That solution would fix this problem for all MinGW projects - and possibly break projects which use open( char* ) where the string is meant to be 8-bit characters interpreted in the current Code Page. I'm just not keen on trying to push a patch to GCC, but I hope you find the MinGW hack useful.
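 To make the convert-then-wide-open idea concrete, here is a minimal sketch under stated assumptions. It is not the patch mentioned above and not GCC code; the names `utf8_to_utf16` and `open_utf8_readonly` are made up for illustration. The decoder is portable, and only the final open call differs per platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <fcntl.h>   // O_RDONLY (POSIX), _O_RDONLY / _O_BINARY (Windows)
#if defined(_WIN32)
#include <io.h>      // _wopen
#endif

// Decode UTF-8 into UTF-16 code units.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0, min = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogates and values past U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    assert(i == in.size());
    return out;
}

// Open a file for reading; returns a file descriptor, or -1 on failure.
int open_utf8_readonly(const std::string& utf8_path) {
#if defined(_WIN32)
    static_assert(sizeof(wchar_t) == sizeof(char16_t),
                  "wchar_t is expected to be 16 bits on Windows");
    const std::u16string u16 = utf8_to_utf16(utf8_path);
    const std::wstring wide(u16.begin(), u16.end());
    return _wopen(wide.c_str(), _O_RDONLY | _O_BINARY);
#else
    // POSIX systems pass the bytes through unchanged.
    return open(utf8_path.c_str(), O_RDONLY);
#endif
}
```

 A library doing this internally would have the compatibility risk described above: callers that pass names encoded in the current code page would start failing or opening the wrong file.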

- Cirilo

benjami...@compositence.de

Mar 16, 2017, 12:55:01 PM
to oce-dev
Hi Cirilo,

I posted this question to the MinGW-w64-Public mailing list (see https://sourceforge.net/p/mingw-w64/mailman/message/35318539/). If I have understood the answers correctly, Windows does not really support UTF-8; Microsoft has only introduced non-standard extensions to MSVC to enable UTF-8 file support. That seems to be the reason why they are not willing to modify MinGW-w64 to work around operating system flaws.

Anyway, thank you for the link to your patch. I will definitely have a look at it in the next months.

Benjamin

Cirilo Bernardo

Mar 16, 2017, 9:14:09 PM
to oce...@googlegroups.com
Thanks Benjamin,

I see the typical response in that thread: "but you can use this horrible
non-portable workaround in your code because our code is perfect". I
can understand why GCC won't add the Microsoft extension
std::fstream::open( const wchar_t * ), in fact Microsoft suggested this
as an enhancement of the STL but at least two such proposals were
rejected. However, GCC remains broken in MinGW because it cannot
deal with something which everyone needs to do every day, which is to
open the files that users need. I think the better solution is for the
GCC STL to do a UTF8 to UTF16 conversion internally and only use
the UTF16 open() on Microsoft platforms. Microsoft's UTF8 support
is still pretty new and there is a lot of work to do yet. I'd rather the
problem be fixed on GCC's end rather than wait another 5 years or
so until Microsoft change their CreateFileA() to interpret the string
as UTF8.

In the meantime at least we have an ugly hack to apply to OCE to
get the job done, but in reality this problem affects all MinGW
software and is especially troublesome for any software which uses
std::fstream.

- Cirilo