UTF BOM


Francois

Jul 4, 2018, 4:33:01 PM
to MapInfo-L
Hi,

I export some data from MapInfo to UTF-8 CSV files, which I then load into a MySQL database. MapInfo adds three bytes at the beginning of the file, called the BOM (byte order mark). Unfortunately, MySQL does not understand these three bytes, and they corrupt the first field of the first line. Removing them outside of MapInfo would be difficult, because the file is many MB in size. Do you know if there is a way to generate the CSV file without the BOM prefix? Thanks,


Francois Blanc

Eric Blasenheim

Jul 5, 2018, 10:21:20 AM
to MapInfo-L
I don't know of any way to force MapInfo Pro not to write the BOM characters. These characters are extremely useful, as any modern text editor looks for them and gets the encoding right from them. From my side of things, I have often heard about users having problems reading text data files that were UTF-8. In MapInfo Pro, the BOM forces that charset choice to be set automatically, which is a good thing, because when it is set wrong you just have to start over.

I am surprised to see that MySQL still can't handle them. There is a bug report, https://bugs.mysql.com/bug.php?id=10573, and just like with us, people are fighting about whether this is a bug or an enhancement!

I suppose a feature request to us at PB would be to try to auto-detect UTF-8 when the BOM is not present. Nice idea, but a bit tricky to do, because we might have to read the entire file to find a UTF-8 character that is different from pure ASCII. All pure ASCII characters are exactly the same in UTF-8. But maybe we should try!
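
To illustrate why detection without a BOM may need a full read, here is a minimal Python sketch. It is not how MapInfo Pro works; the function name `sniff_utf8` and the chunk size are assumptions made up for this example. It streams the file through an incremental UTF-8 decoder and reports whether the bytes are valid UTF-8 and whether any non-ASCII byte was seen.

```python
import codecs


def sniff_utf8(path, chunk_size=1 << 20):
    """Return (valid_utf8, has_non_ascii) for the file at `path`.

    Hypothetical helper for illustration only. The file is read in
    chunks, so memory use stays small even for large files.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    has_non_ascii = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Any byte >= 0x80 means the file is not pure ASCII.
            if not chunk.isascii():
                has_non_ascii = True
            try:
                # The incremental decoder buffers a multi-byte sequence
                # that happens to be split across two chunks.
                decoder.decode(chunk)
            except UnicodeDecodeError:
                return False, has_non_ascii
    try:
        # Flush: a truncated sequence at end of file is also invalid.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False, has_non_ascii
    return True, has_non_ascii
```

A result of `(True, False)` is the ambiguous case described above: the file is pure ASCII, so it is equally valid as UTF-8, ISO 8859-1, or Windows 1252, and that can only be known after reading to the end.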

As for what you can do now, you could either write code to post-process the file or make sure that you really need UTF-8. If your data is not pure ASCII but just Western European, one of the other character sets that support it might be a much easier choice. They have no BOM. ISO 8859-1 or Windows Latin 1 (1252) are both options, and I know the ISO one is supported by MySQL. You just have to be sure that your string data is limited to ASCII plus the other Western European characters.
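
For the post-processing route, a minimal Python sketch might look like the following. The function name `strip_utf8_bom` and the file names in the usage note are hypothetical. It copies the file in chunks, so a file of many MB should not be a problem, and it only drops the first three bytes when they really are the UTF-8 BOM (EF BB BF).

```python
import codecs
import shutil


def strip_utf8_bom(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path without a leading UTF-8 BOM.

    Hypothetical helper for illustration. Returns True if a BOM was
    found and dropped, False if the file was copied unchanged.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Read exactly as many bytes as the BOM is long (3 for UTF-8).
        head = src.read(len(codecs.BOM_UTF8))
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            # No BOM: these bytes are real data, so keep them.
            dst.write(head)
        # Stream the rest of the file in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_size)
    return had_bom
```

Usage would be something like `strip_utf8_bom("export.csv", "export_nobom.csv")`, then loading the second file into MySQL. Writing to a new file rather than editing in place keeps the original export intact if anything goes wrong.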

Regards,
Eric Blasenheim
Pitney Bowes Software

Andrew Harfoot

Jul 6, 2018, 7:25:31 AM
to mapi...@googlegroups.com
Hi Francois,

I'm not aware of a way to prevent MapInfo from writing the BOM characters. However, an alternative would be to use the open source GDAL library to do the conversion from MapInfo TAB to CSV. The GDAL CSV driver (http://www.gdal.org/drv_csv.html) has a WRITE_BOM creation option. Equally, using GDAL it should be possible to go from MI TAB straight into MySQL.
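
As a rough sketch of the GDAL route, the Python snippet below builds and runs an `ogr2ogr` command with the CSV layer creation option `WRITE_BOM=NO`. It assumes GDAL's `ogr2ogr` tool is installed and on the PATH; the function names and any file names passed in are hypothetical.

```python
import subprocess


def build_ogr2ogr_command(tab_path, csv_path):
    """Build an ogr2ogr command converting a MapInfo TAB file to CSV.

    Hypothetical helper for illustration. ogr2ogr takes the destination
    before the source; -lco passes a layer creation option to the
    output driver.
    """
    return [
        "ogr2ogr",
        "-f", "CSV",             # output format: GDAL CSV driver
        "-lco", "WRITE_BOM=NO",  # ask the CSV driver not to write a BOM
        csv_path,                # destination
        tab_path,                # source
    ]


def export_tab_to_csv(tab_path, csv_path):
    """Run the conversion; raises CalledProcessError if ogr2ogr fails."""
    subprocess.run(build_ogr2ogr_command(tab_path, csv_path), check=True)
```

Calling `export_tab_to_csv("mytable.tab", "mytable.csv")` is roughly the same as typing `ogr2ogr -f CSV -lco WRITE_BOM=NO mytable.csv mytable.tab` at the command line.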

Cheers,

Andy

-- 
Andy Harfoot

GeoData Institute
University of Southampton
Southampton
SO17 1BJ

Tel:  +44 (0)23 8059 2719

www.geodata.soton.ac.uk

Francois

Jul 7, 2018, 10:17:32 AM
to MapInfo-L
Thanks. I understand the problem lies more with MySQL; I will deal with it in post-processing. Regards.