Trouble with Unicode (I guess) in MapInfo 9


Nash

Oct 6, 2007, 9:14:19 AM
to MapInfo-L
Here's what happens:
I used MapInfo 8 previously. In my tables, I usually had a field named
"Name" or something similar, containing (surprise!) the name of, say, a
town or lake or whatever. Since I live and work in Serbia, the name
would often contain characters from the Serbian Latin alphabet. These
characters never showed right in a table browser, but by using the
"Arial CE" "font" they always showed on screen and printed correctly.
Recently I switched to version 9. When I load my tables, all non-
western characters are converted to underscores. After I change them
back to previous values and save the table, the whole thing works -
for the duration of the session !?! The labels show and print
correctly, but if I close the table and load it again, the underscores
are there again.
Since I'd really hate having to edit all the entries in all my tables
every time I open them, I'd like to know WTF is going on and how to
fix this. The least I'd expect is for the stupid program to save
things *as they are* and respect that.

TIA,
Nash

Lars I. Nielsen (GisPro)

Oct 6, 2007, 10:26:58 AM
to mapi...@googlegroups.com
Hi Nash,

The TAB format attribute file, the DAT file, is basically just a
proprietary DBF file, i.e., it uses an 8-bit encoding.

That means it needs to contain the specific encoding instruction. This
cannot be set in the table, but must be set before import.

If the data comes from MIF/MID, the MIF file contains an encoding
clause, which here is always: Charset "WindowsLatin1"

My guess would be that you need to use Charset "WindowsLatin2", or at
least something different.

Have you tried exporting your table to MIF/MID and checking the
"Charset" in the MIF and the textual attribute values in the MID?

We're still (!) waiting for version 9 here, so I can't test it for you.
Besides, my Windows is "WindowsLatin1", so it might not be a proper test
anyway.

Best regards / Med venlig hilsen
Lars I. Nielsen
GisPro


Nash

Oct 7, 2007, 3:56:39 AM
to MapInfo-L
I thought of it myself, and checked the *.tab file. It started like
this:

!table
!version 300
!charset WindowsLatin2

Definition Table
Type NATIVE Charset "WindowsLatin1"
Fields 2
ID Integer ;
Naziv Char (30) ;
ReadOnly
begin_metadata
... and so on...

When I changed "WindowsLatin1" to "WindowsLatin2" to match the charset
clause at the beginning, everything worked fine. However, the previous
version of MapInfo used WindowsLatin1. This is just stupid, if you ask
me.

Thanks, Lars


Nash

Oct 7, 2007, 4:26:31 AM
to MapInfo-L
Just tried something else: I opened a tab file from a previous
version that had "WindowsLatin1" in both clauses and that had worked
previously without problems. As soon as I opened it in MapInfo 9, most
Serbian national characters (except 2) were replaced with underscores.
I edited them back to normal, saved the file, and opened it again. Of
course, the underscores were there again, as well as "WindowsLatin1"
clauses. When I changed them to "WindowsLatin2" everything worked
normally, and my edits stuck for a change.

Like I said, stupid. Now I'll have to check every table before I open
it for the first time in ver. 9 and change Latin1 to Latin2... I hope
that will help me avoid editing table entries.
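
Checking every table by hand could be scripted. The following is only a minimal Python sketch of the text edit described above, not a MapInfo tool: the function name `set_charset` and the sample header are made up for illustration, and it assumes the .TAB file uses the two clause forms shown earlier in this thread (`!charset ...` and `Type NATIVE Charset "..."`). It changes the .TAB text only and does not touch the .DAT file, so keep a backup copy of each table before trying anything like it.

```python
import re

# "!charset WindowsLatin1" at the top of the .TAB file
HEADER_CLAUSE = re.compile(r'^(!charset\s+)(\S+)', re.IGNORECASE | re.MULTILINE)
# 'Type NATIVE Charset "WindowsLatin1"' inside "Definition Table"
DATA_CLAUSE = re.compile(r'(Type\s+NATIVE\s+Charset\s+")([^"]*)(")', re.IGNORECASE)


def set_charset(tab_text: str, charset: str) -> str:
    """Return tab_text with both charset clauses set to `charset`."""
    tab_text = HEADER_CLAUSE.sub(lambda m: m.group(1) + charset, tab_text, count=1)
    return DATA_CLAUSE.sub(lambda m: m.group(1) + charset + m.group(3), tab_text, count=1)


# Hypothetical header modelled on the one posted in this thread.
sample = (
    "!table\n"
    "!version 300\n"
    "!charset WindowsLatin1\n"
    "\n"
    "Definition Table\n"
    '  Type NATIVE Charset "WindowsLatin1"\n'
    "  Fields 2\n"
)

patched = set_charset(sample, "WindowsLatin2")
print(patched)
```

To apply this to real files one would read each .tab file as text, pass it through `set_charset`, and write the result back; whether that alone is enough for a given table is something to verify on a copy first.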

Mats Elfström

Oct 7, 2007, 5:50:07 AM
to mapi...@googlegroups.com
Hi Nash!

Not at all surprising. MI Pro's handling of foreign character tables
(i.e. non-US) has for a long time been ad hoc more than anything else.
And since they abandoned the beta cycle of localized versions, no one
will notice before it hits the customers.

Sadly, Mats.E


Eric_Bl...@mapinfo.com

Oct 7, 2007, 6:51:30 PM
to mapi...@googlegroups.com, Nash

There was a change made in MapInfo Professional relating to the way this works. I cannot say that this is the reason without getting a sample of the data. Again, Lars is correct about the basics of how this works! The first piece of information would be what your .TAB file says at the top in the "Charset clause" and what your default language is on your Windows system.

Here's what I can tell you.

Many years ago, the MapInfo product now known as Professional was built for Windows, Apple Macintosh and two Unix windowing systems. Not trying to tie our code to any operating system, we developed our own internal tables to handle the conversion between the data you have (based on the charset clause) and what you were running (the OS and language settings of your system). We did not use any operating system calls to do this conversion, just our own code and tables.

Over the years, there have been additions to these tables and errors were found in some of the tables we developed years ago. Since it has been over ten years since we shipped any Professional product on a non-Windows OS, we decided to change our code and fix many of these problems by allowing Windows to do the conversions for us and get out of the business of keeping up our own tables. This change fixed a number of issues and was published in Professional 9.0.

From what you have described, it sounds like in previous versions a conversion was occurring that somehow mapped to the Arial CE font.  However, without seeing it I can't tell. As you noted, sometimes incorrect mapping is obscured by the font one uses.

Please send your information and data sample to us at techs...@mapinfo.com. Feel free to cc me as well.

Eric Blasenheim
Chief Product Architect
Pitney Bowes MapInfo Corporation




Eric_Bl...@mapinfo.com

Oct 7, 2007, 7:25:46 PM
to mapi...@googlegroups.com

Sorry I did not see the rest of this thread before responding. I believe that our current implementation is correct and can stand by that. As I explained previously, our old implementation was our own and therefore not standard Windows. It was quite stable for many years and in some cases, plain wrong. We tried to address that issue! I would not describe that as ad hoc!

A few more pieces of information. The reason there are two charset clauses in the TAB file is because we allow for the fact that the .TAB file itself (the one with the description) can be in a different charset than the actual data, which could be in a .DAT, DBF, XLS or many other sources. The second charset clause tells us about that!
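
To make the two clauses concrete, here is a small Python sketch that pulls both of them out of a .TAB header. It is an illustration only: `read_charsets` and `TabCharsets` are hypothetical names, and the sample text is modelled on the header Nash posted earlier, not on any official format description.

```python
import re
from typing import NamedTuple, Optional


class TabCharsets(NamedTuple):
    header: Optional[str]  # charset of the .TAB text itself ("!charset ...")
    data: Optional[str]    # charset of the attribute data ('Type ... Charset "..."')


def read_charsets(tab_text: str) -> TabCharsets:
    """Extract the first '!charset' clause and the first 'Type ... Charset' clause."""
    header = re.search(r'^!charset\s+(\S+)', tab_text, re.IGNORECASE | re.MULTILINE)
    data = re.search(r'^\s*Type\s+\w+\s+Charset\s+"([^"]+)"', tab_text,
                     re.IGNORECASE | re.MULTILINE)
    return TabCharsets(
        header.group(1) if header else None,
        data.group(1) if data else None,
    )


sample = (
    "!table\n"
    "!version 300\n"
    "!charset WindowsLatin2\n"
    "\n"
    "Definition Table\n"
    '  Type NATIVE Charset "WindowsLatin1"\n'
    "  Fields 2\n"
)

found = read_charsets(sample)
print(found)
if found.header != found.data:
    print("The .TAB text and the data are declared in different charsets.")
```

A mismatch is not an error in itself, since the two clauses describe different things, but it is a quick way to spot tables like the one discussed in this thread.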

From what I can tell now, it sounds like the data in question here is definitely Latin2 data and that Nash is running his system on some Latin1 system, probably English if I had to guess. What was happening in earlier versions is that the data was being incorrectly mapped, producing bogus visible values (as was noted), and these problems were being masked by Arial CE, a font that was developed to mask some of this for display purposes. That is, at run time, it changes the character set to Latin2. The reason I say masked is that many other operations in Professional were not working correctly. You were seeing the characters correctly, but they were not being operated on as such. For example, any data that you indexed, or sorted without indexing, was being treated as if it were Latin1 data! MapInfo had no idea what your data really was!

Again, I would be happy to look at the data to be sure of this. My recommendation to anyone using Central European data would be to use Professional on a system set for one of those languages.
Your data will load faster (no conversion) and the data will be interpreted correctly.  I would definitely not put two clauses in the .TAB file.  I am surprised that does not error.
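
The effect of reading Latin2 bytes as Latin1 can be reproduced outside MapInfo. The Python sketch below assumes that "WindowsLatin2" corresponds to Windows code page 1250 and "WindowsLatin1" to code page 1252; it uses Python's own codecs, not MapInfo's conversion, and the place names are just sample values.

```python
# Assumption: WindowsLatin2 ~ cp1250, WindowsLatin1 ~ cp1252.
names = ["Čačak", "Niš", "Đerdap", "Užice"]

misread = {}
for name in names:
    raw = name.encode("cp1250")          # bytes as stored for Latin2 data
    misread[name] = raw.decode("cp1252")  # same bytes interpreted as Latin1
    print(f"{name!r} stored as {raw.hex(' ')} reads as {misread[name]!r}")

# The bytes never change, only their interpretation does, so decoding
# them with the right code page restores the original text.
restored = [misread[n].encode("cp1252").decode("cp1250") for n in names]
```

In this sketch "Čačak" comes back as "Èaèak", while "Niš" and "Užice" are unchanged because š and ž sit at the same byte values in both code pages. Any sorting or indexing done on the misread strings operates on the wrong letters, even if a font makes them look right on screen.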


Eric Blasenheim
Chief Product Architect

Pitney Bowes MapInfo Corporation





Nash

Oct 8, 2007, 4:38:58 AM
to MapInfo-L
Hi Eric,
thanks for your reply. I'm glad someone from MapInfo reads the threads
here. The machine I work at has the following information: Location:
Serbia and Montenegro, with formats set as Serbian Latin. So you see,
it's not configured for English.
To tell you the truth I don't know whether that's Latin1 or 2. Font
standards in Windows have always puzzled me and I can't seem to find
my way through the standards, code pages, scripts etc. I was never
able to find a good, clear, comprehensive explanatory text on the
subject, and anyway it changes all the time... Some programs support
Unicode, some don't, and so on... In my opinion it's just a mess.
I found the solution to my problem and I'm happy with it.
However I have something else to ask you now that you're here.
Are you guys ever going to make the makers of Universal Translator
change the Choose Directory dialog that lets you choose where you want
to save the translated tab or mif file? As far as I can see it's still
something like a Win 3.11 style dialog and you can't see the network
through it. One thing to add to the wish list for the future... :)

Cheers,
Nash

Mats Elfström

Oct 8, 2007, 7:03:34 AM
to mapi...@googlegroups.com
Hi Nash!

Funny you should mention UT, because that is one component of MI Pro
which has suffered badly from the character table changes over the
years. I still think you can't translate a Swedish indexed tab file
with UT.

Eric B, I used the term 'ad hoc' in this sense: "For this purpose", as
signifying a solution that has been custom designed for a specific
problem, is non-generalizable, and cannot be adapted to other
purposes.
I maintain that this is an adequate description.
I did not mean the derogatory sense: A makeshift solution, inadequate
planning, or improvised event.
But the difference, I admit, is sometimes subtle.
Now you have probably arrived at a wise and lasting solution by using
standard Windows behaviour.
However, my concern is that these changes, the attempts to solve this
problem and then the change of solution, seem to have been undertaken
without asking for feedback from the target group, the non-US users,
and without providing tools to update legacy map files when changes
were deemed necessary.

Regards, Mats.E



Kianastar

Oct 8, 2007, 10:56:00 AM
to MapInfo-L
Hi guys,

It looks like you have found a solution for your immediate problem,
Nash, but I thought I would post some observations on how to deal with
non-Latin1 data in MIPro.

Firstly a word on codepages - an 8-bit codepage only has room for 256
code points (ISO 8859-1, for example, defines 191 printable
characters), unlike Unicode, which is not limited to 8 bits per
character and supports many thousands. This means that the entire
spectrum of characters is split up into a number of sets. These sets
can also be referred to by their ISO code. WindowsLatin1 is very close
to ISO 8859-1 (ISO8859_1 in MIPro). For
more info check out http://en.wikipedia.org/wiki/ISO_8859-1.
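
As a rough illustration of why only some characters survive, the Python sketch below checks which letters of a name exist in a given 8-bit code page and replaces the rest with underscores. It assumes "WindowsLatin1" corresponds to Windows code page 1252; the function `latin1_view` is a made-up name, and the underscore substitution only mimics the symptom described in this thread, not MapInfo's actual conversion code.

```python
def latin1_view(text: str, codepage: str = "cp1252") -> str:
    """Replace every character that `codepage` cannot represent with '_'."""
    out = []
    for ch in text:
        try:
            ch.encode(codepage)
            out.append(ch)
        except UnicodeEncodeError:
            out.append("_")
    return "".join(out)


for name in ("Čačak", "Niš", "Đerdap", "Užice"):
    print(name, "->", latin1_view(name))

# Which Serbian Latin letters fit into cp1252 at all?
survivors = [ch for ch in "čćđšž" if latin1_view(ch) == ch]
print("representable in cp1252:", survivors)
```

Of the five special lowercase letters, only š and ž exist in code page 1252, which is consistent with the observation earlier in the thread that all but two of the national characters turned into underscores.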

Now that PB MapInfo is producing many countries for StreetPro using a
variety of character sets (Latin1, Latin2, Baltic, Cyrillic, Greek,
Turkish, Arabic), we had to change our processes to make sure non-
Latin characters are preserved. Firstly you generally need to change
the regional settings for non-unicode programs to the language that
you are using, otherwise you will see all diacritic characters as
underscores. You can do this in Windows by going to Control Panel,
Regional and Language Options, then click the Advanced tab and change
the language in the drop-down. You will need to restart your computer
for this to take effect.

Next, to see characters correctly in browsers you may need to change
the text font to either your local Arial variant (Arial CE, Arial CYR
etc) or a unicode font such as Tahoma. Also, the label font may need
to be changed as well for correct characters on maps.

Lastly, if you get the underscores when saving a new copy of a table,
you should use a Commit statement specifying the charset, e.g.

    Commit Table newtab As "filename.tab" Type NATIVE Charset "WindowsLatin2"

Otherwise the table may be saved in WindowsLatin1 and you'll lose your
non-Latin1 chars.
I have found in the past that if you just change WindowsLatin1 to
WindowsLatin2 in the tabfile, the special characters sometimes convert
to underscores permanently (possibly because the DAT file must be
changed as well).

Basically, make sure you always keep an original copy just in case you
lose your names.


Good Luck!
---
Anika Claire
Sr Data Engineer, PB MapInfo EMEA Data Team

Nash

Oct 9, 2007, 5:39:47 AM
to MapInfo-L
Thanks for the reply, Anika (I'm guessing it's your first name - or is
it Claire?),

I don't have an "Advanced" tab in that dialog in Control Panel,
probably because I don't have administrator privileges. I'll try and
catch the system administrator and get him to change that :)

Cheers,
Nash
