I'm not sure why I can't reply to the last post of this thread, but...
Pardon me for jumping in at the end of the discussion; I have a
followup question.
I've been given a road database sample file in TAB format for a Chinese
city. The road names in the TAB file are all numbers. They make
reference to a separate data file that can convert the numbers to
Chinese road names. The encoding of the data file (according to
Visual Studio) is "Unicode (Big-Endian) - Codepage 1201". This file
looks fine in the Visual Studio editor.
I have written a .NET program that I use to convert the source TAB
file to a TAB file with different columns, one of which will be the
names of the roads in Chinese. I am using MITAB to create the TAB
file. MITAB creates the TAB file with "!charset Neutral", and when
I open it in the Geoset Manager all the road names appear as ????. I
have tried various guesses at how !charset should be set to show the
Chinese correctly, but I really haven't a clue what values can go
here.
What can I do to get the resulting TAB file into a form that will show
Chinese in Geoset Manager?
Thanks for any help you can provide.
Shindigo.
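
For reference, the lookup-and-convert step described above can be sketched in Python. This is only a rough illustration under several assumptions: the lookup file is taken to hold one tab-separated "number, name" pair per line (the real layout is not given here), the function names are made up, and the target is assumed to be the simplified Chinese Windows code page (Python's "gbk" codec, code page 936). Which "!charset" value matches that code page should be checked against the MapInfo documentation.

```python
def load_road_names(raw: bytes) -> dict:
    """Parse a UTF-16 big-endian lookup file into {road number: road name}.

    Assumes one "number<TAB>name" pair per line (hypothetical layout).
    """
    # The "utf-16-be" codec keeps a leading byte order mark, so strip it.
    text = raw.decode("utf-16-be").lstrip("\ufeff")
    names = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        road_id, name = line.split("\t", 1)
        names[road_id.strip()] = name.strip()
    return names


def encode_for_tab(name: str, codepage: str = "gbk") -> bytes:
    """Encode a road name in the code page the TAB file is declared to use."""
    return name.encode(codepage)


# In-memory stand-in for the real lookup file (sample data, not from the thread).
sample = "\ufeff101\t中山路\n102\t人民路\n".encode("utf-16-be")
road_names = load_road_names(sample)
encoded = encode_for_tab(road_names["101"])
```

The point of the sketch is that the bytes written to the table and the declared charset have to agree; writing Unicode strings straight into a table marked Neutral gives the reader nothing to convert from.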
On Feb 19, 3:35 am, "Lars I. Nielsen (GisPro)" <L...@gispro.dk> wrote:
> Hi Eric,
> Always good to read your in-depth comments.
> >Currently there is no absolutely safe way to use UTF-8 data. If the data is entirely ASCII, then I would just put WindowsLatin1 as the character set. The binary codes are identical. If you have data with Western European diacritics (accents, umlauts, tildes, Danish, Swedish special characters) then you will not be able to store them as UTF-8 and have them read correctly by Professional. You will need to convert them to Windows Latin1 before storing.
> Isn't this a perfect example of a case when one should use Charset Neutral? ;-)
> >We are looking at support for UTF-8.
> As far as I know, UTF-8 isn't a character set as such, it's just an 8-bit encoding. You'd still have to work with different 8-bit charsets, like Latin1 vs. Latin2 etc., unless of course if you're always encoding/using the full 16-bit Unicode charset.
> Support for 16-bit Unicode (in 8-bit Windows) is of course a whole different ballgame.
> Or am I missing a point or two here?
> Best regards / Med venlig hilsen
> Lars I. Nielsen
> GIS & DB Integrator
> GisPro
>
> Eric...@mapinfo.com wrote:
> I will forward this documentation error on!
> The explanation that Spencer gave is pretty reasonable except for the part about display.
> Charset_neutral in a .tab means that we will make no attempt to convert and just hope and pray that it works. You might be able to tell that I don't like CharSet neutral and would have pushed for its removal other than the fact that we have had it in the past. It is like hoping to line up data with no coordinate system.
> What is important to understand is that putting in a CharSet is a declaration just like Coordsys. It declares that all the characters in the table are from that set. At runtime if the charset matches your system, no conversion occurs. If it is anything else, we convert. Not all conversions are very useful and any conversions that cannot occur become an underscore ("_") character. Note that your system will always be one of the Windows character sets. So if you have data in ISO8859-1 (Latin1) and your locale is one of the Western Europe or English speaking countries, a conversion still occurs.
> I must have missed the thread about UTF-8 so here's what I can tell about that.
> We are looking at support for UTF-8.
> Currently there is no absolutely safe way to use UTF-8 data. If the data is entirely ASCII, then I would just put WindowsLatin1 as the character set. The binary codes are identical. If you have data with Western European diacritics (accents, umlauts, tildes, Danish, Swedish special characters) then you will not be able to store them as UTF-8 and have them read correctly by Professional. You will need to convert them to Windows Latin1 before storing. Conversion can be fairly easily done via the Windows API and there are probably tools out there.
> Note that what glyphs get used at draw time is a completely different issue. Any strings sent to Windows for drawing are expected to already be converted to the current Windows set.
> Eric Blasenheim
> Chief Product Architect
> Group Technology Office
> Pitney Bowes Business Insight
>
> Mail List: grbounce-YvY1eQUAAAAJBprYSySRydkk7vpghP_9=mail_list=mapin...@googlegroups.com
> From: Uffe Kousgaard <uf...@routeware.dk> on 02/18/2009 05:20 PM CET
> To: mapi...@googlegroups.com
> cc:
> Subject: [MI-L] Re: Charset in TAB files
> Hi Spence,
> Thanks for the explanation and comparison with coordsys.
> My need is actually to write TAB files, using MITAB but from a Unicode
> string source. I think I will have to make some changes to MITAB to get
> it producing more exact TAB files than just "neutral". Of course, true Unicode isn't possible (and that was a reply to that other user).
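
The conversion Eric describes in the quoted message (convert to Windows Latin1 before storing, with anything that cannot be converted turning into "_") can be sketched in Python. This is an illustration only, not MapInfo's or MITAB's code: the function name is made up, and Python's "cp1252" codec is assumed to correspond to the WindowsLatin1 charset.

```python
def to_windows_latin1(text: str) -> bytes:
    """Encode text as Windows-1252, writing '_' for unconvertible characters."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out += b"_"  # mirrors the underscore substitution described above
    return bytes(out)


# Source data arrives as UTF-8 and is decoded to a Unicode string first.
utf8_bytes = "Malmö".encode("utf-8")
latin1_bytes = to_windows_latin1(utf8_bytes.decode("utf-8"))

# Pure ASCII is byte-identical in UTF-8 and Windows-1252.
ascii_same = to_windows_latin1("Main Street") == "Main Street".encode("utf-8")

# Characters outside the code page cannot be represented at all.
lost = to_windows_latin1("中山路")
```

The last case shows why a table declared as WindowsLatin1 cannot carry Chinese names: each character collapses to an underscore, so a code page that actually contains them is needed.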