Re: [MI-L] Using Thai character encoding

Uffe Kousgaard

Feb 10, 2013, 7:39:30 AM
to mapi...@googlegroups.com
Hi,

The CharSet clause is documented in the MapBasic help file, but nothing in it looks like Thai.

I know you can store UTF-8 in SHP files and register these in MapInfo. Technically, that should allow you to get any characters into MapInfo.

Regards
Uffe Kousgaard


Jelmer Baas wrote:
Hello everyone,

I've been testing around with MapInfo and Character Sets to try to store Thai data.

MI Pro 10.5 supports ISO8859_1 through ISO8859_9, but not ISO8859_10 or ISO8859_11, and ISO 8859-11 is the Thai one. There's also no mention of windows-874. I tried editing the TAB file manually so that it starts with:
!table
!version 1000
!charset ISO8859_11

Tested in both 11.0 and 11.5, both start up with an error: "Unsupported character set: ISO8859_11. Unable to open table Test".

Is there any other way to get Thai characters in my MapInfo tables? Possibly to set MI to a "Unicode mode" and manually change the encoding when I need to display something?

P.S.: From version 10.5 the entire CharSet clause is gone from the help file...

Regards,
Jelmer Baas
Speer IT


--
--
You received this message because you are subscribed to the
Google Groups "MapInfo-L" group. To post a message to this group, send
email to mapi...@googlegroups.com
To unsubscribe from this group, go to:
http://groups.google.com/group/mapinfo-l/subscribe?hl=en
For more options, information and links to MapInfo resources (searching
archives, feature requests, to visit our Wiki, visit the Welcome page at
http://groups.google.com/group/mapinfo-l?hl=en
 

Eric Blasenheim

Feb 10, 2013, 1:06:33 PM
to mapi...@googlegroups.com
Windows Thai is a supported character set in MapInfo Professional. Its text representation is "WindowsThai", and it should appear in any table you create with MapInfo Professional when your Windows non-Unicode system settings are set to Thai. When you run MapInfo Professional under these settings and run the following Print command in the MapBasic window, the result in the Message window should be "WindowsThai":

print systeminfo(5)

Windows Thai is not exactly the same as the ISO version, but they are very close, and in most cases WindowsThai is a superset (http://en.wikipedia.org/wiki/Windows-874), so there should be no loss. When running Windows Thai, as in any Windows setup, your native code page is the Windows version, not the ISO one.
So there should be no issue creating data for Windows Thai in that environment, but you must be in that environment.
Note that editing the TAB file and declaring that the data is in a character set does not make it so. You have to be in that environment or have carefully crafted tools.
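
To illustrate the superset point and the warning about relabeling, here is a minimal sketch in Python rather than MapBasic. It assumes Python's built-in "cp874" and "iso8859_11" codecs are fair stand-ins for what MapInfo calls WindowsThai and for ISO 8859-11; the helper names are made up for this example and say nothing about how MapInfo works internally.

```python
# Sketch only: Python's "cp874" and "iso8859_11" codecs stand in for
# WindowsThai and ISO 8859-11. The helper names are hypothetical.

def encode_both(text):
    """Return (Windows-874 bytes, ISO 8859-11 bytes) for the same text."""
    return text.encode("cp874"), text.encode("iso8859_11")


def encodable(text, codec):
    """True if every character of text has a byte value in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


thai = "ภาษาไทย"
win_bytes, iso_bytes = encode_both(thai)

# Thai letters get the same byte values in both character sets.
print(win_bytes == iso_bytes)

# Windows-874 adds a few extras, such as the euro sign, that ISO 8859-11 lacks.
print(encodable("€", "cp874"), encodable("€", "iso8859_11"))

# Declaring a different character set does not change the bytes: the same
# data read as Western European (cp1252) comes out as unrelated characters.
mislabeled = win_bytes.decode("cp1252")
print(mislabeled != thai)
```

The last check mirrors the TAB file case: changing the declared name only changes how the existing bytes are interpreted.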
 
As to Uffe's statement about UTF-8, that is not quite accurate. We support reading files with UTF-8, but it only works when all the data is actually in one character set. A number of data suppliers, who supply data completely in one character set, actually store it in UTF-8, so this allows that data to be read and automatically converted into the Windows character set. If you create a UTF-8 DBF file where the data has strings whose representation lies outside the system character set, then those unrepresentable characters will be replaced by an underscore "_" character.
Again, as Professional is not a Unicode program, there is no magic in supporting UTF-8.
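
The conversion described above can be imitated with a short Python sketch. This is only an analogy under stated assumptions: "cp874" plays the role of the system character set, and the "underscore" error handler and the function names are hypothetical, written for this example and not taken from MapInfo.

```python
import codecs


def _underscore(exc):
    """Error handler: replace each unencodable character with "_"."""
    if isinstance(exc, UnicodeEncodeError):
        return "_" * (exc.end - exc.start), exc.end
    raise exc


# Register the handler under a made-up name so encode() can refer to it.
codecs.register_error("underscore", _underscore)


def utf8_to_system(raw_utf8, system_codec="cp874"):
    """Decode UTF-8 bytes, then re-encode them in one single-byte charset."""
    return raw_utf8.decode("utf-8").encode(system_codec, errors="underscore")


# Thai text survives; the accented Latin letter is not in cp874.
mixed = "ไทย café".encode("utf-8")
converted = utf8_to_system(mixed)
print(converted.decode("cp874"))
```

Data that stays inside one character set converts without loss, which is the case the UTF-8 reading support is meant for.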
I will send a note that a few supported character sets are missing from the MapBasic section on CharSet and that the section does not seem to be in the current MapInfo Professional help. I am not positive it ever was, being a MapBasic clause.
 
Eric Blasenheim
Pitney Bowes Software

Jelmer Baas

Feb 11, 2013, 3:48:42 AM
to mapi...@googlegroups.com
Eric, Uffe,

Thank you both for your replies. I will continue my testing using WindowsThai as a character set.

As to the MapBasic statement, this is what the 10.5 docs say (it has been removed in 11/11.5!):


CharSet clause

Purpose

Specifies which character set MapBasic uses for interpreting character codes.

See the MapInfo Professional User Guide documentation for changes affecting this clause.

Syntax

CharSet char_set 

char_set is a string that identifies the name of a character set; see table below.

Description

The CharSet clause specifies which character set MapBasic should use when reading or writing files or tables. Note that CharSet is a clause, not a complete statement. Various file-related statements, such as the Open File statement, can incorporate optional CharSet clauses.

What Is A Character Set?

Every character on a computer keyboard corresponds to a numeric code. For example, the letter "A" corresponds to the character code 65. A character set is a set of characters that appear on a computer, and a set of numeric codes that correspond to those characters.

Different character sets are used in different countries. For example, in the version of Windows for North America and Western Europe, character code 176 corresponds to a degrees symbol; however, if Windows is configured to use a different character set, character code 176 may represent a different character.

Call SystemInfo(SYS_INFO_CHARSET) to determine the character set in use at run-time.
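
As a rough illustration of the character code 176 example (in Python, not MapBasic), the sketch below decodes one byte value under several code pages. It assumes Python's "cp1252", "cp437" and "cp874" codecs correspond to Windows Western European, DOS Code Page 437 and Windows Thai; the function name is hypothetical.

```python
def show_code(code, codecs_to_try=("cp1252", "cp437", "cp874")):
    """Map one character code to the character each code page assigns to it."""
    return {name: bytes([code]).decode(name) for name in codecs_to_try}


# Code 65 is plain ASCII, so every code page agrees on "A".
print(show_code(65))

# Code 176 is the degree sign only in the Western European Windows set.
print(show_code(176))
```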

How Do Character Sets Affect MapBasic Programs?

If your files use only standard ASCII characters in the range of 32 (space) to 126 (tilde), you do not need to worry about character set conflicts, and you do not need to use the CharSet clause.

Even if your files include "special" characters (for example, characters outside the range 32 to 126), if you do all of your work within one environment (e.g., Windows) using only one character set, you do not need to use the CharSet clause.

If your program needs to read an existing file that contains "special" characters, and if the file was created in a character set that does not match the character set in use when you run your program, your program should use the CharSet clause. The CharSet clause should indicate what character set was in use when the file was created.

The CharSet clause takes one parameter: a string expression which identifies the name of the character set to use. The following table lists all character sets available.


Character Set          Comments

"Neutral"              No character conversions performed.
"ISO8859_1"            ISO 8859-1 (UNIX)
"ISO8859_2"            ISO 8859-2 (UNIX)
"ISO8859_3"            ISO 8859-3 (UNIX)
"ISO8859_4"            ISO 8859-4 (UNIX)
"ISO8859_5"            ISO 8859-5 (UNIX)
"ISO8859_6"            ISO 8859-6 (UNIX)
"ISO8859_7"            ISO 8859-7 (UNIX)
"ISO8859_8"            ISO 8859-8 (UNIX)
"ISO8859_9"            ISO 8859-9 (UNIX)
"PackedEUCJapanese"    UNIX, standard Japanese implementation.
"WindowsLatin2"        Windows Eastern Europe
"WindowsArabic"
"WindowsCyrillic"
"WindowsGreek"
"WindowsHebrew"
"WindowsTurkish"
"WindowsTradChinese"   Windows Traditional Chinese
"WindowsSimpChinese"   Windows Simplified Chinese
"WindowsJapanese"
"WindowsKorean"
"CodePage437"          DOS Code Page 437 = IBM Extended ASCII
"CodePage850"          DOS Code Page 850 = Multilingual
"CodePage852"          DOS Code Page 852 = Eastern Europe
"CodePage855"          DOS Code Page 855 = Cyrillic
"CodePage857"
"CodePage860"          DOS Code Page 860 = Portuguese
"CodePage861"          DOS Code Page 861 = Icelandic
"CodePage863"          DOS Code Page 863 = French Canadian
"CodePage864"          DOS Code Page 864 = Arabic
"CodePage865"          DOS Code Page 865 = Nordic
"CodePage869"          DOS Code Page 869 = Modern Greek
"LICS"                 Lotus worksheet release 1,2 character set
"LMBCS"                Lotus worksheet release 3,4 character set

You never need to specify a CharSet clause in an Open Table statement. Each table's .TAB file contains information about the character set used by the table. When opening a table, MapInfo Professional reads the character set information directly from the .TAB file, then automatically performs any necessary character translations.

To force MapInfo Professional to save a table in a specific character set, include a CharSet clause in the Commit Table statement.

MapBasic 2.x CharSet Syntax

MapBasic version 2.x supported three character sets: "XASCII", "ANSI" and "MAC". Older programs that refer to those three character-set names will still compile and run in later versions of MapBasic; however, continued use of the 2.x-era character set names is discouraged.

CharSet "XASCII" specifies the same character set as CharSet "CodePage437".

CharSet "MAC" specifies the same character set as CharSet "MacRoman".

When a program runs on Windows, CharSet "ANSI" specifies whatever character set Windows is currently using.

Example

When reading a file created by a DOS application, you should specify the "CodePage437" character set, as shown in the following example.

Open File "parcel.txt" 
	For INPUT As #1 
	CharSet "CodePage437" 
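
For comparison only, the same idea can be sketched outside MapBasic. The Python snippet below assumes the "cp437" codec matches "CodePage437"; the `read_text` helper and the sample bytes are made up for illustration.

```python
import os
import tempfile


def read_text(path, charset):
    """Read a text file, interpreting its bytes in the named charset."""
    with open(path, "r", encoding=charset) as handle:
        return handle.read()


# Write a small file the way a DOS program would: 0x81 is "ü" in code page 437.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"M\x81nchen")
    sample_path = tmp.name

print(read_text(sample_path, "cp437"))           # charset that matches the file
print(repr(read_text(sample_path, "latin-1")))   # wrong charset: no "ü" any more
os.remove(sample_path)
```

As with the CharSet clause, the name passed in describes how the file was written, not the character set of the machine reading it.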

I also searched the help file for "WindowsThai" and found no mention of it at all.


Regards,

Jelmer

Eric Blasenheim

Feb 11, 2013, 7:37:12 AM
to mapi...@googlegroups.com
Yes, I already sent a note saying the docs are incomplete: they do not list Thai, Vietnamese, and BalticRim, all of which are supported.