I hope this helps
On Thu, 10 Sep 2009 15:21:06 -0700 (PDT)
silasmerlin <silas...@gmail.com> wrote:
>
> Hello all.
>
> I am attempting to create a new database from a set of shapefiles
> containing parcel data (i.e. Polygons with parcel attributes like
> owner, parcel number, address, and more). While the majority of the
> files loaded correctly into the spatial database, a few (5 of 120)
> would not load and gave a message that reads:
>
> load shapefile error:
> Invalid character sequence
--
Alexander Bruy
mailto: alexand...@gmail.com
Just a few technical points (I hope they help):
a) SQLite/SpatiaLite use UTF-8 to encode TEXT.
UTF-8 is a universal charset.
This means you can encode any national alphabet
using UTF-8: Latin, Cyrillic, Arabic, Hebrew, Chinese,
Japanese, Korean and so on.
b) Shapefiles (namely, the DBF part) use an implicit/undeclared
charset to encode TEXT.
Usually this is the 'standard' platform charset,
that is, the default one on the system where
the shapefile was originally generated.
E.g. I would expect:
- a DBF generated on Linux to use UTF-8 (universal)
- a DBF generated in the USA to use plain ASCII, quite often
- a DBF generated in Italy (Windows) to use CP1252 (Latin)
- a DBF generated in Russia (Windows) to use CP1251 (Cyrillic)
- a DBF generated in Israel (Windows) to use CP1255 (Hebrew)
and so on.
Because a DBF carries no explicit charset declaration (unlike
XML or HTML), you have to GUESS it yourself, by trial and error.
Sorry, but there is no other way.
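A minimal sketch of that trial-and-error loop, in Python. It relies on
Python's built-in codecs rather than libiconv, so codec names and
strictness may differ slightly from what SpatiaLite sees. The function
name, the candidate list and the sample bytes are only illustrative.

```python
def plausible_charsets(raw, candidates=("ascii", "utf-8", "cp1252", "cp1251", "cp1255")):
    """Return the candidate charsets that decode `raw` without an error."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this charset cannot explain the bytes: rule it out
        ok.append(name)
    return ok


# Stand-in for text bytes read from a DBF written on Italian Windows.
sample = "Città".encode("cp1252")
print(plausible_charsets(sample))
```

Note that a clean decode only rules charsets out: single-byte code pages
accept almost any byte, so the survivors still have to be checked by eye.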
c) When you import a shapefile into SpatiaLite, GNU 'libiconv'
is used to convert the local charset into UTF-8, so you have
to declare the charset used to encode the shapefile.
Getting it right is your responsibility.
----------------
An 'Invalid character sequence' error is raised by 'libiconv'
when an encoding error is found, i.e. some byte sequence
was found that doesn't correspond to any valid character
under the rules of the declared charset.
Most likely, this means you have selected the wrong charset.
Alternatively, the DBF is broken (less probable, but not impossible).
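The same kind of failure can be reproduced outside SpatiaLite. The sketch
below is Python, not libiconv, so the error text differs, but the cause is
the same: bytes that are illegal under the declared charset. The helper
name is hypothetical; it reports where the first bad sequence sits, which
helps to find the record to look at.

```python
def first_invalid_sequence(raw, encoding):
    """Return (offset, offending bytes) for the first decode failure, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start, raw[err.start:err.end]
    return None


data = "Señor".encode("cp1252")   # stand-in for a DBF text field
print(first_invalid_sequence(data, "utf-8"))    # wrong charset declared
print(first_invalid_sequence(data, "cp1252"))   # right charset: no error
```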
As a suggestion, you can try opening the offending DBF in
OpenOffice Calc (spreadsheet).
It too requires an explicit charset selection on import
(exactly as SpatiaLite does).
Maybe you will get a more useful diagnostic, and some immediate
'visual' feedback too.
Useful hint: I assume your shapefiles contain TEXT
in some European language. Pay close attention to 'special chars'
with diacritics (common in Italian, French, German, Spanish,
Swedish ...) such as: 'ò' 'à' 'ñ' 'Ç' 'ö' 'Ø'
They can help you guess the 'right' charset to use.
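To see why those characters are such a good clue, here is a small Python
sketch (again using Python's codecs, not libiconv; the helper name and the
sample bytes are made up for illustration). It renders the same bytes under
several candidate charsets, so the wrong ones stand out at a glance.

```python
def preview(raw, charsets=("cp1252", "cp1251", "cp1255")):
    """Decode the same bytes under each charset, for side-by-side inspection."""
    # errors="replace" keeps going on undefined bytes instead of raising.
    return {name: raw.decode(name, errors="replace") for name in charsets}


# Bytes as a Western European (CP1252) DBF might store them.
sample = "ò à ñ Ç ö Ø".encode("cp1252")

for name, text in preview(sample).items():
    print(name, "->", text)
```

Only one of the renderings shows the expected accented Latin letters; the
others show Cyrillic or Hebrew characters where the diacritics should be.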
bye Sandro