load shapefile error: Invalid character sequence


silasmerlin

Sep 10, 2009, 6:21:06 PM
to SpatiaLite Users
Hello all.

I am attempting to create a new database from a set of shapefiles
containing parcel data (i.e. Polygons with parcel attributes like
owner, parcel number, address, and more). While the majority of the
files loaded correctly into the spatial database, a few (5 of 120)
would not load and gave a message that reads:

load shapefile error:
Invalid character sequence


All of these shapefiles are from the same original source, one master
shapefile that contained all the parcel data for the county in
question.

My question is: What is the invalid character sequence that would
prevent it from loading? Is it that certain characters are
unacceptable to Spatialite? If so, which characters? It must be noted
that the shapefiles in question will load fine as virtual tables, but
I want the data to be permanently added to the database.

Thanks!

Alexander Bruy

Sep 11, 2009, 1:06:07 AM
to spatiali...@googlegroups.com, silasmerlin
Hi,
some time ago I had a similar problem. Probably some of the shapefiles
have data in UTF-8, so you need to change the encoding before loading them.
This error may also occur because some records contain invalid Unicode sequences.

I hope this helps

On Thu, 10 Sep 2009 15:21:06 -0700 (PDT)
silasmerlin <silas...@gmail.com> wrote:

>
> Hello all.
>
> I am attempting to create a new database from a set of shapefiles
> containing parcel data (i.e. Polygons with parcel attributes like
> owner, parcel number, address, and more). While the majority of the
> files loaded correctly into the spatial database, a few (5 of 120)
> would not load and gave a message that reads:
>
> load shapefile error:
> Invalid character sequence


--
Alexander Bruy
mailto: alexand...@gmail.com

a.fu...@lqt.it

Sep 11, 2009, 4:34:03 AM
to spatiali...@googlegroups.com
Hi,

just a few technical points (I hope this may help you):

a) SQLite/SpatiaLite use UTF-8 to encode TEXT.
UTF-8 is a universal charset.
This means you can encode any national alphabet
using UTF-8: Latin, Cyrillic, Arabic, Hebrew, Chinese,
Japanese, Korean and so on.

b) Shapefiles (namely, the DBF part) use an implicit, undeclared
charset to encode TEXT.
Usually this is the 'standard' platform charset,
that is, the default one used by the system on which
the shapefile was originally generated.

E.g. I would expect:
- a DBF generated on Linux to use UTF-8 (universal)
- a DBF generated in the USA may easily use ASCII
- a DBF generated in Italy (Windows) to use CP1252 (Latin)
- a DBF generated in Russia (Windows) to use CP1251 (Cyrillic)
- a DBF generated in Israel (Windows) to use CP1255 (Hebrew)
and so on.

Because a DBF carries no explicit charset declaration (as XML
or HTML do), you must GUESS it yourself, by trial and error.
Sorry, but there isn't any other possible way.
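
To make the trial-and-error step concrete, here is a minimal Python sketch. It uses Python's built-in codecs rather than libiconv, so treat it as an illustration only; the candidate list and the function name plausible_charsets are my own assumptions, not anything SpatiaLite provides.

```python
# Hypothetical helper: try a few candidate charsets on raw DBF text bytes
# and keep those that decode without raising an error.
CANDIDATES = ["utf-8", "cp1252", "cp1251", "cp1255"]

def plausible_charsets(raw, candidates=CANDIDATES):
    """Return the candidate charsets under which `raw` decodes cleanly."""
    accepted = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this charset cannot represent the byte sequence
        accepted.append(name)
    return accepted

# Sample bytes standing in for a DBF field written on a CP1252 system.
sample = "Muñoz".encode("cp1252")
print(plausible_charsets(sample))
```

Note that single-byte charsets such as CP1251 or CP1255 accept almost any byte, so a clean decode only rules charsets out. You still have to look at the decoded text to pick the right one.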

c) When you import a shapefile into SpatiaLite, GNU 'libiconv'
is used to convert the local charset into UTF-8, and you have
to declare the charset used to encode the shapefile.
This is your responsibility.
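
As a rough picture of what points a) and c) amount to, the sketch below decodes raw bytes with a declared charset and stores the result in a plain SQLite table. It uses Python's sqlite3 module and codecs, not SpatiaLite's loader or libiconv; the table name parcels, the column owner and the function insert_owner_names are made up for the example.

```python
import sqlite3

def insert_owner_names(conn, raw_values, declared_charset):
    """Decode raw DBF bytes with the declared charset and store them as TEXT."""
    conn.execute("CREATE TABLE IF NOT EXISTS parcels (owner TEXT)")
    # A wrong declared_charset raises UnicodeDecodeError here, before any insert.
    rows = [(raw.decode(declared_charset),) for raw in raw_values]
    conn.executemany("INSERT INTO parcels (owner) VALUES (?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
insert_owner_names(conn, ["Muñoz".encode("cp1252")], "cp1252")

# Casting TEXT to BLOB exposes the bytes as stored in the database encoding.
stored = conn.execute("SELECT CAST(owner AS BLOB) FROM parcels").fetchone()[0]
print(stored == "Muñoz".encode("utf-8"))
```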

----------------

An 'Invalid character sequence' error is raised by 'libiconv'
when an encoding error is found, i.e. some multi-byte sequence
was found that doesn't correspond to any possible character
under the rules of the declared charset.

Most likely, this means you have selected the wrong charset.
Alternatively, the DBF is broken (less probable, but not impossible).
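
To find which records trigger this kind of error, something like the following Python sketch may help. It relies on Python's own decoder, which reports the same class of problem as libiconv but is not the same code; the function name find_undecodable and the sample records are assumptions made for the illustration.

```python
def find_undecodable(records, charset):
    """Return (record_index, byte_offset, raw_bytes) for values that fail to decode."""
    bad = []
    for index, raw in enumerate(records):
        try:
            raw.decode(charset)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first offending byte
            bad.append((index, exc.start, raw))
    return bad

# Made-up attribute values; the second one was written as CP1252.
records = [b"SMITH JOHN", "PEÑA MARIA".encode("cp1252"), b"12 MAIN ST"]

print(find_undecodable(records, "utf-8"))   # declared charset is wrong
print(find_undecodable(records, "cp1252"))  # declared charset matches
```

If only a handful of records out of many are reported, the values themselves are worth inspecting before concluding that the file is broken.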

As a suggestion, you can try opening the offending DBF with
OpenOffice Calc (spreadsheet).
It is libiconv-based too, and requires an explicit charset
selection (exactly as SpatiaLite does).
Maybe you will get a more useful diagnostic, and some immediate
'visual' feedback too.

Useful hint: I'll assume your shapefiles contain TEXT
in some European language. Pay close attention to 'special chars'
using diacritics (commonly used in Italian, French, German, Spanish,
Swedish ...) such as: 'ò' 'à' 'ñ' 'Ç' 'ö' 'Ø'
They can help you guess the 'right' charset to use.
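
The diacritics check can also be done outside a spreadsheet. This small Python sketch shows the same bytes rendered under several charsets so the right one can be spotted by eye; render_under is a hypothetical helper and the sample value is invented.

```python
def render_under(raw, charsets):
    """Decode the same bytes under each charset, marking undecodable bytes with U+FFFD."""
    return {name: raw.decode(name, errors="replace") for name in charsets}

# Sample bytes standing in for a DBF value written on a CP1252 system.
raw = "Köln".encode("cp1252")

for name, text in render_under(raw, ["cp1252", "cp1251", "utf-8"]).items():
    print(f"{name:8} {text}")
```

Only one of the renderings shows the expected 'ö'; the others show a Cyrillic letter or a replacement mark in its place.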

bye Sandro

