ImportSHP() concurrency issues - "IMMEDIATE" transactions?

69 views
Skip to first unread message

Hans W

unread,
Jun 14, 2016, 10:23:46 AM6/14/16
to SpatiaLite Users
Hello,

in my application I'm importing multiple shapefile at once. Using the Spatialite native ImportSHP() SQL function and different connections.
I'd like the database to handle the concurrency issues.

With a small shapefile everything is fine, but as soon as they get bigger (e.g. 200 MB) one transaction with a connection will lock the database and the other transaction (started around the same time) with a different connections will error out with "load shapefile error: <database is locked>". I guess the ImportSHP() SQL function uses the standard "DEFERED" transaction.
Is it possible to set Spatialite to use "IMMEDIATE" transactions? So that the "busy_timeout" Pragma can kick in or do I have to write a feature request?

I would appreciate any feedback.

a.fu...@lqt.it

unread,
Jun 14, 2016, 3:51:59 PM6/14/16
to spatiali...@googlegroups.com
On Tue, 14 Jun 2016 07:23:46 -0700 (PDT), Hans W wrote:
> Hello,
>
> in my application I'm importing multiple shapefile at once. Using the
> Spatialite native ImportSHP() SQL function and different connections.
> I'd like the database to handle the concurrency issues.
>
> With a small shapefile everything is fine, but as soon as they get
> bigger (e.g. 200 MB) one transaction with a connection will lock the
> database and the other transaction (started around the same time)
> with
> a different connections will error out with "load shapefile error: ".
> I guess the ImportSHP() SQL function uses the standard "DEFERED"
> transaction.
> Is it possible to set Spatialite to use "IMMEDIATE" transactions? So
> that the "busy_timeout" Pragma can kick in or do I have to write a
> feature request?
>
> I would appreciate any feedback.
>

Hi Hans,

SQLite certainly is a wonderful light-weight DBMS, but it has its
own intrinsic limitations dictated by its simple and unsophisticated
architecture.

The most obvious limitation is that SQLite is practically unable
to handle more that a single WRITE access at each time, because
the database-file as a whole is subject to an EXCLUSIVE LOCK
when a write operation has to be processed.
SQLite can nicely support many concurrent "reader" connections
on the same database-file, but just a single connection can be
a "writer".

[1] https://www.sqlite.org/lockingv3.html
[2] https://www.sqlite.org/faq.html#q5
[3] https://www.sqlite.org/lang_transaction.html

Declaring an IMMEDIATE transaction will simply anticipate the
actual acquisition of the exclusive lock, that will continue
to be active until a corresponding COMMIT or ROLLBACK Will
be issued.
So the net overall effect will simply be that just a single
connection at each time will be able to import a shapefile,
while all other contending connections will wait; a clear
example of "false" parallelism degenerating into a flat
serialization..

Re-designing the whole application so to have just a single
thread importing all shapefiles one after the other in a serial
fashion (e.g. by handling a single queue of incoming requests)
seems to be a simpler and more efficient approach.

bye Sandro

Hans W

unread,
Jun 20, 2016, 5:33:38 AM6/20/16
to SpatiaLite Users
Am Dienstag, 14. Juni 2016 21:51:59 UTC+2 schrieb sandro furieri:
Re-designing the whole application so to have just a single
thread importing all shapefiles one after the other in a serial
fashion (e.g. by handling a single queue of incoming requests)
seems to be a simpler and more efficient approach.

bye Sandro

Thanks for your detailed answer! That helped a lot to determine the actual issues. And I understand your your suggestion.

My only concern is, that SQLite is a engine already made to handle this kind of concurrency issues. ("DEFERRED" transactions) The "ImportSHP()" SQL function is not the only problematic one. I also noticed concurrency issues with the "ST_Cutter()" SQL function. (There are probably more) From the start of my project and with the knowledge from the SQLite documentation I was under the impression that Spatialite can make use of this SQLite feature. (That might be my mistake.)

As this is not the case I would humbly (and potentially uninformed) suggest a global option to switch all Spatialite internal transactions to deferred transactions. In the case this might be a "stupid" or undesired feature suggestion it might help users, if this limitation (behaviour) of Spatialite could be documented somewhere in a appropriate place in the official documentation.

Greetings

HansGIS

unread,
Jun 20, 2016, 7:43:09 AM6/20/16
to SpatiaLite Users
sorry I just realized, that I used "deferred" instead of "immediate" in the above answer. Please imagine in all the places "immediate" instead of "deferred". :-)

Goyo

unread,
Jun 20, 2016, 7:58:03 AM6/20/16
to SpatiaLite Users
El lunes, 20 de junio de 2016, 11:33:38 (UTC+2), Hans W escribió:


Thanks for your detailed answer! That helped a lot to determine the actual issues. And I understand your your suggestion.

My only concern is, that SQLite is a engine already made to handle this kind of concurrency issues. ("DEFERRED" transactions) The "ImportSHP()" SQL function is not the only problematic one. I also noticed concurrency issues with the "ST_Cutter()" SQL function. (There are probably more) From the start of my project and with the knowledge from the SQLite documentation I was under the impression that Spatialite can make use of this SQLite feature. (That might be my mistake.)

As this is not the case I would humbly (and potentially uninformed) suggest a global option to switch all Spatialite internal transactions to deferred transactions. In the case this might be a "stupid" or undesired feature suggestion it might help users, if this limitation (behaviour) of Spatialite could be documented somewhere in a appropriate place in the official documentation.



Your issues, if IUC, have nothing to do with any Spatialite limitation. You just can not execute several write operations at once, no matter what. If you try to do that, only one operation will make its way and the other ones will have to wait until it finishes --and they will eventually timeout and return an error if they have to wait too much.

Regarding the transaction type, SQLite transactions are DEFERRED by default and it already provides syntax for other transaction semantics. I do not think it is any of Spatialite's bussiness changing it in any way.

Regards

Goyo

HansGIS

unread,
Jun 20, 2016, 11:25:32 AM6/20/16
to SpatiaLite Users
Am Montag, 20. Juni 2016 13:58:03 UTC+2 schrieb Goyo:
Your issues, if IUC, have nothing to do with any Spatialite limitation. You just can not execute several write operations at once, no matter what. If you try to do that, only one operation will make its way and the other ones will have to wait until it finishes --and they will eventually timeout and return an error if they have to wait too much.

Regarding the transaction type, SQLite transactions are DEFERRED by default and it already provides syntax for other transaction semantics. I do not think it is any of Spatialite's bussiness changing it in any way.

Thanks for your further reply. (If you read my comment to my reply I stated that I meant IMMEDIATE transaction.) I think we are talking past each other.

I understood the serialization issue. At the moment I use a workaround in my application to not run into this Spatialite issue. But this is tedius and should be handled by the db engine (even SQLite).

Pure SQLite is able to serialize multiple concurrent write requests by its own (using a IMMEDIATE transaction). SQLite will check in after a certain time period and will try to write again if the database is locked. It will do so as long as the Pragma setting "busy_timeout" specifies.
As I understand the difference between the transaction modes:
  • DEFERRED transaction: SQLite will not try to write again if db is locked for writing. Will error out in case of lock.
  • IMMEDIATE transaction: SQLite will try to write again (after certain amount of time) if db is locked. AS long as the Pragma setting "busy_timeout" specifies. Then it will error out if db is still locked.
And I was asking if there could be a mode where Spatialite will use IMMEDIATE transactions. So that the user can specify if and for how long a Spatialite transaction should try to write into the database.
The Java XERIAL JDBC can change the standard transaction of SQLite to IMMEDIATE. A similar switch for Spatialite would be nice.

Hope I didn't state anything wrong. Please correct me if this is the case.

Greetings

Jukka Rahkonen

unread,
Jun 20, 2016, 2:48:33 PM6/20/16
to spatiali...@googlegroups.com, HansGIS
Hi,

By reading http://www.sqlite.org/lang_transaction.html I would try
something like

BEGIN IMMEDIATE TRANSACTION;
ImportSHP();
Commit;

I supppose that your theory about different busy_timeout behaviour is
not right. The difference is that in case or IMMEDIATE you must retry
the BEGIN but with DEFERRED you must retry COMMIT after the database is
locked message that comes after the timeout. I may be wrong, though.

-Jukka Rahkonen-



HansGIS kirjoitti 2016-06-20 18:25:
> Am Montag, 20. Juni 2016 13:58:03 UTC+2 schrieb Goyo:
>
>> Your issues, if IUC, have nothing to do with any Spatialite
>> limitation. You just can not execute several write operations at
>> once, no matter what. If you try to do that, only one operation will
>> make its way and the other ones will have to wait until it finishes
>> --and they will eventually timeout and return an error if they have
>> to wait too much.
>>
>> Regarding the transaction type, SQLite transactions are DEFERRED by
>> default and it already provides syntax for other transaction
>> semantics. I do not think it is any of Spatialite's bussiness
>> changing it in any way.
>
> Thanks for your further reply. (If you read my comment to my reply I
> stated that I meant IMMEDIATE transaction.) I think we are talking
> past each other.
>
> I understood the serialization issue. At the moment I use a workaround
> in my application to not run into this Spatialite issue. But this is
> tedius and should be handled by the db engine (even SQLite).
>
> Pure SQLite is able to serialize multiple concurrent write requests by
> its own (using a IMMEDIATE transaction). SQLite will check in after a
> certain time period and will try to write again if the database is
> locked. It will do so as long as the Pragma setting "busy_timeout"
> specifies.
> As I understand the difference between the transaction modes:
>
> * DEFERRED transaction: SQLite will not try to write again if db is
> locked for writing. Will error out in case of lock.
> * IMMEDIATE transaction: SQLite will try to write again (after
> certain amount of time) if db is locked. AS long as the Pragma setting
> "busy_timeout" specifies. Then it will error out if db is still
> locked.
>
> And I was asking if there could be a mode where Spatialite will use
> IMMEDIATE transactions. So that the user can specify if and for how
> long a Spatialite transaction should try to write into the database.
> The Java XERIAL JDBC can change the standard transaction of SQLite to
> IMMEDIATE. A similar switch for Spatialite would be nice.
>
> Hope I didn't state anything wrong. Please correct me if this is the
> case.
>
> Greetings
>
> --
> You received this message because you are subscribed to the Google
> Groups "SpatiaLite Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to spatialite-use...@googlegroups.com.
> To post to this group, send email to
> spatiali...@googlegroups.com.
> Visit this group at https://groups.google.com/group/spatialite-users.
> For more options, visit https://groups.google.com/d/optout.

a.fu...@lqt.it

unread,
Jun 21, 2016, 11:24:34 AM6/21/16
to spatiali...@googlegroups.com
I was rather intrigued by this discussion, so I've duly
set up a quick testbed intended to practically verify
how SQLite handles all this stuff.

my test tool is a simple C program launching 9 threads
in parallel/concurrent execution.
all threads open an individual SQLite connection targeting
the same DB-file.
each thread creates a table of its own then inserting several
hundredth thousand rows into it; all SQL operations are
confined within a transaction.
here are my findings:

test #1:
using BEGIN TRANSACTION
and PRAGMA busy_timeout=1000 (1 sec)
------------------
THREAD-1 START
THREAD-2 START
THREAD-5 START
THREAD-7 START
THREAD-6 START
THREAD-4 START
THREAD-8 START
THREAD-3 START
THREAD-9 START
"CREATE TABLE table_7 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is
locked
"CREATE TABLE table_9 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is locked
"CREATE TABLE table_6 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is
locked
"CREATE TABLE table_3 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is locked
"CREATE TABLE table_2 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is locked
"CREATE TABLE table_4 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is locked
"CREATE TABLE table_5 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is locked
"CREATE TABLE table_8 (id INTEGER NOT NULL PRIMARY KEY, \
name TEXT NOT NULL, value DOUBLE NOT NULL)" error: database is locked
THREAD-1 STOP

only the first thread successfully completed its task,
all other threads failed due to pending locks


test #2:
using BEGIN IMMEDIATE TRANSACTION
and PRAGMA busy_timeout=1000 (1 sec)
------------------
THREAD-4 START
thread-8 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-1 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-9 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-5 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-6 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-2 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-3 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-7 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
THREAD-4 STOP

more or less the same as above; just a single thread is able
to successfully complete its task.


test #3:
using BEGIN IMMEDIATE TRANSACTION
and PRAGMA busy_timeout=10000 (10 secs)
------------------
THREAD-1 START
THREAD-1 STOP
THREAD-7 START
THREAD-7 STOP
THREAD-9 START
THREAD-9 STOP
THREAD-4 START
thread-6 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-8 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-3 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-5 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
thread-2 "BEGIN IMMEDIATE TRANSACTION" error: database is locked
THREAD-4 STOP

after setting a longer timeout there is an obvious improvement;
now several threads are correctly working, but not all.


test #4:
using BEGIN IMMEDIATE TRANSACTION
and PRAGMA busy_timeout=60000 (1 min)
------------------
THREAD-1 START
THREAD-1 STOP
THREAD-5 START
THREAD-5 STOP
THREAD-9 START
THREAD-9 STOP
THREAD-7 START
THREAD-7 STOP
THREAD-4 START
THREAD-4 STOP
THREAD-3 START
THREAD-3 STOP
THREAD-6 START
THREAD-6 STOP
THREAD-8 START
THREAD-8 STOP
THREAD-2 START
THREAD-2 STOP

this time we finally have a full success; after declaring
a very long timeout SQLite can effectively serialize
many parallel write operations.
please note: there is no benefit in all this, because
the total execution time is obviously "serial", and any
expected improvement deriving from implementing a more
sophisticated parallel approach is completely frustrated.


test #5:
using BEGIN TRANSACTION
and PRAGMA busy_timeout=60000 (1 min)
------------------
THREAD-2 START
THREAD-3 START
THREAD-8 START
THREAD-4 START
THREAD-5 START
THREAD-6 START
THREAD-7 START
THREAD-9 START
THREAD-1 START
THREAD-2 STOP
THREAD-6 STOP
THREAD-7 STOP
THREAD-4 STOP
THREAD-8 STOP
THREAD-1 STOP
THREAD-9 STOP
THREAD-5 STOP
THREAD-3 STOP

in this final test we are using an ordinary (deferred)
transaction, anyway we are declaring a very long timeout
as in the previous case.
in this test too we have a full success, because SQLite
now isn't any longer serializing individual transactions
but will serialize instead individual statements, which
at the end of the day have the same practical effects.

FINAL CONCLUSION:
-------------------
we've confirmed once again what we already know from the
beginning, i.e. that SQLite is completely unable to
handle many concurrent write operations on behalf of
the same DB-file.
by declaring an appropriate (long) timeout users are
anyway able to correctly serialize many concurrent
write operations, but this is a generic feature
supported by SQLite not necessarily requiring to use
the rather bizarre BEGIN IMMEDIATE TRANSACTION syntax.

I personally still continue to not appreciate the baroque
idea of implementing a fake parallel design actually
defaulting to a clumsy serial implementation thanks to
same "dirty tricky" workarounds; designing from scratch a
straightforward serial design seems to be simpler
and more robust choice.

anyway, if someone really needs to simulate someway
a genuine "parallel" DBMS on SQLite what is basically
required is just to set a reasonably long timeout
interval by explicitly calling a "PRAGMA busy_timeout"
directive, and that's all.

bye Sandro

mj10777

unread,
Jun 22, 2016, 12:59:26 AM6/22/16
to SpatiaLite Users
A further note on this matter:
It is often useful, in understanding such matters, to know the reason why certain thing can only be done in a specific way. 

In this scenario it is:
- 'there can only be one' (to misquote the Highlander)
-- writing to a file at the same time

This is, to my knowledge, a common feature on all operating systems. 

It is true, that there are ways to 'simulate' that this is not so
- but in the end, it always come down to the basic reality
- 'there can only be one'
-- writing to a file at the same time

----
In multi-user Database systems
- each Database-connection being one user
such as mysql, postgres, mssql, oracal, IMS (a MVS-mainframe database system), 
you will have a program (server) 'lurking' in the background
- listening for 'requests' (mostly on a specific 'port')

This program receives a request and depending on what type of request it is (reading/writing being two)
- deal with it in some form

In the case of 'writing' it is often:
- depending on how the Database is structured internally
1) mysql and postgres (and possibly mssql, oracal) use a DATABASE concept
- each DATABASE is a directory
- each TABLE is a file (possibly spitted into more that one, depending on a size limit)
2) IMS also uses a DATABASE logic
- but there are no TABLES as such
- each DATABASE being one file (at least)
- MVS (when I worked with it) had no concept of directories

So when a 'writing' request has been received
- this request is 'QUEUED' 
-- 'line up in the queue number X and wait your turn'
--- often a separate 'QUEUE' for each file (TABLE/DATABASE depending on the system)

Since 'there can only be one', the request will wait until the other requests have been completed
- when your request can write away, possible overwriting any changes the previous requests have just completed
-- and of course the next request, may overwrite your changes
(this assume that there is no LOCKING being done. i.e. prevent changes of others until your changes have been completed)

When completed, the server-program will inform the client, who sent the request, the result.
This may also be a 'TIMEOUT' result
- the requests before you taking so long, that the defined 'TIMEOUT' setting for a result has been exceeded
-- like in a grocery store, where surprised is shown, when the next in line learn that they must pay something and the digging in the great unknown starts
This may also be a 'TOO MANY CONNECTIONS' result
- the server will only handle a defined amount of request at one time

So even if it looks like everything is being serviced simultaneously, the reality is that it is
- sequential  / parallel

---
In single-user Database systems
- each Database-connection (again) being one user
such as sqlite, sqlite3, dbase, access

In this scenario, there is nobody 'listening' to requests
- basically a library is being called to service this connection
-- not knowing (and often not caring) what other connections are active

The DATASET are one file, the TABLES being inside this file
- where the 'there can only be one' restriction will ALWAYS apply

In the above sample, Sandro has shown (very nicely) 
- what efforts the the sqlite3 developers have made to avoid this problem, where practicable
-- but it only a simulation 

---
For a Database designer, this is one (of many) crucial aspects that must understood correctly
- a central TABLE where every 'Tom, Dick and Harry' (adding  a 'Mary' may make this statement politically correct) must write to
-- is a major cause for bottlenecks in a designed system/application

The Database system being used may be well designed
- but the use of it, may not be
-- this is an aspect often not taken into consideration

---
In the end, independent of single/mult-user modus, one must expect to receive the result:
- 'there can only be one' and it is not you

Hope this helps

Mark

bye Sandro

HansGIS

unread,
Jun 22, 2016, 12:30:18 PM6/22/16
to SpatiaLite Users
Thanks for all the testing and taking my request serious! :-)

SQLite still does error out when trying to write from two different connections to one file. Even with an IMMEDIATE transaction and the busy_timeout set to a high value.
It might be connected to the xerial / sqlite-jdbc driver I'm using (or to the Spatialite functions in the SQL statements). Anyway, thanks for all the input!!!
Reply all
Reply to author
Forward
0 new messages