database disk image is malformed


andrea antonello

unread,
Feb 6, 2016, 1:44:11 AM2/6/16
to spatiali...@googlegroups.com
I guess this is more of an sqlite-related question, but since the only solution seems to be to dump to SQL and re-import, I wanted to hear about experiences with the spatial part.

Long story short:
I have been importing SQL statement files for a week (57 GB of files, each wrapped in BEGIN; -> COMMIT;). Many are about 300 MB. I am now at about 110 GB of database (without indexes, which I guess are better created only after the inserts). Then the PC crashed (Linux, wooo :-) ) and the .read command was interrupted.

When I restarted everything and wanted to start inserting data again, I got a nice "database disk image is malformed" error.

I browsed half the internet for a simple repair solution, but the only way to solve this seems to be:

sqlite> .mode insert
sqlite> .output dump_all.sql
sqlite> .dump

and then on a new database:

sqlite> .read dump_all.sql

Now I have two questions for the Gods of SpatiaLite :-)
- is this really the only way to recover it? (right now I am running a PRAGMA integrity_check, but it seems it will take forever)
- if this is the only way, will it work the same way for the SpatiaLite wrapper as for plain sqlite?
- and last but not least... what is the most efficient way to insert data in batch mode?

Thanks in advance for any insight you might give me.
Andrea


mj10777

unread,
Feb 7, 2016, 1:33:21 AM2/7/16
to SpatiaLite Users


On Saturday, 6 February 2016 07:44:11 UTC+1, moovida wrote:
I guess this is more of an sqlite-related question, but since the only solution seems to be to dump to SQL and re-import, I wanted to hear about experiences with the spatial part.

Long story short:
I have been importing SQL statement files for a week (57 GB of files, each wrapped in BEGIN; -> COMMIT;). Many are about 300 MB. I am now at about 110 GB of database (without indexes, which I guess are better created only after the inserts). Then the PC crashed (Linux, wooo :-) ) and the .read command was interrupted.
Does this mean you are working with Linux?
Do you have a '.journal' file? 

When I restarted everything and wanted to start inserting data again, I got a nice "database disk image is malformed" error.

I browsed half the internet for a simple repair solution, but the only way to solve this seems to be:

sqlite> .mode insert
sqlite> .output dump_all.sql
sqlite> .dump

and then on a new database:

sqlite> .read dump_all.sql

Now I have two questions for the Gods of SpatiaLite :-)
- is this really the only way to recover it? (right now I am running a PRAGMA integrity_check, but it seems it will take forever)
- if this is the only way, will it work the same way for the SpatiaLite wrapper as for plain sqlite?
Q1: yes, this seems to be the correct way (with the 'BEGIN' and 'COMMIT' commands)
Q2: in this scenario (without geometries), they are the same. With geometries, a 'RecoverGeometryColumn' must be done afterwards.
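For what it's worth, the same dump-and-reload can also be scripted; a minimal sketch using Python's standard sqlite3 module, whose iterdump() emits the same SQL as the shell's .dump (the file names in the usage comment are hypothetical, and on a SpatiaLite database the geometry metadata would still need 'RecoverGeometryColumn' afterwards):

```python
import sqlite3

def dump_and_reload(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    """Replay everything iterdump() can still read from src into dst
    (the scripted equivalent of .dump followed by .read)."""
    # iterdump() yields CREATE/INSERT statements wrapped in a transaction
    dst.executescript("\n".join(src.iterdump()))

# Hypothetical usage against the damaged file:
# src = sqlite3.connect("damaged.db")
# dst = sqlite3.connect("recovered.db")
# dump_and_reload(src, dst)
```

Like the shell recipe, this can still fail partway if iterdump() hits an unreadable page; it is a sketch of the approach, not a guaranteed repair.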
- and last but not least... what is the most efficient way to insert data in batch mode?
Not being an expert on indexes, I would say that building the indexes later is not a problem
- this could be a task that can be done together with 'RecoverGeometryColumn'
-- after the basic import has been completed
What should not be done is to deactivate constraints
- I often use the following horror combination:
   PRIMARY KEY (id_admin,admin_level,name,valid_since,valid_until,changed_type)
-- here the constraints must be enforced while importing to ensure that the data is sane

---
So it would seem to me you are doing this correctly
- so the question remains about the '.journal' file

In the past I have received a 'database disk image is malformed' when
- on a long task, I became curious and impatient and copied the file elsewhere to have a sneak 'preview'
-- without copying the '.journal' file, this will, in most cases, fail
--> sometimes it will work, but often not

The '.journal' is a ROLLBACK file
- so if something goes wrong while INSERTing or UPDATEing
-- sqlite3 will use this file to remove any data inserted into the database before a COMMIT has been completed
--> the crash may have occurred during the writing into the database (i.e. before the COMMIT was completed)
---> thus causing the 'malformed' error

So if a sqlite3 connection is started
- this being done with any 'spatialite' connection
-- sqlite3 will look for any '.journal' and try to repair the database
--> based on the information contained in the '.journal'

So in such a scenario
- where a '.journal' file exists
-- you may notice that after opening the database with spatialite_gui
--> the '.journal' file will disappear, after the offending data has been removed from the database (ROLLBACK)
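The rollback behaviour described above is easy to reproduce; a small sketch with Python's standard sqlite3 module (the file name and table are made up), where a connection that dies before COMMIT leaves no trace of its INSERTs:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

con = sqlite3.connect(db_path)
con.execute("CREATE TABLE t (id INTEGER)")
con.commit()

# Under the default isolation level the INSERT implicitly opens a
# transaction; closing without COMMIT simulates the interrupted import.
con.execute("INSERT INTO t VALUES (1)")
con.close()

# On the next connection the uncommitted row is gone (ROLLBACK).
con = sqlite3.connect(db_path)
print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # 0
```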

So executing the following sql-script
- which lists a cleaned-up version of the tables/views (without most of the internal administration tables/views)

-- sqlite3 20160117.berlin_street_geometries.db < list_tables.sql
SELECT
 name
FROM sqlite_master
WHERE
(
 (name NOT LIKE '%_f_%') AND
 (name NOT LIKE '%ref_sys%') AND
 (name NOT LIKE '%_history') AND
 (name NOT LIKE '%_log') AND
 (name NOT LIKE '%_insert') AND
 (name NOT LIKE '%_update') AND
 (name NOT LIKE '%_columns%') AND
 (name NOT LIKE '%_view_%') AND
 (name NOT LIKE '%_layers%') AND
 (name NOT LIKE '%_temp%') AND
 (name NOT LIKE '%_ins_%') AND
 (name NOT LIKE '%_upd_%') AND
 (name NOT LIKE 'sqlite_%') AND
 (name NOT LIKE 'vw_%') AND
 (name NOT LIKE 'gg%') AND
 (name NOT LIKE 'gi%') AND
 (name NOT LIKE 'tm%') AND
 (name NOT LIKE 'SpatialIndex') AND
 (name NOT LIKE 'ElementaryGeometries') AND
 (name NOT LIKE 'KNN') AND
 (name NOT LIKE 'idx_%')
);

This may be enough to force any needed ROLLBACK
- this is the last possibility I can think of to resolve such a situation
-- and, of course, will only work within a TRANSACTION (i.e. BEGIN / COMMIT)
--> which you are doing

--
The only other way to (possibly) speed up the bulk INSERTing of data in batch mode
- is to group your INSERT commands

-------
-- 1811042191 - Pariser Platz [1811042191,Pariser Platz]
-------
INSERT INTO berlin_admin
 (id_admin, name, notes, admin_level, valid_since, valid_until, changed_type, id_belongs_to, belongs_to)
 VALUES
 (1811042191,'Viereck','',11,'1732-01-01','1733-12-31',1,1811042191,'Pariser Platz'),
 (1811042191,'Quarree','',11,'1734-01-01','1814-09-14',1,1811042191,'Pariser Platz'),
 (1811042191,'Pariser Platz','Bekanntmachung vom 1814-09-16',11,'1814-09-15','3000-01-01',1,1811042191,'Pariser Platz'),
 (1811042191,'Pariser Platz','',11,'1732-01-01','3000-01-01',2,1811042191,'Pariser Platz'),
 (1811042191,'Pariser Platz','',11,'1732-01-01','3000-01-01',3,1811042191,'Pariser Platz'),
 (1811042191,'Pariser Platz','1-6',11,'1814-09-15','3000-01-01',4,1811042191,'Pariser Platz');

This was created with the following SQL
- which is used in a php-script to dump out the database
-- each admin_level (11=street) in its own file
--> changed_type=1 means the name of the (here: street) has changed; 4=the houses have changed (the street has been extended or shortened)
---> so a new geometry must be created for each changed situation

So each record must be unique
- thus the 'PRIMARY KEY (id_admin,admin_level,valid_since,valid_until,changed_type,id_belongs_to)'
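A small sketch of what that composite PRIMARY KEY enforces (Python's standard sqlite3 module, with the column list abbreviated to just the key columns): a second row with the same full key tuple is rejected outright, so duplicates cannot slip in during the import.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE berlin_admin (
        id_admin      INTEGER,
        admin_level   INTEGER,
        valid_since   TEXT,
        valid_until   TEXT,
        changed_type  INTEGER,
        id_belongs_to INTEGER,
        PRIMARY KEY (id_admin, admin_level, valid_since,
                     valid_until, changed_type, id_belongs_to)
    )""")
row = (1811042191, 11, '1814-09-15', '3000-01-01', 1, 1811042191)
con.execute("INSERT INTO berlin_admin VALUES (?,?,?,?,?,?)", row)
try:
    # same key tuple again -> the constraint keeps the import sane
    con.execute("INSERT INTO berlin_admin VALUES (?,?,?,?,?,?)", row)
except sqlite3.IntegrityError:
    print("duplicate rejected")
```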

The record is dumped out with:

SELECT
'-------'||char(10)||'-- '||id_admin||' - '||name||' ['||id_belongs_to||','||belongs_to||']'||char(10)||'-------'||char(10)||
'INSERT INTO berlin_admin'||char(10)||
' (id_admin, name, notes, admin_level, valid_since, valid_until, changed_type, id_belongs_to, belongs_to)
 VALUES'||
 group_concat((
 SELECT
  char(10)||' ('||id_admin||','''||name||''','''||notes||''','||admin_level||','''||valid_since||''','''||valid_until||''','||changed_type||','||id_belongs_to||','''||belongs_to||''')'
))||';' AS sql_insert
FROM 
(
 SELECT * FROM berlin_admin
 WHERE ((id_admin IN (1811042191)) AND (admin_level IN (11)))
 ORDER BY changed_type,valid_since
);


In this way, when manually editing the data
- it is easier to ensure uniqueness
-- and it also speeds up the creation of the database with bulk INSERTing of the data
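The grouping idea can be pushed further when driving the import from code; a sketch with Python's standard sqlite3 module (the table and row data are invented), inserting all rows through one prepared statement inside a single transaction:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, name TEXT)")

rows = [(i, "name-%d" % i) for i in range(10000)]

# One transaction for the whole batch (commits on leaving the block);
# executemany() reuses a single prepared INSERT for every row.
with con:
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)

print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # 10000
```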

Hope this helps

Mark

andrea antonello

unread,
Feb 8, 2016, 5:30:40 AM2/8/16
to spatiali...@googlegroups.com
Hi Mark,
thanks a lot for your thorough answer, much appreciated.

I will do some tests.

All the best,
Andrea



--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To post to this group, send email to spatiali...@googlegroups.com.
Visit this group at https://groups.google.com/group/spatialite-users.
For more options, visit https://groups.google.com/d/optout.
