Corrupted xapian database

Natalka Onyshchenko

Aug 21, 2020, 6:47:01 AM
to Alaveteli Dev

Hello!

Today we started receiving multiple errors: Failed to open Xapian database /home/alaveteli/alaveteli_new/lib/acts_as_xapian/xapiandbs/production: DatabaseError: Error reading block 1160: got end of file

I ran xapian-check on the database folder and got this:

Database couldn't be opened for reading: DatabaseError: EOF reading block 1160
Continuing check anyway
record:
Failed to check B-tree: DatabaseError: EOF reading block 1160
termlist:
Failed to check B-tree: DatabaseCorruptError: Expected block 51819 to be level 2, not 0
postlist:
Failed to check B-tree: DatabaseCorruptError: Expected block 221 to be level 3, not 0
position:
Failed to check B-tree: DatabaseError: EOF reading block 161914
spelling:
Failed to check B-tree: DatabaseCorruptError: Expected block 710 to be level 3, not 0
synonym:
Lazily created, and not yet used.

Total errors found: 6

Do you have any advice on how to fix this or investigate further?
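
For anyone comparing runs of xapian-check (for example before and after restoring a backup), here is a minimal sketch that groups the output above by table. The function name `summarize_xapian_check` is made up for illustration, and the parsing assumes the output keeps the layout shown in this post (a `table:` line followed by its result); other Xapian versions may format it differently.

```python
import re


def summarize_xapian_check(output: str) -> dict:
    """Map each table name to the failure line printed under it, or None.

    Assumes the layout seen in this thread: a lowercase "name:" line
    followed by either a "Failed to check ..." line or a non-error note.
    """
    tables = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"([a-z]+):", line)
        if header:
            # Start of a new table section, e.g. "record:" or "postlist:".
            current = header.group(1)
            tables[current] = None
        elif current and line.startswith("Failed to check"):
            tables[current] = line
    return tables


def failed_tables(output: str) -> list:
    """Return the names of tables that reported a failure, in order."""
    return [name for name, err in summarize_xapian_check(output).items() if err]
```

Feeding it the output from this post should list record, termlist, postlist, position and spelling as failed, with synonym left out because it was never used.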

Gareth Rees

Aug 26, 2020, 8:11:37 AM
to Alaveteli Dev
Did this happen out of nowhere, or was it after upgrading to 0.38?

We did recently update our Xapian library bindings [1] and made a note in the changelog about needing to check the database format.

If it's not that, then I'm not sure with the current information. A quick search led me to a now-fixed Xapian bug [2] that might have some useful insight to figure out what's happened.

Best,

Gareth

Natalka Onyshchenko

Aug 27, 2020, 3:14:31 PM
to Alaveteli Dev
No, I'm at 0.37 now. I've just restored the older index from the backup after all.

But the state of the indexing worries me. Lately update-xapian-index has been producing a lot of errors like this:
External Command: Error from command "pdftotext /tmp/foiextract20200827-12955-1cfsqbd - {:append_to=>"", :binary_output=>false, :timeout=>1200}":
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table

Or like this:
Error processing zip file: #<Zip::Error: Zip end of central directory signature not found>
External Command: Error from command "/usr/bin/unzip -qq -c /tmp/foiextract20200827-12955-f5l6bb word/document.xml {:binary_output=>false}":
End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /tmp/foiextract20200827-12955-8hm2w or
        /tmp/foiextract20200827-12955-8hm2w.zip, and cannot find /tmp/foiextract20200827-12955-8hm2w.ZIP, period.

It looks like something might be wrong with emails but from these messages I can't even figure out which emails to look at.





Gareth Rees

Sep 14, 2020, 11:58:04 AM
to alavet...@googlegroups.com
Hey Natalka,

Good to hear you managed to restore.

Those errors aren’t super concerning. We see them when authorities send files in formats we can’t parse. It’s not ideal of course, but it just results in a few attachments that aren’t fully searchable. It is annoying that it’s hard to identify the specific attachments that couldn’t be parsed, but at the moment we’re unlikely to have much time to improve the logging around that.
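
If you want to confirm that a given file really is in a format the extractors can't handle, a rough sketch like the one below can help. It assumes you still have the file on disk (the /tmp/foiextract… files are temporary, so you may need to save the attachment again), and `classify_attachment` is a made-up helper name, not something in Alaveteli. It only looks at the PDF header and at whether Python's `zipfile` can find the end-of-central-directory record, which is the thing unzip is complaining about.

```python
import zipfile
from pathlib import Path


def classify_attachment(path) -> str:
    """Roughly classify a file as 'pdf', 'zip', 'zip-incomplete' or 'unknown'.

    This is a strict check: a PDF with junk before the "%PDF-" header is
    reported as 'unknown' even though some readers would still open it.
    """
    path = Path(path)
    with path.open("rb") as fh:
        head = fh.read(8)

    if head.startswith(b"%PDF-"):
        return "pdf"

    # is_zipfile() looks for the end-of-central-directory record, so a
    # truncated .docx/.zip fails here even if it starts with "PK".
    if zipfile.is_zipfile(str(path)):
        return "zip"
    if head.startswith(b"PK\x03\x04"):
        return "zip-incomplete"
    return "unknown"
```

A result of 'unknown' or 'zip-incomplete' for something sent as a .pdf or .docx would match the warnings in the log, and points at the file itself rather than at the indexer.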

Given you didn’t update Alaveteli before the corruption, it would be worth running some hardware diagnostics: a disk or memory stick might be on its way out.

Best,

-- 
Gareth Rees

