Hi Jens!
Thanks for sharing your thoughts.
Jens W. Klein wrote on 01-04-15 at 14:08:
Getting a PDF file of a few MB that is now stored on NFS at least
seems very snappy.
> > 3. Iterate over the blob_chunk table, for each row download the blob as
> > above in idea number 2, then remove the row.
> >
> > This should leave you with a clean database, without unused blob_chunks
> > and large objects lying about.
>
> Your script (in the other posting) for (3) should be the fastest way to
> convert, right? I can't tell you if locking/committing and so on is OK,
> but since it uses the RelStorage API it should be fine. But I think it's
> the most elegant way to get the blobs out of Postgres.
Let me try it again and get some figures.
This is locally on my laptop, with an SSD disk, and the Postgres server
on that same laptop.
0. Fresh import of a Postgres dump, with blobs still in the database,
recently packed, about 26,000 transactions: 16 minutes.
1. Use zodbconvert to convert from non-shared blobcache to shared
blobstorage:
All 25961 transactions copied successfully in 27.8 minutes.
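For the curious, the zodbconvert config for this would look roughly
like the sketch below. This is just a sketch: the dsn values and
directory paths are placeholders, and it assumes you copy into a
separate destination database:

%import relstorage

<relstorage source>
  blob-dir ./blobcache
  shared-blob-dir false
  <postgresql>
    dsn dbname='src_db'
  </postgresql>
</relstorage>

<relstorage destination>
  blob-dir ./blobstorage
  shared-blob-dir true
  <postgresql>
    dsn dbname='dest_db'
  </postgresql>
</relstorage>

Then run zodbconvert with that config file as its single argument.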
2. Use the blobdownload script to download the blobs of all
transactions, without changing anything in the database:
4577 blobs successfully downloaded from all 25961 transactions in
3.8 minutes.
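The core of such a download script is a loop over the storage
iterator. A minimal sketch, not the actual script: it assumes
RelStorage 1.x, a history-preserving storage, and placeholder
dsn/paths:

import shutil
from relstorage.adapters.postgresql import PostgreSQLAdapter
from relstorage.options import Options
from relstorage.storage import RelStorage
from ZODB.blob import FilesystemHelper, is_blob_record

options = Options(blob_dir='./blobcache', shared_blob_dir=False)
adapter = PostgreSQLAdapter(dsn="dbname='some_db'", options=options)
storage = RelStorage(adapter, options=options)

fshelper = FilesystemHelper('./blobstorage')  # the new shared blob dir
fshelper.create()

for txn in storage.iterator():
    for record in txn:
        if record.data and is_blob_record(record.data):
            # loadBlob fetches the blob into the local blob cache.
            cached = storage.loadBlob(record.oid, txn.tid)
            fshelper.getPathForOID(record.oid, create=True)
            shutil.copy(cached,
                        fshelper.getBlobFilename(record.oid, txn.tid))
storage.close()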
BTW, subsequently it should be fine to truncate the blob_chunk and
pg_largeobject tables, if you have nothing else in your database that
may be using large objects. In my case this would have been fine.
3. Use blobchunkdownload to download using only the blob_chunk table,
and clean up this table and the large objects afterwards:
- Download phase: 1 minute, 8 seconds.
- Optional cleanup phase: an extra 5 minutes, 45 seconds.
- Total: 6.9 minutes
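This variant skips the storage API entirely and reads the large
objects with psycopg2. A rough sketch of the download phase, assuming
the RelStorage PostgreSQL schema (where the chunk column holds the OID
of a large object); the dsn and paths are again placeholders:

import psycopg2
from ZODB.blob import FilesystemHelper
from ZODB.utils import p64

conn = psycopg2.connect("dbname='some_db'")
cur = conn.cursor()

fshelper = FilesystemHelper('./blobstorage')
fshelper.create()

cur.execute("SELECT zoid, tid, chunk_num, chunk FROM blob_chunk"
            " ORDER BY zoid, tid, chunk_num")
for zoid, tid, chunk_num, chunk in cur.fetchall():
    oid, serial = p64(zoid), p64(tid)
    fshelper.getPathForOID(oid, create=True)
    filename = fshelper.getBlobFilename(oid, serial)
    lobj = conn.lobject(chunk, 'rb')
    # chunk_num 0 starts a new file; later chunks (if any) append.
    with open(filename, 'wb' if chunk_num == 0 else 'ab') as f:
        f.write(lobj.read())
    lobj.close()
conn.commit()

I suspect the slow optional cleanup phase is the per-object unlinking
of the large objects; the truncate shortcut mentioned below avoids
that.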
I compared the resulting blobstorage dirs of the three methods (diff -r)
and they were exactly the same.
Conclusions:
- zodbconvert is by far the slowest method.
- Downloading using the blob_chunk table (method 3) is by far the fastest.
- For the final cleanup, at least on Postgres, you may simply want to
truncate the blob_chunk and pg_largeobject tables, which should take
less than a second. Should be safe if you use this database (better:
this entire database server) only for this single RelStorage site.
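That final cleanup would then be something like this hypothetical
psycopg2 snippet. Note that truncating the pg_largeobject catalog
needs superuser rights, and on Postgres 9.0+ pg_largeobject_metadata
should go with it:

import psycopg2

# Only safe if nothing else in this database uses large objects.
conn = psycopg2.connect("dbname='some_db'")  # placeholder dsn
cur = conn.cursor()
cur.execute("TRUNCATE blob_chunk")
cur.execute("TRUNCATE pg_largeobject, pg_largeobject_metadata")
conn.commit()
conn.close()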
> > A question to check my understanding of the RelStorage code and tables.
> > - With shared blob storage the blobs are stored in a directory.
> > - With non-shared the blobs are stored in the blob_chunk table, which
> > uses large objects.
> > - When the non-shared blobs are downloaded to a shared blob storage as
> > above, the blob_chunk table can be safely emptied.
> > Right? Removing a blob_chunk row does not cascade to a removal of an
> > object_state row or something like that? No information is lost?
>
> I would say it can be removed completely.
>
> As far as I understand it, the blob handling is decoupled in such a way
> that the object state referring to a blob does not need to know anything
> about the blob storage implementation. So the storage implementation is
> responsible for delivering blobs by their id.
That is my understanding too.
> > For ideas one and two, it could help to have a script to afterwards
> > remove the blobs from the database, as done in idea three.
> > Maybe that is just a one-liner in SQL, maybe some Python. I have not
> > tried this yet.
> > But I think I have read that a TRUNCATE on the blob_chunk table would
> > not remove the large objects, at least in Postgres. Am I correct?
> >
> > Thoughts? Any existing scripts that I have missed that already do this?
>
> PostgreSQL provides a script to remove orphaned blobs:
>
> http://www.postgresql.org/docs/9.4/static/vacuumlo.html
> No idea if it helps here, but looks promising.
I have not tried it, but it sounds like a TRUNCATE on the blob_chunk
table combined with running this script would do the trick too.
... Well, I tried it after all. :-) Truncating blob_chunk and then
running vacuumlo on your specific database seems good to me. It does
still take almost 3 minutes in my case.
Note, for peace of mind (at least my mind), that vacuumlo only removes
large objects from one database:
sh-3.2$ time /opt/local/lib/postgresql93/bin/vacuumlo -n other_client_db
Connected to database "other_client_db"
Test run: no large objects will be removed!
Checking chunk in public.blob_chunk
Would remove 0 large objects from database "other_client_db".
real 0m0.040s
user 0m0.002s
sys 0m0.005s
Dry run with real db:
sh-3.2$ time /opt/local/lib/postgresql93/bin/vacuumlo -n
client_plone4_blob_migration
Connected to database "client_plone4_blob_migration"
Test run: no large objects will be removed!
Checking chunk in public.blob_chunk
Would remove 4577 large objects from database
"client_plone4_blob_migration".
real 0m0.040s
user 0m0.006s
sys 0m0.008s
Real run with real db:
sh-3.2$ time /opt/local/lib/postgresql93/bin/vacuumlo
client_plone4_blob_migration
real 2m50.050s
user 0m0.040s
sys 0m0.101s
Best,