Accidentally clicked POST. Editing here and Reposting.
As I am feeding data into my new camlistore, things are running quite
slowly. I must be doing something wrong. Might be the combination of
fsync+usb3+ext4. But perhaps someone has a quick tip to help me go
faster :)
I am still waiting to receive new hardware which will be my primary
camlistore (which will have ZFS RAID). Until then, I was hoping to
ingest data and create blobs that I could dump later. So, currently, I have this:
Data size: About 2 TB. Mostly old backups of media files, code, and documents.
Running camlistored and camput on: 64-bit Ubuntu laptop; Intel i5-4200M @ 2.5GHz; 8GB RAM.
Reading Blobs From: USB3.0 interface to an ext4 backup disk.
Writing Blobs To: USB3.0 interface to a different disk. (Which will later be copied off to the new hardware).
Leveldb Cache: on my laptop's SSD.
(i) camput seems to be moving much slower than disk speeds, even considering the USB3 interface. I'm seeing maybe 1 MB/s going into the blobstore. "strace" showed that fsync was a frequent syscall.
(ii) I tried this:
camput file -permanode somedir
<Took overnight for 50GB>
And then again for the same content
camput file somedir
<Taking a few hours again, and reporting duplicates for the same 50GB>
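To check whether fsync on the destination disk really accounts for what I see in (i), a rough sketch like the one below could compare small-file write throughput with and without fsync. This is not Camlistore code. The file count, the 64 KB size, and the TARGET environment variable are all assumptions I made up for the test; TARGET should point at a directory on the USB3 disk.

```python
import os
import tempfile
import time


def write_blobs(directory, count=200, size=64 * 1024, sync=True):
    """Write `count` files of `size` bytes each; return throughput in MB/s."""
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"blob-{i:06d}")
        with open(path, "wb") as f:
            f.write(payload)
            if sync:
                # Push Python's buffer to the OS, then force it to the device.
                f.flush()
                os.fsync(f.fileno())
    # Guard against a zero interval on very fast runs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (count * size) / elapsed / 1e6


if __name__ == "__main__":
    # TARGET is a hypothetical variable name; defaults to the system temp dir.
    target = os.environ.get("TARGET", tempfile.gettempdir())
    for sync in (True, False):
        with tempfile.TemporaryDirectory(dir=target) as d:
            rate = write_blobs(d, sync=sync)
            print(f"fsync={sync}: {rate:.1f} MB/s")
```

If the two numbers are close, fsync is probably not the bottleneck and the slowdown is somewhere else.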
So I have two questions:
(i) Does anyone have experience with disabling fsync with blobpacked? I see two Flush() calls on zipwriter. Not sure if that will work though.
(ii) What else can I do to speed up camput? Perhaps run multiple camput processes in parallel on different directories?
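For the parallel idea in (ii), this is roughly what I have in mind: one "camput file <dir>" process per directory, a few at a time. It is only a sketch. The worker count is arbitrary, the helper names are mine, and I don't know whether camlistored copes well with concurrent uploads or whether this helps at all if the destination disk is the bottleneck.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def camput_command(directory):
    # Same invocation as used above in this post: camput file <dir>
    return ["camput", "file", directory]


def run_parallel(directories, workers=2, runner=subprocess.call):
    """Run one camput per directory, at most `workers` at a time.

    Returns a dict mapping each directory to its exit code.
    `runner` is injectable so the function can be tried without camput.
    """
    directories = list(directories)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = pool.map(lambda d: runner(camput_command(d)), directories)
        return dict(zip(directories, codes))


if __name__ == "__main__":
    # Usage: python parallel_camput.py dir1 dir2 dir3 ...
    results = run_parallel(sys.argv[1:])
    for directory, code in results.items():
        print(f"{directory}: exit code {code}")
```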
If blobs get created at ~1MBps, 2TB will take a looong time :-)
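To put a number on that, here is the back-of-the-envelope estimate, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a steady rate with no pauses:

```python
def ingest_days(total_bytes, bytes_per_second):
    """Days needed to ingest `total_bytes` at a constant rate."""
    seconds = total_bytes / bytes_per_second
    return seconds / 86400  # 86400 seconds per day


# 2 TB at roughly 1 MB/s
print(round(ingest_days(2e12, 1e6), 1))  # 23.1
```

So a bit over three weeks of continuous running at the current rate.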