Re: (slow camput) I must be doing something wrong


Alok Parlikar

May 31, 2016, 1:52:44 AM
to Camlistore
Accidentally clicked POST. Editing here and Reposting.

As I am feeding data into my new camlistore, things are running quite slowly. I must be doing something wrong. Might be the combination of fsync+usb3+ext4. But perhaps someone has a quick tip to help me go faster :)

I am still waiting to receive new hardware which will be my primary camlistore (which will have ZFS RAID). Until then, I was hoping to ingest data and create blobs that I could dump later. So, currently, I have this:

Data size: About 2 TB. Mostly old backups of media files, code, and documents.

Running camlistored and camput on: 64-bit Ubuntu laptop; Intel i5-4200M @ 2.5GHz; 8GB RAM.

Reading Blobs From: USB3.0 interface to an ext4 backup disk.
Writing Blobs To: USB3.0 interface to a different disk. (Which will later be copied off to the new hardware).
Leveldb Cache: on my laptop's SSD.

(i) camput seemed to be moving much slower than disk speeds, even considering the USB3 interface. I'm seeing maybe about 1 MB/s going into the blobstore. "strace" showed that fsync was a frequent syscall.

(ii) I tried this:

camput file -permanode somedir
<Took overnight for 50GB>

And then again for the same content
camput file somedir
<Taking a few hours again, and reporting duplicates for the same 50GB>


So I have two questions:

(i) Does anyone have experience with disabling fsync with blobpacked? I see two Flush() calls on zipwriter. Not sure if that will work though.
(ii) What else can I do to speed up camput? Perhaps -- run multiple camputs in parallel on different directories?

If blobs get created at ~1MBps, 2TB will take a looong time :-)





Tamás Gulácsi

May 31, 2016, 9:00:30 AM
to Camlistore

You could comment out the Sync() calls, or only allow one if, say, 10s have passed since the previous one...

Mathieu Lonjaret

May 31, 2016, 9:50:06 AM
to camli...@googlegroups.com
I'm sure there are some bottlenecks that could be eliminated in
camput; it needs some cleanup/refactoring anyway. However, camput does
a lot of different things, which explains some of its complexity.

I'm wondering, now that we have a pretty good grip on writing
importers, maybe we could write a "files importer" that would be more
streamlined and more efficient than camput for large filesets. It
could even watch a directory and push files to camlistore as they get
added to the directory.
Brad, WDYT?
> --
> You received this message because you are subscribed to the Google Groups
> "Camlistore" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to camlistore+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Alok Parlikar

May 31, 2016, 10:27:20 AM
to camli...@googlegroups.com
Thanks, Tamas. I converted ext4 to btrfs (mostly as a blind experiment), and that sped up blob-writing by 2x or so. I also disabled Sync() in the localdisk store, and now things are moving much faster. But I'll look around in the code to see if there's more I can speed up.


