how well does lipsync scale


jdlong

unread,
Mar 31, 2011, 3:22:57 PM
to project-lipsync
I've been testing multiple syncing tools for my analytics workflow. I
keep an EC2 instance running on Amazon, and I like to keep files synced
between that machine, my laptop, and my home computers. Lipsync looks
like a really interesting project and a potentially great solution to
my sync issues.

Are any of you using it to keep gigabytes of files across thousands of
directories synced? Does lipsync get slower as the number of files
grows? Obviously the first sync of many files takes a long time; I'm
referring to "maintenance syncs" as things change. Does a maintenance
sync of a given size (e.g., ten 1 MB files) get slower on a system with
many files than on a system with only a few?

Thanks for this neat tool, Phil. I'm looking forward to testing it.

-J


tdrusk

unread,
Apr 3, 2011, 5:07:37 PM
to project-lipsync
Please take my information with a grain of salt. I am new to this
project, but hope to get more involved.

Lipsync uses inotify; you can read more about it here:
https://secure.wikimedia.org/wikipedia/en/wiki/Inotify. If I
understand correctly, only changed files are reported, which keeps
rsync from having to rescan every file in every directory. So syncs
should be fast after the first full one.
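To make that concrete, here is a minimal sketch of the inotify-then-rsync pattern (the paths, remote, and event list are assumptions for illustration, not lipsync's actual script; it needs inotifywait from the inotify-tools package):

```shell
#!/bin/sh
# Sketch of the watch-then-sync loop. WATCH_DIR and REMOTE are
# placeholders; lipsync's real script may differ in detail.
watch_and_sync() {
  WATCH_DIR=$1
  REMOTE=$2
  # -m: keep monitoring; -r: recursive; only the listed events fire,
  # so rsync is invoked only when something in the tree actually changed.
  inotifywait -m -r -e modify,create,delete,move "$WATCH_DIR" |
  while read -r dir event file; do
    # rsync then rescans only WATCH_DIR, not the whole filesystem.
    rsync -az --delete "$WATCH_DIR"/ "$REMOTE"
  done
}

# Example (assumed host): watch_and_sync "$HOME/sync" user@host:sync/
```

The point is that the expensive full-tree scan happens once at startup; afterwards the kernel tells you which files changed, so each maintenance sync only pays for rsync's delta transfer of the watched tree.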


// This is some other stuff I wrote before finding out about inotify.
If you experience a slowdown or low memory, read on...

The problem in question can have a lot of causes. The file system and
rsync are two software-related ones.

The file system you choose on your servers will make a difference in
how fast rsync can walk the directory tree (https://
secure.wikimedia.org/wikipedia/en/wiki/Comparison_of_file_systems).

More relevant to this project is rsync. Lipsync uses rsync to transfer
files. After some googling I found this in the rsync FAQ
(http://rsync.samba.org/FAQ.html):
"
memory usage
Rsync versions before 3.0.0 always build the entire list of files to
be transferred at the beginning and hold it in memory for the entire
run. Rsync needs about 100 bytes to store all the relevant information
for one file, so (for example) a run with 800,000 files would consume
about 80M of memory. -H and --delete increase the memory usage
further.

Version 3.0.0 slightly reduced the memory used per file by not storing
fields not needed for a particular file. It also introduced an
incremental recursion mode that builds the file list in chunks and
holds each chunk in memory only as long as it is needed. This mode
dramatically reduces memory usage, but it only works provided that
both sides are 3.0.0 or newer and certain options that rsync currently
can't handle in this mode are not being used.

out of memory
The usual reason for "out of memory" when running rsync is that you
are transferring a _very_ large number of files. The size of the files
doesn't matter, only the total number of files. If memory is a
problem, first try to use the incremental recursion mode: upgrade both
sides to rsync 3.0.0 or newer and avoid options that disable
incremental recursion (e.g., use --delete-delay instead of
--delete-after). If this is not possible, you can break the rsync run into
smaller chunks operating on individual subdirectories using --relative
and/or exclude rules.
"

If you are running out of memory you could try adding --relative to
the rsync calls in lipsync's bin/ scripts.
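If you do hit the out-of-memory case, the FAQ's chunking suggestion can be sketched like this (the per-top-level-directory split and the SRC/DEST paths are my assumptions for illustration, not something lipsync does today):

```shell
#!/bin/sh
# Sketch: sync one top-level subdirectory at a time instead of the whole
# tree in a single rsync run, so each run's file list stays small.
# SRC and DEST are placeholders; DEST could equally be user@host:/backup.
chunked_sync() {
  SRC=$1
  DEST=$2
  for dir in "$SRC"/*/; do
    # --relative preserves the path under SRC at the destination;
    # the "/./" marks where the preserved portion of the path begins.
    rsync -a --relative "$SRC/./$(basename "$dir")" "$DEST"
  done
}
```

Each iteration holds only one subdirectory's file list in memory, at the cost of one rsync connection per chunk.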

James Long

unread,
Apr 3, 2011, 5:30:57 PM
to project...@googlegroups.com, project-lipsync
Thank you! That's a great intro and far more than I already knew. This helps me put lipsync in context.

-J


Sent from my iPhone.

Phil Cryer

unread,
Apr 3, 2011, 8:37:43 PM
to project...@googlegroups.com, tdrusk
Whoa, awesome research, thanks! And yeah, we haven't looked at any
large-scale projects with lipsync yet, but one I'm working on now
deals with over 50 TB that we want to sync to other remote nodes, so
this will be addressed soon.

P

--
http://philcryer.com
