Using lsyncd for inotify-based bup index

Using lsyncd for inotify-based bup index Simon Sapin 12/31/11 3:12 AM
Hi,

There has been discussion of using inotify with bup. As I understand it,
it is to avoid repeatedly scanning whole directory trees when running
bup index. bup save or other operations would be unchanged.

I haven't found it already mentioned in the mailing list archive, but I
think this can be achieved with lsyncd:
http://code.google.com/p/lsyncd/

lsyncd watches a directory tree with inotify, accumulates events for
some time and then runs some commands. In the default configuration this
is rsync, but it could be bup index instead.

I haven't tried it (sorry), but I think this can work with no (or
little) code changes to either project, only configuration. After an
initial full index, lsyncd runs bup index with only the filenames that
have changed, so each run is very fast. Once this is in place, bup save
can be run as often as desired.

Regards,
--
Simon Sapin

Re: Using lsyncd for inotify-based bup index apenwarr 12/31/11 4:15 PM
On Sat, Dec 31, 2011 at 6:12 AM, Simon Sapin <simon...@exyr.org> wrote:
> There has been discussion of using inotify with bup. As I understand it, it
> is to avoid repeatedly scanning whole directory trees when running bup
> index. bup save or other operations would be unchanged.

Yup.

> I haven’t found it already mentioned in the mailing list archive, but I
> think this can be achieved with lsyncd:
> http://code.google.com/p/lsyncd/
>
> lsyncd watches a directory tree with inotify, accumulates events for some
> time and then runs some commands. In the default configuration this is
> rsync, but it could be bup index instead.
>
> I haven’t tried it (sorry), but I think this can work with no (or little)
> code changes to either project, only configuration. After an initial full
> index, lsyncd runs bup index with only the filenames that have changed so
> each run is very fast. Once this is in place, bup save can be run as often
> as desired.

Generally this is the right idea, but I'm very disappointed that they
designed it that way.  Instead of having a one-shot queue-and-fire
design, which will explode as soon as anything goes wrong with
processing one event, it would be *so* much cleaner to do it the way
Apple did it with their file notification service:
http://arstechnica.com/apple/reviews/2007/10/mac-os-x-10-5.ars/7

I'm sort of hoping someone will finally give in and implement the same
idea as the FSEvents daemon in MacOS.  That design is free of holes
and, as long as the daemon is written correctly, will work properly
across reboots etc.

That said, we just need to add a simple thingy to 'bup index' to have
it take a list of filenames to update from stdin.  Then we can plug in
any inotify-type daemon we want.  (And on MacOS, we can just write a
trivial program to query FSEvents.)

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index Zoran Zaric 1/1/12 5:43 AM
On Sat, Dec 31, 2011 at 07:15:24PM -0500, Avery Pennarun wrote:
> That said, we just need to add a simple thingy to 'bup index' to have
> it take a list of filenames to update from stdin.  Then we can plug in
> any inotify-type daemon we want.  (And on MacOS, we can just write a
> trivial program to query FSEvents.)

Does this look correct to you?

https://github.com/zoranzaric/bup/commit/dfe074250c1a9e4fd1b8b06bfdedf5972df161c2

Zoran

Re: Using lsyncd for inotify-based bup index Axel Kittenberger 1/1/12 10:30 AM
> Generally this is the right idea, but I'm very disappointed that they
> designed it that way.  Instead of having a one-shot queue-and-fire
> design, which will explode as soon as anything goes wrong with
> processing one event, it would be *so* much cleaner to do it the way
> Apple did it with their file notification service:
> http://arstechnica.com/apple/reviews/2007/10/mac-os-x-10-5.ars/7
>
> I'm sort of hoping someone will finally give in and implement the same
> idea as the FSEvents daemon in MacOS.  That design is free of holes
> and, as long as the daemon is written correctly, will work properly
> across reboots etc.

I'm not sure what you intend, but Lsyncd will not "explode" if one
event goes wrong. It distinguishes by exit code between temporary
errors (like a network failure) and permanent errors (like a
misconfiguration): it will keep re-calling the transfer agent on
temporary failures until it works again, and will terminate on
permanent errors. That's usually what one wants.

I found Apple's FSEvents to be a double-edged sword. The FSEvents
daemon itself offers only a very coarse level of messaging:
"something in that directory changed". It doesn't say what changed or
which files, and it doesn't even report renames as such. Lsyncd, for
example, can use rename events to issue a rename command to the
transfer agent as well, which often saves a ton of bandwidth: the
target moves the 10GB file too, instead of seeing one 10GB file vanish
and another one appear.
Apple's Spotlight local search engine obviously uses the lower-level
interface /dev/fsevents. The FSEvents daemon itself also uses it to
offer its coarse service. I added the ability for Lsyncd to use
/dev/fsevents to get messages at the needed level of detail on OSX.
It's an O.K. event reporter, but it is not officially documented by
Apple and is available only through reverse engineering, so it might
break with any update or release. The second shortcoming is that one
needs root privileges to access /dev/fsevents, since it reports all
file activity, without any filtering. Third, as I read, one event
receiver not responding might bring down event messaging for others
as well.

If you meant notification systems in general, yes, I found them all
unsatisfactory. Obviously every system has been designed with one
specific use in mind, neglecting all other uses, and no standards
body ever bothered to create a good, fully thought-through standard
for file event reporting.

inotify: needs a watch mark on every directory instead of recursively
watching subdirectories. About 1kB of kernel memory per mark, which
adds up on huge trees.

FSEvents: very coarse, "something in the directory changed",
unsuitable for anything that benefits from per-file information.

kqueue: that's what Apple tells you to use if you need finer
information. It's ridiculous: not even per-directory marks, let alone
recursive ones, but one watch per file; it will surely mem-explode on
large trees.

/dev/fsevents: internal to Apple, but reverse engineered. Besides
that, and needing root privileges, it's the most usable/simple
interface: you simply get all the events through one file descriptor.

fanotify: at first it sounded so nice, with recursive marks. But it
doesn't report rename events at all. And no, it does not just report
them as one file deleted and another created; it reports _nothing_.
So it's unusable for anything but the malware shield it was obviously
designed for.

Windows Event Queues: Honestly I did not yet code against those, so I
cannot tell.

About "missing" events: if the system reboots or the like, nothing
is going to be missed, since Lsyncd makes a full resync of the
watched tree on startup, so any changes that happened while it was
not running get transferred. On OSX this could be improved, as you
said, by getting the list of changed directories from FSEvents to
narrow down the directories to scan, but someone would have to code
that.

> That said, we just need to add a simple thingy to 'bup index' to have
> it take a list of filenames to update from stdin.  Then we can plug in
> any inotify-type daemon we want.  (And on MacOS, we can just write a
> trivial program to query FSEvents.)

Yes! That's how Lsyncd calls rsync, for example with "--files-from=-",
so rsync reads the file list from stdin, which Lsyncd pipes through.
After all, rsync was designed for Lsyncd or cares about it, but it's
made to work with it. Albeit sometimes it's a bit tricky to make it do
the correct thing, like filter rules, which have to be created
recursively.

Re: Using lsyncd for inotify-based bup index Axel Kittenberger 1/1/12 10:45 AM
Sorry I missed a NOT:

> After all rsync was NOT designed for Lsyncd or cares about it, but its

Re: Using lsyncd for inotify-based bup index apenwarr 1/1/12 10:55 AM

It's a start, but it's not quite right.  The main trick is that we
don't want to just read the entire list of files into memory at once:
an inotify-style daemon may just want to give a huge list of files,
which might be every file on the whole disk.

On the other hand, we *could*, if we want, enforce that the files it
gives are in a reasonable order (perhaps by passing them through
'sort' on the way into stdin).

So my preferred implementation would be to avoid reduce_paths in this
use case, and just make an iterator over stdin that goes straight into
the index updater code; that way we should never have the entire list
in memory.  We should have a check that names are in strictly
bup-compatible valid order, though, which I think is currently reverse
alphabetical order if you assume directory names end in '/'.

I like Axel's suggestion of an "--update-from=-" option that would
also let you read from a file instead of stdin.

It would probably be best to have a -0 option too, which reads names
separated by nul characters.  Otherwise filenames with newlines in
them (yes!) will cause problems.

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index apenwarr 1/1/12 11:22 AM
On Sun, Jan 1, 2012 at 1:30 PM, Axel Kittenberger <axk...@gmail.com> wrote:
> I'm not sure what you intend, but Lsyncd will not "explode" if one
> event goes wrong. It distinguishes by exit code between temporary
> errors (like a network failure) and permanent errors (like a
> misconfiguration): it will keep re-calling the transfer agent on
> temporary failures until it works again, and will terminate on
> permanent errors. That's usually what one wants.

What if the system crashes and reboots?  What does it do about renames
in such a case?

> I found Apple's FSEvents to be a double-edged sword. The FSEvents
> daemon itself offers only a very coarse level of messaging:
> "something in that directory changed". It doesn't say what changed or
> which files, and it doesn't even report renames as such. Lsyncd, for
> example, can use rename events to issue a rename command to the
> transfer agent as well, which often saves a ton of bandwidth: the
> target moves the 10GB file too, instead of seeing one 10GB file vanish
> and another one appear.
> [...]

Excellent!  You just clearly described the incorrect reasoning that
leads people to keep failing to implement FSEvents for Linux :)

Now, first of all, we should separate antivirus needs (*intercepting*
accesses to files and accepting/rejecting them) from typical notification
needs (finding out, sometime later, whether a given file or files have
changed).  bup needs the latter.  FSEvents is great for the latter,
and sucks for antivirus.  Yes, you just need a separate API for
antivirus stuff; inotify isn't any good for it either.

The reason FSEvents is so beautiful is it's the simplest thing that
you can run, as root, on an entire filesystem, that gives you exactly
the information you need to produce efficient file change
notifications in any format you want.  It is *not* trying to be an
all-purpose super-detailed change notification API.  It is *only* the
bare minimum code to let you convert "let's crawl the entire
filesystem for changes" into "let's opendir()/readdir() only the
directories that contain changes."  That approximately converts an
O(n) process to an O(log n) process, and O(log n) is fast enough.

This is immensely valuable.  By being the simplest thing that can
possibly work, it can have a well-defined, stable API that rarely
needs to be extended or changed.  It will have few bugs.  It will be
easy for us to implement on Linux.  And it will work for the maximum
possible number of use cases.  As another advantage, by caring only
about directories rather than individual files, its required storage
space will be less than if it cared about every file.  Also, storing only
directory names instead of filenames reduces impact in case of
security leaks (since you don't want filenames to be visible to people
who wouldn't have been able to see them in the first place).

A daemon as simple as FSEvents can start super early in the boot
process, so it misses as few events as possible.  (It "almost" can
avoid ever needing to re-crawl the filesystem, except it might have
missed an event right before the crash, so it has to recrawl after
boot.  But it can be recording and reporting *new* events even during
the re-crawl, so you just get somewhat delayed notification of any
last-minute pre-reboot events.)

Finally, one can imagine someday reimplementing FSEvents-granularity
recording right at the kernel level: all you really need (I think) is
a 'ctime' equivalent that records all the way up the tree, so e.g. if
/a/b/c/d changes, that means /a/b/c, /a/b, /a, and / are all marked
updated as well.  That makes it easy for an optimized userspace
crawler to skip over entire subtrees that are unchanged, exactly as
'bup save' does when reading the bupindex.
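The "ctime all the way up the tree" idea can be modeled in userspace to show why it lets a crawler skip unchanged subtrees. `DirtyTree` is a hypothetical toy. The directory tree is a plain dict standing in for opendir()/readdir(), and paths are assumed to be normalized absolute POSIX paths. Nothing here corresponds to an existing kernel feature.

```python
import posixpath
from typing import Dict, Iterator, List, Set


class DirtyTree:
    """Toy model of an 'ancestor ctime': a change marks every ancestor dirty."""

    def __init__(self) -> None:
        self.dirty: Set[str] = set()

    def mark_changed(self, path: str) -> None:
        # Expects a normalized absolute path without a trailing slash.
        p = path
        while True:
            self.dirty.add(p)
            if p == '/':
                break
            p = posixpath.dirname(p)

    def changed_dirs(self, children: Dict[str, List[str]]) -> Iterator[str]:
        """Walk the tree from '/', descending only into dirty directories.

        `children` maps each directory to its subdirectory names.
        """
        stack = ['/']
        while stack:
            d = stack.pop()
            if d not in self.dirty:
                continue  # nothing below here changed: skip the whole subtree
            yield d
            for name in children.get(d, []):
                stack.append(posixpath.join(d, name))
```

With `mark_changed('/a/b/c/d')`, the crawler visits /, /a, /a/b and /a/b/c, and it never opens any sibling subtree such as /x.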

...

Okay, but what can we do about the lack of file-level granularity and
lack of rename notification?  As you mentioned, those can be really
useful to improve efficiency for things like rsync.

Here's a hint: bup doesn't need that information.  If you pass just
the list of modified directories to bup index (and disable recursion
into subdirectories since you'd be providing subdir names explicitly
when needed), then bup already knows how to figure out which files in
those directories have *actually* changed, as well as how to handle
renames (via the deduplication built into bup save).
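Suppose bup index accepted a list of directories on stdin. That is the feature proposed earlier in this thread, not something bup offered at the time. Under that assumption, a daemon-side helper only needs to normalize and order the changed directories. `update_list` is hypothetical. The reverse byte order with a trailing '/' follows Avery's tentative description of bup-compatible order, and the NUL separator follows his -0 suggestion.

```python
import os
from typing import Iterable


def update_list(changed_dirs: Iterable[str]) -> bytes:
    """Build a NUL-separated list of changed directories for an indexer.

    Directory names get a trailing '/', duplicates are dropped, and the
    result is sorted in reverse byte order (assumed bup-compatible order).
    """
    names = {os.fsencode(d if d.endswith('/') else d + '/') for d in changed_dirs}
    return b''.join(name + b'\0' for name in sorted(names, reverse=True))
```

Encoding before sorting keeps the comparison on raw bytes, which is what the filesystem and an index keyed by byte strings see.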

In short, the answer is that each *user* of FSEvents needs to keep its
own "database" and update that database, however it wants, given the
information from FSEvents.  Does this seem like doubling your work?
It is, sort of, but only if you're the person who had to write *both*
FSEvents *and* your app database.  (If you're the one writing lsyncd
for Linux, I guess that's true though :))

But lsyncd has zero chance of ever becoming a default, system-level,
always-installed service on Linux systems.  FSEvents *does* have that
chance, because it's small, simple, auditable, and infinitely
reusable.

Every app needs to be able to run different queries on the database
anyway.  A file browser UI - one of the important users of an
FSEvents-type API - just needs to know when one of the (few)
directories it's showing the user changes.  A backup program needs to
track all the files and directories that have changed since the last
time it did an update.  lsyncd needs to do the same, but including
rename information.  A file system search index needs the list of
files, but only ones matching given filename extensions, within a
certain maximum size, and belonging to a particular user.  FSEvents
works for *all* of these without changes, even though it's minimal.

And best of all, if I have two different file search indexers that I
run at different times, they'll both work even if they're not always
running, and they'll *never* have to rescan the entire filesystem.
FSEvents will track events on behalf of *both* of them.  This avoids
the nasty situation in Linux where you can end up with multiple
inotify-using daemons, all tracking the same directories and doing
grunt work whenever an inotify is received, thus slowing down your
whole system.  (inotify-using services need to run at high priority so
the kernel's inotify queue doesn't fill up and start losing events;
FSEvents-using daemons can run at low priority and thus never slow
down your system when it's busy doing other stuff.)

Hopefully that's convincing :)

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index Axel Kittenberger 1/1/12 12:26 PM
> What if the system crashes and reboots?  What does it do about renames
> in such a case?

In that case it will retransmit the file. It can't know it was a
move, since it was not running at the time. So don't move 10GB files
around while the system boots :-)

> Excellent!  You just clearly described the incorrect reasoning that
> leads people to keep failing to implement FSEvents for Linux :)

The Linux kernel provides no API that would be suitable to implement
an FSEvents-like daemon. Apple has /dev/fsevents for this. And for
Lsyncd too, apart from being internal to Apple, it's much easier to
use than the tricks needed for inotify.

> Now, first of all, we should separate antivirus needs (*intercepting*
> accesses to files and accepting/rejecting them) from typical notification
> needs (finding out, sometime later, whether a given file or files have
> changed).  bup needs the latter.  FSEvents is great for the latter,
> and sucks for antivirus.  Yes, you just need a separate API for
> antivirus stuff; inotify isn't any good for it either.

fanotify has an option where you can say "I just want the info, right
away, and never want to intercept", so it would be possible to design
one notification system that fits both, if someone would look beyond
what he/she needs.

> The reason FSEvents is so beautiful is it's the simplest thing that
> you can run, as root, on an entire filesystem, that gives you exactly
> the information you need to produce efficient file change
> notifications in any format you want.  It is *not* trying to be an
> all-purpose super-detailed change notification API.  It is *only* the
> bare minimum code to let you convert "let's crawl the entire
> filesystem for changes" into "let's opendir()/readdir() only the
> directories that contain changes."  That approximately converts an
> O(n) process to an O(log n) process, and O(log n) is fast enough.

It's O(n) either way; the n is just smaller. n1 is the whole hard
disk, n2 is only the changed directories.

> Also, storing only
> directory names instead of filenames reduces impact in case of
> security leaks (since you don't want filenames to be visible to people
> who wouldn't have been able to see them in the first place).

The same could be said about directory names. But being a sysadmin
myself, I consider the content of files to be the users' privacy,
which is none of my business, not the filenames. It's like a letter:
the envelope is not secret. For example, the log files of most backup
solutions list all the file names changed each day.

> Here's a hint: bup doesn't need that information.  If you pass just
> the list of modified directories to bup index (and disable recursion
> into subdirectories since you'd be providing subdir names explicitly
> when needed), then bup already knows how to figure out which files in
> those directories have *actually* changed, as well as how to handle
> renames (via the deduplication built into bup save).

Speaking of every notification system's architect only looking after
what he/she needs :-) Wouldn't bup benefit from move information as
well? Lsyncd generally minimizes calls by being smart about per-file
events. It collapses them logically. So a file creation, change and
deletion, a trinity that often happens in practice with temp files,
results in no call to the transfer agent, since they cancel each
other out.
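The collapsing described above can be sketched as a per-path state table. These rules are a simplified assumption for illustration, not Lsyncd's actual logic, which also has to handle moves and directories. `collapse` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def collapse(events: List[Tuple[str, str]]) -> Dict[str, str]:
    """Reduce a queue of (kind, path) events to one pending action per path.

    kind is 'create', 'modify' or 'delete'. Hypothetical, simplified rules.
    """
    pending: Dict[str, str] = {}
    for kind, path in events:
        prev = pending.get(path)
        if kind == 'create':
            # delete-then-create on an existing file is just a modification
            pending[path] = 'modify' if prev == 'delete' else 'create'
        elif kind == 'modify':
            # a file created since the last sync is still a plain create
            pending[path] = 'create' if prev == 'create' else 'modify'
        elif kind == 'delete':
            if prev == 'create':
                del pending[path]  # never reached the target: nothing to do
            else:
                pending[path] = 'delete'
        else:
            raise ValueError('unknown event kind: %r' % kind)
    return pending
```

With these rules, the temp-file trinity `collapse([('create', '/tmp/x'), ('modify', '/tmp/x'), ('delete', '/tmp/x')])` returns an empty dict, so no transfer is needed.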

> But lsyncd has zero chance of ever becoming a default, system-level,
> always-installed service on Linux systems.  FSEvents *does* have that
> chance, because it's small, simple, auditable, and infinitely
> reusable.

Again, Linux would need another notification API; none of inotify,
fanotify, dnotify, or kqueue suffices to make an efficient FSEvents
daemon. So calculate the chance of the Linux kernel adding a fifth
event notification API. You would need something like /dev/fsevents
on OSX, which reports *everything* that's happening, through one file
descriptor, without any additional marks.

> Every app needs to be able to run different queries on the database
> anyway.  A file browser UI - one of the important users of an
> FSEvents-type API - just needs to know when one of the (few)
> directories it's showing the user changes.  A backup program needs to
> track all the files and directories that have changed since the last
> time it did an update.  lsyncd needs to do the same, but including
> rename information.  A file system search index needs the list of
> files, but only ones matching given filename extensions, within a
> certain maximum size, and belonging to a particular user.  FSEvents
> works for *all* of these without changes, even though it's minimal.

Well, Apple's Spotlight search engine uses /dev/fsevents directly
instead of FSEvents. They know why: it's more efficient to know the
files rather than only the directory.

> And best of all, if I have two different file search indexers that I
> run at different times, they'll both work even if they're not always
> running, and they'll *never* have to rescan the entire filesystem.
> FSEvents will track events on behalf of *both* of them.  This avoids
> the nasty situation in Linux where you can end up with multiple
> inotify-using daemons, all tracking the same directories and doing
> grunt work whenever an inotify is received, thus slowing down your
> whole system.  (inotify-using services need to run at high priority so
> the kernel's inotify queue doesn't fill up and start losing events;
> FSEvents-using daemons can run at low priority and thus never slow
> down your system when it's busy doing other stuff.)

I've never had a reason to increase the priority of Lsyncd, or heard
of anybody having to. Lsyncd is fairly fast at emptying the event
queue, which is quite large as well. If it does overflow, Lsyncd will
do a recursive sync over the whole tree again, since it doesn't know
what it missed, so the target is on par again.
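The recovery strategy Axel describes can be sketched as follows: a full resync at startup, and another whenever the event queue overflows. This is a simplification under stated assumptions. `OVERFLOW` is a hypothetical marker standing in for inotify's IN_Q_OVERFLOW event, `process` is a hypothetical name, and events are batched by count rather than by Lsyncd's time delay.

```python
from typing import Callable, Iterable, Optional, Set

# Hypothetical marker meaning "the kernel queue overflowed; events were lost".
OVERFLOW = None


def process(events: Iterable[Optional[str]],
            sync_paths: Callable[[Set[str]], None],
            full_resync: Callable[[], None],
            batch_size: int = 1000) -> None:
    """Feed change events to a sync agent, resyncing everything when in doubt."""
    full_resync()  # at startup we can't know what changed while we were down
    pending: Set[str] = set()
    for event in events:
        if event is OVERFLOW:
            pending.clear()  # the full resync covers these paths anyway
            full_resync()
            continue
        pending.add(event)
        if len(pending) >= batch_size:
            sync_paths(pending)
            pending = set()
    if pending:
        sync_paths(pending)
```

The point of the design is that a lost event never leaves the target silently out of date. It only costs one expensive recursive pass.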

> Hopefully that's convincing :)

Convincing of what? I just try to make the best of the tools and
APIs that are there. They are all unsatisfactory in their own ways,
and an FSEvents-like daemon for GNU/Linux would require a sizable
change to the Linux kernel: either fixing fanotify to report move
events, or creating the one notification system to rule them all. If
you can push that through, more power to you.

If one of you decides to develop an Lsyncd config file that uses bup
instead of its built-in rsync defaults, I'll gladly add it to the
project. I suppose your calling options are going to be easier than
rsync's, which can be a burden in itself to get right. (The other end
of just making the best of the tools that are already there :-)

Re: Using lsyncd for inotify-based bup index apenwarr 1/1/12 5:16 PM
On Sun, Jan 1, 2012 at 3:26 PM, Axel Kittenberger <axk...@gmail.com> wrote:
>> What if the system crashes and reboots?  What does it do about renames
>> in such a case?
>
> In that case it will retransmit the file. It can't know it was a
> move, since it was not running at the time. So don't move 10GB files
> around while the system boots :-)

I suppose that's better than trying to do two ssh-rename operations
where the second one fails, then getting totally out of sync.  I
suppose the advantage of rsync is no matter what kind of mess you
make, it can manage to clean it up :)  But this is still kind of a
hole in the design, which FSEvents supposedly is able to avoid.

>> Excellent!  You just clearly described the incorrect reasoning that
>> leads people to keep failing to implement FSEvents for Linux :)
>
> The Linux kernel provides no API that would be suitable to implement
> an FSEvents-like daemon. Apple has /dev/fsevents for this. And for
> Lsyncd too, apart from being internal to Apple, it's much easier to
> use than the tricks needed for inotify.

Not sure what you mean here - fanotify seems pretty complete.  You can
even do it with inotify, but it involves scanning the whole filesystem
at startup so you can tag every single subdir, which is pretty gross.
People do it though.

>> Now, first of all, we should separate antivirus needs (*intercepting*
>> accesses to files and accepting/rejecting them) from typical notification
>> needs (finding out, sometime later, whether a given file or files have
>> changed).  bup needs the latter.  FSEvents is great for the latter,
>> and sucks for antivirus.  Yes, you just need a separate API for
>> antivirus stuff; inotify isn't any good for it either.
>
> fanotify has an option where you can say "I just want the info, right
> away, and never want to intercept", so it would be possible to design
> one notification system that fits both, if someone would look beyond
> what he/she needs.

Yeah, that's true.  I had to read up on fanotify just now, but it
looks like a pretty good kernel-level notification system.  In
particular, it has a "global mode" that just lets you listen on *all*
files without tagging each directory individually first.

>> That approximately converts an
>> O(n) process to an O(log n) process, and O(log n) is fast enough.
>
> It's O(n) either way; the n is just smaller. n1 is the whole hard
> disk, n2 is only the changed directories.

Depends what you're counting.  If you're counting opendir() for *all*
directories vs. opendir() for each directory in the tree, you're
reading n directories or log n directories.  But yes, it's n files vs.
"much smaller n" files, where the two values of n are coincidentally
in the same ratio as the n vs. log n directories :)
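Avery's n vs. log n comparison can be made concrete under an assumption neither message states: a balanced directory tree with n directories and uniform fan-out b. Its depth is about log_b n. A changed file dirties only the directories on its path from the root, so with k changed files the crawler opens at most:

```latex
d \approx \log_b n, \qquad
\#\{\text{directories opened}\} \;\le\; k\,(d + 1) \;=\; O(k \log n)
```

This beats the full crawl's n directories only while k log n stays well below n. When most directories change, both approaches degrade to O(n), which is Axel's point.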

>> Here's a hint: bup doesn't need that information.  If you pass just
>> the list of modified directories to bup index (and disable recursion
>> into subdirectories since you'd be providing subdir names explicitly
>> when needed), then bup already knows how to figure out which files in
>> those directories have *actually* changed, as well as how to handle
>> renames (via the deduplication built into bup save).
>
> Speaking of every notification system's architect only looking after
> what he/she needs :-) Wouldn't bup benefit from move information as
> well?

Well, my point is not that we should scale the daemon down to the
minimal set of requirements needed by bup.  Rather, my point is that
every app needs to maintain its own database *anyway*, so there's no
advantage to having the core system level daemon do something
complicated when you can just figure out the complicated stuff given
the simple stuff.

It's easy to calculate rename information from your own database if
you're tracking a list of directories and filenames with their inode
numbers.  Just look through your before-and-after snapshots of the
directories that have changed, and find newly-existing files in the
new directory that have the same inode numbers as files in the old
directory -> that's a rename or a hardlink (depending if the old file
still exists or not).
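Avery's inode-matching idea can be sketched for a single directory with the standard library (`os.scandir` and `DirEntry.inode()`). `snapshot` and `detect_links` are hypothetical names. Extending the sketch across directories would mean keying snapshots by full path instead of entry name.

```python
import os
from typing import Dict, List, Tuple


def snapshot(directory: str) -> Dict[str, int]:
    """Map each entry name in `directory` to its inode number."""
    with os.scandir(directory) as it:
        return {entry.name: entry.inode() for entry in it}


def detect_links(old: Dict[str, int], new: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """Compare two snapshots; return (kind, old_name, new_name) tuples.

    kind is 'rename' if old_name is gone from the new snapshot, and
    'hardlink' if old_name still points at the same inode.
    """
    names_by_inode: Dict[int, List[str]] = {}
    for name, ino in old.items():
        names_by_inode.setdefault(ino, []).append(name)
    found = []
    for name, ino in sorted(new.items()):
        if old.get(name) == ino:
            continue  # same name, same inode: unchanged entry
        for old_name in names_by_inode.get(ino, []):
            kind = 'hardlink' if new.get(old_name) == ino else 'rename'
            found.append((kind, old_name, name))
    return found
```

One caveat the sketch ignores is that a filesystem may reuse a freed inode number. In that case, a deleted file followed by an unrelated new file could look like a rename, and comparing size or mtime as well would reduce such false matches.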

It's a good idea to use this method anyway, for the same reason that
git doesn't store file renames (it just recalculates them at 'git log'
or 'git diff' time).  It's just way too easy to lose track of renames.

In particular, doing it this way would allow lsyncd to *not* miss
renames that happened while it wasn't running.

>> But lsyncd has zero chance of ever becoming a default, system-level,
>> always-installed service on Linux systems.  FSEvents *does* have that
>> chance, because it's small, simple, auditable, and infinitely
>> reusable.
>
> Again, Linux would need another notification API; none of inotify,
> fanotify, dnotify, or kqueue suffices to make an efficient FSEvents
> daemon. So calculate the chance of the Linux kernel adding a fifth
> event notification API. You would need something like /dev/fsevents
> on OSX, which reports *everything* that's happening, through one file
> descriptor, without any additional marks.

Again here, I'm confused; I don't see what makes the existing
kernel-level APIs insufficient.  We don't need a new kernel API, we
just need a userspace daemon that makes the existing kernel events
persist on disk so everybody that wants a list of updated files
doesn't need to write yet another daemon.

> Well, Apple's Spotlight search engine uses /dev/fsevents directly
> instead of FSEvents. They know why: it's more efficient to know the
> files rather than only the directory.

I'm not 100% sure why they do it that way, one way or the other.  I
half suspect it might just be because they already had the code
written for MacOS 10.4 (which didn't have fseventsd), and there was no
point rewriting it.

For a service that's always running anyhow, it probably is slightly
more efficient to skip the extra layer.

> I've never had a reason to increase the priority of Lsyncd, or heard
> of anybody having to. Lsyncd is fairly fast at emptying the event
> queue, which is quite large as well. If it does overflow, Lsyncd will
> do a recursive sync over the whole tree again, since it doesn't know
> what it missed, so the target is on par again.

I'm not saying you have to *raise* the priority, but since lsyncd is
probably one of the lowest priority things you can be doing, it should
really be nice'd and ionice'd.  If you did that, you'd run a much
greater risk of exhausting the queue.

If I'm trying to open a page in my web browser and your always-running
file system syncer program is grinding my disk and slowing me down,
then the priorities are set wrong.

> If one of you decides to develop an Lsyncd config file that uses bup
> instead of its built-in rsync defaults, I'll gladly add it to the
> project. I suppose your calling options are going to be easier than
> rsync's, which can be a burden in itself to get right. (The other end
> of just making the best of the tools that are already there :-)

Regardless of the rest of the discussion, there's nothing wrong with
adding bup support to lsyncd, and I'd be happy to see it.

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index apenwarr 1/1/12 6:58 PM
On Sun, Jan 1, 2012 at 8:16 PM, Avery Pennarun <apen...@gmail.com> wrote:
> Again here, I'm confused; I don't see what makes the existing
> kernel-level APIs insufficient.  We don't need a new kernel API, we
> just need a userspace daemon that makes the existing kernel events
> persist on disk so everybody that wants a list of updated files
> doesn't need to write yet another daemon.

Ah, I did more research and now I see: apparently fanotify really
*doesn't* send you  link/unlink notifications at all, which is a
rather shocking oversight.  So you can subscribe to all inodes, but
not get any message at all when that inode is linked differently into
a directory, or you can use inotify and get the notifications, but
then you have to set up a watch on every single directory.

So okay, you're right, that's a pretty big disaster.

Still, my fix would be to slightly extend inotify (to allow a 'global'
mode like fanotify does) or fanotify (to notify about link/unlink
events).  Since both APIs are apparently based on fsnotify, this
really ought to be easy; I'm surprised it hasn't already been done.
Sigh.

Anyway, inotify can do it; you just have to jump through a lot of
hoops and waste some kernel memory :(

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index Axel Kittenberger 1/1/12 11:38 PM
> I suppose that's better than trying to do two ssh-rename operations
> where the second one fails, then getting totally out of sync.  I
> suppose the advantage of rsync is no matter what kind of mess you
> make, it can manage to clean it up :)  But this is still kind of a
> hole in the design, which FSEvents supposedly is able to avoid.

I don't see the hole. If you consider a system crash, FSEvents can
also miss an event: a file changes, the kernel reports it to the
FSEvents daemon, and the system crashes before the daemon writes it
to the FSEvents log -> event lost. I don't see any difference. I'm
not aware that Apple cared about event safety down to that level; a
crash-safe solution would have to work together with the journaling
filesystem at a very low level.

I suppose with git you can also make a full recursion to clean up any mess.

> Not sure what you mean here - fanotify seems pretty complete.  You can
> even do it with inotify, but it involves scanning the whole filesystem
> at startup so you can tag every single subdir, which is pretty gross.
> People do it though.

To repeat myself, fanotify does not report move/rename events, so
it is not complete. Move a file and the kernel tells you nothing
about the change. I jumped on fanotify when it was released, since
using it instead of inotify would have been a great boon to Lsyncd,
but it was a real letdown when I realized it ignores moved
files.

"People do it with inotify", whom are you telling this? :-) Lsyncd is
all about that. Again, the issue with inotify is, you need one watch
per directory, where a watch uses 540 Bytes of kernel memory on 32 bit
systems and ~1KB Bytes of kernel memory on 64bit systems. Add to it
the memory needs in the daemon to remember every inotify handle id and
which directory name it corresponds to. For Lsyncd watch directoy tree
that they need to be synchronized. A FSEvents-like daemon would need
to make watches for every directory thats is there. No matter needed
or not. I get repeatly messages by people complaining about memory
usage, because they have a million directories, I just answer, there
is not much to do about it, unless the kernel is changed, and this is
twice true if you want to make a FSEvents-like daemon on GNU/Linux.

> Yeah, that's true.  I had to read up on fanotify just now, but it
> looks like a pretty good kernel-level notification system.  In
> particular, it has a "global mode" that just lets you listen on *all*
> files without tagging each directory individually first.

That's what I wrote. It's actually per mounted filesystem, not
everything like /dev/fsevents, but that wouldn't be much of a problem.
But again, it will not report mv events! Obviously the malware shields
do not care about move events, and anti-malware is all that fanotify's
designers cared about. Also, daemons like FSEvents or Lsyncd do not care
about file descriptors, so it's a hurdle if the notification system
delivers those instead of filenames: when the daemon is slowed down,
it keeps all kinds of fds open, hurting filesystem performance, since
a removed file cannot actually be freed until the watching daemon drops
its handle. /dev/fsevents is what Apple designed for Spotlight and
FSEvents, and a kernel interface similar to that is what is needed to
build something alike.

> Depends what you're counting.  If you're counting opendir() for *all*
> directories vs. opendir() for each directory in the tree, you're
> reading n directories or log n directories.  But yes, it's n files vs.
> "much smaller n" files, where the two values of n are coincidentally
> in the same ratio as the n vs. log n directories :)

I'm counting by the definition of O() and the definition of log. I don't
see why it should become "logarithmic"; you might read up on theoretical
computer science before throwing too many O(log n) arguments around :-).
Both run linearly through the data set once: in one case the entire
hard disk, in the other the list of changed directories. So both are
O(n); one n being "much smaller" does not make it a logarithmic
algorithm.

> It's easy to calculate rename information from your own database if
> you're tracking a list of directories and filenames with their inode
> numbers.  Just look through your before-and-after snapshots of the
> directories that have changed, and find newly-existing files in the
> new directory that have the same inode numbers as files in the old
> directory -> that's a rename or a hardlink (depending if the old file
> still exists or not).

In that case you would have to remember all the inode numbers for
filenames -> again huge memory or database needs. Also, inode numbers
might be reused, so you'd have to remember file sizes, timestamps and
hashes as well -> more memory and IO needs.
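
The inode-matching idea quoted above, together with the inode-reuse objection, can be sketched in Python as a diff of two snapshots of the changed directories. Each path is keyed by a (device, inode, size, mtime) fingerprint rather than the inode alone, so a reused inode number with a different size or mtime is not mistaken for a rename. The function names are hypothetical and this is a simplified illustration, not how bup's index actually stores entries.

```python
import os


def snapshot(dirpaths):
    """Map every entry in the given directories to a fingerprint.

    The fingerprint is (device, inode, size, mtime_ns): a rename or a new
    hardlink keeps all four, while a reused inode number usually differs
    in size or mtime.
    """
    snap = {}
    for d in dirpaths:
        with os.scandir(d) as it:
            for entry in it:
                st = entry.stat(follow_symlinks=False)
                snap[entry.path] = (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)
    return snap


def diff_snapshots(before, after):
    """Classify what changed between two snapshots."""
    old_by_fp = {}
    for path, fp in before.items():
        old_by_fp.setdefault(fp, []).append(path)

    renamed, linked, created = [], [], []
    for path in sorted(set(after) - set(before)):
        olds = old_by_fp.get(after[path], [])
        still_there = [o for o in olds if o in after]
        if still_there:
            linked.append((still_there[0], path))   # old name survives: hardlink
        elif olds:
            renamed.append((olds[0], path))         # old name gone: rename
        else:
            created.append(path)                    # no matching fingerprint
    moved_away = {old for old, _ in renamed}
    deleted = sorted(p for p in set(before) - set(after) if p not in moved_away)
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return {"renamed": renamed, "linked": linked, "created": created,
            "deleted": deleted, "modified": modified}
```

Passing several directories to `snapshot` lets a move between two changed directories show up as a rename rather than a delete plus a create.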

> Again here, I'm confused; I don't see what makes the existing
> kernel-level APIs insufficient.  We don't need a new kernel API, we
> just need a userspace daemon that makes the existing kernel events
> persist on disk so everybody that wants a list of updated files
> doesn't need to write yet another daemon.

That's what I wrote: on GNU/Linux, inotify is not suitable because it
uses a lot of memory to watch a whole hard disk, and fanotify is not
suitable because you miss move/rename events. And no, they aren't just
reported as two events (one file deleted and one created); they are
not reported at all. There isn't anything like /dev/fsevents on
Linux, which is what an FSEvents-like daemon service needs.

> I'm not 100% sure why they do it that way, one way or the other.  I
> half suspect it might just be because they already had the code
> written for MacOS 10.4 (which didn't have fseventsd), and there was no
> point rewriting it.

No. They explicitly designed /dev/fsevents for Spotlight; FSEvents was
just an extra to offer a watered-down interface to the public.

> I'm not saying you have to *raise* the priority, but since lsyncd is
> probably one of the lowest priority things you can be doing, it should
> really be nice'd and ionice'd.  If you did that, you'd run a much
> greater risk of exhausting the queue.

You are going back and forth in your arguments. Anyway, CPU load and
IO load are two vastly different domains where one very seldom
impacts the other, so if anything you should use ionice, not nice. If
you care, you can run the rsync commands at a higher ionice level
(lower I/O priority). So the queue is emptied fast, but the IO is nice.
Anyway, I have never heard of anyone complaining about excessive
overflow events because the queue was not emptied fast enough.

> Regardless of the rest of the discussion, there's nothing wrong with
> adding bup support to lsyncd, and I'd be happy to see it.

Great. You'll find it pretty adaptable through the Lua
configuration/script files.

What I keep saying is that getting an FSEvents-like daemon on Linux
might be nice, but it won't happen unless the kernel offers a suitable
notification system for it.

Re: Using lsyncd for inotify-based bup index apenwarr 1/2/12 9:28 AM
On Mon, Jan 2, 2012 at 2:38 AM, Axel Kittenberger <axk...@gmail.com> wrote:
>> I suppose that's better than trying to do two ssh-rename operations
>> where the second one fails, then getting totally out of sync.  I
>> suppose the advantage of rsync is no matter what kind of mess you
>> make, it can manage to clean it up :)  But this is still kind of a
>> hole in the design, which FSEvents supposedly is able to avoid.
>
> I don't see the hole. Starting from a system crash, FSEvents can also
> miss an event: a file changes, the kernel reports it to the FSEvents
> daemon, and the system crashes before the daemon writes it to the
> FSEvents log -> event lost. I don't see any difference. I'm not aware
> that Apple cared about event safety down to that level; a crash-safe
> solution would have to work together with the journaling filesystem at
> a very low level.

Correct, I misrepresented it a bit here - all FSEvents can tell you in
a crash is that you missed events, it can't tell you which ones.  You
have to rescan the filesystem.  In theory, however, FSEvents could be
one day transparently extended (i.e. by cooperating with a journaling
filesystem) to *not* miss events on a crash.  And then anybody using
FSEvents magically benefits.  That's the sign of a good design.

> To repeat myself, fanotify does not report move/rename events, so
> it is not complete.

Yes, I did more reading about this yesterday.  Now that I think of it,
I remember seeing that at the time and finding it hard to believe that it
was true.  And now, more than two years later, it's still true.  Sigh.

> "People do it with inotify", whom are you telling this? :-) Lsyncd is
> all about that. Again, the issue with inotify is, you need one watch
> per directory, where a watch uses 540 Bytes of kernel memory on 32 bit
> systems and ~1KB Bytes of kernel memory on 64bit systems. Add to it
> the memory needs in the daemon to remember every inotify handle id and
> which directory name it corresponds to.

Yes, this sucks, but then again: if you're going to do it, wouldn't it
be nice to only do it in a single global daemon that can be shared by
everyone? :)

You can resolve the userspace memory wastage problem, at least, by
simply storing the one-time mapping in the event logfile and leaving
it to clients to look it up based on the directories they *actually*
care about.  (As a bonus, this makes your program look like it's
taking less memory so you'd get fewer complaints :))
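
Avery's suggestion of writing the one-time watch-to-directory mapping into the event log, and letting each client resolve only the directories it cares about, might look like the Python sketch below. The JSON-lines record layout and the function names are hypothetical; nothing in this thread says lsyncd or bup write such a log.

```python
import json
import os


def log_watch(log, wd, path):
    """Write the wd -> directory mapping once, when the watch is created."""
    log.write(json.dumps({"type": "watch", "wd": wd, "path": path}) + "\n")


def log_event(log, wd, name, mask):
    """Write an event record carrying only the watch descriptor, not the path."""
    log.write(json.dumps({"type": "event", "wd": wd, "name": name, "mask": mask}) + "\n")


def changed_paths(lines, prefix):
    """Client side: replay the log, rebuilding the wd map as it goes, and
    keep only events in directories at or below prefix."""
    prefix = prefix.rstrip("/")
    wd_to_path, changed = {}, []
    for line in lines:
        rec = json.loads(line)
        if rec["type"] == "watch":
            wd_to_path[rec["wd"]] = rec["path"]   # a later record overrides a reused wd
            continue
        directory = wd_to_path.get(rec["wd"])
        if directory is None:
            continue
        if not prefix or directory == prefix or directory.startswith(prefix + "/"):
            changed.append(os.path.join(directory, rec["name"]) if rec["name"] else directory)
    return changed
```

A client that only backs up /home never needs to hold the mapping for the rest of the disk in memory.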

The kernel memory usage is probably not actually that high just for a
watch - I expect that's probably the memory needed to keep the inode
in memory at all.  It's still a ridiculous waste, but at least you
don't have to load the inode next time someone goes to stat() that
directory.

On my system, I just checked and I have 39000 directories.  My MacOS
system has 300000 directories.  Given a 64-bit system, that's 39 megs
and 300 megs, respectively, based on your calculations.  Disgusting,
but bearable on a modern computer.
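
The arithmetic above can be reproduced with a short Python sketch that counts directories under a root and multiplies by a per-watch cost. The 540-byte and ~1 KB figures are Axel's estimates from this thread, not measured values; ~1 KB is rounded to 1000 bytes to match Avery's numbers, and the function names are hypothetical.

```python
import os

# Per-watch kernel cost as estimated earlier in this thread (not measured here).
BYTES_PER_WATCH = {"32bit": 540, "64bit": 1000}


def count_dirs(root):
    """Count directories below root (root included), without following symlinks."""
    return sum(1 for _ in os.walk(root))


def watch_memory_mb(n_dirs, arch="64bit"):
    """Estimated kernel memory, in MB (10**6 bytes), for one watch per directory."""
    return n_dirs * BYTES_PER_WATCH[arch] / 1e6


if __name__ == "__main__":
    for n in (39000, 300000):
        print(f"{n} dirs: ~{watch_memory_mb(n):.0f} MB on 64-bit, "
              f"~{watch_memory_mb(n, '32bit'):.0f} MB on 32-bit")
```

For the 64-bit case this gives ~39 MB and ~300 MB, the same figures as above; `count_dirs("/")` gives the n for your own machine.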

I wonder how hard it could possibly be to add a "global" mode to
inotify in the kernel.  What I read is that fanotify and inotify
nowadays just use the same fsnotify infrastructure.  If so, this can't
be too hard to arrange, and it would avoid this whole mess.  Someone
has to solve it eventually :)  Any kernel hackers around here?  (I've
done kernel hacking before, but I have way too many projects
already...)

>> Depends what you're counting.  If you're counting opendir() for *all*
>> directories vs. opendir() for each directory in the tree, you're
>> reading n directories or log n directories.  But yes, it's n files vs.
>> "much smaller n" files, where the two values of n are coincidentally
>> in the same ratio as the n vs. log n directories :)
>
> I'm counting by the definition of O() and the definition of log. I don't
> see why it should become "logarithmic"; you might read up on theoretical
> computer science before throwing too many O(log n) arguments around :-).

If you consider the time to scan the contents of a particular
directory as constant - which is fairly reasonable, since almost all
directories have roughly the same smallish number of files - then all
that matters is n = the number of directories you have to scan.  If
you get a notification so you only have to scan one directory (and its
parents up the tree, depending how your FSEvents-clone is implemented)
then that's about O(log n) directory scans.

There's a constant factor k, related to the amount of work it takes to
scan any given directory.  k is potentially much larger if you have to
scan a directory vs. an individual file, but you leave constants out
of O() notation.

>> It's easy to calculate rename information from your own database if
>> you're tracking a list of directories and filenames with their inode
>> numbers.  Just look through your before-and-after snapshots of the
>> directories that have changed, and find newly-existing files in the
>> new directory that have the same inode numbers as files in the old
>> directory -> that's a rename or a hardlink (depending if the old file
>> still exists or not).
>
> In that case you would have to remember all the inode numbers for
> filenames -> again huge memory or database needs. Also, inode numbers
> might be reused, so you'd have to remember file sizes, timestamps and
> hashes as well -> more memory and IO needs.

Yes; bup does this.  So should you :)

If you don't have such a database, then when you inevitably miss
events - as you can with inotify, /dev/fsevents, or as you pointed
out, FSEvents - your program won't work correctly.  lsyncd is
basically protecting itself from this by degrading into "rsync will
just fix my mistakes" mode when it misses events.  That works for
rsync, but nothing else.  (I guess it would work for bup if you always
do a full 'bup index' scan every time lsyncd starts.)

>> I'm not 100% sure why they do it that way, one way or the other.  I
>> half suspect it might just be because they already had the code
>> written for MacOS 10.4 (which didn't have fseventsd), and there was no
>> point rewriting it.
>
> No. They explicitly designed /dev/fsevents for Spotlight; FSEvents was
> just an extra to offer a watered-down interface to the public.

That's what I said.  In MacOS 10.4, there was *only* Spotlight and
/dev/fsevents, so of course /dev/fsevents was designed for Spotlight.
Then they realized it was more widely useful but they didn't want a
zillion programs waiting on /dev/fsevents, so they added the FSEvents
daemon, and e.g. Time Machine uses it.

But Spotlight was already written, so it would have been useless
engineering effort to rewrite it to use FSEvents.  That's one possible
reason it continues to use /dev/fsevents.  We don't know for sure what
they would have done if they were writing Spotlight from scratch
today, now that FSEvents exists.

>> I'm not saying you have to *raise* the priority, but since lsyncd is
>> probably one of the lowest priority things you can be doing, it should
>> really be nice'd and ionice'd.  If you did that, you'd run a much
>> greater risk of exhausting the queue.
>
> You are going back and forth in your arguments. Anyway, CPU load and
> IO load are two vastly different domains where one very seldom
> impacts the other, so if anything you should use ionice, not nice. If
> you care, you can run the rsync commands at a higher ionice level
> (lower I/O priority). So the queue is emptied fast, but the IO is nice.
> Anyway, I have never heard of anyone complaining about excessive
> overflow events because the queue was not emptied fast enough.

No, I've been consistent about this one :)

Anybody listening on /dev/fsevents on MacOS needs to respond *fast* so
they don't exhaust the /dev/fsevents queue.  How fast is fast?
Probably not *super* fast, but if something in the system is grinding
the disk really hard and you're running 1000 processes at once, it
still matters.

For all I know, inotify doesn't have a maximum queue length at all,
and just wastes kernel memory and this "isn't a problem" there; that
seems unlikely.  More likely is that you just haven't gotten any bug
reports about it in lsyncd because when you miss events, rsync just
fixes your mistakes for you.

The "correct" way to implement a system like this, priority wise, is
to have a high-priority process that receives the events into user
space, and a low-priority process that does the "real work."  FSEvents
makes this easy because fseventsd can be high priority, and everything
else can be low priority.  lsyncd right now should be both high
priority (so it doesn't miss events) and low priority (so boring
backups don't interfere with interactive performance).
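
The split described above, a high-priority receiver that drains events quickly and a low-priority worker that does the real work, can be sketched in Python. The receiver only coalesces changed paths into a set; the worker command is wrapped in `nice -n 19` and, optionally, `ionice -c 3` (the idle I/O class from util-linux). Running `bup index` on a list of paths follows the proposal at the start of this thread; the exact command line and the class and function names are assumptions.

```python
import subprocess


class Batcher:
    """Receiver side: coalesce changed paths cheaply so events are drained fast."""

    def __init__(self):
        self.pending = set()

    def add(self, path):
        self.pending.add(path)

    def take(self):
        """Hand over everything collected so far and start a new batch."""
        batch, self.pending = sorted(self.pending), set()
        return batch


def low_priority(cmd, use_ionice=True):
    """Wrap cmd with `nice -n 19` and, optionally, `ionice -c 3` (idle I/O class)."""
    wrapped = ["nice", "-n", "19"] + list(cmd)
    if use_ionice:
        wrapped = ["ionice", "-c", "3"] + wrapped
    return wrapped


def run_batch(batch, runner=subprocess.run):
    """Worker side: index one batch in a low-priority child process.

    `bup index <paths>` is an assumed invocation, not a tested one.
    """
    if not batch:
        return None
    return runner(low_priority(["bup", "index"] + batch), check=False)
```

Only the child process is deprioritized; the process calling `Batcher.add` keeps its normal priority, so a busy disk slows the indexing but not the draining of the event queue.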

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index Axel Kittenberger 1/2/12 12:48 PM
> Yes, I did more reading about this yesterday.  Now that I think of it,
> I remember seeing that at the time and finding it hard to believe that it
> was true.  And now, more than two years later, it's still true.  Sigh.

I'm not making this up. See e.g. my message to the kernel list, which
never got answered:
http://help.lockergnome.com/linux/Fanotify-mv-rename--ftopict529275.html

> On my system, I just checked and I have 39000 directories.  My MacOS
> system has 300000 directories.  Given a 64-bit system, that's 39 megs
> and 300 megs, respectively, based on your calculations.  Disgusting,
> but bearable on a modern computer.

You will never get any distro to include this by default and waste
approximately 300 MB of memory like that. And some servers have many,
many more directories. Not going to happen.

> I wonder how hard it could possibly be to add a "global" mode to
> inotify in the kernel.  What I read is that fanotify and inotify
> nowadays just use the same fsnotify infrastructure.  If so, this can't
> be too hard to arrange, and it would avoid this whole mess.  Someone
> has to solve it eventually :)  Any kernel hackers around here?  (I've
> done kernel hacking before, but I have way too many projects
> already...)

I looked into the code for a few days; it's not a simple task, but I
suppose it could be done, since inotify already knows about these
events. The problem they have is properly intercepting move events with
the given structure, which inotify doesn't have to do, but making it
fit for fanotify should be possible.

> If you consider the time to scan the contents of a particular
> directory as constant - which is fairly reasonable, since almost all
> directories have roughly the same smallish number of files - then all
> that matters is n = the number of directories you have to scan.  If
> you get a notification so you only have to scan one directory (and its
> parents up the tree, depending how your FSEvents-clone is implemented)
> then that's about O(log n) directory scans.

You don't get any logarithm anywhere! There is no logarithm. No.
There is no hashtable, there is no tree, there is no sorting; it's
linear scanning and thus at best O(n), where n is in one case the
number of directories that exist, and in the other the number of
directories that have changed. But no algorithm says that if you have
1000 directories there will be 10 changes, and with 10000 there will be
20 changes, which would be a logarithmic relationship. It's just a
smaller n.

> There's a constant factor k, related to the amount of work it takes to
> scan any given directory.  k is potentially much larger if you have to
> scan a directory vs. an individual file, but you leave constants out
> of O() notation.

The number of directories is n.

> Yes; bup does this.  So should you :)
>
> If you don't have such a database, then when you inevitably miss
> events - as you can with inotify, /dev/fsevents, or as you pointed
> out, FSEvents - your program won't work correctly.  lsyncd is
> basically protecting itself from this by degrading into "rsync will
> just fix my mistakes" mode when it misses events.  That works for
> rsync, but nothing else.  (I guess it would work for bup if you always
> do a full 'bup index' scan every time lsyncd starts.)

No, not just rsync. Any synchronization engine must be able to fully
update the target to the current state of the source, otherwise it's
not doing its job correctly. Also, since FSEvents might also miss events
in overflow conditions (or crashes), you would need a similar full
sweep anyway with anything that does this seriously.

> Anybody listening on /dev/fsevents on MacOS needs to respond *fast* so
> they don't exhaust the /dev/fsevents queue.  How fast is fast?
> Probably not *super* fast, but if something in the system is grinding
> the disk really hard and you're running 1000 processes at once, it
> still matters.

That's because MacOS has a pretty low limit on the queue, and yes,
it's dangerous. But any listener to it is supposed to handle an overflow
event; e.g. Spotlight rescans the hard disk if it happens. Since with
Lsyncd, for example, the transfer agent is a different process from the
one that empties the queue, you can give them different niceness if you
need to.

> For all I know, inotify doesn't have a maximum queue length at all,
> and just wastes kernel memory and this "isn't a problem" there; that
> seems unlikely.  More likely is that you just haven't gotten any bug
> reports about it in lsyncd because when you miss events, rsync just
> fixes your mistakes for you.

axel@gandalf:~$ cat /proc/sys/fs/inotify/max_queued_events
16384

There you go. It's limited to 16384. Like fsevents, inotify issues an
overflow event (IN_Q_OVERFLOW) if it has lost events. It's coded so that
nothing bad happens if events are lost.
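
To illustrate the overflow handling, here is a Python sketch that decodes raw `struct inotify_event` records (as returned by `read(2)` on an inotify fd) and decides between an incremental and a full index run. The struct layout and the `IN_Q_OVERFLOW` and `IN_IGNORED` bits come from `<sys/inotify.h>`; the planning function, its return values and the fall-back-to-full-rescan policy are a hypothetical model of the "full `bup index` scan when events were lost" approach discussed in this thread.

```python
import os
import struct

IN_Q_OVERFLOW = 0x00004000   # the kernel queue overflowed; events were dropped
IN_IGNORED = 0x00008000      # the watch was removed
EVENT_HEADER = struct.Struct("iIII")   # struct inotify_event: wd, mask, cookie, len


def read_max_queued_events(path="/proc/sys/fs/inotify/max_queued_events"):
    """Return the inotify queue limit, or None where the file does not exist."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def parse_events(buf):
    """Yield (wd, mask, cookie, name) for each event in a buffer read from an inotify fd."""
    offset = 0
    while offset + EVENT_HEADER.size <= len(buf):
        wd, mask, cookie, length = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        name = buf[offset:offset + length].rstrip(b"\0")   # name is NUL-padded
        offset += length
        yield wd, mask, cookie, os.fsdecode(name)


def plan_index_run(events, wd_to_path):
    """Return ("full", None) if events may have been lost, else ("partial", dirs)."""
    dirs = set()
    for wd, mask, _cookie, _name in events:
        if mask & IN_Q_OVERFLOW:
            return ("full", None)        # cannot know what changed: rescan everything
        if mask & IN_IGNORED:
            continue
        if wd not in wd_to_path:
            return ("full", None)        # unknown watch: be conservative
        dirs.add(wd_to_path[wd])
    return ("partial", sorted(dirs))
```

A "partial" plan passes only the listed directories to the indexer; a "full" plan means rescanning the whole tree, which is the safety net both sides of this discussion agree is needed.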

> The "correct" way to implement a system like this, priority wise, is
> to have a high-priority process that receives the events into user
> space, and a low-priority process that does the "real work."  FSEvents
> makes this easy because fseventsd can be high priority, and everything
> else can be low priority.  lsyncd right now should be both high
> priority (so it doesn't miss events) and low priority (so boring
> backups don't interfere with interactive performance).

As explained already in the email before (do you read these?), this
can easily be done with a little change to the config that calls the
transfer agents at a higher nice level.

I'll just drop it. Lsyncd was written to scratch a particular
itch I had at work. It works with what is there; if you can
fix event notifications on GNU/Linux, go for it. On OS X I found
FSEvents itself too coarse, and I consider Apple's line "if you
need finer messaging, use kqueue" a joke, when they could instead open
up their /dev/fsevents interface, regardless of whether you argue one
shouldn't need that level of granularity. Also, I simply will not build
a completely different mode of operation on OS X than on Linux, where,
limited to inotify anyway, I get the fine-grained events right away
at no extra cost.

Re: Using lsyncd for inotify-based bup index apenwarr 1/2/12 1:14 PM
On Mon, Jan 2, 2012 at 3:48 PM, Axel Kittenberger <axk...@gmail.com> wrote:
> Avery wrote:
>> I wonder how hard it could possibly be to add a "global" mode to
>> inotify in the kernel.  What I read is that fanotify and inotify
>> nowadays just use the same fsnotify infrastructure.  If so, this can't
>> be too hard to arrange, and it would avoid this whole mess.  Someone
>> has to solve it eventually :)  Any kernel hackers around here?  (I've
>> done kernel hacking before, but I have way too many projects
>> already...)
>
> I looked into the code for a few days; it's not a simple task, but I
> suppose it could be done, since inotify already knows about these
> events. The problem they have is properly intercepting move events with
> the given structure, which inotify doesn't have to do, but making it
> fit for fanotify should be possible.

I don't quite understand the above, and I honestly want to, so that I
can estimate the amount of work needed to fix the kernel.

inotify can already intercept move events, right?  So I'd like to
understand what major surgery would be needed to just make it
intercept *all* move events and report them to userspace, rather than
reporting only "watched" ones.

>> If you consider the time to scan the contents of a particular
>> directory as constant - which is fairly reasonable, since almost all
>> directories have roughly the same smallish number of files - then all
>> that matters is n = the number of directories you have to scan.  If
>> you get a notification so you only have to scan one directory (and its
>> parents up the tree, depending how your FSEvents-clone is implemented)
>> then that's about O(log n) directory scans.
>
> You don't get any logarithm anywhere! There is no logarithm. No.

Um, there is a tree: the filesystem is a tree.

O() notation is about average cases.  On my computer right now, the
root filesystem has 321000 files, and 39000 directories.  That's
roughly 10 files per directory, on average.  Assuming my filesystem is
arranged in a balanced tree (which is true - on average), the number
of directories I need to traverse to find a particular filename is
log(n).

If one file has changed and I want to find out which one by searching
the bupindex, which is arranged in the same tree structure as the
filesystem, I can find it in O(log n).

Without a notification system or index, I would have to scan through n
directories.
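
The O(log n) claim rests on a roughly balanced tree with a fixed fan-out. Under that assumption, which real filesystems only approximate, the Python sketch below computes how many levels a complete tree of n directories needs; that bounds the number of directories on any root-to-leaf path. The fan-out of 10 is taken from the rough "10 files per directory" figure above and treated here as directories per directory, which is itself an assumption; the function names are illustrative only.

```python
def levels_needed(n_dirs, fanout):
    """Levels in the shallowest complete tree (fanout children per directory)
    that holds at least n_dirs directories; this bounds the number of
    directories on any root-to-leaf path."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    levels, capacity, width = 0, 0, 1
    while capacity < n_dirs:
        capacity += width     # add one more full level of directories
        width *= fanout
        levels += 1
    return levels


def scan_cost(n_dirs, fanout):
    """Directories read by a full scan vs. by one root-to-leaf lookup."""
    return {"full_scan": n_dirs, "one_path": levels_needed(n_dirs, fanout)}
```

With 39000 directories and a fan-out of 10, one lookup touches at most 6 directories instead of 39000. The logarithm only describes locating one entry in a tree-shaped structure; scanning a list of k changed directories is still linear in k, which is the distinction Axel draws.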

Anyway, that's all pretty academic.  The important point we both agree
on is "in the average case, scanning only changed directories is way
the hell faster than scanning the entire disk."  And in my opinion,
it's plenty fast enough for backup purposes.  It's also sufficient for
detecting renames.

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index Simon Sapin 1/2/12 1:33 PM
On 02/01/2012 22:14, Avery Pennarun wrote:

> Anyway, that's all pretty academic.  The important point we both agree
> on is "in the average case, scanning only changed directories is way
> the hell faster than scanning the entire disk."  And in my opinion,
> it's plenty fast enough for backup purposes.  It's also sufficient for
> detecting renames.

You suggested earlier a new kind of "modification time" metadata that would
propagate to parent directories. bup already has an index and can
compare with previously seen timestamps. If it hasn't changed for a
directory, the whole tree hasn't changed and can be skipped. This is not
as fast as an ideal FSEvents (parent directories need to be scanned) but
I think it would be good enough for bup. (Filesystem trees are rarely
very deep.)

Maybe I'm missing something, but wouldn't this be easier to implement
and use than a notification system? Or is it too expensive to have to
seek+write for each ancestor on each write?
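
The propagated modification mark can be simulated in userspace to see how the pruning works. The Python sketch below keeps a per-directory generation counter in a dict (a stand-in for the hypothetical filesystem metadata, or for a side index kept by a daemon), bumps it for a directory and all its ancestors on each change, and walks only subtrees whose counter moved since the last backup. The class and its methods are hypothetical and assume absolute, normalized POSIX paths.

```python
import posixpath


class TreeIndex:
    """Hypothetical side index: each directory stores the generation of the
    most recent change anywhere below it."""

    def __init__(self):
        self.gen = {}        # directory -> last generation changed at or below it
        self.children = {}   # directory -> set of child directories
        self.clock = 0

    def add_dir(self, path):
        """Register path and all of its ancestors."""
        parent = posixpath.dirname(path)
        if parent != path:
            self.add_dir(parent)
            self.children[parent].add(path)
        self.children.setdefault(path, set())
        self.gen.setdefault(path, 0)

    def mark_changed(self, path):
        """Record a change in directory path and propagate it to every ancestor."""
        self.add_dir(path)
        self.clock += 1
        while True:
            self.gen[path] = self.clock
            parent = posixpath.dirname(path)
            if parent == path:
                break
            path = parent

    def dirs_to_scan(self, since, root="/"):
        """Directories changed after generation `since`, skipping unchanged subtrees."""
        out, stack = [], [root]
        while stack:
            d = stack.pop()
            if self.gen.get(d, 0) <= since:
                continue   # nothing below d changed: skip the whole subtree
            out.append(d)
            stack.extend(sorted(self.children.get(d, ()), reverse=True))
        return out
```

After a change in /home/b/c, the walk visits /, /home, /home/b and /home/b/c and skips everything else; the ancestors in that list are exactly the extra parent-directory scans mentioned above.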

A notification system like inotify is good to react quickly without
polling. It is good for a file browser and can be an attractive use case
for lsyncd+rsync with a small delay, but I think that latency is not
that important for bup.

Regards,
--
Simon Sapin

Re: Using lsyncd for inotify-based bup index apenwarr 1/2/12 2:03 PM
On Mon, Jan 2, 2012 at 4:33 PM, Simon Sapin <simon...@exyr.org> wrote:
> You suggested earlier a new "modification time" metadata that would
> propagate to parent directories. bup already has an index and can compare
> with previously seen timestamps. If it hasn’t changed for a directory, the
> whole tree hasn’t changed and can be skipped. This is not as fast as an
> ideal FSEvents (parent directories need to be scanned) but I think it would
> be good enough for bup. (Filesystem trees are rarely very deep.)
>
> Maybe I’m missing something, but wouldn’t this be easier to implement and
> use than a notification system? Or is it too expensive to have to seek+write
> for each ancestor on each write?

Adding metadata to the filesystem is *always* complicated and messy.
In particular, it would be kind of annoying if you did this:

    mkdir -p a/b/1
    touch a/b/c
    tar -cf file1.tar --exclude=a/b/1 a/b
    touch a/b/1/2
    tar -cf file2.tar --exclude=a/b/1 a/b

...and file1.tar != file2.tar.  Neither tarball includes a/b/1/2, so
it's a little annoying that the metadata for a/b got changed just
because a/b/1/2 changed.  Maybe tar shouldn't be recording that sort
of information, but then you're making special cases in tar to make it
not record one sort of metadata/xattr even while it *does* record the
other ones.

Maybe that's not such a big deal, but it's a little annoying, and it
would be a hard battle to win.

Given an inotify-like kernel interface, however, (but with the 'send
me everything' feature) you could have a userspace daemon that tracks
exactly this information, off to the side, in a bup-like index.  And
you could have that daemon work on multiple OSes and use whatever
kernel API is available.

Incidentally, I'm halfway sure btrfs already has this capability,
since it supports real-time snapshotting and all.  I think you can ask
it whether a directory tree matches what it did before.  In fact,
btrfs should even be able to tell you what *parts* of a giant file
have changed since last time; that would be really great for
incremental backups of VM files.

> A notification system like inotify is good to react quickly without polling.
> It is good for a file browser and can be an attractive use case for
> lsyncd+rsync with a small delay, but I think that latency is not that
> important for bup.

Correct, bup doesn't care that much about latency.  Note however that
an FSEvents-style daemon doesn't imply high latency; it can ping your
process just as easily as the kernel can.

Have fun,

Avery

Re: Using lsyncd for inotify-based bup index Axel Kittenberger 1/2/12 2:12 PM
> I don't quite understand the above, and I honestly want to, so that I
> can estimate the amount of work needed to fix the kernel.
>
> inotify can already intercept move events, right?  So I'd like to
> understand what major surgery would be needed to just make it
> intercept *all* move events and report them to userspace, rather than
> reporting only "watched" ones.

"intercept" meaning the watcher saying "no" to an operation before it
happens. Thats what malware shields need, but you and me not care
about.

> O() notation is about average cases.

You got that wrong. Big O notation describes a limiting function, one
that is guaranteed to grow at least as fast as the real thing (up to a
constant factor). That is, it is "worst case". That makes the rest of
the argument kind of based on a flawed assumption.

> If one file has changed and I want to find out which one by searching
> the bupindex, which is arranged in the same tree structure as the
> filesystem, I can find it in O(log n).

With event notification this case would be O(1).

> Anyway, that's all pretty academic.  The important point we both agree
> on is "in the average case, scanning only changed directories is way
> the hell faster than scanning the entire disk."  And in my opinion,
> it's plenty fast enough for backup purposes.  It's also sufficient for
> detecting renames.

Yes, that event-oriented syncing is way faster is what Lsyncd is all
about.  Otherwise one could just call rsync from a cron job every
few minutes or so, which plenty of people actually do.

Re: Using lsyncd for inotify-based bup index Gabriele Santilli 1/3/12 2:55 AM
On Mon, Jan 2, 2012 at 10:14 PM, Avery Pennarun <apen...@gmail.com> wrote:

> O() notation is about average cases.

Not really, it's about asymptotic behavior.

http://en.wikipedia.org/wiki/Asymptotic_complexity
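
For reference, the standard definition behind this correction: Big O states an asymptotic upper bound on a function, up to a constant factor. Which function is being bounded (a worst-case cost, an average-case cost, and so on) is chosen separately, so the notation by itself says nothing about worst or average case.

```latex
f(n) \in O\bigl(g(n)\bigr)
\iff
\exists\, c > 0,\ \exists\, n_0 \in \mathbb{N} :\;
\forall n \ge n_0,\; |f(n)| \le c \, g(n)
```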