Monitoring FS changes

Chris Stankevitz

Dec 31, 2015, 2:24:20 AM

Hi,

I have a directory /foo that recursively contains ~250,000 files/directories.

I would like my application to know when a file is added, removed, or
modified under /foo. Is there a way to do that with FreeBSD?

I believe on Linux a facility called inotify accomplishes this.

On OSX a facility called FSEvents accomplishes this.

kqueue apparently requires me to open every file and/or directory in my
tree... which won't work because I have so many.
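
To make the scale concern concrete, here is a minimal Python sketch (not part of the original message) that counts how many descriptors a one-descriptor-per-entry kqueue watch would need and compares that with the process's soft RLIMIT_NOFILE. The helper names `count_watch_targets` and `fits_in_fd_limit` are made up for illustration.

```python
import os
import resource


def count_watch_targets(root):
    """Count root plus every directory and file below it."""
    total = 1  # the root directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def fits_in_fd_limit(root):
    """Return (needed, soft_limit, fits) for a one-descriptor-per-entry watch."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = count_watch_targets(root)
    unlimited = soft_limit == resource.RLIM_INFINITY
    return needed, soft_limit, unlimited or needed < soft_limit


if __name__ == "__main__":
    print(fits_in_fd_limit("."))
```

For a tree of roughly 250,000 entries the count will normally exceed the default soft limit, which is the problem described above; raising the limit is possible but every descriptor still has to be opened and kept open.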

Is there any other option? Perhaps

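# take a baseline snapshot so the first diff has something to compare against
i=0
zfs snapshot pool/foo@${i}
while true
do
    prev=${i}
    i=$((i + 1))
    zfs snapshot pool/foo@${i}
    zfs diff pool/foo@${prev} pool/foo@${i}
done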
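
An application consuming such a loop would have to parse what `zfs diff` prints. The Python sketch below is an illustration only: it assumes each output line is a one-character change type (`+`, `-`, `M` or `R`), whitespace, and a path, and that renames are printed as `old -> new`. The function name `parse_zfs_diff` and the sample paths are hypothetical, and the exact layout should be checked against the installed zfs version.

```python
def parse_zfs_diff(output):
    """Turn `zfs diff` text into (change, path, new_path) tuples.

    Assumed layout per line: change type, whitespace, path.
    Renames are assumed to look like "R  /old -> /new".
    """
    kinds = {"+": "added", "-": "removed", "M": "modified", "R": "renamed"}
    events = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        code, path = parts
        new_path = None
        if code == "R" and " -> " in path:
            path, new_path = path.split(" -> ", 1)
        events.append((kinds.get(code, code), path, new_path))
    return events


if __name__ == "__main__":
    sample = "M\t/pool/foo/\n+\t/pool/foo/new.txt\nR\t/pool/foo/a -> /pool/foo/b\n"
    for event in parse_zfs_diff(sample):
        print(event)
```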

Thank you,

Chris
_______________________________________________
freeb...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"

Mark Felder

Jan 3, 2016, 4:09:21 PM

On Thu, Dec 31, 2015, at 01:24, Chris Stankevitz wrote:
> Hi,
>
> I have a directory /foo that recursively contain ~250,000
> files/directories.
>
> I would like my application to know when a file is added, removed, or
> modified under /foo. Is there a way to do that with FreeBSD?
>
> I believe on linux a facility called iNotify accomplishes this.
>
> On OSX a facility called FSEvents accomplishes this.
>
> kqueue apparently requires me to open every file and/or directory in my
> tree... which won't work because I have so many.
>
> Is there any other option? Perhaps
>
> i=0
> while (true)
> {
> zfs snapshot pool/foo@${i}
> zfs diff pool/foo@${i-1} pool/foo@${i}
> ++i
> }
>

Yes, Linux has inotify (just be aware it doesn't actually work on inodes
like it indicates: changes to alternative hard links are ignored if
they're not in the file path you're monitoring), OSX has fsevents,
Solaris derivatives have File Events Notification, and we're stuck with
kqueue which doesn't scale. I'm not aware of anything else being
available for us.

If someone, anyone out there is capable of bringing us something that
does scale it would be greatly appreciated. Lots of nice Linux software
uses this, but when they do port to FreeBSD we have to do full
filesystem scans. It's such a waste.
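
For reference, the "full filesystem scan" fallback amounts to something like the following Python sketch: record `lstat()` data for every entry, repeat later, and compare the two maps. This is a simplified illustration, not code from any port; `scan` and `diff_scans` are made-up names, and comparing only mtime and size will miss some changes.

```python
import os


def scan(root):
    """Map every path below root to (mtime_ns, size) taken from lstat()."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # entry vanished between listing and lstat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff_scans(old, new):
    """Return sorted lists of (added, removed, modified) paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified
```

Every pass touches every entry in the tree whether or not anything changed, which is the waste being complained about.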

--
Mark Felder
ports-secteam member
fe...@FreeBSD.org

Jordan Hubbard

Jan 3, 2016, 4:37:37 PM

> On Jan 3, 2016, at 1:08 PM, Mark Felder <fe...@FreeBSD.org> wrote:
>
> If someone, anyone out there is capable of bringing us something that
> does scale it would be greatly appreciated. Lots of nice Linux software
> uses this, but when they do port to FreeBSD we have to do full
> filesystem scans. It's such a waste.

I’ve been pondering this for awhile since a lot of interesting enterprise features require a working filesystem change notification mechanism that scales to thousands or even millions of files (how did we bump into the 32 bit NFS file handle problem at iXsystems? Somebody tried to share more than 4 billion files over NFS - Enterprise folks do some crazy s**t!).

The big question is less whether it’s possible and more what kind of mechanism people will find palatable. The OS X FSEvents mechanism works reasonably well and is used constantly to trigger things like spotlight search indexing and such, and I was by no means involved in its creation at Apple so I can only speak peripherally to the implementation, but it seems like it took a fairly long time for it to become “light weight” enough to use without the overhead being punitive. Any similar mechanism in FreeBSD would also have to go through some evolutionary performance iterations - do people want it badly enough to invest in it long-term? I don’t know, but I do know that a long-term investment would be necessary to really make it work well and provide all of the appropriate APIs for talking to it.

I think we can probably all agree that Linux inotify wouldn’t be worth the trouble. From the wikipedia page:

• Inotify does not support recursively watching directories, meaning that a separate inotify watch must be created for every subdirectory.[4]
• Inotify does report some but not all events in sysfs and procfs.
• Notification via inotify requires the kernel to be aware of all relevant filesystem events, which is not always possible for networked filesystems such as NFS where changes made by one client are not immediately broadcast to other clients.
• Rename events are not handled directly; i.e., inotify issues two separate events that must be examined and matched in a context of potential race conditions.
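
The last point can be illustrated with a small simulation. inotify reports a rename as an IN_MOVED_FROM event and an IN_MOVED_TO event that share a cookie value; the consumer has to pair them. The Python sketch below works on an already-collected batch of simplified events (the `Event` tuple and the `pair_renames` name are invented for this example) and sidesteps the real difficulty, which is deciding how long to wait for the second half of a pair.

```python
from collections import namedtuple

# Simplified stand-in for struct inotify_event: kind, cookie, path.
Event = namedtuple("Event", "kind cookie path")


def pair_renames(events):
    """Merge MOVED_FROM/MOVED_TO events that share a cookie into renames."""
    pending = {}  # cookie -> source path of an unmatched MOVED_FROM
    out = []
    for ev in events:
        if ev.kind == "MOVED_FROM":
            pending[ev.cookie] = ev.path
        elif ev.kind == "MOVED_TO":
            src = pending.pop(ev.cookie, None)
            if src is None:
                out.append(("moved_in", ev.path))  # source was outside the watch
            else:
                out.append(("renamed", src, ev.path))
        else:
            out.append((ev.kind.lower(), ev.path))
    # Anything still pending moved out of the watched area.
    out.extend(("moved_out", src) for src in pending.values())
    return out
```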

I think the first issue alone is a deal killer. Having to walk the filesystem tree posting notifications on every [new] directory just to watch a filesystem in its entirety would be pretty onerous and failure-prone to boot. By contrast: https://en.wikipedia.org/wiki/FSEvents

This is also not to say that I would expect anything in FreeBSD to be API-compatible (though the upstream clients would probably grumble at yet another notification mechanism API to #ifdef into their code), simply that there are only so many design patterns to follow. A filesystem change is a filesystem change. Everything beyond that is just a glorified pub/sub mechanism.

Assuming there’s interest, I could potentially see throwing some engineering effort into this.

- Jordan

Mark Felder

Jan 3, 2016, 4:47:51 PM

On Sun, Jan 3, 2016, at 15:36, Jordan Hubbard wrote:
>
> I think we can probably all agree that Linux inotify wouldn’t be worth
> the trouble. From the wikipedia page:

Just talk to Bryan Cantrill if you want to know why we should avoid
inotify at all costs. He had to work on mapping it to FEN on SmartOS and
he discovered a world of hurt in the process. They're allegedly stuck
with the broken implementation of inotify now because Linus doesn't want
KBI breakage. Not to say we couldn't provide a compatibility shim so
inotify things can compile on FreeBSD, but it might be wise to have
something else that works better. Not sure if we really should reinvent
the wheel, but I have zero clue how FSEvents or FEN scale.

>
> Assuming there’s interest, I could potentially see throwing some
> engineering effort into this.
>
> - Jordan
>

I would love to see this happen in the near future. It is *the* reason
Dropbox hasn't released a FreeBSD-native client last I checked. I know
that Plex would use it if it was available. There's a lot of cool things
ripe for porting if we only had a mechanism...

--
Mark Felder
ports-secteam member
fe...@FreeBSD.org

Konstantin Belousov

Jan 4, 2016, 4:02:51 AM

On Sun, Jan 03, 2016 at 01:36:36PM -0800, Jordan Hubbard wrote:
>
> > On Jan 3, 2016, at 1:08 PM, Mark Felder <fe...@FreeBSD.org> wrote:
> >
> > If someone, anyone out there is capable of bringing us something that
> > does scale it would be greatly appreciated. Lots of nice Linux software
> > uses this, but when they do port to FreeBSD we have to do full
> > filesystem scans. It's such a waste.
>

> I’ve been pondering this for awhile since a lot of interesting enterprise features require a working filesystem change notification mechanism that scales to thousands or even millions of files (how did we bump into the 32 bit NFS file handle problem at iXsystems? Somebody tried to share more than 4 billion files over NFS - Enterprise folks do some crazy s**t!).
>
> The big question is less whether it’s possible and more what kind of mechanism people will find palatable. The OS X FSEvents mechanism works reasonably well and is used constantly to trigger things like spotlight search indexing and such, and I was by no means involved in its creation at Apple so I can only speak peripherally to the implementation, but it seems like it took a fairly long time for it to become “light weight” enough to use without the overhead being punitive. Any similar mechanism in FreeBSD would also have to go through some evolutionary performance iterations - do people want it badly enough to invest in it long-term? I don’t know, but I do know that a long-term investment would be necessary to really make it work well and provide all of the appropriate APIs for talking to it.
>
> I think we can probably all agree that Linux inotify wouldn’t be worth the trouble. From the wikipedia page:
>
> • Inotify does not support recursively watching directories, meaning that a separate inotify watch must be created for every subdirectory.[4]
> • Inotify does report some but not all events in sysfs and procfs.
> • Notification via inotify requires the kernel to be aware of all relevant filesystem events, which is not always possible for networked filesystems such as NFS where changes made by one client are not immediately broadcast to other clients.
> • Rename events are not handled directly; i.e., inotify issues two separate events that must be examined and matched in a context of potential race conditions.

>
> I think the first issue alone is a deal killer. Having to walk the filesystem tree posting notifications on every [new] directory just to watch a filesystem in its entirety would be pretty onerous and failure-prone to boot. By contrast: https://en.wikipedia.org/wiki/FSEvents
>
> This is also not to say that I would expect anything in FreeBSD to be API-compatible (though the upstream clients would probably grumble at yet another notification mechanism API to #ifdef into their code), simply that there are only so many design patterns to follow. A filesystem change is a filesystem change. Everything beyond that is just a glorified pub/sub mechanism.
>

> Assuming there’s interest, I could potentially see throwing some engineering effort into this.
>

There are many people who claim to have very good ideas. This case
seems to be an opportunity for such people to contribute something
useful to FreeBSD.

I mean: agree upon and provide a sufficiently precise technical
specification for the API you want. It does not need to be exact in all
details, you can assume a gleam of intelligence in the coders who
would implement it, but the spec must be feasible to implement and
must satisfy the core requirements of consumers.

E.g., you need a recursive notification, but there is no way to find
all dirents pointing to the given inode, as you noted above. An NFS
client should not expect to (reliably) get a notification when another
client updates a directory, and so on.

Ideally, this should be a man-page like text and several programs to
illustrate the intended use. The programs should be complete, but cannot
be tested (for understandable reasons).

After that, I promise that the spec will be implemented.

Julian Elischer

Jan 4, 2016, 10:04:01 AM

I think the point is that setting a notify on a directory vnode
would force all directory vnodes above it (away from root)
to stay in memory. That gives a path to traverse in memory to look for
notify hooks when a change is made.
This has been discussed many times before. Many years ago it came
down to resource usage. I'm not sure that is so important with 128GB
machines
(but it needs to handle runaway resource usage). The exact syntax of
what is required needs to be spelled out well (as kib says).

Konstantin Belousov

Jan 4, 2016, 10:34:16 AM

On Mon, Jan 04, 2016 at 11:03:30PM +0800, Julian Elischer wrote:
> I think the point is that setting a notify onto a directory vnode
> would force all directory vnodes above it (away from root)
> to stay in memory. That gives a path to traverse in memory to look for
> notify hooks when a change is made..

No, it is not. On lookup, you can specially mark instantiated vnodes
that are children of the monitored vnode. This does not change the
existing lifecycle of the vnodes, does not require making all children
unreclaimable, and allows the caching to work. The tracking might be
done either with a modest growth of struct vnode or even without it, by
placing several strategic hooks into the vnode lifecycle code.

The generic problem we have there is quite different. Assume that
we establish a new monitor on a directory, and assume there exists
a previously opened file whose vnode should now be monitored by the
'children' rule. How can we learn that the vnode must be included in the
watching set, i.e. marked? The same issue occurs for fhopen() and for NFS
handles.
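
A toy model may help to see both the marking scheme and the problem. The Python sketch below is purely illustrative and has nothing to do with the real FreeBSD VFS code: `ToyFS`, `Vnode`, `lookup` and `monitor` are invented names, and vnodes are simply objects created on first lookup that inherit a `watched` flag from their parent.

```python
class Vnode:
    def __init__(self, name):
        self.name = name
        self.watched = False


class ToyFS:
    """Toy model: vnodes are instantiated lazily by lookup()."""

    def __init__(self):
        self.root = Vnode("/")
        self.live = {}  # path -> already instantiated vnode

    def lookup(self, path):
        """Walk path from the root; mark new vnodes if the parent is marked."""
        node = self.root
        current = ""
        for part in [p for p in path.split("/") if p]:
            current += "/" + part
            child = self.live.get(current)
            if child is None:
                child = Vnode(part)
                child.watched = node.watched  # marking happens only here
                self.live[current] = child
            node = child
        return node

    def monitor(self, path):
        """Establish a monitor on the directory at path."""
        self.lookup(path).watched = True
```

In this model a file looked up before `monitor()` was called keeps `watched == False`, while files looked up afterwards are marked. That is the case of the previously opened file: nothing forces it through the marking path again.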

AFAIU, Linux solves it by making the name cache reliable, so you always
know if the used name for the vnode is the child of the monitor root.
This also explains why they cannot detect hard links.

> This has been discussed many times before. many years ago it came
> down to resources usage.. I'm not sure that is so important with 128GB
> machines.
> (but it needs to handle runaway resource usage). The exact syntax of
> what is required needs to be spelled out well (as kib says).
>

As explained above, this is not the issue.

Whether the problem I noted is important for the requested API can
only be understood after the API requirements are provided. Even if it
is important, there might be a significant group of consumers who do
not care about it, or who can accept such a problem.

Sean Eric Fagan

Jan 4, 2016, 3:08:58 PM

>The generic problem we have there is quite different. Assume that
>we establish a new monitor on a directory, and assume there exists
>previously open file, which vnode should be now monitored by the
>'children' rule. How can we learn that the vnode must be included in the
>watching set, i.e. marked ? Same issue occurs for fhopen() and for NFS
>handles.

xnu solved that by putting a parent pointer in each vnode (obviously, not set
for non-fs objects). Once they did that, this kept a reference for each
vnode, and voila, always there.
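
As a rough illustration of the parent-pointer idea (a sketch of the general technique, not of xnu's actual data structures), the Python below gives every node a strong reference to its parent, so that finding the nearest monitored ancestor is a walk towards the root. `Node`, `monitoring_ancestor` and `path_of` are hypothetical names.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # strong reference: keeps the parent alive
        self.monitored = False


def monitoring_ancestor(node):
    """Return the nearest monitored node on the way to the root, or None."""
    while node is not None:
        if node.monitored:
            return node
        node = node.parent
    return None


def path_of(node):
    """Rebuild a path for node by following parent pointers."""
    parts = []
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))
```

With this layout a node that was instantiated before the monitor was established is still found, because the check happens at event time rather than at lookup time. The cost is that every live node pins its whole chain of ancestors in memory.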

They also keep a reference cache of names; this makes a lot more sense on a
Mac OS system since so many directories and files have the same name
(there are 9400 instances of "Info.plist" on my laptop at the moment, for
example).
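
A reference-counted name cache of this kind can be sketched in a few lines of Python. This is only a model of the idea of sharing one copy of each name; the `NameCache` class and its methods are invented for the example and do not mirror xnu's implementation.

```python
class NameCache:
    """Store one shared copy of each name together with a reference count."""

    def __init__(self):
        self._entries = {}  # name -> [shared string, refcount]

    def acquire(self, name):
        """Return the shared copy of name and take a reference on it."""
        entry = self._entries.setdefault(name, [name, 0])
        entry[1] += 1
        return entry[0]

    def release(self, name):
        """Drop one reference; forget the name when nobody uses it."""
        entry = self._entries[name]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[name]

    def refcount(self, name):
        entry = self._entries.get(name)
        return entry[1] if entry else 0

    def __len__(self):
        return len(self._entries)
```

With thousands of files called "Info.plist", the cache holds a single string and a counter instead of thousands of copies.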

The memory footprint for each of these was not too large. But, then, Apple
wasn't supporting systems with less than 1 GB of RAM at the time 8-).

Rick Macklem

Jan 4, 2016, 9:23:01 PM

Sean Eric Fagan wrote:
> >The generic problem we have there is quite different. Assume that
> >we establish a new monitor on a directory, and assume there exists
> >previously open file, which vnode should be now monitored by the
> >'children' rule. How can we learn that the vnode must be included in the
> >watching set, i.e. marked ? Same issue occurs for fhopen() and for NFS
> >handles.
>
> xnu solved that by putting a parent pointer in each vnode (obviously, not set
> for non-fs objects). Once they did that, this kept a reference for each
> vnode, and voila, always there.
>

Just wondering how they handle the case of multiple hard links in different directories?

rick

Sean Eric Fagan

Jan 4, 2016, 9:24:21 PM

>Just wondering how they handle the case of multiple hard links in different directories?

You get back _a_ name, not necessarily _the_ name. And I believe (although
I'd have to check the code) an error if the file is open-unlinked.

Although then xnu has support for getting the next hard link (this is pretty
HFS+ specific, mind you).

Bakul Shah

Jan 4, 2016, 10:53:19 PM

Why not do this (at least at first) in a user mode program?
Intercept FS system calls and write relevant info to a user
program's memory. You still need to add a syscall to watch
files/dirs for various events and to validate requests. This
will allow you to experiment with various implementations
before committing to a complicated new mechanism in the kernel.

Something like:
For the client:

fd = new_watcher();
watch(fd, path, flags); // can add multiple watches
count = read(fd, buf, sizeof buf);

flag = 0 => remove watchpoint.
flag = recursive => watch everything underneath a tree (on the same fs)
other flags decide the events you want to watch.

read returns one or more events.

For the watcher:

fd = register_watcher(buf, length);
fd1 = get_watcher(fd, &path, &flag);

The watcher mmaps a bunch of space and the kernel uses it as a
circular buffer. The watcher can create a map of inode-id to
subscribers: a list of <fds,events> or a ptr to a list. The
indirection should make recursion cheaper to handle: objects
will point to parent dir in a recursive watch. You'd still
need to spend something like 8 bytes per file or dir (on a 4GB
addr space system).

flag = 0 => unsubscribe fd from watching the path.
close the fd when the number of watches is 0.
When a client disappears, path is NULL.

When a watcher disappears, all clients get one final event.

Hard links are not a problem: such a file will just have N
items on its list, one per watched dir.
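
The bookkeeping described above can be modelled roughly as follows. This Python sketch is one possible reading of the proposal, not an implementation of it: inode numbers are plain integers, `Watcher`, `watch`, `link` and `notify` are invented names, and the indirection is modelled by having files point at their watched parent directories rather than carrying subscriber lists themselves.

```python
from collections import defaultdict


class Watcher:
    def __init__(self):
        self.dir_subs = defaultdict(list)  # dir inode -> [(client fd, events)]
        self.file_dirs = defaultdict(set)  # file inode -> watched dir inodes

    def watch(self, dir_ino, client_fd, events):
        """Subscribe client_fd to the given set of events under dir_ino."""
        self.dir_subs[dir_ino].append((client_fd, set(events)))

    def link(self, file_ino, dir_ino):
        """Record that file_ino has a directory entry in dir_ino."""
        if dir_ino in self.dir_subs:  # only watched directories matter
            self.file_dirs[file_ino].add(dir_ino)

    def notify(self, file_ino, event):
        """Return the sorted client fds that should hear about this event."""
        hits = set()
        for dir_ino in self.file_dirs.get(file_ino, ()):
            for client_fd, events in self.dir_subs[dir_ino]:
                if event in events:
                    hits.add(client_fd)
        return sorted(hits)
```

A file with hard links in two watched directories simply ends up with two entries in `file_dirs`, so a change to it reaches the subscribers of both directories.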

Multiple watchers, each handling a different set of clients,
should be possible with a little extra cost. Basically what
we are doing is outsourcing most of the actual work to the
watcher but the kernel still has to tell all the watchers
when something changes.

Very likely the same mechanism can be used to report process
deaths or dead network connections etc.

Undoubtedly I have missed a bunch of things here!

Jordan Hubbard

Jan 5, 2016, 12:34:05 AM

> On Jan 4, 2016, at 7:43 PM, Bakul Shah <ba...@bitblocks.com> wrote:
>
> Why not do this (at least at first) in a user mode program?
> Intercept FS system calls and write relevant info to a user
> program's memory. You still need to add a syscall to watch
> files/dirs for various events and to validate requests. This
> will allow you to experiment with various implementations
> before committing to a complicated new mechanism in the kernel.

That’s basically how FSEvents works. There’s a fairly straightforward (Mach IPC based) kernel upcall mechanism for communicating the filesystem change events (and control inputs for what to watch) to a daemon, fseventsd. Subscribers talk to that userland daemon, which figures out how many events to cache, when all subscribers have received the events (or timed out) so that it can re-use the memory, and so on.

The kernel reporting mechanism can be relatively light-weight if you proxy all the subscription and memory management details through a userland daemon, which is why I certainly wouldn’t suggest doing it any other way…
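
The daemon-side bookkeeping can be pictured with a small Python model. It is a sketch under simplifying assumptions and says nothing about how fseventsd is really written: events are kept in one list, each subscriber has a cursor, and events are dropped once every subscriber has read past them. Timeouts for slow subscribers are left out, and the `EventLog` name and its methods are hypothetical.

```python
class EventLog:
    def __init__(self):
        self.events = []   # retained events, oldest first
        self.base = 0      # sequence number of events[0]
        self.cursors = {}  # subscriber -> next sequence number to deliver

    def subscribe(self, name):
        """New subscribers only see events published after they joined."""
        self.cursors[name] = self.base + len(self.events)

    def publish(self, event):
        self.events.append(event)

    def read(self, name):
        """Deliver everything the subscriber has not seen yet."""
        start = self.cursors[name] - self.base
        batch = self.events[start:]
        self.cursors[name] = self.base + len(self.events)
        self._trim()
        return batch

    def _trim(self):
        """Free events that every subscriber has already received."""
        if not self.cursors:
            return
        low = min(self.cursors.values())
        drop = low - self.base
        if drop > 0:
            del self.events[:drop]
            self.base = low
```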

- Jordan

Konstantin Belousov

Jan 5, 2016, 5:50:07 AM

On Mon, Jan 04, 2016 at 11:59:32AM -0800, Sean Eric Fagan wrote:
> >The generic problem we have there is quite different. Assume that
> >we establish a new monitor on a directory, and assume there exists
> >previously open file, which vnode should be now monitored by the
> >'children' rule. How can we learn that the vnode must be included in the
> >watching set, i.e. marked ? Same issue occurs for fhopen() and for NFS
> >handles.
>
> xnu solved that by putting a parent pointer in each vnode (obviously, not set
> for non-fs objects). Once they did that, this kept a reference for each
> vnode, and voila, always there.

The voila part is somewhat problematic.

You must ensure the liveness of the reference to the parent vnode, which
immediately raises the question of parent vnode being recycled without
updating the children pointers. So you must track all children, to not
leave stale parent pointers around. This is because vnodes only represent
a cache of the on-disk structures.

We do have machinery to track the parent/children relations; it is called
the name cache. But it is a cache and can be dropped at any time. This
is why e.g. vn_fullpath() results or executable names in /proc or lsof
output are sometimes lost.
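
The effect of relying on a droppable cache can be shown with a toy model. The Python below is not how the FreeBSD name cache or vn_fullpath() is implemented; `ToyNameCache` and its methods are made up, nodes are plain integers, and node 0 stands for the root.

```python
class ToyNameCache:
    def __init__(self):
        self.entries = {}  # child id -> (parent id, name)

    def enter(self, child, parent, name):
        """Remember that child is reachable as name inside parent."""
        self.entries[child] = (parent, name)

    def purge(self, child):
        """Drop an entry, as a cache is free to do at any time."""
        self.entries.pop(child, None)

    def fullpath(self, node, root=0):
        """Rebuild a path by walking cached entries; None if the chain broke."""
        parts = []
        while node != root:
            entry = self.entries.get(node)
            if entry is None:
                return None  # some ancestor's entry was dropped
            node, name = entry
            parts.append(name)
        return "/" + "/".join(reversed(parts))
```

Once any entry on the chain is purged, the path of every node below it can no longer be reconstructed, even though the nodes themselves still exist.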

Anyway, having great ideas does not bring this stuff any closer to being
implemented. If nobody with great ideas bothers even to formulate the
wanted API and explicitly document the desired behaviour, the
thing will never happen.

>
> They also keep a reference cache of names; this makes a lot more sense on a
> Mac OS system since so many directories and files have the same name
> (there are 9400 instances of "Info.plist" on my laptop at the moment, for
> example).
>
> The memory footprint for each of these was not too large. But, then, Apple
> wasn't supporting systems with less than 1gbytes of ram at the time 8-).