Watchdog & Django

Benjamin Schollnick

May 20, 2021, 10:15:58 PM
to django...@googlegroups.com
Folks,

I’m working on revamping the Gallery/viewer application that I wrote in Django.

One issue I’ve had is that it’s a dynamic, on-the-fly thumbnail creator, and I’ve used a homebrew (and fairly successful) caching system to keep track of the directories and file contents. Unlike many other gallery-style applications, it doesn’t require deliberate pre-scanning: files are detected and thumbnailed in real time.

But I realized that I can use Watchdog to simplify that logic and eliminate the need for any caching.

My idea is to have a table that is cleared at every startup and contains only the directories that have been scanned during the current run of the program. If watchdog detects a change in a directory, that directory is removed from the table, which makes it a candidate for rescanning the next time someone views it.
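A minimal sketch of that bookkeeping, assuming an in-memory set guarded by a lock stands in for the database table (the class and method names are hypothetical, not part of watchdog or Django). The lock matters because watchdog calls its event handlers from the observer's own thread, while views run in request threads.

```python
import os
import threading


class ScannedDirectories:
    """Hypothetical stand-in for the per-run table of scanned directories."""

    def __init__(self):
        self._lock = threading.Lock()
        self._scanned = set()

    @staticmethod
    def _key(path):
        # Normalise so "albums/x/" and "albums/x" map to the same entry.
        return os.path.abspath(path)

    def needs_scan(self, directory):
        # True if never scanned during this run, or invalidated since.
        with self._lock:
            return self._key(directory) not in self._scanned

    def mark_scanned(self, directory):
        with self._lock:
            self._scanned.add(self._key(directory))

    def invalidate(self, directory):
        # Meant to be called from the watchdog thread; discard() ignores unknown paths.
        with self._lock:
            self._scanned.discard(self._key(directory))

    def clear(self):
        # Equivalent of emptying the table at startup.
        with self._lock:
            self._scanned.clear()
```

A view would call `needs_scan()` before scanning and `mark_scanned()` afterwards, while the watchdog handlers only ever call `invalidate()`.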

But the big kicker is that I can’t find any documentation on adding watchdog monitoring to Django.

Now, I know that Django does something similar in the dev server (which is what I primarily use right now)… And I’ve found a Stack Overflow answer which says that I can use the signal module to capture SIGINT on shutdown (allowing me to stop watchdog)… I should be able to start watchdog in the __init__.py file in the app directory as well…

Now this is the sample app that I used:

import sys
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

def on_created(event):
    print(f"hey, {event.src_path} has been created!")

def on_deleted(event):
    print(f"what the f**k! Someone deleted {event.src_path}!")

def on_modified(event):
    print(f"hey buddy, {event.src_path} has been modified")

def on_moved(event):
    print(f"ok ok ok, someone moved {event.src_path} to {event.dest_path}")

if __name__ == "__main__":

    # <-- startup -->
    patterns = ["*"]
    ignore_patterns = None
    ignore_directories = False
    case_sensitive = True
    my_event_handler = PatternMatchingEventHandler(patterns=patterns,
                                                   ignore_patterns=ignore_patterns,
                                                   ignore_directories=ignore_directories,
                                                   case_sensitive=case_sensitive)

    my_event_handler.on_created = on_created
    my_event_handler.on_deleted = on_deleted
    my_event_handler.on_modified = on_modified
    my_event_handler.on_moved = on_moved

    path = sys.argv[1]  # directory to watch, e.g. "."
    go_recursively = True
    my_observer = Observer()
    my_observer.schedule(my_event_handler, path, recursive=go_recursively)

    my_observer.start()

    # stop startup

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        # shutdown
        my_observer.stop()
        my_observer.join()


Nothing fancy, just a proof of concept based on code that I found online.

Anyone have any suggestions?

I’m thinking that I could move the code between "startup" and "stop startup" into a function in the app’s __init__.py and run it at startup.
And that I could capture SIGINT and, when SIGINT is tripped, call observer.stop() and observer.join()?
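One way to sketch that shutdown path, assuming an observer object with `stop()` and `join()` as watchdog's `Observer` has. The helper name `install_sigint_shutdown` is hypothetical. It stops the observer and then hands off to whatever SIGINT handler was installed before, so Ctrl+C still ends the process the usual way instead of being swallowed.

```python
import signal


def install_sigint_shutdown(observer):
    """Hypothetical helper: stop `observer` on SIGINT, then defer to the old handler.

    `observer` is anything with stop() and join(), such as watchdog's Observer.
    signal.signal() only works when called from the main thread.
    """
    previous = signal.getsignal(signal.SIGINT)

    def handler(signum, frame):
        observer.stop()
        observer.join(timeout=5)  # bounded wait so shutdown cannot hang forever
        if callable(previous):
            # Normally signal.default_int_handler, which raises KeyboardInterrupt
            # so the main loop (e.g. runserver) shuts down as it normally would.
            previous(signum, frame)

    signal.signal(signal.SIGINT, handler)
    return handler
```

It would be called once, right after `my_observer.start()`, from code that runs in the main thread. Chaining to the previous handler, rather than exiting directly, leaves the rest of the shutdown to whoever installed SIGINT handling first.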

Would it be that simple? Am I missing something? Is there a better way to do this in Django?
I don’t want to have to stand up a Celery server if I can run this, or something equivalent, inside Django…

I haven’t (yet) been able to track down StatReloader, which appears to be the Django built-in equivalent, so I can’t tell whether I could repurpose it to do this instead.
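For comparison, a polling approach in the spirit of StatReloader can be sketched with nothing but `os.stat()`: snapshot the modification time of every directory, then compare snapshots on each poll. This is not StatReloader's actual code (that class serves the autoreloader and tracks mostly Python module files), just the same stat-polling idea applied to directories; the function names are hypothetical. On POSIX filesystems a directory's mtime changes when entries are added, removed or renamed inside it, but not when an existing file's contents change.

```python
import os


def snapshot_dir_mtimes(root):
    """Hypothetical helper: map every directory under `root` to its mtime in ns."""
    snapshot = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        try:
            snapshot[dirpath] = os.stat(dirpath).st_mtime_ns
        except FileNotFoundError:
            # Vanished between listing and stat; the next poll will report it.
            continue
    return snapshot


def changed_dirs(old, new):
    """Directories that appeared, disappeared, or whose mtime changed."""
    changed = {d for d in new if old.get(d) != new[d]}
    changed |= set(old) - set(new)
    return changed
```

A background thread would call `snapshot_dir_mtimes()` every few seconds and invalidate whatever `changed_dirs()` returns; the cost is a full directory walk per poll, which is exactly what an OS-level watcher avoids.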

I’m open to suggestions…

- Benjamin

Benjamin Schollnick

May 23, 2021, 12:23:45 PM
to django...@googlegroups.com
Okay, this was a bit more complex, and a bit simpler than I thought.

1) The Django app’s __init__.py is the only place you can register the signal handler that stops watchdog.
2) You can’t perform any app-model-specific work in __init__.py, because the app’s model framework hasn’t finished initializing yet. So I ended up with a chicken-and-egg situation.

Now, this does *exactly* what I needed, so it might not be what anyone else needs, but it without a doubt simplifies any sort of file-caching issue, since you can invalidate the cache at whatever granularity you want (e.g. file level or, in my case, directory level).

I created a class to contain the watchdog-specific code, and a table that stores the list of cached directories (to invalidate a directory, I just remove its name from that table).

The existing scanning code is simply bypassed if a directory has already been scanned; if it has been invalidated, the code just refreshes the listing(s) in the database.

e.g.

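in watchdogmon.py:
import os
import sys

from watchdog.events import PatternMatchingEventHandler
from watchdog.observers import Observer


class watchdog_monitor():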
    def __init__(self):
        self.my_observer = Observer()
        self.my_event_handler = None

    def on_event(self, event):
        pass

    def startup(self, monitor_path, created=None,
                deleted=None, modified=None,
                moved=None):
        print("Monitoring :", monitor_path)
        patterns = ["*"]
        ignore_patterns = None
        ignore_directories = False
        case_sensitive = False
        self.my_event_handler = PatternMatchingEventHandler(patterns=patterns,
                                                            ignore_patterns=ignore_patterns,
                                                            ignore_directories=ignore_directories,
                                                            case_sensitive=case_sensitive)

        self.my_event_handler.on_created = created
        self.my_event_handler.on_deleted = deleted
        self.my_event_handler.on_modified = modified
        self.my_event_handler.on_moved = moved

        go_recursively = True
        self.my_observer = Observer()
        self.my_observer.schedule(self.my_event_handler,
                                  monitor_path,
                                  recursive=go_recursively)

        self.my_observer.start()


    def shutdown(self, *args):
        if os.environ.get('RUN_MAIN') == 'true':
            print("Shutting down")
            self.my_observer.stop()
            self.my_observer.join()
    #    signal.send('system')
        sys.exit(0)   # Exit explicitly: a custom SIGINT handler replaces the default KeyboardInterrupt

watchdog = watchdog_monitor()

In __init__.py:
import signal

from . import watchdogmon  # assumes watchdogmon.py lives in this app package

signal.signal(signal.SIGINT, watchdogmon.watchdog.shutdown)

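In views.py:
import os

from .models import Cache_Tracking   # assumed location of the model
from .watchdogmon import watchdog    # assumed location of watchdogmon.py

# configdata is the application's own settings dictionary, defined elsewhere
print("Clearing all entries from Cache Tracking")
Cache_Tracking.objects.all().delete()

print("Starting Watchdog")
from frontend.utilities import delete_from_cache_tracking
watchdog.startup(monitor_path=os.path.join(configdata["locations"]["albums_path"], "albums"),
                 created=delete_from_cache_tracking,
                 deleted=delete_from_cache_tracking,
                 modified=delete_from_cache_tracking,
                 moved=delete_from_cache_tracking)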

where delete_from_cache_tracking just deletes event.src_path from the Cache_Tracking table.
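If the table is keyed by directory, a file-level event's `src_path` will not match any row on its own (although some watchdog backends also emit a directory-modified event for the parent), and a move touches both the source and the destination. A minimal sketch of mapping an event to the directories to invalidate, assuming watchdog-style event objects with `src_path`, `is_directory` and, on moves, `dest_path`; `directories_to_invalidate` is a hypothetical name.

```python
import os


def directories_to_invalidate(event):
    """Hypothetical helper: directory paths affected by a watchdog-style event.

    Expects `event.src_path` and `event.is_directory`; move events may also
    carry `dest_path`. File paths are mapped to their parent directory.
    """
    paths = [event.src_path]
    dest = getattr(event, "dest_path", None)
    if dest:  # absent or empty for non-move events
        paths.append(dest)

    dirs = set()
    for path in paths:
        if event.is_directory:
            dirs.add(path)
            # The parent's listing may include this directory, so invalidate it
            # too (at worst, a harmless extra rescan for plain modifications).
            dirs.add(os.path.dirname(path))
        else:
            dirs.add(os.path.dirname(path))
    return dirs
```

Each returned path would then be removed from the tracking table, so the next view of that directory triggers a rescan.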

Now, the caveat here is that watchdog uses the FSEvents framework under Mac OS X, so there is no practical limit to what it can track there: the OS is doing the work, and watchdog is wrapping OS-provided functionality. (Or at least that is my understanding.)

So under Mac OS X, it’s tracking probably millions, or at least tens of thousands, of files and directories.

Under Windows or Unix, my understanding is that watchdog requires a file handle per directory? So some tweaking may be required to avoid running out of file handles with a large directory structure or number of files.
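On Linux, watchdog's default backend is inotify, which is not recursive: recursive monitoring adds one inotify watch per directory, and the per-user total is capped by `/proc/sys/fs/inotify/max_user_watches` (watchdog's kqueue backend, used on the BSDs, does hold a file descriptor per watched path). A rough pre-flight check under that assumption compares the number of directories in the tree with the cap; the helper names are hypothetical, and on platforms without that file the limit is reported as `None`.

```python
import os

INOTIFY_WATCHES = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Number of directories under `root`, including `root` itself."""
    return sum(1 for _ in os.walk(root))


def inotify_watch_limit(path=INOTIFY_WATCHES):
    """Per-user inotify watch limit on Linux, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def fits_inotify_limit(root):
    """True/False against the limit, or None when the limit is unknown.

    Ignores watches already held by other processes of the same user.
    """
    limit = inotify_watch_limit()
    if limit is None:
        return None
    return count_directories(root) <= limit
```

If the check fails, the limit can be raised through sysctl by an administrator; the sketch only reports the situation.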

Hope this helps someone else that is trying to integrate watchdog into Django.

- Benjamin


tlockh...@gmail.com

May 23, 2021, 9:05:23 PM
to django...@googlegroups.com
The app's AppConfig has a ready() method, which is called after initialization is complete. You might be able to do your setup there...
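A sketch of that idea, assuming the startup call moves into the app's `AppConfig.ready()`. Two wrinkles apply: Django's documentation notes that `ready()` can occasionally run more than once (mostly in tests), and under runserver's autoreloader the project is set up in both the watcher process and the child process that serves requests, where Django sets the `RUN_MAIN` environment variable to `"true"`. The hypothetical `start_watchdog_once()` guards against both; the AppConfig wiring appears only in comments because it depends on the project.

```python
import os
import threading

_start_lock = threading.Lock()
_started = False


def start_watchdog_once(start, require_run_main=True):
    """Hypothetical guard: call `start()` at most once per process.

    With require_run_main=True it also skips runserver's autoreloader parent,
    since Django sets RUN_MAIN="true" only in the child that serves requests.
    Pass False for --noreload or a production server, where RUN_MAIN is unset.
    Returns True if `start()` was called.
    """
    global _started
    if require_run_main and os.environ.get("RUN_MAIN") != "true":
        return False
    with _start_lock:
        if _started:
            return False
        start()
        _started = True
        return True


# Possible wiring (app name and startup arguments are project-specific):
#
# from django.apps import AppConfig
#
# class FrontendConfig(AppConfig):
#     name = "frontend"
#
#     def ready(self):
#         from .watchdogmon import watchdog
#         start_watchdog_once(lambda: watchdog.startup(monitor_path=...))
```

Because `ready()` runs after the model registry is populated, clearing the tracking table could happen in the same place, which sidesteps the chicken-and-egg problem with `__init__.py`.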

- Tom
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/5300158C-831E-407C-A8C8-617C40D84E1B%40schollnick.net.