Yes, restarting the scheduler every N runs has worked a little too well for us, so we haven't had much incentive to fix this just yet. It's an important set of issues (especially for folks running a LocalExecutor) at the heart of the platform, and it needs to be addressed.
The issues I've seen are around keeping the DagBag up to date while the code is changing underneath. Here are a few things that may or may not create problems:
* A dag_id moves from one file to another
* Your pipeline imports a module whose content can alter the DAG, and nothing calls `reload(my_module)`. These imports are handled by Python and the module won't be reloaded unless instructed to, or unless you mess with `sys.modules`, which seems dangerous. Maybe using a subprocess?
* I haven't monitored the code for memory leaks, but what if someone's pipeline has some sort of leak? If there's something of that nature, restarting the process solves it
* I don't think that removing DAG files currently removes their DAGs from the DagBag, but that should be trivial to do
* Race conditions between the scheduler, workers and web servers, where the scheduler can get ahead of the workers for instance
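To illustrate the module-import point above, here is a minimal sketch of how a long-running process keeps a stale copy of an imported module until it is explicitly reloaded. It uses only the standard library; `my_module_demo` and `SCHEDULE` are made-up names for the demo, and in Python 3 the `reload` builtin lives at `importlib.reload`. This is not how the scheduler loads DAGs, just the underlying Python behavior.

```python
import importlib
import os
import shutil
import sys
import tempfile


def demo_reload():
    """Return the value seen at first import, after a plain re-import,
    and after an explicit reload."""
    sys.dont_write_bytecode = True  # keep stale .pyc files out of the demo
    tmp_dir = tempfile.mkdtemp()
    # Hypothetical helper module that a pipeline file might import.
    module_path = os.path.join(tmp_dir, "my_module_demo.py")
    with open(module_path, "w") as f:
        f.write("SCHEDULE = 'daily'\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("my_module_demo")
        first = module.SCHEDULE

        # The code changes underneath the running process.
        with open(module_path, "w") as f:
            f.write("SCHEDULE = 'hourly'\n")

        # Importing again is served from sys.modules: the file is not re-read.
        module = importlib.import_module("my_module_demo")
        cached = module.SCHEDULE

        # Only an explicit reload re-executes the module source.
        importlib.reload(module)
        reloaded = module.SCHEDULE
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop("my_module_demo", None)
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return first, cached, reloaded


if __name__ == "__main__":
    print(demo_reload())
```

Reloading refreshes only the module you name; anything that already copied values out of it keeps the old ones, which is part of why a fresh subprocess may be the safer option.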
There are many paths to solve this. The most obvious is to iterate on DagBag to iron these out one at a time.
My favorite, more ambitious idea is to have a process that serializes all DAG objects to the DB while monitoring them for changes and versioning them, and to have the workers and webserver get DAG definitions by deserializing them, so they don't need the code locally or have to maintain their own DagBag. That solves issues around versioning and around conflicting or uneven DagBags in production. One super important blocker is that the jinja templates aren't serializable. We also need to enforce that DAG objects are serializable: currently some callbacks or PythonOperators might not be, and we need to change these to use namespace references instead of actual Python object references. That's probably going to take place in Q1.
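As a rough sketch of what "namespace references instead of actual Python object references" could look like, the helpers below store a callable as a `module:qualname` string and resolve it again on the other side. The function names and the string format are assumptions made for illustration, not an existing Airflow API, and the receiving process still needs the referenced module to be importable.

```python
import importlib
import json


def callable_to_ref(func):
    """Turn an importable callable into a 'module:qualname' string."""
    qualname = func.__qualname__
    # Lambdas and nested functions have names like '<lambda>' or
    # 'outer.<locals>.inner' and cannot be looked up by name later.
    if "<" in qualname:
        raise ValueError(f"{qualname!r} is not importable by name")
    return f"{func.__module__}:{qualname}"


def ref_to_callable(ref):
    """Resolve a 'module:qualname' string back to the callable."""
    module_name, _, qualname = ref.partition(":")
    obj = importlib.import_module(module_name)
    for attr in qualname.split("."):
        obj = getattr(obj, attr)
    return obj


if __name__ == "__main__":
    ref = callable_to_ref(json.dumps)       # a plain string, safe to store in a DB
    func = ref_to_callable(ref)             # looked up again by name
    print(ref, func({"dag_id": "example"}))
```

The string is trivially serializable, at the cost of rejecting lambdas and closures, which would have to become module-level functions.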
I might put out a call for help on the jinja template pickling (serialization). If someone from the community could either find a hack or alter the jinja project to make templates serializable, that would help a lot.