Some of these bugs have been addressed over time, but I remember that there were issues around cases like:
* Moving a dag_id from one file to another: the dag may stop getting scheduled, or the old version might get scheduled
* Deleting a pipeline file: the dag may linger and still appear in production until the scheduler gets restarted
* Very rare networking or transient bugs may not be caught in a try block, and the scheduler can just stop if you are not using runit, sv, upstart or some sort of daemon service that ensures the process stays up
Since we put in the hack to restart the scheduler periodically, we haven't seen any of these, but they may still occur. I'll open a GitHub issue to keep track of it.
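
For anyone who wants to reproduce the periodic-restart hack without a proper process supervisor, here is a minimal sketch in Python. It is not what we actually run; the function name `supervise` and its parameters (`max_runtime_s`, `max_starts`, `pause_s`) are made up for illustration. It starts a command, restarts it whenever it exits on its own, and also stops and restarts it once it has been running longer than a set time.

```python
import subprocess
import time


def supervise(cmd, max_runtime_s, max_starts=None, pause_s=1.0):
    """Run cmd repeatedly (hypothetical helper, not part of Airflow).

    The command is restarted when it exits by itself, and it is
    terminated and restarted when it runs longer than max_runtime_s.
    Returns how many times the command was started.
    """
    starts = 0
    # max_starts=None means "loop forever", which is the production case.
    while max_starts is None or starts < max_starts:
        proc = subprocess.Popen(cmd)
        starts += 1
        try:
            # Returns early if the process dies on its own.
            proc.wait(timeout=max_runtime_s)
        except subprocess.TimeoutExpired:
            # Periodic restart: ask politely first, then force it.
            proc.terminate()
            try:
                proc.wait(timeout=30)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        # Short pause so a command that fails instantly does not spin.
        time.sleep(pause_s)
    return starts


# Example call (assumes the scheduler is launched as `airflow scheduler`):
# supervise(["airflow", "scheduler"], max_runtime_s=3600)
```

A real daemon service such as runit or upstart is still the better choice, since it also survives a crash of the wrapper script itself.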