Dead scheduler


Stefan Messmer

Nov 8, 2025, 10:45:04 AM
to py4web
Hello,
I have recently run into a problem with the scheduler. I get the error:
<missing reason="[Errno 2] No such file or directory: '/tmp/scheduler/1.txt'"/>
</run status="dead" completed_on="2025-11-08 14:24:23.622943">
when enqueueing a job. I am not aware of having changed anything.
The /tmp/scheduler folder exists and has read/write access for the Python process.
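
One way to confirm this kind of claim from inside Python is sketched below. The helper name check_dir_access is hypothetical and not part of py4web; it only reports what os.access sees for the user running the interpreter, and it is demonstrated on a throwaway directory rather than on /tmp/scheduler.

```python
import os
import tempfile


def check_dir_access(path):
    """Return (exists, readable, writable) for a directory path."""
    exists = os.path.isdir(path)
    # Listing a directory needs read + execute, creating files needs write + execute
    readable = exists and os.access(path, os.R_OK | os.X_OK)
    writable = exists and os.access(path, os.W_OK | os.X_OK)
    return exists, readable, writable


# Demonstration on a temporary directory (an assumption, not the real scheduler folder)
with tempfile.TemporaryDirectory() as demo_dir:
    demo_result = check_dir_access(demo_dir)
    missing_result = check_dir_access(os.path.join(demo_dir, "nope"))
```

Running the same check as the user that starts py4web would show whether the permissions seen in a shell also apply to the server process.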

How can I get the scheduler working again (on macOS)?

Many thanks and best regards
Stefan


Massimo DiPierro

Nov 9, 2025, 9:59:59 AM
to py4web
The scheduler does not create or manage a file /tmp/scheduler/1.txt. Perhaps your task does. Can you share your task?

Stefan Messmer

Nov 9, 2025, 10:37:33 AM
to py4web
Here is the code of the task:

# define your tasks (or import them from other file)
def my_task(inspection_id, myjobs = ["jumptable","diameters","laylengths","anomalies"], timeout = None):
    commands = {"jumptable": f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/createjumptable.wls {inspection_id}",
        "diameters": f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/diameter.wls {inspection_id}",
        "laylengths": f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/laylength.wls {inspection_id}",
        "anomalies":f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/finddamages.wls {inspection_id}"
    }
    inspection = db.inspections[inspection_id]
    output = ""
    if inspection is not None:
        if inspection.status != "Ongoing":
            try:
                for job in myjobs:
                    anomaly_count = 0.024*inspection.detectionrate*db(db.jumptable.inspection_id==inspection_id).count()
                    if (job == "jumptable" and db(db.jumptable.inspection_id==inspection_id).count() == 0) or \
                        (job == "diameters" and db(db.diameters.inspection_id==inspection_id).count() == 0) or \
                        (job == "laylengths" and db(db.lay_lengths.inspection_id==inspection_id).count() == 0) or \
                        (job == "anomalies" and db(db.anomalies.inspection_id==inspection_id).count() <= anomaly_count):
                        p = subprocess.run(commands[job], shell=True, check=True, capture_output=True, timeout=timeout)
                        output += f"Command {p.args} exited with {p.returncode} code, output: \n{p.stdout}\n"
                    else:
                        output += f"Inspection id={inspection_id} has already {job}.\n"
            except:
                output = f"Job={job} failed with inspection id {inspection_id}.\n"
        else:
            output = f"Inspection id={inspection_id} is busy."
    else:
        output = f"Inspection id={inspection_id} does not exist."
    return output

if settings.USE_SCHEDULER:
    # register your tasks with the scheduler
    scheduler.register_task("wolframscript", my_task)


As far as I can see, the process with the pid recorded in db.task_run never ran. I'm not sure, but I think the problem is in this function:


def make_daemon(func, filename, cwd="."):
    """Creates a daemon process running func in cwd and stdout->filename"""
    if os.fork():
        return
    # decouple from parent environment
    os.chdir(cwd)
    # os.setsid()
    os.umask(0)
    # do second fork
    if os.fork():
        sys.exit(0)
    # redirect standard file descriptors
    sys.__stdout__.flush()
    sys.__stderr__.flush()
    with open(os.devnull, "rb") as stream_in:
        os.dup2(stream_in.fileno(), sys.__stdin__.fileno())
        with open(filename, "wb") as stream_out:
            os.dup2(stream_out.fileno(), sys.__stdout__.fileno())
            os.dup2(stream_out.fileno(), sys.__stderr__.fileno())
            try:
                func()
            finally:
                stream_out.flush()
    sys.exit(0)
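
To check whether a pid recorded for a run still belongs to a live process, a small POSIX-only probe like the following can help. This is a sketch under assumptions: the helper name pid_is_alive is made up here, and the child process is just a trivial stand-in for a finished task.

```python
import os
import subprocess
import sys


def pid_is_alive(pid):
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# A finished and reaped child should no longer be reported as alive
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
own_alive = pid_is_alive(os.getpid())
child_alive = pid_is_alive(child.pid)
```

Note that a pid can be reused by the operating system later, so a positive answer long after the run started does not prove it is the same task.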


Many thanks for your help and best regards

Stefan


Massimo DiPierro

Nov 9, 2025, 8:04:09 PM
to py4web
I still do not see where /tmp/scheduler/1.txt is accessed, which seems to be the task error. Anyway, I will take a second look at make_daemon.

Stefan Messmer

Nov 10, 2025, 1:51:26 PM
to py4web
The error message is generated in line 347 of scheduler.py (see below):

    def retrieve_log(self, run):
        """Retrieve the log for the run"""
        try:
            filename = self.get_output_filename(run)
            with open(filename, "rb") as stream:
                log = stream.read().decode("utf8", errors="ignore")
            os.unlink(filename)
        except Exception as err:
            log = f'<missing reason="{err}"/>'
        return log.strip()
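
The behavior of this method can be reproduced in isolation. The sketch below copies the same logic into a standalone function (the name read_and_remove_log and the temporary path are made up for illustration) and shows that a log file which was never written yields the same kind of missing-reason string as in the dead run.

```python
import os
import tempfile


def read_and_remove_log(filename):
    """Standalone copy of the logic above: read the log, delete it, or report why not."""
    try:
        with open(filename, "rb") as stream:
            log = stream.read().decode("utf8", errors="ignore")
        os.unlink(filename)
    except Exception as err:
        log = f'<missing reason="{err}"/>'
    return log.strip()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.txt")
    missing_log = read_and_remove_log(path)   # the file was never created
    with open(path, "w") as f:
        f.write("task output\n")
    present_log = read_and_remove_log(path)   # the file exists, is read and then removed
    removed = not os.path.exists(path)
```

One possible reading is that the message appears whenever the output file for a run does not exist at the time the log is collected, for example if the child process never got as far as opening it.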

I found the following note in the Python documentation:

os.fork()

Fork a child process. Return 0 in the child and the child’s process id in the parent. If an error occurs OSError is raised.

Note that some platforms including FreeBSD <= 6.3 and Cygwin have known issues when using fork() from a thread.

Raises an auditing event os.fork with no arguments.

Warning: If you use TLS sockets in an application calling fork(), see the warning in the ssl documentation.

Warning: On macOS the use of this function is unsafe when mixed with using higher-level system APIs, and that includes using urllib.request.

Changed in version 3.8: Calling fork() in a subinterpreter is no longer supported (RuntimeError is raised).

Changed in version 3.12: If Python is able to detect that your process has multiple threads, os.fork() now raises a DeprecationWarning.

We chose to surface this as a warning, when detectable, to better inform developers of a design problem that the POSIX platform specifically notes as not supported. Even in code that appears to work, it has never been safe to mix threading with os.fork() on POSIX platforms. The CPython runtime itself has always made API calls that are not safe for use in the child process when threads existed in the parent (such as malloc and free).

Users of macOS or users of libc or malloc implementations other than those typically found in glibc to date are among those already more likely to experience deadlocks running such code.

See this discussion on fork being incompatible with threads for technical details of why we’re surfacing this longstanding platform compatibility problem to developers.

Availability: POSIX, not WASI, not Android, not iOS.
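
A rough way to see whether the multi-thread condition from the note above applies is to count live threads right before a fork would happen. This is only a sketch: the helper name fork_may_be_unsafe is hypothetical, and threading.active_count() only sees threads known to the threading module, not threads started by C extensions.

```python
import threading


def fork_may_be_unsafe():
    """Return True if more than one Python-level thread is alive in this process."""
    return threading.active_count() > 1


before = fork_may_be_unsafe()

# Start a background thread and check again while it is still running
stop = threading.Event()
worker = threading.Thread(target=stop.wait)
worker.start()
during = fork_may_be_unsafe()
stop.set()
worker.join()
```

A web server process typically runs several threads, so a check like this would likely report True inside a request handler.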

In fact, the scheduler was not working very reliably even before it stopped working altogether. I observed the following problems:

  • dead tasks from time to time
  • some scheduled tasks never started

I did not change anything related to scheduler tasks around the time it stopped working. It is very strange.

Best regards
Stefan



Massimo DiPierro

Nov 15, 2025, 5:18:33 PM
to py4web
I still do not think this is a problem with the scheduler.

The scheduler ran your task. Your task tried to open "/tmp/scheduler/1.txt" and threw an exception:

[Errno 2] No such file or directory: '/tmp/scheduler/1.txt'

The scheduler is correctly reporting the exception and the fact that the task died because of it (as opposed to completing and returning a value).

I think you need to investigate where in your task it tried to open "/tmp/scheduler/1.txt" and why that fails.


Stefan Messmer

Nov 16, 2025, 1:38:53 PM
to py4web
I suppose it is not a problem with your scheduler, but an incompatibility of the os.fork() Python function with macOS. I got the deprecation warning, and the documentation of os.fork() outlines compatibility problems in this situation. I think it does not make much sense to invest a lot of time in this scheduler, so I will look for another approach.
Because I need to execute a long-running Wolfram script, this could be done with subprocess.run() instead of forking the whole Python process. It is a bit sad, because your scheduler does exactly what I need.
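
A minimal sketch of that idea follows, assuming a POSIX system. It uses subprocess.Popen rather than subprocess.run so the caller does not block, and start_new_session=True so the child gets its own session. The helper name launch_detached and the Python one-liner standing in for the wolframscript command are assumptions for illustration only.

```python
import os
import subprocess
import sys
import tempfile


def launch_detached(command, log_path):
    """Start command in its own session, sending stdout and stderr to log_path."""
    with open(log_path, "wb") as log:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX only: the child calls setsid()
        )


# Stand-in for the long-running script: a trivial Python one-liner
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "job.log")
    proc = launch_detached([sys.executable, "-c", "print('done')"], log_path)
    exit_code = proc.wait()  # only for this demo; a real caller would not block here
    with open(log_path) as f:
        log_text = f.read()
```

Because no fork of the whole Python process is involved, this avoids the multi-thread fork warning, at the cost of having to track the child pid and log file by hand.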

Many thanks and best regards
Stefan Messmer

Massimo DiPierro

Nov 16, 2025, 5:01:32 PM
to Stefan Messmer, py4web
We fork instead of using subprocess by design, in order to daemonize the tasks. That way, if the scheduler stops for any reason, the tasks do not end.

Fork should work fine on macOS, which is a POSIX, BSD-based system.

It is possible that it somehow affects your specific tasks, and I would like to understand why. But the error suggests that the task tries to access a file that does not exist, and I do not see how this is related to forking. Yet without seeing the code I cannot exclude it either.

I will make a version of the scheduler that uses subprocess instead of fork. Tasks will crash if py4web is stopped, but perhaps it will help in understanding the issue.

Massimo



Stefan Messmer

Nov 17, 2025, 12:00:23 PM
to py4web
I got the scheduler to work again. I started debugging with the _scaffold app until I could verify that the scheduler tasks were working as expected. I first modified the default task to:
def my_task(**inputs):
    print(f"task running with {inputs}")
    try:
        # do something here
        for count in range(60):
            print(time.ctime())
            # Prints the current time once per minute
            time.sleep(60)
    except:
        # rollback on failure
        db.rollback()
    return {}
This task runs for a long time (60 iterations of 60 seconds as posted), so I have plenty of time to inspect everything (files, database, pid).
After I had verified that it was working as expected, I transferred it to my app and saw that the scheduler works. The rest was hard line-by-line debugging. I found that a database query produced the error (the following line):
0.024*inspection.detectionrate*db(db.jumptable.inspection_id==inspection_id).count()

The line sits inside a try block and raises an exception (see the original code). However, the error message did not appear in the output, because the except clause replaces it with a generic "failed" message, and this gave me a really hard time. I never thought to look there.
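
A pattern that would have surfaced the problem sooner is to put the traceback into the task output instead of a generic message. The following is a minimal sketch, not the app's code: the helper name run_step is hypothetical, and multiplying by None is only a made-up way of triggering a failure, since the thread does not say what actually went wrong in the query.

```python
import traceback


def run_step(step, label):
    """Run one callable and return its result text, or the full traceback on failure."""
    try:
        return f"{label}: {step()}\n"
    except Exception:
        # Keep the real error instead of replacing it with a generic message
        return f"{label} failed:\n{traceback.format_exc()}"


ok_output = run_step(lambda: 2 * 21, "multiply")
bad_output = run_step(lambda: 0.024 * None, "anomaly_count")
```

With this approach the failing expression and the exception type end up in the returned text, so they would be visible wherever the task result is inspected.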

P.S. My concerns regarding the deprecation warning remain.

Many thanks and best regards
Stefan

