Dead scheduler


Stefan Messmer

Nov 8, 2025, 10:45:04 AM (5 days ago) Nov 8
to py4web
Hello,
I recently ran into a problem with the scheduler. I get the following error:
<missing reason="[Errno 2] No such file or directory: '/tmp/scheduler/1.txt'"/>
</run status="dead" completed_on="2025-11-08 14:24:23.622943">
when enqueueing a job. I am not aware of having changed anything.
The /tmp/scheduler folder exists and is readable and writable by the Python process.

How can I get the scheduler working again (on macOS)?

Many thanks and best regards
Stefan


Massimo DiPierro

Nov 9, 2025, 9:59:59 AM (4 days ago) Nov 9
to py4web
The scheduler does not create or manage a file /tmp/scheduler/1.txt. Perhaps your task does. Can you share your task?

Stefan Messmer

Nov 9, 2025, 10:37:33 AM (4 days ago) Nov 9
to py4web
Here is the code of the task:

# define your tasks (or import them from other files)
import subprocess

def my_task(inspection_id, myjobs=["jumptable", "diameters", "laylengths", "anomalies"], timeout=None):
    commands = {
        "jumptable": f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/createjumptable.wls {inspection_id}",
        "diameters": f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/diameter.wls {inspection_id}",
        "laylengths": f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/laylength.wls {inspection_id}",
        "anomalies": f"wolframscript -script /opt/local/share/py4web/apps/RopeInspector/wolfram/finddamages.wls {inspection_id}",
    }
    inspection = db.inspections[inspection_id]
    output = ""
    if inspection is not None:
        if inspection.status != "Ongoing":
            try:
                for job in myjobs:
                    anomaly_count = 0.024 * inspection.detectionrate * db(db.jumptable.inspection_id == inspection_id).count()
                    if (job == "jumptable" and db(db.jumptable.inspection_id == inspection_id).count() == 0) or \
                       (job == "diameters" and db(db.diameters.inspection_id == inspection_id).count() == 0) or \
                       (job == "laylengths" and db(db.lay_lengths.inspection_id == inspection_id).count() == 0) or \
                       (job == "anomalies" and db(db.anomalies.inspection_id == inspection_id).count() <= anomaly_count):
                        p = subprocess.run(commands[job], shell=True, check=True, capture_output=True, timeout=timeout)
                        output += f"Command {p.args} exited with {p.returncode} code, output:\n{p.stdout}\n"
                    else:
                        output += f"Inspection id={inspection_id} has already {job}.\n"
            except Exception:
                output = f"Job={job} failed with inspection id {inspection_id}.\n"
        else:
            output = f"Inspection id={inspection_id} is busy."
    else:
        output = f"Inspection id={inspection_id} does not exist."
    return output


if settings.USE_SCHEDULER:
    # register your tasks with the scheduler
    scheduler.register_task("wolframscript", my_task)
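
Since the error suggests the child process never produced any output, it may help to run the same command outside the scheduler first, to rule out the task itself. The sketch below is not part of py4web; run_command is a made-up helper that mimics the subprocess.run call in my_task, and a trivial python -c command stands in for the wolframscript calls:

```python
import subprocess
import sys

def run_command(cmd, timeout=None):
    """Run a shell command the same way my_task does and report the result."""
    try:
        p = subprocess.run(
            cmd, shell=True, check=True,
            capture_output=True, text=True, timeout=timeout,
        )
        return f"Command {p.args} exited with {p.returncode}, output:\n{p.stdout}\n"
    except subprocess.CalledProcessError as err:
        return f"Command failed with {err.returncode}: {err.stderr}\n"
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout}s.\n"

# stand-in for one of the wolframscript commands
print(run_command(f"{sys.executable} -c 'print(42)'"))
```

If this runs cleanly from a plain Python shell on the same machine, the command and its permissions are fine and the problem lies in how the scheduler launches the child.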


As far as I can see, the process with the pid recorded in db.task_run was never started. I'm not sure, but I think the problem is in this function:


def make_daemon(func, filename, cwd="."):
    """Creates a daemon process running func in cwd and stdout->filename"""
    if os.fork():
        return
    # decouple from parent environment
    os.chdir(cwd)
    # os.setsid()
    os.umask(0)
    # do second fork
    if os.fork():
        sys.exit(0)
    # redirect standard file descriptors
    sys.__stdout__.flush()
    sys.__stderr__.flush()
    with open(os.devnull, "rb") as stream_in:
        os.dup2(stream_in.fileno(), sys.__stdin__.fileno())
        with open(filename, "wb") as stream_out:
            os.dup2(stream_out.fileno(), sys.__stdout__.fileno())
            os.dup2(stream_out.fileno(), sys.__stderr__.fileno())
            try:
                func()
            finally:
                stream_out.flush()
    sys.exit(0)
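
This is not py4web's code, only a sketch of an alternative: since the macOS problems are specific to os.fork(), the same effect ("run something detached, with stdout redirected to a file") can be had by spawning a fresh interpreter with subprocess instead of forking. make_daemon_subprocess is a hypothetical name, and it takes a code string rather than a callable:

```python
import subprocess
import sys
import tempfile

def make_daemon_subprocess(code, filename, cwd="."):
    """Run `code` in a freshly spawned interpreter (no fork),
    with stdout/stderr redirected to `filename`."""
    with open(filename, "wb") as stream_out:
        return subprocess.Popen(
            [sys.executable, "-c", code],
            cwd=cwd,
            stdin=subprocess.DEVNULL,
            stdout=stream_out,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from our session, like the commented-out os.setsid()
        )

# smoke test: the child writes to the output file the way the scheduler expects
out = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
p = make_daemon_subprocess("print('hello from child')", out.name)
p.wait()
print(open(out.name).read())
```

The trade-off is that a spawned child does not inherit the parent's in-memory state (the func closure), so the work has to be expressed as a command line, which the wolframscript calls in my_task already are.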


Many thanks for your help and best regards

Stefan


Massimo DiPierro

Nov 9, 2025, 8:04:09 PM (4 days ago) Nov 9
to py4web
I still do not see where /tmp/scheduler/1.txt is accessed, which seems to be the task error. Anyway, I will take a second look at make_daemon.

Stefan Messmer

Nov 10, 2025, 1:51:26 PM (3 days ago) Nov 10
to py4web
The error message is generated in line 347 of scheduler.py (see below):

def retrieve_log(self, run):
    """Retrieve the log for the run"""
    try:
        filename = self.get_output_filename(run)
        with open(filename, "rb") as stream:
            log = stream.read().decode("utf8", errors="ignore")
        os.unlink(filename)
    except Exception as err:
        log = f'<missing reason="{err}"/>'
    return log.strip()
I found the following note in the Python documentation for os.fork():

Fork a child process. Return 0 in the child and the child's process id in the parent. If an error occurs OSError is raised.

Note that some platforms including FreeBSD <= 6.3 and Cygwin have known issues when using fork() from a thread.

Raises an auditing event os.fork with no arguments.

Warning: If you use TLS sockets in an application calling fork(), see the warning in the ssl documentation.

Warning: On macOS the use of this function is unsafe when mixed with using higher-level system APIs, and that includes using urllib.request.

Changed in version 3.8: Calling fork() in a subinterpreter is no longer supported (RuntimeError is raised).

Changed in version 3.12: If Python is able to detect that your process has multiple threads, os.fork() now raises a DeprecationWarning.

We chose to surface this as a warning, when detectable, to better inform developers of a design problem that the POSIX platform specifically notes as not supported. Even in code that appears to work, it has never been safe to mix threading with os.fork() on POSIX platforms. The CPython runtime itself has always made API calls that are not safe for use in the child process when threads existed in the parent (such as malloc and free).

Users of macOS or users of libc or malloc implementations other than those typically found in glibc to date are among those already more likely to experience deadlocks running such code.

See this discussion on fork being incompatible with threads for technical details of why we’re surfacing this longstanding platform compatibility problem to developers.

Availability: POSIX, not WASI, not Android, not iOS.
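
The 3.12 warning only fires when the parent process already has more than one thread, which is exactly the situation of a web server running a scheduler thread. A quick way to check whether a given process is in that state (fork_is_risky is a made-up helper, not anything from py4web or the stdlib):

```python
import threading

def fork_is_risky():
    """True if this process has multiple threads -- the condition under
    which CPython 3.12+ warns about os.fork(), and under which mixing
    fork with threads has never been safe on POSIX."""
    return threading.active_count() > 1

# any background thread (e.g. a scheduler loop) makes this True
t = threading.Thread(target=lambda: threading.Event().wait(0.5))
t.start()
print(fork_is_risky())
t.join()
```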

In fact the scheduler did not work very reliably even before it stopped working entirely. I observed the following problems:

  • dead tasks from time to time
  • some scheduled tasks never started

I did not change anything related to the scheduler tasks at the time it stopped working... It is very strange...

Best regards
Stefan


