pdftk and subprocess


Brennan

Jun 3, 2008, 3:45:06 PM
to modwsgi
Hi:

I've got a Pylons site running on Apache with mod_wsgi 2.0 in daemon
mode on a Linux server. I have a page that takes an Adobe FDF file,
uses the pdftk program to merge the FDF with a PDF, and produces a
flattened PDF file to return to the browser.

The relevant code is as follows:

args = '/usr/bin/pdftk /var/pylons/site/public/pdf/doc.pdf fill_form - output - flatten dont_ask'
proc = subprocess.Popen(args,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        cwd='/var/pylons/',
                        shell=True)
ret, err = proc.communicate(str(c.app.fdf()))
response.headers['Content-type'] = 'application/pdf'
return ret

Running this code under paster works fine: the PDF file is returned.
But after migrating to mod_wsgi, the code seems to hang on the
proc.communicate() line. I've tried every variation I can think of,
including not using pipes for stdin and stdout but writing all data
to files instead, and not calling proc.communicate() at all. In that
case, the function returns, but the pdftk process is left running and
never exits.

I am able to log in as the same user Apache is running as and run
the command at the command line without issue. Other commands (ls,
pwd, echo $PATH, etc.) work fine.

Does anyone have any insight on what is happening here, or any ideas
on how I can track down more information about what's happening?

Graham Dumpleton

Jun 3, 2008, 6:14:35 PM
to mod...@googlegroups.com
2008/6/4 Brennan <bren...@gmail.com>:

See prior discussion:

http://groups.google.com/group/modwsgi/browse_frm/thread/d2dd16fe291363ee

That person was unable to give me a good enough explanation of how
pdftk worked, or of why the problem could be the signal issue he
claimed it was.

In the end he just went away and we never knew if he solved it or not.

If you can try and explain as much as possible about how pdftk works,
maybe this time we can work out what is going on.

Graham

Brennan

Jun 3, 2008, 7:08:31 PM
to modwsgi
I guess I didn't search well enough before posting.

Unfortunately I don't know a whole lot about how it works, or what
signals it's sending/receiving/expecting. That previous discussion
helps a bit, maybe. pdftk is indeed written in Java and
compiled using gcj, so according to "the internet" it could be
expecting SIGPWR to "synchronize cross-thread garbage
collection" (http://en.wikipedia.org/wiki/SIGPWR). If mod_wsgi (or
anything else) is interfering with the program sending/receiving it, I
suppose bad stuff could be happening.

I'm not familiar enough with this stuff though. Let me know what kind
of information you need, and some rough pointers on how to get it, and
I'll be more than happy to help you help me!

I figured out how to use gdb and ran the program through it. I
don't see how this could help at all, but what the heck:

[New Thread 0xb6727b90 (LWP 25346)]

Program received signal SIGPWR, Power fail/restart.
[Switching to Thread 0xb6727b90 (LWP 25346)]
0xb7f1d410 in __kernel_vsyscall ()
(gdb) c
Continuing.

Program received signal SIGXCPU, CPU time limit exceeded.
0xb7f1d410 in __kernel_vsyscall ()
(gdb) c
Continuing.

Program received signal SIGPWR, Power fail/restart.
0xb7f1d410 in __kernel_vsyscall ()
(gdb) c
Continuing.

Program received signal SIGXCPU, CPU time limit exceeded.
0xb7f1d410 in __kernel_vsyscall ()
(gdb) c
Continuing.

Program exited normally.

I also tried registering a signal handler in my python function:
import signal
def handler(signum, frame):
    pass
signal.signal(signal.SIGPWR, handler)

But "signal only works in main thread" according to the exception it
now throws. This is the first time I've ever dealt with signals in
python, so I may be doing something stupid there :).
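Independently of mod_wsgi, that exception comes from CPython itself: signal.signal() may only be called from the main thread, and a request handler normally runs in a worker thread. A minimal sketch reproducing this, assuming Python 3 (the helper names are illustrative, not from this thread). Note also that a handler installed in the parent could not help pdftk directly, because exec resets caught signals to their default action in the new program.

```python
import signal
import threading


def try_register(signum=signal.SIGPWR):
    """Try to install a no-op handler; return the error text, or None on success."""
    try:
        signal.signal(signum, lambda s, f: None)
    except ValueError as exc:
        return str(exc)
    return None


def register_from_worker_thread():
    """Call try_register() from a non-main thread, as a request handler would."""
    result = []
    t = threading.Thread(target=lambda: result.append(try_register()))
    t.start()
    t.join()
    return result[0]


if __name__ == "__main__":
    print(register_from_worker_thread())
```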

Thanks,

Brennan


Graham Dumpleton

Jun 3, 2008, 7:31:00 PM
to mod...@googlegroups.com
2008/6/4 Brennan <bren...@gmail.com>:

>
> I guess I didn't search well enough before posting.
>
> Unfortunately I don't know a whole lot about how it works, or what
> signals it's sending/receiving/expecting. That previous discussion
> helps a bit though, maybe. pdftk is indeed written in java, and
> compiled using gcj - so according to "the internet" it could be
> expecting the SIGPWR to "synchronize cross-thread garbage
> collection" (http://en.wikipedia.org/wiki/SIGPWR). If mod_wsgi (or
> anything else) is interfering with the program sending/receiving it, I
> suppose bad stuff could be happening.

As far as I can work out, I can't see how Apache and/or
mod_wsgi could be interfering with SIGPWR, especially since it is in
the context of a separately executed process. This is why the
original person's claim didn't make much sense.

>
> [New Thread 0xb6727b90 (LWP 25346)]
>
> Program received signal SIGPWR, Power fail/restart.
> [Switching to Thread 0xb6727b90 (LWP 25346)]
> 0xb7f1d410 in __kernel_vsyscall ()
> (gdb) c
> Continuing.
>
> Program received signal SIGXCPU, CPU time limit exceeded.
> 0xb7f1d410 in __kernel_vsyscall ()
> (gdb) c
> Continuing.

This is interesting though. Do you have CPU limits set on processes in any way?

> I also tried registering a signal handler in my python function:
> import signal
> def handler(signum,frame):
> pass
> signal.signal(signal.SIGPWR, handler)
>
> But "signal only works in main thread" according to the exception it
> now throws. This is the first time I've ever dealt with signals in
> python, so I may be doing something stupid there :).

I'd be curious to see the exception message you are talking about.

The mod_wsgi package disables signal.signal() so that it does nothing,
and logs a warning in the Apache error logs instead, unless you have
disabled that feature of mod_wsgi.

Graham

bren...@gmail.com

Jun 3, 2008, 7:50:34 PM
to mod...@googlegroups.com
According to the SIGPWR Wikipedia page, both SIGPWR and SIGXCPU are used by the garbage collector in gcj. There are also many mentions of how to configure gdb to ignore those two signals for that very reason.

It's possible this signal business is a red herring if mod_wsgi or Apache don't interfere with them.

I'm curious how to check...

Sent via BlackBerry from T-Mobile

-----Original Message-----
From: "Graham Dumpleton" <graham.d...@gmail.com>

Date: Wed, 4 Jun 2008 09:31:00
To:mod...@googlegroups.com
Subject: [modwsgi] Re: pdftk an subprocess

Graham Dumpleton

Jun 3, 2008, 9:55:03 PM
to mod...@googlegroups.com
2008/6/4 <bren...@gmail.com>:

> I'm curious how to check...

My guess at a way of checking would be to use similar WSGI
application code but run a test Python script instead of pdftk. Have
that script go into a loop that sleeps for 1 second each time, up to a
maximum of 100 times, so it eventually exits. Before entering the
loop, output the process ID somewhere you can see it, and then
register signal handlers for SIGPWR and SIGXCPU. Have the signal
handlers print something somewhere you can see it.

When the request is handled and the script is run, find the PID of the
executed script process and send it SIGPWR and SIGXCPU to see whether
it actually receives the signals. If not, then at the start of the
script, perhaps also dump out the current signal mask (sigmask) for
that process.
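A minimal sketch of such a test script, assuming Python 3.5+ on Linux. It is not Brennan's pastebin script; the log file is passed as the only argument, and the names main() and log() are illustrative. It records its PID and its inherited signal mask, installs handlers for SIGPWR and SIGXCPU, then sleeps in one-second steps up to 100 times.

```python
import os
import signal
import sys
import time


def log(msg, path):
    # Append and close each time so output survives if the process is killed.
    with open(path, "a") as f:
        f.write(msg + "\n")


def main(path, iterations=100, delay=1.0):
    def handler(signum, frame):
        log("Sig Handler got %s" % signal.Signals(signum).name, path)

    # Record the PID so the process can be signalled from another shell.
    log("pid %d" % os.getpid(), path)

    # SIG_BLOCK with an empty set changes nothing but returns the current
    # mask, i.e. the signals this process inherited as blocked.
    blocked = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    log("blocked: %s" % sorted(int(s) for s in blocked), path)

    signal.signal(signal.SIGPWR, handler)
    signal.signal(signal.SIGXCPU, handler)

    # Sleep in short steps, capped so the script eventually exits.
    for _ in range(iterations):
        time.sleep(delay)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write("usage: %s LOGFILE\n" % sys.argv[0])
    else:
        main(sys.argv[1])
```

Sending os.kill(pid, signal.SIGPWR) from another shell should add a "Sig Handler got SIGPWR" line. If the signals are blocked, they stay pending instead, no handler lines appear, and 24 and 30 (SIGXCPU and SIGPWR on x86 Linux) should show up on the "blocked:" line.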

Graham

Brennan

Jun 4, 2008, 11:03:26 AM
to modwsgi
Okay - as you suggested, I wrote a python script to run rather than
pdftk:

http://pastebin.com/m47012b4d

It's not pretty, but whatever...

So in my Pylons controller, I call this script and the browser hangs
(because this will not return until the signals are received). Then
from the command line, I run ps ax -H, and get the pid of the running
script. Then in a python shell:

import os, signal
os.kill(pid,signal.SIGPWR)
os.kill(pid,signal.SIGXCPU)

At which point, nothing happens. The running script does NOT receive
the signals. This can be verified by checking the output in the
testsigout.txt file - there will be no "Sig Handler got..." lines.

Running the script from command line behaves as expected.

I'm not familiar with getting a sigmask for a process.
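On Linux, one way to read another process's signal mask is the SigBlk line of /proc/<pid>/status: a hexadecimal bitmask in which bit n-1 is set when signal n is blocked. A minimal sketch, assuming Linux and Python 3.5+ (the function names are illustrative):

```python
import os
import signal
import sys


def decode_sigmask(hexmask):
    """Turn a SigBlk-style hex mask into a set of signal numbers."""
    mask = int(hexmask, 16)
    # Bit (n - 1) is set when signal n is in the mask.
    return {n for n in range(1, mask.bit_length() + 1) if (mask >> (n - 1)) & 1}


def blocked_signals(pid):
    """Return the set of signals blocked by process `pid` (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("SigBlk:"):
                return decode_sigmask(line.split()[1])
    raise ValueError("no SigBlk line for pid %d" % pid)


def describe(signums):
    """Map signal numbers to names where Python knows one."""
    names = []
    for n in sorted(signums):
        try:
            names.append(signal.Signals(n).name)
        except ValueError:
            names.append(str(n))  # e.g. unnamed real-time signals
    return names


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(describe(blocked_signals(pid)))
```

Pass the PID found with ps ax -H as the first argument; with no argument it reports on itself. For a multi-threaded process this file reflects the main thread; per-thread masks live under /proc/<pid>/task/<tid>/status.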

Brennan


Graham Dumpleton

Jun 5, 2008, 3:56:41 AM
to mod...@googlegroups.com
Interesting.

If you haven't already, tell me which operating system variant/version
you are using.

I'll try a similar test on MacOSX, but it may take me a couple of days
to get myself set up to do that.

Graham

2008/6/4 Brennan <bren...@gmail.com>:

Brennan Todd

Jun 5, 2008, 9:52:14 AM
to mod...@googlegroups.com
This is on Gentoo Linux.

Graham Dumpleton

Jun 8, 2008, 9:38:48 AM
to mod...@googlegroups.com
Okay, I know what it is about mod_wsgi that causes this problem with signals.

What mod_wsgi daemon mode does is block signals so that they
aren't handled in the worker threads dealing with requests. Signals
are then waited upon and dealt with in the main thread, which also
handles process shutdown.

The problem is that the blocking of signals for the worker threads is
inherited across a fork/exec, which isn't what I expected.
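This matches POSIX behaviour: a child created with fork() inherits the calling thread's signal mask, and execve() preserves it. The sketch below approximates a daemon-mode worker thread (it is not mod_wsgi's actual code) by blocking SIGPWR and SIGXCPU in one thread only, then starting `cat /proc/self/status` from that thread and reading which signals the new program begins with blocked. Assumes Linux and Python 3.5+; the function names are illustrative.

```python
import signal
import subprocess
import threading

# The two signals gcj's garbage collector relies on, per the SIGPWR page cited above.
GC_SIGNALS = {signal.SIGPWR, signal.SIGXCPU}


def child_blocked_signals():
    """Start an external program and return the signals it begins with blocked."""
    out = subprocess.run(["cat", "/proc/self/status"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("SigBlk:"):
            mask = int(line.split()[1], 16)
            # Bit (n - 1) set means signal n is blocked.
            return {n for n in range(1, mask.bit_length() + 1) if (mask >> (n - 1)) & 1}
    raise RuntimeError("SigBlk line not found")


def spawn_from_blocking_worker():
    """Block the GC signals in a worker thread only, then spawn from that thread."""
    result = {}

    def worker():
        # Changes this thread's mask only, roughly like a daemon-mode request thread.
        signal.pthread_sigmask(signal.SIG_BLOCK, GC_SIGNALS)
        result["blocked"] = child_blocked_signals()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["blocked"]


if __name__ == "__main__":
    print("inherited:", GC_SIGNALS <= spawn_from_blocking_worker())
```

If the mask is inherited as described, this prints `inherited: True`: the main thread never blocked anything, yet the program started from the worker thread begins life with both signals blocked, just as pdftk did.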

Fixing this issue will be tricky and will need some rework of
mod_wsgi, moving away from relying solely on signals to ensure
daemon process shutdown, toward a more complicated arrangement
using a pipe of death. I was looking at using a pipe of death anyway,
as it is the only safe way of doing things once transient processes
are introduced.

I will create an issue on the mod_wsgi site for this. Sorry to say, it
will not be something that is addressed quickly. The only way around
it would be to write a C wrapper program for executing pdftk that
removes the blocks on those signals.

Graham

2008/6/5 Graham Dumpleton <graham.d...@gmail.com>:

Brennan Todd

Jun 16, 2008, 10:30:40 AM
to Graham Dumpleton, mod...@googlegroups.com
Okay. Thank you for the update!

Brennan Todd

Jun 19, 2008, 1:52:44 PM
to Graham Dumpleton, mod...@googlegroups.com
Okay - programming in C is not my strong suit. Aside from maybe two classes in college, I've never really done it. That being said, here (attached) is a wrapper program I wrote just to see if I could do it.

I assume it's horrible, and will give wily hackers of nefarious intent easy access to my social security number and bank accounts... but it works!

It will clear all blocked signals, and execute whatever command you pass in as arguments. So instead of calling "pdftk [args]", I can call "exec pdftk [args]", and my pylons page will now correctly return my flattened PDF file.

If anyone out there with actual C skills and time to spare would like to provide some comments or fixes to it, please do.
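For anyone who would rather avoid C, the same idea can be sketched in Python 3.3+. This is not the attached exec.c, and the script name unblock_exec.py is hypothetical: it clears the inherited signal mask with signal.pthread_sigmask(), then replaces itself with the target command via os.execvp(). Because the mask survives exec, pdftk then starts with no signals blocked. The wrapper is a fresh single-threaded process, so clearing the calling thread's mask is enough.

```python
#!/usr/bin/env python3
"""Hypothetical unblock_exec.py: a Python stand-in for the exec.c idea.

Usage: unblock_exec.py command [args...]
"""
import os
import signal
import sys


def unblock_all_and_exec(argv):
    """Clear this thread's signal mask, then replace the process with argv."""
    # SIG_SETMASK with an empty set leaves no signals blocked.
    signal.pthread_sigmask(signal.SIG_SETMASK, [])
    # execvp searches PATH and does not return on success; the new
    # program keeps the (now empty) signal mask.
    os.execvp(argv[0], argv)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.stderr.write("usage: %s command [args...]\n" % sys.argv[0])
    else:
        unblock_all_and_exec(sys.argv[1:])
```

In the controller, the args string would then start with the wrapper's path followed by /usr/bin/pdftk and its usual arguments. Python's subprocess also has a preexec_fn hook that could unblock signals in the child before exec, but its documentation warns that it is not safe in the presence of threads, which is exactly the daemon-mode situation.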

On Sun, Jun 8, 2008 at 8:38 AM, Graham Dumpleton <graham.d...@gmail.com> wrote:
exec.c