what i want to achieve:
i have a cgi file, that writes an entry to a text-file..
like a log entry (when it was invoked, when its work ended).
it's one line of text.
the problem is:
what happens if 2 users invoke the cgi at the same time?
and it will happen, because i am now trying to stress test it, so i will
start 5-10 requests in parallel and so on.
so, how does one synchronize several processes in python?
first idea was that the cgi will create a new temp file every time,
and at the end of the stress-test, i'll collect the content of all those
files. but that seems like a stupid way to do it :(
another idea was to use a simple database (sqlite?) which probably has
this problem solved already...
any better ideas?
There was a thread about this recently ("low-end persistence
strategies") and for Unix the simplest answer seems to be the
fcntl.flock function. For Windows I don't know the answer.
Maybe os.open with O_EXCL works.
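For what it's worth, the O_EXCL idea can be sketched as a lock-file loop that works on both Unix and Windows (the lock-file name and retry interval here are made up):

```python
import os
import time

LOCKFILE = "logfile.lock"  # hypothetical name for the lock file

def acquire():
    # O_CREAT | O_EXCL is an atomic test-and-create: the call fails
    # if the file already exists, i.e. another process holds the lock.
    while True:
        try:
            fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            time.sleep(0.05)  # lock is taken; wait and retry

def release():
    os.remove(LOCKFILE)
```

Between acquire() and release() the process can append to the log safely; the classic weakness is that a crash in between leaves a stale lock file behind.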
This is a very hard problem to solve in the general case, and the answer
depends more on the operating system you're running on than on the
programming language you're using.
On the other hand, you said that each process will be writing a single line
of output at a time. If you call flush() after each message is written,
that should be enough to ensure that each line gets written in a single
write system call, which in turn should be good enough to ensure that
individual lines of output are not scrambled in the log file.
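A minimal sketch of that flush-per-line idea (the function name and the already-open file object are assumptions):

```python
def log_line(logfile, msg):
    # One line per write, flushed immediately: since the buffer is
    # empty before each call, the whole line should leave the process
    # in a single write() system call.
    logfile.write("%s\n" % msg)
    logfile.flush()
```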
If you want to do better than that, you need to delve into OS-specific
things like the flock function in the fcntl module on unix.
Unfortunately this assumes that the open() call will always succeed,
when in fact it is likely to fail sometimes when another file has
already opened the file but not yet completed writing to it, AFAIK.
> If you want to do better than that, you need to delve into OS-specific
> things like the flock function in the fcntl module on unix.
The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved this
problem. But they haven't. They depend on messy things like server processes
constantly running, which goes against the idea of a cgi that only
runs when someone calls it.
Perhaps, but a relational database seems like a pretty heavy-weight
solution for a log file.
SQLite is an in-process dbm.
(7) Can multiple applications or multiple instances of the same
application access a single database file at the same time?
Multiple processes can have the same database open at the same
time. Multiple processes can be doing a SELECT at the same
time. But only one process can be making changes to the database at
any moment in time, however.
But multiple processes changing the database simultaneously is
precisely what the OP wants to do.
On the other hand, it works ;-)
Gerhard Häring - g...@ghaering.de - Python, web & database development
And collating into an 'official log file' can be done periodically by
another process, on a time-scale that is 'useful' if not immediate.
Just trying to understand here...
Er, no. The OP wants exactly one process to be able to write at a time. If he was happy with multiple processes writing simultaneously, he wouldn't need any locking mechanism at all >:)
If you keep reading that FAQ entry, you discover that SQLite implements its own locking mechanism internally, allowing different processes to *interleave* writes to the database, and preventing any data corruption which might arise from simultaneous writes.
That said, I think an RDBM is a ridiculously complex solution to this simple problem. A filesystem lock, preferably using the directory or symlink trick (but flock() is fun too, if you're into that sort of thing), is clearly the solution to go with here.
Not in my experience. At least under Unix, it's perfectly OK
to open a file while somebody else is writing to it. Perhaps
Windows can't deal with that situation?
Grant Edwards   grante at visi.com
Yow! FOOLED you! Absorb EGO SHATTERING impulse rays, polyester poltroon!!
Hmm... just tried it: you're right! On the other hand, the results were
unacceptable: each process has a separate file pointer, so it appears
whichever one writes first will have its output overwritten by the next.
The details change, but the heart of my objection is the same.
What isn't described in the above quote from the FAQ is how SQLite
*protects* your data from corruption in this case, unlike the "raw"
approach where you just use file handles.
And PySQLite conveniently wraps the relevant calls with retries when the
database is "locked" by the writing process, making it roughly a
no-brainer to use SQLite databases as nice simple log files where you're
trying to write from multiple CGI processes like the OP wanted.
Disclaimer: I haven't actually done that myself, and have only started
playing with pysqlite2 a day ago, but I have spent a fair bit of time
experimenting and reading the relevant docs and I believe I've got this right.
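As a sketch of that approach with the standard-library sqlite3 module (the successor to the pysqlite2 wrapper mentioned above; the table name and schema are made up), where the timeout argument is what buys you the retry-while-locked behaviour:

```python
import sqlite3
import time

def log_line(dbfile, msg):
    # timeout=10 makes sqlite3 keep retrying for up to 10 seconds
    # while another process holds the database's write lock.
    con = sqlite3.connect(dbfile, timeout=10)
    with con:  # commits, and so releases the lock, on exit
        con.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, msg TEXT)")
        con.execute("INSERT INTO log VALUES (?, ?)", (time.time(), msg))
    con.close()
```

Each CGI process just calls log_line(); SQLite's own locking keeps the writes from corrupting each other.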
Oh, ok. But what kind of locks does it use?
It doesn't really matter, does it?
I'm sure the locking mechanisms it uses have changed between different releases, and may even be selected based on the platform being used.
Huh? Sure, if there's some simple way to accomplish the locking, the
OP's app can do the same thing without SQLite's complexity.
> I'm sure the locking mechanisms it uses have changed between
> different releases, and may even be selected based on the platform
> being used.
Well, yes, but WHAT ARE THEY??????
Did you open the files for 'append' ?
Umm... the part you were right about was NOT the possibility that
Windows can't deal with the situation, but the suggestion that it might
actually be able to (since apparently it can). Sorry to confuse.
Beats me, and I'm certainly not going to dig through the code to find out :) For the OP's purposes, the mechanism I mentioned earlier in this thread is almost certainly adequate. To briefly re-summarize, when you want to acquire a lock, attempt to create a directory with a well-known name. When you are done with it, delete the directory. This works across all platforms and filesystems likely to be encountered by a Python program.
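That directory trick, as a minimal sketch (the directory name and polling interval are made up):

```python
import os
import time

LOCKDIR = "mylog.lock.d"  # well-known name, hypothetical

def with_lock(func):
    # mkdir either creates the directory or raises: an atomic
    # test-and-set on every common filesystem.
    while True:
        try:
            os.mkdir(LOCKDIR)
            break
        except FileExistsError:
            time.sleep(0.05)  # someone else holds the lock; retry
    try:
        func()
    finally:
        os.rmdir(LOCKDIR)  # release the lock even if func() raised
```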
I think the FAQ can answer that better than I can, since I'm not sure
whether you're asking about any low-level (OS) locks it might use or
higher-level (e.g. database-level locking) that it might use. In
summary, however, at the database level it provides only coarse-grained
locking on the entire database. It *is* supposed to be a relatively
simple/lightweight solution compared to typical RDBMSes...
(There's also an excruciating level of detail about this whole area in
the page at http://www.sqlite.org/lockingv3.html ).
Nope. I suppose that would be a rational thing to do for log files,
wouldn't it? I wonder what happens when one does that...
Compared to what the OP was asking for, which was a way to synchronize
appending to a serial log file, SQLite is very complex. It's also
much more complex than (say) the dbm module, which is what Python apps
normally use as a lightweight db.
> (There's also an excruciating level of detail about this whole area in
> the page at http://www.sqlite.org/lockingv3.html ).
Oh ok, it says it uses some special locking system calls on Windows.
Since those calls aren't in the Python stdlib, it must be using C
extensions, which again means complexity. But it looks like the
built-in msvcrt module has ways to lock parts of files in Windows.
Really, I think the Python library is somewhat lacking in not
providing a simple, unified interface for doing stuff like this.
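A sketch of what such a unified interface could look like, using msvcrt on Windows and fcntl everywhere else (locking only the first byte on the Windows side is a simplification, not how a real library would do it):

```python
import sys

if sys.platform == "win32":
    import msvcrt

    def lock(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)  # lock first byte

    def unlock(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def lock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until free

    def unlock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```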
Would BerkeleyDB support that?
On windows, with PyWin32, read this little sample-code:

import win32file, win32con, pywintypes

file = open("FLock.txt", "r+")
hfile = win32file._get_osfhandle(file.fileno())
win32file.LockFileEx(hfile, win32con.LOCKFILE_EXCLUSIVE_LOCK,
                     0, 0xffff, pywintypes.OVERLAPPED())
for i in range(500):
    file.write("%d\n" % i)   # write while holding the lock
win32file.UnlockFileEx(hfile, 0, 0xffff, pywintypes.OVERLAPPED())
file.close()
It's got one. Well, three, actually.
The syslog module solves the problem quite nicely, but only works on
Unix. If the OP is working on Unix systems, that may be a good
The logging module has a SysLogHandler that talks to syslog on
Unix. It also has an NTEventLogHandler for use on NT. I'm not familiar
with NT's event log, but I presume it has the same kind of
functionality as Unix's syslog facility.
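A minimal sketch of pointing the logging module at syslog (the logger name and the UDP address are assumptions; on Linux you would more likely pass address='/dev/log'):

```python
import logging
from logging.handlers import SysLogHandler

logger = logging.getLogger("mycgi")  # hypothetical logger name
logger.setLevel(logging.INFO)
# 514 is syslog's traditional UDP port; the local '/dev/log' socket
# is the usual choice on Linux systems.
logger.addHandler(SysLogHandler(address=("localhost", 514)))
logger.info("cgi invoked")
```

Since the syslog daemon serializes messages itself, the CGI processes need no locking of their own.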
I then wrote a simple function to write then flush what it is passed:

def log(msg):
    foo.write("%s\n" % msg)
    foo.flush()
I then opened another terminal and did 'tail -f myfile.txt'.
It worked just fine.
Maybe that will help. Seems simple enough to me for basic logging.
actually this is what i implemented after asking the question, and it works.
i just thought that maybe there is a solution where i don't have to deal
with 4000 files in the temp folder :)
but the problem now is that the cgi will have to wait for that directory
to be gone when it is invoked.. and i do not want to code that :)
i'm too lazy..
so basically i want the code to TRY to write to the file, and WAIT if it
is opened for write right now...
something like a mutex-synchronized block of the code...
ok, i ended up with the following code:

def syncLog(filename, text):
    f = os.open(filename, os.O_WRONLY | os.O_APPEND)
    fcntl.flock(f, fcntl.LOCK_EX)
    os.write(f, text)
    os.close(f)
    #FIXME: what about releasing the lock?

it seems to do what i need (the flock() call waits until it can get
access).. i just don't know if i have to unlock() the file before i
close it..
> ok, i ended up with the following code:
> def syncLog(filename,text):
> f = os.open(filename,os.O_WRONLY | os.O_APPEND)
> #FIXME: what about releasing the lock?
> it seems to do what i need ( the flock() call waits until he can get
> access).. i just don't know if i have to unlock() the file before i
> close it..
The lock should free when you close the file descriptor. Personally,
I'm a great believer in doing things explicitly rather than
implicitly, and would add the extra fcntl.flock(f, fcntl.LOCK_UN) call
before closing the file.
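Putting that explicit unlock into the OP's function might look like this (the os.O_CREAT flag is an addition so the file need not already exist, and the try/finally is mine):

```python
import os
import fcntl

def syncLog(filename, text):
    f = os.open(filename, os.O_WRONLY | os.O_APPEND | os.O_CREAT)
    try:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        os.write(f, text.encode())
        fcntl.flock(f, fcntl.LOCK_UN)  # explicit release
    finally:
        os.close(f)                    # would release the lock anyway
```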
> and would add the extra fcntl.flock(f, fcntl.LOCK_UN) call
> before closing the file.
f = file("filename", "a")
f.write(line)

should write the line atomically, if line fits into the stdio buffer.
Otherwise os.write can be used.
As this depends on the OS support for append, it is not portable. But
neither is locking. And I am not sure if it works for NFS-mounted files.
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: pi...@vanoostrum.org
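The append-mode idea, sketched with os.open so every line is a single write to a file opened with O_APPEND (on local filesystems the kernel then seeks to end-of-file atomically before each write; as noted above, NFS gives no such guarantee):

```python
import os

def append_line(path, line):
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT)
    try:
        # One os.write per line: with O_APPEND, concurrent writers
        # cannot overwrite each other's output on a local filesystem.
        os.write(fd, (line + "\n").encode())
    finally:
        os.close(fd)
```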
Excel seems like a pretty heavyweight solution for most of the
applications it's used for, too. Most people are interested in solving a
problem and moving on, and while this may lead to bloatware it can also
lead to the inclusion of functionality that can be hugely useful in
other areas of the application.