write to the same file from multiple processes at the same time?

gabor

May 27, 2005, 8:32:53 AM

hi,

what i want to achieve:
i have a cgi file that writes an entry to a text file,
like a log entry (when it was invoked, when its work ended).
it's one line of text.

the problem is:
what happens if 2 users invoke the cgi at the same time?

and it will happen, because i am trying now to stress test it, so i will
start 5-10 requests in parallel and so on.

so, how does one synchronize several processes in python?

first idea was that the cgi will create a new temp file every time,
and at the end of the stress-test, i'll collect the content of all those
files. but that seems like a stupid way to do it :(

another idea was to use a simple database (sqlite?) which probably has
this problem solved already...

any better ideas?

thanks,
gabor

Paul Rubin

May 27, 2005, 8:49:49 AM

gabor <ga...@nekomancer.net> writes:
> so, how does one synchronize several processes in python?
>
> first idea was that the cgi will create a new temp file every time,
> and at the end of the stress-test, i'll collect the content of all
> those files. but that seems like a stupid way to do it :(

There was a thread about this recently ("low-end persistence
strategies") and for Unix the simplest answer seems to be the
fcntl.flock function. For Windows I don't know the answer.
Maybe os.open with O_EXCL works.
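
A minimal sketch of the O_EXCL idea, assuming Python 3 and a lock file kept next to the log. The names acquire_lockfile and release_lockfile, the timeout and the polling interval are illustrative assumptions, not anything from the thread or the standard library. Creating a file with O_CREAT | O_EXCL fails if the file already exists, so only one process at a time can hold the "lock":

```python
import os
import time


def acquire_lockfile(lock_path, timeout=5.0, poll=0.05):
    """Try to create lock_path exclusively; give up after `timeout` seconds.

    Hypothetical helper: returns True if the lock was taken, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)


def release_lockfile(lock_path):
    """Release the lock by deleting the lock file."""
    os.remove(lock_path)
```

One caveat with this approach: if a process dies while holding the lock, the stale lock file stays behind until someone removes it.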

Roy Smith

May 27, 2005, 9:14:25 AM

gabor <ga...@nekomancer.net> wrote:
> so, how does one synchronize several processes in python?

This is a very hard problem to solve in the general case, and the answer
depends more on the operating system you're running on than on the
programming language you're using.

On the other hand, you said that each process will be writing a single line
of output at a time. If you call flush() after each message is written,
that should be enough to ensure that each line gets written in a single
write system call, which in turn should be good enough to ensure that
individual lines of output are not scrambled in the log file.

If you want to do better than that, you need to delve into OS-specific
things like the flock function in the fcntl module on unix.

Peter Hansen

May 27, 2005, 9:18:17 AM

Roy Smith wrote:

> gabor <ga...@nekomancer.net> wrote:
> On the other hand, you said that each process will be writing a single line
> of output at a time. If you call flush() after each message is written,
> that should be enough to ensure that each line gets written in a single
> write system call, which in turn should be good enough to ensure that
> individual lines of output are not scrambled in the log file.

Unfortunately this assumes that the open() call will always succeed,
when in fact it is likely to fail sometimes when another process has
already opened the file but not yet completed writing to it, AFAIK.

> If you want to do better than that, you need to delve into OS-specific
> things like the flock function in the fcntl module on unix.

The OP was probably on the right track when he suggested that things
like SQLite (conveniently wrapped with PySQLite) had already solved this
problem.

-Peter

Paul Rubin

May 27, 2005, 9:21:21 AM

Peter Hansen <pe...@engcorp.com> writes:
> The OP was probably on the right track when he suggested that things
> like SQLite (conveniently wrapped with PySQLite) had already solved
> this problem.

But they haven't. They depend on messy things like server processes
constantly running, which goes against the idea of a cgi that only
runs when someone calls it.

Roy Smith

May 27, 2005, 9:27:38 AM

Peter Hansen <pe...@engcorp.com> wrote:
> The OP was probably on the right track when he suggested that things
> like SQLite (conveniently wrapped with PySQLite) had already solved this
> problem.

Perhaps, but a relational database seems like a pretty heavy-weight
solution for a log file.

Jp Calderone

May 27, 2005, 9:31:48 AM

to pytho...@python.org

SQLite is an in-process dbm.

Jp

fraca7

May 27, 2005, 9:34:05 AM

gabor wrote:

> [snip]

Try this:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65203

Paul Rubin

May 27, 2005, 9:43:04 AM

Jp Calderone <exa...@divmod.com> writes:
> >But they haven't. They depend on messy things like server processes
> >constantly running, which goes against the idea of a cgi that only
> >runs when someone calls it.
>
> SQLite is an in-process dbm.

http://www.sqlite.org/faq.html#q7

(7) Can multiple applications or multiple instances of the same
application access a single database file at the same time?

Multiple processes can have the same database open at the same
time. Multiple processes can be doing a SELECT at the same
time. But only one process can be making changes to the database
at once.

But multiple processes changing the database simultaneously is
precisely what the OP wants to do.

Gerhard Haering

May 27, 2005, 10:00:56 AM

to pytho...@python.org

On the other hand, it works ;-)

-- Gerhard
--
Gerhard Häring - g...@ghaering.de - Python, web & database development

jean-marc

May 27, 2005, 10:08:25 AM

Sorry, why is the temp file solution 'stupid'? (not
aesthetic-pythonistic???) It looks OK: simple and direct, and
certainly less 'heavy' than any db stuff (even embedded).

And collating into an 'official log file' can be done periodically by
another process, on a time-scale that is 'useful' if not
instantaneous...
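
A rough sketch of that spool-and-collate approach, assuming Python 3. The function names (write_entry, collate), the "entry-*.log" naming scheme and the directory layout are illustrative assumptions only. Each CGI invocation writes its line to its own uniquely named file, and a separate periodic job appends the entries to the main log:

```python
import glob
import os
import tempfile


def write_entry(spool_dir, line):
    """Write one log line to its own uniquely named file in spool_dir."""
    # mkstemp picks a name that does not exist yet, so writers never collide.
    fd, path = tempfile.mkstemp(prefix="entry-", suffix=".log", dir=spool_dir)
    with os.fdopen(fd, "w") as f:
        f.write(line.rstrip("\n") + "\n")
    return path


def collate(spool_dir, log_path):
    """Append every spooled entry to log_path, then delete the entry file.

    Returns the number of entries collected.
    """
    count = 0
    with open(log_path, "a") as log:
        for path in sorted(glob.glob(os.path.join(spool_dir, "entry-*.log"))):
            with open(path) as f:
                log.write(f.read())
            os.remove(path)
            count += 1
    return count
```

Two things this sketch does not handle: the entries are collected in file-name order, not chronological order, and a collator that runs while a writer is mid-write could pick up a partial entry (writing under a temporary name and renaming afterwards would avoid that).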

Just trying to understand here...

JMD

Jp Calderone

May 27, 2005, 10:17:58 AM

to pytho...@python.org

Er, no. The OP precisely wants exactly one process to be able to write at a time. If he was happy with multiple processes writing simultaneously, he wouldn't need any locking mechanism at all >:)

If you keep reading that FAQ entry, you discover that SQLite implements its own locking mechanism internally, allowing different processes to *interleave* writes to the database, and preventing any data corruption which might arise from simultaneous writes.

That said, I think an RDBMS is a ridiculously complex solution to this simple problem. A filesystem lock, preferably using the directory or symlink trick (but flock() is fun too, if you're into that sort of thing), is clearly the solution to go with here.

Jp

Grant Edwards

May 27, 2005, 10:50:04 AM

On 2005-05-27, Peter Hansen <pe...@engcorp.com> wrote:
> Roy Smith wrote:
>> gabor <ga...@nekomancer.net> wrote:
>> On the other hand, you said that each process will be writing a single line
>> of output at a time. If you call flush() after each message is written,
>> that should be enough to ensure that each line gets written in a single
>> write system call, which in turn should be good enough to ensure that
>> individual lines of output are not scrambled in the log file.
>
> Unfortunately this assumes that the open() call will always succeed,
> when in fact it is likely to fail sometimes when another process has
> already opened the file but not yet completed writing to it, AFAIK.

Not in my experience. At least under Unix, it's perfectly OK
to open a file while somebody else is writing to it. Perhaps
Windows can't deal with that situation?

--
Grant Edwards                   grante at visi.com
Yow! FOOLED you! Absorb EGO SHATTERING impulse rays, polyester poltroon!!

Peter Hansen

May 27, 2005, 6:02:56 PM

Grant Edwards wrote:
> On 2005-05-27, Peter Hansen <pe...@engcorp.com> wrote:
>>Unfortunately this assumes that the open() call will always succeed,
>>when in fact it is likely to fail sometimes when another process has
>>already opened the file but not yet completed writing to it, AFAIK.
>
> Not in my experience. At least under Unix, it's perfectly OK
> to open a file while somebody else is writing to it. Perhaps
> Windows can't deal with that situation?

Hmm... just tried it: you're right! On the other hand, the results were
unacceptable: each process has a separate file pointer, so it appears
whichever one writes first will have its output overwritten by the
second process.

Change the details, but the heart of my objection is the same.

-Peter

Peter Hansen

May 27, 2005, 6:06:41 PM

Paul Rubin wrote:
> http://www.sqlite.org/faq.html#q7
[snip]

> Multiple processes can have the same database open at the same
> time. Multiple processes can be doing a SELECT at the same
> time. But only one process can be making changes to the database
> at once.
>
> But multiple processes changing the database simultaneously is
> precisely what the OP wants to do.

What isn't described in the above quote from the FAQ is how SQLite
*protects* your data from corruption in this case, unlike the "raw"
approach where you just use file handles.

And PySQLite conveniently wraps the relevant calls with retries when the
database is "locked" by the writing process, making it roughly a
no-brainer to use SQLite databases as nice simple log files where you're
trying to write from multiple CGI processes like the OP wanted.

Disclaimer: I haven't actually done that myself, and have only started
playing with pysqlite2 a day ago, but I have spent a fair bit of time
experimenting and reading the relevant docs and I believe I've got this
all correct.
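
A minimal sketch of the SQLite-as-log-file idea, assuming Python 3 and the standard-library sqlite3 module (the descendant of pysqlite2) rather than the PySQLite of this thread. The table layout and the function names log_entry and read_log are illustrative assumptions. The timeout argument tells the connection how long to wait for a locked database before raising an error, which stands in for the retry behaviour described above:

```python
import sqlite3
import time


def log_entry(db_path, message, timeout=10.0):
    """Insert one log row; wait up to `timeout` seconds if the db is locked."""
    conn = sqlite3.connect(db_path, timeout=timeout)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "CREATE TABLE IF NOT EXISTS log (ts REAL, message TEXT)"
            )
            conn.execute(
                "INSERT INTO log (ts, message) VALUES (?, ?)",
                (time.time(), message),
            )
    finally:
        conn.close()


def read_log(db_path):
    """Return all logged messages in insertion order."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT message FROM log ORDER BY rowid")
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Each CGI invocation would call log_entry once; no server process is involved, since the database is just a file opened in-process.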

-Peter

Paul Rubin

May 27, 2005, 6:10:16 PM

Peter Hansen <pe...@engcorp.com> writes:
> And PySQLite conveniently wraps the relevant calls with retries when
> the database is "locked" by the writing process, making it roughly a
> no-brainer to use SQLite databases as nice simple log files where
> you're trying to write from multiple CGI processes like the OP wanted.

Oh, ok. But what kind of locks does it use?

Jp Calderone

May 27, 2005, 6:21:04 PM

to pytho...@python.org

It doesn't really matter, does it?

I'm sure the locking mechanisms it uses have changed between different releases, and may even be selected based on the platform being used.

Jp

Paul Rubin

May 27, 2005, 6:22:17 PM

Jp Calderone <exa...@divmod.com> writes:
> >Oh, ok. But what kind of locks does it use?
>
> It doesn't really matter, does it?

Huh? Sure, if there's some simple way to accomplish the locking, the
OP's app can do the same thing without SQLite's complexity.

> I'm sure the locking mechanisms it uses have changed between
> different releases, and may even be selected based on the platform
> being used.

Well, yes, but WHAT ARE THEY??????

Christopher Weimann

May 27, 2005, 6:27:26 PM

to Peter Hansen, pytho...@python.org

On 05/27/2005-06:02PM, Peter Hansen wrote:
>
> Hmm... just tried it: you're right! On the other hand, the results were
> unacceptable: each process has a separate file pointer, so it appears
> whichever one writes first will have its output overwritten by the
> second process.

Did you open the files for 'append' ?

Peter Hansen

May 27, 2005, 6:29:59 PM

Peter Hansen wrote:

> Grant Edwards wrote:
>> Not in my experience. At least under Unix, it's perfectly OK
>> to open a file while somebody else is writing to it. Perhaps
>> Windows can't deal with that situation?
>
> Hmm... just tried it: you're right!

Umm... the part you were right about was NOT the possibility that
Windows can't deal with the situation, but the suggestion that it might
actually be able to (since apparently it can). Sorry to confuse.

-Peter

Jp Calderone

May 27, 2005, 6:30:32 PM

to pytho...@python.org

Beats me, and I'm certainly not going to dig through the code to find out :) For the OP's purposes, the mechanism I mentioned earlier in this thread is almost certainly adequate. To briefly re-summarize, when you want to acquire a lock, attempt to create a directory with a well-known name. When you are done with it, delete the directory. This works across all platforms and filesystems likely to be encountered by a Python program.
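
A minimal sketch of the directory trick, assuming Python 3. The context-manager name dir_lock, the timeout and the polling interval are illustrative assumptions. It relies on os.mkdir failing when the directory already exists, so only one process can create it at a time; the others poll until it is gone:

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dir_lock(lock_dir, timeout=5.0, poll=0.05):
    """Hold a lock for the duration of the with-block by creating lock_dir."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)  # fails if another process holds the lock
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire %s" % lock_dir)
            time.sleep(poll)
    try:
        yield
    finally:
        os.rmdir(lock_dir)  # release: delete the directory
```

Usage would look like `with dir_lock("/tmp/mylog.lock"): append_to_log(...)`, where both the path and append_to_log are placeholders. As with any lock of this kind, a process that crashes inside the block leaves the directory behind.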

Jp

Peter Hansen

May 27, 2005, 6:33:44 PM

I think the FAQ can answer that better than I can, since I'm not sure
whether you're asking about any low-level (OS) locks it might use or
higher-level (e.g. database-level locking) that it might use. In
summary, however, at the database level it provides only coarse-grained
locking on the entire database. It *is* supposed to be a relatively
simple/lightweight solution compared to typical RDBMSes...

(There's also an excruciating level of detail about this whole area in
the page at http://www.sqlite.org/lockingv3.html ).

-Peter

Peter Hansen

May 27, 2005, 6:35:47 PM

Nope. I suppose that would be a rational thing to do for log files,
wouldn't it? I wonder what happens when one does that...

-Peter

Paul Rubin

May 27, 2005, 6:52:15 PM

Peter Hansen <pe...@engcorp.com> writes:
> I think the FAQ can answer that better than I can, since I'm not sure
> whether you're asking about any low-level (OS) locks it might use or
> higher-level (e.g. database-level locking) that it might use. In
> summary, however, at the database level it provides only
> coarse-grained locking on the entire database. It *is* supposed to be
> a relatively simple/lightweight solution compared to typical RDBMSes...

Compared to what the OP was asking for, which was a way to synchronize
appending to a serial log file, SQLite is very complex. It's also
much more complex than (say) the dbm module, which is what Python apps
normally use as a lightweight db.

> (There's also an excruciating level of detail about this whole area in
> the page at http://www.sqlite.org/lockingv3.html ).

Oh ok, it says it uses some special locking system calls on Windows.
Since those calls aren't in the Python stdlib, it must be using C
extensions, which again means complexity. But it looks like the
built-in msvcrt module has ways to lock parts of files in Windows.

Really, I think the Python library is somewhat lacking in not
providing a simple, unified interface for doing stuff like this.
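
For illustration only, a sketch of what such a unified wrapper could look like in Python 3, built on fcntl.flock on Unix and msvcrt.locking on Windows. The names _lock, _unlock and locked_append and the side-car ".lock" file are assumptions made for this sketch, and the Windows branch is untested here:

```python
import os

try:
    import fcntl  # Unix

    def _lock(fd):
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is granted

    def _unlock(fd):
        fcntl.flock(fd, fcntl.LOCK_UN)

except ImportError:
    import msvcrt  # Windows (untested branch of this sketch)

    def _lock(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        # LK_LOCK retries for a while, then raises OSError if still locked.
        msvcrt.locking(fd, msvcrt.LK_LOCK, 1)

    def _unlock(fd):
        os.lseek(fd, 0, os.SEEK_SET)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)


def locked_append(path, text):
    """Append text to path while holding an exclusive lock on path + '.lock'."""
    lock_fd = os.open(path + ".lock", os.O_CREAT | os.O_RDWR)
    try:
        _lock(lock_fd)
        try:
            with open(path, "a") as f:
                f.write(text)
        finally:
            _unlock(lock_fd)
    finally:
        os.close(lock_fd)
```

Locking a separate lock file rather than the log itself keeps the byte-range arithmetic that msvcrt.locking needs away from the append-mode file.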

Andy Leszczynski

May 27, 2005, 6:59:43 PM

gabor wrote:
> the problem is:
> what happens if 2 users invoke the cgi at the same time?

Would Berkeley DB support that?

Do Re Mi chel La Si Do

May 28, 2005, 3:10:44 AM

Hi !

On Windows, with PyWin32, have a look at this little sample code:

import time
import win32file, win32con, pywintypes

def flock(file):
    # Take an exclusive lock on the first 0xffff bytes of the file.
    hfile = win32file._get_osfhandle(file.fileno())
    win32file.LockFileEx(hfile, win32con.LOCKFILE_EXCLUSIVE_LOCK, 0, 0xffff,
                         pywintypes.OVERLAPPED())

def funlock(file):
    # Release the lock taken by flock().
    hfile = win32file._get_osfhandle(file.fileno())
    win32file.UnlockFileEx(hfile, 0, 0xffff, pywintypes.OVERLAPPED())

file = open("FLock.txt", "r+")
flock(file)
file.seek(123)
for i in range(500):
    file.write("AAAAAAAAAA")
    print(i)
    time.sleep(0.001)

#funlock(file)
file.close()

Michel Claveau

Mike Meyer

May 28, 2005, 2:47:56 PM

Paul Rubin <http://phr...@NOSPAM.invalid> writes:
> Really, I think the Python library is somewhat lacking in not
> providing a simple, unified interface for doing stuff like this.

It's got one. Well, three, actually.

The syslog module solves the problem quite nicely, but only works on
Unix. If the OP is working on Unix systems, that may be a good
solution.

The logging module has a SysLogHandler that talks to syslog on
Unix. It also has an NTEventLogHandler for use on NT. I'm not familiar
with NT's event log, but I presume it has the same kind of
functionality as Unix's syslog facility.
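
A small sketch of the logging-module route, assuming Python 3. The function name make_syslog_logger and the message format are illustrative assumptions. SysLogHandler sends to UDP port 514 on localhost by default; on many Unix systems the local daemon listens on the "/dev/log" socket instead, so the address usually has to be passed explicitly:

```python
import logging
import logging.handlers


def make_syslog_logger(name, address=("localhost", 514)):
    """Return a logger that forwards records to a syslog daemon.

    `address` is a (host, port) tuple for UDP, or a path such as "/dev/log"
    for a local Unix socket.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(
        logging.Formatter("%(name)s[%(process)d]: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With this, each CGI process would just call something like `make_syslog_logger("mycgi", "/dev/log").info("request done")` (name and path are placeholders) and leave the serialization of concurrent writers to the syslog daemon.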

<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

ucn...@gmail.com

May 29, 2005, 1:32:41 AM

Well I just tried it on Linux anyway. I opened the file in two python
processes using append mode.

I then wrote a simple function to write and then flush whatever it is passed:

foo = open("myfile.txt", "a")  # the file opened in append mode, as described above

def write(msg):
    foo.write("%s\n" % msg)
    foo.flush()

I then opened another terminal and did 'tail -f myfile.txt'.

It worked just fine.

Maybe that will help. Seems simple enough to me for basic logging.

Cheers,
Bill

gabor

May 30, 2005, 5:13:43 AM

actually this is what i implemented after asking the question, and works
fine :)

i just thought that maybe there is a solution where i don't have to deal
with 4000 files in the temp folder :)

gabor

May 30, 2005, 5:12:57 AM

Jp Calderone wrote:
> To briefly re-summarize, when you
> want to acquire a lock, attempt to create a directory with a well-known
> name. When you are done with it, delete the directory. This works
> across all platforms and filesystems likely to be encountered by a
> Python program.

thanks...

but the problem now is that the cgi will have to wait for that directory
to be gone when it is invoked.. and i do not want to code that :)
i'm too lazy..

so basically i want the code to TRY to write to the file, and WAIT if it
is opened for write right now...

something like a mutex-synchronized block of the code...

gabor

May 30, 2005, 8:28:11 AM

ok, i ended up with the following code:

import fcntl
import os

def syncLog(filename,text):
    f = os.open(filename,os.O_WRONLY | os.O_APPEND)
    fcntl.flock(f,fcntl.LOCK_EX)  # blocks until the lock is granted
    os.write(f,text)
    #FIXME: what about releasing the lock?
    os.close(f)

it seems to do what i need ( the flock() call waits until it can get
access).. i just don't know if i have to unlock() the file before i
close it..

gabor

Mike Meyer

May 30, 2005, 2:12:49 PM

gabor <ga...@nekomancer.net> writes:

> ok, i ended up with the following code:
>
> def syncLog(filename,text):
>     f = os.open(filename,os.O_WRONLY | os.O_APPEND)
>     fcntl.flock(f,fcntl.LOCK_EX)
>     os.write(f,text)
>     #FIXME: what about releasing the lock?
>     os.close(f)
>
> it seems to do what i need ( the flock() call waits until it can get
> access).. i just don't know if i have to unlock() the file before i
> close it..

The lock should free when you close the file descriptor. Personally,
I'm a great believer in doing things explicitly rather than
implicitly, and would add the extra fcntl.flock(f, fcntl.LOCK_UN) call
before closing the file.
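
A sketch of the function with the explicit unlock added, assuming Python 3 on Unix. The O_CREAT flag, the file mode, the UTF-8 encoding and the try/finally structure are additions made for this sketch and are not part of the code posted in the thread:

```python
import fcntl
import os


def sync_log(filename, text):
    """Append text to filename under an exclusive flock, then unlock."""
    # O_CREAT is an assumption here, so that the first call creates the file.
    fd = os.open(filename, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            os.write(fd, text.encode("utf-8"))
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)  # explicit release
    finally:
        os.close(fd)  # closing would also drop the lock
```

The try/finally blocks make sure the lock is released and the descriptor closed even if the write raises.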

gabor

May 31, 2005, 3:54:29 AM

Mike Meyer wrote:
> gabor <ga...@nekomancer.net> writes:
>
>
>>ok, i ended up with the following code:
>>
>>def syncLog(filename,text):
>>    f = os.open(filename,os.O_WRONLY | os.O_APPEND)
>>    fcntl.flock(f,fcntl.LOCK_EX)
>>    os.write(f,text)
>>    #FIXME: what about releasing the lock?
>>    os.close(f)
>>
>>it seems to do what i need ( the flock() call waits until it can get
>>access).. i just don't know if i have to unlock() the file before i
>>close it..
>
>
> The lock should free when you close the file descriptor. Personally,
> I'm a great believer in doing things explicitly rather than
> implicitly,

> and would add the extra fcntl.flock(f, fcntl.LOCK_UN) call
> before closing the file.

done :)

gabor

Piet van Oostrum

May 31, 2005, 6:37:45 AM

Isn't a write to a file that's opened as append atomic in most operating
systems? At least in modern Unix systems. man open(2) should give more
information about this.

Like:
f = open("filename", "a")
f.write(line)
f.flush()

if line fits into the stdio buffer. Otherwise os.write can be used.

As this depends on the OS support for append, it is not portable. But
neither is locking. And I am not sure if it works for NFS-mounted files.
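
A sketch of the os.write variant, assuming Python 3 on a local Unix filesystem. The function name append_line, the O_CREAT flag and the UTF-8 encoding are assumptions of this sketch. The whole line is handed to the kernel in a single write on a descriptor opened with O_APPEND, with no stdio buffering in between:

```python
import os


def append_line(path, line):
    """Append one line with a single os.write on an O_APPEND descriptor.

    Returns True if the whole line was written in that one call.
    """
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)
    finally:
        os.close(fd)
    return written == len(data)
```

Whether concurrent appends stay intact still depends on the operating system and filesystem, as noted above, so this is a convenience for short log lines, not a guarantee.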
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: pi...@vanoostrum.org

Steve Holden

May 31, 2005, 9:57:47 AM

to pytho...@python.org

Roy Smith wrote:
> Peter Hansen <pe...@engcorp.com> wrote:
>
>>The OP was probably on the right track when he suggested that things
>>like SQLite (conveniently wrapped with PySQLite) had already solved this
>>problem.
>
>
> Perhaps, but a relational database seems like a pretty heavy-weight
> solution for a log file.

Excel seems like a pretty heavyweight solution for most of the
applications it's used for, too. Most people are interested in solving a
problem and moving on, and while this may lead to bloatware it can also
lead to the inclusion of functionality that can be hugely useful in
other areas of the application.

regards
Steve
--
Steve Holden +1 703 861 4237 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/