
avoiding file corruption


Amir Michail

Aug 27, 2006, 3:44:33 AM
Hi,

Trying to open a file for writing that is already open for writing
should result in an exception.

It's all too easy to accidentally open a shelve for writing twice and
this can lead to hard to track down database corruption errors.

Amir

Paolo Pantaleo

Aug 27, 2006, 5:22:35 AM
to pytho...@python.org
27 Aug 2006 00:44:33 -0700, Amir Michail <amic...@gmail.com>:
Even if it may seem strange, the OS usually allows you to open a file
twice; it's up to the programmer to ensure the consistency of the
operations.

Paolo

--
if you have a minute to spend please visit my photography site:
http://mypic.co.nr

Amir Michail

Aug 27, 2006, 6:00:23 AM

Paolo Pantaleo wrote:
> 27 Aug 2006 00:44:33 -0700, Amir Michail <amic...@gmail.com>:
> > Hi,
> >
> > Trying to open a file for writing that is already open for writing
> > should result in an exception.
> >
> > It's all too easy to accidentally open a shelve for writing twice and
> > this can lead to hard to track down database corruption errors.
> >
> > Amir
> >
> Even if it may seem strange, the OS usually allows you to open a file
> twice; it's up to the programmer to ensure the consistency of the
> operations.
>
> Paolo
>

But if this is usually a serious bug, shouldn't an exception be raised?

Amir

Diez B. Roggisch

Aug 27, 2006, 8:05:49 AM
Amir Michail wrote:

> Paolo Pantaleo wrote:
>> 27 Aug 2006 00:44:33 -0700, Amir Michail <amic...@gmail.com>:
>>> Hi,
>>>
>>> Trying to open a file for writing that is already open for writing
>>> should result in an exception.
>>>
>>> It's all too easy to accidentally open a shelve for writing twice and
>>> this can lead to hard to track down database corruption errors.
>>>
>>> Amir
>>>
>> Even if it may seem strange, the OS usually allows you to open a file
>> twice; it's up to the programmer to ensure the consistency of the
>> operations.
>>
>> Paolo
>>
>
> But if this is usually a serious bug, shouldn't an exception be raised?

executing "rm -rf /" via subprocess is usually also a bad idea. So? No
language can prevent you from doing such mistake. And there is no way to
know if a file is opened twice - it might that you open the same file
twice via e.g. a network share. No way to know that it is the same file.

Diez

Paddy

Aug 27, 2006, 9:14:14 AM
I've never done this in anger so feel free to mock (a little :-).

I'd have a fixed field at the beginning of the file that can hold the
hostname, process number, and access time of a writing process, together
with a sentinel value that means "no process has access to the file".

A program would:
1. Wait a random time.
2. Open the file for update.
3. Read the locking data.
4. If the file is already being used by another process, go to 1.
5. Write the process's locking data and time into the lock field.
6. Modify the file's other fields.
7. Write the sentinel value to the locking field.
8. Close and flush the file to disk.

I have left what to do if a process has locked the file for too long as
a simple exercise for you ;-).

- Paddy.
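
A rough sketch of that scheme in Python (the 64-byte field and the stamp
format are made up for illustration; and as Bryan points out below, a race
remains between reading the field in step 3 and writing it in step 5):

import os, random, socket, time

LOCK_SIZE = 64                # fixed-width lock field at the start of the file
SENTINEL = '\0' * LOCK_SIZE   # means "no process has access to the file"

def try_lock(path):
    # One pass over steps 1-5; returns the open file on success, None if busy.
    time.sleep(random.random())                # 1. wait a random time
    f = open(path, 'r+b')                      # 2. open the file for update
    if f.read(LOCK_SIZE) != SENTINEL:          # 3./4. in use: caller retries
        f.close()
        return None
    stamp = '%s:%d:%f' % (socket.gethostname(), os.getpid(), time.time())
    f.seek(0)
    f.write(stamp[:LOCK_SIZE].ljust(LOCK_SIZE, '\0'))  # 5. claim the field
    f.flush()
    return f

def unlock(f):
    f.seek(0)
    f.write(SENTINEL)                          # 7. restore the sentinel
    f.close()                                  # 8. close and flush to disk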

Amir Michail

Aug 27, 2006, 9:33:50 AM

The scenario I have in mind is something like this:

def f():
    db = shelve.open('test.db', 'c')
    # do some stuff with db
    g()
    db.close()

def g():
    db = shelve.open('test.db', 'c')
    # do some stuff with db
    db.close()

I think it would be easy for Python to check for this problem in
scenarios like this.

Amir

Diez B. Roggisch

Aug 27, 2006, 10:22:12 AM
Amir Michail wrote:

You are requesting a general solution for a very particular problem. As
I pointed out, such a solution is unlikely to work reliably - if it is
feasible at all.

If you really run into problems like the above, use a custom wrapper for
shelve that prevents _you_ from making that mistake.

Diez
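
For what it's worth, a minimal sketch of such a wrapper (the guarded_open
name and the module-level registry are illustrative, not an existing API).
It catches exactly the f()/g() scenario above, though not opens from other
processes or through a different path to the same file:

import os, shelve

_open_paths = set()   # paths this process currently has open

def guarded_open(path, flag='c'):
    # refuse a second open of the same shelve within this process
    path = os.path.realpath(path)
    if path in _open_paths:
        raise IOError("shelve %r is already open in this process" % path)
    db = shelve.open(path, flag)
    _open_paths.add(path)
    original_close = db.close
    def close():
        # forget the path once the shelve is properly closed
        _open_paths.discard(path)
        original_close()
    db.close = close
    return db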

Bryan Olson

Aug 27, 2006, 10:33:03 AM

The right solution is file locking. Unfortunately, the Python
standard distribution doesn't have a portable file lock, but you
can do it on Unix and on Win NT or better. See:

http://mail.python.org/pipermail/python-win32/2005-February/002957.html

and/or

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65203


--
--Bryan
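
Those recipes boil down to roughly the following (an untested sketch of the
same idea, not a drop-in portable lock): fcntl.flock on Unix, msvcrt.locking
on Windows.

try:
    import fcntl
    def lock_file(f):
        # raises IOError at once if another process holds the lock
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    def unlock_file(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
except ImportError:
    import msvcrt
    def lock_file(f):
        # lock the first byte of the file; raises IOError if already held
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)
    def unlock_file(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)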

Grant Edwards

Aug 27, 2006, 10:41:05 AM
On 2006-08-27, Amir Michail <amic...@gmail.com> wrote:

> Trying to open a file for writing that is already open for writing
> should result in an exception.

MS Windows seems to do something similar, and it pisses me off
no end. Trying to open a file and read it while somebody else
has it open for writing causes an exception. If I want to open
a file and read it while it's being written to, that's my
business.

Likewise, if I want to have a file open for writing twice,
that's my business as well. I certainly don't want to be
hobbled to prevent me from wandering off in the wrong direction.

> It's all too easy to accidentally open a shelve for writing
> twice and this can lead to hard to track down database
> corruption errors.

It's all too easy to delete the wrong element from a list. It's
all too easy to re-bind the wrong object to a name. Should
lists be immutable and names be permanently bound?

--
Grant Edwards                   grante at visi.com
Yow! I'm in a twist contest!! I'm in a bathtub! It's on Mars!! I'm in tip-top condition!

Amir Michail

Aug 27, 2006, 10:51:18 AM
Grant Edwards wrote:
> On 2006-08-27, Amir Michail <amic...@gmail.com> wrote:
>
> > Trying to open a file for writing that is already open for writing
> > should result in an exception.
>
> MS Windows seems to do something similar, and it pisses me off
> no end. Trying to open a file and read it while somebody else
> has it open for writing causes an exception. If I want to open
> a file and read it while it's being written to, that's my
> business.
>
> Likewise, if I want to have a file open for writing twice,
> that's my business as well. I certainly don't want to be
> hobbled to prevent me from wandering off in the wrong direction.
>
> > It's all too easy to accidentally open a shelve for writing
> > twice and this can lead to hard to track down database
> > corruption errors.
>
> It's all too easy to delete the wrong element from a list. It's
> all too easy to re-bind the wrong object to a name. Should
> lists be immutable and names be permanently bound?
>

How often do you need to open a file multiple times for writing?

As a high-level language, Python should prevent people from corrupting
data as much as possible.

Amir

Tim Scheidemantle

Aug 27, 2006, 10:51:23 AM
to Python List
Amir Michail wrote:
> Hi,
>
> Trying to open a file for writing that is already open for writing
> should result in an exception.
Look at the fcntl module; I use it in a class to control access from within my processes.
I don't think this functionality should be inherent to Python, though.
Keep in mind that only my processes open the shelve db, so your mileage may vary.
The get and set methods are just for convenience.
This works under Linux; I don't know about Windows.

#!/usr/bin/env python

import fcntl, shelve, time, bsddb
from os.path import exists

class fLocked:

    def __init__(self, fname):
        if exists(fname):
            # verify it is not corrupt
            bsddb.db.DB().verify(fname)
        self.fname = fname
        self.have_lock = False
        self.db = shelve.open(self.fname)
        self.fileno = self.db.dict.db.fd()

    def __del__(self):
        try: self.db.close()
        except: pass

    def aquire_lock(self, timeout=5):
        if self.have_lock: return True
        started = time.time()
        while not self.have_lock and (time.time() - started < timeout):
            try:
                fcntl.flock(self.fileno, fcntl.LOCK_EX | fcntl.LOCK_NB)
                self.have_lock = True
            except IOError:
                # wait for it to become available
                time.sleep(.5)
        return self.have_lock

    def release_lock(self):
        if self.have_lock:
            fcntl.flock(self.fileno, fcntl.LOCK_UN)
            self.have_lock = False
        return not self.have_lock

    def get(self, key, default={}):
        if self.aquire_lock():
            record = self.db.get(key, default)
            self.release_lock()
        else:
            raise IOError, "Unable to lock %s" % self.fname
        return record

    def set(self, key, value):
        if self.aquire_lock():
            self.db[key] = value
            self.release_lock()
        else:
            raise IOError, "Unable to lock %s" % self.fname

if __name__ == '__main__':
    fname = 'test.db'
    dbs = []
    for i in range(2): dbs.append(fLocked(fname))
    print dbs[0].aquire_lock()
    print dbs[1].aquire_lock(1)  # should fail getting flock
    dbs[0].release_lock()
    print dbs[1].aquire_lock()   # should be able to get lock


--Tim

Bryan Olson

Aug 27, 2006, 11:00:10 AM
Paddy wrote:
> I've never done this in anger so feel free to mock (a little :-).
>
> I'd have a fixed field at the beginning of the file that can hold the
> hostname, process number, and access time of a writing process, together
> with a sentinel value that means "no process has access to the file".
>
> A program would:
> 1. Wait a random time.
> 2. Open the file for update.
> 3. Read the locking data.
> 4. If the file is already being used by another process, go to 1.
> 5. Write the process's locking data and time into the lock field.
> 6. Modify the file's other fields.
> 7. Write the sentinel value to the locking field.
> 8. Close and flush the file to disk.

That doesn't really work; you still have a race condition: two
processes can both read the sentinel between steps 3 and 5, and each
will conclude that it holds the lock.

Locking the file is the good solution, but operating systems vary in
how it works. Other reasonable solutions are to rename the file, work
with the renamed version, then change the name back after closing; and
to use "lock files", which Wikipedia explains near the bottom of its
"File locking" article.


--
--Bryan
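
The lock-file variant leans on exclusive file creation being atomic; a
sketch (the .lock suffix and the timings are arbitrary, and note the old
caveat that O_EXCL was historically unreliable over NFS):

import os, time

def acquire(path, timeout=5.0):
    # atomically create path + '.lock'; only one process can succeed
    lockname = path + '.lock'
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            fd = os.open(lockname, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()))   # record the owner, for debugging
            os.close(fd)
            return True
        except OSError:
            time.sleep(0.1)                  # lock file exists: wait and retry
    return False

def release(path):
    os.remove(path + '.lock')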

Bryan Olson

Aug 27, 2006, 11:10:04 AM
Grant Edwards wrote:

> Amir Michail wrote:
>
>> Trying to open a file for writing that is already open for writing
>> should result in an exception.
>
> MS Windows seems to do something similar, and it pisses me off
> no end. Trying to open a file and read it while somebody else
> has it open for writing causes an exception. If I want to open
> a file and read it while it's being written to, that's my
> business.

Windows is actually much more sophisticated. It does allow shared
write access; see the FILE_SHARE_WRITE option for Win32's CreateFile.
You can also lock specific byte ranges in a file.


--
--Bryan
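
From Python that option is reachable through the pywin32 package; a sketch
(assumes pywin32 is installed; 'test.db' is just a placeholder name):

import win32file

# Open for reading while explicitly allowing other processes to keep
# reading and writing the same file at the same time.
handle = win32file.CreateFile(
    'test.db',
    win32file.GENERIC_READ,
    win32file.FILE_SHARE_READ | win32file.FILE_SHARE_WRITE,
    None,                          # default security attributes
    win32file.OPEN_EXISTING,
    0,                             # no special flags or attributes
    None)                          # no template file
win32file.CloseHandle(handle)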

Grant Edwards

Aug 27, 2006, 11:59:49 AM
On 2006-08-27, Amir Michail <amic...@gmail.com> wrote:

> How often do you need to open a file multiple times for writing?

Not very often, but I don't think it should be illegal. That's
probably a result of being a 25-year user of Unix, where it's
assumed that the user knows what he's doing.

> As a high-level language, Python should prevent people from
> corrupting data as much as possible.

For somebody with a Unix background it seems overly restrictive.

--
Grant Edwards                   grante at visi.com
Yow! Youth of today! Join me in a mass rally for traditional mental attitudes!

Message has been deleted

Duncan Booth

Aug 27, 2006, 3:11:51 PM
Dennis Lee Bieber wrote:

> On Sun, 27 Aug 2006 14:41:05 -0000, Grant Edwards <gra...@visi.com>
> declaimed the following in comp.lang.python:


>
>>
>> MS Windows seems to do something similar, and it pisses me off
>> no end. Trying to open a file and read it while somebody else
>> has it open for writing causes an exception. If I want to open
>> a file and read it while it's being written to, that's my
>> business.
>>

> Though strangely, Windows seems to permit one to make a COPY of that
> open file, and then open that with another application...

Yes, so long as the file hasn't been opened so as to deny reading, you can
open it for reading, but you do have to specify the sharing mode. Microsoft
too follows the rule that "Explicit is better than implicit."

Cliff Wells

Aug 28, 2006, 4:23:42 AM
to pytho...@python.org
On Sun, 2006-08-27 at 07:51 -0700, Amir Michail wrote:

> How often do you need to open a file multiple times for writing?

How often do you write code that you don't understand well enough to
fix? This issue is clearly a problem within *your* application.

I'm curious how you could possibly think this could be solved in any
case. What if you accidentally open two instances of the application?
How would Python know? You are asking Python to perform an OS-level
operation (and a questionable one at that).

My suggestion is that you use a real database if you need concurrent
access. If you don't need concurrent access then fix your application.

> As a high-level language, Python should prevent people from corrupting
> data as much as possible.

"Data" is application-specific. Python has no idea how you intend to
use your data and therefore should not (even if it could) try to protect
you.

Regards,
Cliff
