[Python-ideas] Atomic file.get(offset, length)

Matt Chaput

Jul 21, 2012, 2:59:52 PM
to python...@python.org
I wish Python binary file objects had an atomic seek-read method, so I wouldn't have to perform my own locking everywhere to prevent other threads from moving the file pointer between seek and read. Is this something that can be bubbled up from the underlying platform? I think the Linux C equivalent is pread. I also think Java has something like this but can't find a reference now.

Or does this exist and I missed it? (On a mmap file this is trivial, of course.) Has this been discussed before?
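
For illustration, the mmap case can be sketched as below. This is a minimal example, assuming a read-only mapping; the helper name read_at and the demo file contents are hypothetical and not from the original post.

```python
import mmap
import os
import tempfile


def read_at(mm, offset, length):
    """Return `length` bytes starting at `offset` from a mapped file."""
    # Slicing an mmap object does not use or move any seek position,
    # so there is nothing for another thread to disturb.
    return mm[offset:offset + length]


# Build a small demo file (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        chunk = read_at(mm, 10, 4)
    finally:
        mm.close()
os.remove(path)
```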

Cheers,

Matt

_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Terry Reedy

Jul 21, 2012, 3:35:04 PM
to python...@python.org
On 7/21/2012 2:59 PM, Matt Chaput wrote:
> I wish Python binary file objects had an atomic seek-read method, so
> I wouldn't have to perform my own locking everywhere to prevent other
> threads from moving the file pointer between seek and read.

If you are reading a file from multiple threads, I suggest you write
your own seek_and_read_with_locks function that does exactly what you
need in one place. Or add a .readx method to a subclass.

> Is this something that can be bubbled up from the underlying
> platform? I think the Linux C equivalent is pread.

If there is a standard posix function that is not yet wrapped in os, you
can propose its addition. But do some research first to see how widespread
and actually standardized it is.

--
Terry Jan Reedy

Guido van Rossum

Jul 21, 2012, 4:35:21 PM
to Terry Reedy, python...@python.org
On Sat, Jul 21, 2012 at 12:35 PM, Terry Reedy <tjr...@udel.edu> wrote:
> On 7/21/2012 2:59 PM, Matt Chaput wrote:
>>
>> I wish Python binary file objects had an atomic seek-read method, so
>> I wouldn't have to perform my own locking everywhere to prevent other
>> threads from moving the file pointer between seek and read.
>
>
> If you are reading a file from multiple threads, I suggest you write your
> own seek_and_read_with_locks function that does exactly what you need in one
> place. Or add a .readx method to a subclass.
>
>
>> Is this something that can be bubbled up from the underlying
>> platform? I think the Linux C equivalent is pread.
>
> If there is a standard posix function that is not yet wrapped in os, you can
> propose its addition. But do some research first to see how widespread and
> actually standardized it is.

"man pread" on OS/X suggests it exists there too. I presume the use
case is to have a large data file open for reading by multiple
threads. This is a reasonable use case and it makes some sense to
extend our binary readable streams (buffered and unbuffered) with an
API for this purpose.

However, it's probably just as efficient to have a separate open
stream per thread -- I doubt that open file descriptors are scarcer
resources than threads, and I presume the kernel will happily share
any buffering it does on behalf of multiple open files referencing the
same file. If you're worried about the buffer space, the default
buffer size is 8K, which is hardly worth mentioning compared to the
default thread stack allocation. Depending on your use case you may
get away with an unbuffered stream just fine.

This approach seems better than implementing something using locks
(since the locks create contention that is not inherent in the
problem) and is available right now, without waiting for Python 3.4 to
be released...
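
The one-stream-per-thread approach might look roughly like this. It is a sketch only, assuming threading.local is used to hold each thread's stream; the class name PerThreadReader and its methods are hypothetical.

```python
import os
import tempfile
import threading


class PerThreadReader:
    """Give each thread its own open stream on the same file."""

    def __init__(self, path):
        self._path = path
        self._local = threading.local()

    def _stream(self):
        # Lazily open one stream per thread; each has its own file pointer.
        f = getattr(self._local, "stream", None)
        if f is None:
            f = open(self._path, "rb")
            self._local.stream = f
        return f

    def read_at(self, offset, length):
        f = self._stream()
        f.seek(offset)  # No lock needed: no other thread uses this stream.
        return f.read(length)

    def close_current_thread(self):
        f = getattr(self._local, "stream", None)
        if f is not None:
            f.close()
            self._local.stream = None


# Demo: four threads read different regions of one file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AAAABBBBCCCCDDDD")
    path = tmp.name

reader = PerThreadReader(path)
results = {}


def worker(index):
    results[index] = reader.read_at(index * 4, 4)
    reader.close_current_thread()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.remove(path)
```

Each thread still issues a seek followed by a read, but the two calls can no longer be interleaved with another thread's, because the file pointer is private to the stream.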

--
--Guido van Rossum (python.org/~guido)

Victor Stinner

Jul 22, 2012, 5:25:59 PM
to python...@python.org
> "man pread" on OS/X suggests it exists there too

"man pread" or "import os; help(os.pread)" ;-) pread() and pwrite()
have been added to Python 3.3.
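
A short sketch of calling it directly, assuming a Unix platform (os.pread is not available on Windows) and Python 3.3 or later; the demo file and variable names are made up for the example.

```python
import os
import tempfile

# Build a small demo file (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    # os.pread(fd, n, offset): read up to n bytes at an absolute offset,
    # in a single call and without a separate seek.
    tail = os.pread(fd, 4, 12)
    head = os.pread(fd, 4, 0)
finally:
    os.close(fd)
os.remove(path)
```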

Victor

Guido van Rossum

Jul 22, 2012, 6:36:44 PM
to Victor Stinner, python...@python.org
On Sun, Jul 22, 2012 at 2:25 PM, Victor Stinner
<victor....@gmail.com> wrote:
>> "man pread" on OS/X suggests it exists there too
>
> "man pread" or "import os; help(os.pread)" ;-) pread() and pwrite()
> have been added to Python 3.3.

Awesome. :-) But does the io module offer an API that uses it? It's
kind of awkward to have to call os.pread() with stream.fileno() as an
argument.

--
--Guido van Rossum (python.org/~guido)

Cameron Simpson

Jul 22, 2012, 7:04:30 PM
to Guido van Rossum, python...@python.org, Terry Reedy
On 21Jul2012 13:35, Guido van Rossum <gu...@python.org> wrote:
| On Sat, Jul 21, 2012 at 12:35 PM, Terry Reedy <tjr...@udel.edu> wrote:
| > On 7/21/2012 2:59 PM, Matt Chaput wrote:
| >> I wish Python binary file objects had an atomic seek-read method, so
| >> I wouldn't have to perform my own locking everywhere to prevent other
| >> threads from moving the file pointer between seek and read.
[...]
| >> Is this something that can be bubbled up from the underlying
| >> platform? I think the Linux C equivalent is pread.
[...]
| "man pread" on OS/X suggests it exists there too. I presume the use
| case is to have a large data file open for reading by multiple
| threads. This is a reasonable use case and it makes some sense to
| extend our binary readable streams (buffered and unbuffered) with an
| API for this purpose.

On most Linux boxen you can say:

man 3p pread

which will show you the POSIX man page, if it exists.

And it does!

So pread will exist on pretty much every UNIX platform, and I'd be
amazed if it wasn't on Windows.

In fact, it remarks that pread appeared in SysVr4, which is quite old.

| However, it's probably just as efficient to have a separate open
| stream per thread

It doubles the system call count per read: a seek plus a read instead of
a single pread (assuming pread is a system call, which it usually will be;
it is on Linux and MacOSX, and it is hard to implement otherwise without an
annoying and slow locking scheme concealed inside the C library).

I'd be +1 for adding pread and pwrite to the os module. It seems
reasonable and quite useful and should work on most platforms.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

Rimmer: It will be happened; it shall be going to be happening; it will be
was an event that could will have been taken place in the future.
- Red Dwarf, _Future Echoes_

Antoine Pitrou

Jul 22, 2012, 7:47:10 PM
to python...@python.org
On Sun, 22 Jul 2012 15:36:44 -0700
Guido van Rossum <gu...@python.org> wrote:
> On Sun, Jul 22, 2012 at 2:25 PM, Victor Stinner
> <victor....@gmail.com> wrote:
> >> "man pread" on OS/X suggests it exists there too
> >
> > "man pread" or "import os; help(os.pread)" ;-) pread() and pwrite()
> > have been added to Python 3.3.
>
> Awesome. :-) But does the io module offer an API that uses it? It's
> kind of awkward to have to call os.pread() with stream.fileno() as an
> argument.

It doesn't. I guess we could add an "offset" keyword-only argument to
read() and write(), but then we need to provide a Windows
implementation as well (it seems using overlapped I/O with
ReadFile() / WriteFile() could make it possible). Also, I'm not sure it
makes sense for buffered I/O, or only unbuffered.

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net

Guido van Rossum

Jul 22, 2012, 8:16:57 PM
to Antoine Pitrou, python...@python.org
On Sun, Jul 22, 2012 at 4:47 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Sun, 22 Jul 2012 15:36:44 -0700
> Guido van Rossum <gu...@python.org> wrote:
>> On Sun, Jul 22, 2012 at 2:25 PM, Victor Stinner
>> <victor....@gmail.com> wrote:
>> >> "man pread" on OS/X suggests it exists there too
>> >
>> > "man pread" or "import os; help(os.pread)" ;-) pread() and pwrite()
>> > have been added to Python 3.3.
>>
>> Awesome. :-) But does the io module offer an API that uses it? It's
>> kind of awkward to have to call os.pread() with stream.fileno() as an
>> argument.
>
> It doesn't. I guess we could add an "offset" keyword-only argument to
> read() and write(), but then we need to provide a Windows
> implementation as well (it seems using overlapped I/O with
> ReadFile() / WriteFile() could make it possible). Also, I'm not sure it
> makes sense for buffered I/O, or only unbuffered.

Given that the use case is to avoid race conditions when more than one
thread is doing random-access reads on the same open file, I think it
makes some sense to implement it for both buffered and unbuffered
streams -- and even for text streams, since those support seek() as
well, so the race condition exists for those too.

But note that the pread() man page (at least the one I checked :-)
specifies that pread() doesn't affect the file pointer. So I suppose
it should also not affect the buffer. That may make it hard to
implement it for text streams (which IIRC rely quite heavily on
buffering for their implementation), but it should be easy for
buffered streams: it should just be passed on to the underlying
unbuffered stream.
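
The file-pointer behavior can be checked with a small sketch like the one below. It assumes a Unix platform with os.pread (Python 3.3+) and an unbuffered stream, so that tell() reports the real OS-level position; the demo file and variable names are hypothetical.

```python
import os
import tempfile

# Build a small demo file (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

# buffering=0 gives a raw FileIO object with no Python-level buffer.
with open(path, "rb", buffering=0) as f:
    f.seek(3)
    before = f.tell()
    data = os.pread(f.fileno(), 4, 10)  # positioned read at offset 10
    after = f.tell()                    # the stream position has not moved
    following = f.read(2)               # an ordinary read continues from 3
os.remove(path)
```

With a buffered stream the picture is less clear, since a read through the file descriptor bypasses whatever the buffer currently holds.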

(For those jumping in the middle of the thread: I know it's past the
feature freeze, so these considerations are for 3.4. Also, os.pread()
is in 3.3.)

--
--Guido van Rossum (python.org/~guido)

Antoine Pitrou

Jul 23, 2012, 5:52:16 AM
to python...@python.org
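On Sun, 22 Jul 2012 17:16:57 -0700
Guido van Rossum <gu...@python.org> wrote:
> But note that the pread() man page (at least the one I checked :-)
> specifies that pread() doesn't affect the file pointer. So I suppose
> it should also not affect the buffer.
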
Indeed, it should not affect the buffer. That's why I'm questioning the
addition of this feature to buffered streams (whose whole point is their
implicit buffer management). Also, there are implementation subtleties
when e.g. reading from an area which overlaps the current buffer :-)

As you pointed out, I think a reasonable solution to the race condition
problem is to use several file descriptors. It may not work so well if
you also write to the file, though.

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Guido van Rossum

Jul 23, 2012, 11:00:37 AM
to Antoine Pitrou, python...@python.org
If you write and read a file from multiple threads you're crazy.

--
--Guido van Rossum (python.org/~guido)

Mark Lawrence

Jul 23, 2012, 11:12:05 AM
to python...@python.org
Or Dutch? :)

--
Cheers.

Mark Lawrence.

Yuval Greenfield

Jul 23, 2012, 12:50:29 PM
to Guido van Rossum, Antoine Pitrou, python...@python.org
On Mon, Jul 23, 2012 at 6:00 PM, Guido van Rossum <gu...@python.org> wrote:
> If you write and read a file from multiple threads you're crazy.

Perhaps, though I could imagine a fine-grained-locking DB doing this with constant sized data structures. Though that might be a good time to pull this one out:
