do_loop_readv_writev() not as described for drivers implementing only write()?

Mike McTernan

Mar 2, 2010, 8:20:02 AM

Hi,

I'm using writev() with an old FPGA driver which only implements
write(), not aio_write(). I'm expecting the behaviour described in the
man page for writev():

"The data transfers performed by readv() and writev() are atomic: the
data written by writev() is written as a single block that is not
intermingled with output from writes in other processes (but see pipe(7)
for an exception); analogously, readv() is guaranteed to read a
contiguous block of data from the file, regardless of read operations
performed in other threads or processes that have file descriptors
referring to the same open file description (see open(2))."

I appear to be observing intermingling of individual iovec entries that
are being written to the same fd from different threads i.e. each call
to writev() isn't producing a contiguous block to be output. This is at
odds with the man page description.

Looking into the kernel sources (from around 2.6.28 to 2.6.33), the
driver doesn't implement aio_write(), so vfs_writev() gets handled by
do_loop_readv_writev() as a series of discrete calls to the driver's
write().
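
A minimal user-space sketch of the fallback just described, not the kernel source itself: one write() call per iovec segment, with nothing held across the whole vector. The names loop_writev_model, write_fn, fake_dev and fake_write are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for a driver's write() method. */
typedef ssize_t (*write_fn)(void *dev, const void *buf, size_t len);

/*
 * Simplified model of the fallback loop: each segment is passed to the
 * driver separately and no lock covers the vector as a whole.
 */
static ssize_t loop_writev_model(void *dev, write_fn fn,
                                 const struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    assert(fn != NULL);
    for (int i = 0; i < iovcnt; i++) {
        ssize_t n = fn(dev, iov[i].iov_base, iov[i].iov_len);

        if (n < 0)
            return total ? total : n; /* report progress so far, else the error */
        total += n;
        if ((size_t)n != iov[i].iov_len)
            break;                    /* short write: stop early */
        /* Another writer's write() may be serviced at this point. */
    }
    return total;
}

/* Hypothetical fake device: records the data and counts write() calls. */
struct fake_dev {
    char data[256];
    size_t used;
    int calls;
};

static ssize_t fake_write(void *dev, const void *buf, size_t len)
{
    struct fake_dev *d = dev;

    if (len > sizeof(d->data) - d->used)
        len = sizeof(d->data) - d->used;
    memcpy(d->data + d->used, buf, len);
    d->used += len;
    d->calls++;
    return (ssize_t)len;
}
```

Passing a three-entry vector through loop_writev_model() with fake_write() results in three separate calls to the fake driver, which is the window in which a second writer could slip in.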

I can't see where any locking is applied to ensure each iovec is handled
serially without 'intermingling', which would awkwardly have to be
outside the driver in this case.
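
As a stopgap, the locking can be done outside the driver in userspace. The sketch below assumes every writer is a thread in the same process and goes through the same wrapper; it does nothing for other processes. The names locked_writev and dev_write_lock are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/uio.h>

/* One process-wide lock shared by every thread writing to the device. */
static pthread_mutex_t dev_write_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Serialise whole writev() calls so the per-segment write() calls made
 * by the kernel fallback cannot interleave between cooperating threads.
 */
static ssize_t locked_writev(int fd, const struct iovec *iov, int iovcnt)
{
    int rc = pthread_mutex_lock(&dev_write_lock);
    assert(rc == 0);

    ssize_t n = writev(fd, iov, iovcnt);
    int saved_errno = errno;          /* keep errno from writev() */

    pthread_mutex_unlock(&dev_write_lock);
    errno = saved_errno;
    return n;
}
```

This is only a convention between cooperating threads: any code path that calls write() or writev() on the fd directly bypasses the lock.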

Hunting around I found various good articles on writev() and the aio
stuff e.g.

http://lwn.net/Articles/170954/
http://lwn.net/Articles/24366/

But nowhere can I find whether it is expected behaviour that
writev()/readv() for a driver which only implements write()/read() is
actually non-atomic. Lots of sources state the atomicity of these
calls though.

Have I overlooked some good docs or some locking hidden in the vfs
handling?

As an aside, I'm working to update the driver to provide aio_write() so
that it can provide its own locking for userspace.

Kind Regards,

Mike


Daniel Baluta

Mar 3, 2010, 7:20:03 AM

On Tue, Mar 2, 2010 at 2:55 PM, Mike McTernan <mmct...@airvana.com> wrote:
> Hi,
>
> I'm using writev() with an old FPGA driver which only implements
> write(), not aio_write(). I'm expecting the behaviour described in the
> man page for writev():
>
> "The data transfers performed by readv() and writev() are atomic: the
> data written by writev() is written as a single block that is not
> intermingled with output from writes in other processes (but see pipe(7)
> for an exception); analogously, readv() is guaranteed to read a
> contiguous block of data from the file, regardless of read operations
> performed in other threads or processes that have file descriptors
> referring to the same open file description (see open(2))."
>
> I appear to be observing intermingling of individual iovec entries that
> are being written to the same fd from different threads i.e. each call
> to writev() isn't producing a contiguous block to be output. This is at
> odds with the man page description.
>
> Looking into the kernel sources (from around 2.6.28 to 2.6.33), the
> driver doesn't implement aio_write(), so vfs_writev() gets handled by
> do_loop_readv_writev() as a series of discrete calls to the driver's
> write().
>
> I can't see where any locking is applied to ensure each iovec is handled
> serially without 'intermingling', which would awkwardly have to be
> outside the driver in this case.

I'm also interested in this topic. Can anyone help?
I would say that the libc enforces atomicity, but I have to dig deeper for
a real answer :).

thanks,
Daniel.

Mike McTernan

Mar 3, 2010, 9:00:03 AM

Hi,

> I would say that the libc enforces atomicity, but I have to dig deeper for
> a real answer :).

If libc did do it, it would have to be fancy since readv() and writev()
are documented as being atomic between processes, as well as threads.

Still, all things being possible, I visually checked the glibc I'm
using.

glibc-2.8/sysdeps/unix/sysv/linux/writev.c:

ssize_t
__libc_writev (fd, vector, count)
     int fd;
     const struct iovec *vector;
     int count;
{
  if (SINGLE_THREAD_P)
    return do_writev (fd, vector, count);

  int oldtype = LIBC_CANCEL_ASYNC ();

  ssize_t result = do_writev (fd, vector, count);

  LIBC_CANCEL_RESET (oldtype);

  return result;
}

So we end up in do_writev(), which is defined in the same file:

static ssize_t
do_writev (int fd, const struct iovec *vector, int count)
{
  ssize_t bytes_written;

  bytes_written = INLINE_SYSCALL (writev, 3, fd, CHECK_N (vector, count),
                                  count);

  if (bytes_written >= 0 || errno != EINVAL || count <= UIO_FASTIOV)
    return bytes_written;

  return __atomic_writev_replacement (fd, vector, count);
}

So assuming the syscall to writev() returns >= 0 (which is true for my
driver), I can't see locking being provided here unless
LIBC_CANCEL_ASYNC is doing something spectacular which I've overlooked.

The other observation from this is that the kernel could detect a case
where writev() is not supported by a driver (i.e. write() is implemented
but aio_write() is not) and return -ENOSYS instead of calling
do_loop_readv_writev(). This would allow the libc to use its
replacement function - __atomic_writev_replacement() in this case.
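
For illustration, the general idea behind a write()-based replacement can be sketched as follows: copy every segment into one contiguous buffer and hand it to the driver in a single write() call. This is a minimal sketch under that assumption, not glibc's actual code, and writev_coalesced is a hypothetical name. Whether the single write() is itself atomic still depends on the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gather all iovec segments into one buffer and issue a single write(),
 * so the driver sees the vector as one block.
 */
static ssize_t writev_coalesced(int fd, const struct iovec *iov, int iovcnt)
{
    size_t total = 0;

    assert(iovcnt >= 0);
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;      /* overflow check omitted for brevity */
    if (total == 0)
        return 0;

    char *buf = malloc(total);
    if (buf == NULL) {
        errno = ENOMEM;
        return -1;
    }

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue;                 /* skip empty segments */
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    ssize_t n = write(fd, buf, total);
    int saved_errno = errno;          /* free() may clobber errno */

    free(buf);
    errno = saved_errno;
    return n;
}
```

The cost is an extra allocation and copy per call, which is the usual trade-off against issuing one write() per segment.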

How writev() correctly works is still a mystery to me at this point.

Regards,

Mike

Ulrich Drepper

Mar 3, 2010, 9:00:01 PM

On Wed, Mar 3, 2010 at 04:11, Daniel Baluta <daniel...@gmail.com> wrote:
> I would say that the libc enforces atomicity,

Not at all. Any such implementation would unconditionally have to use
file locking and that's only a convention, not a requirement. The
libc only tries to work around a missing writev syscall.