I'm using writev() with an old FPGA driver which only implements
write(), not aio_write(). I'm expecting the behaviour described in the
man page for writev():
"The data transfers performed by readv() and writev() are atomic: the
data written by writev() is written as a single block that is not
intermingled with output from writes in other processes (but see pipe(7)
for an exception); analogously, readv() is guaranteed to read a
contiguous block of data from the file, regardless of read operations
performed in other threads or processes that have file descriptors
referring to the same open file description (see open(2))."
I appear to be observing intermingling of individual iovec entries that
are being written to the same fd from different threads, i.e. each call
to writev() isn't producing a contiguous block of output. This is at
odds with the man page description.
Looking into the kernel sources (from around 2.6.28 to 2.6.33), the
driver doesn't implement aio_write(), so vfs_writev() gets handled by
do_loop_readv_writev() as a series of discrete calls to the driver's
write().
I can't see where any locking is applied to ensure each iovec is handled
serially without 'intermingling', which would awkwardly have to be
outside the driver in this case.
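To make the concern concrete, here is a minimal userspace model of a
per-segment fallback of the kind described above. It is only a sketch and
not the kernel's code: loop_writev_model() is a hypothetical name, and the
error handling is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Userspace model (NOT kernel code) of a writev() fallback that issues one
 * write() per iovec entry. No lock is held across iterations, so a write()
 * from another thread can land between any two segments.
 */
static ssize_t loop_writev_model(int fd, const struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    assert(iovcnt >= 0 && (iov != NULL || iovcnt == 0));

    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue;

        /* Each segment reaches the fd as its own, independent write(). */
        ssize_t n = write(fd, iov[i].iov_base, iov[i].iov_len);
        if (n < 0)
            return total ? total : n;   /* report partial progress if any */

        total += n;
        if ((size_t)n != iov[i].iov_len)
            break;                      /* short write: stop early */
    }
    return total;
}
```

With two threads calling this on the same fd, the segments of one call can
be separated by segments of the other, which matches what I am seeing.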
Hunting around, I found various good articles on writev() and the aio
stuff, e.g.
http://lwn.net/Articles/170954/
http://lwn.net/Articles/24366/
But nowhere can I find whether it is expected behaviour that
writev()/readv() for a driver which only implements write()/read() is
actually non-atomic. Lots of sources state the atomicity of these
calls, though.
Have I overlooked some good docs or some locking hidden in the vfs
handling?
As an aside, I'm working to update the driver to provide aio_write() so
it can provide its own locking.
Kind Regards,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
I'm also interested in this topic. Can anyone help?
I would say that the libc enforces atomicity, but I have to dig deeper for
a real answer :).
thanks,
Daniel.
> I would say that the libc enforces atomicity, but I have to dig deeper for
> a real answer :).
If libc did do it, it would have to be fancy since readv() and writev()
are documented as being atomic between processes, as well as threads.
Still, all things being possible, I visually checked the glibc I'm
using.
glibc-2.8/sysdeps/unix/sysv/linux/writev.c:
ssize_t
__libc_writev (fd, vector, count)
     int fd;
     const struct iovec *vector;
     int count;
{
  if (SINGLE_THREAD_P)
    return do_writev (fd, vector, count);
  int oldtype = LIBC_CANCEL_ASYNC ();
  ssize_t result = do_writev (fd, vector, count);
  LIBC_CANCEL_RESET (oldtype);
  return result;
}
So we end up in do_writev(), which is defined in the same file:
static ssize_t
do_writev (int fd, const struct iovec *vector, int count)
{
  ssize_t bytes_written;
  bytes_written = INLINE_SYSCALL (writev, 3, fd, CHECK_N (vector, count),
                                  count);
  if (bytes_written >= 0 || errno != EINVAL || count <= UIO_FASTIOV)
    return bytes_written;
  return __atomic_writev_replacement (fd, vector, count);
}
So assuming the syscall to writev() returns >= 0 (which is true for my
driver), I can't see locking being provided here unless
LIBC_CANCEL_ASYNC is doing something spectacular which I've overlooked.
The other observation from this is that the kernel could detect a case
where writev() is not supported by a driver (i.e. write() is implemented
but aio_write() is not) and return -ENOSYS instead of calling
do_loop_readv_writev(). This would allow the libc to use its
replacement function, __atomic_writev_replacement() in this case.
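For illustration, a userspace replacement could gather all segments into
one temporary buffer and hand the driver a single write(). The sketch below
shows that idea only; coalesced_writev() is a hypothetical name, this is
not glibc's code, and it assumes the summed lengths do not overflow size_t.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical fallback: copy every iovec segment into one contiguous
 * buffer, then issue a single write() so the driver sees one request.
 */
static ssize_t coalesced_writev(int fd, const struct iovec *iov, int iovcnt)
{
    size_t total = 0;

    assert(iovcnt >= 0 && (iov != NULL || iovcnt == 0));

    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    if (total == 0)
        return 0;

    char *buf = malloc(total);
    if (buf == NULL) {
        errno = ENOMEM;
        return -1;
    }

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue;                   /* skip empty segments */
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    ssize_t n = write(fd, buf, total);  /* one call into the driver */
    int saved_errno = errno;            /* free() may clobber errno */
    free(buf);
    errno = saved_errno;
    return n;
}
```

The cost is an extra allocation and copy per call, and the result is only
as atomic as the driver's write() is for a single buffer.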
How writev() correctly works is still a mystery to me at this point.
Regards,
Mike
Not at all. Any such implementation would unconditionally have to use
file locking and that's only a convention, not a requirement. The
libc only tries to work around a missing writev syscall.
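For the threads-in-one-process case from the original report, a
process-local mutex around writev() is one possible stopgap. This is a
sketch under stated assumptions: locked_writev() and writev_lock are
hypothetical names, and like file locking it is purely a convention, so it
does nothing for other processes or for callers that bypass the wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/uio.h>

/* One lock shared by every thread that writes through the wrapper. */
static pthread_mutex_t writev_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical wrapper: serialise writev() calls made by threads of this
 * process, so that the segments of one call are not interleaved with the
 * segments of another call made through the same wrapper.
 */
static ssize_t locked_writev(int fd, const struct iovec *iov, int iovcnt)
{
    int rc = pthread_mutex_lock(&writev_lock);
    assert(rc == 0);

    ssize_t n = writev(fd, iov, iovcnt);
    int saved_errno = errno;            /* preserve writev()'s errno */

    rc = pthread_mutex_unlock(&writev_lock);
    assert(rc == 0);

    errno = saved_errno;
    return n;
}
```

A single global lock serialises writes to all fds; a per-fd lock would be
finer grained but needs more bookkeeping.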