Thanks - looking at the code (tracer.cpp) I have reservations about this approach.
The pipe atomic limit needs to be large enough that groups of related records can
always be written out atomically. Reducing the limit below the maximum size of
an atomic group would break drmemtrace. For example, a single instruction might
generate up to 20 records (maybe a SIMD scatter load?), and other sequences
also need atomicity (a comment in output_buffer() says a branch and its target
need to be output together). atomic_write_size() needs to be large enough for
any such group to be written atomically. At its default of 4096 bytes it
probably is. But I don't want the tracer to hold on to 20 independent
records (or 4096 bytes of records) waiting to write them out as a chunk,
or even as multiple atomic writes. The idea would be to write minimal groups
of records to the pipe as soon as they are ready, using atomicity only
to preserve the record grouping mandated by the trace format, not to
batch up independent records simply to reduce write() calls.
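
To make the size constraint concrete, here is a rough sketch of the check I
have in mind. It assumes the limit in question corresponds to POSIX PIPE_BUF,
which can be queried with fpathconf(). The function names, the 20-record group
and the 16-byte record size are illustrative assumptions, not values taken
from tracer.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Returns the atomic write limit the OS reports for a freshly created pipe,
// or -1 on failure. POSIX guarantees that a write of at most this many bytes
// to a pipe is not interleaved with data from other writers.
long
query_pipe_atomic_limit()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    long limit = fpathconf(fds[1], _PC_PIPE_BUF);
    close(fds[0]);
    close(fds[1]);
    return limit;
}

// Hypothetical helper: does the largest record group fit in one atomic write?
bool
group_fits(size_t max_records_per_group, size_t record_size, long atomic_limit)
{
    assert(record_size > 0);
    return atomic_limit > 0 &&
        max_records_per_group * record_size <= static_cast<size_t>(atomic_limit);
}
```

With a 4096-byte limit, a 20-record group of 16-byte records (320 bytes) fits
easily. The same group would not fit if the limit were cut to 256 bytes.
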
I'm having difficulty understanding where to change this in tracer.cpp.
It looks like output_buffer() will write out all available data in a buffer and
create a new one. Reducing the buffer size below the maximum group size
to force more frequent pipe writes would break drmemtrace for the same
reason that reducing the pipe atomic write limit would. So instead we need
to write out record groups when available, and advance the buffer pointer.
I just don't know whether this is feasible without completely re-engineering
tracer.cpp.
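
Roughly, the behaviour I'm after looks like the sketch below. This is only an
illustration of the idea: group_writer_t, its sink callback and the ends_group
flag are all hypothetical and do not exist in tracer.cpp. The sink stands in
for a single write() to the pipe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: accumulates the records of one group and hands the
// group to the sink in a single call as soon as the group is complete.
class group_writer_t {
public:
    using sink_t = std::function<void(const char *, size_t)>;

    group_writer_t(size_t atomic_limit, sink_t sink)
        : atomic_limit_(atomic_limit)
        , sink_(std::move(sink))
    {
        assert(sink_);
    }

    // Appends one record to the pending group. If the record closes the
    // group, the whole group goes to the sink in one call and the pending
    // buffer is reset. Returns false, without appending or writing, if the
    // record would push the pending group past the atomic limit.
    bool
    append(const char *data, size_t size, bool ends_group)
    {
        if (pending_.size() + size > atomic_limit_)
            return false;
        pending_.insert(pending_.end(), data, data + size);
        if (ends_group) {
            sink_(pending_.data(), pending_.size());
            pending_.clear();
        }
        return true;
    }

    // Bytes buffered for the group that is still open.
    size_t
    pending_bytes() const
    {
        return pending_.size();
    }

private:
    size_t atomic_limit_;
    sink_t sink_;
    std::vector<char> pending_;
};
```

An independent record is passed with ends_group set to true and goes out
immediately. Only records that the trace format requires to stay together are
held back, and only until the last one of the group arrives.
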