On Mon, 14 Apr 2008, Federico Lucifredi wrote:
> The premise is using strace output of a process that we are
> interested in creating an I/O behavioral profile of and munging that
> trace to produce data on the total number of open/read/close/write/
> llseek/... calls and, more importantly, on how many writes and reads
> fell into which size bucket (the author uses < 1K, 1-8K, 1K-1M, >1M
> but those could be modified).
>
> The scripts to process the data are included. The interesting issue
> would be, IMHO, to use that data to simulate the I/O pattern of the
> observed process for filesystem and option tweaking purposes....
>
> I may put some cycles into this. Any opinions/comments/suggestions/
> pointers to existing tools are appreciated!
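For illustration, the counting and bucketing described above could look roughly like the following. This is only a sketch, not the scripts referred to in the mail: the regular expression assumes default strace line formatting (optionally with a leading PID as printed by strace -f), the names SYSCALL_RE, bucket and profile are made up, and the third bucket is assumed to start at 8K.

```python
import re
from collections import Counter

# Matches lines such as: read(3, "..."..., 4096) = 1612
# An optional leading PID ("[pid 42] " or "42 ") is skipped.
SYSCALL_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*\)\s+=\s+(-?\d+)')

# Upper limits (exclusive) and labels; bucket boundaries are an assumption.
BUCKETS = [(1024, "<1K"), (8 * 1024, "1K-8K"), (1024 * 1024, "8K-1M")]


def bucket(nbytes):
    """Return the label of the size bucket that nbytes falls into."""
    for limit, label in BUCKETS:
        if nbytes < limit:
            return label
    return ">=1M"


def profile(lines):
    """Count syscalls by name and successful reads/writes by size bucket."""
    calls = Counter()
    sizes = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if not m:
            continue  # signals, unfinished calls, exit lines, ...
        name, ret = m.group(1), int(m.group(2))
        calls[name] += 1
        # The return value of read/write is the number of bytes transferred.
        if name in ("read", "write") and ret >= 0:
            sizes[(name, bucket(ret))] += 1
    return calls, sizes
```

Feeding it the lines of a file written with "strace -o" yields two counters: one with the per-syscall totals, one keyed by (call, bucket).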
We, for instance, use strace for a subset of the above as preparation for
our preload tool: strace detects which files are accessed where, a custom
tool transforms this into lists of blocks based on the filesystem layout,
and those lists are then used to preload the blocks into cache.
I also have some hacky script that uses strace to trace the memory need of
a process (by tracing brk, mmap and friends).
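The idea can be sketched roughly as follows. This is not the script mentioned above, just an illustration under assumptions: it expects default strace formatting for brk, mmap and munmap, charges every mapping with its full length, ignores partial unmaps and mremap, and the function name peak_memory is made up.

```python
import re

BRK_RE = re.compile(
    r'\bbrk\((?:NULL|0x[0-9a-f]+|0)\)\s+=\s+(0x[0-9a-f]+)')
MMAP_RE = re.compile(
    r'\bmmap2?\((?:NULL|0x[0-9a-f]+|0),\s*(\d+),.*\)\s+=\s+(0x[0-9a-f]+)')
MUNMAP_RE = re.compile(r'\bmunmap\((0x[0-9a-f]+),\s*(\d+)\)\s+=\s+0')


def peak_memory(lines):
    """Return the peak of heap growth plus mapped bytes seen in the trace."""
    brk_start = None  # first program break seen in the trace
    brk_now = None    # most recent program break
    maps = {}         # mapping start address -> length in bytes
    peak = 0
    for line in lines:
        m = BRK_RE.search(line)
        if m:
            brk_now = int(m.group(1), 16)
            if brk_start is None:
                brk_start = brk_now
        m = MMAP_RE.search(line)
        if m:
            # Failed mmap calls return -1 and do not match the regex.
            maps[int(m.group(2), 16)] = int(m.group(1))
        m = MUNMAP_RE.search(line)
        if m:
            # Only whole-mapping unmaps are handled in this sketch.
            maps.pop(int(m.group(1), 16), None)
        heap = (brk_now - brk_start) if brk_start is not None else 0
        peak = max(peak, heap + sum(maps.values()))
    return peak
```

The trace would come from something like "strace -e trace=brk,mmap,munmap -o trace.txt cmd", and the result is an upper bound on address space use rather than on memory actually touched.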
So strace is definitely useful for more than debugging. The obvious
problem with it is that it cannot trace exactly how mmap'ed files are
accessed. You can only assume that the whole mmap'ed block is accessed;
neither the access pattern nor the sizes are visible.
Ciao,
Michael.
> On Mon, 14 Apr 2008, Federico Lucifredi wrote:
> > I may put some cycles into this. Any opinions/comments/suggestions/
> > pointers to existing tools are appreciated!
>
> We for instance use strace for a subset of the above as preparation for
> our preload tool. strace to detect which files are accessed where, custom
> tool to transform this into lists of blocks based on the filesystem
> layout, and making use of such lists to preload them into cache.
My colleague with the beautiful name should *definitely* look into
iogrind, a real simulator for I/O based on application usage:
0. You get a snapshot of your file system's layout (which you can keep
around)
1. You run your app under valgrind.
2. You run the iogrind tool on valgrind's output and the FS snapshot.
3. You get a nice treemap of functions based on time spent in I/O.
Michael Meeks wrote this tool, and it has been pretty useful to analyze
startup time of OpenOffice.org and other beasts.
Federico
Michael Matz wrote:
> Hi,
>
> We for instance use strace for a subset of the above as preparation for
> our preload tool. strace to detect which files are accessed where, custom
> tool to transform this into lists of blocks based on the filesystem
> layout, and making use of such lists to preload them into cache.
This is very interesting. Where can I look to see how this is done in
detail?
>
> I also have some hacky script that uses strace to trace the memory need of
> a process (by tracing brk, mmap and friends).
Can I see this as well? And, given the comment below, do you have to
count the whole mmap'd file as a memory charge?
>
> So strace definitely is useful not only for debugging. The obvious
> problem with it lies in the incapability to trace how mmap'ed files are
> accessed exactly. You can only assume that the whole mmap'ed block is
> also accessed, but no pattern and no size is available.
--
_________________________________________
-- "'Problem' is a bleak word for challenge" - Richard Fish
(Federico L. Lucifredi) - http://www.lucifredi.com
On Tue, 1 Jul 2008, Federico Lucifredi wrote:
> > We for instance use strace for a subset of the above as preparation
> > for our preload tool. strace to detect which files are accessed
> > where, custom tool to transform this into lists of blocks based on the
> > filesystem layout, and making use of such lists to preload them into
> > cache.
>
> This is very interesting. Where can I look to see how this is done in
> detail?
Package "preload" on openSuSE. There is
http://en.opensuse.org/SUPER_preloading_internals but that doesn't go much
into technical detail (wasn't written by coolo or me).
It goes like this: user straces a "session" that interests him, where
session might for instance be the complete process tree starting from a
rc-script, or simply the startup of e.g. firefox. That strace is fed into
"prepare_preload -s -p" (input is strace, output is command file). This
parses the strace into simple commands, like
open /usr/bin/firefox
stat /etc/ld.so.preload
open /etc/ld.so.cache
...
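To give an idea of that first step, here is a rough sketch of turning strace output into such commands. It is not the prepare_preload code: the regular expression, the function name to_commands, and the choices to skip failed calls, drop duplicates and map openat/lstat/stat64 onto "open"/"stat" are all assumptions made for illustration.

```python
import re

# Path-taking calls of interest, e.g.
#   open("/etc/ld.so.cache", O_RDONLY) = 3
#   stat("/etc/ld.so.preload", {st_mode=...}) = 0
PATH_CALL_RE = re.compile(
    r'\b(open|openat|stat|stat64|lstat)\((?:AT_FDCWD,\s*)?'
    r'"((?:[^"\\]|\\.)*)".*\)\s+=\s+(-?\d+)')


def to_commands(lines):
    """Return 'open PATH' / 'stat PATH' commands in first-seen order."""
    seen = set()
    out = []
    for line in lines:
        m = PATH_CALL_RE.search(line)
        if not m or int(m.group(3)) < 0:
            continue  # not a path call, or the call failed (assumption)
        verb = "open" if m.group(1).startswith("open") else "stat"
        cmd = f"{verb} {m.group(2)}"
        if cmd not in seen:  # keep each command once
            seen.add(cmd)
            out.append(cmd)
    return out
```

Joining the returned list with newlines gives a file in the shape shown above.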
For the distribution a pregenerated set of these files is placed into
/etc/preload.d/ . These are somewhat hand-edited to not contain too many
machine- or installation-specific paths (some installation specific ones
are left in, usually tailored to the default installations).
That's as far as the strace involvement goes: it is only used to initially
generate the above command files, i.e. simply a list of files that are
stat'ed or read by processes.
The rest involves BMAP. From time to time, whenever something interesting
has happened (e.g. new packages were installed), update_preload runs
"prepare_preload -c" (input is command file) on those command files.
This gathers the on-disk layout of all referenced files and directories
via the /sbin/print-bmap tool (as far as the filesystem in use allows).
prepare_preload then tries to sort all these accesses into increasing
block order (even interleaving accesses to the same file), resulting in
a file like:
S /usr/lib/libvorbis.so.0
W /etc/fonts/conf.d/20-fix-globaladvance.conf
O /usr/share/X11/locale/compose.dir 6
O /etc/bash_completion.d/yast2-completion.sh 7
R 6 0 1
R 7 0 1
C 7
R 6 1 6
C 6
(Read as: stat /usr/lib/libvorbis.so.0; "read whole small file"
/etc/fonts/conf.d/20-fix-globaladvance.conf; open
/usr/share/X11/locale/compose.dir as 6 and
/etc/bash_completion.d/yast2-completion.sh as 7; read 1 block starting at
offset 0 from 6; read 1 block starting at offset 0 from 7; close 7; read
6 blocks starting at offset 1 from 6; close 6.)
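The sorting step can be illustrated with a small sketch. It is not the prepare_preload implementation: the input format (a hypothetical dict mapping each path to the physical block number of each of its logical blocks, as a bmap tool might report), the function name plan_reads, and the handle numbering starting at 0 are assumptions, and the S and W commands are left out.

```python
def plan_reads(block_maps):
    """Emit O/R/C commands so that reads happen in increasing disk order.

    block_maps: {path: [physical block of logical block 0, 1, ...]}
    """
    # Flatten to (physical, path, logical) and sort by position on disk.
    blocks = sorted(
        (phys, path, logical)
        for path, phys_blocks in block_maps.items()
        for logical, phys in enumerate(phys_blocks)
    )
    # Merge neighbours that are contiguous on disk and within the file.
    runs = []  # each run: [path, first_logical, count, last_phys]
    for phys, path, logical in blocks:
        last = runs[-1] if runs else None
        if (last and last[0] == path
                and last[1] + last[2] == logical
                and last[3] + 1 == phys):
            last[2] += 1
            last[3] = phys
        else:
            runs.append([path, logical, 1, phys])
    # Count runs per file so each file is closed after its last read.
    remaining = {}
    for run in runs:
        remaining[run[0]] = remaining.get(run[0], 0) + 1
    handles = {}
    out = []
    for path, logical, count, _ in runs:
        if path not in handles:
            handles[path] = len(handles)
            out.append(f"O {path} {handles[path]}")
        h = handles[path]
        out.append(f"R {h} {logical} {count}")
        remaining[path] -= 1
        if remaining[path] == 0:
            out.append(f"C {h}")
    return out
```

With a file whose first block lies before a second file's only block and whose remaining blocks lie after it, this produces the same interleaving as in the example above: one read from the first file, the read and close of the second, then the rest of the first. Unlike the example, the sketch opens each file lazily, right before its first read.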
This file, which is now system-specific, is stored in /var/cache/preload
and is the input to the /sbin/preload program, which is run at strategic
points in the startup process (e.g. a KDE session is preloaded while KDM
is still waiting for the username/password).
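Any consumer of that file first has to parse it. A minimal reader for the five command letters shown in the example might look like this. It is a sketch only and not the /sbin/preload source: the function name parse_preload and the tuple layout are made up, and it assumes one command per line with the handle as the last field of an O line.

```python
def parse_preload(lines):
    """Parse S/W/O/R/C commands into tuples, in file order."""
    ops = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        kind, _, rest = line.partition(" ")
        if kind in ("S", "W"):
            # S: stat the path, W: read the whole (small) file
            ops.append((kind, rest))
        elif kind == "O":
            # O PATH HANDLE: the handle is the last field
            path, _, handle = rest.rpartition(" ")
            ops.append(("O", path, int(handle)))
        elif kind == "R":
            # R HANDLE OFFSET COUNT, offset and count in blocks
            handle, offset, count = (int(x) for x in rest.split())
            ops.append(("R", handle, offset, count))
        elif kind == "C":
            ops.append(("C", int(rest)))
        else:
            raise ValueError(f"unknown command: {line!r}")
    return ops
```

A replay loop would then walk the returned tuples in order and issue the corresponding stat, open, read and close calls.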
I've put the above into http://en.opensuse.org/Preload now :)
> > I also have some hacky script that uses strace to trace the memory
> > need of a process (by tracing brk, mmap and friends).
>
> Can I see this as well?
Attached (the python script is the real one, the shell script just a
wrapper).
> And, given the comment below, do you have to count the whole mmap'd file
> as a memory charge?
Right. One can try to be clever and at least leave out shared memory or
non-RAM memory (e.g. framebuffer), but the above script doesn't do that.
It also doesn't help in counting the memory that is really committed.
Newer Linux kernels can help there (/proc/pid/smaps), but that requires
polling in parallel to the process and hence isn't as precise.
Ciao,
Michael.