Conclusions, in short:
- There is a big opportunity for prefetching: cold start (after
poweroff) takes about 14 seconds, while warm startup takes only 4
seconds, so there is a potential gain of about 10 seconds.
- The variance of cold start results is large, ranging from 13.21s to
16.71s with a standard deviation of up to 1.1657s. This makes
comparisons of prefetching algorithms harder.
- There is no difference between cold start using "Restart" and cold
start using poweroff-poweron.
- Dropping caches using /proc/sys/vm/drop_caches, hdparm -f and reading
a large file gives a significantly faster startup time (~11s vs. ~14s),
so it does not seem suitable for realistic tests. If someone knows a
better way of simulating a cold start, I would be glad to try it out.
Comments welcome :)
Krzysztof Lichota
> - There is no difference between cold start using "Restart" and cold
> start using poweroff-poweron.
> - Dropping caches using /proc/sys/vm/drop_caches, hdparm -f and reading
> a large file gives a significantly faster startup time (~11s vs. ~14s),
> so it does not seem suitable for realistic tests. If someone knows a
> better way of simulating a cold start, I would be glad to try it out.
Interesting. I suppose you did
echo 3 >/proc/sys/vm/drop_caches
didn't you? Did you also run sync beforehand (dirty data is not dropped
from the caches)? If even sync before dropping caches does not help, it
would be worthwhile to find out which data remains cached - you can use
blktrace and e2block2map to find the difference in the requested IO...
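For reference, a minimal sketch of that suggested sequence (sync first so
dirty pages are written back and can actually be dropped) - this is only
an illustration, not the actual toolslib.sh code:

```shell
#!/bin/sh
# Sketch: simulate a cold start by flushing dirty pages, then asking the
# kernel to drop the page cache, dentries and inodes.
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
    echo "caches dropped"
else
    # without root the drop_caches file is not writable
    echo "need root to write /proc/sys/vm/drop_caches" >&2
fi
```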
Honza
--
Jan Kara <ja...@suse.cz>
SuSE CR Labs
Yes. You can see the code here (function drop_caches):
http://prefetch.googlecode.com/svn/trunk/tools/toolslib.sh
> If even sync before dropping caches does not help, it would be
> worthwhile to find out which data remains cached - you can use blktrace
> and e2block2map to find the difference in the requested IO...
Good idea, I will try that :)
Krzysztof Lichota
I have compiled the Gutsy kernel on my own, and the startup time for
OpenOffice (cold start) dropped to ~9 seconds. I have repeated it a few
times... Funnily enough, with dropped caches it stays at ~11 seconds,
and the original Gutsy kernel still shows ~14s.
I will try blktrace tomorrow to see the difference.
BTW. Are you on IRC somewhere on freenode?
Krzysztof Lichota
What causes such a huge variance? Are you using a different I/O
scheduler?
> I will try blktrace tomorrow to see the difference.
>
> BTW. Are you on IRC somewhere on freenode?
Probably good idea to start hanging around a channel there. Say,
#prefetch.
> Krzysztof Lichota
I have no idea. I am comparing the sources now.
I haven't changed much in my custom-compiled kernel - I just disabled
most of the unnecessary device drivers and enabled blktrace.
> Probably good idea to start hanging around a channel there. Say,
> #prefetch.
And how would we set it up?
I am not sure there would be enough members for such a channel. Maybe we
should stick to somewhere else for a start?
Krzysztof Lichota
> I will try blktrace tomorrow to see the difference.
>
> BTW. Are you on IRC somewhere on freenode?
No. But I can join somewhere.
Attached. The archive also contains *.diff and *.sdiff files, which are
diffs of the sorted config files.
The last 2 lines of the dmesg file for my custom kernel show the error I
get when trying to stop blktrace. Maybe you know why it happens?
Also, it is interesting that after running OpenOffice with blktrace, the
startup time jumped to ~12-13 seconds. This might be caused by the log
dumping interfering with startup, so I will try dumping over the network.
>> I will try blktrace tomorrow to see the difference.
>>
>> BTW. Are you on IRC somewhere on freenode?
> No. But I can join somewhere.
OK, maybe #ubuntu-soc would be a good place for a start, unless you know
a better place?
Krzysztof Lichota
I have confirmed it - with the blktrace dump going over the network, the
startup time went back to ~9s.
I have put trace from blktrace on my custom kernel at
http://prefetch.googlecode.com/svn/trunk/results/openoffice/shorter-startup-on-custom-kernel/testmachine-kl1/test-1-blktrace-on-custom-kernel/
Krzysztof Lichota
> The last 2 lines of the dmesg file for my custom kernel show the error
> I get when trying to stop blktrace. Maybe you know why it happens?
Sorry, no clue.
> >> BTW. Are you on IRC somewhere on freenode?
> > No. But I can join somewhere.
>
> OK, maybe #ubuntu-soc would be a good place for a start, unless you know
> a better place?
I've just joined :)
Just join #prefetch. I'm actually in there now.
behdad
OK. I will change these options, recompile and let you know :)
>> The last 2 lines of the dmesg file for my custom kernel show the error
>> I get when trying to stop blktrace. Maybe you know why it happens?
> Sorry, no clue.
I have already solved it - apparently "blktrace -k" is not a good way to
terminate the trace.
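For what it's worth, a sketch of one cleaner way to bound a trace: if I
remember correctly, blktrace has a stopwatch option (-w) that makes it
stop by itself after a fixed number of seconds instead of being killed.
Device name and duration here are just this thread's setup, adjust as
needed:

```shell
#!/bin/sh
# Sketch: trace the disk for a fixed 30-second window during OO startup.
# Requires root and the blktrace package; /dev/hda is the disk used in
# this thread.
DEV=/dev/hda
if [ -b "$DEV" ] && [ -r "$DEV" ]; then
    # -w 30: stop the trace cleanly after 30 seconds
    blktrace -d "$DEV" -o oo-start -w 30
else
    echo "skipping: $DEV not readable (need root on the test machine)" >&2
fi
```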
>>>> BTW. Are you on IRC somewhere on freenode?
>>> No. But I can join somewhere.
>> OK, maybe #ubuntu-soc would be a good place for a start, unless you know
>> a better place?
> I've just joined :)
I can see you :)
Krzysztof Lichota
This might be the problem:
hdparm -t results on my kernel reach the maximum value (60 MB/s) much
faster than on the default Ubuntu kernel.
The Ubuntu kernel seems to have a "slow start".
The test was performed by first running hdparm -T 10 times, then hdparm
-t 10 times (automatically in a loop, not manually).
==> hdparm-t-rc3-kl1.txt <==
Timing buffered disk reads: 106 MB in 3.01 seconds = 35.21 MB/sec
Timing buffered disk reads: 182 MB in 3.03 seconds = 60.04 MB/sec
Timing buffered disk reads: 178 MB in 3.02 seconds = 59.01 MB/sec
Timing buffered disk reads: 176 MB in 3.03 seconds = 58.16 MB/sec
Timing buffered disk reads: 176 MB in 3.01 seconds = 58.47 MB/sec
Timing buffered disk reads: 180 MB in 3.03 seconds = 59.43 MB/sec
Timing buffered disk reads: 182 MB in 3.04 seconds = 59.90 MB/sec
Timing buffered disk reads: 180 MB in 3.02 seconds = 59.57 MB/sec
Timing buffered disk reads: 180 MB in 3.02 seconds = 59.58 MB/sec
Timing buffered disk reads: 182 MB in 3.03 seconds = 60.06 MB/sec
==> hdparm-T-rc3-kl1.txt <==
Timing cached reads: 316 MB in 2.00 seconds = 157.75 MB/sec
Timing cached reads: 310 MB in 2.01 seconds = 154.17 MB/sec
Timing cached reads: 310 MB in 2.01 seconds = 154.58 MB/sec
Timing cached reads: 310 MB in 2.01 seconds = 154.45 MB/sec
Timing cached reads: 314 MB in 2.00 seconds = 156.73 MB/sec
Timing cached reads: 312 MB in 2.01 seconds = 155.23 MB/sec
Timing cached reads: 310 MB in 2.01 seconds = 154.45 MB/sec
Timing cached reads: 312 MB in 2.00 seconds = 155.74 MB/sec
Timing cached reads: 320 MB in 2.01 seconds = 159.15 MB/sec
Timing cached reads: 320 MB in 2.01 seconds = 159.52 MB/sec
==> hdparm-t-ubuntu-generic.txt <==
Timing buffered disk reads: 68 MB in 3.20 seconds = 21.22 MB/sec
Timing buffered disk reads: 70 MB in 3.31 seconds = 21.12 MB/sec
Timing buffered disk reads: 100 MB in 3.03 seconds = 32.99 MB/sec
Timing buffered disk reads: 176 MB in 3.02 seconds = 58.20 MB/sec
Timing buffered disk reads: 178 MB in 3.00 seconds = 59.26 MB/sec
Timing buffered disk reads: 166 MB in 3.02 seconds = 54.94 MB/sec
Timing buffered disk reads: 178 MB in 3.01 seconds = 59.21 MB/sec
Timing buffered disk reads: 180 MB in 3.03 seconds = 59.40 MB/sec
Timing buffered disk reads: 182 MB in 3.04 seconds = 59.90 MB/sec
==> hdparm-T-ubuntu-generic.txt <==
Timing cached reads: 318 MB in 2.01 seconds = 158.59 MB/sec
Timing cached reads: 332 MB in 2.00 seconds = 165.83 MB/sec
Timing cached reads: 332 MB in 2.01 seconds = 165.15 MB/sec
Timing cached reads: 330 MB in 2.01 seconds = 164.58 MB/sec
Timing cached reads: 334 MB in 2.01 seconds = 166.04 MB/sec
Timing cached reads: 328 MB in 2.00 seconds = 163.83 MB/sec
Timing cached reads: 330 MB in 2.00 seconds = 164.73 MB/sec
Timing cached reads: 330 MB in 2.00 seconds = 164.84 MB/sec
Timing cached reads: 328 MB in 2.00 seconds = 163.97 MB/sec
Timing cached reads: 334 MB in 2.01 seconds = 166.16 MB/sec
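A small sketch of how these logs can be summarized numerically - a
hypothetical helper, assuming hdparm's output format shown above, where
the throughput number is the second-to-last field:

```shell
#!/bin/sh
# avg_mbs: average the MB/sec column of hdparm output lines on stdin.
avg_mbs() {
    awk '{ sum += $(NF-1); n++ } END { if (n) printf "%.2f\n", sum / n }'
}

# Example on two readings quoted above (first round vs. a steady one):
printf '%s\n' \
  'Timing buffered disk reads:   68 MB in  3.20 seconds =  21.22 MB/sec' \
  'Timing buffered disk reads:  176 MB in  3.02 seconds =  58.20 MB/sec' \
  | avg_mbs    # prints 39.71
```

Comparing the average of the first few rounds against the rest of the
log would quantify the "slow start" per kernel.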
Krzysztof Lichota
Interestingly, the Ubuntu kernel seems to be consistently faster by ~5%
in buffered reads. On the other hand, as you noted, it seems to be slower
in the first few hdparm rounds - couldn't that just be interference from
some userspace task (e.g. updatedb running or something similar)?
I have disabled crond and atd, so it should not be a scheduled task, but
I cannot exclude some random activity. I will repeat the tests to be sure.
Krzysztof Lichota
I have repeated each of the tests 3 times using an automatic script.
The results are consistent with the previous ones: the generic kernel is
much slower to start, but provides higher average throughput.
I have put the results in:
http://prefetch.googlecode.com/svn/trunk/results/openoffice/shorter-startup-on-custom-kernel/testmachine-kl1/test2-hdparm-measurements/
I will see what the startup time is for a kernel with all the Ubuntu
options turned on.
Krzysztof Lichota
It is the same, ~9.5 seconds. hdparm shows similar results as with the
cut-down set of options.
So it is something inherent to the git version I am compiling. I will
take a look at the diff of the sources then.
Krzysztof Lichota
I looked at the diff and I see nothing there that could justify the change.
Maybe you can find something - the diff is attached.
Krzysztof Lichota
By vanilla kernel, do you mean Linus' kernel?
I have to use the Ubuntu kernel (i.e., as far as I understand, the
Ubuntu git tree which will become the default kernel for Gutsy), as my
project is supposed to work on Gutsy...
And in this kernel OpenOffice starts faster than when dropping caches...
Maybe I should just use dropping caches, as this gives consistent
results (~11s) on all kernels?
> BTW: What are the numbers for your kernel with and
> without fcache? Do you still see the problem that OpenOffice start is
> faster after sync; echo 3 >/proc/sys/vm/drop_caches; then after a reboot?
I haven't tried fcache at all. Do you have any links on how to set it up?
BTW, I have run the tests on Feisty on another partition and the results
are similar to plain Gutsy, i.e. ~14 seconds. hdparm also behaves
similarly.
I will write to the Ubuntu kernel team, maybe they will know something.
Krzysztof Lichota
> I have to use Ubuntu kernel (i.e. as far as I understand Ubuntu git tree
> which will become default kernel for Gutsy) as my project is supposed to
> work on Gutsy...
Oh, OK. Somehow, from the kernel name in dmesg, I thought the recompiled
kernel was Linus'.
> And in this kernel OpenOffice starts faster than when dropping caches...
> Maybe I should just use dropping caches as this gives consistent results
> on all kernels (~11s)?
Yes, that's one possibility. But I'd personally be interested to know
what remains in the cache after dropping it, so that the subsequent
start is faster... I could then fix cache dropping in the kernel.
> > BTW: What are the numbers for your kernel with and
> > without fcache? Do you still see the problem that OpenOffice start is
> > faster after sync; echo 3 >/proc/sys/vm/drop_caches; then after a reboot?
>
> I haven't tried fcache at all. Do you have any links how to set it up?
Hmm, Gentoo has a HOWTO: http://gentoo-wiki.com/HOWTO_FCache.
But given that you are fixed to Ubuntu kernels, you'll have to port
Jens' patches. I recall that they required some small modifications to
compile with recent kernels...
> BTW. I have run the tests on Feisty on other partition and results are
> similar as in plain Gutsy, i.e. ~14 seconds. hdparm behaves similarly also.
>
> I will write to Ubuntu kernel team, maybe they will know something.
Yes, probably a good idea.
No, it is from Ubuntu git tree (http://kernel.ubuntu.com/git).
>> And in this kernel OpenOffice starts faster than when dropping caches...
>> Maybe I should just use dropping caches as this gives consistent results
>> on all kernels (~11s)?
> Yes, that's one possibility. But I'd personally be interested to know
> what remains in the cache after dropping it, so that the subsequent
> start is faster... I could then fix cache dropping in the kernel.
I have rerun the tests using completely automatic testing and the
results are different: on average it is about ~11s.
I do not understand that. Maybe I should test with poweroff now :/
Raw results are at:
http://prefetch.googlecode.com/svn/trunk/results/openoffice/shorter-startup-on-custom-kernel/testmachine-kl1/test3-automatic-openoffice-startup-after-restart/
I didn't have time to describe them yet.
The base tests are made with a 60-second sleep before taking the
measurements. I have also done another series with a 20-second sleep;
those results are 1-2 seconds higher, due to some disk activity after
login.
I hope analyzing the blktraces will give us a hint about what is going
on :)
Krzysztof Lichota
I have described the results on this page:
http://code.google.com/p/prefetch/wiki/ShorterOpenOfficeStartupOnCustomGutsyKernel
I have also sent an e-mail to the Ubuntu kernel team; we will see what
they say.
Krzysztof Lichota
I have run blktraces and e2block2file to further investigate the problem.
All raw results are at:
http://prefetch.googlecode.com/svn/trunk/results/openoffice/shorter-startup-on-custom-kernel/testmachine-kl1/test4-blktraces/
run1-cold-minus-warm-blocks.map is an e2block2map result listing the
blocks (with some approximation - only the start block + 8 blocks were
mapped) which were read during cold startup but not during startup with
drop_caches.
I guess it should be empty, but it is not. It contains reads from
various files; the most interesting part, I think, is the reads from
some X server modules. But they might be justified, I do not know. Maybe
you can figure it out.
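The set difference itself can be sketched like this - toy block lists,
not the real traces (the real lists come from the blkparse/e2block2map
output above):

```shell
#!/bin/sh
# Toy illustration of "blocks read cold but not after drop_caches":
# one block number per line; lists must be sorted for comm(1).
sort >cold.txt <<EOF
100
200
300
EOF
sort >warm.txt <<EOF
200
EOF
# -23: suppress lines unique to warm.txt and lines common to both,
# leaving only blocks read in the cold run.
comm -23 cold.txt warm.txt    # prints 100 and 300
```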
Further conclusions:
1. Cold start apparently is faster (~9.5s) than start with drop_caches
(~11.5s) because some generic system files (like Kubuntu icons, fonts,
etc.) are cached during system start (see
run1-warm-minus-cold-blocks.map). Interestingly, there is a lot of
reading of Python files; maybe deferring the loading of Python would
decrease OpenOffice startup time.
2. During start with drop_caches OpenOffice reads 98352 KiB, during a
real cold start 90724 KiB (see
run1-warm-dropcaches.hda.blktrace.0.summary.txt and
run1-cold.hda.blktrace.0.summary.txt). Assuming a drive throughput of 60
MB/s, it would take about 1.6 seconds to read all the files in the best
case.
As a warm startup of OpenOffice takes ~4.5s, it should be possible to
achieve a 6-second startup just by prefetching all these files in one
run (assuming they are in one place on disk), without even considering
the order of reads.
3. I have adapted Federico Mena-Quintero's plot-timeline script to parse
blkparse output and show a timeline of disk requests (it is in the
tools/timeline-blktrace/ directory in svn).
It shows reads issued and completed in various colors, so it is easy to
spot patterns. The result for cold startup is in the file
run1-cold.hda.blktrace.0.jpg (excuse the quality; the png file was over
5 MB). Requests issued by the soffice binary are in red, completed
requests in blue.
The timeline shows that the soffice binary stops issuing reads about 1.5
seconds after start; the rest of the trace is full of read-request
completions spread over the whole time. This suggests that either the
requests are not issued in the proper order (so uncompleted requests
block execution) or the files are spread over the disk, causing massive
seeking. I will investigate further by analyzing the distances between
sectors.
4. The blktrace results are quite repeatable; I have run the blktrace 3
times and the variations were minor.
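As a sanity check on the lower-bound estimate in point 2 (98352 KiB read
at an assumed 60 MB/s):

```shell
#!/bin/sh
# Best-case sequential read time for the ~98352 KiB read during the
# drop_caches start, at the ~60 MB/s measured with hdparm earlier.
awk 'BEGIN { printf "%.1f s\n", (98352 / 1024) / 60 }'    # prints 1.6 s
```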
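And a sketch of the planned sector-distance analysis from point 3,
assuming blkparse's default output layout in which the sector number is
the 8th field of queue ('Q') events - the sample lines below are made up,
not taken from the real trace:

```shell
#!/bin/sh
# Print the distance (in sectors) between consecutive queued requests;
# large positive or negative jumps indicate seeks.
seek_distances() {
    awk '$6 == "Q" { if (seen) print $8 - prev; prev = $8; seen = 1 }'
}

printf '%s\n' \
  '8,0 1 1 0.000000000 100 Q R 1000 + 8 [soffice.bin]' \
  '8,0 1 2 0.001000000 100 Q R 5000 + 8 [soffice.bin]' \
  '8,0 1 3 0.002000000 100 Q R 1200 + 8 [soffice.bin]' \
  | seek_distances    # prints 4000 and -3800
```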
Krzysztof Lichota