I had a look at my kern.log recently, and noticed the following
output. It happens every time a snapshot is being deleted and
looking back to the logs from the first time I used booted a
kernel (2.6.24.2) with ddsnap, it has always happened.
May 9 12:00:01 ant kernel: ddsnap_destroy: Unblocking helper threads
May 9 12:00:01 ant kernel: ddsnap_destroy: closing socket connections
May 9 12:00:01 ant kernel: worker: ddwrk/7 exiting for snapshot 7
May 9 12:00:01 ant kernel: control: socket error -32
May 9 12:00:01 ant kernel: control: ddcon/7 exiting for snapshot 7
May 9 12:00:01 ant kernel: incoming: socket error -32
May 9 12:00:01 ant kernel: incoming: ddcli/7 exiting for snapshot 7
May 9 12:00:01 ant kernel: ddsnap_destroy: thread 1 exited
May 9 12:00:01 ant kernel: ddsnap_destroy: thread 2 exited
May 9 12:00:01 ant kernel: ddsnap_destroy: thread 3 exited
May 9 12:00:51 ant kernel: ddsnap_create: Created snapshot device snapstore=253:2 origin=253:0 socket=/dev/ddsnap/control snapshot=7
May 9 12:00:51 ant kernel: incoming: Client thread started, pid=1480 for snapshot 7
May 9 12:00:51 ant kernel: worker: Worker thread started, pid=1481 for snapshot 7
May 9 12:00:51 ant kernel: worker: worker recovering for snapshot 7
May 9 12:00:51 ant kernel: control: Control thread started, pid=1482 for snapshot 7
May 9 12:00:51 ant kernel: requeue_queries:
May 9 12:00:51 ant kernel: worker: worker resuming for snapshot 7
May 9 12:00:51 ant kernel: incoming: identify succeeded. chunksize 12
The corresponding logs (UTC time):
Fri May 9 11:00:01 2008: [2356] snap_server: Received connection
Fri May 9 11:00:01 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:01 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:01 2008: [2289] monitor: Lost connection 3
Fri May 9 11:00:01 2008: [2356] snap_server: Received connection
Fri May 9 11:00:01 2008: [2356] delete_snap: Delete snaptag 7 (snapnum 7)
Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
Fri May 9 11:00:50 2008: [2356] create_snapshot: Create snapshot tag = 7, bit = 7)
Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:50 2008: [2289] monitor: Received connection 20
Fri May 9 11:00:50 2008: [2289] connect_clients: connect clients to /ddsnap/server
Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
Fri May 9 11:00:51 2008: [2356] incoming: client id 0, snaptag 7
Fri May 9 11:00:51 2008: [2289] incoming: Everything connected properly, all is well
Is that to be expected?
Also, as I've already mentionned, but I haven't really had time
to look more into this since, above, from 11:00:01 to 11:00:50
basically (I think, but that could be shorter), every IO to the
rootfs blocks. Is that normal? Shouldn't at least reads come
through? Could it be related to the errors I'm seeing? I can see
the error as well on another machine.
The setup is (irrelevant parts removed):
The root filesystem is on a ddsnap, the real device being a
linear dm (LVM) or /dev/md0, md0 being a RAID1 on 2 sata disks
(100GB). The ddsnap agent and server log is /dev/ddsnap/log, the
sockets are /dev/ddsnap/control and /dev/ddsnap/server. /dev is
a tmpfs. There are 50GB allocated for snapshots (on a linear of
a SATA disk (LVM)).
$ cat /proc/mounts
udev /dev tmpfs rw 0 0
/dev/mapper/safe-debian / ext3 rw,errors=remount-ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-04-25_00:00:13 /snapshot/safe-debian-2008-04-25_00:00:13 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-04-27_00:01:36 /snapshot/safe-debian-2008-04-27_00:01:36 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-04_00:01:23 /snapshot/safe-debian-2008-05-04_00:01:23 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-06_00:00:57 /snapshot/safe-debian-2008-05-06_00:00:57 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-07_00:00:55 /snapshot/safe-debian-2008-05-07_00:00:55 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-08_00:00:19 /snapshot/safe-debian-2008-05-08_00:00:19 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_00:01:28 /snapshot/safe-debian-2008-05-09_00:01:28 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_01:00:28 /snapshot/safe-debian-2008-05-09_01:00:28 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_02:00:28 /snapshot/safe-debian-2008-05-09_02:00:28 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_03:00:26 /snapshot/safe-debian-2008-05-09_03:00:26 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_04:00:22 /snapshot/safe-debian-2008-05-09_04:00:22 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_05:00:16 /snapshot/safe-debian-2008-05-09_05:00:16 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_06:00:15 /snapshot/safe-debian-2008-05-09_06:00:15 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_07:01:07 /snapshot/safe-debian-2008-05-09_07:01:07 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_08:00:50 /snapshot/safe-debian-2008-05-09_08:00:50 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_09:00:48 /snapshot/safe-debian-2008-05-09_09:00:48 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_10:00:47 /snapshot/safe-debian-2008-05-09_10:00:47 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_11:00:51 /snapshot/safe-debian-2008-05-09_11:00:51 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_12:00:50 /snapshot/safe-debian-2008-05-09_12:00:50 ext3 ro,data=ordered 0 0
$ sudo dmsetup table
safe-debian: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-ddreal: 0 199991296 linear 9:0 384
safe-debian-snap-2008-04-25_00:00:13: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-04-27_00:01:36: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-04_00:01:23: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-06_00:00:57: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-07_00:00:55: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-08_00:00:19: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_00:01:28: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_01:00:28: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_02:00:28: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_03:00:26: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_04:00:22: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_05:00:16: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_06:00:15: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_07:01:07: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_08:00:50: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_09:00:48: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_10:00:47: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_11:00:51: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_12:00:50: 0 199991296 ddsnap 253:0 253:2 4096
unsafe-debian--ddsnap: 0 104857600 linear 8:19 146252160
unsafe-fast: 0 208617472 striped 2 128 8:19 41943424 8:4 384
$ sudo ddsnap status /dev/ddsnap/server
Snapshot store block size = 4096; 9,742,071 of 13,107,200 chunks free
Origin size: 102,395,543,552 bytes
Write density: 0
Creation time: Tue Apr 22 13:29:31 2008
Snap Creation time Usecnt Prio Chunks Unshared Shared
13 Fri Apr 25 00:00:13 2008 1 0 1600267 1030551 569716
5 Sun Apr 27 00:01:36 2008 1 0 903211 295765 607446
1 Sun May 4 00:01:23 2008 1 0 387785 200010 187775
0 Tue May 6 00:00:57 2008 1 0 382601 187937 194664
4 Wed May 7 00:00:55 2008 1 0 376664 181553 195111
10 Thu May 8 00:00:19 2008 1 0 339974 185791 154183
6 Fri May 9 00:01:28 2008 1 0 202366 33803 168563
8 Fri May 9 01:00:28 2008 1 0 202293 33723 168570
9 Fri May 9 02:00:28 2008 1 0 202228 33692 168536
12 Fri May 9 03:00:26 2008 1 0 202174 33683 168491
18 Fri May 9 04:00:22 2008 1 0 202119 33716 168403
15 Fri May 9 05:00:16 2008 1 0 202059 33722 168337
17 Fri May 9 06:00:15 2008 1 0 201950 33706 168244
3 Fri May 9 07:01:07 2008 1 0 122176 37351 84825
11 Fri May 9 08:00:50 2008 1 0 42987 33711 9276
14 Fri May 9 09:00:48 2008 1 0 42894 33783 9111
16 Fri May 9 10:00:47 2008 1 0 42711 33921 8790
2 Fri May 9 11:00:51 2008 1 0 36844 35029 1815
7 Fri May 9 12:00:50 2008 1 0 33974 33833 141
totals 3340457 2525280 815177
Cheers,
Stéphane
Also, as I've already mentionned, but I haven't really had time
to look more into this since, above, from 11:00:01 to 11:00:50
basically (I think, but that could be shorter), every IO to the
rootfs blocks. Is that normal? Shouldn't at least reads come
through? Could it be related to the errors I'm seeing? I can see
the error as well on another machine.
The setup is (irrelevant parts removed):
The root filesystem is on a ddsnap, the real device being a
linear dm (LVM) or /dev/md0, md0 being a RAID1 on 2 sata disks
(100GB). The ddsnap agent and server log is /dev/ddsnap/log, the
sockets are /dev/ddsnap/control and /dev/ddsnap/server. /dev is
a tmpfs. There are 50GB allocated for snapshots (on a linear of
a SATA disk (LVM)).
How much memory does the system have? It looks like it may be
blocking during the snapshot deletion, which happens synchronously.
ddsnap server is single threaded and has to traverse the entire tree
(which may be getting out of cache, causing disk io and a long block).
Shapor
Hi Jiaying,
thanks. False alarm then.
[...]
> Reads to the filesystem usually cause writes to the block device
> because the filesystem may need to update its metadata during
> the read. You can try to mount the filesystem with noatime option
> and see if it helps to improve read performance.
Oh, of course, I forgot about that. But aren't those meant to be
buffered and the block device to be accessed only from time to
time in background?
Here for instance, it blocks the shell, it blocks vim while
typing (because of some getpwd() I think) for several seconds.
From time to time, it causes some failure of some cron job or
nagios checks because of timeouts being exceeded, so it's not
great.
Can't it be written in a way that doesn't lock the whole FS for
several seconds?
[...]
> According to our measurements, ddsnap usually introduces 8~9 times
> of overhead on writes. But the performance of running ddsnap on top of
> software raid may be even worse. We are working on optimize ddsnap
> write performance and hopefully will cut the extra overhead in half in a
> few months. Have you tried to run ddsnap on top of SATA disk directly?
[...]
The performances are OK, and actually the ddsnap storage in on a
SATA disk via a linear LVM, so there should hardly be any
overhead. Read performances on the RAID 1 (the real FS) are
better than on a single disk.
Cheers,
Stéphane
Hi Shapor,
Enough memory I'd say. There are 4GB RAM (most of which unused)
and 8GB swap, I see ddsnap_server occupying consistently 131MB.
Is there any tuning I can do to improve that?
Isn't there a way for ddsnap to service the I/Os while it is
deleting a snashot?
Would making more frequent snapshots reduce the deletion time?
Looking at my kern.log, it took up to 182 seconds, the average
time being 36s (over 131 deletions mostly every hour).
Cheers,
Stéphane
36 seconds of dead time does sound like way too much.
This is the kind of issue we want to address in
our 0.9 and 0.10 releases. Can you file a bug
for "snapshot deletion causes all I/O to block for up to three minutes",
and give details in the bug of your setup? We
will create a performance test case for this;
there's some chance the optimization we're
starting to test for 0.9 will help some.
On Fri, May 09, 2008 at 10:09:21AM -0700, Jiaying Zhang wrote:
[...]
> Yes. When a ddsnap device is removed, it closes the sockets that its kernel
> threads use to
> talk to 'ddsnap server' and 'ddsnap agent' so that 'ddsnap server' and
> 'ddsnap agent' can reap
> the resources associated with that device. The 'socket errors' are caused by
> this and the
> kernel threads should exit soon. Maybe we should change our code to not
> print out the error
> message upon normal exit since they are kind of confusing to users.
Hi Jiaying,
thanks. False alarm then.
[...]
> Reads to the filesystem usually cause writes to the block device
> because the filesystem may need to update its metadata during
> the read. You can try to mount the filesystem with noatime option
> and see if it helps to improve read performance.
Oh, of course, I forgot about that. But aren't those meant to be
buffered and the block device to be accessed only from time to
time in background?
Here for instance, it blocks the shell, it blocks vim while
typing (because of some getpwd() I think) for several seconds.
From time to time, it causes some failure of some cron job or
nagios checks because of timeouts being exceeded, so it's not
great.
Can't it be written in a way that doesn't lock the whole FS for
several seconds?
> According to our measurements, ddsnap usually introduces 8~9 times
> of overhead on writes. But the performance of running ddsnap on top of
> software raid may be even worse. We are working on optimize ddsnap
> write performance and hopefully will cut the extra overhead in half in a
> few months. Have you tried to run ddsnap on top of SATA disk directly?
[...]
The performances are OK, and actually the ddsnap storage in on a
SATA disk via a linear LVM, so there should hardly be any
overhead. Read performances on the RAID 1 (the real FS) are
better than on a single disk.
But someone else mentionned that deleting a snapshot takes some
time and effort and that while it was doing that, it couldn't
handle any IO. Wouldn't that be what I observe?
> The problem you described seems
> to be caused by memory pressure. The system may have difficulty to
> satisfy other memory requests when writing dirty pages to the ddsnap
> device is extremely slow. You can have a look at /proc/zoneinfo
> to see how much free memory available in normal zone. You may still
> see a lot of free memory in 'top'. But most kernel operations can only
> use memory in the normal zone.
Comparing with a machine that doesn't do much, the
/proc/zoneinfo -- though I can't really understand what it
says -- looks fine:
Node 0, zone Normal
pages free 1406
min 936
low 1170
high 1404
scanned 0 (a: 0 i: 0)
spanned 225280
present 223520
nr_free_pages 1406
nr_inactive 78448
nr_active 86755
nr_anon_pages 0
nr_mapped 0
nr_file_pages 165203
nr_dirty 0
nr_writeback 3
nr_slab_reclaimable 42520
nr_slab_unreclaimable 6070
nr_page_table_pages 681
nr_unstable 0
nr_bounce 0
nr_vmscan_write 210
protection: (0, 0, 19292, 19292)
pagesets
cpu: 0 pcp: 0
count: 145
high: 186
batch: 31
cpu: 0 pcp: 1
count: 58
high: 62
batch: 15
all_unreclaimable: 0
prev_priority: 12
start_pfn: 4096
There's a difference on the "pagesets" though which in most of
the other machines (without ddsnap) I looked at were at 0 or
about.
$ free
total used free shared buffers cached
Mem: 3375700 1953980 1421720 0 671740 621848
-/+ buffers/cache: 660392 2715308
Swap: 8000224 20 8000204
>
> You can give 'ddsnap server' a larger amount of memory with the
> '--cachesize'
> option and see if it speeds up 'ddsnap delete'.
[...]
Looking at the man page, "--cachesize" is an option to the
"initialize" command, not "server". The code seems to suggest
that the manual is wrong though. So I should be able to increase
the cache-size without wiping out the snapstore right? I'll try
and do that without rebooting which might be tricky because of
it being for the rootfs.
Also, it says:
"If not specified or if specified as 0 (zero), defaults to
min(128MB, system ram size/4)."
Based on the fact that top reports RSS == 131MB and that there
are 4GB RAM here, shouldn't it be "max" above instead of "min"?
Cheers,
Stéphane
That was a bad idea. I must have screwed up somewhere, the
snapstore got corrupted in some way, I had to wipe it out.
It even managed to unsync my RAID 1, not sure how...
So, with an empty snapstore, I'll have to wait some time before
I can reproduce that.
Cheers,
Stéphane
On Fri, May 09, 2008 at 11:14:54AM -0700, Jiaying Zhang wrote:
[...]
> Here for instance, it blocks the shell, it blocks vim while
> > typing (because of some getpwd() I think) for several seconds.
> > From time to time, it causes some failure of some cron job or
> > nagios checks because of timeouts being exceeded, so it's not
> > great.
> >
> > Can't it be written in a way that doesn't lock the whole FS for
> > several seconds?
>
>
> ddsnap does not lock the whole FS.
But someone else mentionned that deleting a snapshot takes some
time and effort and that while it was doing that, it couldn't
handle any IO. Wouldn't that be what I observe?
>
> You can give 'ddsnap server' a larger amount of memory with the
> '--cachesize'
> option and see if it speeds up 'ddsnap delete'.
[...]
Looking at the man page, "--cachesize" is an option to the
"initialize" command, not "server". The code seems to suggest
that the manual is wrong though. So I should be able to increase
the cache-size without wiping out the snapstore right? I'll try
and do that without rebooting which might be tricky because of
it being for the rootfs.
Also, it says:
"If not specified or if specified as 0 (zero), defaults to
min(128MB, system ram size/4)."
Based on the fact that top reports RSS == 131MB and that there
are 4GB RAM here, shouldn't it be "max" above instead of "min"?
Hi Jiaying,
thanks, for some reason, I thought I had done a "svn update"
before double-checking, looks like I didn't.
Also shouldn't it be "max(128MB, RAM/4)"
Cheers,
Stéphane
No,
I used "ddsnap initialize" to wipe out the snapstore after it
got corrupted. As I said, while reading the code, I had realised
that the manpage was wrong. I did try and restart the server
with the -k option, but to do that on a rootfs, you need to take
a lot of precautions, and I have screwed up when doing that (I
have allowed the original device to be written directly by doing
a dmsetup resume too early by mistake), so it was totally my
fault.
Cheers,
Stéphane
It looks like it should.
ddsnap --help
is correct:
-k, --cachesize=size Buffer cache size (default = max(128M,1/4
sys RAM)
the man page isn't:
Specifies the amount of RAM to dedicate to the snapshot btree cache. If not specified or
if specified as 0 (zero), defaults to min(128MB, system ram size/4).
regards,
Stéphane
That typo probably crept in because the behavior
is surprising... why *wouldn't* we use more RAM on
big systems? The answer is that we capped cache size
at 1/4 sys RAM so that really small system tests would not blow up.
- Dan
Hi guys,
OK, now, I've got 256MB allocated to ddsnap cache, / has noatime
and the snapshots mount cleanly (thanks to the dmsetup suspend
in a fortress FS trick)
I can see a lot of improvement.
I can't tell at the moment if increasing the cache to 256MB
helped reduce the snapshot deletion time. It's now about 15s,
but a lot less of the snapstore is being used at the moment. If
the deletion time is proportional to the amount of used space in
the snapstore then it could very well be that there's no
improvment at all.
noatime improved things in that I haven't experienced so far
cases where I had to wait for the deletion to finish before
being able to do anything.
Cheers,
Stéphane
On Fri, May 09, 2008 at 10:09:21AM -0700, Jiaying Zhang wrote:
[about ddsnap performance on the root filesystem]
[...]
> Reads to the filesystem usually cause writes to the block device
> because the filesystem may need to update its metadata during
> the read. You can try to mount the filesystem with noatime option
> and see if it helps to improve read performance.
[...]
Hi guys,
OK, now, I've got 256MB allocated to ddsnap cache, / has noatime
and the snapshots mount cleanly (thanks to the dmsetup suspend
in a fortress FS trick)
I can see a lot of improvement.
I can't tell at the moment if increasing the cache to 256MB
helped reduce the snapshot deletion time. It's now about 15s,
but a lot less of the snapstore is being used at the moment. If
the deletion time is proportional to the amount of used space in
the snapstore then it could very well be that there's no
improvment at all.
noatime improved things in that I haven't experienced so far
cases where I had to wait for the deletion to finish before
being able to do anything.