"socket error" in dmesg

2 views
Skip to first unread message

Stephane Chazelas

unread,
May 9, 2008, 7:53:36 AM5/9/08
to zuma...@googlegroups.com
Hi guys,

I had a look at my kern.log recently, and noticed the following
output. It happens every time a snapshot is being deleted and
looking back to the logs from the first time I used booted a
kernel (2.6.24.2) with ddsnap, it has always happened.

May 9 12:00:01 ant kernel: ddsnap_destroy: Unblocking helper threads
May 9 12:00:01 ant kernel: ddsnap_destroy: closing socket connections
May 9 12:00:01 ant kernel: worker: ddwrk/7 exiting for snapshot 7
May 9 12:00:01 ant kernel: control: socket error -32
May 9 12:00:01 ant kernel: control: ddcon/7 exiting for snapshot 7
May 9 12:00:01 ant kernel: incoming: socket error -32
May 9 12:00:01 ant kernel: incoming: ddcli/7 exiting for snapshot 7
May 9 12:00:01 ant kernel: ddsnap_destroy: thread 1 exited
May 9 12:00:01 ant kernel: ddsnap_destroy: thread 2 exited
May 9 12:00:01 ant kernel: ddsnap_destroy: thread 3 exited
May 9 12:00:51 ant kernel: ddsnap_create: Created snapshot device snapstore=253:2 origin=253:0 socket=/dev/ddsnap/control snapshot=7
May 9 12:00:51 ant kernel: incoming: Client thread started, pid=1480 for snapshot 7
May 9 12:00:51 ant kernel: worker: Worker thread started, pid=1481 for snapshot 7
May 9 12:00:51 ant kernel: worker: worker recovering for snapshot 7
May 9 12:00:51 ant kernel: control: Control thread started, pid=1482 for snapshot 7
May 9 12:00:51 ant kernel: requeue_queries:
May 9 12:00:51 ant kernel: worker: worker resuming for snapshot 7
May 9 12:00:51 ant kernel: incoming: identify succeeded. chunksize 12

The corresponding logs (UTC time):

Fri May 9 11:00:01 2008: [2356] snap_server: Received connection
Fri May 9 11:00:01 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:01 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:01 2008: [2289] monitor: Lost connection 3
Fri May 9 11:00:01 2008: [2356] snap_server: Received connection
Fri May 9 11:00:01 2008: [2356] delete_snap: Delete snaptag 7 (snapnum 7)
Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
Fri May 9 11:00:50 2008: [2356] create_snapshot: Create snapshot tag = 7, bit = 7)
Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
Fri May 9 11:00:50 2008: [2289] monitor: Received connection 20
Fri May 9 11:00:50 2008: [2289] connect_clients: connect clients to /ddsnap/server
Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
Fri May 9 11:00:51 2008: [2356] incoming: client id 0, snaptag 7
Fri May 9 11:00:51 2008: [2289] incoming: Everything connected properly, all is well

Is that to be expected?

Also, as I've already mentionned, but I haven't really had time
to look more into this since, above, from 11:00:01 to 11:00:50
basically (I think, but that could be shorter), every IO to the
rootfs blocks. Is that normal? Shouldn't at least reads come
through? Could it be related to the errors I'm seeing? I can see
the error as well on another machine.

The setup is (irrelevant parts removed):

The root filesystem is on a ddsnap, the real device being a
linear dm (LVM) or /dev/md0, md0 being a RAID1 on 2 sata disks
(100GB). The ddsnap agent and server log is /dev/ddsnap/log, the
sockets are /dev/ddsnap/control and /dev/ddsnap/server. /dev is
a tmpfs. There are 50GB allocated for snapshots (on a linear of
a SATA disk (LVM)).

$ cat /proc/mounts
udev /dev tmpfs rw 0 0
/dev/mapper/safe-debian / ext3 rw,errors=remount-ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-04-25_00:00:13 /snapshot/safe-debian-2008-04-25_00:00:13 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-04-27_00:01:36 /snapshot/safe-debian-2008-04-27_00:01:36 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-04_00:01:23 /snapshot/safe-debian-2008-05-04_00:01:23 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-06_00:00:57 /snapshot/safe-debian-2008-05-06_00:00:57 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-07_00:00:55 /snapshot/safe-debian-2008-05-07_00:00:55 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-08_00:00:19 /snapshot/safe-debian-2008-05-08_00:00:19 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_00:01:28 /snapshot/safe-debian-2008-05-09_00:01:28 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_01:00:28 /snapshot/safe-debian-2008-05-09_01:00:28 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_02:00:28 /snapshot/safe-debian-2008-05-09_02:00:28 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_03:00:26 /snapshot/safe-debian-2008-05-09_03:00:26 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_04:00:22 /snapshot/safe-debian-2008-05-09_04:00:22 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_05:00:16 /snapshot/safe-debian-2008-05-09_05:00:16 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_06:00:15 /snapshot/safe-debian-2008-05-09_06:00:15 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_07:01:07 /snapshot/safe-debian-2008-05-09_07:01:07 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_08:00:50 /snapshot/safe-debian-2008-05-09_08:00:50 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_09:00:48 /snapshot/safe-debian-2008-05-09_09:00:48 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_10:00:47 /snapshot/safe-debian-2008-05-09_10:00:47 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_11:00:51 /snapshot/safe-debian-2008-05-09_11:00:51 ext3 ro,data=ordered 0 0
/dev/mapper/safe-debian-snap-2008-05-09_12:00:50 /snapshot/safe-debian-2008-05-09_12:00:50 ext3 ro,data=ordered 0 0

$ sudo dmsetup table
safe-debian: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-ddreal: 0 199991296 linear 9:0 384
safe-debian-snap-2008-04-25_00:00:13: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-04-27_00:01:36: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-04_00:01:23: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-06_00:00:57: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-07_00:00:55: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-08_00:00:19: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_00:01:28: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_01:00:28: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_02:00:28: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_03:00:26: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_04:00:22: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_05:00:16: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_06:00:15: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_07:01:07: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_08:00:50: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_09:00:48: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_10:00:47: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_11:00:51: 0 199991296 ddsnap 253:0 253:2 4096
safe-debian-snap-2008-05-09_12:00:50: 0 199991296 ddsnap 253:0 253:2 4096
unsafe-debian--ddsnap: 0 104857600 linear 8:19 146252160
unsafe-fast: 0 208617472 striped 2 128 8:19 41943424 8:4 384

$ sudo ddsnap status /dev/ddsnap/server
Snapshot store block size = 4096; 9,742,071 of 13,107,200 chunks free
Origin size: 102,395,543,552 bytes
Write density: 0
Creation time: Tue Apr 22 13:29:31 2008
Snap Creation time Usecnt Prio Chunks Unshared Shared
13 Fri Apr 25 00:00:13 2008 1 0 1600267 1030551 569716
5 Sun Apr 27 00:01:36 2008 1 0 903211 295765 607446
1 Sun May 4 00:01:23 2008 1 0 387785 200010 187775
0 Tue May 6 00:00:57 2008 1 0 382601 187937 194664
4 Wed May 7 00:00:55 2008 1 0 376664 181553 195111
10 Thu May 8 00:00:19 2008 1 0 339974 185791 154183
6 Fri May 9 00:01:28 2008 1 0 202366 33803 168563
8 Fri May 9 01:00:28 2008 1 0 202293 33723 168570
9 Fri May 9 02:00:28 2008 1 0 202228 33692 168536
12 Fri May 9 03:00:26 2008 1 0 202174 33683 168491
18 Fri May 9 04:00:22 2008 1 0 202119 33716 168403
15 Fri May 9 05:00:16 2008 1 0 202059 33722 168337
17 Fri May 9 06:00:15 2008 1 0 201950 33706 168244
3 Fri May 9 07:01:07 2008 1 0 122176 37351 84825
11 Fri May 9 08:00:50 2008 1 0 42987 33711 9276
14 Fri May 9 09:00:48 2008 1 0 42894 33783 9111
16 Fri May 9 10:00:47 2008 1 0 42711 33921 8790
2 Fri May 9 11:00:51 2008 1 0 36844 35029 1815
7 Fri May 9 12:00:50 2008 1 0 33974 33833 141
totals 3340457 2525280 815177


Cheers,
Stéphane

Jiaying Zhang

unread,
May 9, 2008, 1:09:21 PM5/9/08
to Stephane Chazelas, zuma...@googlegroups.com

Yes. When a ddsnap device is removed, it closes the sockets that its kernel threads use to
talk to 'ddsnap server' and 'ddsnap agent' so that 'ddsnap server' and 'ddsnap agent' can reap
the resources associated with that device. The 'socket errors' are caused by this and the
kernel threads should exit soon. Maybe we should change our code to not print out the error
message upon normal exit since they are kind of confusing to users.

Also, as I've already mentionned, but I haven't really had time
to look more into this since, above, from 11:00:01 to 11:00:50
basically (I think, but that could be shorter), every IO to the
rootfs blocks. Is that normal? Shouldn't at least reads come
through? Could it be related to the errors I'm seeing? I can see
the error as well on another machine.

Reads to the filesystem usually cause writes to the block device
because the filesystem may need to update its metadata during
the read. You can try to mount the filesystem with noatime option
and see if it helps to improve read performance.


The setup is (irrelevant parts removed):

The root filesystem is on a ddsnap, the real device being a
linear dm (LVM) or /dev/md0, md0 being a RAID1 on 2 sata disks
(100GB). The ddsnap agent and server log is /dev/ddsnap/log, the
sockets are /dev/ddsnap/control and /dev/ddsnap/server. /dev is
a tmpfs. There are 50GB allocated for snapshots (on a linear of
a SATA disk (LVM)).

According to our measurements, ddsnap usually introduces 8~9 times
of overhead on writes. But the performance of running ddsnap on top of
software raid may be even worse. We are working on optimize ddsnap
write performance and hopefully will cut the extra overhead in half in a
few months. Have you tried to run ddsnap on top of SATA disk directly?

Jiaying

Shapor Naghibzadeh

unread,
May 9, 2008, 1:25:58 PM5/9/08
to Stephane Chazelas, zuma...@googlegroups.com
On Fri, May 9, 2008 at 4:53 AM, Stephane Chazelas
<Stephane...@yahoo.fr> wrote:
>
> Fri May 9 11:00:01 2008: [2356] snap_server: Received connection
> Fri May 9 11:00:01 2008: [2356] snap_server: Client 0 disconnected
> Fri May 9 11:00:01 2008: [2356] snap_server: Client 0 disconnected
> Fri May 9 11:00:01 2008: [2289] monitor: Lost connection 3
> Fri May 9 11:00:01 2008: [2356] snap_server: Received connection
> Fri May 9 11:00:01 2008: [2356] delete_snap: Delete snaptag 7 (snapnum 7)
> Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
> Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
> Fri May 9 11:00:50 2008: [2356] create_snapshot: Create snapshot tag = 7, bit = 7)
> Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
> Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
> Fri May 9 11:00:50 2008: [2356] snap_server: Client 0 disconnected
> Fri May 9 11:00:50 2008: [2289] monitor: Received connection 20
> Fri May 9 11:00:50 2008: [2289] connect_clients: connect clients to /ddsnap/server
> Fri May 9 11:00:50 2008: [2356] snap_server: Received connection
> Fri May 9 11:00:51 2008: [2356] incoming: client id 0, snaptag 7
> Fri May 9 11:00:51 2008: [2289] incoming: Everything connected properly, all is well
>
>
> Also, as I've already mentionned, but I haven't really had time
> to look more into this since, above, from 11:00:01 to 11:00:50
> basically (I think, but that could be shorter), every IO to the
> rootfs blocks. Is that normal? Shouldn't at least reads come
> through? Could it be related to the errors I'm seeing? I can see
> the error as well on another machine.
>...

How much memory does the system have? It looks like it may be
blocking during the snapshot deletion, which happens synchronously.
ddsnap server is single threaded and has to traverse the entire tree
(which may be getting out of cache, causing disk io and a long block).

Shapor

Stephane Chazelas

unread,
May 9, 2008, 1:32:17 PM5/9/08
to Jiaying Zhang, zuma...@googlegroups.com
On Fri, May 09, 2008 at 10:09:21AM -0700, Jiaying Zhang wrote:
[...]

> Yes. When a ddsnap device is removed, it closes the sockets that its kernel
> threads use to
> talk to 'ddsnap server' and 'ddsnap agent' so that 'ddsnap server' and
> 'ddsnap agent' can reap
> the resources associated with that device. The 'socket errors' are caused by
> this and the
> kernel threads should exit soon. Maybe we should change our code to not
> print out the error
> message upon normal exit since they are kind of confusing to users.

Hi Jiaying,

thanks. False alarm then.

[...]


> Reads to the filesystem usually cause writes to the block device
> because the filesystem may need to update its metadata during
> the read. You can try to mount the filesystem with noatime option
> and see if it helps to improve read performance.

Oh, of course, I forgot about that. But aren't those meant to be
buffered and the block device to be accessed only from time to
time in background?

Here for instance, it blocks the shell, it blocks vim while
typing (because of some getpwd() I think) for several seconds.
From time to time, it causes some failure of some cron job or
nagios checks because of timeouts being exceeded, so it's not
great.

Can't it be written in a way that doesn't lock the whole FS for
several seconds?

[...]


> According to our measurements, ddsnap usually introduces 8~9 times
> of overhead on writes. But the performance of running ddsnap on top of
> software raid may be even worse. We are working on optimize ddsnap
> write performance and hopefully will cut the extra overhead in half in a
> few months. Have you tried to run ddsnap on top of SATA disk directly?

[...]

The performances are OK, and actually the ddsnap storage in on a
SATA disk via a linear LVM, so there should hardly be any
overhead. Read performances on the RAID 1 (the real FS) are
better than on a single disk.

Cheers,
Stéphane

Stephane Chazelas

unread,
May 9, 2008, 1:44:24 PM5/9/08
to Shapor Naghibzadeh, zuma...@googlegroups.com
On Fri, May 09, 2008 at 10:25:58AM -0700, Shapor Naghibzadeh wrote:
[...]

> How much memory does the system have? It looks like it may be
> blocking during the snapshot deletion, which happens synchronously.
> ddsnap server is single threaded and has to traverse the entire tree
> (which may be getting out of cache, causing disk io and a long block).
[...]

Hi Shapor,

Enough memory I'd say. There are 4GB RAM (most of which unused)
and 8GB swap, I see ddsnap_server occupying consistently 131MB.

Is there any tuning I can do to improve that?

Isn't there a way for ddsnap to service the I/Os while it is
deleting a snashot?

Would making more frequent snapshots reduce the deletion time?

Looking at my kern.log, it took up to 182 seconds, the average
time being 36s (over 131 deletions mostly every hour).

Cheers,
Stéphane

Dan Kegel

unread,
May 9, 2008, 1:48:49 PM5/9/08
to Stephane Chazelas, Shapor Naghibzadeh, zuma...@googlegroups.com
On Fri, May 9, 2008 at 10:44 AM, Stephane Chazelas
<Stephane...@yahoo.fr> wrote:
> Isn't there a way for ddsnap to service the I/Os while it is
> deleting a snashot?
>
> Would making more frequent snapshots reduce the deletion time?
>
> Looking at my kern.log, it took up to 182 seconds, the average
> time being 36s (over 131 deletions mostly every hour).

36 seconds of dead time does sound like way too much.
This is the kind of issue we want to address in
our 0.9 and 0.10 releases. Can you file a bug
for "snapshot deletion causes all I/O to block for up to three minutes",
and give details in the bug of your setup? We
will create a performance test case for this;
there's some chance the optimization we're
starting to test for 0.9 will help some.

Jiaying Zhang

unread,
May 9, 2008, 2:14:54 PM5/9/08
to Stephane Chazelas, zuma...@googlegroups.com
On 5/9/08, Stephane Chazelas <Stephane...@yahoo.fr> wrote:
On Fri, May 09, 2008 at 10:09:21AM -0700, Jiaying Zhang wrote:
[...]

> Yes. When a ddsnap device is removed, it closes the sockets that its kernel
> threads use to
> talk to 'ddsnap server' and 'ddsnap agent' so that 'ddsnap server' and
> 'ddsnap agent' can reap
> the resources associated with that device. The 'socket errors' are caused by
> this and the
> kernel threads should exit soon. Maybe we should change our code to not
> print out the error
> message upon normal exit since they are kind of confusing to users.


Hi Jiaying,

thanks. False alarm then.

[...]

> Reads to the filesystem usually cause writes to the block device
> because the filesystem may need to update its metadata during
> the read. You can try to mount the filesystem with noatime option
> and see if it helps to improve read performance.


Oh, of course, I forgot about that. But aren't those meant to be
buffered and the block device to be accessed only from time to
time in background?

I think with ext3, all metadata updates need to be flushed to on-disk journals. 
So turning off atime may have a big effect on read performance.


Here for instance, it blocks the shell, it blocks vim while
typing (because of some getpwd() I think) for several seconds.
From time to time, it causes some failure of some cron job or
nagios checks because of timeouts being exceeded, so it's not
great.

Can't it be written in a way that doesn't lock the whole FS for
several seconds?

ddsnap does not lock the whole FS. The problem you described seems
to be caused by memory pressure. The system may have difficulty to
satisfy other memory requests when writing dirty pages to the ddsnap
device is extremely slow. You can have a look at /proc/zoneinfo
to see how much free memory available in normal zone. You may still
see a lot of free memory in 'top'. But most kernel operations can only
use memory in the normal zone.

You can give 'ddsnap server' a larger amount of memory with the '--cachesize'
option and see if it speeds up 'ddsnap delete'.


> According to our measurements, ddsnap usually introduces 8~9 times
> of overhead on writes. But the performance of running ddsnap on top of
> software raid may be even worse. We are working on optimize ddsnap
> write performance and hopefully will cut the extra overhead in half in a
> few months. Have you tried to run ddsnap on top of SATA disk directly?

[...]

The performances are OK, and actually the ddsnap storage in on a
SATA disk via a linear LVM, so there should hardly be any
overhead. Read performances on the RAID 1 (the real FS) are
better than on a single disk.

Good to hear that!

Jiaying

Stephane Chazelas

unread,
May 12, 2008, 7:11:08 AM5/12/08
to Jiaying Zhang, zuma...@googlegroups.com
On Fri, May 09, 2008 at 11:14:54AM -0700, Jiaying Zhang wrote:
[...]
> Here for instance, it blocks the shell, it blocks vim while
> > typing (because of some getpwd() I think) for several seconds.
> > From time to time, it causes some failure of some cron job or
> > nagios checks because of timeouts being exceeded, so it's not
> > great.
> >
> > Can't it be written in a way that doesn't lock the whole FS for
> > several seconds?
>
>
> ddsnap does not lock the whole FS.

But someone else mentionned that deleting a snapshot takes some
time and effort and that while it was doing that, it couldn't
handle any IO. Wouldn't that be what I observe?

> The problem you described seems
> to be caused by memory pressure. The system may have difficulty to
> satisfy other memory requests when writing dirty pages to the ddsnap
> device is extremely slow. You can have a look at /proc/zoneinfo
> to see how much free memory available in normal zone. You may still
> see a lot of free memory in 'top'. But most kernel operations can only
> use memory in the normal zone.

Comparing with a machine that doesn't do much, the
/proc/zoneinfo -- though I can't really understand what it
says -- looks fine:

Node 0, zone Normal
pages free 1406
min 936
low 1170
high 1404
scanned 0 (a: 0 i: 0)
spanned 225280
present 223520
nr_free_pages 1406
nr_inactive 78448
nr_active 86755
nr_anon_pages 0
nr_mapped 0
nr_file_pages 165203
nr_dirty 0
nr_writeback 3
nr_slab_reclaimable 42520
nr_slab_unreclaimable 6070
nr_page_table_pages 681
nr_unstable 0
nr_bounce 0
nr_vmscan_write 210
protection: (0, 0, 19292, 19292)
pagesets
cpu: 0 pcp: 0
count: 145
high: 186
batch: 31
cpu: 0 pcp: 1
count: 58
high: 62
batch: 15
all_unreclaimable: 0
prev_priority: 12
start_pfn: 4096

There's a difference on the "pagesets" though which in most of
the other machines (without ddsnap) I looked at were at 0 or
about.

$ free
total used free shared buffers cached
Mem: 3375700 1953980 1421720 0 671740 621848
-/+ buffers/cache: 660392 2715308
Swap: 8000224 20 8000204


>
> You can give 'ddsnap server' a larger amount of memory with the
> '--cachesize'
> option and see if it speeds up 'ddsnap delete'.

[...]

Looking at the man page, "--cachesize" is an option to the
"initialize" command, not "server". The code seems to suggest
that the manual is wrong though. So I should be able to increase
the cache-size without wiping out the snapstore right? I'll try
and do that without rebooting which might be tricky because of
it being for the rootfs.

Also, it says:
"If not specified or if specified as 0 (zero), defaults to
min(128MB, system ram size/4)."

Based on the fact that top reports RSS == 131MB and that there
are 4GB RAM here, shouldn't it be "max" above instead of "min"?

Cheers,
Stéphane

Stephane Chazelas

unread,
May 12, 2008, 9:47:48 AM5/12/08
to Jiaying Zhang, zuma...@googlegroups.com
On Mon, May 12, 2008 at 12:11:08PM +0100, Stephane Chazelas wrote:
[...]
> Looking at the man page, "--cachesize" is an option to the
> "initialize" command, not "server". The code seems to suggest
> that the manual is wrong though. So I should be able to increase
> the cache-size without wiping out the snapstore right? I'll try
> and do that without rebooting which might be tricky because of
> it being for the rootfs.
[...]

That was a bad idea. I must have screwed up somewhere, the
snapstore got corrupted in some way, I had to wipe it out.

It even managed to unsync my RAID 1, not sure how...

So, with an empty snapstore, I'll have to wait some time before
I can reproduce that.

Cheers,
Stéphane

Jiaying Zhang

unread,
May 12, 2008, 12:10:12 PM5/12/08
to Stephane Chazelas, zuma...@googlegroups.com
On 5/12/08, Stephane Chazelas <Stephane...@yahoo.fr> wrote:
On Fri, May 09, 2008 at 11:14:54AM -0700, Jiaying Zhang wrote:
[...]

> Here for instance, it blocks the shell, it blocks vim while
> > typing (because of some getpwd() I think) for several seconds.
> > From time to time, it causes some failure of some cron job or
> > nagios checks because of timeouts being exceeded, so it's not
> > great.
> >
> > Can't it be written in a way that doesn't lock the whole FS for
> > several seconds?
>
>
> ddsnap does not lock the whole FS.


But someone else mentionned that deleting a snapshot takes some
time and effort and that while it was doing that, it couldn't
handle any IO. Wouldn't that be what I observe?

You are right. ddsnap server can't process any IO while deleting a
snapshot. So in that sense, ddsnap does lock the whole FS during the
deleting. It will be tricky to make 'ddsnap delete' asynchronous
from other operations because it traverses the whole btree to update
the chunk sharing information. That may not happen soon. But we
should add some statistical printing that tells users how large the
btree is.

Looks like the memory usage is fine. I just remember you put your
root on the ddsnap volume. We haven't done this before. When we
saw other things were slow, like vi or ssh, it was usually caused
by memory pressure. But in your case, the problem might just be
caused by slow IOs.

>
> You can give 'ddsnap server' a larger amount of memory with the
> '--cachesize'
> option and see if it speeds up 'ddsnap delete'.

[...]

Looking at the man page, "--cachesize" is an option to the
"initialize" command, not "server". The code seems to suggest
that the manual is wrong though. So I should be able to increase
the cache-size without wiping out the snapstore right? I'll try
and do that without rebooting which might be tricky because of
it being for the rootfs.

Also, it says:
"If not specified or if specified as 0 (zero), defaults to
min(128MB, system ram size/4)."

Based on the fact that top reports RSS == 131MB and that there
are 4GB RAM here, shouldn't it be "max" above instead of "min"?

The ddsnap manpage was wrong. '--cachesize' is option to the 'ddsnap server'
command. You don't need to re-initialize the volume, but need to restart the
ddsnap agent and ddsnap server. The manpage was corrected in revision 1657.
Sorry for the confusing.

Jiaying

Jiaying Zhang

unread,
May 12, 2008, 12:34:16 PM5/12/08
to Stephane Chazelas, zuma...@googlegroups.com
Running 'ddsnap initialize' overwrites the snapstore. Again, really sorry for
the mistake in the manpage ...

Jiaying

On 5/12/08, Stephane Chazelas <Stephane...@yahoo.fr> wrote:

Stephane Chazelas

unread,
May 12, 2008, 12:39:34 PM5/12/08
to Jiaying Zhang, zuma...@googlegroups.com
On Mon, May 12, 2008 at 09:10:12AM -0700, Jiaying Zhang wrote:
[...]
> The ddsnap manpage was wrong. '--cachesize' is option to the 'ddsnap server'
> command. You don't need to re-initialize the volume, but need to restart the
> ddsnap agent and ddsnap server. The manpage was corrected in revision 1657.
> Sorry for the confusing.
[...]

Hi Jiaying,

thanks, for some reason, I thought I had done a "svn update"
before double-checking, looks like I didn't.

Also shouldn't it be "max(128MB, RAM/4)"

Cheers,
Stéphane

Stephane Chazelas

unread,
May 12, 2008, 12:43:38 PM5/12/08
to Jiaying Zhang, zuma...@googlegroups.com
On Mon, May 12, 2008 at 09:34:16AM -0700, Jiaying Zhang wrote:
> Running 'ddsnap initialize' overwrites the snapstore. Again, really sorry
> for
> the mistake in the manpage ...

No,

I used "ddsnap initialize" to wipe out the snapstore after it
got corrupted. As I said, while reading the code, I had realised
that the manpage was wrong. I did try and restart the server
with the -k option, but to do that on a rootfs, you need to take
a lot of precautions, and I have screwed up when doing that (I
have allowed the original device to be written directly by doing
a dmsetup resume too early by mistake), so it was totally my
fault.

Cheers,
Stéphane

Stephane Chazelas

unread,
May 12, 2008, 12:46:16 PM5/12/08
to Jiaying Zhang, zuma...@googlegroups.com
On Mon, May 12, 2008 at 05:39:34PM +0100, Stephane Chazelas wrote:
[...]
> Also shouldn't it be "max(128MB, RAM/4)"
[...]

It looks like it should.

ddsnap --help

is correct:

-k, --cachesize=size Buffer cache size (default = max(128M,1/4
sys RAM)

the man page isn't:

Specifies the amount of RAM to dedicate to the snapshot btree cache. If not specified or


if specified as 0 (zero), defaults to min(128MB, system ram size/4).

regards,
Stéphane

Dan Kegel

unread,
May 12, 2008, 1:50:04 PM5/12/08
to Stephane Chazelas, Jiaying Zhang, zuma...@googlegroups.com
Stephane Chazelas <Stephane...@yahoo.fr> wrote:
> ddsnap --help is correct:
>
>> -k, --cachesize=size (default = max(128M,1/4 sys RAM)

>
> the man page isn't:
>
>> defaults to min(128MB, system ram size/4).

That typo probably crept in because the behavior
is surprising... why *wouldn't* we use more RAM on
big systems? The answer is that we capped cache size
at 1/4 sys RAM so that really small system tests would not blow up.
- Dan

Jiaying Zhang

unread,
May 12, 2008, 2:08:31 PM5/12/08
to Dan Kegel, Stephane Chazelas, zuma...@googlegroups.com
Should we just use 1/4 sys RAM as the default? We may want to
do more tests with large cachesize, like 2G, because all those
memory will be mlocked. I don't know if having so much memory
mlocked will cause any problem.

Jiaying

Dan Kegel

unread,
May 12, 2008, 2:11:28 PM5/12/08
to Jiaying Zhang, Stephane Chazelas, zuma...@googlegroups.com
I fear it would be antisocial to mlock 1/4 sys ram.
We should instead make sure it's easy for admins
to find out they might want to tune that parameter.
Can we log a warning if we detect it might be too small?

Stephane Chazelas

unread,
May 13, 2008, 11:45:04 AM5/13/08
to Jiaying Zhang, zuma...@googlegroups.com
On Fri, May 09, 2008 at 10:09:21AM -0700, Jiaying Zhang wrote:
[about ddsnap performance on the root filesystem]
[...]

> Reads to the filesystem usually cause writes to the block device
> because the filesystem may need to update its metadata during
> the read. You can try to mount the filesystem with noatime option
> and see if it helps to improve read performance.
[...]

Hi guys,

OK, now, I've got 256MB allocated to ddsnap cache, / has noatime
and the snapshots mount cleanly (thanks to the dmsetup suspend
in a fortress FS trick)

I can see a lot of improvement.

I can't tell at the moment if increasing the cache to 256MB
helped reduce the snapshot deletion time. It's now about 15s,
but a lot less of the snapstore is being used at the moment. If
the deletion time is proportional to the amount of used space in
the snapstore then it could very well be that there's no
improvment at all.

noatime improved things in that I haven't experienced so far
cases where I had to wait for the deletion to finish before
being able to do anything.

Cheers,
Stéphane

Jiaying Zhang

unread,
May 13, 2008, 12:53:23 PM5/13/08
to Stephane Chazelas, zuma...@googlegroups.com
On 5/13/08, Stephane Chazelas <Stephane...@yahoo.fr> wrote:
On Fri, May 09, 2008 at 10:09:21AM -0700, Jiaying Zhang wrote:
[about ddsnap performance on the root filesystem]
[...]
> Reads to the filesystem usually cause writes to the block device
> because the filesystem may need to update its metadata during
> the read. You can try to mount the filesystem with noatime option
> and see if it helps to improve read performance.
[...]

Hi guys,

OK, now, I've got 256MB allocated to ddsnap cache, / has noatime
and the snapshots mount cleanly (thanks to the dmsetup suspend
in a fortress FS trick)

I can see a lot of improvement.

I can't tell at the moment if increasing the cache to 256MB
helped reduce the snapshot deletion time. It's now about 15s,
but a lot less of the snapstore is being used at the moment. If
the deletion time is proportional to the amount of used space in
the snapstore then it could very well be that there's no
improvment at all.

Yes.  The deletion time is proportional to the amount of used
space in the snapstore. We are working on setting up an automatic
performance testing framework. Once we have that, we should
have a benchmark that tests ddsnap deletion time as the snapstore
grows and ages.

noatime improved things in that I haven't experienced so far
cases where I had to wait for the deletion to finish before
being able to do anything.

Good to hear that.

Jiaying
Reply all
Reply to author
Forward
0 new messages