Issue with degraded disks

Coalescence

May 12, 2010, 7:43:33 AM

to ganeti

Hi all,

Testing out 2.1.2.1 on a fresh 2-node install. I'm using the 2nd NIC
as the drbd backend, over a crossover. Both ports are set to 1000Mbit,
full duplex. No firewalling currently. Hardware is identical (Dell 2950s).

The initial cluster build had its -s option set to 192.168.1.1 (1st
node, 2nd NIC).

I added the 2nd node with -s 192.168.1.2 (2nd node, 2nd NIC).

I've got an issue with the drbd stack: when I try to create an
instance I get:

Failure: command execution error:
There are some degraded disks for this instance

However, it's a fresh build, and I've tried running repair disks etc.

kernel: 2.6.33-grml64
drbd version: 8.3.7 (api:88/proto:86-91)
ganeti 2.1.2.1

vmhost01:~/source/ganeti-2.1.2.1/tools# gnt-node list
Node                    DTotal  DFree MTotal MNode MFree Pinst Sinst
vmhost01.vmtest.x.co.uk 416.4G 416.4G   2.0G   79M  1.8G     0     0
vmhost02.vmtest.x.co.uk 416.4G 416.4G   2.0G   46M  1.9G     0     0

vmhost01:~/source/ganeti-2.1.2.1/tools# gnt-cluster info
Cluster name: cluster.vmtest.x.co.uk
Cluster UUID: d9fdb890-bdb1-48b4-99df-129a0b7e75b2
Creation time: 2010-05-12 10:55:50
Modification time: 2010-05-12 10:55:50
Master node: vmhost01.vmtest.x.co.uk
Architecture (this node): 64bit (x86_64)
Tags: (none)
Default hypervisor: kvm
Enabled hypervisors: kvm
Hypervisor parameters:
  - kvm:
      acpi: True
      boot_order: disk
      cdrom_image_path:
      disk_cache: default
      disk_type: paravirtual
      initrd_path:
      kernel_args: ro
      kernel_path: /boot/vmlinuz-2.6-kvmU
      kvm_flag:
      migration_port: 8102
      nic_type: paravirtual
      root_path: /dev/vda1
      security_domain:
      security_model: none
      serial_console: True
      usb_mouse:
      use_localtime: False
      vnc_bind_address:
      vnc_password_file:
      vnc_tls: False
      vnc_x509_path:
      vnc_x509_verify: False
OS specific hypervisor parameters:
Cluster parameters:
  - candidate pool size: 10
  - master netdev: br0
  - lvm volume group: kvm-data
  - file storage path: /srv/ganeti/file-storage
  - maintenance of node health: False
  - uid pool:
Default instance parameters:
  - default:
      auto_balance: True
      memory: 128
      vcpus: 1
Default nic parameters:
  - default:
      link: br0
      mode: bridged

Here's the log from drbd:

May 12 11:34:46 vmhost01 kernel: [ 735.268476] block drbd0: Starting worker thread (from cqueue [3356])
May 12 11:34:46 vmhost01 kernel: [ 735.268696] block drbd0: disk( Diskless -> Attaching )
May 12 11:34:46 vmhost01 kernel: [ 735.269162] block drbd0: No usable activity log found.
May 12 11:34:46 vmhost01 kernel: [ 735.269247] block drbd0: Method to ensure write ordering: barrier
May 12 11:34:46 vmhost01 kernel: [ 735.269319] block drbd0: Backing device's merge_bvec_fn() = ffffffff8141d9c0
May 12 11:34:46 vmhost01 kernel: [ 735.269391] block drbd0: max_segment_size ( = BIO size ) = 4096
May 12 11:34:46 vmhost01 kernel: [ 735.269462] block drbd0: drbd_bm_resize called with capacity == 10485760
May 12 11:34:46 vmhost01 kernel: [ 735.269564] block drbd0: resync bitmap: bits=1310720 words=20480
May 12 11:34:46 vmhost01 kernel: [ 735.269634] block drbd0: size = 5120 MB (5242880 KB)
May 12 11:34:46 vmhost01 kernel: [ 735.269704] block drbd0: Writing the whole bitmap, size changed
May 12 11:34:46 vmhost01 kernel: [ 735.276988] block drbd0: 5120 MB (1310720 bits) marked out-of-sync by on disk bit-map.
May 12 11:34:46 vmhost01 kernel: [ 735.277577] block drbd0: recounting of set bits took additional 0 jiffies
May 12 11:34:46 vmhost01 kernel: [ 735.277649] block drbd0: 5120 MB (1310720 bits) marked out-of-sync by on disk bit-map.
May 12 11:34:46 vmhost01 kernel: [ 735.277737] block drbd0: disk( Attaching -> Inconsistent )
May 12 11:34:46 vmhost01 kernel: [ 735.288372] block drbd0: conn( StandAlone -> Unconnected )
May 12 11:34:46 vmhost01 kernel: [ 735.288466] block drbd0: Starting receiver thread (from drbd0_worker [5136])
May 12 11:34:46 vmhost01 kernel: [ 735.288605] block drbd0: receiver (re)started
May 12 11:34:46 vmhost01 kernel: [ 735.288678] block drbd0: conn( Unconnected -> WFConnection )
May 12 11:34:47 vmhost01 kernel: [ 735.386611] block drbd0: role( Secondary -> Primary ) disk( Inconsistent -> UpToDate )
May 12 11:34:47 vmhost01 kernel: [ 735.386892] block drbd0: Forced to consider local data as UpToDate!
May 12 11:34:47 vmhost01 kernel: [ 735.386979] block drbd0: Creating new current UUID
May 12 11:34:47 vmhost01 kernel: [ 735.783285] block drbd0: Handshake successful: Agreed network protocol version 91
May 12 11:34:47 vmhost01 kernel: [ 735.783562] block drbd0: Peer authenticated using 16 bytes of 'md5' HMAC
May 12 11:34:47 vmhost01 kernel: [ 735.783669] block drbd0: conn( WFConnection -> WFReportParams )
May 12 11:34:47 vmhost01 kernel: [ 735.783774] block drbd0: Starting asender thread (from drbd0_receiver [5148])
May 12 11:34:47 vmhost01 kernel: [ 735.783961] block drbd0: data-integrity-alg: <not-used>
May 12 11:34:47 vmhost01 kernel: [ 735.784045] block drbd0: drbd_sync_handshake:
May 12 11:34:47 vmhost01 kernel: [ 735.784119] block drbd0: self D84B6946F9840281:0000000000000004:0000000000000000:0000000000000000 bits:1310720 flags:0
May 12 11:34:47 vmhost01 kernel: [ 735.784212] block drbd0: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:1310720 flags:0
May 12 11:34:47 vmhost01 kernel: [ 735.784305] block drbd0: uuid_compare()=2 by rule 30
May 12 11:34:47 vmhost01 kernel: [ 735.784372] block drbd0: Becoming sync source due to disk states.
May 12 11:34:47 vmhost01 kernel: [ 735.784452] block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
May 12 11:34:47 vmhost01 kernel: [ 735.785158] block drbd0: 5120 MB (1310720 bits) marked out-of-sync by on disk bit-map.
May 12 11:34:47 vmhost01 kernel: [ 735.785315] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Inconsistent )
May 12 11:34:47 vmhost01 kernel: [ 735.795519] block drbd0: conn( WFBitMapS -> SyncSource )
May 12 11:34:47 vmhost01 kernel: [ 735.795604] block drbd0: Began resync as SyncSource (will sync 5242880 KB [1310720 bits set]).
May 12 11:34:47 vmhost01 kernel: [ 735.796931] block drbd0: sock was shut down by peer
May 12 11:34:47 vmhost01 kernel: [ 735.796936] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> BrokenPipe )
May 12 11:34:47 vmhost01 kernel: [ 735.797253] block drbd0: asender terminated
May 12 11:34:47 vmhost01 kernel: [ 735.797346] block drbd0: Terminating drbd0_asender
May 12 11:34:47 vmhost01 kernel: [ 735.797435] block drbd0: Connection closed
May 12 11:34:47 vmhost01 kernel: [ 735.797510] block drbd0: conn( BrokenPipe -> Unconnected )
May 12 11:34:47 vmhost01 kernel: [ 735.797587] block drbd0: receiver terminated
May 12 11:34:47 vmhost01 kernel: [ 735.797658] block drbd0: Restarting drbd0_receiver
May 12 11:34:47 vmhost01 kernel: [ 735.797726] block drbd0: receiver (re)started
May 12 11:34:47 vmhost01 kernel: [ 735.797794] block drbd0: conn( Unconnected -> WFConnection )
May 12 11:34:57 vmhost01 kernel: [ 746.238384] block drbd0: role( Primary -> Secondary )
May 12 11:34:57 vmhost01 kernel: [ 746.238726] block drbd0: conn( WFConnection -> Disconnecting )
May 12 11:34:57 vmhost01 kernel: [ 746.238814] block drbd0: Discarding network configuration.
May 12 11:34:57 vmhost01 kernel: [ 746.238905] block drbd0: Connection closed
May 12 11:34:57 vmhost01 kernel: [ 746.238987] block drbd0: conn( Disconnecting -> StandAlone )
May 12 11:34:57 vmhost01 kernel: [ 746.239070] block drbd0: receiver terminated
May 12 11:34:57 vmhost01 kernel: [ 746.239138] block drbd0: Terminating drbd0_receiver
May 12 11:34:57 vmhost01 kernel: [ 746.239249] block drbd0: disk( UpToDate -> Diskless ) pdsk( Inconsistent -> DUnknown )
May 12 11:34:57 vmhost01 kernel: [ 746.239465] block drbd0: drbd_bm_resize called with capacity == 0
May 12 11:34:57 vmhost01 kernel: [ 746.239550] block drbd0: worker terminated
May 12 11:34:57 vmhost01 kernel: [ 746.239617] block drbd0: Terminating drbd0_worker

Any ideas? Is it a drbd bug of some kind, or is it because I'm trying
to use the 2nd NIC with drbd (somehow)?
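
A log like the one above is easier to follow if only the state changes are pulled out. The following is a minimal Python sketch of that idea, not a DRBD or Ganeti tool: `TRANSITION` and `state_transitions` are illustrative names, and the field list (role, peer, conn, disk, pdsk) is simply the set seen in this log.

```python
import re

# Matches DRBD state changes such as "conn( WFConnection -> WFReportParams )".
# The field names are the ones that appear in the log in this thread.
TRANSITION = re.compile(r"\b(role|peer|conn|disk|pdsk)\( (\w+) -> (\w+) \)")


def state_transitions(log_text):
    """Return (field, old, new) tuples in the order they appear in the log."""
    # Collapse all whitespace so entries that were wrapped in transit still match.
    flat = " ".join(log_text.split())
    return [m.groups() for m in TRANSITION.finditer(flat)]


# Three entries taken from the log in this thread.
sample = """\
May 12 11:34:47 vmhost01 kernel: [ 735.795519] block drbd0: conn( WFBitMapS -> SyncSource )
May 12 11:34:47 vmhost01 kernel: [ 735.796931] block drbd0: sock was shut down by peer
May 12 11:34:47 vmhost01 kernel: [ 735.796936] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> BrokenPipe )
"""

for field, old, new in state_transitions(sample):
    print(f"{field}: {old} -> {new}")
```

Run over the full log, this makes the key sequence stand out: the connection reaches SyncSource and drops to BrokenPipe almost immediately, right after "sock was shut down by peer".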

Iustin Pop

May 12, 2010, 7:48:03 AM

to gan...@googlegroups.com

On Wed, May 12, 2010 at 04:43:33AM -0700, Coalescence wrote:
> Hi all,
>
> Testing out 2.1.2.1 on a 2 node fresh install. I'm using the 2nd NIC
> as the drbd backend, over a crossover. Both ports are set to 1000Mbit,
> Full duplex. No firewalling currently. Hardware identical (Dell 2950s)
>
> The initial cluster build had is -s option set to 192.168.1.1 (1st
> node, 2nd nic)
>
> added the 2nd node with -s 192.168.1.2 (2nd node, 2nd nic)
>
> I've got an issues with the drbd stack, when I try and create an
> instance I get;
>
> Failure: command execution error:
> There are some degraded disks for this instance

[…]

> May 12 11:34:47 vmhost01 kernel: [ 735.795604] block drbd0: Began
> resync as SyncSource (will sync 5242880 KB [1310720 bits set]).
> May 12 11:34:47 vmhost01 kernel: [ 735.796931] block drbd0: sock was
> shut down by peer

Did you configure the kernel parameter for the DRBD module correctly per
the docs (usermode_helper=/bin/true, for now at least)?

iustin
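
The parameter is normally given when the module is loaded (for example `modprobe drbd usermode_helper=/bin/true`). To confirm what a running node actually has, the value can be read back from sysfs. This is a minimal Python sketch under assumptions: the sysfs path below is where the loaded drbd module is assumed to expose the parameter, and `read_usermode_helper` and `helper_is_disabled` are illustrative names, not Ganeti functions.

```python
from pathlib import Path

# Assumed sysfs location of the DRBD module parameter; adjust if your
# kernel exposes it elsewhere.
HELPER_PARAM = Path("/sys/module/drbd/parameters/usermode_helper")


def read_usermode_helper(param_file=HELPER_PARAM):
    """Return the configured helper path, or None if the file cannot be read."""
    try:
        return param_file.read_text().strip()
    except OSError:
        return None


def helper_is_disabled(value):
    """True when the helper is /bin/true, the value suggested in this thread."""
    return value == "/bin/true"


if __name__ == "__main__":
    value = read_usermode_helper()
    if value is None:
        print("drbd module not loaded, or parameter not exposed")
    elif helper_is_disabled(value):
        print("usermode_helper is /bin/true")
    else:
        print(f"usermode_helper is {value!r}; expected /bin/true")
```

It needs to be run on both nodes, since each node loads its own copy of the module.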

Coalescence

May 12, 2010, 7:55:44 AM

to ganeti

Damn, as soon as you mentioned it I remembered!

Thanks so much! All working now.