Hi,
while moving / replacing the disks of one VM on my cluster I got the error
"Can't change network configuration: drbd7: timeout while configuring
network (please do a gnt-instance info to see the status of disks)"
=> disk/0 has failed, but disk/1 is working fine
So I tried to repair it with "gnt-instance replace-disks -a lug-in", but
that gives me a "No disks need replacement for instance
'lug-in.xxxxxxxxx.de'"
=> disk/0 is still degraded
I ran a "gnt-cluster verify", which doesn't detect anything (except that
the LVM volumes are still on node2, but they are reported as unknown)
I then tried a "gnt-instance replace-disks -s lug-in", which fails again
with the same error as the original "replace-disks -n node1"
Does anyone have an idea what I have forgotten or what else I could try?
Notice: all other (17) VMs are running without problems
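For reference, this is a quick sketch of how I check which DRBD minors are stuck in StandAlone (it just greps the DRBD 8.3 /proc/drbd format quoted below; the helper name is only illustrative):

```shell
# List DRBD minor numbers whose connection state is StandAlone.
# Reads /proc/drbd by default; pass a file path to test against a sample.
drbd_standalone() {
  awk '/cs:StandAlone/ { sub(/:$/, "", $1); print $1 }' "${1:-/proc/drbd}"
}
```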
Greetings
Neal
> root@node2 ~ # gnt-job info 118433
> Job ID: 118433
> Status: error
> Received: 2014-05-23 02:59:34.007653
> Processing start: 2014-05-23 02:59:34.150118 (delta 0.142465s)
> Processing end: 2014-05-23 03:02:11.719716 (delta 157.569598s)
> Total processing time: 157.712063 seconds
> Opcodes:
> OP_INSTANCE_REPLACE_DISKS
> Status: error
> Processing start: 2014-05-23 02:59:34.150118
> Execution start: 2014-05-23 02:59:36.276322
> Processing end: 2014-05-23 03:02:11.719693
> Input fields:
> comment: None
> debug_level: 0
> depends: None
> disks:
> dry_run: False
> early_release: False
> ignore_ipolicy: False
> instance_name: lug-in.xxxxxxxxx.de
> instance_uuid: a7c2d2f2-a248-4191-b75f-1a2ce53bba9a
> mode: replace_new_secondary
> priority: 0
> reason: ['gnt:client:gnt-instance', 'replace-disks',
> 1400806773786305024],['gnt:opcode:op_instance_replace_disks',
> 'job=118433;index=0', 1400806774007633920]
> remote_node: node1.xxxxxxxxx.de
> remote_node_uuid: 56a75bd8-f179-40c6-a8e4-3adc283968f3
> Result:
> OpExecError
> [Can't attach drbd disks on node node1.xxxxxxxxx.de: Can't
> change network configuration: drbd7: timeout while configuring network
> (please do a gnt-instance info to see the status of disks)]
> Execution log:
> 1:2014-05-23 02:59:36.596214:message Replacing disk(s) 0, 1
> for instance 'lug-in.xxxxxxxxx.de'
> 2:2014-05-23 02:59:36.655885:message Current primary node:
> node3.xxxxxxxxx.de
> 3:2014-05-23 02:59:36.706412:message Current seconary node:
> node2.xxxxxxxxx.de
> 4:2014-05-23 02:59:36.748099:message STEP 1/6 Check device
> existence
> 5:2014-05-23 02:59:36.791096:message - INFO: Checking disk/0
> on node3.xxxxxxxxx.de
> 6:2014-05-23 02:59:37.535986:message - INFO: Checking disk/1
> on node3.xxxxxxxxx.de
> 7:2014-05-23 02:59:37.743781:message - INFO: Checking volume
> groups
> 8:2014-05-23 02:59:37.878451:message STEP 2/6 Check peer
> consistency
> 9:2014-05-23 02:59:37.901140:message - INFO: Checking disk/0
> consistency on node node3.xxxxxxxxx.de
> 10:2014-05-23 02:59:38.336108:message - INFO: Checking disk/1
> consistency on node node3.xxxxxxxxx.de
> 11:2014-05-23 02:59:38.778363:message STEP 3/6 Allocate new
> storage
> 12:2014-05-23 02:59:38.827892:message - INFO: Adding new
> local storage on node1.xxxxxxxxx.de for disk/0
> 13:2014-05-23 02:59:43.446056:message - INFO: Adding new
> local storage on node1.xxxxxxxxx.de for disk/1
> 14:2014-05-23 02:59:50.388658:message STEP 4/6 Changing drbd
> configuration
> 15:2014-05-23 02:59:50.438092:message - INFO: activating a
> new drbd on node1.xxxxxxxxx.de for disk/0
> 16:2014-05-23 02:59:59.627383:message - INFO: activating a
> new drbd on node1.xxxxxxxxx.de for disk/1
> 17:2014-05-23 03:00:06.809704:message - INFO: Shutting down
> drbd for disk/0 on old node
> 18:2014-05-23 03:00:08.667964:message - INFO: Shutting down
> drbd for disk/1 on old node
> 19:2014-05-23 03:00:09.100976:message - INFO: Detaching
> primary drbds from the network (=> standalone)
> 20:2014-05-23 03:00:09.581068:message - INFO: Updating
> instance configuration
> 21:2014-05-23 03:00:09.810649:message - INFO: Attaching
> primary drbds to new secondary (standalone => connected)
> - disk/0: drbd, size 32.0G
> access mode: rw
> nodeA: node3.xxxxxxxxx.de, minor=15
> nodeB: node1.xxxxxxxxx.de, minor=7
> port: 11021
> auth key: 8206397b5460285e6c06cf2ce1be6c8d5f8e03d3
> on primary: /dev/drbd15 (147:15) in sync, status *DEGRADED*
> on secondary: /dev/drbd7 (147:7) in sync, status *DEGRADED*
> *UNCERTAIN STATE*
> name: None
> UUID: 51f4d88f-b228-4b74-ba92-ae9fb7e6862e
> child devices:
> - child 0: plain, size 32.0G
> logical_id: lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_data
> on primary:
> /dev/lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_data (253:38)
> on secondary:
> /dev/lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_data (253:24)
> name: None
> UUID: e163561b-4b39-45a3-87e6-c507d26fc622
> - child 1: plain, size 128M
> logical_id: lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_meta
> on primary:
> /dev/lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_meta (253:39)
> on secondary:
> /dev/lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_meta (253:25)
> name: None
> UUID: 3e0180bb-e0bc-48ce-b52b-50ee6635ea7b
> - disk/1: drbd, size 128.0G
> access mode: rw
> nodeA: node3.xxxxxxxxx.de, minor=16
> nodeB: node1.xxxxxxxxx.de, minor=8
> port: 11025
> auth key: ff7e90ea4f2dc4d7ebaf2c50fe4bdfdc4369d389
> on primary: /dev/drbd16 (147:16) in sync, status ok
> on secondary: /dev/drbd8 (147:8) in sync, status ok
> name: None
> UUID: 53d76932-8a06-4137-8636-791b001bcb2d
> child devices:
> - child 0: plain, size 128.0G
> logical_id: lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_data
> on primary:
> /dev/lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_data (253:40)
> on secondary:
> /dev/lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_data (253:26)
> name: None
> UUID: e1e4d4ea-20e6-48ac-9cfb-dccb14c98878
> - child 1: plain, size 128M
> logical_id: lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_meta
> on primary:
> /dev/lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_meta (253:41)
> on secondary:
> /dev/lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_meta (253:28)
> name: None
> UUID: 51553d17-ba4f-4ae8-bdfa-8de9dbbd8452
> root@node2 ~ # gnt-instance replace-disks -a lug-in
> Job 118452 is trying to acquire all necessary locks
> Fri May 23 04:54:41 2014 - INFO: Checking disk/0 on node3.xxxxxxxxx.de
> Fri May 23 04:54:41 2014 - INFO: Checking disk/0 on node1.xxxxxxxxx.de
> Fri May 23 04:54:41 2014 - INFO: Checking disk/1 on node3.xxxxxxxxx.de
> Fri May 23 04:54:42 2014 - INFO: Checking disk/1 on node1.xxxxxxxxx.de
> Fri May 23 04:54:44 2014 No disks need replacement for instance
> 'lug-in.xxxxxxxxx.de'
> root@node2 ~ # gnt-cluster verify
> Submitted jobs 118580, 118581
> Waiting for job 118580 ...
> Fri May 23 10:28:43 2014 * Verifying cluster config
> Fri May 23 10:28:43 2014 * Verifying cluster certificate files
> Fri May 23 10:28:43 2014 * Verifying hypervisor parameters
> Fri May 23 10:28:43 2014 * Verifying all nodes belong to an existing group
> Waiting for job 118581 ...
> Fri May 23 10:28:44 2014 * Verifying group 'default'
> Fri May 23 10:28:44 2014 * Gathering data (3 nodes)
> Fri May 23 10:28:48 2014 * Gathering disk information (3 nodes)
> Fri May 23 10:28:58 2014 * Verifying configuration file consistency
> Fri May 23 10:28:58 2014 * Verifying node status
> Fri May 23 10:28:58 2014 * Verifying instance status
> Fri May 23 10:28:58 2014 * Verifying orphan volumes
> Fri May 23 10:28:58 2014 - ERROR: node node2.xxxxxxxxx.de: volume
> lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_data is unknown
> Fri May 23 10:28:58 2014 - ERROR: node node2.xxxxxxxxx.de: volume
> lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_meta is unknown
> Fri May 23 10:28:58 2014 - ERROR: node node2.xxxxxxxxx.de: volume
> lvm/843d313c-0ca3-4fe6-9a8b-7923302d711b.disk0_data is unknown
> Fri May 23 10:28:58 2014 - ERROR: node node2.xxxxxxxxx.de: volume
> lvm/7978a477-4f0f-43b8-be39-f87ece633c22.disk1_meta is unknown
> Fri May 23 10:28:58 2014 * Verifying N+1 Memory redundancy
> Fri May 23 10:28:58 2014 * Other Notes
> Fri May 23 10:28:59 2014 - NOTICE: 1 non-redundant instance(s) found.
> Fri May 23 10:28:59 2014 - NOTICE: 3 non-auto-balanced instance(s)
> found.
> Fri May 23 10:28:59 2014 * Hooks Results
> root@node2 ~ # ssh node1 cat /proc/drbd
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: F937DCB2E5D83C6CCE4A6C9
> [...]
> 7: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r-----
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
> oos:33554176
> root@node2 ~ # ssh node3 cat /proc/drbd
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: F937DCB2E5D83C6CCE4A6C9
> [...]
> 15: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
> ns:0 nr:0 dw:151228 dr:532788 al:217 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1
> wo:f oos:73096
> root@node2 ~ # gnt-job info 118464
> Job ID: 118464
> Status: error
> Received: 2014-05-23 05:16:18.058327
> Processing start: 2014-05-23 05:16:18.565548 (delta 0.507221s)
> Processing end: 2014-05-23 06:02:05.403075 (delta 2746.837527s)
> Total processing time: 2747.344748 seconds
> Opcodes:
> OP_INSTANCE_REPLACE_DISKS
> Status: error
> Processing start: 2014-05-23 05:16:18.565548
> Execution start: 2014-05-23 06:01:45.634275
> Processing end: 2014-05-23 06:02:05.402960
> Input fields:
> comment: None
> debug_level: 0
> depends: None
> disks:
> dry_run: False
> early_release: False
> ignore_ipolicy: False
> instance_name: lug-in.xxxxxxxxx.de
> instance_uuid: a7c2d2f2-a248-4191-b75f-1a2ce53bba9a
> mode: replace_on_secondary
> priority: 0
> reason: ['gnt:client:gnt-instance', 'replace-disks',
> 1400814977550833920],['gnt:opcode:op_instance_replace_disks',
> 'job=118464;index=0', 1400814978058324224]
> Result:
> OpPrereqError
> [Instance 'lug-in.xxxxxxxxx.de' is marked to be up, cannot
> shutdown disks, wrong_state]
> Execution log:
> 1:2014-05-23 06:01:45.937374:message Replacing disk(s) 0, 1
> for instance 'lug-in.xxxxxxxxx.de'
> 2:2014-05-23 06:01:45.990016:message Current primary node:
> node3.xxxxxxxxx.de
> 3:2014-05-23 06:01:46.073593:message Current seconary node:
> node1.xxxxxxxxx.de
> 4:2014-05-23 06:01:58.846739:message - WARNING: Could not
> prepare block device disk/0 on node node1.xxxxxxxxx.de
> (is_primary=False, pass=1): Error while assembling disk: drbd7:
> timeout while configuring network
> 5:2014-05-23 06:02:02.714689:message STEP 1/6 Check device
> existence
> 6:2014-05-23 06:02:02.797834:message - INFO: Checking disk/0
> on node3.xxxxxxxxx.de
> 7:2014-05-23 06:02:03.112653:message - INFO: Checking disk/0
> on node1.xxxxxxxxx.de
> 8:2014-05-23 06:02:03.417481:message - INFO: Checking disk/1
> on node3.xxxxxxxxx.de
> 9:2014-05-23 06:02:03.690932:message - INFO: Checking disk/1
> on node1.xxxxxxxxx.de
> 10:2014-05-23 06:02:03.949663:message - INFO: Checking volume
> groups
> 11:2014-05-23 06:02:05.040533:message STEP 2/6 Check peer
> consistency
> 12:2014-05-23 06:02:05.141663:message - INFO: Checking disk/0
> consistency on node node3.xxxxxxxxx.de