Converting instance to DRBD


Keith Edmunds

Jan 29, 2011, 4:31:35 PM1/29/11
to gan...@googlegroups.com
I'm trying to convert a number of instances from '-t plain' to '-t drbd'.
There's some history here of failed attempts, which may be the sole
problem, but right now I get:

# gnt-instance modify -t drbd -n secondary-node instance-name
Sat Jan 29 21:22:42 2011 Converting template to drbd
Sat Jan 29 21:22:42 2011 Creating aditional volumes...
Sat Jan 29 21:22:44 2011 Renaming original volumes...
Sat Jan 29 21:22:44 2011 Initializing DRBD devices...
Sat Jan 29 21:22:45 2011 - INFO: Waiting for instance
secondary-node to sync disks.
Sat Jan 29 21:22:56 2011 - INFO: Instance instance-name's
disks are in sync.
Failure: command execution error:
There are some degraded disks for this instance, please cleanup manually

Does anyone have a pointer to what "cleanup manually" entails?

/proc/drbd on the primary:

version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Inconsistent C r----
ns:0 nr:0 dw:0 dr:984 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b
oos:10485760

...and on the secondary:

version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b
oos:10485760

Clearly the secondary needs to be told to sync with the primary, but I
don't know how to do that with no resources defined. Also, the output from
the conversion indicates that the disks are in sync (which I don't believe).

Advice welcome!

Thanks,
Keith

Iustin Pop

Jan 29, 2011, 7:04:33 PM1/29/11
to gan...@googlegroups.com
On Sat, Jan 29, 2011 at 09:31:35PM +0000, Keith Edmunds wrote:
> I'm trying to convert a number of instances from '-t plain' to '-t drbd'.
> There's some history here of failed attempts, which may be the sole
> problem, but right now I get:
>
> # gnt-instance modify -t drbd -n secondary-node instance-name
> Sat Jan 29 21:22:42 2011 Converting template to drbd
> Sat Jan 29 21:22:42 2011 Creating aditional volumes...
> Sat Jan 29 21:22:44 2011 Renaming original volumes...
> Sat Jan 29 21:22:44 2011 Initializing DRBD devices...
> Sat Jan 29 21:22:45 2011 - INFO: Waiting for instance
> secondary-node to sync disks.
> Sat Jan 29 21:22:56 2011 - INFO: Instance instance-name's
> disks are in sync.
> Failure: command execution error:
> There are some degraded disks for this instance, please cleanup manually
>
> Does anyone have a pointer to what "cleanup manually" entails?

Basically it means "you have to make DRBD work again manually".
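Since Ganeti drives DRBD directly (there is no drbd.conf resource to fall back on), one possible path is to let Ganeti tear down and re-assemble the devices itself. This is a sketch only, not a verified fix for this cluster: the commands are standard gnt-instance subcommands, but whether a re-activation clears the WFConnection/StandAlone split depends on why the peers disconnected in the first place.

```shell
# Run on the master node. "instance-name" is the instance from the
# failed conversion above.
gnt-instance deactivate-disks instance-name   # tear down both halves of the pair
gnt-instance activate-disks instance-name     # re-create and reconnect the DRBD devices
cat /proc/drbd                                # look for cs:Connected or cs:SyncTarget
```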

> /proc/drbd on the primary:
>
> version: 8.3.7 (api:88/proto:86-91)
> srcversion: EE47D8BF18AC166BE219757
> 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Inconsistent C r----
> ns:0 nr:0 dw:0 dr:984 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b
> oos:10485760
>
> ...and on the secondary:
>
> version: 8.3.7 (api:88/proto:86-91)
> srcversion: EE47D8BF18AC166BE219757
> 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b
> oos:10485760
>
> Clearly the secondary needs to be told to sync with the primary, but I
> don't know how to do that with no resources defined. Also, the output from
> the conversion indicates that the disks are in sync (which I don't believe).

The message means that the sync has finished, but not necessarily
successfully.

Have you run "gnt-cluster verify", and does it finish OK?

iustin

Keith Edmunds

Jan 30, 2011, 3:56:46 AM1/30/11
to gan...@googlegroups.com
> Have you run "gnt-cluster verify" and it finishes OK?

Yes:
# gnt-cluster verify
Sun Jan 30 08:55:23 2011 * Verifying global settings
Sun Jan 30 08:55:23 2011 * Gathering data (2 nodes)
Sun Jan 30 08:55:24 2011 * Verifying node status
Sun Jan 30 08:55:24 2011 * Verifying instance status
Sun Jan 30 08:55:24 2011 * Verifying orphan volumes
Sun Jan 30 08:55:24 2011 * Verifying orphan instances
Sun Jan 30 08:55:24 2011 * Verifying N+1 Memory redundancy
Sun Jan 30 08:55:24 2011 * Other Notes
Sun Jan 30 08:55:24 2011 - NOTICE: 6 non-redundant instance(s) found.
Sun Jan 30 08:55:24 2011 * Hooks Results

I'll set up a manual DRBD resource and see if that works.
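For reference, a minimal stand-alone resource stanza for DRBD 8.3 might look like the following. All names, devices and addresses here are placeholders for manual testing, not values from this cluster, and such a hand-written resource is independent of the devices Ganeti manages:

```
# Hypothetical /etc/drbd.d/r0.res for a manual test resource
resource r0 {
  protocol C;
  on node1 {
    device    /dev/drbd1;
    disk      /dev/xenvg/test-disk;
    address   192.0.2.1:7790;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd1;
    disk      /dev/xenvg/test-disk;
    address   192.0.2.2:7790;
    meta-disk internal;
  }
}
```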

Thanks for the help.

Keith

Iustin Pop

Jan 30, 2011, 7:32:44 AM1/30/11
to gan...@googlegroups.com

Hmm. My guess was that the DRBD helper is not correctly set, but cluster
verify should have warned about that. In any case, what is your current
usermode_helper?

You should also look at dmesg, usually it gives the reason why the peers
have disconnected.

iustin

Keith Edmunds

Jan 30, 2011, 8:12:52 AM1/30/11
to gan...@googlegroups.com
> In any case, what is your current usermode_helper?

$ grep drbd /etc/modules
drbd minor_count=128 usermode_helper=/bin/true

...BUT: thanks for the clue. I looked in dmesg and saw:

On master:
[ 842.689988] block drbd0: meta connection shut down by peer.

On secondary:
[10815.644569] block drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0
[10815.645210] block drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0 exit code 10 (0xa00)

...which shows that the usermode_helper option had not taken effect:
DRBD was calling /sbin/drbdadm instead of /bin/true. Further
experimentation showed that the drbd module was being loaded by udev (or
at least before /etc/modules is read), so I created an
/etc/modprobe.d/local file and put the options in there.
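For anyone hitting the same thing, the fix amounts to making the module options apply no matter when (or by what) the module is loaded. The option values below are the ones from this thread; the filename itself is arbitrary:

```
# /etc/modprobe.d/local
options drbd minor_count=128 usermode_helper=/bin/true
```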

That seems to have partly fixed the problem: synchronisation now
definitely starts, but it still fails with read errors. I'll look into
that to see if I can resolve it.

Thanks for your help, and hopefully the detail above will help anyone else
in a similar position.

Regards,
Keith
