Initial DIR & MRC Replication initialization


Ivan Postnikov

Jun 29, 2018, 7:29:33 AM
to XtreemFS
Hello..

I'm trying to build a distributed environment.

3 servers.
DIR, MRC, OSD on all of them.

Starting with a single DIR, I'm trying to get MRC replication working.
I'm using the unstable 1.6 branch, since it was mentioned on this group that it has many replication glitches fixed.

server-repl-plugin/mrc.properties is the same on all machines:

plugin.jar = /usr/share/java/babudb-replication-plugin.ja

rbabudb.repl.participant.0 = testfs01e
babudb.repl.participant.0.port = 35676
babudb.repl.participant.1 = testfs01h
babudb.repl.participant.1.port = 35676
babudb.repl.participant.2 = testfs02h
babudb.repl.participant.2.port = 35676

But the MRC fails during the initial handshake.
Below is the most interesting part of the debug log (level 7, component = replication), just after the MSG_PREPARE, MSG_PREPARE_ACK, MSG_ACCEPT, MSG_ACCEPT_ACK exchange:

[ D | FleaseMessageSender  | FleaseSt        |  44 | 2018-06-29 14:16:04.031 ] Sending 'FleaseMessage ( type=MSG_LEARN cell=replication v=0 b=(1530270962956;-1599385233) lease=2a02:6b8:b010:20:76d4:35ff:fecb:bc7c:35676/1530271023969(Fri Jun 29 14:17:03 MSK 2018) prevb=(0;0) ts=1530270964052(Fri Jun 29 14:16:04 MSK 2018) addr=testfs01h/2a02:6b8:b010:20:76d4:35ff:fecb:bc7c:35676 mepoch=-1)' from 'testfs01h/2a02:6b8:b010:20:76d4:35ff:fecb:bc7c:35676' to 'testfs01e/2a02:6b8:0:1626:225:90ff:fec9:4e44:35676' ...
[ D | RPCClientRequest     | BabuDB Trans... |  38 | 2018-06-29 14:16:04.031 ] send buffer #1: ReusableBuffer( capacity=8192 limit=73 position=0)
[ D | RPCClientRequest     | BabuDB Trans... |  38 | 2018-06-29 14:16:04.031 ] send buffer #2: ReusableBuffer( capacity=122 limit=122 position=0)
[ D | FleaseHolder         | FleaseSt        |  44 | 2018-06-29 14:16:04.031 ] Received new Lease (replication: 2a02:6b8:b010:20:76d4:35ff:fecb:bc7c:35676/1530271023969).
[ D | RPCClientRequest     | BabuDB Trans... |  38 | 2018-06-29 14:16:04.031 ] sending record marker: 30/43/122
[ E | FleaseStage          | FleaseSt        |  44 | 2018-06-29 14:16:04.032 ] service ***CRASHED***, shutting down
[ D | RPCClientRequest     | BabuDB Trans... |  38 | 2018-06-29 14:16:04.032 ] send buffer #1: ReusableBuffer( capacity=8192 limit=73 position=0)
[ E | FleaseStage          | FleaseSt        |  44 | 2018-06-29 14:16:04.032 ] java.lang.Exception: java.lang.AssertionError
[ D | RPCClientRequest     | BabuDB Trans... |  38 | 2018-06-29 14:16:04.032 ] send buffer #2: ReusableBuffer( capacity=122 limit=122 position=0)
 ...                                           org.xtreemfs.foundation.flease.proposer.FleaseProposer.processMessage(FleaseProposer.java:424)
 ...                                           org.xtreemfs.foundation.flease.FleaseStage.run(FleaseStage.java:423)
[ E | FleaseStage          | FleaseSt        |  44 | 2018-06-29 14:16:04.033 ] root cause: java.lang.AssertionError
 ...                                           org.xtreemfs.babudb.replication.control.FleaseHolder.getAddress(FleaseHolder.java:144)
 ...                                           org.xtreemfs.babudb.replication.control.FleaseHolder.statusChanged(FleaseHolder.java:119)
 ...                                           org.xtreemfs.foundation.flease.FleaseStage.learnedEvent(FleaseStage.java:314)
 ...                                           org.xtreemfs.foundation.flease.acceptor.FleaseAcceptor.handleLEARN(FleaseAcceptor.java:268)
 ...                                           org.xtreemfs.foundation.flease.proposer.FleaseProposer.learn(FleaseProposer.java:1066)
 ...                                           org.xtreemfs.foundation.flease.proposer.FleaseProposer.processAcceptResponse(FleaseProposer.java:985)
 ...                                           org.xtreemfs.foundation.flease.proposer.FleaseProposer.processMessage(FleaseProposer.java:413)
 ...                                           org.xtreemfs.foundation.flease.FleaseStage.run(FleaseStage.java:423)


What other information would be useful?
Does anyone have a clue where to dig?


--
Regards,
Ivan Postnikov

Robert Schmidtke

Jul 2, 2018, 10:59:47 AM
to XtreemFS
Hi Ivan,

thanks for your message. Let me begin by pointing out that MRC and DIR replication still need improvement, even in the 1.6 unstable branch, as we know there are issues during failover. We currently do not recommend using MRC and DIR replication in production environments.
That being said, here are a couple of things you could do to help address the issue:

- Check the relevant User Guide Section: http://www.xtreemfs.org/xtfs-guide-1.5.1/index.html#tth_sEc6.3
- Check the provided example configuration files (mrc-test1.properties, mrc-test2.properties, mrc-test3.properties): https://github.com/xtreemfs/xtreemfs/tree/master/contrib/server-repl-plugin/src/main/resources/config
    - Specifically check the order of entries in the configuration files. They are NOT the same.
    - Check spelling and formatting. At least in your message, there might be a line break after babudb-replication-plugin.ja (note the missing r, which appears at the start of the next line). I assume this is a copy-paste issue, though.
- Post your entire log files of DIR and all MRCs involved. I would assume INFO level is sufficient here. Be sure to anonymize them properly, or send them to me privately -- find my corporate e-mail address here: http://www.zib.de/members/schmidtke
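
The spelling check above can be automated with a small script. The following is only a minimal sketch in Python, not part of XtreemFS or BabuDB: the function names are hypothetical, the parser handles only simple `key = value` lines with `#` comments, and the expected key pattern is an assumption taken from the keys quoted in this thread, not a complete list of valid options.

```python
import re

# Assumption: participant keys look like the ones quoted in this thread.
PARTICIPANT_KEY = re.compile(r"^babudb\.repl\.participant\.\d+(\.port)?$")


def parse_properties(text):
    """Yield (line number, key, value, has_separator) for non-empty lines."""
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        yield lineno, key.strip(), value.strip(), bool(sep)


def check_config(text):
    """Return a list of human-readable problems found in the config text."""
    problems = []
    for lineno, key, value, has_sep in parse_properties(text):
        if not has_sep:
            problems.append(f"line {lineno}: no '=' found")
        elif key == "plugin.jar" and not value.endswith(".jar"):
            problems.append(f"line {lineno}: plugin.jar does not end in .jar")
        elif "participant" in key and not PARTICIPANT_KEY.match(key):
            problems.append(f"line {lineno}: unexpected key {key!r}")
    return problems


# The first lines of the configuration exactly as posted in this thread.
SAMPLE = """plugin.jar = /usr/share/java/babudb-replication-plugin.ja

rbabudb.repl.participant.0 = testfs01e
babudb.repl.participant.0.port = 35676
"""

for problem in check_config(SAMPLE):
    print(problem)
```

Run against the configuration as posted, this flags both the truncated jar path and the key with the stray leading r. It checks spelling only; it says nothing about whether the order of entries is right for each server.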

I hope this helps, please let us know how everything turns out.

Cheers
Robert