Are there special configuration flags for Elliptics to check all nodes for data?

Kirill Bushminkin

Sep 29, 2016, 5:17:24 PM
to reverbrain
We have a situation where data is misplaced (lying on the wrong nodes). When a read operation is performed, only two nodes in two groups are actually checked for data (I assume these nodes are chosen by the route tables), and we get a -2 error. Is there a special configuration so that Elliptics will try other nodes?

Evgeniy Polyakov

Sep 29, 2016, 6:11:28 PM
to Kirill Bushminkin, reverbrain
Hi Kirill

30.09.2016, 00:17, "Kirill Bushminkin" <kbush...@gmail.com>:
> We have a situation where data is misplaced (lying on the wrong nodes). When a read operation is performed, only two nodes in two groups are actually checked for data (I assume these nodes are chosen by the route tables), and we get a -2 error. Is there a special configuration so that Elliptics will try other nodes?

There is no configuration flag per se, but there is an IO flag for a command which forces the given request
not to be routed according to its ID, but instead to be sent directly to the specified node.
You will have to write a helper function which iterates over all the nodes you want to check and sends a request
to each of them with the direct bit set. The C++ bindings have the session::set_direct_id() set of methods, which accept
either the address of the node or address+backend.
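
The helper described above could look roughly like the sketch below. It is deliberately library-free so it compiles on its own: `target`, `direct_read_fn` and `find_key_on_targets` are hypothetical names, not part of elliptics, and the callback stands in for whatever performs one direct read (in a real client, presumably a session with set_direct_id() applied, followed by a read).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one place to ask: a node address plus a backend.
struct target {
    std::string address;  // node address, format left to the caller
    int backend_id;       // backend on that node
};

// Hypothetical callback performing ONE direct read against a target.
// It returns 0 on success or a negative errno-style code on failure.
using direct_read_fn =
    std::function<int(const target &, const std::string &key)>;

// Ask every target in turn and collect the ones that hold the key.
std::vector<target> find_key_on_targets(const std::vector<target> &targets,
                                        const std::string &key,
                                        const direct_read_fn &read_direct) {
    assert(static_cast<bool>(read_direct));  // a callback must be supplied
    std::vector<target> holders;
    for (const auto &t : targets) {
        // Each iteration is a separate request to exactly one node/backend.
        if (read_direct(t, key) == 0)
            holders.push_back(t);
    }
    return holders;
}
```

Keeping the network call behind a callback means the iteration logic can be exercised without a running cluster; only the callback needs to know about sessions and direct IDs.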

Kirill Bushminkin

Sep 29, 2016, 6:13:54 PM
to reverbrain, kbush...@gmail.com, z...@ioremap.net
Hi Evgeniy,
  Thank you for the prompt reply.
  But by default, will it route the request to all nodes or only to the matching one?

K

On Friday, September 30, 2016 at 1:11:28 UTC+3, Evgeniy Polyakov wrote:

Evgeniy Polyakov

Sep 29, 2016, 6:17:02 PM
to Kirill Bushminkin, reverbrain
30.09.2016, 01:13, "Kirill Bushminkin" <kbush...@gmail.com>:
> Hi Evgeniy,
>   Thank you for the prompt reply.
>   But by default, will it route the request to all nodes or only to the matching one?

Any request will be sent to only one node, no matter whether it has the direct IO flag or not.

In some cases the elliptics client library can perform additional steps (like checking other groups) transparently for the caller,
but there are no API calls which would accept a list of remote addresses and check them one after another.

Maybe it would actually be a good helper method to have.

Kirill Bushminkin

Sep 29, 2016, 6:22:14 PM
to reverbrain, kbush...@gmail.com, z...@ioremap.net
Thank you for the clarification. I will follow your advice. All this time I thought that in case of a read failure it would try to read the data from other nodes.


K

On Friday, September 30, 2016 at 1:17:02 UTC+3, Evgeniy Polyakov wrote:

Evgeniy Polyakov

Sep 29, 2016, 6:29:49 PM
to Kirill Bushminkin, reverbrain


30.09.2016, 01:22, "Kirill Bushminkin" <kbush...@gmail.com>:
> Thank you for the clarification. I will follow your advice. All this time I thought that in case of a read failure it would try to read the data from other nodes.

This is indeed implemented in the elliptics client library: if you use read_data() and similar methods,
the elliptics client will check groups one after another if a read fails. This is not a single request or transaction, though;
a new command is allocated and sent every time. These methods use the route table and group numbers
to find the nodes which are supposed to host your key.
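
The group-by-group behaviour described above can be modelled with a short, library-free sketch. This is not the elliptics implementation: `fallback_result`, `read_with_group_fallback` and the `read_from_group` callback are hypothetical names, and returning -ENOENT for an empty group list is an assumption made only so the function always has a status to report.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <vector>

// Outcome of a fallback read: which group answered (-1 if none), the final
// errno-style status, and how many separate commands were sent.
struct fallback_result {
    int group;
    int status;
    int commands_sent;
};

// read_from_group is a hypothetical callback that sends ONE read command to
// the node selected for the key in the given group and returns 0 on success
// or a negative errno-style code.
fallback_result read_with_group_fallback(
        const std::vector<int> &groups,
        const std::function<int(int)> &read_from_group) {
    assert(static_cast<bool>(read_from_group));
    fallback_result result{-1, -ENOENT, 0};  // assumed default: "not found"
    for (int group : groups) {
        ++result.commands_sent;               // a new command per attempt
        result.status = read_from_group(group);
        if (result.status == 0) {
            result.group = group;             // first successful group wins
            break;
        }
    }
    return result;
}
```

Note that only one node per group is ever asked here, which matches the situation in the original question: a key lying on a node the route table does not point at is never reached by this kind of fallback.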

If you want to check nodes not according to the route table, you have to use the direct IO flag, but essentially
perform the same steps as the read_data() family of methods.

I even think that using the read_handler class as a base and overriding the send_to_next_group() method should be enough.
Please share your work if you extend it this way; I believe it would be a good addition.

Kirill Bushminkin

Oct 7, 2016, 10:25:54 AM
to reverbrain, kbush...@gmail.com, z...@ioremap.net
Hi Evgeniy,
  I ran into weird behavior. Maybe you can explain it.
  I implemented code to query a node with the direct flag (via set_direct_id in the session class), and it seems to work. But in one of my test cases it gives a strange result.
Scenario:
1) Run 1 node in group 1 and 1 node in group 2
2) Write some data
3) Check with the DIRECT flag: the key is on both nodes
4) Turn the node from group 2 into a node in group 1 (now we have two nodes in group 1)
5) Run the read test again:
Expected: to find the key on all nodes, as in the previous test
Got: the key was found on the first node, which was in group 1 all the time, but not on the second node. And I got a weird log with a -95 error code:
2016-10-07 17:14:36.414251 0000000000000000/6605/6569 DEBUG: 1:1fc74bde7bfc33f3...0000000000000023: READ: received trans: 2 <- 127.0.0.1:42192/0: size: 208, cflags: 0x9 [need_ack|direct], status: 0. 
2016-10-07 17:14:36.414387 0000000000000000/6605/6569 DEBUG: 127.0.0.1:42192: 1:1fc74bde7bfc33f3...0000000000000023: RECV cmd: READ: cmd-size: 208, nonblocking: 0 
2016-10-07 17:14:36.414481 0000000000000000/6605/6569 DEBUG: 127.0.0.1:42192: 1:1fc74bde7bfc33f3...0000000000000023: backend_id: -1, place: 0xebcec0, backend_place: (nil), backend_place->pool->backend_id: -1, cmd->backend_id: -1 
2016-10-07 17:14:36.414609 0000000000000000/6577/6569 DEBUG: 127.0.0.1:42192: 1:1fc74bde7bfc33f3...0000000000000023: got IO event: 0x7f6188001640: cmd: READ, hsize: 120, dsize: 208, mode: BLOCKING, backend_id: -1 
2016-10-07 17:14:36.414710 0000000000000000/6577/6569 INFO: 1:1fc74bde7bfc33f3...0000000000000023: READ: client: 127.0.0.1:42192, trans: 2, cflags: 0x9 [need_ack|direct], io-flags: 0x0 [], io-offset: 0, io-size: 0/0, io-user-flags: 0x0, io-num: 0, ts: '1970-01-01 03:00:00.000000', time: 0 usecs, err: -95. 
2016-10-07 17:14:36.414802 0000000000000000/6577/6569 NOTICE: 1:1fc74bde7bfc33f3...0000000000000023: READ: ack trans: 2 -> 127.0.0.1:42192: cflags: 0x208 [direct|reply], status: -95. 
2016-10-07 17:14:36.414889 0000000000000000/6577/6569 DEBUG: Incrementing counter: 34, err: 0, value is: 5 0. 
2016-10-07 17:14:36.414907 0000000000000000/6605/6569 INFO: 1:1fc74bde7bfc33f3...0000000000000023: READ: sending trans: 2 -> 127.0.0.1:42192/-1: size: 0, cflags: 0x208 [direct|reply], start-sent: 0/120 
2016-10-07 17:14:36.414993 0000000000000000/6577/6569 DEBUG: 127.0.0.1:42192: 1:1fc74bde7bfc33f3...0000000000000023: processed IO event: 0x7f6188001640, cmd: READ 
2016-10-07 17:14:36.415127 0000000000000000/6605/6569 INFO: 1:1fc74bde7bfc33f3...0000000000000023: READ: sending trans: 2 -> 127.0.0.1:42192/-1: size: 0, cflags: 0x208 [direct|reply], finish-sent: 120/120 
2016-10-07 17:14:36.415506 0000000000000000/6605/6569 DEBUG: 2:1fc74bde7bfc33f3...0000000000000023: READ: received trans: 3 <- 127.0.0.1:42192/0: size: 208, cflags: 0x9 [need_ack|direct], status: 0. 
2016-10-07 17:14:36.415622 0000000000000000/6605/6569 DEBUG: 127.0.0.1:42192: 2:1fc74bde7bfc33f3...0000000000000023: RECV cmd: READ: cmd-size: 208, nonblocking: 0 
2016-10-07 17:14:36.415692 0000000000000000/6605/6569 DEBUG: 127.0.0.1:42192: 2:1fc74bde7bfc33f3...0000000000000023: backend_id: -1, place: 0xebcec0, backend_place: (nil), backend_place->pool->backend_id: -1, cmd->backend_id: -1 
2016-10-07 17:14:36.415799 0000000000000000/6575/6569 DEBUG: 127.0.0.1:42192: 2:1fc74bde7bfc33f3...0000000000000023: got IO event: 0x7f6188001600: cmd: READ, hsize: 120, dsize: 208, mode: BLOCKING, backend_id: -1 
2016-10-07 17:14:36.415929 0000000000000000/6575/6569 INFO: 2:1fc74bde7bfc33f3...0000000000000023: READ: client: 127.0.0.1:42192, trans: 3, cflags: 0x9 [need_ack|direct], io-flags: 0x0 [], io-offset: 0, io-size: 0/0, io-user-flags: 0x0, io-num: 0, ts: '1970-01-01 03:00:00.000000', time: 1 usecs, err: -95. 
2016-10-07 17:14:36.416100 0000000000000000/6575/6569 NOTICE: 2:1fc74bde7bfc33f3...0000000000000023: READ: ack trans: 3 -> 127.0.0.1:42192: cflags: 0x208 [direct|reply], status: -95. 
2016-10-07 17:14:36.416227 0000000000000000/6575/6569 DEBUG: Incrementing counter: 34, err: 0, value is: 6 0. 
2016-10-07 17:14:36.416257 0000000000000000/6605/6569 INFO: 2:1fc74bde7bfc33f3...0000000000000023: READ: sending trans: 3 -> 127.0.0.1:42192/-1: size: 0, cflags: 0x208 [direct|reply], start-sent: 0/120 
2016-10-07 17:14:36.416324 0000000000000000/6575/6569 DEBUG: 127.0.0.1:42192: 2:1fc74bde7bfc33f3...0000000000000023: processed IO event: 0x7f6188001600, cmd: READ 
2016-10-07 17:14:36.416422 0000000000000000/6605/6569 INFO: 2:1fc74bde7bfc33f3...0000000000000023: READ: sending trans: 3 -> 127.0.0.1:42192/-1: size: 0, cflags: 0x208 [direct|reply], finish-sent: 120/120

If I turn the first node off, it finds the key on the second node.
And it finds the key on both if I put the second node back into group 2.

Any suggestions?

K

On Friday, September 30, 2016 at 1:29:49 UTC+3, Evgeniy Polyakov wrote:

Evgeniy Polyakov

Oct 7, 2016, 2:14:55 PM
to Kirill Bushminkin, reverbrain
Hi Kirill

07.10.2016, 17:27, "Kirill Bushminkin" <kbush...@gmail.com>:
>   I ran into weird behavior. Maybe you can explain it.
>   I implemented code to query a node with the direct flag (via set_direct_id in the session class), and it seems to work. But in one of my test cases it gives a strange result.
> Scenario:
> 1) Run 1 node in group 1 and 1 node in group 2
> 2) Write some data
> 3) Check with the DIRECT flag: the key is on both nodes

Do you specify a backend ID when setting the direct ID?

> 4) Turn the node from group 2 into a node in group 1 (now we have two nodes in group 1)
> 5) Run the read test again:
> Expected: to find the key on all nodes, as in the previous test
> Got: the key was found on the first node, which was in group 1 all the time, but not on the second node. And I got a weird log with a -95 error code:

Looks like your first test didn't use the direct flag and you just read the data the usual way, and the second time you didn't specify the backend ID.
-95 means "operation not supported"; the generic command processor returns this error when it receives an unknown command.
Every IO command (like read or write) must carry the backend ID where the command is to be processed, and you have -1 (the default) in the log.

I've attached a simple program which connects to a remote server and reads data from every backend in the specified group.
Basically, it reads the current route table, finds every server and every backend in it which joined the required group, and then
asks them one after another whether they have the required key.

$ g++ direct.cpp -o direct -lelliptics_cpp -lelliptics_client --std=c++0x -W -Wall -lboost_thread
$ ./direct
key: test_key, found at: 127.0.0.1:1026, backend: 1, io: io-flags: 0x0 [], io-offset: 0, io-size: 161064/161064, io-user-flags: 0x0, io-num: 0, ts: '2016-10-07 20:55:04.000000'
key: test_key, found at: 127.0.0.1:1026, backend: 2, io: io-flags: 0x0 [], io-offset: 0, io-size: 161064/161064, io-user-flags: 0x0, io-num: 0, ts: '2016-10-07 20:55:04.000000'
direct.cpp
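
The attachment itself is not reproduced in this thread. As a rough, library-free illustration of the selection step it is described as performing (route table in, distinct server/backend pairs for one group out), a sketch could look like this. `route_entry`, `backend_ref` and `backends_in_group` are hypothetical names; the real elliptics route structures differ, and the sketch assumes that the same server/backend pair may appear in the route table more than once.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified route-table row.
struct route_entry {
    std::string address;  // server address
    int group_id;         // group the backend has joined
    int backend_id;       // backend on that server
};

using backend_ref = std::pair<std::string, int>;  // (address, backend_id)

// Collect each distinct (address, backend) pair that belongs to group_id,
// preserving the order of first appearance in the route table.
std::vector<backend_ref> backends_in_group(
        const std::vector<route_entry> &routes, int group_id) {
    std::set<backend_ref> seen;
    std::vector<backend_ref> result;
    for (const auto &r : routes) {
        if (r.group_id != group_id)
            continue;
        backend_ref ref(r.address, r.backend_id);
        if (seen.insert(ref).second)  // true only for the first occurrence
            result.push_back(ref);
    }
    assert(result.size() == seen.size());  // every pair recorded exactly once
    return result;
}
```

Each returned pair would then be queried on its own with a direct request, one after another, which is what produces one "found at" line per backend in the output shown in the message.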

Kirill Bushminkin

Oct 7, 2016, 2:53:39 PM
to Evgeniy Polyakov, reverbrain
Evgeniy, thank you for this!
I used s.set_direct_id(ip); I thought that if no backend is specified, it would try all backends on the node.
Also, I passed an array with both groups to s.set_groups({1,2}); is that valid, or should I specify a single group every time?

Many thanks!
K

Evgeniy Polyakov

Oct 7, 2016, 4:44:39 PM
to Kirill Bushminkin, reverbrain

07.10.2016, 21:53, "Kirill Bushminkin" <kbush...@gmail.com>:
> I used s.set_direct_id(ip); I thought that if no backend is specified, it would try all backends on the node.

No, the server will only process a command with precisely specified credentials such as the backend ID.
If there is no backend, the command is scheduled to be processed by the common server pool,
which in turn doesn't accept IO commands; it is used for commands like checking status and so on.
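
To make the rule concrete, here is a toy model of that dispatch decision. It is not the server's code: `command_kind`, `command`, `is_io_command` and `dispatch` are hypothetical names, the command set is reduced to three kinds, and the only behaviour modelled is the one described in this thread (no backend means the common pool, and the common pool rejects IO with "operation not supported").

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical, reduced set of command kinds.
enum class command_kind { status, read, write };

struct command {
    command_kind kind;
    int backend_id;  // -1 means "no backend specified" (the default)
};

// Reads and writes are the IO commands in this toy model.
bool is_io_command(command_kind kind) {
    return kind == command_kind::read || kind == command_kind::write;
}

// Returns 0 if the command would be accepted, or a negative errno-style code.
int dispatch(const command &cmd) {
    if (cmd.backend_id < 0) {
        // No backend given: the command lands in the common pool, which
        // handles status-like commands but does not accept IO.
        return is_io_command(cmd.kind) ? -EOPNOTSUPP : 0;
    }
    // A concrete backend was named, so it gets to process the command.
    return 0;
}
```

Under this model a READ sent with set_direct_id(address) alone, and therefore backend -1, is rejected regardless of whether the key exists on the node, while the same READ with an explicit backend reaches the storage.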

> Also, I passed an array with both groups to s.set_groups({1,2}); is that valid, or should I specify a single group every time?

Actually, direct command execution doesn't require a group specification, since you are precisely specifying
the server and backend where the command has to be processed. Group and key are used when you do not use
direct processing, and thus the elliptics client has to find the server/backend according to the current route table.

But the elliptics session code performs some basic command checks before sending a command to the remote server; in particular, it checks whether the session
has groups or not. When you are using a direct command, you can specify any group, even one that doesn't exist
in the route table (like 123456).