[ovs-discuss] [IDL] unhandled/unexpected OVSDB reply to set_db_change_aware method

77 views
Skip to first unread message

Girish Moodalbail

unread,
Jul 7, 2020, 2:57:28 AM7/7/20
to ovs-discuss, ovn-kub...@googlegroups.com, Han Zhou, Dumitru Ceara, Numan Siddique
Hello all,

In the OVN Kubernetes project, we are seeing a ton of following errors in ovsdb-server-nb.log and ovsdb-server-sb.log.

---------8<------------8<------
2020-07-07T06:30:53.320Z|20733|jsonrpc|WARN|unix#3145: send error: Broken pipe
2020-07-07T06:30:53.320Z|20734|reconnect|WARN|unix#3145: connection dropped (Broken pipe)
---------8<------------8<------

(Note: The N in uniX#N used for unbound unix sockets is monotonically increasing number. So, with unix#3145 there are more than 3000 such errors in the log file)

I am going to be vague in explaining why it might be happening, so sincere apologies for that. So, this is what we are seeing and how to reproduce the issue.

1. Unhandled reply case
========================

(a) Client
~~~~~~~~~~~
$  ovn-sbctl -vjsonrpc -vstream --no-leader-only --db=unix:/var/run/openvswitch/ovnsb_db.sock get SB_Global . options:e2e_timestamp
<output snipped>
2020-07-07T04:54:22Z|00006|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.sock: send request, method="set_db_change_aware", params=[true], id=3
<output snipped>
"1594097605"  (<-- returned value of e2e_timestamp)

(b) Server
~~~~~~~~~~~
<output snipped>
2020-07-07T06:30:53.320Z|20731|jsonrpc|DBG|unix#3145: received request, method="set_db_change_aware", params=[true], id=3
2020-07-07T06:30:53.320Z|20732|jsonrpc|DBG|unix#3145: send reply, result={}, id=3
2020-07-07T06:30:53.320Z|20733|jsonrpc|WARN|unix#3145: send error: Broken pipe
2020-07-07T06:30:53.320Z|20734|reconnect|WARN|unix#3145: connection dropped (Broken pipe)
<output snipped>

So, the OVSDB server did try to send the reply. However, before it could send it through stream-fd.c`fd_send() the client program exits and closed the Unix Socket and we see broken pipe on the sender

2. Unexpected reply case
=========================
(a) Client
~~~~~~~~~~~
$ ovn-sbctl -v --no-leader-only --db=unix:/var/run/openvswitch/ovnsb_db.sock get datapath_binding 3b735851-db34-489a-a54c-8bb947431a74 external_ids:logical-switch

<output snipped>
2020-07-07T06:43:18Z|00021|ovsdb_idl|DBG|unix:/var/run/openvswitch/ovnsb_db.sock: DATA_MONITOR_COND_SINCE_REQUESTED -> MONITORING at lib/ovsdb-idl.c:764
2020-07-07T06:43:18Z|00022|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.sock: received reply, result={}, id=3
2020-07-07T06:43:18Z|00023|ovsdb_idl|DBG|unix:/var/run/openvswitch/ovnsb_db.sock: received unexpected reply message: {"error":null,"id":3,"result":{}}
<output snipped>


(b) Server
~~~~~~~~~~~
<output snipped>
2020-07-07T06:43:18.402Z|21691|jsonrpc|DBG|unix#3281: received request, method="set_db_change_aware", params=[true], id=3
2020-07-07T06:43:18.402Z|21692|jsonrpc|DBG|unix#3281: send reply, result={}, id=3
<output snipped>

So, in the 2nd case the Server's reply makes it all the way to the Client, but the client is not expecting it.

So, looking at the lib/ovsdb-idl.c's ovsdb_idl_process_response()....

ovsdb_idl_process_response() {

    case IDL_S_SERVER_MONITOR_COND_REQUESTED:
        if (ok) {
            idl->server.monitoring = OVSDB_IDL_MONITORING_COND;
            ovsdb_idl_db_parse_monitor_reply(&idl->server, msg->result,
                                             OVSDB_IDL_MM_MONITOR_COND);
            if (ovsdb_idl_check_server_db(idl)) {
                ovsdb_idl_send_db_change_aware(idl);
            }
        } else {

}

... I see that we call the OVSDB set_db_change_aware method through ovsdb_idl_send_db_change_aware(), but there is no state machine to process the reply for it. In the 1st case, the client exits early and connection is broken? Whilst, in the 2nd case the reply comes through but the code is not expecting it.

Regards,
~Girish

Dan Williams

unread,
Jul 9, 2020, 9:40:18 AM7/9/20
to Girish Moodalbail, ovs-discuss, ovn-kub...@googlegroups.com, Han Zhou, Dumitru Ceara, Numan Siddique
I guess we're not seeing this at Red Hat because we're using
monitor_all still... anyone have thoughts about this issue?

Dan

Dumitru Ceara

unread,
Jul 9, 2020, 3:02:50 PM7/9/20
to Dan Williams, Girish Moodalbail, ovs-discuss, ovn-kub...@googlegroups.com, Han Zhou, Numan Siddique
Hi Girish, Dan,

A bit as a side note, I wonder if there's actually a benefit from using
"set_db_change_aware" on clients that use "short lived" connections to
the DB. It does make sense for long lived connections though.

The problem here is that we always call ovsdb_idl_send_db_change_aware()
after monitor_cond_since(data). And the replies will also arrive in the
same order:
- first monitor_cond_since reply
- then set_db_change_aware reply

If the second one is delayed (e.g. network issues), the IDL might
process the monitor_cond_since(data) reply and the client (ovn-sbctl
will close the connection).

After a quick glance at the code I think it should be safe to first do
set_db_change_aware and afterwards monitor_cond_since(data).
For case 2, it would be possible to handle the set_db_change_aware
result. As far as I see though the "unexpected reply message" DBG log in
the current implementation is quite benign. And the handler would
basically just ignore the reply because the reply is always the same:

https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb-server.7.rst#4118-database-change-awareness

>
> I guess we're not seeing this at Red Hat because we're using
> monitor_all still... anyone have thoughts about this issue?
>
> Dan
>

For the first case (ovn-sbctl closing the connection too early), we
process messages from the server in batches [0] so we'd only hit the
issue if the reply for set_db_change_aware is delayed and is not
processed in the same batch as the reply for monitor_cond_since. Maybe
that's why we don't hit the issue too often.

Regards,
Dumitru

[0] https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L923

Dumitru Ceara

unread,
Jul 10, 2020, 7:39:43 AM7/10/20
to Dan Williams, Girish Moodalbail, ovs-discuss, ovn-kub...@googlegroups.com, Han Zhou, Numan Siddique, Ben Pfaff, Ilya Maximets
I went ahead and sent a patch for this operation order change:

https://patchwork.ozlabs.org/project/openvswitch/patch/1594380801-32134-1-g...@redhat.com/

I tested it locally and it works fine. I had to simulate a delay in
processing the set_db_change_aware request on the server side such that
the IDL client (ovn-sbctl) closes the connection before the server has a
chance to reply to set_db_change_aware. With the patch applied, the
server will always send the reply to set_db_change_aware before sending
the reply to monitor_cond_since(data).

I couldn't find a way to add a unit test for this scenario though. Ideas
are welcome.

Thanks,
Dumitru

Reply all
Reply to author
Forward
0 new messages