[PATCH] rpc: Fix use after free when connection outlives its server

0 views
Skip to first unread message

Rafael Ávila de Espíndola

<espindola@scylladb.com>
unread,
Feb 12, 2019, 5:20:17 PM2/12/19
to seastar-dev@googlegroups.com, gleb@scylladb.com, Rafael Ávila de Espíndola
It is possible to have multiple connections with the
invalid_connection_id id. When that happens, the server no longer has
a complete list of connections

Without a complete list, the future returned by server::stop might
complete while a connection is still running and

_server._conns.erase(get_connection_id());

in server::connection::process might access free memory.

Signed-off-by: Rafael Ávila de Espíndola <espi...@scylladb.com>
---
include/seastar/rpc/rpc.hh | 12 +++++++++---
src/rpc/rpc.cc | 26 ++++++++++++++++++--------
2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/include/seastar/rpc/rpc.hh b/include/seastar/rpc/rpc.hh
index 77471670..606ce9a1 100644
--- a/include/seastar/rpc/rpc.hh
+++ b/include/seastar/rpc/rpc.hh
@@ -515,7 +515,13 @@ class server {
server_socket _ss;
resource_limits _limits;
rpc_semaphore _resources_available;
- std::unordered_map<connection_id, shared_ptr<connection>> _conns;
+
+ // There can be many connections with invalid_connection_id, so keep both a map and a set.
+ std::unordered_map<connection_id, shared_ptr<connection>> _conns_map;
+ std::unordered_set<shared_ptr<connection>> _all_conns;
+ void add_connection(shared_ptr<connection>);
+ void remove_connection(shared_ptr<connection>);
+
promise<> _ss_stopped;
gate _reply_gate;
server_options _options;
@@ -530,8 +536,8 @@ class server {
future<> stop();
template<typename Func>
void foreach_connection(Func&& f) {
- for (auto c : _conns) {
- f(*c.second);
+ for (auto c : _all_conns) {
+ f(*c);
}
}
gate& reply_gate() {
diff --git a/src/rpc/rpc.cc b/src/rpc/rpc.cc
index 575c8361..0926bfb4 100644
--- a/src/rpc/rpc.cc
+++ b/src/rpc/rpc.cc
@@ -725,6 +725,16 @@ namespace rpc {
{}


+ void server::add_connection(shared_ptr<connection> c) {
+ _conns_map.emplace(c->get_connection_id(), c);
+ _all_conns.insert(c);
+ }
+
+ void server::remove_connection(shared_ptr<connection> c) {
+ _conns_map.erase(c->get_connection_id());
+ _all_conns.erase(c);
+ }
+
future<feature_map>
server::connection::negotiate(feature_map requested) {
feature_map ret;
@@ -751,15 +761,15 @@ namespace rpc {
_parent_id = deserialize_connection_id(e.second);
_is_stream = true;
// remove stream connection from rpc connection list
- _server._conns.erase(get_connection_id());
+ _server.remove_connection(shared_from_this());
f = smp::submit_to(_parent_id.shard(), [this, c = make_foreign(static_pointer_cast<rpc::connection>(shared_from_this()))] () mutable {
auto sit = _servers.find(*_server._options.streaming_domain);
if (sit == _servers.end()) {
throw std::logic_error(format("Shard {:d} does not have server with streaming domain {:x}", engine().cpu_id(), *_server._options.streaming_domain).c_str());
}
auto s = sit->second;
- auto it = s->_conns.find(_parent_id);
- if (it == s->_conns.end()) {
+ auto it = s->_conns_map.find(_parent_id);
+ if (it == s->_conns_map.end()) {
throw std::logic_error(format("Unknown parent connection {:d} on shard {:d}", _parent_id, engine().cpu_id()).c_str());
}
auto id = c->get_connection_id();
@@ -908,7 +918,7 @@ namespace rpc {
_stream_queue.abort(std::make_exception_ptr(stream_closed()));
return stop_send_loop().then_wrapped([this] (future<> f) {
f.ignore_ready_future();
- _server._conns.erase(get_connection_id());
+ _server.remove_connection(shared_from_this());
if (is_stream()) {
return deregister_this_stream();
} else {
@@ -935,8 +945,8 @@ namespace rpc {
auto sit = server::_servers.find(*_server._options.streaming_domain);
if (sit != server::_servers.end()) {
auto s = sit->second;
- auto it = s->_conns.find(_parent_id);
- if (it != s->_conns.end()) {
+ auto it = s->_conns_map.find(_parent_id);
+ if (it != s->_conns_map.end()) {
it->second->_streams.erase(get_connection_id());
}
}
@@ -975,7 +985,7 @@ namespace rpc {
id = {_next_client_id++ << 16 | uint16_t(engine().cpu_id())};
}
auto conn = _proto->make_server_connection(*this, std::move(fd), std::move(addr), id);
- _conns.emplace(id, conn);
+ add_connection(conn);
conn->process();
});
}).then_wrapped([this] (future<>&& f){
@@ -995,7 +1005,7 @@ namespace rpc {
_servers.erase(*_options.streaming_domain);
}
return when_all(_ss_stopped.get_future(),
- parallel_for_each(_conns | boost::adaptors::map_values, [] (shared_ptr<connection> conn) {
+ parallel_for_each(_all_conns, [this] (shared_ptr<connection> conn) {
return conn->stop();
}),
_reply_gate.close()
--
2.20.1

Benny Halevy

<bhalevy@scylladb.com>
unread,
Feb 13, 2019, 3:20:03 AM2/13/19
to Rafael Ávila de Espíndola, seastar-dev@googlegroups.com, gleb@scylladb.com
LGTM

Gleb Natapov

<gleb@scylladb.com>
unread,
Feb 13, 2019, 3:32:35 AM2/13/19
to Rafael Ávila de Espíndola, seastar-dev@googlegroups.com
On Tue, Feb 12, 2019 at 02:20:03PM -0800, Rafael Ávila de Espíndola wrote:
> It is possible to have multiple connections with the
> invalid_connection_id id. When that happens, the server no longer has
> a complete list of connections
>
All ids are either invalid or non invalid. The simple solution is to
make invalid ids unique as well. Something like that:


diff --git a/include/seastar/rpc/rpc_types.hh b/include/seastar/rpc/rpc_types.hh
index 666294562b..729e191f5c 100644
--- a/include/seastar/rpc/rpc_types.hh
+++ b/include/seastar/rpc/rpc_types.hh
@@ -227,14 +227,20 @@ struct connection_id {
return id == o.id;
}
operator bool() const {
- return id;
+ return shard() != 0xffff;
}
- size_t shard() {
+ size_t shard() const {
return size_t(id & 0xffff);
}
+ static connection_id make_invalid_id(uint64_t id) {
+ return {id << 16 | 0xffff};
+ }
+ static connection_id make_id(uint64_t id, uint16_t shard) {
+ return {id << 16 | shard};
+ }
};

-constexpr connection_id invalid_connection_id{0};
+const connection_id invalid_connection_id = connection_id::make_invalid_id(0);

std::ostream& operator<<(std::ostream&, const connection_id&);

diff --git a/src/rpc/rpc.cc b/src/rpc/rpc.cc
index 575c8361c9..8251cc0327 100644
--- a/src/rpc/rpc.cc
+++ b/src/rpc/rpc.cc
@@ -970,10 +970,9 @@ namespace rpc {
keep_doing([this] () mutable {
return _ss.accept().then([this] (connected_socket fd, socket_address addr) mutable {
fd.set_nodelay(_options.tcp_nodelay);
- connection_id id = invalid_connection_id;
- if (_options.streaming_domain) {
- id = {_next_client_id++ << 16 | uint16_t(engine().cpu_id())};
- }
+ connection_id id = _options.streaming_domain ?
+ connection_id::make_id(_next_client_id++, uint16_t(engine().cpu_id())) :
+ connection_id::make_invalid_id(_next_client_id++);
auto conn = _proto->make_server_connection(*this, std::move(fd), std::move(addr), id);
_conns.emplace(id, conn);
conn->process();
--
Gleb.

Avi Kivity

<avi@scylladb.com>
unread,
Feb 13, 2019, 3:35:21 AM2/13/19
to Rafael Ávila de Espíndola, seastar-dev@googlegroups.com, gleb@scylladb.com
On 13/02/2019 00.20, Rafael Ávila de Espíndola wrote:
> It is possible to have multiple connections with the
> invalid_connection_id id. When that happens, the server no longer has
> a complete list of connections
>
> Without a complete list, the future returned by server::stop might
> complete while a connection is still running and
>
> _server._conns.erase(get_connection_id());
>
> in server::connection::process might access free memory.


Connections should not outlive the server. But that does not match the
description here. What is going on?


> Signed-off-by: Rafael Ávila de Espíndola <espi...@scylladb.com>
> ---
> include/seastar/rpc/rpc.hh | 12 +++++++++---
> src/rpc/rpc.cc | 26 ++++++++++++++++++--------
> 2 files changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/include/seastar/rpc/rpc.hh b/include/seastar/rpc/rpc.hh
> index 77471670..606ce9a1 100644
> --- a/include/seastar/rpc/rpc.hh
> +++ b/include/seastar/rpc/rpc.hh
> @@ -515,7 +515,13 @@ class server {
> server_socket _ss;
> resource_limits _limits;
> rpc_semaphore _resources_available;
> - std::unordered_map<connection_id, shared_ptr<connection>> _conns;
> +
> + // There can be many connections with invalid_connection_id, so keep both a map and a set.
> + std::unordered_map<connection_id, shared_ptr<connection>> _conns_map;


One of them can a regular pointer, since the lifetime will be bounded.
In fact probably both, but not in scope for this patch.


> + std::unordered_set<shared_ptr<connection>> _all_conns;
> + void add_connection(shared_ptr<connection>);
> + void remove_connection(shared_ptr<connection>);
> +
> promise<> _ss_stopped;
> gate _reply_gate;
> server_options _options;
> @@ -530,8 +536,8 @@ class server {
> future<> stop();
> template<typename Func>
> void foreach_connection(Func&& f) {
> - for (auto c : _conns) {
> - f(*c.second);
> + for (auto c : _all_conns) {
> + f(*c);
> }
> }
> gate& reply_gate() {
> diff --git a/src/rpc/rpc.cc b/src/rpc/rpc.cc
> index 575c8361..0926bfb4 100644
> --- a/src/rpc/rpc.cc
> +++ b/src/rpc/rpc.cc
> @@ -725,6 +725,16 @@ namespace rpc {
> {}
>
>
> + void server::add_connection(shared_ptr<connection> c) {
> + _conns_map.emplace(c->get_connection_id(), c);
> + _all_conns.insert(c);


If the second fails, the first has to be backed out.

Rafael Avila de Espindola

<espindola@scylladb.com>
unread,
Feb 13, 2019, 10:42:50 AM2/13/19
to Gleb Natapov, seastar-dev@googlegroups.com
Gleb Natapov <gl...@scylladb.com> writes:

> On Tue, Feb 12, 2019 at 02:20:03PM -0800, Rafael Ávila de Espíndola wrote:
>> It is possible to have multiple connections with the
>> invalid_connection_id id. When that happens, the server no longer has
>> a complete list of connections
>>
> All ids are either invalid or non invalid. The simple solution is to
> make invalid ids unique as well. Something like that:

The general idea sounds good to me.

> diff --git a/include/seastar/rpc/rpc_types.hh b/include/seastar/rpc/rpc_types.hh
> index 666294562b..729e191f5c 100644
> --- a/include/seastar/rpc/rpc_types.hh
> +++ b/include/seastar/rpc/rpc_types.hh
> @@ -227,14 +227,20 @@ struct connection_id {
> return id == o.id;
> }
> operator bool() const {
> - return id;
> + return shard() != 0xffff;
> }
> - size_t shard() {
> + size_t shard() const {
> return size_t(id & 0xffff);
> }
> + static connection_id make_invalid_id(uint64_t id) {
> + return {id << 16 | 0xffff};
> + }
> + static connection_id make_id(uint64_t id, uint16_t shard) {
> + return {id << 16 | shard};
> + }

How about an assert that shard != 0xffff?

> };
>
> -constexpr connection_id invalid_connection_id{0};
> +const connection_id invalid_connection_id = connection_id::make_invalid_id(0);

If invalid ids are to be unique, we should replace every use of this
constant, no?

Cheers,
Rafael

Rafael Avila de Espindola

<espindola@scylladb.com>
unread,
Feb 13, 2019, 10:47:25 AM2/13/19
to Avi Kivity, seastar-dev@googlegroups.com, gleb@scylladb.com
Avi Kivity <a...@scylladb.com> writes:

> On 13/02/2019 00.20, Rafael Ávila de Espíndola wrote:
>> It is possible to have multiple connections with the
>> invalid_connection_id id. When that happens, the server no longer has
>> a complete list of connections
>>
>> Without a complete list, the future returned by server::stop might
>> complete while a connection is still running and
>>
>> _server._conns.erase(get_connection_id());
>>
>> in server::connection::process might access free memory.
>
>
> Connections should not outlive the server. But that does not match the
> description here. What is going on?

We have
std::unordered_map<connection_id, shared_ptr<connection>> _conns;

and we add two connections:

_conns[invalid_connection_id] = ...;
_conns[invalid_connection_id] = ...;

During shutdown we wait for all connections in _conns, but we will have
only one with invalid_connection_id.

Do you have a preference on a solution like this or what Gleb has
proposed (having multiple, unique, invalid ids)?

Cheers,
Rafael

Avi Kivity

<avi@scylladb.com>
unread,
Feb 13, 2019, 10:51:46 AM2/13/19
to Rafael Avila de Espindola, seastar-dev@googlegroups.com, gleb@scylladb.com
I have a preference for understanding the problem first. I don't
understand how a connection can outlive the server.


Gleb Natapov

<gleb@scylladb.com>
unread,
Feb 13, 2019, 11:02:53 AM2/13/19
to Rafael Avila de Espindola, seastar-dev@googlegroups.com
In my new version I call make_id(0 from make_invalid_id() :)

> > };
> >
> > -constexpr connection_id invalid_connection_id{0};
> > +const connection_id invalid_connection_id = connection_id::make_invalid_id(0);
>
> If invalid ids are to be unique, we should replace every use of this
> constant, no?
>
The uniqnes is important in only one place: when a server assign an id
to a new connection. The patch makes sure to produce unique value there,
in all other places invalid_connection_id marks that id is not assigned
yet.

--
Gleb.

Gleb Natapov

<gleb@scylladb.com>
unread,
Feb 13, 2019, 11:06:05 AM2/13/19
to Avi Kivity, Rafael Avila de Espindola, seastar-dev@googlegroups.com
It cannot, unless there is a bug. Currently there is such bug in case
server is not configured to use streams. My (and Rafael's) patch fixes it, but
mine is much simple and does not require to keep two containers in sync.

--
Gleb.

Avi Kivity

<avi@scylladb.com>
unread,
Feb 13, 2019, 11:21:54 AM2/13/19
to Gleb Natapov, Rafael Avila de Espindola, seastar-dev@googlegroups.com
So the bug is that the connection outlives its server, not a
use-after-free because it outlives its server.


Gleb Natapov

<gleb@scylladb.com>
unread,
Feb 13, 2019, 11:45:12 AM2/13/19
to Avi Kivity, Rafael Avila de Espindola, seastar-dev@googlegroups.com
The bug is that connections are not properly inserted into a container
that is used to stop all connections when server is stopped. That can
cause two problems: 1. a served does not wait for connections when it goes
down. 2. a connection can be freed prematurely since its shared pointer is
not stored in the server like it should which will cause use after free.

--
Gleb.

Avi Kivity

<avi@scylladb.com>
unread,
Feb 13, 2019, 11:50:10 AM2/13/19
to Gleb Natapov, Rafael Avila de Espindola, seastar-dev@googlegroups.com
I understand now, thanks.

Rafael Avila de Espindola

<espindola@scylladb.com>
unread,
Feb 13, 2019, 1:03:03 PM2/13/19
to Gleb Natapov, seastar-dev@googlegroups.com
Gleb Natapov <gl...@scylladb.com> writes:

>> > -constexpr connection_id invalid_connection_id{0};
>> > +const connection_id invalid_connection_id = connection_id::make_invalid_id(0);
>>
>> If invalid ids are to be unique, we should replace every use of this
>> constant, no?
>>
> The uniqnes is important in only one place: when a server assign an id
> to a new connection. The patch makes sure to produce unique value there,
> in all other places invalid_connection_id marks that id is not assigned
> yet.

Fair enough. In which case:

* I like your patch better.
* Can invalid_connection_id remain constexpr?
* Please add an assert after _conns.emplace(id, conn) to make sure a new
entry was added.

Thanks,
Rafael

Gleb Natapov

<gleb@scylladb.com>
unread,
Feb 13, 2019, 1:39:34 PM2/13/19
to Rafael Avila de Espindola, seastar-dev@googlegroups.com
On Wed, Feb 13, 2019 at 10:02:57AM -0800, Rafael Avila de Espindola wrote:
> Gleb Natapov <gl...@scylladb.com> writes:
>
> >> > -constexpr connection_id invalid_connection_id{0};
> >> > +const connection_id invalid_connection_id = connection_id::make_invalid_id(0);
> >>
> >> If invalid ids are to be unique, we should replace every use of this
> >> constant, no?
> >>
> > The uniqnes is important in only one place: when a server assign an id
> > to a new connection. The patch makes sure to produce unique value there,
> > in all other places invalid_connection_id marks that id is not assigned
> > yet.
>
> Fair enough. In which case:
>
> * I like your patch better.
> * Can invalid_connection_id remain constexpr?
Yes. In my new patch it is.

> * Please add an assert after _conns.emplace(id, conn) to make sure a new
> entry was added.
>
Will do.

I'll send new version tomorrow.

--
Gleb.
Reply all
Reply to author
Forward
0 new messages