Bug in Dispatcher's handling of write watches

78 views
Skip to first unread message

Marco Nicolosi

unread,
Mar 28, 2017, 9:19:31 AM3/28/17
to Dbus-cxx
Hello,
I'm using dbus-cxx-0.9.0 for a project and I noticed a small inaccuracy in the handling of write watches.

When DBus::Dispatcher gets notified of a newly added watch, it inserts the watch into its member m_watches_map that is a std::map storing watch pointers ordered by fd.
This map is used by the function Dispatcher::handle_read_and_write_watches(std::vector<struct pollfd>* fds) to get the pointer to the watch object having a specific fd and perform handle_read()/handle_write() on it.
Right after the creation and initialization of a new DBus::Connection, DBus::Dispatcher gets notified of two newly added watches to be monitored: one write watch and one read watch having both the same file descriptor.
Initially the write watch is disabled and the read watch is enabled, so the Dispatcher's main loop polls only for POLLIN events on that file descriptor. When the write watch gets enabled the POLLOUT events are also monitored on that fd.

From what I saw, after the initialization of a connection, DBus always notifies the addition of the write watch first and then that of the read watch.
So the Dispatcher proceeds to insert the write watch first and the read watch after in its watch map (m_watches_map).
But since keys in a std::map container are unique, the inserion of the read watch with same fd (i.e. map key) overwrites that of the write watch that occurred before.
Most of the time DBus reliles on the read watch only, so the handle_read_and_write_watches(...) function will call the correct handle_read() operation on the read watch found in the m_watches_map.
But in those sporadic cases in which DBus enables the write watch, the handle_write() operation is erroneously performed on the read watch instead of the write watch that is not stored in m_watches_map (because it has been replaced).

The simple solution to this problem is to make m_watches_map an std::multimap so it can store two nodes with the same key (fd) but having as values the pointers to the two read & write watches.

So in dispatcher.h, line 97, we have to convert Dispatcher's member m_watches_map to a multimap:

  std::multimap<int,Watch::pointer> m_watches_map;

and in dispatcher.cpp we need to modify handle_read_and_write_watches(..) (line 202) and on_add_watch(..) (line 260) as follows:


 
void Dispatcher::handle_read_and_write_watches( std::vector<struct pollfd>* fds ){
   
for ( const struct pollfd& fd : *fds ){
     
if( (fd.events == POLLIN) &&
         
(fd.revents & POLLIN ) ){
          std
::lock_guard<std::mutex> watch_lock( m_mutex_watches );
         
auto range = m_watches_map.equal_range(fd.fd);
         
for (auto i = range.first; i != range.second; ++i)
         
{
             
Watch::pointer watch = i->second;
             
if( watch->is_readable() ) {
                  watch
->handle_read();
             
}
         
}
     
}else if( (fd.events == POLLOUT) &&
             
(fd.revents & POLLOUT ) ){
          std
::lock_guard<std::mutex> watch_lock( m_mutex_watches );
         
auto range = m_watches_map.equal_range(fd.fd);
         
for (auto i = range.first; i != range.second; ++i)
         
{
             
Watch::pointer watch = i->second;
             
if( watch->is_writable() ) {
                  watch
->handle_write();
             
}
         
}
     
}
   
}
 
}



 
bool Dispatcher::on_add_watch(Watch::pointer watch)
 
{
   
if ( not watch or not watch->is_valid() ) return false;
   
    SIMPLELOGGER_DEBUG
( "dbus.Dispatcher", "add watch  fd:" << watch->unix_fd() );
 
    std
::lock_guard<std::mutex> watch_lock( m_mutex_watches );
    m_watches
.push_back( watch );
    m_watches_map
.insert( std::pair<int,Watch::pointer>(watch->unix_fd(),watch) );

    wakeup_thread
();

   
return true;
 
}



Please let me know if you consider this analysis correct.
Hope this helps.


Best regards,
Marco Nicolosi

Robert Middleton

unread,
Mar 28, 2017, 1:30:16 PM3/28/17
to Marco Nicolosi, Dbus-cxx
Marco,

That's interesting. I was assuming that the watches were either
read-only or write-only. That could explain some deadlock issues that
I've been trying to track down. I will have to try this out this week
and see what happens.

The code used to have two separate maps for read watches and write
watches which I had combined into one, so breaking those up back into
two separate maps should accomplish the same thing.

-Robert Middleton
> --
> You received this message because you are subscribed to the Google Groups
> "Dbus-cxx" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dbus-cxx+u...@googlegroups.com.
> To post to this group, send email to dbus...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dbus-cxx/e8a97abe-74b5-49b6-af58-8f87c407db7f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Nick Burkitt

unread,
Mar 28, 2017, 1:35:32 PM3/28/17
to Dbus-cxx
Robert - the example I sent you uses a different scheme to accomplish the same thing without resorting to two maps. Many of the dbus examples I found treat watches incorrectly in this regard. And, in fact, in a low-traffic condition, the condition never degenerates into a failure, but if you ramp up the message rates,it soon will. 

-Nick

Marco Nicolosi

unread,
Mar 31, 2017, 3:48:14 AM3/31/17
to Dbus-cxx
Robert,
thanks for the answer.

I took a look at the sources of the latest stable branch of DBus (version 1.10.16) and I can confirm DBus notifies two read & write watches with the same file descriptor.
Indeed, whether you use tcp sockets or unix domain sockets, the two write & read watches of that transport socket will be created in the function _dbus_transport_new_for_socket(...) (dbus-transport-socket.c lines 1295, 1302) using the same fd returned by the core socket(...) function previously called in _dbus_connect_tcp_socket_with_nonce(...) or _dbus_connect_unix_socket(...).

Moreover I can confirm that the "read watch added" callback is called always after the "write watch added" callback (generating the replacement in Dispatcher's m_watches_map).
In fact, when asking for a new connection via dbus_bus_get(...)/dbus_bus_get_private(...) functions, _dbus_connection_new_for_transport(...) is called and this function executes socket_connection_set(DBusTransport *transport), prevoiusly set in DBusTransport's function vtable by _dbus_transport_new_for_socket(...).
During the execution of socket_connection_set(...) function the write watch is appended first to the DBusConnection's internal DBusWatchList; the read watch is appended immediately after.
When a DBus::Connection is initialized, watch callbacks are set (and immediately called) through _dbus_watch_list_set_functions(...) that will scan DBusConnection's DBusWatchList from head (i.e. the write watch) to tail (the read watch).
So, the "write watch added" callback will be called first, followed by the "read watch added" callback.

Surely dividing m_watches_map back into two separate maps will fix the (quite infrequent) issue.



Best regards,
Marco Nicolosi

Robert Middleton

unread,
Apr 1, 2017, 12:51:50 PM4/1/17
to Marco Nicolosi, Dbus-cxx
I had some time today to look at this some more, and I did at least a partial merge of Nick's code into the trunk.  I did try out the code, but it kept on segfaulting for me after a short while.  The performance seems to be about the same still, and I do occasionally get a timeout error with sending a lot of data.  Some interesting things that I have noticed:

  • One time while testing, the sending side of my code stopped for a few seconds before resuming.
  • Generally, the problems with the timeout seem to be on the sending side of the code.  I can look at what's going through dbus-monitor, and see that the sending code will send a message to the recieving side, but the response that is sent by the receiving side never gets to the sending side.
  • Two times, the problem happened on the sending side after a watch was toggled and disabled.
I can't tell exactly where this problem is coming in, but the last issue that I saw at least would imply that there's actually a problem with libdbus, not with dbus-cxx.  I'm continuing to run tests on my side, and I uploaded the code that I'm using to test to github: https://github.com/rm5248/dbus-cxx-testprogs

I have not seen any issues coming back from poll(), although the code currently just logs a message that it had an error so it is possible to miss.

-Robert Middleton

To unsubscribe from this group and stop receiving emails from it, send an email to dbus-cxx+unsubscribe@googlegroups.com.

To post to this group, send email to dbus...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages