Ready To Write Pdf

Jovanna Ponder

Aug 3, 2024, 5:09:04 PM

to renciperpa

Following on from my previous post re unreliable WiFi connections, I found that the problem was caused by calling DataWriter::write() too soon after creating the DataWriter. Because write() is non-blocking, it returned immediately without doing anything. Also, because it is void, there is no error return code.

The assembling of the plumbing you have found is the DataWriter discovering the DataReaders (and vice versa) in the system. This process is known as Discovery and, in a simplified way, is how the DDS Entities in a system locate each other and decide whether they can communicate.

It is not recommended to use a timer to wait for two DDS Entities to discover each other, as it is not deterministic. Instead, you could use the match notification statuses on the DataWriter and DataReader (e.g., on_publication_matched and on_subscription_matched). You could also access these statuses directly using the matched_publications() and matched_subscriptions() APIs.

I then attached it to the DataWriter, as per the examples in another RTI page. This class is instantiated, as I can set a breakpoint on the constructor. Alas, when the other app (implementing the DataReader) is started, the on_publication_matched() callback is never called. I want that function to call condition_variable::notify_one(), so that the publisher function can wait on it and know when the DataWriter is ready to send.
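The wait-and-notify idea described above can be sketched in plain standard C++, independent of any DDS library. MatchGate and its method names are hypothetical, introduced only for illustration; the assumption is that the listener's on_publication_matched() callback would call update() with the current number of matched readers, and the publishing thread would call wait_for_match() before its first write().

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not part of any DDS API: a gate that the publishing
// thread waits on until a listener callback reports at least one match.
class MatchGate {
public:
    // Intended to be called from the listener callback, passing the
    // current number of matched readers.
    void update(int current_count) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            current_count_ = current_count;
        }
        cv_.notify_one();
    }

    // Blocks until at least one reader is matched or the timeout expires.
    // Returns true if a match is present, false on timeout.
    bool wait_for_match(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout,
                            [this] { return current_count_ > 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int current_count_ = 0;
};
```

Waiting with a predicate means a match that arrives before the publisher starts waiting is not lost, and spurious wake-ups are ignored. The timeout is there so the publisher can report a failure instead of blocking forever if no reader ever appears.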

I have a similar issue. I changed my code to Modern C++, and when I send something, the Listener prints "subscription matched" from on_subscription_matched and then "liveliness changed" from on_liveliness_changed. Only once every 10 to 15 messages do I actually receive a message.

The on_subscription_matched callback will be called whenever the associated DataReader discovers a matching DataWriter. This matching will only occur if they both have the same Topic and compatible QoS policies.
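As a rough illustration of "same Topic and compatible QoS", here is a toy model in standard C++. It is a sketch under stated assumptions, not the middleware's matching logic: Endpoint and matches() are hypothetical names, only Reliability and Durability are modelled, and real matching also considers the data type and several other policies. The rule shown is the usual requested-versus-offered one, where the writer must offer at least what the reader requests.

```cpp
#include <string>

// Ordered so that a larger value means a stronger guarantee.
enum class Reliability { BestEffort = 0, Reliable = 1 };
enum class Durability { Volatile = 0, TransientLocal = 1, Transient = 2, Persistent = 3 };

// Simplified, hypothetical description of one endpoint (not a DDS type).
struct Endpoint {
    std::string topic;
    Reliability reliability;
    Durability durability;
};

// The writer "offers" QoS and the reader "requests" it. In this model they
// match only if the topic names are equal and, for each policy, the offered
// level is at least as strong as the requested level.
bool matches(const Endpoint& writer, const Endpoint& reader) {
    return writer.topic == reader.topic
        && static_cast<int>(writer.reliability) >= static_cast<int>(reader.reliability)
        && static_cast<int>(writer.durability) >= static_cast<int>(reader.durability);
}
```

In this model a best-effort writer never matches a reader that requests reliable delivery, so the match callbacks would not fire for that pair even though the topic names agree.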

The on_liveliness_changed callback will be called whenever the liveliness of one of the associated DataReader's matched DataWriters changes. Liveliness is configurable via QoS. Of the various possible changes in liveliness, one is that a new matching entity has been discovered - this would explain why this callback is called at the same time as on_subscription_matched.

What I probably really meant was that on_data_available is not always called. I am working on a project involving a User Interface and a mediation service utilizing DDS. When I press a button in the UI, a message is put together (based on an IDL) and then sent via DDS to the mediation service. However, the message does not arrive at the mediation service every time I press the button. I based my code on the connext_dds/c++11/hello_idl example.

I have written a full reply in the thread you opened regarding this problem (Need to send data more than once for data to be sent), but in short, I think you should look into Durability QoS (see my message on your thread for more info).

I am glad you have it working now but you should be aware of a potential race condition that could now occur in your system (apologies for not pointing this out sooner). Currently, when the on_publication_matched callback on the Publisher is fired, we consider the system ready (with all entities ready to go etc.). However, in reality, Discovery is a bidirectional process, and it could happen that the Subscriber has not yet finished its own matching process with the Publisher (and therefore is not ready to begin receiving data).

This is the downfall of the solution I have provided you with.
If you were not using Request/Reply, I would suggest that you use Durability QoS to mitigate this (that way the "late joining" DataReader would still receive missed samples). Unfortunately, the nature of the R/R communication pattern doesn't lend itself that well to this (though this is use-case dependent).

The simplest way to mitigate this race condition is to establish another Topic (with Transient Local Durability) which is written by the Replier-side application. This Topic would simply notify the Requester-side application that Discovery on this end has completed (heavy, but deterministic).

Funny you should say that, because I did find a problem when I enacted the on_publication_matched callback. It worked fine when I did it on the Registry app (which receives initial communication from the device app), but when I transferred it to the device, the first message from the device failed. It would only work on the second or third transmission.

Re that suggestion of using a separate topic just to check if the Replier is ready, I have a question. To reiterate, I have a central registry app (the Replier in this case) that trades messages with any number of apps representing devices. So the registry and devices act as both publishers and subscribers.

Does the discovery take place before any reading or writing by the publisher/subscriber? In other words does a subscriber know about a publisher (and vice versa) before receiving any message from it, or does the message itself start the discovery process?

If discovery occurs before messages start flying, would this separate topic in the Replier just respond to every new subscriber it discovers with an "I'm ready" notification, or just the first one? I take it each new subscriber would just wait till it had received such a notification before sending its 'real' topic messages?

By default, discovery begins when an Entity (DataWriter, DataReader, etc.) is created. In other words, before the reading and writing of user data topics (i.e., the topics you have defined and created in your application).

Discovery is a per entity process. If there were a single DataWriter and two DataReaders, the DataWriter would have to discover each DataReader separately. So you will need to inform every new entity when discovery has completed.

Thanks Sam, but if Discovery is per entity, why would the separate 'discovery' topic (with the Transient Local Durability you mentioned in your post of 17 May) be relevant? I assume it would have its own DataReader, in which case it would just tell the Requester that it was ready; it wouldn't say anything about the other DataReaders handling the application data.

Can you see any opportunity for Alex's suggestion above, that I use the on_subscription_matched event? I can see the benefit of the on_publication_matched event in the Writer, but not so convinced about the one for the Reader. After all it would have to tell its corresponding writer apps, and what if they weren't ready...?

On your Device's DataWriter you have the on_publication_matched callback which notifies you when Discovery at the local end has completed. The problem is that you have no way of knowing on the Device's DataWriter when the Registry's DataReader is ready.

If you create a new Transient Local topic and write() it when the Registry's on_subscription_matched callback is fired, you can convey this information to the Device application. Since the new topic's Durability is configured as Transient Local, it doesn't matter if its Discovery has completed yet or not (since the message will be resent to any late joiners). The receipt of a message from this new topic can act as a flag that the DataReader's on_subscription_matched callback has fired.
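The late-joiner behaviour that makes this work can be pictured with a small toy model in standard C++. TransientLocalWriter is a hypothetical class for illustration, not a DDS API. It assumes the writer retains the last history_depth samples and replays them to any reader that attaches afterwards, which is a simplification of what Transient Local Durability provides.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of a writer with Transient Local durability: it keeps the most
// recent samples and replays them to any reader that attaches later.
class TransientLocalWriter {
public:
    using Reader = std::function<void(const std::string&)>;

    explicit TransientLocalWriter(std::size_t history_depth)
        : depth_(history_depth) {}

    // Deliver to the readers attached so far and retain the sample
    // (up to the history depth) for late joiners.
    void write(const std::string& sample) {
        history_.push_back(sample);
        if (history_.size() > depth_) history_.pop_front();
        for (auto& reader : readers_) reader(sample);
    }

    // A late-joining reader first receives the retained history,
    // then any samples written afterwards.
    void attach(Reader reader) {
        for (const auto& sample : history_) reader(sample);
        readers_.push_back(std::move(reader));
    }

private:
    std::size_t depth_;
    std::deque<std::string> history_;
    std::vector<Reader> readers_;
};
```

With this model, a "ready" sample written before the Device's reader exists is still delivered once that reader attaches, which is why the order in which the two sides finish Discovery stops mattering for the flag topic.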

So, on the Device's DataWriter, once you have received a message from this new Transient Local topic (and once the DataWriter's on_publication_matched callback has fired) you can be sure that Discovery has completed on both ends and begin your communication.
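The two-condition check described above might be tracked as follows. This is a minimal sketch in standard C++ with hypothetical names (DeviceReadiness, set_publication_matched, on_ready_sample_received); it assumes one callback reports the local match and another reports receipt of the "ready" sample on the extra topic, possibly from different threads.

```cpp
#include <atomic>

// Hypothetical helper: the Device may start sending only when both the
// local match and the remote "ready" notification have been observed.
class DeviceReadiness {
public:
    // Intended to be called from on_publication_matched on the Device's
    // DataWriter; pass false if the match is later lost.
    void set_publication_matched(bool matched) { local_matched_ = matched; }

    // Intended to be called when a sample arrives on the extra
    // Transient Local "ready" topic.
    void on_ready_sample_received() { remote_ready_ = true; }

    // True only when Discovery has been observed on both ends.
    bool can_send() const { return local_matched_ && remote_ready_; }

private:
    std::atomic<bool> local_matched_{false};
    std::atomic<bool> remote_ready_{false};
};
```

Either event can arrive first; can_send() stays false until both have been seen.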

If you were not using Request Reply (are you using RTI's Requester and Replier entities or implementing your own?) I would simply suggest that you configure the Durability of your DataReaders and DataWriters as Transient Local (as that completely mitigates this race condition). But, due to the nature of Request / Reply it is not currently possible to do this.

Oh heck, I was unaware of RTI's Requester and Replier entities. I didn't see them in the tutorials, and unless you are aware of something's existence you don't go looking for it! As I said, this is my first DDS app, and it seems the requester/replier model is far more suited to what I want - namely for a device to request the Registry to send back a single struct of information. Especially as it seems it would automatically take care of the connection/timing problems that have bedevilled me. In fact even a SimpleReplier would be perfect for this.... if I could get it working.

I used the example Modern C++ SimpleReplier which does not explicitly use a SimpleReplierListener - according to the comment below it, just defining the SimpleReplier with a functor should be enough. However when I run it, it just returns immediately. Shouldn't it block until it receives a request?

Another example suggests that the listener's on_request_available() callback, rather than the SimpleReplier itself, returns the reply to the requester. Unfortunately it shows only the SimpleReplier constructor (with the listener as a parameter), but no method calls.

The reason for the difference between the two examples you have linked is that they are from two different versions of Connext DDS (one from 5.1.0 and the other from 5.3.1). Depending on which version of Connext DDS you are using (the latest release is 6.0.0), the best practices can be different. In Connext DDS 6.0.0, the example you have linked uses a lambda expression; it is equivalent to the example from the 5.3.1 documentation.

The installation of Connext DDS should have created a directory called "rti_workspace" for you. Within this directory we ship an example of Request/Reply (rti_workspace/6.0.0/examples/connext_dds/c++11/hello_world_request_reply). All of the code is already written so you will just need to follow the instructions in the README.txt in that directory to build the example. The example is not using SimpleReplier or SimpleRequester but I think it will help you see the intended use of the APIs.

Looking at the sample source you suggested (.../c++11/hello_world_request_reply), is there a typo in run_example() in PrimeNumberReplier.cxx? The ReplierParams datawriter_qos/datareader_qos both get their values from the RequesterExampleProfile element in the USER_QOS_PROFILES.xml, whereas I see there is also a ReplierExampleProfile element. I presume the latter is correct for the replier, as it reverses the datawriter_qos/datareader_qos values.