Trouble with RPCs in Autobahn Python

68 views
Skip to first unread message

Felix Oberhansl

unread,
Feb 4, 2019, 9:14:38 AM2/4/19
to Autobahn
Hi,

I'm working for Siemens on an evaluation of WAMP, the Autobahn libraries and the Crossbar.io Router. Doing so, I ran into some trouble with a test setup using AutobahnPython for the implementation of my clients and the Crossbar.io Router.
The test setup consists of one client who is calling a remote procedure at another client (once every 120 milliseconds at most). The RPC takes some time (about 4 milliseconds) to process. The Crossbar.io Router connects both clients, using WebSocket transport.

The first thing I noticed was that, after some time, the callee reported "session leaving 'wamp.close.transport_lost'", the router reported that the callee didn't respond to a ping in time and the caller complained, that the callee left. After disabling the Ping in the Router Configuration the problem disappeared, so it seems that the callee can't respond to a ping while it's processing the Remote Procedure Call, at least not the way I implemented my callee. Is there a way I can implement my callee using Autobahn Python and still have WebSocket Pings? 

I also observed another strange behavior, while testing my setup with disabled WebSocket Pings. After some time the callee took a very long time (up to 30 seconds) to process RPCs. I tried to reproduce the behavior and get as much information as possible and I came up with the following trace:
- the caller calls the remote procedure (once every 120ms at most)
- the Router forwards the call without any serious delay, I  got this from the logfile of the Router
- the callee takes a very long time until the actual user written RPC gets executed, I measured the execution time of the user written code and it didn't change (still 4ms)
- when the callee yields the result and reports that it finished executing the user written RPC, the Router forwards it without any serious delay

Could it be that this problem occurs when a remote procedure call is performed, while the result of a previous RPC wasn't yet provided? Because my trace also showed that after the delay, multiple results of outstanding RPCs were yielded in a very short time period, almost like the client "queued them to execute them all at once and then send the results back". I implemented a callee with the same behavior using AutobahnCpp and RawSocket communication and it works. I also ran my setup on different hardware devices to rule out other factors. 

Is there a logical explanation for this behavior or could there be something wrong with my implementation after all? If you want to reproduce the error or have a look at my implementation, just let me know and I will put together a minimal example and publish it here. 

Any help is very much appreciated, thanks!

Omer Akram

unread,
Feb 4, 2019, 3:10:50 PM2/4/19
to Autobahn
Hi Felix, welcome to the world of Autobahn


On Monday, February 4, 2019 at 7:14:38 PM UTC+5, Felix Oberhansl wrote:
Hi,

I'm working for Siemens on an evaluation of WAMP, the Autobahn libraries and the Crossbar.io Router. Doing so, I ran into some trouble with a test setup using AutobahnPython for the implementation of my clients and the Crossbar.io Router.
The test setup consists of one client who is calling a remote procedure at another client (once every 120 milliseconds at most). The RPC takes some time (about 4 milliseconds) to process. The Crossbar.io Router connects both clients, using WebSocket transport.

Do you think its something you can show us ? Without looking at the code it might be guess work at most for us to suggest changes
 

The first thing I noticed was that, after some time, the callee reported "session leaving 'wamp.close.transport_lost'", the router reported that the callee didn't respond to a ping in time and the caller complained, that the callee left. After disabling the Ping in the Router Configuration the problem disappeared, so it seems that the callee can't respond to a ping while it's processing the Remote Procedure Call, at least not the way I implemented my callee. Is there a way I can implement my callee using Autobahn Python and still have WebSocket Pings? 

It feels like you are "choking" the websocket transport, hence your WAMP client doesn't even receive the ping message. I am assuming your callee returns lots of bytes in a single response, if possible I would suggest sending those bytes in smaller chunks.

I also observed another strange behavior, while testing my setup with disabled WebSocket Pings. After some time the callee took a very long time (up to 30 seconds) to process RPCs. I tried to reproduce the behavior and get as much information as possible and I came up with the following trace:
- the caller calls the remote procedure (once every 120ms at most)
- the Router forwards the call without any serious delay, I  got this from the logfile of the Router
- the callee takes a very long time until the actual user written RPC gets executed, I measured the execution time of the user written code and it didn't change (still 4ms)
- when the callee yields the result and reports that it finished executing the user written RPC, the Router forwards it without any serious delay

Could it be that this problem occurs when a remote procedure call is performed, while the result of a previous RPC wasn't yet provided? Because my trace also showed that after the delay, multiple results of outstanding RPCs were yielded in a very short time period, almost like the client "queued them to execute them all at once and then send the results back". I implemented a callee with the same behavior using AutobahnCpp and RawSocket communication and it works. I also ran my setup on different hardware devices to rule out other factors. 

Is there a logical explanation for this behavior or could there be something wrong with my implementation after all? If you want to reproduce the error or have a look at my implementation, just let me know and I will put together a minimal example and publish it here. 

I think a code example, that helps reproduce the issue would be very helpful, also have you taken a look at our existing examples https://github.com/crossbario/autobahn-python/tree/master/examples ?
 

Any help is very much appreciated, thanks!


Sure, we'd love you help find a solution to your problems.

Thanks! 

Zaar Hai

unread,
Feb 5, 2019, 6:15:03 AM2/5/19
to autob...@googlegroups.com
Hi Felix,

Sorry if I'm asking the obvious, but do you write your RPC handler in a async manner? Just making your your handler does not block the event loop.

--
You received this message because you are subscribed to the Google Groups "Autobahn" group.
To unsubscribe from this group and stop receiving emails from it, send an email to autobahnws+...@googlegroups.com.
To post to this group, send email to autob...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/autobahnws/132d5d55-0c29-4c86-b861-0c2bbfb0d29b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Zaar

Felix Oberhansl

unread,
Feb 5, 2019, 10:42:51 AM2/5/19
to Autobahn
Thank you for your response!
 
It feels like you are "choking" the websocket transport, hence your WAMP client doesn't even receive the ping message. I am assuming your callee returns lots of bytes in a single response, if possible I would suggest sending those bytes in smaller chunks.
 
Actually my callee doesn't return anything. I have put together a little example consisting of a callee and a caller. The callee is almost the same as the callee from the provided examples. The caller calls the remote procedure every 5 seconds. With my setup this example fails almost immediately, the remote procedure call takes about 3 seconds and I set my auto_ping_interval at 1000 and my auto_ping_timeout at 2000. To me it seems that this behavior occurs, when the execution time of the remote procedure call exceeds the ping timeout. 

This is my setup:
- one Raspberry Pi running the caller
- one Raspberry Pi running the callee
- one Raspberry Pi running the Crossbar.io Router

If there is anything wrong with my code or you need additional information, please let me know. The caller implementation differs from the examples, because of the actual application I am creating, but the callee is pretty much the same. 

Thanks again for your help!


callee.py
caller.py

Omer Akram

unread,
Feb 5, 2019, 2:17:41 PM2/5/19
to Autobahn
As Zaar said above, you seem to be blocking the twisted reactor (RPi can be slow sometimes), hence crossbar doesn't get a response in time.

What you could change in your callee is to wrap your long_op function into another function that calls reactor.callInThread(), below diff would also work:

--- callee.py   2019-02-06 00:14:56.171618335 +0500
+++ c.py        2019-02-06 00:15:25.784277328 +0500
@@ -28,7 +28,7 @@
            print("calculation DONE")         
         
        try:
-            yield self.register(long_op, u"com.example.callee.longop")
+            yield self.register(lambda: reactor.callInThread(long_op), u"com.example.callee.longop")
        except Exception as e:
            print("failed to register procedure: {}".format(e))
        else:

Zaar Hai

unread,
Feb 5, 2019, 8:43:08 PM2/5/19
to Autobahn
I'll add that if you have no particular reason to use Twisted I would suggest using Asyncio API instead - it provides much clearer distinction between async and non-async code.
In fact, you can do just:

res = await self.session.call(u"com.example.callee.longop")

And get result/exception right away (w/o callbacks that is).

Tobias Oberstein

unread,
Feb 6, 2019, 3:55:56 AM2/6/19
to autob...@googlegroups.com

Hi Felix,

yeah, Autobahn supports answering/triggering websocket ping/pongs while running user code - or in general, supports multiple concurrent operations

> Is there a way I can implement my callee using Autobahn Python and still have WebSocket Pings? 

by avoiding blocking the reactor (when using Twisted) or event loop (when using asyncio)

things to avoid:

- running CPU bound code
- doing blocking I/O (or blocking syscalls in general)

rgd the former:
if you have CPU bound code (eg a registered WAMP procedure that will hog up the CPU / single thread), run this code on a background thread. on Twisted: deferToThread is the tool. https://twistedmatrix.com/documents/current/core/howto/threading.html#running-code-in-threads

rgd the latter:
of the 2 I/O classes network and storage, Twisted/asyncio solve the network part: they provide an asynchronous networking framework. just avoid the blocking APIs in the Python stdlib. eg if you want to do a HTTP request from a WAMP procedure body, the way to do that is not using the "standard" requests library (http://docs.python-requests.org/), but an asynchronous one (!): https://treq.readthedocs.io/

now disk IO. different from say NodeJS, asyncio/Twisted does not cover disk IO - and hence this is sth to watch out. eg if you want to read a 1GB file into a Python string, standard os.open().read() will block in the OS system call! bad. again, either defer to a background thread - or mmap the file.

---

the need to avoid blocking operations in asynchronous programming is by definition: asynchronous programming is a form of cooperative multiprocessing where a currently running piece of code must itself and voluntarily yield control to the outer scheduler (the reactor or event loop).

hogging up the CPU is not giving up control at all, for longer time, while running - bad

hope above helps and doesn't confuse! let me know;)

cheers,
/Tobias


Am 04.02.19 um 15:14 schrieb 'Felix Oberhansl' via Autobahn:

Zaar Hai

unread,
Feb 7, 2019, 4:55:19 AM2/7/19
to Autobahn
As a remedy for the disk IO troubles that Tobias mentioned, you can use aiofiles.

Felix Oberhansl

unread,
Feb 12, 2019, 3:22:01 AM2/12/19
to Autobahn
Hi all,

thank you very much for your help and taking the time explaining it to me, it helps me a lot!
 

Reply all
Reply to author
Forward
0 new messages