wss vs ws performance


Tony

unread,
Aug 13, 2013, 8:30:15 PM8/13/13
to webso...@googlegroups.com
This is not a WebSocket++-specific question; it's more of an asio and/or OpenSSL question.

I'm asking in order to share some of my experiences and hear about some of yours.

Quite contrary to the SO brouhaha at:


I find that there is a huge performance hit when using OpenSSL on Windows with WebSocket++.

The client in this case is a Windows 7 desktop and the server is Windows Server 2012, both on the same LAN.

WebSocket++ client (no SSL) and .NET 4.5 WebSocket server:
- echo (one 32-bit integer exchanged) roundtrip takes about 0.8 milliseconds on average.

WebSocket++ client (SSL) and .NET 4.5 WebSocket server:
- echo roundtrip takes about 50 milliseconds on average.

WebSocket++ client (SSL) and WebSocket++ echo_server_tls server:
- echo roundtrip takes about 250 milliseconds on average.
 
It is possible that I made some dumb mistake (I am trying to make sure that I removed all logging on the WebSocket++ side, compiled everything in Release, etc.), but so far that is what I get.

jimbo...@gmail.com

unread,
Aug 13, 2013, 9:04:57 PM8/13/13
to webso...@googlegroups.com
My guess is the problem is MS. My 1-core Ubuntu VPS vastly outperformed my 4-core Win 7 laptop. No disk I/O.

I had no added ping lag on my VPS. Can't speak to Win 7: since the base performance was so bad, I gave up.

Tony

unread,
Aug 13, 2013, 9:26:38 PM8/13/13
to webso...@googlegroups.com
I am going to try an Ubuntu client next. My concern, however, is the fact that the .NET server was 5 times faster (with the same client) than the WebSocket++ server on the same Server 2012 machine.

So far I can only think it may be attributable to OpenSSL, compiled for Windows and used by the WebSocket++ server, versus the native SSPI (most probably) used by the .NET server.

But this is still just guessing; I'll continue measuring. Dumping the Windows 7 client entirely is not an option in my case.

Tony

unread,
Aug 14, 2013, 9:26:56 PM8/14/13
to webso...@googlegroups.com
At this time I have excluded OpenSSL as the probable cause of the described difference in performance.

I used the POCO library (which also uses OpenSSL) and got the echo roundtrip down to less than 1 millisecond.

Since POCO is not asynchronous, and I am using a simple blocking POCO client that blocks on send, then blocks on receive, all in a tight loop, this is not easily comparable with an asynchronous boost asio io run loop.

I still have to isolate the issue on something specific to either boost asio or WebSocket++.

Peter Thorson

unread,
Aug 14, 2013, 9:31:10 PM8/14/13
to webso...@googlegroups.com
To be clear here. The delay you are referring to is a message round trip after the connection has already been set up?

Tony

unread,
Aug 14, 2013, 9:38:27 PM8/14/13
to webso...@googlegroups.com
Correct. 

Tony

unread,
Aug 19, 2013, 6:24:02 PM8/19/13
to webso...@googlegroups.com
I've come to the conclusion that the performance drop I am seeing is due to WebSocket++ or (let me not rush to conclusions)
the way I am using it.

Here are the results that I am getting:

All server programs were located on the same Windows Server 2012 OS machine.
All client programs were run from the same Windows 7 OS machine. Both machines are on the same LAN.
Boost version used was 1.54.0, while OpenSSL version was 1.0.0i.

Boost servers were asynchronous and minimally adapted from Boost cpp03 examples.
WebSocket++ servers were echo_server and echo_server_tls, minimally adapted in the same way.

This adaptation involves reading just one 32-bit integer, incrementing it, and sending it back right away.

WebSocket++ clients were developed from the WebSocket++ samples, and the boost clients were cast in the same mold to make
the client code as similar as possible.

All clients send one integer as soon as the connection gets established, then use a high-performance timer,
from the boost chrono library, to measure the time it takes to receive the number bigger by 1 than the one just sent.

Note, however, that this incrementing-integer adaptation is irrelevant to the tests performed.
Tests involved a single client, running the Boost io_service on just one thread, talking to a single server.

In the case of boost this was done like so:

 asioThread_ = boost::thread([this](){ io_service_.run();});


and in the case of WebSocket++ client like so:

   asioThread_ = websocketpp::lib::thread(&client::run, &client_);

  
WebSocket++ clients and servers had logging suppressed like so:

   client_.clear_access_channels(websocketpp::log::alevel::all);
   client_.clear_error_channels(websocketpp::log::elevel::all);


  
Representative results for boost client/server combination, standard sockets, with no SSL:

Roundtrip took 510188 nanoseconds
Roundtrip took 521993 nanoseconds
Roundtrip took 476312 nanoseconds
Roundtrip took 519941 nanoseconds
Roundtrip took 471180 nanoseconds
Roundtrip took 504542 nanoseconds
Roundtrip took 473747 nanoseconds
  
Results for boost client/server combination, standard sockets, with OpenSSL:

Roundtrip took 727814 nanoseconds
Roundtrip took 717549 nanoseconds
Roundtrip took 716522 nanoseconds
Roundtrip took 667248 nanoseconds
Roundtrip took 682647 nanoseconds
Roundtrip took 746291 nanoseconds
Roundtrip took 692912 nanoseconds

Results for WebSocket++ client, echo_server combination, with no SSL (ws protocol):

Roundtrip took 931581 nanoseconds
Roundtrip took 970590 nanoseconds
Roundtrip took 970590 nanoseconds
Roundtrip took 1139968 nanoseconds
Roundtrip took 1001386 nanoseconds
Roundtrip took 919776 nanoseconds
Roundtrip took 1066571 nanoseconds

Results for WebSocket++ client, echo_server_tls combination, with OpenSSL (wss protocol):

Roundtrip took 263297234 nanoseconds
Roundtrip took 254142586 nanoseconds
Roundtrip took 264079454 nanoseconds
Roundtrip took 252430324 nanoseconds
Roundtrip took 263652929 nanoseconds
Roundtrip took 253435302 nanoseconds
Roundtrip took 246399939 nanoseconds

I don't know what to make of the last result. I would expect the same kind of overhead that
you see when comparing standard sockets and WebSockets with no SSL, which is about a factor of 2.
But a factor of 250?!

It is good news that Boost asio with OpenSSL doesn't seem to be the issue. At the going rate
of maintenance of these libraries, any deep problem would be hard to get fixed quickly.

I tried some profiling, but at this point I am getting way too much information to parse.
I will need to make educated guesses and concentrate on specific parts of the code in order to
start seeing some clues in the profiler reports.

Some dumb user (that is, my) error is still not excluded, of course.

Any suggestions appreciated.



Peter Thorson

unread,
Aug 19, 2013, 7:59:09 PM8/19/13
to webso...@googlegroups.com
I've confirmed that on OS X and Linux message round trip time over a local network when using TLS does not exhibit the issue described. RTT for a short ws:// message was ~400-550us and the same message over wss:// was ~ 650-750us. These values are consistent with your numbers for the regular boost client/server. I'll take a peek through the TLS code and see if I can identify any areas that might merit closer inspection in profiling.

Tony

unread,
Aug 19, 2013, 8:34:42 PM8/19/13
to webso...@googlegroups.com
That's great news! 

I can't figure out where the issue is on my side yet. Everything seems rather straightforward: compiling and running the servers.
I'll try using boost for everything, as I did with the boost client and server, instead of a mixture of native C++11 and boost.

Tony

unread,
Aug 20, 2013, 2:48:37 AM8/20/13
to webso...@googlegroups.com
Unfortunately, no change when recompiling everything with Boost (no native shared pointers, bind, etc.).
I tried a couple of other servers in the office; same order of magnitude RTT as reported previously.

Even loopback, with the client and echo_server_tls running on the same machine, is comparably quite slow: RTT around 2 milliseconds.

Trying the same client from a machine at home gives more reasonable results with echo.websocket.org (using the string "Hello Tony", as they seem not to support binary frames):


ws:// (no SSL):
Roundtrip took 65679005 nanoseconds
Roundtrip took 66310566 nanoseconds
Roundtrip took 65478170 nanoseconds
Roundtrip took 90317522 nanoseconds
Roundtrip took 68500640 nanoseconds
Roundtrip took 73889551 nanoseconds


wss:// (SSL):
Roundtrip took 167942914 nanoseconds
Roundtrip took 180063885 nanoseconds
Roundtrip took 211592988 nanoseconds
Roundtrip took 176652998 nanoseconds
Roundtrip took 166803148 nanoseconds
Roundtrip took 172418380 nanoseconds

The times are not great, but that server is easily overloaded or slow.
However, 60 milliseconds for non-SSL versus about 3 times more for the SSL RTT seems reasonable.

Peter Thorson

unread,
Aug 19, 2013, 6:49:47 PM8/19/13
to Tony, webso...@googlegroups.com
This information does indicate a potential WebSocket++ problem. There definitely shouldn't be a 250 factor difference here. I'll see if I can set up some way of testing this. I have very limited access to Windows based development environments especially as it relates to profiling.

At minimum I'll try and confirm whether or not this issue affects Linux / OS X systems.


Tony

unread,
Aug 24, 2013, 8:13:55 PM8/24/13
to webso...@googlegroups.com
You know that you are in for a ride when you do a Google search for a particular type of problem,
and after hours of reading you can't find anything truly relevant, just a few things here and there
that turn out to be red herrings.

The real cause will probably be quite interesting, once I succeed in identifying it.

I took a number of steps to exclude dumb errors in the build environment. This means building on a couple of
different machines (Windows 7 and Windows 8 desktops), with MSVC versions 10 and 11, and the latest and previous versions
of the OpenSSL libraries, but always with the latest boost, 1.54.0.

After building everything with boost (i.e. not defining preprocessor symbols like _WEBSOCKETPP_CPP11_MEMORY_ and _WEBSOCKETPP_CPP11_FUNCTIONAL_)
I got somewhat more consistent results.

For example, I see negligible difference in RTT when using standard sockets or WebSockets (without SSL).

So let's start with the known server ws://echo.websocket.org and JavaScript client in Chrome. This is all WAN:

Echo: 63.000ms 
Echo: 63.000ms 
Echo: 65.000ms 
Echo: 64.000ms 
Echo: 65.000ms 

How about WebSocket++ client and the same server?

Roundtrip took 63500147 nanoseconds
Roundtrip took 63112239 nanoseconds
Roundtrip took 63197364 nanoseconds
Roundtrip took 62144253 nanoseconds
Roundtrip took 63368890 nanoseconds
 
Beautiful! Almost identical, with the WebSocket++ client a tiny bit better.

Ok, how about WebSocket++ echo_server on Amazon cloud machine and JavaScript Chrome client?

Echo: 57.000ms 
Echo: 55.000ms 
Echo: 55.000ms 
Echo: 56.000ms 
Echo: 58.000ms 

Even better! Now for the WebSocket++ client with WebSocket++ echo_server on Amazon cloud machine:

Roundtrip took 58964786 nanoseconds
Roundtrip took 58267470 nanoseconds
Roundtrip took 57645339 nanoseconds
Roundtrip took 57964942 nanoseconds
Roundtrip took 57197282 nanoseconds

All good, sensible, consistent.

Now to SSL land: a JavaScript client in Chrome, connecting to wss://echo.websocket.org

Echo: 66.000ms
Echo: 65.000ms
Echo: 66.000ms
Echo: 65.000ms
Echo: 79.000ms

Great! Slower by a factor of no more than 1.3.

Now WebSocket++ SSL client talking to wss://echo.websocket.org

Roundtrip took 166055623 nanoseconds
Roundtrip took 167015707 nanoseconds
Roundtrip took 166816656 nanoseconds
Roundtrip took 166745803 nanoseconds
Roundtrip took 166943070 nanoseconds

Bummer! As if something somewhere likes to add 100 ms to RTT.

Peter Thorson

unread,
Aug 25, 2013, 4:45:22 PM8/25/13
to Tony, webso...@googlegroups.com
The only code that is different between a TLS WebSocket++ endpoint and a plain one is the following:

Plain asio endpoints derive from: websocketpp/transport/asio/security/none.hpp
TLS asio endpoints derive from: websocketpp/transport/asio/security/tls.hpp

There is very little code here; most of it is boilerplate to set up the different boost types to use for the socket. 95% of that code supports the setup and teardown of the TLS session context. The WebSocket++ code for actually sending a message is identical in the TLS and plaintext cases; the only difference is what happens inside boost.

The next things I'd want to look at are ensuring that the encryption itself is performed similarly. Same ciphers and socket options, etc.

I'd also be interested in seeing the results of a test that sends more data, say messages a few kB in size.

Tony

unread,
Aug 25, 2013, 5:51:03 PM8/25/13
to webso...@googlegroups.com, Tony
A quick heads up:

Sending more data, an approximately 2,500-character text message as opposed to the 10-character "Hello Tony", made a huge difference!

WebSocket++ client and  wss://echo.websocket.org

Roundtrip took 97484130 nanoseconds
Roundtrip took 103294336 nanoseconds
Roundtrip took 98396299 nanoseconds
Roundtrip took 102806265 nanoseconds
Roundtrip took 96812555 nanoseconds

WebSocket++ client and  WebSocket++ echo_server_tls on Amazon cloud machine:

Roundtrip took 97357716 nanoseconds
Roundtrip took 97727528 nanoseconds
Roundtrip took 97615642 nanoseconds
Roundtrip took 97262140 nanoseconds
Roundtrip took 94293958 nanoseconds

Better results with more data!



Tony

unread,
Aug 25, 2013, 6:04:42 PM8/25/13
to webso...@googlegroups.com, Tony
Exactly the same text, but a JavaScript Chrome client to wss://echo.websocket.org:

Slightly better, but the same order of magnitude:

Echo: 78.000ms
Echo: 79.000ms

Tony

unread,
Aug 25, 2013, 7:00:20 PM8/25/13
to webso...@googlegroups.com, Tony
Okay, so it looks like it was all Nagle!

I added the following to my WebSocket++ client:

void NetClient::OnSocketInit(websocketpp::connection_hdl,
    boost::asio::ssl::stream<boost::asio::ip::tcp::socket>& socket)
{
    socket.lowest_layer().set_option(boost::asio::ip::tcp::no_delay(true));
}

and now the JavaScript client and the WebSocket++ client have the same RTT when talking to wss://echo.websocket.org . Here is a comparison of WebSocket++ without and with Nagle disabled,
same short message 'Hello Tony':

Without Nagle disabled:

Roundtrip took 166833732 nanoseconds
Roundtrip took 167097774 nanoseconds
Roundtrip took 165945265 nanoseconds
Roundtrip took 167158433 nanoseconds
Roundtrip took 166016118 nanoseconds
Roundtrip took 166083913 nanoseconds
Roundtrip took 167640132 nanoseconds

The same with Nagle disabled:

Roundtrip took 65886468 nanoseconds
Roundtrip took 67728137 nanoseconds
Roundtrip took 66973476 nanoseconds
Roundtrip took 67005080 nanoseconds
Roundtrip took 68207543 nanoseconds
Roundtrip took 66762956 nanoseconds
Roundtrip took 66586843 nanoseconds
Roundtrip took 66918425 nanoseconds


Peter Thorson

unread,
Aug 25, 2013, 9:50:44 PM8/25/13
to Tony, webso...@googlegroups.com, Tony
This sounds about right. Disabling Nagle will improve latency / RTT for short messages at the potential expense of overall throughput. Whether or not the Nagle algorithm helps your application isn't something WebSocket++ can decide. The tls_init and socket_init handlers are provided so you can configure settings like this accordingly. :)

I'll look into providing some examples of setting socket options like this in the docs.

Pat Le Cat

unread,
Nov 12, 2013, 6:15:19 AM11/12/13
to webso...@googlegroups.com, Tony
https://en.wikipedia.org/wiki/Nagle's_algorithm

"Negative effect on non-small writes" + "Applications that expect real time responses can react poorly with Nagle's algorithm. Applications such as networked multiplayer video games expect that actions in the game are sent immediately, while the algorithm purposefully delays transmission, increasing bandwidth efficiency at the expense of latency."

How do you know when to apply it at all?

Peter Thorson

unread,
Nov 19, 2013, 11:09:57 AM11/19/13
to Pat Le Cat, webso...@googlegroups.com
There is no hard and fast rule for when to use Nagle / no delay and when not to. The default is to have Nagle on. Experience and profiling can help you determine when to turn it off. The best rule of thumb I can give is: if your application sends small messages (small meaning significantly smaller than the MTU of your network, often in the ~1280-1500 byte range) and is more sensitive to latency than bandwidth, then turn it off. Otherwise, leave it on.
