
Is full-duplex socket use possible with OpenSSL?


Jason Pettiss

Oct 22, 2009, 2:46:44 PM
I have a server which reads/writes a socket independently; that is to say, at the same time (not a request-response model). I note in the FAQ it says I must not allow multiple threads to use an SSL connection, so clearly if my sockets are blocking I cannot support full-duplex traffic (because I cannot call SSL_write while an SSL_read is blocking, for instance).

It's important that I be able to read a packet as soon as one is available, and at the same time, send a packet as soon as I have one to send... I would not want to delay the send until a pending read were complete for example.

I'm uncertain whether placing the socket into non-blocking mode will actually help here: if an SSL_read returns telling me I need to call it again later, is it alright to go ahead and start a new SSL_write operation?

Also I'm wondering if the limitation of not being able to write/read at the same time in blocking mode is easily overcome, for example by preventing re-negotiation (my application is on both ends of the pipe here), or by replacing the read/write BIOs, or by supplying some magical mutex callback function or something.

Thanks for any tips,

Jason Pettiss
jpet...@yahoo.com



David Schwartz

Oct 22, 2009, 7:55:13 PM
Jason Pettiss wrote:

> I have a server which reads/writes a socket independently; that is to
> say, at the same time (not a request-response model). I note in the
> FAQ it says I must not allow multiple threads to use an SSL connection,
> so clearly if my sockets are blocking I cannot support full-duplex
> traffic (because I cannot call SSL_write while an SSL_read is blocking,
> for instance).

> It's important that I be able to read a packet as soon as one is
> available, and at the same time, send a packet as soon as I have one to
> send... I would not want to delay the send until a pending read were
> complete for example.

> I'm uncertain whether placing the socket into non-blocking mode will
> actually help here: if an SSL_read returns telling me I need to call it
> again later, is it alright to go ahead and start a new SSL_write
> operation?

That's not what SSL_read will tell you. SSL_read will tell you that it
cannot make further forward progress until something happens. You can call
SSL_read at any later time you wish. The report that it cannot make forward
progress is just a hint.

The only quirks are with SSL_write. You must set
SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER (unless you are sure your write buffer will
never move). And you must present a consistent data stream to SSL_write. (So
you can't try to send 'FOO', get 1 back, and later try to send anything that
doesn't start with 'OO'.)
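
A minimal sketch of that discipline, assuming a non-blocking socket already
attached to the SSL object (the write_all() helper and its error handling are
illustrative only, not a canonical implementation):

#include <sys/select.h>
#include <openssl/ssl.h>

/* Keep offering the same byte stream to SSL_write() until it is all
 * accepted, waiting on the socket whenever OpenSSL asks us to. */
static int write_all(SSL *ssl, const char *buf, size_t len)
{
    size_t off = 0;

    SSL_set_mode(ssl, SSL_MODE_ENABLE_PARTIAL_WRITE |
                      SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);

    while (off < len) {
        int n = SSL_write(ssl, buf + off, (int)(len - off));
        if (n > 0) {
            off += (size_t)n;   /* never "rewind": the stream stays consistent */
            continue;
        }
        int err = SSL_get_error(ssl, n);
        if (err == SSL_ERROR_WANT_READ || err == SSL_ERROR_WANT_WRITE) {
            fd_set rfds, wfds;
            int fd = SSL_get_fd(ssl);
            FD_ZERO(&rfds);
            FD_ZERO(&wfds);
            if (err == SSL_ERROR_WANT_READ)
                FD_SET(fd, &rfds);
            else
                FD_SET(fd, &wfds);
            if (select(fd + 1, &rfds, &wfds, NULL, NULL) < 0)
                return -1;
            continue;           /* retry with the same arguments */
        }
        return -1;              /* fatal error */
    }
    return 0;
}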



> Also I'm wondering if the limitation of not being able to write/read at
> the same time in blocking mode is easily overcome, for example by
> preventing re-negotiation (my application is on both ends of the pipe
> here), or by replacing the read/write BIOs, or by supplying some
> magical mutex callback function or something.

Blocking mode is way more trouble than it's worth. I would just ditch it,
and all the problems it causes, once and for all. Then never look back.

DS

Darryl Miles

Oct 22, 2009, 8:40:26 PM

But this flag (while documented to the contrary) does nothing inside
libssl. So yes, the documentation says you should set it; prove to me
that OpenSSL behaves in a different way because you set it.

A hint to DS: grep the source tree of OpenSSL and follow all the
code-paths determined by this flag to their conclusion.


>> Also I'm wondering if the limitation of not being able to write/read at
>> the same time in blocking mode is easily overcome, for example by
>> preventing re-negotiation (my application is on both ends of the pipe
>> here), or by replacing the read/write BIOs, or by supplying some
>> magical mutex callback function or something.
>
> Blocking mode is way more trouble than it's worth. I would just ditch it,
> and all the problems it causes, once and for all. Then never look back.


My own thoughts on the Original Poster's comments are:
* The OpenSSL API does indicate the threading issues with your
proposed usage. It is true to say that if you serialize the usage of
any 'SSL *' instance with respect to itself then you will never
experience a usage/threading problem. This is to say that two (or more)
threads can each independently operate the OpenSSL API with _DIFFERENT_
'SSL *' instances at the same time (without regard for one another).

* Now the next question you might want to ask, "is it allowed for
exactly two threads to operate specifically the SSL_read() and
SSL_write() on the _SAME_ 'SSL *' instance at the same time ?" My
understanding would be that the answer is NO. This is a limitation in
the OpenSSL library, since some of the shared parts of 'SSL *' have no
protection and the SSL_read() and SSL_write() code-paths have not been
audited/reworked to minimize the contention/data-race issues.

However this does not exclude the use of OpenSSL for full-duplex operations.

You need to separate your 3 concerns:

* The desire to process incoming data as soon as possible.
* The desire to send outgoing data as soon as possible.
* The desire to have your application go to sleep when neither of the
above is possible and the desire for your operating system to wake up
your application as soon as some condition changes which _MIGHT_ make it
possible for one of the first 2 points (read/write) to take place now.


The 'read' case)
Well this is already covered in both blocking and non-blocking usage,
your application gets back control (to process data) as soon as data can
be processed.

The 'write' case)
Well this is already covered in both blocking and non-blocking usage,
your application gets back control (to create more data to send) as soon
as the layer below OpenSSL (usually the OS kernel buffering) has stored it.

The 'sleep/wakeup' mechanism)
Well this is clearly an issue of blocking versus non-blocking. There
is a clear case that you _MUST_ use non-blocking IO here (this goes
further than Mr Schwartz's comments). The reason you must use non-blocking
IO is that in order to satisfy concerns 1 and 2 you cannot possibly let
the operating system block your application from having control of the
'SSL *', because (if you remember from the second comment I made right at
the start) the OpenSSL API does not let you operate SSL_read() and
SSL_write() on the _SAME_ 'SSL *' instance at the same time. So if some
other thread is stuck asleep in the middle of using an 'SSL *' then
it is unsafe for you to use it from another (unblocked) thread.
So to me there is no clear way to use blocking IO once all the facts
are considered with your intended usage and your design criteria.

The only other comment I can make is that both the SSL_read() and
SSL_write() calls have a soft-error return for when no further work
(progress) can be made. It is at this point you perform your 'sleep'
function and indicate to the OS which events you want that sleep to be
woken by.

This is based on your application's intent: usually an application is
always ready to read in more data (though internal buffering and memory
exhaustion considerations should be made), so it usually indicates to
the OS that it should be woken up if more data is available to read.

Your application then also has to evaluate its intent to send data; you
don't always have something more to send. If you do, then you need to
indicate to the OS that you should be woken up when you can push more data
down into the kernel buffer.

You then call your OS sleep function with the appropriate wakeup events
(and possible maximum timeout).

You can then keep looping around this basic IO sleep/wake cycle.
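
A compressed sketch of one pass of that cycle, assuming a non-blocking
socket; the have_data_to_send()/next_chunk() helpers are hypothetical
stand-ins for whatever outgoing queue the application keeps, and
fatal-error/shutdown handling is omitted:

#include <sys/select.h>
#include <openssl/ssl.h>

/* Hypothetical application-side queue helpers. */
extern int have_data_to_send(void);
extern const char *next_chunk(void);
extern int next_chunk_len(void);

void pump_once(SSL *ssl, int fd)
{
    char inbuf[4096];
    int want_read = 0, want_write = 0, n, err;

    /* Concern 1: process incoming data as soon as possible. */
    n = SSL_read(ssl, inbuf, (int)sizeof(inbuf));
    if (n > 0) {
        /* hand inbuf[0..n) to the application */
    } else {
        err = SSL_get_error(ssl, n);
        if (err == SSL_ERROR_WANT_READ)
            want_read = 1;
        else if (err == SSL_ERROR_WANT_WRITE)
            want_write = 1;
    }

    /* Concern 2: send outgoing data as soon as possible. */
    if (have_data_to_send()) {
        n = SSL_write(ssl, next_chunk(), next_chunk_len());
        if (n > 0) {
            /* advance the outgoing queue by n bytes */
        } else {
            err = SSL_get_error(ssl, n);
            if (err == SSL_ERROR_WANT_READ)
                want_read = 1;
            else if (err == SSL_ERROR_WANT_WRITE)
                want_write = 1;
        }
    }

    /* Concern 3: sleep until the OS says progress might be possible. */
    if (want_read || want_write) {
        fd_set rfds, wfds;
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        if (want_read)
            FD_SET(fd, &rfds);
        if (want_write)
            FD_SET(fd, &wfds);
        select(fd + 1, &rfds, &wfds, NULL, NULL);
    }
}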


Darryl

David Schwartz

Oct 23, 2009, 4:57:43 AM

Darryl Miles wrote:

> But this flag (while documented to the contrary) does nothing inside
> libssl. So yes the documentation says you should set it, prove to me
> that OpenSSL behaves in a different way because you set it.

One of the biggest downsides of open source software is that it encourages
people to code to what something happens to do rather than what it's
guaranteed to do.



> A hint to DS: grep the source tree of OpenSSL and follow all the
> code-paths determined by this flag to their conclusion.

Software development doesn't work that way. That's how you produce code that
suddenly fails mysteriously when you upgrade an unrelated component. The
first rule of software development is "thou shall not assume that something
that happens a particular way is guaranteed to do so, especially when the
documentation specifically warns that it is not".

> Now the next question you might want to ask, "is it allowed for
> exactly two threads to operate specifically the SSL_read() and
> SSL_write() on the _SAME_ 'SSL *' instance at the same time ?" My
> understanding would be that the answer is NO. This is a limitation in
> the OpenSSL library, since some of the shared parts of 'SSL *' have no
> protection and the SSL_read() and SSL_write() code-paths have not been
> audited/reworked to minimize the contention/data-race issues.

This is how everything else works; it's odd to say it's somehow a limitation
of OpenSSL that it works the same way everything else does. Try to read from
a string in one thread while you write to it from another. The general rule
of thread synchronization is that it is your responsibility to serialize
access to the same object from concurrent threads and the library's job to
synchronize accesses to distinct objects. OpenSSL follows this general rule.

Kernel objects are the exception, only because we cannot allow a program
(broken or valid) to screw up kernel objects. So the kernel has no choice
but to "overserialize".

> Your application then also has to evaluate its intent to send data, you
> don't always have something more to send. If you do then you need to
> indicate to the OS to wake me up if I can push more data down into the
> kernel buffer.

No, that is not how OpenSSL works. When you want to send data, you simply
call SSL_write. You only check if I/O is possible if OpenSSL specifically
tells you to. (OpenSSL may need to do something other than write to send the
data, for example, it may need to read renegotiation data.)

The other gotcha is that if you use separate read and write threads, you
*must* remember that an SSL connection only has one state. You cannot
independently maintain your own state in each thread, or you can deadlock.
This is a major cause of SSL deadlocks in "two thread" applications that run
their threads independently.

Here's the nightmare scenario:

1) You are at a point in the protocol where the other side will not send
anything unless we send something first. However, we try to read just in
case it sends something.

2) You call SSL_write from your write thread trying to send the data that
will keep the application protocol going, but a renegotiation is in progress
and no data has been received yet. You get WANT_READ.

(At this point, the SSL connection's one and only status is "want read to
send".)

3) The renegotiation data is received, but no application data is received.

4) You call SSL_read from your read thread (either just to try it, or
because you get a 'select' hit from the renegotiation data being received;
it doesn't matter). The OpenSSL library reads the renegotiation data, but no
application data is available. You get WANT_READ, since application data
needs to be received to make forward progress.

(At this point, the SSL connection's one and only status is "want read to
receive". Note that the read thread's actions *invalidate* the state the
write thread thinks it's in.)

5) Your write thread, having no idea that the read thread received a
different status, stupidly thinks it cannot make forward progress based on
the state it got from step 2 (since that's the last thing *it* did).
However, it *can* make forward progress (because another thread changed the
SSL state).

6) Now the other end is waiting for you to send data, and you are waiting to
receive the renegotiation data you already received.

You see, in step 5, the write thread *must* know that the read thread
changed the SSL connection's status. Otherwise you deadlock.
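
One way to keep that single status visible to both threads is to funnel
every call on the 'SSL *' through small helpers that record the last result
in shared state; a sketch only, with the wait/retry logic and error handling
left out:

#include <pthread.h>
#include <openssl/ssl.h>

struct shared_ssl {
    SSL *ssl;
    pthread_mutex_t lock;
    int last_err;   /* SSL_ERROR_NONE, SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE */
};

/* Both the read thread and the write thread call these helpers, so the
 * connection's single WANT_* status lives in one place; neither thread
 * blocks on a private, possibly stale copy of it. */
int shared_read(struct shared_ssl *s, void *buf, int len)
{
    pthread_mutex_lock(&s->lock);
    int n = SSL_read(s->ssl, buf, len);
    s->last_err = (n > 0) ? SSL_ERROR_NONE : SSL_get_error(s->ssl, n);
    pthread_mutex_unlock(&s->lock);
    return n;
}

int shared_write(struct shared_ssl *s, const void *buf, int len)
{
    pthread_mutex_lock(&s->lock);
    int n = SSL_write(s->ssl, buf, len);
    s->last_err = (n > 0) ? SSL_ERROR_NONE : SSL_get_error(s->ssl, n);
    pthread_mutex_unlock(&s->lock);
    return n;
}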

DS

Darryl Miles

Oct 23, 2009, 10:07:01 AM
David Schwartz wrote:
> Darryl Miles wrote:
>
>> But this flag (while documented to the contrary) does nothing inside
>> libssl. So yes the documentation says you should set it, prove to me
>> that OpenSSL behaves in a different way because you set it.
>
> One of the biggest downsides of open source software is that it encourages
> people to code to what something happens to do rather than what it's
> guaranteed to do.

Can I please see your "working" (i.e. white paper) on your conclusion
(or that of someone else whose conclusion you are merely relaying here) that
this issue is more dominant where "open source" is involved? My gut says
it doesn't agree with you on that statement.

Sure, such an issue exists, but just because things are "open source"
doesn't increase it. Let's call it "code by observation".

>> A hint to DS: grep the source tree of OpenSSL and follow all the
>> code-paths determined by this flag to their conclusion.
>
> Software development doesn't work that way. That's how you produce code that
> suddenly fails mysteriously when you upgrade an unrelated component. The
> first rule of software development is "thou shall not assume that something
> that happens a particular way is guaranteed to do so, especially when the
> documentation specifically warns that it is not".

But there is no "master grand plan for the future" on implementing this
point. At best there once was one, but that plan was then found
unnecessary, or was abandoned.

So this is a call to all active developers on OpenSSL: what exactly is
your plan for the future of SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER? Can the
open-source community please have an online document outlining the idea,
the concept and the timescales of your intentions?

Let's give it a month to come up with such a plan (or at least to pipe up
that more time is needed to produce one) before this relic is earmarked
for removal, especially in the shadow of OpenSSL version 1.0.


The existing documentation isn't very clear; it doesn't sufficiently
cover what this flag means:
* by citing a good example and a bad example, with an explanation of
which rule(s) is/are broken
* a detailed statement of rules (when to use and when not to use)
* a detailed explanation of scope (the things this flag cannot fix
but users might think it fixes)

I can assist anybody wishing to write up such documentation with some
unclear situations which I do not think the existing documentation
adequately covers.

But first let's hear the future plan to code something actually needing
it; otherwise I would like to see my
SSL_ACCEPT_MANNED_SPACE_FLIGHT_TO_PLUTO added to OpenSSL please.


Since there is no one in the OpenSSL developer community standing up for
this flag (from a specification and coding point of view, with an
intention to finish the work relating to it), it should be removed to make
users' lives easier. This does not mean such a flag can never go in, but
the merits of it can be discussed at a later date (once a body of code is
available to require it).


Darryl

Darryl Miles

Oct 23, 2009, 10:47:51 AM
David Schwartz wrote:
> Darryl Miles wrote:
>
> This is how everything else works, it's odd to say it's somehow a limitation
> of OpenSSL that it works the same way everything else works. Try to read to
> a string in one thread while you write to it from another. The general rule
> of thread synchronization is that it is your responsibility to serialize
> access to the same object from concurrent threads and the library's job to
> synchronize accesses to distinct objects. OpenSSL follows this general rule.
>
> Kernel objects are the exception, only because we cannot allow a program
> (broken or valid) to screw up kernel objects. So the kernel has no choice
> but to "overserialize".

FYI modern kernels do not need to serialize (let alone "overserialize",
whatever that means, is that a computer science term?). I.e. the read()
and write() code paths for the same file-descriptor/handle can be called
simultaneously from two or more threads without any harm to the kernel.
Sure, fine-grained serialization of the workings inside the kernel
might take place, but that is an implementation detail, irrelevant to the
contract the kernel API provides its users.

This is merely a result of prudent multi-threaded coding inside the
kernel, presumably as a result of a "performance-centric usage case" that
customers/users want.

I advocate that some users would find it useful to be able to invoke
SSL_read() and SSL_write() from exactly two threads on the same 'SSL *'
simultaneously. There is merit in this, and as things stand OpenSSL
does not allow it due to a design choice (aka "design limitation").

I do not advocate that expanding the above scope to more than two
threads, or to two threads both calling SSL_write() or both calling
SSL_read(), would be useful. I see no merit in that (one factor is that
the nature of SSL/TLS is that a sequence of packets is serialized on the
wire and the checksums/state from the last packet influence the encoding
of the next; this is part of the "tamper-proof security" provided by
SSL/TLS, so there is no case for parallelizing).

There is no reason why OpenSSL could not allow two-threaded operation if
it were designed differently. So I stand by my usage of the word
"limitation".

>> Your application then also has to evaluate its intent to send data, you
>> don't always have something more to send. If you do then you need to
>> indicate to the OS to wake me up if I can push more data down into the
>> kernel buffer.
>
> No, that is not how OpenSSL works.

Who was talking about OpenSSL here? "Your application ...." is the
clue here; see if you can get a clue and try reading it again in the
context of the topic being discussed.

Victor Duchovni

Oct 23, 2009, 11:04:41 AM
On Fri, Oct 23, 2009 at 03:47:51PM +0100, Darryl Miles wrote:

> I advocate that some users would find it useful to be able to invoke
> SSL_read() and SSL_write() from exactly two threads on the same 'SSL *'
> simultaneously. There is merit in this and as things stands OpenSSL does
> not allow it due to a design choice (aka "design limitation").

You are mistaken. There are no message boundaries, and multiple threads
reading and writing the same SSL session would get random fragments of
the remote data on read, and emit random fragments of data on write.

There is no sensible use-case for concurrent multiple thread access
to an SSL object. All access must be serialized to ensure remotely
reasonable semantics.

--
Viktor.

Jason Pettiss

Oct 23, 2009, 11:50:38 AM
> > Now the next question you might want to ask, "is it allowed for
> > exactly two threads to operate specifically the SSL_read() and
> > SSL_write() on the _SAME_ 'SSL *' instance at the same time ?" My
> > understanding would be that the answer is NO. This is a limitation in
> > the OpenSSL library, since some of the shared parts of 'SSL *' have no
> > protection and the SSL_read() and SSL_write() code-paths have not been
> > audited/reworked to minimize the contention/data-race issues.
>
> This is how everything else works, it's odd to say it's somehow a limitation
> of OpenSSL that it works the same way everything else works. Try to read to
> a string in one thread while you write to it from another.

I think we've lost the point: if I write to a socket from more than one thread at a time, clearly I've messed up. Even if the operating system doesn't complain, my stream is nonsense (unless I only ever write a single byte at a time).

However, it's clearly alright to read a socket from one thread while writing a socket from another: indeed, this is the purpose of a socket. That OpenSSL doesn't allow this usage seems like a limitation of the library. (Although maybe it's actually of the TLS protocol itself...?)

> The other gotcha is that if you use separate read and write threads, you
> *must* remember that an SSL connection only has one state. You cannot
> independently maintain your own state in each thread, or you can deadlock.
>
> You see, in step 4, the write thread *must* know that the read thread
> changed the SSL connection's status. Otherwise you deadlock.

Your explanation here is excellent. If I understand it correctly, it's not really the problem of multiple access to a shared buffer (which would understandably cause corruption); it's that there's a single flag which indicates the 'direction', if you will, of the SSL structure itself:

#define SSL_ERROR_WANT_READ 2
#define SSL_ERROR_WANT_WRITE 3

And since these are defined in such a way that you can't have both READ|WRITE at the same time, if I don't somehow externally remember this information and share it between my threads I could run into trouble.

Ok, so to summarize you, Dave and Darryl: blocking sockets + OpenSSL will only work for a request-response model without redesigning the library itself, because external synchronization deadlocks (for obvious reasons) and no synchronization also deadlocks, because the library/application no longer know what needs to happen to make forward progress.

Forgive me if I misunderstand either of you, but it sounds like if I use non-blocking sockets, I'll be able to use but a single thread to both push & pull independent streams of data, and I don't have to wait for an interrupted write to complete in order to begin a new read, or vice versa, so long as I remember the actual WANT_* state of each stream.

I'd been warned away from non-blocking socket use in OpenSSL by the various searches I did across this mailing list, but honestly I'd actually prefer to use them.

To make sure I'm clear on this: if I myself don't have any data to read and an SSL_write returns WANT_READ, that doesn't mean I myself need to call SSL_read-- what it means is I need to wait until the socket is readable, and then call SSL_write again (with the same args of course).

It'd be awesome if there was a 'canonical' example for this... I've read through several different applications using OpenSSL (stunnel, Ice, curl) but they're so heavily hacked up to overcome various system limitations / implementation needs that it's not entirely obvious what's going on.

Guess I'll go make that example now. :)

Thanks much,

--jason

Victor Duchovni

Oct 23, 2009, 12:08:52 PM
On Fri, Oct 23, 2009 at 08:50:38AM -0700, Jason Pettiss wrote:

> However, it's clearly alright to read a socket from one thread while
> writing a socket from another: indeed, this is the purpose of a socket.
> That OpenSSL doesn't allow this usage seems like a limitation of the
> library. (Although maybe it's actually of the TLS protocol itself...?)

SSL is a state-machine, not a pipe. Reading data may require writes, and
writing data may require reads (e.g. when re-negotiating). If you want
to write and read as data arrives in either direction, don't block, and
enter the state machine to move data in either direction as data arrives.

--
Viktor.

Jason Pettiss

Oct 23, 2009, 12:15:35 PM
> > I advocate that some users would find it useful to be able to invoke
> > SSL_read() and SSL_write() from exactly two threads on the same 'SSL *'
> > simultaneously. There is merit in this and as things stands OpenSSL does
> > not allow it due to a design choice (aka "design limitation").
>
> You are mistaken. There are no message boundaries, and multiple threads
> reading and writing the same SSL session would get random fragments of
> the remote data on read, and emit random fragments of data on write.
>
> There is no sensible use-case for concurrent multiple thread access
> to an SSL object. All access must be serialized to ensure remotely
> reasonable semantics.

Alright, here's a simple use case: I have a large file here, you have a large file there. We'd like to trade them. We have two independent streams available (one from me to you, one from you to me). A socket, in other words.

We could take turns sending discrete pieces of each file but that's silly and slow.

Assuming we can load these gigantic files into memory to make the example simpler, we could both do this to write:

char* p = entire_file_buffer;
char* e = p + size_of_file;
while (p != e) {
    ssize_t n = send(sock_fd, p, e - p, 0);
    if (n < 0) return ERR;
    p += n;
}

And we both do this to read:

char* p = entire_file_buffer;
char* e = p + size_of_file;
while (p != e) {
    ssize_t n = recv(sock_fd, p, e - p, 0);
    if (n <= 0) return ERR;   /* 0 means the peer closed early */
    p += n;
}

It's simple, uses two threads, one socket, and makes the best use of our bandwidth.

So I'm hoping it is your misunderstanding actually, that you thought we were suggesting two different threads should be able to write the same SSL* at the same time, or that two different threads be able to read the same SSL* at the same time, which clearly doesn't make sense for a stream-based protocol. We weren't suggesting that.

We were suggesting that it would be really, really nice if the example above could have send replaced with SSL_write and recv replaced with SSL_read and it would just work. :)

--jason

Victor Duchovni

Oct 23, 2009, 12:19:47 PM
On Fri, Oct 23, 2009 at 09:15:35AM -0700, Jason Pettiss wrote:

> We could take turns sending discrete pieces of each file but that's silly and slow.
>
> Assuming we can load these gigantic files into memory to make the example simpler, we could both do this to write:

It is possible to use non-blocking SSL_read() SSL_write() calls that
are interleaved, but not without a mutex or a separate thread that
owns all SSL I/O that consumes requests to read/write.

It is simpler to use two SSL connections. SSL is a state-machine, not a pipe.

--
Viktor.

Jason Pettiss

Oct 23, 2009, 12:34:22 PM
> It is possible to use non-blocking SSL_read() SSL_write() calls that
> are interleaved, but not without a mutex or a separate thread that
> owns all SSL I/O that consumes requests to read/write.
>
> It is simpler to use two SSL connections. SSL is a state-machine, not a pipe.

Awesome the former suggestion fits my needs exactly: I have one thread that's gotta manage N sockets for both read & write and it's pretty agnostic about the data itself: just wants to push it along. I wasn't sure if it was ok to interleave but the confirmation is very nice to have.

Can I use two SSL connections over a single socket? That doesn't seem possible. How are the SSL connections going to synchronize use of that socket?

Two unidirectional sockets is my last resort here... in my experience unidirectional traffic is horrible for latency and without disabling TCP_NODELAY, it kills your throughput (assuming you're passing smallish messages).

--jason

Victor Duchovni

Oct 23, 2009, 1:15:22 PM
On Fri, Oct 23, 2009 at 09:34:22AM -0700, Jason Pettiss wrote:

> > It is possible to use non-blocking SSL_read() SSL_write() calls that
> > are interleaved, but not without a mutex or a separate thread that
> > owns all SSL I/O that consumes requests to read/write.
> >
> > It is simpler to use two SSL connections. SSL is a
> > state-machine, not a pipe.

Two SSL connections over two sockets of course. Unless you want to
implement a stream multiplexor between TCP and SSL. Then you could
indeed build two SSL objects, one for each logical direction of
data transfer. You can do nifty things with BIO pairs, but building
multiple streams over TCP is probably too much complexity for what you want.



> Two unidirectional sockets is my last resort here... in my
> experience unidirectional traffic is horrible for latency
> and without disabling TCP_NODELAY,
> it kills your throughput (assuming you're passing smallish messages).

If you are proxying an interactive protocol, you need to do it over a
single socket to avoid Nagle delays (or set TCP_NODELAY, which is fine
if you never send small packets unnecessarily).

If you are moving large files in two directions, just avoid writes that
don't fill the socket buffer.

--
Viktor.

David Schwartz

Oct 23, 2009, 3:11:06 PM

Darryl Miles wrote:

> > Kernel objects are the exception, only because we cannot allow a program
> > (broken or valid) to screw up kernel objects. So the kernel has no choice
> > but to "overserialize".

> FYI modern kernel's do not need to serialize (let alone "overserialize",
> whatever that means, is that a computer science term?). I.e. the read()
> write() code paths for the same file-descriptor/handle can be called
> simultaneously from two or more threads without any harm to the kernel.

The kernel must be designed such that a non-privileged application can do
anything, even things that don't make logical sense, without harm to the
kernel. So the kernel has to handle even cases that make no sense at all,
such as two concurrent multi-byte 'write' operations to the same TCP socket.
It does this by extensive internal synchronization code that would normally
not be required.

Because OpenSSL doesn't have this issue, there is no reason it should have
that type of synchronization. As has already been pointed out in this
thread, it is perfectly fine if OpenSSL crashes if there are two concurrent
SSL_write calls to the same SSL connection. There is no sensible reason to
do that, and OpenSSL has nothing to defend (like the kernel does). So making
this work would be overserialization -- locking just to "permit" what is not
sane anyway.

> Sure fine grained serialization of the workings inside the kernel
> might take place, but thats is implementation detail, irrelevant to the
> contract the kernel API provides its users.

The contract the kernel API provides is that nothing the user does can mess
the kernel up, even if the user does something insane.



> This is merely a result of prudent multi-threaded coding inside the
> kernel presumably as a result of a "performance centric usage case"
> that
> customers/users want.

No, it's kernel self-defense. User-space libraries generally do not have
that kind of self defense. Try to read from a string in one thread while you
write to it in another and see what happens.



> I advocate that some users would find it useful to be able to invoke
> SSL_read() and SSL_write() from exactly two threads on the same 'SSL *'
> simultaneously. There is merit in this and as things stands OpenSSL
> does not allow it due to a design choice (aka "design limitation").

Right, but it's due to the fact that OpenSSL is like pretty much every other
thread-safe library. It doesn't permit concurrent access to the same object
from multiple threads unless that's a pure read access that doesn't change
any state. The lack of a feature that would be atypical of libraries anyway is
not a design flaw or an unusual quirk.

> There is no reason why OpenSSL can not allow two threaded operation if
> it were designed differently. So I stand by my usage of the word
> "limitation".

Fine, it's a limitation of OpenSSL that it's like pretty much every other
thread-safe, user-space library.



> >> Your application then also has to evaluate its intent to send data, you
> >> don't always have something more to send. If you do then you need to
> >> indicate to the OS to wake me up if I can push more data down into the
> >> kernel buffer.
>
> > No, that is not how OpenSSL works.
>
> Who was talking about OpenSSL here ? "Your application ...." is the
> clue here, see if you can get a clue, try reading it again in context
> was the topic being discussed.

You were talking about how an application interacts with OpenSSL (look back
two paragraphs from the one you quoted). And that's not how an application
interacts with OpenSSL. You do not go to the OS when you want to do
something, like you would with TCP.

An application that wants to write data to an SSL connection calls SSL_write
whether or not it is possible to send data on the underlying SSL connection.
An application that wants to read data from an SSL connection calls SSL_read
whether or not there's data available to be read on the socket.

As I explained, operating as you describe will cause deadlocks. The data you
are waiting for may have already arrived and been processed by OpenSSL. An
OpenSSL-using application should not try to "look through" the SSL state
machine except when told to look at the socket by OpenSSL (by
WANT_READ/WANT_WRITE indications).

And, to be helpful, I would suggest that the simplest solution for your
application, assuming it doesn't need to handle large numbers of SSL
connections, would be to wrap the SSL connection in a service thread. That
service thread would have its own read/write state machine that tracks the
SSL state machine, issues SSL_read/SSL_write operations, blocks on the
socket when told to do so by OpenSSL, and so on. That way, you can emulate
blocking read/write operations if you want (blocking until the service
thread wakes you).

DS

Darryl Miles

Oct 24, 2009, 5:17:35 AM
Victor Duchovni wrote:
> SSL is a state-machine, not a pipe. Reading data may require writes, and
> writing data may require reads (e.g. when re-negotiating). If you want
> to write and read as data arrives in either direction, don't block, and
> enter the state machine to move data in either direction as data arrives.

"not a pipe" is a little ambiguous. The generally accepted meaning of a
pipe is a single direction of data flow.

What we are talking about is a bidirectional-pipe (other people just
call this a 'socket' to differentiate it from a "pipe").


I don't interpret Jason's comments as implying that "SSL is a pipe". At
no point has Jason's problem been about only wanting a single direction
of data flow (without requirement for data to be flowing in the other
direction). Please read the original post again.


Hey did you know that TCP is a state-machine too. I bet you did. Hey
reading data might require writes too, in TCP that is, for example I
can't read any more new application data because the other end keeps
sending me the same data block over and over, so I must write an ACK so
that it sends me some new application data to process.

These matters have absolutely nothing to do with how application threads
of execution are provided an API to do the business. This is all down
to design rules and implementation.


Darryl

Darryl Miles

Oct 24, 2009, 5:51:10 AM

The issue is down to the OpenSSL API thread-safety rules (which are
dictated to by the internal design of OpenSSL).

I covered those thread-safety rules in a previous posting.

Yes the common application design pattern for full-duplex SSL streams is
to only ever have one thread doing the work on a specific instance of
'SSL *' at any one time.

Given your application design requirements you indicated in your
original posting then in order to achieve all your goals you must use
the kernel socket in non-blocking mode.

The reasons why were explained in a previous reply of mine.

There is no reason why you should be warned away from using OpenSSL with
non-blocking sockets. But you have to understand that multi-threaded
programming is hard, and many programmers have had difficulty in the past
simply from not understanding the concepts correctly.


Jason Pettiss wrote:
> To make sure I'm clear on this: if I myself don't have any data to read and an SSL_write returns WANT_READ, that doesn't mean I myself need to call SSL_read-- what it means is I need to wait until the socket is readable, and then call SSL_write again (with the same args of course).

Okay, this is a new question. Yes, if you call SSL_write() and get back
-1/WANT_READ then you do need to call SSL_read() to unstall that
situation. In fact SSL_peek() might be better to use if you have nowhere
to put the application data right now but you want to attempt to see if
the condition can be resolved by the next SSL protocol packet to be
processed. Obviously if application data exists and is in the way,
calling SSL_peek() won't clear the SSL_write() stall. You must
SSL_read() that data so the OpenSSL library can get to the final part of
the renegotiation handshake packet.

I do not believe the SSL_write() call is allowed to access the
underlying BIO/kernel-socket to read in more data. I think SSL_write()
is allowed to process any data already read into buffer (from kernel to
OpenSSL library internal buffer) in an attempt to unstall the situation
itself. But it can't invoke read() on the kernel for it.

Due to this you have to call SSL_read()|SSL_peek() at least once, since
these calls are allowed to access the underlying BIO/kernel-socket to
attempt to read() in more data.


So once you observe an SSL_write() returning -1/WANT_READ you should
immediately attempt an SSL_read()|SSL_peek(), and if that also returns
-1/WANT_READ then you can go to sleep and wait for more data to come in
(wait until the socket is readable).

When that data comes in you call SSL_read()|SSL_peek(), and if that
doesn't return -1/WANT_READ then you should give your SSL_write() another
try. From memory I think SSL_read()|SSL_peek() returns 0 (i.e. no new
application data) at least once to eat up the final part of the
renegotiation handshake process.

But it doesn't hurt to always call SSL_write() after every
SSL_read()|SSL_peek() if you know that you are under this special
condition (that SSL_write() previously returned -1/WANT_READ). Once
your SSL_write() returns something other than -1/WANT_READ you can clear
this special condition.
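
A rough sketch of that SSL_write()/WANT_READ recovery sequence, assuming a
non-blocking socket; where the drained application data ends up (inbuf here)
is up to the application, and error handling is abbreviated:

#include <sys/select.h>
#include <openssl/ssl.h>

int unstall_write(SSL *ssl, int fd, const char *out, int outlen)
{
    char inbuf[4096];
    int n, err;

    for (;;) {
        n = SSL_write(ssl, out, outlen);        /* always the same arguments */
        if (n > 0)
            return n;                           /* stall cleared */
        if (SSL_get_error(ssl, n) != SSL_ERROR_WANT_READ)
            return -1;                          /* some other failure */

        /* Let the library see the next incoming record; any application
         * data pulled out here must be handed up to the application. */
        n = SSL_read(ssl, inbuf, (int)sizeof(inbuf));
        if (n > 0)
            continue;                           /* deliver inbuf, retry write */

        err = SSL_get_error(ssl, n);
        if (err != SSL_ERROR_WANT_READ)
            return -1;

        /* Nothing to process yet: sleep until the socket is readable. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
            return -1;
    }
}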


Now the same is true in reverse. The special condition that is
SSL_read() returning -1/WANT_WRITE. I'm sure you can work out the
details on this.

David Schwartz

Oct 24, 2009, 7:02:44 PM
Darryl Miles wrote:

> I do not believe the SSL_write() call is allowed to access the
> underlying BIO/kernel-socket to read in more data. I think SSL_write()
> is allowed to process any data already read into buffer (from kernel to
> OpenSSL library internal buffer) in an attempt to unstall the situation
> itself. But it can't invoke read() on the kernel for it.

If SSL_write has to read from the socket to make forward progress, there is
absolutely no reason it shouldn't just do so. There is no reason it should
compel the application to do it.

My documentation says:

[T]he return value of SSL_write() will yield SSL_ERROR_WANT_READ or
SSL_ERROR_WANT_WRITE. As at any time a re-negotiation is possible, a
call to SSL_write() can also cause read operations! The calling process
then must repeat the call after taking appropriate action to satisfy
the needs of SSL_write(). The action depends on the underlying BIO.
When using a non-blocking socket, nothing is to be done, but select()
can be used to check for the required condition. When using a buffering
BIO, like a BIO pair, data must be written into or retrieved out of the
BIO before being able to continue.

This suggests the exact opposite of what you said. One of these sources is
right and the other is wrong, and it makes a huge difference which!

My understanding, for many years, coincides with this documentation.
However, I can't think of any specific case where this difference would have
affected me, as my coding is extremely defensive and would tolerate either
mechanism without a problem.

DS

Darryl Miles

Oct 26, 2009, 2:03:54 AM

"One of these sources is right and the other is wrong" ... Yes, no,
maybe... You maybe correct in the detail here, I am going on my
hazy-memory of experimenting with this situation and the observable
behavior. But I never wrote up notes on the matter not saw fit to
improve the documentation.

My conclusions on it were that an SSL_write() can cause a packet decode
to complete but only:
* If the data for the entire packet has already been read() into the
SSL user-space buffer (i.e. no longer in the BIO/kernel). The
read-ahead optimization makes it possible for this to happen.
* If there is no application data waiting to be destructively removed
ahead of the re-negotiation packet. i.e. SSL_read(). Until all
application data has been sunk/removed from OpenSSL it won't decode the
next packet.

My memory on this was that SSL_write() itself won't call on the BIO to
perform a read() but it will attempt to decode the next incoming packet
from the data it may already have, this is in the hope that it turns out
to be the re-negotiation response (in many situations it gets lucky!).
If it decodes the next packet and it turns out to be incoming
application data then SSL_write() is stuffed! No amount of calling it
again will clear the -1/WANT_READ condition.

The largest part of my previous post was explaining how to handle the
situation generally; calling SSL_read() and then re-trying SSL_write() to
see if the condition has cleared is the way to deal with it. You cannot
rely on repeatedly calling SSL_write() alone to clear the problem, which
was my interpretation of what Jason was asking.

To re-express the same thing another way:

SSL_write() calls can not by-pass the already in-progress inbound
application data (to get at the re-negotiation response packet
immediately). There is a possibility there is still some application
data waiting to be SSL_read() before the re-negotiation SSL protocol
packet can be seen, decoded and processed.

Imagine the re-negotiation SSL protocol packet is actually still inside
the kernel buffering (waiting for user-space to read() to pull it). Now
imagine that there are at least 2 large full-size application data packets
also spanned across the user-space and kernel buffers (ahead of the SSL
re-negotiation packet).

SSL_write() has nowhere to put the data once it has decoded a large
full-sized application data packet. Inside OpenSSL there is a rigid
buffering scheme: there is a decode buffer into which the encrypted
packet is read in from the BIO/kernel, and there is also a clear-text
buffer into which the resultant application data from a single packet
decode is stored. The decode/decryption process only takes place if the
clear-text buffer is empty (i.e. user-space has SSL_read() all the
previous data from it, so the library will attempt to pull in more data
and re-fill it).

It is for sure that OpenSSL doesn't have an infinitely expandable memory
buffer to keep holding application data to allow SSL_write() to find the
re-negotiation packet. So it is the worst-case scenario I have in mind
when explaining how to handle the matter in my previous post.

The documentation could certainly be improved no matter what the correct
way to express the situation is. The docs were written to support the
implementation (not the other way around).


Darryl

Konstantin Ivanov

Oct 26, 2009, 2:43:30 PM
Hi all, 

I am developing a server application which is based on Windows IO Completion ports, which basically means that the reads and writes to the socket are asynchronous. This also means that I cannot use the SSL_read and SSL_write functions, which are tied to the socket fd if I am correct. So I tried to use BIO_read and BIO_write, but I am having difficulty in using them. Basically what I would like to do is to read the content passed from the client over the SSL connection into a buffer, which I can decrypt, parse, and then issue another read command on the completion port. For send, I would like to write data into an encrypted buffer and then post a send command to the completion port with the pointer to the encrypted data. Can someone please comment on how I could implement such functionality, as I believe I am using BIO_read and BIO_write incorrectly (this was the tutorial that I referred to: http://h71000.www7.hp.com/doc/83final/ba554_90007/ch04s03.html).

Thanks, 

Kyle Hamilton

Oct 26, 2009, 3:30:21 PM
My understanding is that if SSL_ERROR_WANT_WRITE happened with
SSL_read(), the next SSL_read() would actually call write() to make
the forward progress.

-Kyle H

Darryl Miles

Oct 26, 2009, 7:13:23 PM
Kyle Hamilton wrote:
> My understanding is that if SSL_ERROR_WANT_WRITE happened with
> SSL_read(), the next SSL_read() would actually call write() to make
> the forward progress.

Yes that is possible, as the data for the write is already inside the
OpenSSL library. In fact all the write to the BIO/kernel does (in this
case) is push already-encrypted data that had been prepared (but not
written to the BIO/kernel) by a previous SSL_write(). It doesn't actually
prepare any new application data for encryption; this is what I called
an attempt to "flush" the data downwards (when I discussed
SSL_shutdown() issues in that bug that has now been fixed).

I suggested the SSL_read() with -1/WANT_WRITE special condition would be
handled the same way as the reverse, since that is an easy way for a new
user to understand.

This situation is also rarer to observe, but easiest to get right.

New users should deal with the SSL_write() with -1/WANT_READ first since
that special condition has a few more caveats to it.
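
A sketch of that mirror-image case (SSL_read() reporting -1/WANT_WRITE),
again assuming a non-blocking socket and with error handling abbreviated:

#include <sys/select.h>
#include <openssl/ssl.h>

int read_with_retry(SSL *ssl, int fd, char *buf, int len)
{
    for (;;) {
        int n = SSL_read(ssl, buf, len);
        if (n > 0)
            return n;                 /* got application data */

        int err = SSL_get_error(ssl, n);
        fd_set rfds, wfds;
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        if (err == SSL_ERROR_WANT_WRITE) {
            /* OpenSSL needs to flush pending handshake bytes first:
             * wait until the socket is writable, then simply call
             * SSL_read() again. */
            FD_SET(fd, &wfds);
        } else if (err == SSL_ERROR_WANT_READ) {
            FD_SET(fd, &rfds);
        } else {
            return -1;                /* fatal error or clean shutdown */
        }
        if (select(fd + 1, &rfds, &wfds, NULL, NULL) < 0)
            return -1;
    }
}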

Darryl Miles

Oct 26, 2009, 7:22:04 PM

BIOs should be used for overlapped IO. Your BIO layer is responsible
for allocating and pinning chunks of memory while the OS has the IO in
progress and then getting IO completion signals and
unpinning/deallocating that memory.

Like all good programs your BIO should track the total amount of memory
in use by a single socket and place arbitrary limits so that the correct
soft-error returns can be provided to effect flow-control.


Of course you _CAN_ still use SSL_read() and SSL_write(). Those two
functions are for managing clear-text (aka application data) in relation
to the SSL communications stream. The API, from that point of view, should
work in a useful way even with overlapped IO.

With overlapped IO you create your own BIO layer (which is the buffering
layer underneath the OpenSSL library). You then use this instead of the
default "BIO socket" implementation. Your BIO is only handling
cypher-text data and its job is to effect flow-control, buffering and
conveyance of the cypher-text data to the other end of the connection.


If you really want assistance with overlapped IO then I suggest you
create a new thread for it.

If you are having major problems with overlapped IO, why don't you use
regular sockets first and get your code working on that. You can upgrade
your code to use overlapped IO later, but all of the code that handles
clear-text can remain the same (you won't need to re-work it).
Overlapped IO is the Windows performance networking solution; does your
application even need that kind of performance? Are you moving large
amounts of bulk-data around?

David Schwartz

Oct 26, 2009, 7:41:49 PM
Konstantin Ivanov wrote:

> I am developing a server application which is based on Windows IO
> Completion ports which basically means that the reads and write to
> the socket are asynchronous. This also means that I cannot use the
> SSL_read and SSL_write functions which are tied to the socket fd
> if I am correct.

No, they are tied to the underlying BIO, which need not be a socket.

> So I tried to use the BIO_read and BIO_write, but I am having
> difficulty in using it. Basically what I would like to do is to
> read the content passed from the client over SSL connection into
> the buffer, which I can decrypt using, parse, and then issue another
> read command on the completion port. For send, I would like to write
> data into an encrypted buffer and then post a send command to the
> completion port with the pointer to encrypted data.

That will not work. SSL does not have "encrypt" and "decrypt" operations
that are visible at application level.

> Can someone please comment on how I could implement such
> functionality as I believe I am suing the BIO_read and BIO_write
> incorrect (this was the tutorial that I referred to:
> http://h71000.www7.hp.com/doc/83final/ba554_90007/ch04s03.html

Use BIO pairs. There's example code in the 'apps' directory. Your code has
to manage four logically-independent data streams.

1) If you receive data from the socket, give it to the OpenSSL engine.

2) If you have plaintext your application wants to send, give it to the
OpenSSL engine.

3) If the OpenSSL engine has encrypted data it wants to send over the
socket, give it to the socket.

4) If the OpenSSL engine has decrypted data it wants to give to your
application, get it from OpenSSL and process it.

Do not assume any correspondence between these operations (even though there
will almost always be one). If you send some plaintext data, OpenSSL will
likely have some ciphertext to send on the socket, but don't stop checking
for ciphertext just because you didn't send any plaintext. And it's not an
error if your plaintext generates no ciphertext (OpenSSL may not yet have
received enough information to know how to encrypt it.)

Do not try to "look through" the SSL state machine. Just run all four data
pumps, and it will work.
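
A compressed sketch of those four pumps, assuming the SSL object was
attached to the internal half of a BIO pair (BIO_new_bio_pair() plus
SSL_set_bio()) and that the socket_*/app_* helpers are hypothetical
stand-ins for the completion-port plumbing; WANT_READ/WANT_WRITE retry
handling is omitted:

#include <openssl/ssl.h>
#include <openssl/bio.h>

/* Hypothetical glue to the overlapped-I/O layer and the application. */
extern int  socket_recv(char *buf, int len);             /* ciphertext arriving */
extern void socket_send(const char *buf, int len);       /* ciphertext to send  */
extern int  app_get_plaintext(char *buf, int len);       /* app data to send    */
extern void app_put_plaintext(const char *buf, int len); /* app data received   */

void pump(SSL *ssl, BIO *net_bio)
{
    char buf[4096];
    int n;

    /* 1) ciphertext received from the socket goes into the engine */
    if ((n = socket_recv(buf, (int)sizeof(buf))) > 0)
        BIO_write(net_bio, buf, n);

    /* 2) plaintext the application wants to send goes into the engine;
     *    a real pump must retry this data later if it is not accepted */
    if ((n = app_get_plaintext(buf, (int)sizeof(buf))) > 0)
        SSL_write(ssl, buf, n);

    /* 3) ciphertext the engine wants to send goes out to the socket */
    while ((n = BIO_read(net_bio, buf, (int)sizeof(buf))) > 0)
        socket_send(buf, n);

    /* 4) plaintext the engine has decrypted goes up to the application */
    while ((n = SSL_read(ssl, buf, (int)sizeof(buf))) > 0)
        app_put_plaintext(buf, n);
}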

DS

Mark

Oct 28, 2009, 4:40:40 AM
> On Fri, Oct 23, 2009 at 03:47:51PM +0100, Darryl Miles wrote:
>
> > I advocate that some users would find it useful to be able to invoke
> > SSL_read() and SSL_write() from exactly two threads on the same 'SSL *'
> > simultaneously. There is merit in this and as things stands OpenSSL does
> > not allow it due to a design choice (aka "design limitation").
>
> You are mistaken. There are no message boundaries, and multiple threads
> reading and writing the same SSL session would get random fragments of
> the remote data on read, and emit random fragments of data on write.
>
> There is no sensible use-case for concurrent multiple thread access
> to an SSL object. All access must be serialized to ensure remotely
> reasonable semantics.

I can think of one. In the near future I will need to add SSL support to a
legacy application which uses two threads to read/write from/to a socket.
If SSL supported this it would make my life much easier. As the situation
stands I am not sure how to tackle this project.

Best Regards,
Mark Williams, Tech OP ltd

David Schwartz

Oct 29, 2009, 7:29:23 AM

Mark Williams wrote:

> I can think of one. In the near future I will need to add SSL support to a
> legacy application which uses two threads to read/write from/to a socket.
> If SSL supported this it would make my life much easier. As the situation
> stands I am not sure how to tackle this project.

There are two obvious, simple ways:

1) Have another application that does the SSL work, you can even use
existing ssl proxies. Then you don't have to change the IO in your pump.

2) Let the two threads read and write to your own two independent queues and
service the application side of the SSL connection with your own code to and
from the read and write queues.

DS

Mark

Oct 29, 2009, 7:45:18 AM
> Mark Williams wrote:
>
> > I can think of one. In the near future I will need to add SSL support
> > to a legacy application which uses two threads to read/write from/to a
> > socket. If SSL supported this it would make my life much easier. As the
> > situation stands I am not sure how to tackle this project.
>
> There are two obvious, simple ways:
>
> 1) Have another application that does the SSL work, you can even use
> existing ssl proxies. Then you don't have to change the IO in your pump.

The client wants the whole thing contained in one library so I don't think
this one is an option.



> 2) Let the two threads read and write to your own two independent queues
> and service the application side of the SSL connection with your own code
> to and from the read and write queues.

Won't I still need to combine the reading and writing to the SSL object
into a single thread for this? This is the bit I am having difficulty
visualising.

Are there any samples around that do this?

Mark.

David Schwartz

Oct 29, 2009, 8:25:54 AM

Mark Williams wrote:

> > 2) Let the two threads read and write to your own two independent queues
> > and service the application side of the SSL connection with your own code
> > to and from the read and write queues.
>
> Won't I still need to combine the reading and writing to the SSL object
> into a single thread for this? This is the bit I am having difficulty
> visualising.

The data pump thread is more or less like this:

While (connection_is_alive)
{
    If (connection_is_not_corked_in)
    {
        SSL_read into temporary buffer.
        If we got data:
        {
            If a read thread is blocked, unblock it.
            If the receive queue is too full, set the 'corked_in' flag.
        }
        If we got a fatal error, mark the connection dead.
    }
    If (send queue not empty)
    {
        Try to send some data using SSL_write
        Put back what we didn't send
    }
    If we made no forward progress, block (see notes)
}
Tear down the connection

The read thread acquires the queue mutex, blocks on the condvar for data if
desired, pulls data off the queue, and clears the corked_in flag if it was
set (assuming the queue is still not full), and signals the data pump thread
if it uncorked.

The write thread acquires the mutex, checks if the send queue is full,
blocks on the condvar if it is, and signals the data pump thread if the queue
was empty.

The only trick left is the blocking logic in the data pump thread. This is
the hard part:

1) If you have no outbound data pending, and the connection is corked, block
only on an internal signal. (Since you don't want to do I/O either way
anyway.)

2) If you have outbound data pending and the connection is corked, block as
directed by SSL_write. If it said WANT_READ, block on read. If it said
WANT_WRITE, block on write.

3) If you have no outbound data pending (and hence, did not call SSL_write),
and the connection is uncorked, block as directed in SSL_read.

4) If you have outbound data pending, and the connection is uncorked, block
on the logical OR of the SSL_read result and the SSL_write result (block for
read on the socket if either one returned WANT_READ, block for write if
either returned WANT_WRITE).

Note that your data pump threads needs to block on a 'select' or 'poll' type
function but be unblocked when signaled. If necessary, add one end of a pipe
to the select/poll set and have you read/write threads write a byte to that
pipe to unblock the data pump thread.

This is from memory, but it should be basically correct.

By the way, I think only the logic in 4 is not obviously correct. Here's the
proof it's safe:
1) If nothing changed, and we block on the OR of both operations, we will
only unblock if one of those operations can make forward progress. (We only
unblock on X if one operation Xould make forward progress on X, and nothing
has changed since then.)
2) If something changed, then we already made some forward progress.
So either way, we make forward progress in each pass of the loop, which is
the best you can hope for.

DS

Mark

Oct 29, 2009, 8:49:44 AM

Thanks. This will take me some time to digest.

There is one added complication in that the protocol is a datagram
protocol at a higher level (although it uses TCP). I am concerned that
the whole protocol could block if there is not enough data to encrypt a
whole outgoing message but the peer cannot continue until it gets the
message.

Mark.

David Schwartz

Oct 29, 2009, 9:20:12 AM

Mark Williams wrote:

> There is one added complication in that the protocol is a datagram
> protocol at a higher level (although it uses TCP). I am concerned that
> the whole protocol could block if there is not enough data to encrypt a
> whole outgoing message but the peer cannot continue until it gets the
> message.

What do you mean by "not enough data to encrypt a whole outgoing message"?
The only way it can block is if each side is waiting for the other, and if
that happens, the application protocol is broken anyway. There is no way
this logic can cause one side to internally block.

The 'cork' logic only stops us from reading if we have already read data the
application has not processed yet. If the application does not process read
data, then it is broken, but we are not. The write queue logic only stops us
from accepting data from the application to send if we have unsent data. If
the other side does not read this data, then it is broken but we are not.

In fact, any application layered on top of TCP is broken if it cannot handle
a TCP implementation that permits only a single byte to be in flight at a
time. If it *ever* allows each side to insist on writing before reading at
the same time, it is broken.

On the off chance you do have to deal with a broken TCP-using application
(and you do all too often), just make sure your queues, in both directions
on both sides, are larger than the largest protocol data unit. (More
precisely, the amount of data both sides might try to write before reading
any data.)

DS

Mark

unread,
Oct 29, 2009, 9:55:20 AM10/29/09
to
Hi David,

> > There is one added complication in that the protocol is a datagram
> > protocol at a
> > higher level (although it uses TCP). I am concerned that the whole
> > protocol could
> > block if there is not enough data to encrypt a whole
> outgoing message
> > but the peer cannot
> > continue until it gets the message.
>
> What do you mean by "not enough data to encrypt a whole
> outgoing message"?
> The only way it can block is if each side is waiting for the
> other, and if
> that happens, the application protocol is broken anyway.
> There is no way
> this logic can cause one side to internally block.

I may be making a wrong assumption, but if the cypher used is a block
cypher, does it not wait until a full block of data is ready before it can
encrypt and send the data? If a message does not consist of enough data to
fill a block, could there be unencrypted data left in a buffer somewhere?
The peer would see that a whole message has not been received and wait for
the rest of it ... which never comes.

Mark.

Darryl Miles

unread,
Oct 29, 2009, 11:33:37 AM10/29/09
to
Mark wrote:
> There is one added complication in that the protocol is a datagram
> protocol at a
> higher level (although it uses TCP). I am concerned that the whole
> protocol could
> block if there is not enough data to encrypt a whole outgoing message
> but the peer cannot
> continue until it gets the message.

SSL_write() can be for any length the API datatype allows (I think it is
currently a C data type of 'int'). If you use SSL_write() with 1 byte
lengths you will get an encoded SSL protocol packet sent over the wire with
a single byte of application-data payload. This would not be a very
efficient use of SSL (since you'd have many bytes of SSL overhead per byte
of application-data).


The sending side is allowed to merge more application-data together when
flow control is preventing the fresh data we are currently holding from
being sent in its "first attempt at transmission" immediately AND the user
makes another API call to write more data. What is not allowed is for the
stack to hold onto the data (possibly forever) in the hope that the user
will make an API call to write more data.

I've tried to choose my words carefully in the above paragraph, so that
they apply equally to TCP and SSL. In the case of SSL, since it runs over a
reliable streaming transport, there is no such thing as a "first attempt at
transmission"; there is only a single action to commit data into the TCP
socket. But it is possible for the TCP socket to not be accepting data just
yet (due to flow control). That is the conceptual boundary this relates to.

Also, one difference between TCP and SSL is that TCP has octet-boundary
sequences/acknowledgments, but in SSL all data is wrapped up into
packetized chunks. This means TCP can make other optimizations with regard
to retransmission that make it more efficient. Those things don't apply to
SSL.


If you use larger writes (to SSL_write()) then this is chunked up into
the largest possible packets the protocol allows and those are sent over
the wire.

It is presumed that every SSL_write() requires a flush (at TCP level this
mechanism is called a "Push"). This basically means the data needs to be
flushed to the reading API at the far end at exactly the byte boundary of
(or beyond) the data you sent. This means you have a guarantee not to
starve the receiving side of data that the sending API has sent/committed.
This is true at both the TCP and SSL levels.

If you think about it the SSL level could not make the guarantee easily
if the lower level did not also provide that guarantee.

Providing you use non-blocking APIs there is no way things can block
(meaning no way for your application to not be in control at all times and
able to make a decision). This means the socket<>SSL boundary uses the
non-blocking paradigm, and it also means the SSL<>your_datagram_protocol
boundary uses the non-blocking paradigm.

The only issue you then need to look at is starvation (imagine the
receiving side in a loop that keeps reading until there is no more data,
but, due to the CPU time needed to do the data processing in that loop, the
sending side is able to keep the receiving side stocked full of data). If
you just looped until you had no more data from SSL_read() (before
servicing the SSL_write() side) then the SSL_write() side would be starved.

So you might want to only loop like this a limited number of times, or
automatically break out of trying to decode/process more data in order
to service the other half a little bit.
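
As a minimal sketch of such a bounded loop (the budget cap and the
'deliver' callback are illustrative, not part of any API):

#include <openssl/ssl.h>

/* Drain at most 'budget' reads per pass so the SSL_write() side of the
   pump still gets serviced. Returns 0 on a fatal error, 1 otherwise. */
static int drain_reads(SSL *ssl, int budget,
                       void (*deliver)(const unsigned char *, int))
{
    unsigned char buf[16 * 1024];
    int i;

    for (i = 0; i < budget; i++) {
        int n = SSL_read(ssl, buf, (int)sizeof buf);
        if (n > 0) {
            deliver(buf, n);        /* hand plaintext up to the application */
            continue;
        }
        switch (SSL_get_error(ssl, n)) {
        case SSL_ERROR_WANT_READ:
        case SSL_ERROR_WANT_WRITE:
            return 1;               /* nothing more just now; not fatal */
        default:
            return 0;               /* fatal error or shutdown: tear down */
        }
    }
    return 1;                       /* budget used up; go service writes */
}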

Now there is another issue which isn't really a blocking one; it is more a
"deadlock". This is where, due to your IO pump design and the interaction
between the upper levels of your application and the datagram/SSL levels,
you end up designing your application such that the same thread is used to
both service the IO pump and the upper levels of the application (the data
processing). This is possible but requires careful design. If for whatever
reason the upper levels stall/block waiting for IO, your thread of
execution loses control and starves the IO pump part from doing its work
(because it's the same thread).

Everything that happens on the IO pump part needs to be non-blocking. If
you use the same thread to service the upper levels of your application
then you must know for sure they are non-blocking. Otherwise you are best
separating the threads here: the IO pump and the upper levels.

Often this is best because it removes the constraint on what you can do in
an upper level: it no longer matters what you do there; call/use whatever
library you want without regard for blocking behaviour. You can also use a
single IO pump thread to manage multiple connections if you want (and
performance allows); then you need to think about per-'SSL *' IO
starvation, i.e. make sure you service everyone a little bit as you go
round-robin.

Darryl

Ger Hobbelt

unread,
Oct 29, 2009, 4:12:56 PM10/29/09
to
>> There is one added complication in that the protocol is a datagram
>> protocol at a
>> higher level (although it uses TCP).  I am concerned that the whole
>> protocol could
>> block if there is not enough data to encrypt a whole outgoing message
>> but the peer cannot
>> continue until it gets the message.

If you mean that the upper layer protocol is message-oriented rather
than stream-oriented ('datagram' is a Rorschach blot for me that says:
UDP sorry) and the protocol is constructed such that outgoing
message REQ(A) must have produced [a complete] answer message ANS(A)
before the next outgoing message REQ(B) is sent over the wire, then
you're in fancy land anyway, as that is not a class 101 scenario for
TCP, which is by design stream-oriented.

At the TCP level and given such a 'message interdependency'
requirement, which is the extreme form of what I'm assuming you mean
with the mention of 'datagram protocol', you'll need to ensure both
sender, receiver (and any intermediaries) have their TX (transmit) and
RX (receive) buffers flushed entirely before the next 'message'
exchange (REQ(B)->ANS(B)) can take place.

To get a glimmer of what must be done then (and which might be needed
in your case, when your protocol is less extreme and consequently can
- and will - not wait for response messages for previously sent
messages before the next message goes out) think about old-fashioned
telnet: a keypress is way smaller than a TCP packet can be, so telnet
needed a way to push that ENTER keypress out the door and pronto, so
you, the user, would get some 'interactivity' on your console. The
TCP_NONAGLE socket flag has been introduced to service this need way
back when: given a short timeout, tiny buffer fills are flushed into a
TX packet anyway. The receiver will be able to fetch any byte length
data it actually receives, so when we entice the sender into
transmitting even the small chunks, we're good to go there.

> It is presumed that every SSL_write() requires a flush (at TCP level this
> mechanism is called a "Push"). This basically means the data needs to flush
> to the reading API at the far end on exactly the byte boundary (or more)
> data than you sent. This mean you have a guarantee to not starve the
> receiving side of data that the sending API has sent/committed. This is
> true at both the TCP and SSL levels.
>
> If you think about it the SSL level could not make the guarantee easily if
> the lower level did not also provide that guarantee.

^^^^ the guarantee at the lower level is NONAGLE, which is /not/ the
default in TCP stacks as it can result in suboptimal network usage by
transmitting overly small packets on the wire.


I haven't done this sort of thing with SSL on top for a few years now,
but from what I hear in this thread SSL_write(len := 1) will pad such
data while crypting on a per-write invocation basis (note those last
few words!) and thus write a full SSL packet into the TX side of the
socket for each write into the TX pipeline (I may be Dutch, but I live
by the German rule: "Vertrauen ist gut, Kontrolle ist besser", and you
should too: trust is good, but making dang sure is so much better ;-)
)
Also, there's the purely emotional and very unchecked goblin at the
back of my brain who mumbles: "oh yeah? no buffering incoming
plaintext on the TX side so the SSL layer doesn't get to do a lot of
otherwise probably superfluous work when the write chain is abused by
the application layer by writing tiny chunks all the time?" Don't take
my goblin at his word, he's a definite a-hole sometimes ;-) , but it
won't hurt to make sure the 'non-buffering' flush/push aspect of your
write-side BIO chain is guaranteed. Does the OpenSSL documentation
explicitly mention this behaviour? That should be the authoritative answer
there.


From my work with BIOs, I seem to recall the SSL BIO encapsulates
SSL_write et al (or was it vice-versa? Heck, that's what I get when
doing this off the top of my head while not having used SSL for the
last half year), so the findings for one expand to the other.
Injecting other BIOs in your chain (base64, etc.) will impact this 'is
all data flushed throughout == non-buffering TX behaviour' aspect.


Anyway, using NONAGLE (telnet is **NO**nagle, default socket using
applications use the default(!) NAGLE) on the TX side should, assuming
the SSL/BIO chain flushes as well, ensure your outgoing REQ(n) gets
out the door and on the wire. Which leaves the receiver side: as the
transmitter can only 'flush' like that with SSL in the chain when the
flush is on whole SSL message boundary only (indeed resulting in some
SSL 'packets' ending up containing only a single (encrypted) content
byte as a principle), so the receiver should be okay in depleting its
RX buffers as well as the SSL layer there can, theoretically, process
every byte received, thus spitting out the content (plaintext) bytes
to the very last one which was pushed on the wire.
Hence your application layer can then get the complete incoming message
before it sends a response, without any extra work on the RX side.


(For those who wonder: HTTP can also be considered a message protocol:
REQ(GET url) gives you ANS(data) in the response. The basic way to
delineate 'message' here while riding on top of TCP (which is /stream/
by design) is to close the TCP stream after each message. (Client
sends message, does half-close, server receives data, the half-close
back there flushes all REQ() bytes out so the server can get it all
and construct the response, which is transmitted and the connection is
now fully-closed by the server.)
Only when you do a few extra things can you keep the connection (HTTP
persistent connections) open, and sending along the number of bytes as
a message parameter is not all of it, so having a look at how
persistent-connection supporting, well-written, HTTPS clients and
servers implement this might be handy. SSL-based telnet clients are
simpler and carry the same technology (read: tweaks) as they need the
same sort of behaviour.)

Note that I have met quite a few applications in the past which used
SSL+TCP for message protocols like these and the argument always was "what
are you complaining about? it works, doesn't it?" and, yes, it often works
without the details. And what's the bother when you can reboot your machine
when things lock up once in a while; just as long as it doesn't happen all
too often on your watch, hey? The trouble at TCP level which hides the
issue is the timeouts: even without NONAGLE, TCP stacks have timeouts
ticking away, which will tickle the kernel into sending those little chunks
remaining in the TX buffers anyway after the timeout expires, even when
such a tiny chunk doesn't fill an MTU up (i.e. produces a suboptimally
small TCP packet).
The only thing noticeable then is the slight delays; in the end they limit
your throughput bandwidth at application level to way below the level
attainable by the hardware (network) at hand.
The other noticeable issue is that sometimes such apps lock up and it
requires either a kill or a reboot, or at least severing the TCP
connection, to 'recover'.


So much for message-based traffic over network stream protocols. W. Richard
Stevens (R.I.P.) surely explained it way better than I do, alas.

About the ever-recurring WANT_READ/WANT_WRITE stuff, heck, folks might
have a separate high-bandwidth mailing list for that one alone, if
only we collectively knew what the heck we're talking about, here's a
tip for ya to help you detect WANT_READ/WRITE misbehavin' in your
client and/or server code:
since you can edit the code, add extra code which randomly interjects
SSL renegotiation requests (check the OpenSSL API; there's a very
simple call for that) while you have the SSL connection open. The
random renegotiation action will trigger SSL into requesting and
transmitting all sorts of stuff under the hood you don't need to
concern yourself with, but as a consequence it will trigger quite a
few WANT_WRITE results on the read side and WANT_READ results on the write
side, thus increasing the number of occurrences of these response codes and
hence giving you the opportunity to detect read/write processing issues that
much faster. At plaintext / application side, it does not impact the
number of bytes you get out of the SSL read side, so no harm done
there, while you forcibly kick the SSL engine into extra work which
will trigger all sorts of otherwise 'spuriously occurring' issues
regarding return codes and your implementation.

Be sure to add such a (semi-)random renegotiation injector at both
client and server side; I have found that many applications are
incorrectly constructed and a couple of hours of this in a test bed
will almost certainly blow them to kingdom come.
Better now than once you've entered production, I'd say. ;-)
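
A rough sketch of such an injector for a test build, assuming the classic
renegotiation API (SSL_renegotiate() followed by SSL_do_handshake()) on a
non-blocking connection that is already driven by a correct
WANT_READ/WANT_WRITE loop; the 1-in-64 probability is arbitrary:

#include <stdlib.h>
#include <openssl/ssl.h>

/* Test-bed only: occasionally schedule a renegotiation so that
   WANT_READ/WANT_WRITE show up far more often than normal traffic would
   produce. Call this from the I/O pump loop. */
static void maybe_renegotiate(SSL *ssl)
{
    if (rand() % 64 != 0)              /* arbitrary 1-in-64 chance */
        return;

    if (SSL_renegotiate(ssl) != 1)     /* ask for a new handshake */
        return;

    /* Drive the handshake one step; on a non-blocking socket this will
       typically return <= 0 with WANT_READ/WANT_WRITE, which the normal
       pump loop must already handle correctly. */
    (void)SSL_do_handshake(ssl);
}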


And having a gander at the actual network traffic generated by your
app and (!) checking the timing of the outgoing messages (to detect
unexpected timeouts kicking in and causing TX after all) won't hurt:
Wireshark or other packet sniffers are an asset.

(For when you pay attention to detail: note that the TCP-level NONAGLE
behaviour still is timeout based, which often is okay as the timeout
is relatively small, but if you have an extreme case where messages
must be flushed /immediately/ onto the wire while you're using a TCP
stream (so no timeout whatsoever), then you enter the non-portable
zone of IP stack cajoling.
This is the ultimate price you pay for [ab]using a stream protocol for
message-based I/O. (And that's a generic remark, as it always applies
for any such transformation, not just with TCP or SSL/TCP.)
This would mean that you're essentially forcibly 'shaping' your TCP
packets from the application level, which is very much against design
and severely frowned upon and not supported by some IP stacks at all
(we're not talking about in-spec stuff anymore here, after all), but
I've had one occurrence where such was required (and, yes, such is a
99.9% sure indicator someone screwed up in the design elsewhere). If
such is ever the case, I would not spend the effort but advise to use
DTLS instead (which is UDP-based secure comms and thus thinking
'message' all the way through to lowest hardware layer). This is
particularly important when your message protocol comes with large
throughput requirements and a protocol design where request transmissions
can halt until a certain request's response message has been completely
received and processed by the requestor (a design which would suffer from
round-trip network delay at such intersections no
matter what you do, BTW). You may note that is one reason why
networked game engine communication is often UDP/DTLS based, to name
one example. Fortunately I notice a lot of work is being done on
OpenSSL DTLS, so this might be a viable option for you today.

I've reiterated (and maybe garbled) some material already mentioned by
others here today, but I felt it was needed for a complete picture.
The attention needed for blocking versus non-blocking I/O is very
valid and quite necessary IMO, and so is strong mention of the
attention required to do good towards WANT_READ/WANT_WRITE
implementation details, but I also find rare the explicit mention of
the perils (both on a theoretical and practical basis) of 'making' a
message-based [application-level] protocol work over a stream-based
lower level protocol, such as TCP, as streams are legally allowed (and
should do so to optimize network usage) to buffer TX pipelines along
the way, resulting in a probable inability to deliver messages at the
other end as a whole, before further data is entered in the same
direction. (Read 'probable' as 'incidental': that's what makes the
mistake very easy to execute and hard to diagnose, especially in
environments where TTM (Time To Market) is essential -- and aren't we
all part in that game? Unawareness in your reviewing peers permits one
to slip such mistakes through the net unnoticed. Network programming
never was easy and if it is now, I must have missed that newsflash on
my MTV. ;-) )

--
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web: http://www.hobbelt.com/
http://www.hebbut.net/
mail: g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------

Darryl Miles

unread,
Oct 29, 2009, 10:23:46 PM10/29/09
to
Ger Hobbelt wrote:
>> It is presumed that every SSL_write() requires a flush (at TCP level this
>> mechanism is called a "Push"). This basically means the data needs to flush
>> to the reading API at the far end on exactly the byte boundary (or more)
>> data than you sent. This mean you have a guarantee to not starve the
>> receiving side of data that the sending API has sent/committed. This is
>> true at both the TCP and SSL levels.
>>
>> If you think about it the SSL level could not make the guarantee easily if
>> the lower level did not also provide that guarantee.
>
> ^^^^ the guarantee at the lower level is NONAGLE, which is /not/ the
> default in TCP stacks as it can result in suboptimal network usage by
> transmitting overly small packets on the wire.

Huh... Nagle is to do with how a TCP stack decides when to send a first
transmission of data; it comes into play only when the congestion window
minus the amount of un-acknowledged data is less than 1*MSS.

http://en.wikipedia.org/wiki/Nagle's_algorithm

The congestion window is the active window that is in use by a
connection for the sending side of the connection. The CWnd (as often
termed) is between 1*MSS and the negotiated maximum window size (that
was made at the start of the connection).

http://en.wikipedia.org/wiki/Congestion_window

The CWnd starts off small and due to the "Slow Start Algorithm" opens up
towards maximum window size for every successfully transmitted segment
of data (that didn't require retransmission).

http://en.wikipedia.org/wiki/Slow-start

This is a simplistic view (on Slow Start) since many factors such as VJ
fast recovery and SACK found in all modern stacks impact Cwnd.


In short, Nagle is about the trade-off between latency and bandwidth. It
has nothing to do with ensuring a flush of application-data so that it
appears via SSL_read() at the far end.


So with all that said on what Nagle is, I can tell you Nagle doesn't
have anything to do with the TCP Push flag and its meaning.

Here is a possibly useful reference: look up the section on "Data
Delivery" on this page:

http://en.wikipedia.org/wiki/Transmission_Control_Protocol


In short, the TCP Push function is to do with flushing the data at the
receiving side to the application immediately, so that it may be read().

> Anyway, using NONAGLE (telnet is **NO**nagle, default socket using
> applications use the default(!) NAGLE) on the TX side should, assuming

I am asserting that the TCP_NODELAY setsockopt is completely unnecessary,
and potentially bad advice, as a cure for getting application data written
with SSL_write() flushed through to the receiver's socket-descriptor wakeup
mechanism and SSL_read().
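
For reference, disabling Nagle is just a standard setsockopt() call on the
TCP socket, nothing SSL-specific; a minimal sketch:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle on a connected TCP socket. This trades some bandwidth
   (more small segments on the wire) for lower latency; it is not needed
   to get data written with SSL_write() through to the peer's SSL_read().
   Returns 0 on success, -1 on error, as setsockopt() does. */
static int disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}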

> (For when you pay attention to detail: note that the TCP-level NONAGLE
> behaviour still is timeout based, which often is okay as the timeout
> is relatively small, but if you have an extreme case where messages
> must be flushed /immediately/ onto the wire while you're using a TCP
> stream (so no timeout whatsoever), then you enter the non-portable
> zone of IP stack cajoling.

Erm.... NONAGLE does not have a timeout of its own, so I think it is a
little bit misleading to say it is timeout based. It is based on either
receiving an ACK packet for a sufficient amount of un-acknowledged data, or
on the standard retransmission timer that TCP uses, i.e. no ACK was
received before the retransmission timer expires, so the TCP stack goes
into retransmission mode. Neither of these things requires NAGLE, and the
timeout in use is a required part of a TCP protocol stack, whereas NAGLE is
optional.

The NAGLE logic only comes into play for freshly written/enqueued data
(e.g. the application calls write() on the socket): the TCP stack has to
decide if it should "send it now" or "queue it up". That is all NAGLE is.

In short, Nagle is about the trade-off between latency and bandwidth.

"sent it now" means we could be sending only 1 byte of new TCP data but
with TCP/IP overhead we might have 40 bytes of header to go with it.
Not very efficient use of bandwidth, so this is the bandwidth cost, but
we get lower latency as we sent that 1 byte right away.

"queue it up" means we don't sent it now, but stick it in the internal
kernel write buffer. This data will get looked at for transmission when
either an ACK comes back for previously un-acknowledged data or when the
standard retransmission timer expires (no ACK comes back within the time
limit). The trade here is that by waiting for one of those 2 events we
delay the conveyance of the application data until later. This
increases the observable latency for that data.


You can see why the main use of turning nagle off is when the
application is a human user using the network socket interactively. The
bandwidth cost is the price worth paying for increased productivity;
humans hate latency. But if a robot was using telnet it would be
efficient and be able to prepare the whole data to send in one go and
write() the entire command to the socket in one system call. The robot
would not care about having data echoed back in realtime, since it never
makes mistakes. yadda yadda.

Sure, using NONAGLE with SSL has its uses, but those uses are when "low
latency" is critical to your application, not for when you require a
guaranteed flush of application-data at the receiver side. That is in fact
an unnecessary concern providing you call SSL_write().

Forgive me, I skim-read the bulk of the rest of your reply, as I found it
hard to see the relevance and also hard to follow in a number of places.


Darryl

David Schwartz

unread,
Oct 30, 2009, 7:21:10 AM10/30/09
to

Mark wrote:

> I may be making a wrong assumption but if the cypher used is a block
> cypher does it not wait until a full block of data is ready before it
> can encrypt and send the data? If a message does not consist of enough
> data to fill a block, could there be unencrypted data left in a buffer
> somewhere? The peer would see that a whole message has not been
> received
> an wait for the rest of it ... which never comes.

No, that cannot happen. SSL does not permit the properties of the underlying
cipher it happens to be using to change the properties of SSL itself. That
would be horribly broken design. SSL presents a bidirectional byte-stream
that does not preserve message boundaries to the application layer,
regardless of the underlying cipher.

SSL does not encrypt and decrypt application data. It uses the underlying
cipher to encrypt and decrypt SSL protocol data that includes the
application data, among other things. It is the SSL protocol data that has
to be adapted to the underlying cipher.

DS

Mark

unread,
Nov 2, 2009, 7:16:46 AM11/2/09
to
Hi Darryl,

Thanks for the very useful and clear explanation.

> Mark wrote:
> > There is one added complication in that the protocol is a datagram
> > protocol at a
> > higher level (although it uses TCP). I am concerned that the whole
> > protocol could
> > block if there is not enough data to encrypt a whole
> outgoing message
> > but the peer cannot
> > continue until it gets the message.
>

> What is not allowed is for the stack to
> hold onto the
> data (possibly forever) in the hope that the user will make
> an API call to write more data.

In this case my concerns are unfounded.



> Now there is another issue which isn't really a blocking one,
> it is more
> a "deadlock". This is where due to your IO pump design and the
> interaction between the upper levels of your application and the
> datagram/SSL levels you ended up designing your application such that
> the same thread is used to both service the IO pump and the
> upper levels
> of the application (the data processing). This is possible
> but requires
> careful design. For whatever reason the upper levels stalled/blocked
> waiting for IO, and this means your thread of execution lost
> control and
> starved the IO pump part from doing its work (because its the
> same thread).

The IO pump thread would definitely be independent of all other layers of
the protocol. I don't like mixing layers.

Regards, Mark.

Mark

unread,
Nov 2, 2009, 7:26:01 AM11/2/09
to
Hi Ger,

> >> There is one added complication in that the protocol is a datagram
> >> protocol at a
> >> higher level (although it uses TCP).  I am concerned that the whole
> >> protocol could
> >> block if there is not enough data to encrypt a whole
> outgoing message
> >> but the peer cannot
> >> continue until it gets the message.
>

> If you mean that the upper layer protocol is message-oriented rather
> than stream-oriented ('datagram' is a Rorschach blot for me that says:
> UDP sorry) and the protocol is constructed such that outgoing
> message REQ(A) must have produced [a complete] answer message ANS(A)
> before the next outgoing message REQ(B) is sent over the wire, then
> you're in fancy land anyway, as that is not a class 101 scenario for
> TCP, which is by design stream-oriented.

Yes, the higher layers are message oriented. The protocol is not so
restricted as 1:1 request/response though. Several messages can be
sent without any response (dependent on message type). However
only whole messages can be sent and only whole messages can be decoded
by the receiver. Messages must also arrive in the order they were sent.
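
As a minimal sketch of one common way to carry whole messages over the SSL
byte stream, here is a hypothetical 4-byte big-endian length prefix;
blocking I/O is assumed purely to keep the framing logic visible:

#include <arpa/inet.h>
#include <stdint.h>
#include <openssl/ssl.h>

/* Send one whole message as a 4-byte big-endian length prefix followed by
   the payload. A blocking SSL is assumed; a non-blocking pump would append
   these bytes to its send queue instead. Returns 1 on success. */
static int send_message(SSL *ssl, const void *msg, uint32_t len)
{
    uint32_t be = htonl(len);

    if (SSL_write(ssl, &be, 4) != 4)
        return 0;
    return SSL_write(ssl, msg, (int)len) == (int)len;
}

/* Read exactly 'len' bytes, looping because SSL_read() returns however
   many bytes are available, not whole messages. Returns 1 on success. */
static int read_exact(SSL *ssl, void *buf, int len)
{
    int got = 0;

    while (got < len) {
        int n = SSL_read(ssl, (char *)buf + got, len - got);
        if (n <= 0)
            return 0;
        got += n;
    }
    return 1;
}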

Thanks for your helpful post.
