While testing Python's SSL support with OpenSSL >= 0.9.8m, we have
encountered a strange error return from SSL_shutdown on a non-blocking
socket (note: this is a different problem from the one described by
Victor Stinner in an earlier thread last month). Basically:
- SSL_shutdown(<ssl object>) returns -1
- SSL_get_error(<ssl object>, -1) returns SSL_ERROR_SYSCALL
- ERR_get_errno() returns 0
- errno is equal to 0
This situation was not hit before 0.9.8m. Our tentative workaround
right now (not yet committed, awaiting your insight :-)) is to detect
this particular situation and consider the call successful rather than
raise an exception.
What encouraged me in that workaround is that some lighttpd users have
encountered what looks like the same issue, also starting from 0.9.8m:
http://redmine.lighttpd.net/boards/2/topics/2779
« SSL_shutdown failed, SSL_get_error returned SSL_ERROR_SYSCALL,
but errno == 0 - I think there is something wrong with your ssl
lib. »
« Since I updated to openssl 0.9.8m I have noticed the same
error messages in my log. (using lighttpd 1.4.26 with the same
patch applied) »
I would welcome any explanations and suggestions concerning this
situation. Is it an OpenSSL bug? Or does this error return correspond to
an applicative error? (in which case, which error exactly, since the
return codes don't point to anything precise)
Thank you
Antoine Pitrou.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
User Support Mailing List openss...@openssl.org
Automated List Manager majo...@openssl.org
Would you please confirm to the list the name of the Python module, the
download site for it and the version you are currently working with.
This just helps us provide assistance to this same question in future.
Please read up on this recent thread. I do not know anything about
Python modules myself but I believe this user was also debugging a
similar issue.
http://www.mail-archive.com/openss...@openssl.org/msg60444.html
"Problems with SSL_shutdown() and non blocking socket" from "Victor
Stinner" on 12-Mar-2010.
Please collaborate with the official maintainers of the Python module so
that a fix is incorporated upstream ASAP.
If you have any further questions on the matter please direct them to
this list (openssl-users).
Thanks,
Darryl
Antoine Pitrou wrote:
> While testing Python's SSL support with OpenSSL >= 0.9.8m, we have
> encountered a strange error return from SSL_shutdown on a non-blocking
> socket (note: this is a different problem from the one described by
> Victor Stinner in an earlier thread last month). Basically:
>
> - SSL_shutdown(<ssl object>) returns -1
> - SSL_get_error(<ssl object>, -1) returns SSL_ERROR_SYSCALL
> - ERR_get_errno() returns 0
> - errno is equal to 0
>
> This situation was not hit before 0.9.8m. Our tentative workaround
> right now (not yet committed, awaiting your insight :-)) is to detect
> this particular situation and consider the call successful rather than
> raise an exception.
It depends what you mean by "consider the call successful". There are 2
normal non-error states for SSL_shutdown() API calls, returning 0 and
returning 1.
You should never consider a return of -1 to mean 1. Also a return of 1
is really the only value that indicates "success".
Then you have errors that are either recoverable (what I term
soft-errors) or non-recoverable (hard-errors).
But as the recent mailing list thread indicates (
http://www.mail-archive.com/openss...@openssl.org/msg60444.html )
you may consider the specific soft-error returns of -1/WANT_READ and
-1/WANT_WRITE to be successful as-if SSL_shutdown() had returned 0, if
you are happy with keeping the non-descriptive behavior of older OpenSSL
releases.
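The classification described above can be sketched in Python roughly as follows. This is an illustration only: classify_shutdown and the category strings are hypothetical names, and the error codes are taken from the constants exposed by Python's ssl module, not from any code posted in this thread.

```python
import ssl

def classify_shutdown(ret, ssl_error=None):
    """Map an SSL_shutdown() return value (plus the SSL_get_error() code
    obtained when ret == -1) onto the categories used in this thread."""
    if ret == 1:
        return "complete"      # two-way shutdown finished
    if ret == 0:
        return "in-progress"   # our shutdown was sent, peer's not yet seen
    if ssl_error in (ssl.SSL_ERROR_WANT_READ, ssl.SSL_ERROR_WANT_WRITE):
        return "soft-error"    # recoverable: retry once the socket is ready
    return "hard-error"        # unrecoverable, e.g. SSL_ERROR_SYSCALL
```

Under this sketch a -1/ERROR_SYSCALL result is never folded into "complete"; the caller decides what to do with a hard error.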
The SSL_ERROR_SYSCALL is probably because the underlying socket is no
longer functional (see the comments about EPIPE / ZERO_RETURN from the
recent openssl-users list thread).
You must understand it is SSL_shutdown()'s job to commence, advance
and confirm that a cryptographically secure two-way shutdown has been
performed. This is its purpose in the world. If you are seeing
-1/ERROR_SYSCALL then that is a _CORRECT_ thing for it to return in
response to observing that state while trying to perform its mission.
What SSL_shutdown() is saying by returning -1/ERROR_SYSCALL is that a
cryptographically secure two-way shutdown of the stream was _NOT_ completed and
that it will probably not be able to ever be completed, probably due to
the fact the underlying socket died on us. This is a fact of life you
have to live with and deal with in your application now. The reason for
the "probably" qualifiers is that I'm sure there are other reasons that can
cause it, but in practice most people will see this error indication at
this stage due to those factors.
So thinking that SSL_shutdown() was successful would be incorrect, on
the basis of my definition of the purpose of SSL_shutdown(). A
cryptographically secure shutdown was not completed, therefore
SSL_shutdown() was not successful.
I'm sorry that I've introduced this quasi-fuzziness into what was a nice
clean wonderland of the Python SSL module. But it is a reality that an
application should deal with and make its own choice about.
Many applications don't care for a cryptographically secure shutdown of
the communication transport, since they might indicate their intention
to "QUIT" in the normal application payload data. The other end would
then send back a "Bye bye, quit response message" (in the normal
application payload data) and the server end goes into a state of never
accepting any further commands from the client after that. Over and
above all this, once each end has queued the last command/response data
in respect of the "QUIT" command processing, once that application
payload has successfully cleared the SSL_write() API call, that end can
immediately proceed to calling SSL_shutdown(). This will commence
proceedings in respect of a secure cryptographic shutdown, by denying
any further SSL_write() calls (from your side) and by sending an
end-of-stream indication packet to the other end. You then have to wait
(and hope) the other end sends their end-of-stream indication packet,
before you will see SSL_shutdown() return 1 on your side. Only once you
have both sent and received the end-of-stream indication packet will
SSL_shutdown() return 1.
Many client and server implementations just "hang up" on each other once
the QUIT command response has been processed. I would guess the issue
you are seeing with -1/ERROR_SYSCALL is due to this hanging up. But to
be a good well meaning TLS/SSL citizen both ends should continue their
non-blocking event loops for a reasonable amount of time (in the order
of 5 to TCP timeout seconds) even after the last SSL_write() has been
made. During this time both ends retry SSL_shutdown() over and over
until it returns 1 (each time they get a non-blocking wakeup indication).
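A retry loop of that kind might look like the Python sketch below. It assumes a non-blocking ssl.SSLSocket, whose unwrap() method performs the TLS shutdown and raises ssl.SSLWantReadError or ssl.SSLWantWriteError when it has to be retried. The function name shutdown_with_deadline and the 5 second default are illustrative choices, not part of any existing API.

```python
import select
import ssl
import time

def shutdown_with_deadline(tls_sock, timeout=5.0):
    """Retry the TLS shutdown on a non-blocking socket until it completes
    or `timeout` seconds have passed. Returns whatever unwrap() returns."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return tls_sock.unwrap()        # drives the shutdown one step
        except ssl.SSLWantReadError:
            readers, writers = [tls_sock], []
        except ssl.SSLWantWriteError:
            readers, writers = [], [tls_sock]
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("TLS shutdown did not complete in time")
        # Sleep until the socket is ready in the direction that was asked for.
        select.select(readers, writers, [], remaining)
```

Any other exception (for example the SSL_ERROR_SYSCALL case discussed in this thread) propagates to the caller, who can then treat the shutdown as failed and close the socket.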
So you have to stand back for a moment and examine Python's use of the
OpenSSL API and decide if you are trying to be 1:1 as much as possible
to support and pass on all the cryptographic guarantees that OpenSSL
makes or if you are trying to provide a simplified view of the world
that Noddy and Big-Ears could use. Or maybe both, by creating
Python-specific API calls built on top of this understanding that iron out the
issue by providing easy-to-digest error returns that users might like.
If you are able to observe a -1 error state where you think that a 1
should have been returned, that may be considered a new bug, i.e.
SSL_shutdown() should return 1 at least once (possibly to be
sticky/latched) once that point in proceedings has been passed
(regardless of the overall status of the underlying transport/socket).
I am interested in the issue of errno==0; this may be indicative of the
real errno return being lost. OpenSSL should if necessary preserve the
first errno value it didn't expect to see, even if OpenSSL itself
continues to make kernel calls that could reset the value of errno to 0.
Maybe this situation can be simulated by being a bad citizen and forcing
a socket disconnection after one or both ends have called SSL_shutdown()
at least once. I must say my testing and applications are good citizens
so it may never have been noticed; also that I may have treated the
-1/ERROR_SYSCALL case as being "unrecoverable" once SSL_shutdown() has
been started and therefore never look to check if the errno!=0 (since I
don't care for the specific reason in my usage).
> What encouraged me in that workaround is that some lighttpd users have
> encountered what looks like the same issue, also starting from 0.9.8m:
> http://redmine.lighttpd.net/boards/2/topics/2779
>
> « SSL_shutdown failed, SSL_get_error returned SSL_ERROR_SYSCALL,
> but errno == 0 - I think there is something wrong with your ssl
> lib. »
>
> « Since I updated to openssl 0.9.8m I have noticed the same
> error messages in my log. (using lighttpd 1.4.26 with the same
> patch applied) »
>
> I would welcome any explanations and suggestions concerning this
> situation. Is it an OpenSSL bug? Or does this error return correspond to
> an applicative error? (in which case, which error exactly, since the
> return codes don't point to anything precise)
Well the simplified view of it is this (the exact errno reason isn't
important in the decision making process, since it does not change the
outcome).
I still think it is probably due to the state of the network socket
changing to being no longer operational BEFORE SSL_shutdown() could
complete the two-way cryptographic shutdown.
So as such this situation is unrecoverable.
So as such the correct course of action is to accept that SSL_shutdown()
did not complete and to deallocate SSL objects and to clean up your
side's affairs by doing such things as closing the socket handle you
are holding.
I think you are correct to assert that an OpenSSL bug exists if you are
able to observe -1/ERROR_SYSCALL and errno==0.
But it is not a bug to observe -1/ERROR_SYSCALL from SSL_shutdown().
HTH
> Would you please confirm to the list the name of the Python module, the
> download site for it and the version you are currently working with.
> This just helps us provide assistance to this same question in future.
This is with Python trunk (from SVN). The error is easily witnessed when
running Lib/test/test_ftplib.py after building Python against OpenSSL
0.9.8m (or 1.0.0 in my case).
I suppose a 2.6.5 release version of Python would show similar problems,
except that TLS support for FTP is new in 2.7/trunk, which makes it less
easy to reproduce.
> Please read up on this recent thread. I do not know anything about
> Python modules myself but I believe this user was also debugging a
> similar issue.
>
> http://www.mail-archive.com/openss...@openssl.org/msg60444.html
Well, not exactly. This issue (ERROR_WANT_READ / ERROR_WANT_WRITE) can
be fixed in our test cases, by doing the shutdown correctly, and indeed
I've got a patch for that.
What I'm specifically interested in is SSL_ERROR_SYSCALL with errno==0.
These issues are tracked together at http://bugs.python.org/issue8108 ,
because they both appeared when someone tried OpenSSL 0.9.8m.
(there are also a couple of 1.0.0-specific issues which seem
negotiation-related, and which we'll have to tackle separately)
> What SSL_shutdown() is saying by returning -1/ERROR_SYSCALL is that a
> cryptographically secure two-way shutdown of the stream was _NOT_ completed and
> that it will probably not be able to ever be completed, probably due to
> the fact the underlying socket died on us.
Ok, thanks for the clarification. We were a bit baffled by errno==0
(EPIPE, ECONNABORTED, EBADF... would have been much more helpful).
So, in any case, I can interpret an SSL_ERROR_SYSCALL return from
SSL_shutdown() as "the socket was closed more or less abruptly"
response? There are no other possible reasons for this error return?
> But to
> be a good well meaning TLS/SSL citizen both ends should continue their
> non-blocking event loops for a reasonable amount of time (in the order
> of 5 to TCP timeout seconds) even after the last SSL_write() has been
> made.
He, well. The interesting thing here is that we are testing a blocking
FTP TLS client with a non-blocking (event loop-based) server. The
blocking client can't really sleep() for 5 seconds when closing the FTP
session. At least I think users wouldn't like it :-)
Also, the client doesn't try to shutdown the SSL layer when closing its
connection. According to the client's author, this is contrary to the
RFC. In his own words:
ftplib.FTP_TLS class already calls unwrap() but only when
closing a "secured" *data* connection.
This is never done for the *control* connection as the examples
shown in RFC-4217 do that only when dealing with the CCC command
which is intended to switch the control connection back to clear
text.
Since ftplib.py does not implement the CCC command I would avoid
to override its close() method.
(if you have an opinion on this specific point -- no implicit SSL
shutdown when closing the FTP session --, I'd like to hear it. Although
it isn't really part of the issue at hand).
> So you have to stand back for a moment and examine Python's use of the
> OpenSSL API and decide if you are trying to be 1:1 as much as possible
> to support and pass on all the cryptographic guarantees that OpenSSL
> makes or if you are trying to provide a simplified view of the world
> that Noddy and Big-Ears could use.
Heh :)
We definitely want to provide a thin layer over OpenSSL, and not hide
any legitimate error conditions, as far as these conditions provide
useful feedback to the user (errno==0 being a bit disturbing :-)). Our
SSL support is definitely not newbie-proof.
(If you are curious, some precisions. There are actually two layers in
our abstraction:
- the "_ssl" extension module is a thin C wrapper
- the "ssl" library module provides a higher-level (but not that much)
socket-alike abstraction over it. Even "ssl", though, doesn't really try
to hide any strange conditions.
The one thing I think is worth abstracting (and part of my patches) is
when SSL_shutdown returns ERROR_WANT_{READ,WRITE} *and* the socket is in
*blocking* mode. In that case, shipping the select-and-retry loop as
part of the ssl abstraction, instead of having each user replicate the
boring logic, looks reasonable to me. What do you think?
ending the parenthesis here --> )
Thanks for the explanations.
Regards,
Antoine.
I have read through the discussion. First I'd like to confirm the
scenario for the errno==0 situation through a particular sequence of events.
I have an SSL protocol test-case creator that can manipulate both ends'
OpenSSL API usage in a co-ordinated fashion, so it should be straightforward
to cause an abrupt socket closure around/during SSL_shutdown()
usage.
> Ok, thanks for the clarification. We were a bit baffled by errno==0
> (EPIPE, ECONNABORTED, EBADF... would have been much more helpful).
I agree with this, it should return a more useful value.
> So, in any case, I can interpret an SSL_ERROR_SYSCALL return from
> SSL_shutdown() as "the socket was closed more or less abruptly"
> response? There are no other possible reasons for this error return?
This is the intention of the error indication. The presumption by me at
this time is to believe it, as no proof has been submitted otherwise.
Further investigation may alter this statement.
>> But to
>> be a good well meaning TLS/SSL citizen both ends should continue their
>> non-blocking event loops for a reasonable amount of time (in the order
>> of 5 to TCP timeout seconds) even after the last SSL_write() has been
>> made.
>
> He, well. The interesting thing here is that we are testing a blocking
> FTP TLS client with a non-blocking (event loop-based) server. The
> blocking client can't really sleep() for 5 seconds when closing the FTP
> session. At least I think users wouldn't like it :-)
>
> Also, the client doesn't try to shutdown the SSL layer when closing its
> connection. According to the client's author, this is contrary to the
> RFC. In his own words:
This is in sympathy with my claim. To reiterate, it is up to an
individual protocol/application to decide if it requires a secure
cryptographic shutdown or not. It is also up to the individual
protocol/application to decide the course of action to take when it
doesn't happen.
So if the protocol spec for "FTP TLS" makes a claim one way or the
other, that is a matter for that specification. Since the FTP protocol
has a clear "QUIT" command to mark the moment when the client has no
further use of the control connection, then there is actually no need to
perform a full SSL_shutdown() to make the system safe from attack. This
doesn't mean you shouldn't attempt to do SSL_shutdown().
>
> ftplib.FTP_TLS class already calls unwrap() but only when
> closing a "secured" *data* connection.
> This is never done for the *control* connection as the examples
> shown in RFC-4217 do that only when dealing with the CCC command
> which is intended to switch the control connection back to clear
> text.
> Since ftplib.py does not implement the CCC command I would avoid
> to override its close() method.
You need to be clear in your own mind what statements from the "FTP TLS"
specification are:
* mandating and
* what it is suggesting / recommending and
* also matters it doesn't indicate any opinion on
The fact that something ISN'T shown in an example should not be taken as
any kind of statement, it is just that; that specific example didn't
express that particular matter. Interpret only the rules that are
written as rules, anything else is open to interpretation.
You also need to go and read the original RFC first-hand and come to your
own interpretation. Then compare your interpretation to that of the
ftplib author's.
> (if you have an opinion on this specific point -- no implicit SSL
> shutdown when closing the FTP session --, I'd like to hear it. Although
> it isn't really part of the issue at hand).
You'd need to educate me in the specific of "FTP TLS" protocol. I am
very experienced with all the details of the classic "FTP" protocol.
In "FTP TLS":
* does it make use of 2 sockets like FTP ?
* are both sockets encrypted with TLS (at all times before any
transaction starts) ?
* is the ftp-data socket opened/closed once for each file like FTP ?
* is the payload data inside the ftp-data socket just the exact number
of bytes in the single file being transferred ?
So in interests of trying to convey better understanding of the TLS
shutdown issue please read the following claims and attempt to
understand the goals behind each claim rather than the specific detail
(in respect of FTP TLS, since I do not fully understand every detail of
FTP TLS at this time).
Things to consider:
* Any unencrypted channel falls outside the scope of TLS (and thus any
points made right below).
* If the encrypted command channel has a "QUIT" command and the
specification (or de facto default implementation) requires that the
channel, after receiving such a command, writes back a single response and
then stops processing any further commands, it can be said that you
already have an in-band shutdown process and SSL_shutdown() provides no
additional benefit to your application.
* "FTP TLS" is transactional in the sense that an individual file
operation is a single unit-of-work (1 transaction). Therefore if
tampering with the TLS stream is detected, at most you would
attempt to roll back the transaction you are currently on. No
previously completed transactions would be affected.
* Does the specification talk about what to do in the case of a
protocol error? I use the parallel of "transactions" to describe this
predicament. It mainly affects stuff being written (transactions with
persistent side-effects). i.e. The rollback strategy is: If the new
file didn't exist before, delete it; if the new file is being appended
to, then truncate it back to the old length, etc... Single operation
commands like "Make A Directory" are begun and committed before its
response is returned. A command response is not part of the
transaction, just an advice about transaction status.
* If the ftp-data stream works just like the Classic (non-TLS) FTP
protocol, with one connection per file and the entire data contents of
the connection being exactly the data in the file (there are no in-band
start and end markers), then in this situation you MUST make use of
SSL_shutdown() and check that it returns 1 at both ends before you consider the
file data contents to be valid and commit the transaction. In this
situation there is no in-band end-of-file marker; it is implied from the
end of the network socket stream. This is just the situation
SSL_shutdown() provides cryptographic guarantees over.
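The last point (only commit a transferred file once a secure end-of-stream has been seen) might be sketched in Python as below. The names receive_file, read_chunk, commit and rollback are hypothetical. The sketch assumes read_chunk() returns b"" on a clean TLS end-of-stream and raises ssl.SSLEOFError when the connection ends without one, which, as far as I understand, is how ssl.SSLSocket.recv() behaves when the socket was wrapped with suppress_ragged_eofs=False.

```python
import ssl

def receive_file(read_chunk, commit, rollback):
    """Collect a data-connection payload; commit only after a clean TLS EOF.

    read_chunk() is assumed to return b"" once the peer's TLS shutdown has
    been seen and to raise ssl.SSLEOFError if the stream ends without it.
    """
    chunks = []
    try:
        while True:
            chunk = read_chunk()
            if not chunk:
                break                  # secure end-of-file
            chunks.append(chunk)
    except (ssl.SSLEOFError, OSError):
        rollback()                     # truncated or broken stream
        return None
    data = b"".join(chunks)
    commit(data)
    return data
```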
Now to talk in respect of SSL_shutdown() more specifically:
* Since SSL_shutdown() is part of the SSL protocol and since
implementing it doesn't contradict any other part of the FTP_TLS
protocol, and where it isn't a required part of the FTP_TLS protocol,
then a BEST EFFORT attempt should be made to use/implement it.
* A BEST EFFORT attempt does not mean you are required to enforce any
kind of extra delay purely for the purpose of implementing a complete
SSL_shutdown() sequence. BEST EFFORT might mean you call SSL_shutdown()
which will attempt to write out to the socket the end-of-stream notify
packet at least once. If it fails; it fails, you tried!
* A client SHOULD attempt to receive the "QUIT command response" (or
wait for server instigated socket disconnection) before indicating to
the user that it has finished being a client.
* A server MUST ensure it sends the "QUIT command response" with the
same amount of effort as it would any other kind of response. That is
while the socket remains open it will be persistent with flushing the
data out the socket.
* A server SHOULD (after making its last successful SSL_write() to
send the "QUIT command response") immediately call SSL_shutdown(). Note
- this MAY return 1 immediately.
* Both client and server if using non-standard OpenSSL BIO layers
should ensure that during a QUIT command/response those layers are
actively flushed downwards into the kernel, BEFORE the socket descriptor
is considered for close() at kernel level. Notes - This
reinforces the point that you must ensure a data flush down the stack
from application -> OpenSSL -> BIO -> Kernel BEFORE you close the
socket. Only once all data has been written to the socket do you
consider when to close().
* Both client and server after they call SSL_shutdown() and it returns
the specific value of 0 (or 1) then that side MAY call shutdown(fd,
SHUT_WR) on the socket. Notes - You are not guaranteed SSL_shutdown()
will always return 0 on the first call, even if you observe that to be
the case.
* A server SHOULD implement the SSL_shutdown() wait loop even after it
has written its last byte to the socket. Consider this to be a STRONG
BEST EFFORT (i.e. actually code it ! ha ha). Notes - A server more so
than a client should implement a wait loop. A server is designed to
be hanging around for work to do, a server is usually capable of
handling multiple clients simultaneously, and a server is usually a
non-interactive application. This is the logic behind "more so".
[Errors and Omissions Exempt.]
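The BEST EFFORT idea above (try the shutdown once, never add an artificial delay, then close) could be expressed in Python roughly like this. best_effort_close is a hypothetical helper name; it assumes an object with the unwrap() and close() methods of ssl.SSLSocket.

```python
import ssl

def best_effort_close(tls_sock):
    """Try the TLS shutdown once, then close the socket regardless.
    Returns True only if the shutdown call itself did not fail."""
    clean = False
    try:
        tls_sock.unwrap()      # one attempt at the TLS shutdown
        clean = True
    except (ssl.SSLError, OSError):
        pass                   # "If it fails; it fails, you tried!"
    finally:
        tls_sock.close()       # always release the socket
    return clean
```

A server that wants the STRONG BEST EFFORT variant would replace the single attempt with a wait loop bounded by a timeout.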
Now in respect of implementing a "FTP Client Access Library", then you
should consider your "ftpcli.quit()" method to have 5 return states to
provide back to the caller:
* ERROR before "QUIT" committed (terminal state 1)
* "QUIT" committed, ERROR before response (terminal state 2)
* "QUIT" sent, "QUIT" response received. (terminal state 3)
* "QUIT" sent, "QUIT" response received, ERROR before TLS shutdown
complete. (terminal state 4)
* "QUIT" sent, "QUIT" response received, TLS shutdown completed.
(terminal state 5)
The term committed means you got the data flushed into the kernel. So
therefore the data was committed into the kernel layer and is past the
point of no return.
If Python has an "exception" system, then I would suggest you consider
only the first case to raise an exception. The other four are
indicated by soft-error returns. The logic in this is that you should
raise exceptions for instructions that you failed to execute on behalf
of the caller.
Most users might just choose to IGNORE the return status of
"ftpcli.quit()" because they are also acting in a best-efforts kind of
way by sending a quit command in the first place, since the course of
action the client will take after the ftpcli.quit() method returns is the
same, regardless of its error state.
You might also like to provide an argument to the quit() command to
indicate a maximum waiting time. This would be applied to the "waiting
for QUIT response" aspect, as well as the "waiting for TLS shutdown
complete" aspect. You might like to consider a value where zero
milliseconds of wait can be indicated for impatient client users. You
might also like to consider a value to mean an INFINITE wait. This
would also mean you need additional states to indicate:
* "QUIT" committed, no-error, waiting for response (interim state 1.5)
* "QUIT" committed, "QUIT" response received, no-error, waiting for
TLS shutdown complete (interim state 3.5)
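The terminal and interim states listed above could be modelled in Python along these lines. QuitStatus and should_raise are hypothetical names, and the member values simply reuse the state numbers from the lists; only the first state raises, following the suggestion above.

```python
from enum import Enum

class QuitStatus(Enum):
    ERROR_BEFORE_QUIT = 1            # terminal state 1
    AWAITING_RESPONSE = 1.5          # interim state 1.5
    ERROR_BEFORE_RESPONSE = 2        # terminal state 2
    RESPONSE_RECEIVED = 3            # terminal state 3
    AWAITING_TLS_SHUTDOWN = 3.5      # interim state 3.5
    ERROR_BEFORE_TLS_SHUTDOWN = 4    # terminal state 4
    TLS_SHUTDOWN_COMPLETE = 5        # terminal state 5

INTERIM = {QuitStatus.AWAITING_RESPONSE, QuitStatus.AWAITING_TLS_SHUTDOWN}

def quit_is_terminal_status(status):
    """True when calling quit() again cannot make further progress."""
    return status not in INTERIM

def should_raise(status):
    """Only a QUIT that was never committed is reported as an exception."""
    return status is QuitStatus.ERROR_BEFORE_QUIT
```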
In order to implement an assured max-wait time then you might need to
change a socket that was blocked into non-blocking mode, and then put it
back to blocking before returning from the ftpcli method.
You might also like to make your "ftpcli.quit()" method restartable.
That is make it valid for a client to call it multiple times, the ftpcli
library will track the state and not resend the QUIT command, or not
expect to see a quit response, etc... You might also like to convert a
previous error state (state 2, state 4) into an exception raising event
if the ftpcli user calls quit() method again, after already having been
told an error occurred via soft-error return value on a previous invocation.
You might want to enforce that the ftpcli.quit() will never wait for
"TLS shutdown" on the first invocation. This means an ftpcli user who
wants to do that must call it again (potentially with a new timeout
value) in the hope the return status changes from state 3.5 to state 5
(in my list) in that time. As an afterthought to this, if the socket
is already non-blocking the first invocation of ftpcli.quit() might like
to attempt a one-shot non-blocking test of SSL_shutdown() to see if it
would/can complete (right after it received the "QUIT response message")
but before returning for the first time. What I'm trying to emphasize is:
give the ftpcli API user control over the two waits.
Putting all these things together allows the ftpcli API users to decide
what they want, allows fast users to get what they want, allows fully
compliant users to get what they want, allows the shutdown blocking
timeouts to be finely controlled.
I am somewhat practical about matters, your FTP Client Access Library
should seek to provide:
* the ability for someone to use the FTP protocol "by the book" and do
everything possible.
* the ability for users to gain performance by cutting corners on stuff
that is unimportant to them (anything after sending the QUIT command
may be unimportant).
A healthy balance of the two makes for a good API that everyone can
like. With an API you always have to consider how an API is used, the
ethos of the language/paradigm and always try to make the API easier to
use by providing the complex/difficult stuff.
Here is my attempt at using my own API and being a fully compliant
citizen, allowing up to ~30 seconds for stuff to happen:
// An identifier cannot start with a digit, so the constant is named by purpose
#define QUIT_WAIT_MILLISECONDS 5000
rc = ftpcli.quit(QUIT_WAIT_MILLISECONDS); // Send command, maybe we get response too
for(int i = 0; i < 5; i++) {
    if(ftpcli.quit_is_terminal_status(rc) == TRUE)
        break; // No more progress can be made
    rc = ftpcli.quit(QUIT_WAIT_MILLISECONDS); // Wait for response and SSL shutdown
}
// Examine rc status now. We tried to push the progress as much as possible
Here is my attempt at using my own API and cutting corners:
rc = ftpcli.quit(); // No argument implies a system default or historic
// compatibility timeout value is used; it will commence the SSL_shutdown()
// but will-never/may-never be able to confirm completion of that.
You stick both examples in the documentation; this helps reassure simple
users that their use is valid too.
Everybody wins at the expense of the ftpcli maintainer(s).
> The one thing I think is worth abstracting (and part of my patches) is
> when SSL_shutdown returns ERROR_WANT_{READ,WRITE} *and* the socket is in
> *blocking* mode. In that case, shipping the select-and-retry loop as
> part of the ssl abstraction, instead of having each user replicate the
> boring logic, looks reasonable to me. What do you think?
This should not be an issue, since if the socket is in blocking mode the
underlying calls will never return EAGAIN (in the case of reads) or
EAGAIN/partial-writes (in the case of writes).
So -1/WANT_READ and -1/WANT_WRITE soft-error returns are facets of
non-blocking socket usage.
Certainly classic BSD socket interpretation of blocking and non-blocking
mode makes my above comments true.
Darryl
I have investigated this issue of -1/SSL_ERROR_SYSCALL with errno==0.
From the SSL_get_error(3) man page:
SSL_ERROR_SYSCALL
Some I/O error occurred. The OpenSSL error queue may contain more
information on the error. If the error queue is empty (i.e.
ERR_get_error() returns 0), ret can be used to find out more about the
error: If ret == 0, an EOF was observed that violates the protocol. If
ret == -1, the underlying BIO reported an I/O error (for socket I/O on
Unix systems, consult errno for details).
Note the use of "may contain more information"; there is no guarantee.
Note the confirmation that ret==0 for the specific condition of EOF (on
the BIO, i.e. on the socket, it violates the protocol because the
protocol expects to receive a shutdown notify packet, which would have
been caused by the far end calling SSL_shutdown() at least once). You
have used the term "errno" where really OpenSSL talks in terms of the
error codes off the error stack.
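The decision procedure in the quoted man page text can be written down as a small Python sketch. explain_syscall_error is a hypothetical helper; its three arguments stand for the failed call's return value, the result of ERR_get_error() and the errno value captured right after the call.

```python
def explain_syscall_error(ret, queued_error, sys_errno):
    """Interpret SSL_ERROR_SYSCALL following the SSL_get_error(3) text above.

    ret          -- return value of the failed call
    queued_error -- ERR_get_error() result (0 if the queue is empty)
    sys_errno    -- errno captured immediately after the call
    """
    if queued_error != 0:
        return "see the OpenSSL error queue"
    if ret == 0:
        return "EOF observed that violates the protocol"
    if ret == -1 and sys_errno != 0:
        return "I/O error reported by the BIO, errno=%d" % sys_errno
    return "no detail available"
```

The combination reported at the start of this thread (ret == -1, empty queue, errno == 0) falls through to the last branch, which is why it is so uninformative.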
I also note the man page doesn't include SSL_shutdown() in the very
specific list of calls that SSL_get_error() is used in sympathy with.
However it was my intention to bring SSL_shutdown() into line so that
man page should also be updated to include SSL_shutdown().
My claim is that the other end did a close() on the socket, while you
were trying/sending/waiting-for the two-way SSL shutdown process to
complete.
This would be observed as an end-of-file condition, i.e. read() returns 0.
This is considered a "SSL3/TLS1 protocol violation" because the protocol
expects all users to always make use of the cryptographically secure
two-stream shutdown all the time.
I have then taken a look at Python from SVN and see that:
./Modules/_ssl.c function PySSL_SetError() does attempt to handle
SSL_ERROR_SYSCALL as per the documentation. Whoever wrote that did read
the man page.
While I agree with the sentiment that having the exact errno saved and
available for inspection/recall by the application using OpenSSL would
be very useful, I don't agree that SSL_shutdown() is acting against the
existing documentation.
Unfortunately I am not sure myself how errno values from read/write or
recv/send calls get onto the OpenSSL error stack. Auditing the source
reveals very few places where get_last_socket_error() is called in
relation to normal recv/send IO operations. So I'm almost able to say
it is not possible to retrieve errno values for anything other than the
connect setup phase (where a variety of kinds of error can occur,
ECONNREFUSED, ETIMEDOUT, ENETUNREACH, ... check out the connect(2) man
page). This is also the stage most people have problems and therefore
historically have required the most detail about the problem to resolve it.
There is however one mystery: how EPIPE from a write() is getting
propagated back in Python. I can only think that Python's custom BIO is
providing this information, as I can't see how OpenSSL's own socket BIO
implementation does that. The strangeness of printing errno==0 out as a
reason for an error is actually leaning towards a facet of Python's BIO
layer and BIO error handling.
> I have investigated this issue of -1/SSL_ERROR_SYSCALL with errno==0.
>
>
> From the SSL_get_error(3) man page:
>
> SSL_ERROR_SYSCALL
> Some I/O error occurred. The OpenSSL error queue may contain more
> information on the error. If the error queue is empty (i.e.
> ERR_get_error() returns 0), ret can be used to find out more about the
> error: If ret == 0, an EOF was observed that violates the protocol. If
> ret == -1, the underlying BIO reported an I/O error (for socket I/O on
> Unix systems, consult errno for details).
Well, in our case, and unless I'm mistaken,
ret == -1, ERR_get_error() == 0 and then errno (the Unix errno) == 0.
Perhaps errno gets cleared by another operation... I may try to
investigate if I get some time.
Regards
Antoine.
With SSL_shutdown(), by virtue of its unique mechanics, you will not see "ret ==
0" (in the way the SSL_get_error man page describes), since that has a
different and special meaning. It marks the first point at which
((SSL_get_shutdown() & SSL_SENT_SHUTDOWN) == SSL_SENT_SHUTDOWN) would be
true.
This is unlike, for example, SSL_read(), which can return 0 and where 0 does mean EOF.
For that you can then do ((SSL_get_shutdown() & SSL_RECEIVED_SHUTDOWN)
== SSL_RECEIVED_SHUTDOWN) to find out if it was a "secure EOF".
=== RANT MODE ====
If the OpenSSL SSL_shutdown() API could have been made better, this is
certainly one area that could be improved, i.e. make SSL_shutdown()
return the current state like SSL_get_shutdown() does (which means
non-zero states). Then reuse the return of 0 state to mean EOF on
transport and keep -1/WANT_READ/WANT_WRITE/ERROR_SYSCALL as-is.
This would mean (simplified understanding) :
* old version returned 0, new version returns 1 (SSL_SENT_SHUTDOWN).
* old version returned 1, new version returns 3
(SSL_SENT_SHUTDOWN|SSL_RECEIVED_SHUTDOWN).
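To make the bit tests and the proposed return values concrete, here is a small Python sketch. ShutdownState, was_secure_eof and proposed_return are hypothetical names; the values 1 and 2 mirror SSL_SENT_SHUTDOWN and SSL_RECEIVED_SHUTDOWN as implied by the mapping above.

```python
from enum import IntFlag

class ShutdownState(IntFlag):
    SENT = 1        # SSL_SENT_SHUTDOWN
    RECEIVED = 2    # SSL_RECEIVED_SHUTDOWN

def was_secure_eof(state):
    """Same test as (SSL_get_shutdown() & SSL_RECEIVED_SHUTDOWN) == SSL_RECEIVED_SHUTDOWN."""
    return (state & ShutdownState.RECEIVED) == ShutdownState.RECEIVED

def proposed_return(old_ret):
    """Translate today's 0/1 return value into the bitmask proposed above."""
    return {0: ShutdownState.SENT,
            1: ShutdownState.SENT | ShutdownState.RECEIVED}[old_ret]
```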
Unfortunately this would have broken historical compatibility; it took
quite a while to get the minimum breakage patch in to achieve my goals,
and by the end of that time thinking about improving OpenSSL (rather than
bug fixing it) was long out of my mind.
I'm all for breaking APIs to make things better, provided it's done in a
responsible way. A poorly thought out API call can't hog a popular API
symbol forever, otherwise the whole product starts to weaken.
=== RANT MODE ====
> Perhaps errno gets cleared by another operation... I may try to
> investigate if I get some time.
Well now I've looked at Python's Modules/_ssl.c to understand the
context of your usage, you are using standard stuff for BIO.
I know that errno==0 is getting set by OpenSSL before it makes the
read() system call (openssl-1.0.0/crypto/bio/bss_fd.c:150 function
fd_read() calls clear_sys_error() which does "errno=0;" from
openssl-1.0.0/e_os.h).
Then (I presume) it gets a read()==0 from kernel (bss_fd.c:151). Of
course a read()==0 does not modify errno in libc.
So in openssl-1.0.0/ssl/s3_lib.c:3191 inside the SSL_shutdown()
implementation you can see the error return is ignored, since returning
0 from here has a different documented meaning.
I think this is the sequence of events you observe.
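The kernel-level half of that sequence is easy to observe outside OpenSSL. The Python sketch below is only an illustration of the read()==0 behaviour, not OpenSSL code: when the peer closes its end, the read reports end-of-file as an empty result and no error is raised, so there is no errno value to report. read_after_peer_close is a hypothetical helper name.

```python
import socket

def read_after_peer_close():
    """Return what a read yields once the peer has simply hung up."""
    ours, theirs = socket.socketpair()
    theirs.close()                  # peer closes without any TLS shutdown
    try:
        return ours.recv(1024)      # EOF: empty result, no OSError raised
    finally:
        ours.close()
```

If errno was cleared just before such a read, it is still 0 afterwards, which matches the -1/SSL_ERROR_SYSCALL with errno==0 observation.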
Unfortunately I can't confirm it to be so since I can't get the test
cases to run from Python's SVN.
Darryl