18 For with much wisdom comes much sorrow;
the more knowledge, the more grief.
I checked OpenSSL 1.0.0d release and the same comment is there "does not happen". If it does not happen why is it being checked?
132 int i,off,newb;133
134 if (!extend)[...]
166 /* avoid buffer overflow */167 int max_max = s->s3->rbuf.len - s->packet_length;168 if (max > max_max)169 max = max_max;170 }171 if (n > max) /* does not happen */172 {
173 SSLerr(SSL_F_SSL3_READ_N,ERR_R_INTERNAL_ERROR);
174 return -1;175 }
From: owner-op...@openssl.org [mailto:owner-op...@openssl.org] On Behalf Of Ger Hobbelt
Sent: Tuesday, 10 May, 2011 07:06
On Mon, May 9, 2011 at 9:15 PM, Michael Gale <gale.m...@gmail.com> wrote:
I checked OpenSSL 1.0.0d release and the same comment is there "does not happen". If it does not happen why is it being checked?
Might be more correctly stated as '/* _should_ not happen */' - see the code: it's a basic sanity check to ensure the BIO_read() doesn't overrun the rx buffer, which would cause arbitrary memory corruption if the BIO_read() were allowed to do that. The error is keeping you from worse (and even harder to debug) things happening to you.
<snip>
I haven't seen this error before, but what might help you in improving diagnosis, assuming you can build your own openSSL (+ python glue code ?), is adding a bit of info to the error report plus activating the openssl assertions in the build. Then you can add these lines (typed off the top of my head, so reckon with the compiler yakking due to typos by me) to see where the issue 'starts' within the function:
<snip>
171 if (n > max) /* does not happen */172 {
char errbuf[120]; /* and here, it might be useful to have a look at the rbuf, as by now we 'know' this will only fire when max was reduced above */
173 SSLerr(SSL_F_SSL3_READ_N,ERR_R_INTERNAL_ERROR);sprintf(errbuf, ' @ line %d in ssl3_read_n: n = %d, max = %d, rbuf.len = %d, packet_length = %d", n, max, (int)s->s3->rbuf.len, (int)s->packet_length);ERR_add_error_data(1, errbuf);
On Mon, May 9, 2011 at 9:15 PM, Michael Gale <gale.m...@gmail.com> wrote:I checked OpenSSL 1.0.0d release and the same comment is there "does not happen". If it does not happen why is it being checked?
Might be more correctly stated as '/* _should_ not happen */' - see the code: it's a basic sanity check to ensure the BIO_read() doesn't overrun the rx buffer, which would cause arbitrary memory corruption if the BIO_read() were allowed to do that. The error is keeping you from worse (and even harder to debug) things happening to you.
It is unclear (assuming the stack trace you provided only lists the python side of things) who invoked ssl3_read_n() and where the error originates: it may either be an input argument error to ssl3_read_n(): ..., int n, int max, ... where n>max from the start of the function, or due to a 'max' fixup inside ssl3_read_n() combined with yet unknown context conditions.
Incidentally, did you grep whether ssl3_read_n() is invoked directly from the Python glue (just to be sure only the 'usual suspects' inside openSSL itself may invoke ssl3_read_n(); it is marked as an internal-use-only function (prototype sits in ssl_locl.h)
I haven't seen this error before, but what might help you in improving diagnosis, assuming you can build your own openSSL (+ python glue code ?), is adding a bit of info to the error report plus activating the openssl assertions in the build. Then you can add these lines (typed off the top of my head, so reckon with the compiler yakking due to typos by me) to see where the issue 'starts' within the function:
--snip-- Whole function below from s3_pkt.c
132 int i,off,newb;133
OPENSSL_assert(n <= max); /* turn on assertions in build to have this one fire on incorrect input */
alternative code, which might be more useful and doesn't require assertions enabled to fire anyway when this 'should not happen' occasion happens anyway:
if (n > max) /* should not happen */{
char errbuf[80];SSLerr(SSL_F_SSL3_READ_N,ERR_R_INTERNAL_ERROR);sprintf(errbuf, ' @ start of ssl3_read_n: n = %d, max = %d", n, max);
ERR_add_error_data(1, errbuf);return -1;}
The ERR_add_error_data() call accepts a series of strings which are appended as extra data to the error report; very handy to transport debug info out of openSSL. Note that it must be invoked AFTER the SSLerr() pushed the error on the error stack.
134 if (!extend)[...]
165 {
166 /* avoid buffer overflow */167 int max_max = s->s3->rbuf.len - s->packet_length;168 if (max > max_max)169 max = max_max;170 }
171 if (n > max) /* does not happen */
172 {
char errbuf[120]; /* and here, it might be useful to have a look at the rbuf, as by now we 'know' this will only fire when max was reduced above */
173 SSLerr(SSL_F_SSL3_READ_N,ERR_R_INTERNAL_ERROR);sprintf(errbuf, ' @ line %d in ssl3_read_n: n = %d, max = %d, rbuf.len = %d, packet_length = %d", n, max, (int)s->s3->rbuf.len, (int)s->packet_length);
ERR_add_error_data(1, errbuf);
174 return -1;175 }
--
Met vriendelijke groeten / Best regards,
Ger Hobbelt
--------------------------------------------------
web: http://www.hobbelt.com/
http://www.hebbut.net/
mail: g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------
Hey,First, thanks for the reply and feedback.The section of the python code that invokes the function call is below, I highlighted line 138 which is where the exception is occurring according to the stack trace.--snip-- Python-2.7/lib/python2.7/ssl.py132 def read(self, len=1024):133134 """Read up to LEN bytes and return them.135 Return zero-length string on EOF."""136137 try:138 return self._sslobj.read(len)139 except SSLError, x:140 if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs:141 return ''142 else:143 raise--snip--I traced this to Python-2.7/Modules/_ssl.c, then to openssl-0.9.8b/ssl/ssl_lib.c where it calls SSL_read. So it does not look like it is calling the internal method.
Nothing glaringly obvious to me in any of the code snippets. :-( (And, Dave, thanks for catching f.u. where I was missing the __LINE__ in my code!)
It's past 0200 hours here so I'd better get some shut-eye, but here's a few thoughts to ponder:
- you mentioned 'satellite link'. Given the wickedness of the issue, a few baseline checks to get our assumptions straight: does your QA rig simulate the 'long fat pipe' that's usual for satellite comms? (or does it not?) (long fat pipe ~ long round trip delay, high RX/TX rate, but I've also seen extremely high RX/TX ratios (way back when: both sides had different uplink bandwidth then; kinda like ADSL but much worse). All this /should/'ve been handled at the TCP level and not affect any upper layers (like SSL/sockets), but better ask now then ride on an incorrect assumption.
- the above implies the assumption that you're using SSL over TCP, not DTLS (which, IIRC, is UDP-based. Correct me if I'm wrong on that one.)
- given a grep through the code, all I can find as 'probable culprits' are internal rbuf (read buffer) related and nothing that can be more or less directly be tickled by SSL_write/read calls. Mix that with my own experience of no trouble whatsoever in the RX/TX buffer dept. for years plus an error which hints at an impending buffer overrun happening at yours (and then, only in production. Nasty bugger to tackle) my next bet it's something related to the 'satellite comms behaviour' as visible at the socket I/F level. (I was also thinking about packet size and thus ever-so-slightly altered recv/send behaviour, but I've been using OpenSSL over PPP, raw serial, Ethernets and a few network oddities and never encountered this. :-S )
- You said you had to 'rollback': is that to a state where you use an older OpenSSL stack, a different SSL stack or no SSL at all? (Just another baseline check here.)
- the only thing that I can see which /might/ have impact (and this is mere conjecture! But we're apparently looking at some sort of obscure edge case, so options are open in my mind) is when both ends differ in their OpenSSL builds, where one has OpenSSL compiled _with_ compression enabled, and the other has not (OPENSSL_NO_COMP).
Line of thought goes like this: the thing that possibly maybe might screw you up is when the other side sends packets which don't somehow fit in the RX buffer, the SSL stack keeps on asking ssl3_read_n() to fetch ever more, and the SSL packet is not caught as overlarge in other parts of the code (which would be odd by itself, but let's continue). So the question is then: how is the RX buffer dimensioned? It's set up in either ssl3_setup_buffers() or ssl3_setup_read_buffer() and space depends on the SSL3_RT_MAX_PACKET_SIZE define. Which, when followed through, is only really dependent on either the OPENSSL_NO_COMP setting or someone seriously screwing around in the OpenSSL code internals. So I opt for the 'no_comp' setting being a 'possibly maybe', however improbable it may be. (because I've used that mix before and no trouble for me; besides, one can expect both flavors to exist in the wild so why is this a first then?)
Another thing that affects packet size in OpenSSL is the SSL_OP_MICROSOFT_BIG_SSLV3_BUFFER option, which you can set, for example, using the SSL_ctrl(s, SSL_CTRL_OPTIONS, SSL_OP_MICROSOFT_BIG_SSLV3_BUFFER, NULL) call (or the eqv. for the SSL_CTX: SSL->options are copied from the SSL_CTX). When one side has the option and the other has not, it just might..... (again, with low probability, but these are the ways I can see that things /could/ go wrong as you described. (Without a system where this can be reproduced, we're down to 'intelligent' guesswork and a couple of straws.)
Cheers and good night!
--Met vriendelijke groeten / Best regards,
Ger Hobbelt
--------------------------------------------------
web: http://www.hobbelt.com/
http://www.hebbut.net/
mail: g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------