On putting the test suite in order


saaj

Jan 30, 2015, 7:16:31 AM
to cherryp...@googlegroups.com
Hi Sylvain, hi folks,


I forked the repo and started to work on making it pass the suite on the discussed environments (py26, py27, py33, py34, pypy). Here are issues I've discovered:
  1. The interactive switch is broken -- remember the back-in-Soviet-Russia story? ;-) Google suggests there was a ``--dumb`` switch back in the pytest days. Now it's always on, and it doesn't hang only because ``old_settings = termios.tcgetattr(fd)`` throws a terminal exception in ``webtest.getchar``. I'll set ``interactive = False`` because I think it's much less important than a stable test suite and the ability to run it. If this goes upstream I'll open a bug for adding a command line switch to explicitly turn it on, in case someone needs the feature. It may be useful in some specific debugging cases, but I'm not interested in implementing it.
  2. tox.ini didn't have the memcache dependency. Because of that, it seems to be completely untested on py3 -- there are byte/string exceptions.
  3. Session tests used ``repr`` for the dict representation of session data, which is key-order indeterministic.
  4. Tests in ``test_session`` depended on each other to run in order. Bad idea, though it wasn't hard to fix.
  5. Concurrency tests fail with counters off by one or two (I'm on a 4-virtual-core laptop). They fail more on py3, probably because of the new GIL.
  6. I can't seem to figure out when ``test_http.test_http_over_https`` should run. It's always skipped.
  7. I have a strong intention to turn it all back to stdlib unittest. We only need unittest2 for py26. I've hit issues on py3 like this, where the nose runner raises strings, which is not allowed on py3. Also, the fewer dependencies the better.
  8. The PyPy suite doesn't even start. It freezes on the first test. I've yet to look at the issue.
I am mostly concerned about #7 now.


Regards,
saaj.


saaj

Jan 30, 2015, 7:27:22 AM
to cherryp...@googlegroups.com
Do we really need ``PostgresqlSession``? It lacks tests. And take a look at this:

    def setup(cls, **kwargs):
        """Set up the storage system for Postgres-based sessions.

        This should only be called once per process; this will be done
        automatically when using sessions.init (as the built-in Tool does).
        """
        for k, v in kwargs.items():
            setattr(cls, k, v)

        self.db = self.get_db()
    setup = classmethod(setup)

I suspect no one is actually using it because of this (note the ``self.db`` reference inside a classmethod that only receives ``cls``).

Sylvain Hellegouarch

Jan 30, 2015, 7:52:50 AM
to cherryp...@googlegroups.com
Hi saaj,

2015-01-30 13:16 GMT+01:00 saaj <i...@saaj.me>:
Hi Sylvain, hi folks,


I forked the repo and started to work on making it pass the suite on the discussed environments (py26, py27, py33, py34, pypy). Here are issues I've discovered:
  1. The interactive switch is broken -- remember the back-in-Soviet-Russia story? ;-) Google suggests there was a ``--dumb`` switch back in the pytest days. Now it's always on, and it doesn't hang only because ``old_settings = termios.tcgetattr(fd)`` throws a terminal exception in ``webtest.getchar``. I'll set ``interactive = False`` because I think it's much less important than a stable test suite and the ability to run it. If this goes upstream I'll open a bug for adding a command line switch to explicitly turn it on, in case someone needs the feature. It may be useful in some specific debugging cases, but I'm not interested in implementing it.

I quite agree. I think it made a lot of sense when CP was under heavy development because it made life easier. These days, not so much.
 
  2. tox.ini didn't have the memcache dependency. Because of that, it seems to be completely untested on py3 -- there are byte/string exceptions.
  3. Session tests used ``repr`` for the dict representation of session data, which is key-order indeterministic.
I haven't looked at those tests but it sounds weird that we would expect ordering from a dict.
 
  4. Tests in ``test_session`` depended on each other to run in order. Bad idea, though it wasn't hard to fix.
  5. Concurrency tests fail with counters off by one or two (I'm on a 4-virtual-core laptop). They fail more on py3, probably because of the new GIL.
  6. I can't seem to figure out when ``test_http.test_http_over_https`` should run. It's always skipped.

Probably a dependency issue -- whether or not you have pyopenssl installed, perhaps?
 
  7. I have a strong intention to turn it all back to stdlib unittest. We only need unittest2 for py26. I've hit issues on py3 like this, where the nose runner raises strings, which is not allowed on py3. Also, the fewer dependencies the better.

I was never a big fan of nose-only either. Even when I use py.test to execute my unit tests, I still write them using the good ol' unittest interface. Depends on how much work this would impose?
 
  8. The PyPy suite doesn't even start. It freezes on the first test. I've yet to look at the issue.

Not too much of an issue straight away. We do a lot of introspection and private module/class handling, so I'm not surprised.


Thanks for the effort!
--

Sylvain Hellegouarch

Jan 30, 2015, 7:53:33 AM
to cherryp...@googlegroups.com
I'd personally remove all the database backends as I think they're entirely broken anyway.

saaj

Jan 30, 2015, 11:55:42 AM
to cherryp...@googlegroups.com
I can't seem to figure out when ``test_http.test_http_over_https`` should run. It's always skipped.

Probably a dependency issue -- whether or not you have pyopenssl installed, perhaps?

Aha, another missing test dependency. Though, as I suspected, installing pyopenssl doesn't change a thing. The test is still skipped. ``test.helper.get_tst_config`` has a fixed http scheme. What have I overlooked?

 
I have a strong intention to turn it all back to stdlib unittest. We only need unittest2 for py26. I've hit issues on py3 like this, where the nose runner raises strings, which is not allowed on py3. Also, the fewer dependencies the better.

I was never a big fan of nose-only either. Even when I use py.test to execute my unit tests, I still write them using the good ol' unittest interface. Depends on how much work this would impose?

I guess it made sense for older Python versions. ``unittest`` from Python 2.7+ is good. I don't know the amount of work required, but since it's possible I'll look at it. Another point here is backward compatibility for user tests, I mean the ``setup_class`` and ``teardown_class`` hooks. We can do something like calling them from the normal ``setUp`` and ``tearDown`` if they exist. It would also be a good idea to have an end-user base class for web tests that doesn't get auto-enhancements like ``test_gc`` in ``CPWebCase``, or at least to have a switch for it.
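
For illustration, here is a minimal sketch of that compatibility idea (hypothetical names, not actual CherryPy code), calling the legacy hooks from the corresponding class-level unittest hooks:

    import unittest

    class CompatWebCase(unittest.TestCase):

        @classmethod
        def setUpClass(cls):
            # call the nose-style hook if a subclass still defines it
            if hasattr(cls, 'setup_class'):
                cls.setup_class()

        @classmethod
        def tearDownClass(cls):
            if hasattr(cls, 'teardown_class'):
                cls.teardown_class()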

 

saaj

Feb 24, 2015, 1:56:11 PM
to cherryp...@googlegroups.com, s...@defuze.org, jar...@jaraco.com
Hi Sylvain, Jason, folks,


Today I had time to work on the subject again. Here is a status update, questions, suggestions and plans. The following changes are in my fork.

  1. As there was no one who could explain the nature of the race condition with removing lock files in the sibling topic, I assume the underlying library deliberately avoids a removal API, which is deemed unsafe. I changed sessions.FileSession behaviour so it only removes expired or obsolete locks in the cleanup monitor thread. I also added the test case from the topic in test.test_lockfile. Now session concurrency, at least in the confines of the implemented tests, seems okay.
  2. As there also was no insight on HTTPS test runs and test.helper.get_tst_config, I assume it was intended for manual change. So some machinery is required to run it twice automatically, once for HTTP and a second time for HTTPS. But before that, I hope to do something about nosetests, see #4.
  3. The latest stable PyPy (2.4.0) doesn't even start with the simplest CherryPy demo (some CPython socket non-spec internals which CherryPy relies upon) and, as I already commented, the test suite freezes on start. So I removed it from setup.py for now. I think such unfounded support claims are dangerous, and once there's someone who wants to tackle PyPy support I hope they can benefit from the established QA infrastructure, which is the priority now.
  4. nosetests caused some more issues, with multiprocessing, so I'm strongly looking forward to replacing it with standard unittest.
So #2 and #4 are in my plans. Here's CI build #12 and some observations.
  1. There are sporadic test_gc failures.
  2. On Python 3, test.test_conn.PipelineTests.test_HTTP11_pipelining fails sporadically.
  3. test.test_states.SignalHandlingTests.test_SIGHUP_tty and test.test_states.ServerStateTests.test_2_KeyboardInterrupt froze the build. Both are currently commented out. I guess it may have something to do with non-interactive mode or Docker, which is used by drone.io. Sylvain, I saw you recently wrote about Docker and CherryPy. Can you verify that you can run the test suite inside a Docker container? It may also be something related to nose output buffering. Does anyone have an idea why they can block?
  4. There are a lot of Sphinx warnings and errors. As fixing them is relatively straightforward, is there someone in the CherryPy community willing to take care of them?
  5. Test code coverage isn't fantastic, 1/3 isn't covered. The wsgiserver package may need some refactoring (not sure how meaningful these maintenance metrics are).
This is it at the moment, and I'm looking forward to your replies.


Regards,
saaj.

Joseph S. Tate

Feb 24, 2015, 2:18:31 PM
to cherryp...@googlegroups.com, s...@defuze.org, jar...@jaraco.com
The sporadic test_gc fails are well known. Don't know about the rest.



--
Joseph Tate

saaj

Feb 25, 2015, 4:39:50 AM
to cherryp...@googlegroups.com
Joseph, could you please expand on it a little if it's something well-known? Is it explained or discussed somewhere?

Google only points to #1144 which I wrote about and agree with.

Joseph S. Tate

Feb 28, 2015, 12:12:07 AM
to cherryp...@googlegroups.com
Eons ago, when I ported the test suite from a home-built test runner
to nose, I got test_gc failures periodically; it's a non-deterministic
test, or at least the determinism is sufficiently complicated that no
matter how many times we call gc.collect(), there's often something
that isn't collected. Improvements would be welcome, but we're
sufficiently used to the flappiness that we haven't wanted to spend
any more time or energy on it.

If you get a failure, run it again. If it passes, you're good. If you
still have failures, you're leaking objects, so fix your code.

saaj

Mar 5, 2015, 8:08:02 AM
to cherryp...@googlegroups.com
That was a methodological mistake. Regression tests exist to, not surprisingly, detect regressions.
Before a change is incorporated in a codebase, tests should pass. If they fail after it, then there's a
regression which should be fixed. If they still pass, then it's all right. If regression tests don't
deterministically detect a regression, and require additional investigation to decide whether it really was a
pass or a fail, then they don't really serve their purpose. It's not even regression testing then but some
memory leak sampling aid. This is even more aggravated in the case of many contributors and many
supported Python versions and implementations.

Also, now I know whom to blame for the lost flag to run the test suite on SSL ;-) However, in retrospect,
in line with the unreliable test suite and lost automated SSL testing we have #1263, #1298 (and
probably some others) and, as the likely outcome, it becomes common knowledge that CherryPy SSL is broken
and users should trade off old and new bugs by choosing versions. This is deadly to the framework, as such
knowledge leads to only one answer for a CherryPy newcomer asking whether to adopt it. So I do hope that we,
as a community, will take the lesson to heart. Maybe eons ago it wasn't that important, but now it's crucial
to have reliable, well-covering and automated tests.

A note about consecutive calls to gc.collect. Reference counting GC (CPython), as far as I know, is pretty
deterministic. It doesn't make sense to call gc.collect several times in a row. I think the source of
non-determinism is threaded execution. I'll take a look at gctools to see if I can do something with it.
Anyhow it should be fixed one way or another, or be replaced by a separate deterministic memory leak
test case.

saaj

Mar 5, 2015, 9:12:38 AM
to cherryp...@googlegroups.com
Hi Sylvain, folks,


Another status update.
Here's the codecov test coverage report. The client library maintainer was quite responsive and quickly fixed running on drone.io and mercurial.
It's a good place for a contributor to start, in the spirit of: see a red line -- make it green.

Here's build #43. Major changes so far:
  • Migrated to unittest. unittest2 fallback for py26.
  • Now we see a whole lot of ResourceWarnings on py3 which were filtered out by careful nosetests. 
    It is another place for a contributor to start.
  • Added a configuration override for CPWebCase.
  • Removed the global and persistent configuration that prevented mixing HTTP and HTTPS cases in a single run.
  • Added an HTTPS case as a subclass for every subclass of CPWebCase. This made a single run last over
    200 seconds and the overall tox run over 20 minutes, whereas drone.io's free tier is limited to 15 minutes.
  • Because every supported environment has the ssl module built in, the builtin adapter is now the default for py2 as well.
  • Removed all reliance on os.getcwd in tests.
  • Tests are run in the install directory, not the checkout directory (see tox.ini).
  • Tests are run on a free port provided by the OS (one test case already had find_free_port).
  • Memcached cases are locked with an atomic operation.
  • Tox is replaced with detox on drone.io. It now runs for 4-5 minutes.
  • It is now possible to work on the suite in PyDev. Its runner adds another non-daemonic thread CherryPy
    doesn't expect, which leads to deadlocks or failures for 3 or 4 cases; these are now just skipped under the PyDev runner.
Sylvain, about those two tests that froze the runner: test_SIGHUP_tty had a very funny comment by Robert,
"This might hang if things aren't working right, but meh." Yes, what can I say, programmers do make assumptions ;-) In fact,
the name is descriptive. On receiving the SIGHUP signal, CherryPy decides to stop or reload by examining whether it has been
daemonized. This is checked with os.isatty(sys.stdin.fileno()), which tests whether stdin is a TTY-like device.
On drone.io (in Docker?) and in the PyDev runner stdin is indeed a non-TTY device, so the only thing to do is to skip
the test under the same condition. test_2_KeyboardInterrupt doesn't block under unittest. Though it still may block,
I guess, as a side-effect of some other failed tests; it happened several times. If it continues I'll run it in a thread. There's
a comment that says it expects the main thread, but I guess that is said in contrast to worker threads.
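
For clarity, the skip condition amounts to something like this (a simplified sketch, not the exact test code):

    import os
    import sys
    import unittest

    # CherryPy decides between stop and reload on SIGHUP by checking whether
    # stdin is a TTY, so the TTY-dependent test is only meaningful when it is.
    HAS_TTY = os.isatty(sys.stdin.fileno())

    class SignalHandlingTests(unittest.TestCase):

        @unittest.skipUnless(HAS_TTY, 'stdin is not a TTY-like device')
        def test_SIGHUP_tty(self):
            pass  # send SIGHUP and assert the bus restarts rather than exits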

There are also several tests that seem to deterministically fail on the HTTPS run, some that fail non-deterministically besides
test_gc, and some probably because of the parallel run. It's up to 10 failures out of a total of ~490 tests. I've yet to analyse them
and will write about them next time.

In general, though, I think that the biggest part of my endeavour, providing QA infrastructure, is completed. The rest is mostly
debugging and normal maintenance.


Regards,
saaj.

Sylvain Hellegouarch

Mar 6, 2015, 10:01:02 AM
to cherryp...@googlegroups.com
Hi saaj,

Thank you so much for all your hard work. I will have to spend more time with everything you did over the weekend. But this looks great! 

Do we have a complete list of the skipped/broken tests somewhere?

Regarding drone.io, would you mind if I create a "cherrypy" account so that we can use it instead from now on? I will obviously share the credentials with those interested.

saaj

Mar 7, 2015, 2:24:34 PM
to cherryp...@googlegroups.com
Thank you so much for all your hard work. I will have to spend more time with everything you did over the weekend. But this looks great! 
Yes, it gets closer to becoming something valuable :-)

I've made some more changes:
  • Finally ran blocking code in a thread for the keyboard interrupt tests. The suite is now less likely to freeze on drone.io.
  • Commented out test_gc until all other tests are fixed.
  • Added drone.io build artefacts for the qa environment. They will appear on a successful build.

Do we have a complete list of the skipped/broken tests somewhere?
As I wrote, I've only planned to look closer at the failed tests and sporadic behaviour to provide details. For now I can report three failures on HTTPS:
  • cherrypy.test.test_static.TestStaticHttps.test_file_stream always fails locally in every env; however, it passed on drone.io last time. I get something like:

    Traceback (most recent call last):
      File ".tox/py27/local/lib/python2.7/site-packages/cherrypy/test/test_static.py", line 294, in test_file_stream
        "chunk size (64k)" % read_so_far)
    AssertionError: The file should have advanced to position 720896, but has already advanced to the end of the file. It may not be streamed as intended, or at the wrong chunk size (64k)

  • cherrypy.test.test_conn.TestLimitedRequestQueueHttps.test_queue_full always fails locally on py2, while py3 is always fine. drone.io is similar, though py33 passed last time.

    Traceback (most recent call last):
      File "/home/saaj/Projects/cherrypy/build/.tox/py27/local/lib/python2.7/site-packages/cherrypy/test/test_conn.py", line 831, in test_queue_full
        raise AssertionError("Overflow conn did not get RST ")
    AssertionError: Overflow conn did not get RST

  • cherrypy.test.test_caching.TestCacheHttps.test_antistampede locally either fails to keep within 6 seconds, or not all threads join within 16, with a lot of threads getting connection refused errors. On drone.io they passed.
You can run them all or individually like this (or with detox instead):

    tox -e py26,py27,py33,py34 -- -v cherrypy.test.test_static.TestStaticHttps.test_file_stream cherrypy.test.test_conn.TestLimitedRequestQueueHttps.test_queue_full cherrypy.test.test_caching.TestCacheHttps.test_antistampede

Nothing is unconditionally skipped (except test_gc, which is commented out). Here are the conditional skips I've added for PyDev:
  • cherrypy.test_states.TestServerState.testNormalStateFlow
  • cherrypy.test_states.TestServerState.testKeyboardInterruptMainThread
  • cherrypy.test_states.TestServerState.testKeyboardInterruptWorkerThread
  • cherrypy.test_bus.BusMethodTests.test_block
And one for when stdin is a non-TTY-like device, for cherrypy.test_states.TestSignalHandling.test_SIGHUP_tty.

What complicates test failure analysis is another methodological issue. We have server setup code in setUpClass rather than in setUp. So when a test interferes with server state, threads and so on, and then fails, it makes all remaining tests in the case fail as well. Generally, tests (a case's methods) should be independent and able to run regardless of side-effects of other tests. But it is obviously a performance trade-off, and when things are normal and failures are rare it may be considered a good trade-off. So it leads me to thinking about several modes, or base classes, for different setups.

Also I would like to know if you have any ideas about test_gc.

 
Regarding drone.io, would you mind if I create a "cherrypy" account so that we can use it instead from now on? I will obviously share the credentials with those interested.

Of course I don't mind; in fact, I'd rather propose it myself. Likewise for codecov.io.

For drone.io we need to decide on a build failure notification rule (this will become relevant once we make the tests pass). On the drone.io free tier I guess we can have at most 5 emails, which may not be enough, so we may need to think of status badges. Btw, Sylvain, we also shouldn't forget to assign a notification email for ReadTheDocs as well.

Here's the latest drone.io build command:

echo 'debconf debconf/frontend select noninteractive' | sudo debconf-set-selections
sudo add-apt-repository ppa:fkrull/deadsnakes &> /dev/null
sudo apt-get update &> /dev/null
sudo apt-get -y install python2.6 python3.4 &> /dev/null

sudo pip install --quiet detox
detox

Put the codecov token in an environment variable named CODECOV_TOKEN.

Here's the artefact list:

coverage_report.txt
coverage_report.html.tgz
maintenance_index.txt
code_complexity.txt

Codecov doesn't need setup as far as I remember.
 

Joseph S. Tate

Mar 9, 2015, 11:19:28 AM
to cherryp...@googlegroups.com
On Thu, Mar 5, 2015 at 8:08 AM, saaj <i...@saaj.me> wrote:
> Also now I know how to blame for the lost flag to run the test suite on SSL
> ;-) However, in retrospective
> in line with unreliable test suite and lost automated SSL testing we have
> #1263, #1298 (and
> probably some others) and, as the likely outcome, it becomes a common
> knowledge that CherryPy SSL is broken
> and user should tradeoff old and new bugs by choosing versions. This is
> deadly to the framework, as such
> knowledge leads to the only answer for a CherryPy newcomer on adoption
> question. So I do hope that we,
> as a community, will take the lesson learned. Maybe eons ago it wasn't that
> important, but now it's crucial
> to have reliable, well-covering and automated tests.

Yes, I ripped out the flag to run the test suite on ssl. If we want
SSL tests, we should have a small set of tests that exercise JUST the
SSL portions. These are supposed to be unit tests after all.

I think we should deprecate SSL support. I liked the idea when it was
first done, but:

1. Can we guarantee that the SSL implementation is secure?
2. That it will be patched quickly if it is determined to be insecure?
3. That all ciphers and hash systems are up to date and can be
added/removed by the operator via config? If this is possible it's not
documented.

It's one thing to use SSL as a client, but to be a server requires a
lot more dedication than we are presently giving it. Unless someone,
with the proper security background, steps forward to do what is
necessary, I don't see it getting better.

saaj

Mar 9, 2015, 12:35:38 PM
to cherryp...@googlegroups.com

Yes, I ripped out the flag to run the test suite on ssl. If we want
SSL tests, we should have a small set of tests that exercise JUST the
SSL portions.

Working on re-enabling SSL in the test suite proves that this isn't a good idea, because under SSL some
presumably non-SSL-related tests fail. It's the same choice as for test_gc: more checks make the test
suite more thorough, which is paid for with slightly longer runs. But unlike test_gc, SSL runs are deterministic and
I see it as mostly advantageous.

 
These are supposed to be unit tests after all.

There are almost no unit tests in the suite. Subclasses of CPWebCase start a full server with threads and all
dependencies. This is called integration/regression testing because it involves many integrated modules
(units) and end-user assertions.

 
I think we should deprecate SSL support.

And this is indeed a serious question. Aside from compatibility and security concerns, it also
affects CherryPy's self-sufficiency, as without SSL support it may no longer be considered a
web server, but rather a backend in a composite setup.

 
I liked the idea when it was
first done, but:

1. Can we guarantee that the SSL implementation is secure?

The adapter relies on the ssl module, which in turn relies on OpenSSL. The naive assumption is:
the adapter is as secure as ssl and OpenSSL are. That is, of course, assuming the adapter
itself is bug-free. Functional doesn't necessarily mean secure.

What are the general means employed for ensuring an SSL implementation (a high-level one, relying
on OpenSSL) is secure, apart from manual audit? How do Nginx and Apache ensure they are
secure?
 
2. That it will be patched quickly if it is determined to be insecure?

First, as long as the adapter is a wrapper over a wrapper and doesn't itself implement crypto
algorithms, patching shouldn't need a cryptographer involved. Second, timeliness was
discussed in the Future of CherryPy thread as the need to put the bugtracker in order, to prioritize
bugs and to provide guidance. It isn't SSL-specific. I guess Sylvain was planning the effort.

 
3. That all ciphers and hash systems are up to date and can be
added/removed by the operator via config? If this is possible it's not
documented.

I guess it's about the FREAK cipher downgrade attack (CVE-2015-0204). As far as I can see,
BuiltinSSLAdapter doesn't enforce ciphers. It uses ssl.PROTOCOL_SSLv23, which
is described in the Python docs as:
 
Selects the highest protocol version that both the client and server support. 
Despite the name, this option can select “TLS” protocols as well as “SSL”.

The compatibility table says it supports: SSLv3, SSLv23, TLSv1, TLSv1.1, TLSv1.2.
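
For reference, on Pythons that ship ssl.SSLContext, restricting protocols and ciphers looks roughly like this (a sketch of what an operator could do; not something BuiltinSSLAdapter currently exposes, and the paths are hypothetical):

    import ssl

    # negotiate the highest mutually supported version, then forbid old SSL
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.options |= ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3
    # allow only modern cipher suites
    context.set_ciphers('ECDHE+AESGCM:!aNULL:!eNULL:!EXPORT:!RC4:!MD5')
    context.load_cert_chain('server.crt', 'server.key')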

Is someone running a public CherryPy with SSL turned on, so we can check it with a FREAK test tool?
 
It's one thing to use SSL as a client, but to be a server requires a
lot more dedication than we are presently giving it. Unless someone,
with the proper security background, steps forward to do what is
necessary, I don't see it getting better.

On one side, as I said above, the SSL adapter is a very high-level wrapper and we don't need a
cryptographer's constant involvement. On the other, I absolutely agree that giving a false sense of security is a
very bad thing.

 

Michiel Overtoom

Mar 9, 2015, 3:06:00 PM
to cherryp...@googlegroups.com

Hi Joseph,

> I think we should deprecate SSL support. [...]


What do you have in mind to replace it with? Putting secure sites behind third party secure proxies like Nginx?

--
"You can't actually make computers run faster, you can only make them do less." - RiderOfGiraffes

Joseph S. Tate

Mar 9, 2015, 3:07:33 PM
to cherryp...@googlegroups.com
On Mon, Mar 9, 2015 at 12:35 PM, saaj <i...@saaj.me> wrote:
>>
>> Yes, I ripped out the flag to run the test suite on ssl. If we want
>> SSL tests, we should have a small set of tests that exercise JUST the
>> SSL portions.
>
>
> Working on re-enabling SSL in the test suite proves that this isn't a good
> idea, because under SSL some,
> presumably non-SSL related tests, fail. It's the same choice as for test_gc.
> More checks make the test
> suite more secure which is paid with a little longer runs. But unlike
> test_gc SSL runs are deterministic and
> I see it mostly advantageous.
>
>
>>
>> These are supposed to be unit tests after all.
>
>
> There's almost no unit tests in the suite. Subclasses of CPWebCase start
> full server with threads and all
> dependencies. This is called integration/regression testing because it
> involves many integrated modules
> (units) and end-user assertions.
>
>

This is a different problem.

>>
>> I think we should deprecate SSL support.
>
>
> And this is indeed a serious question. Aside from compatibility and security
> concerns it also
> affects CherryPy self-sufficiency, as without SSL support it may no longer
> be considered a
> web-server, but rather than a backend in a composite setup.
>
>
>>
>> I liked the idea when it was
>> first done, but:
>>
>> 1. Can we guarantee that the SSL implementation is secure?
>
>
> The adapter relies on ssl module, which in turn relies on OpenSSL. Naive
> assumption is:
> the adapter is as secure is secure are ssl and OpenSSL. Of course, unless
> the adapter
> itself is bug-free. Functional doesn't necessarily mean secure.
>
> What are general means employed for ensuring SSL implementation (high-level
> one, relying
> on OpenSSL) is secure except manual audit? How do Nginx and Apache ensure
> they are
> secure?

Almost everyone relies on openssl; apache and nginx are no exception.
However, openssl is often patched by the distro vendors and affected
consumers are rebuilt. Lots of people run penetration testing against
these servers and bugs are discovered quickly. Using an external
library for SSL is the right thing to do.

>
>>
>> 2. That it will be patched quickly if it is determined to be insecure?
>
>
> First, as long as the adapter is a wrapper on wrapper, and it doesn't itself
> implement crypto
> algorithms, the patching shouldn't need a cryptographer involved. Second,
> timeliness was
> discussed in the Future of Cherrypy as the need to put the bugtracker in
> order. To prioritize
> bugs and to provide guidance. It isn't SSL-specific. I guess Sylvain was
> planning the effort.
>
>
>>
>> 3. That all ciphers and hash systems are up to date and can be
>> added/removed by the operator via config? If this is possible it's not
>> documented.
>
>
> I guess it's about FREAK cipher downgrade attack (CVE-2015-0204). As far as
> I see
> BuiltinSSLAdapter doesn't enforce ciphers. It uses ssl.PROTOCOL_SSLv23 which
> is described in Python docs as:
>

FREAK and HEARTBLEED. Both are fixed with a policy change deactivating
insecure protocols and ciphers. I don't run CherryPy SSL in any way,
so I haven't had to look into whether these can be configured like
they can for NGINX and Apache. Distros have patched openssl to remove
insecure stuff, so maybe this is not as big a concern as I'm making it
out to be, but the lack of an expert on this code is hindering its
viability.

> Selects the highest protocol version that both the client and server
> support.
>
> Despite the name, this option can select “TLS” protocols as well as “SSL”.
>
>
> Compatibility table says it supports: SSLv3, SSLv23, TLSv1, TLSv1.1,
> TLSv1.2.

SSLv3 and SSLv23 are both vulnerable to downgrade attacks
(heartbleed) and have been removed from the recommended list of
protocols. TLSv1 is only marginally safe anymore and isn't recommended
unless you have to support old clients. That leaves TLS 1.1 and 1.2.

>
> Is someone running public CherryPy with SSL turned on, so we can check it
> with FREAK test tool?
>
>>
>> It's one thing to use SSL as a client, but to be a server requires a
>> lot more dedication than we are presently giving it. Unless someone,
>> with the proper security background, steps forward to do what is
>> necessary, I don't see it getting better.
>
>
> From one side, as I said above the SSL adapter is very high-level wrapper
> and we don't need constant
> cryptographer's involvement. From the other, I absolutely agree that giving
> a false sense security is a
> very bad thing.

We are making a bunch of assumptions in our use of openssl, especially
regarding protocols and ciphers, with no warnings about the (lack of)
fitness of those assumptions. Intermediate certificate support is
either not available, needs a 3rd party patch, or isn't documented.

Joseph S. Tate

Mar 9, 2015, 3:10:18 PM
to cherryp...@googlegroups.com
On Mon, Mar 9, 2015 at 3:05 PM, Michiel Overtoom <mot...@xs4all.nl> wrote:
>
> Hi Joseph,
>
>> I think we should deprecate SSL support. [...]
>
>
> What do you have in mind to replace it with? Putting secure sites behind third party secure proxies like Nginx?
>

Well, an SSL terminator, Load Balancer with SSL termination,
nginx/apache reverse proxy, or even a caching proxy with ssl
termination.

I'm almost always behind nginx anyway, because as fast as CP is at
serving static files, nginx is faster. Configuring NGINX to handle the
SSL is easy compared to getting it to serve the static files with CP
handling the dynamic stuff.

saaj

Mar 10, 2015, 5:42:37 AM
to cherryp...@googlegroups.com
Joseph, thank you for raising important questions and providing details. I will summarise the SSL-related points
made here and create a separate thread for discussing CherryPy SSL issues, as it's off-topic here. I will CC
you, and hope you can keep on participating in the discussion.

> There's almost no unit tests in the suite. Subclasses of CPWebCase start
> full server with threads and all
> dependencies. This is called integration/regression testing because it
> involves many integrated modules
> (units) and end-user assertions.
>
>

This is a different problem.

Likely it is. But as long as we can increase test coverage from the current 66% to something over 90%,
we can mostly neglect it. The goal of the tests is to simulate every state and workflow. If that's achieved, it isn't very
important whether it was done directly (unit testing) or not (integration testing). codecov.io will help with
targeting the blind spots either way.

saaj

Apr 23, 2015, 9:05:55 AM
to cherryp...@googlegroups.com
Hi Sylvain, folks,

Rendered version is published here.

Testing
=======
Quality assurance was a glaring problem. After I volunteered and started working on putting
tests back in the right way, I discovered several testing methodological errors that were
made over the years of CherryPy's existence. Also, back in the day CherryPy had its own test
runner and later survived migration to Nose, but lost SSL tests. Not that the tests were purged
from the codebase, but they couldn't be run automatically and required a tester
to change code to run them with SSL. Obviously no one did, and it led to several issues
that rendered SSL functionality broken. Actually it still is, as at the moment of writing,
in the latest CherryPy release, 3.6, SSL doesn't work.

Besides the testing issues, there were issues with CherryPy functionality as such. For instance,
session file locking didn't seem to be tested on machines with more than one execution unit,
so session locks didn't always work on machines with a multi-core CPU.

  You may be tempted to think that as long as there is the GIL [1]_, you shouldn't care about the number
  of CPUs. Unfortunately, serial execution doesn't mean execution in one certain order. Thus, with a
  different number of execution units and different GIL implementations (Python 3.2 has a new
  GIL [2]_), a threaded Python program will likely behave differently.

I've fixed file locking by postponing lock file removal to a cleanup process. Briefly covering
issues with the other session backends: the Postgres session backend was removed
because it was broken, untested and mostly undocumented. The Memcached session backend was mostly
fine, except for a minor ``str``/``bytes`` issue on *py3* and the fact that it isn't very useful in
its current condition. The problem is that it doesn't implement distributed locking, so once you
have two or more CherryPy instances that use a Memcached server as a session backend and your
users are not distributed consistently across the instances, all sorts of concurrent update
issues may arise.
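
The kind of distributed lock the backend would need could be built on Memcached's atomic ``add``,
which only succeeds if the key doesn't exist yet. A rough sketch (hypothetical helper, not part of
CherryPy), using the same *python-memcached* client the tests use:

.. sourcecode:: python

    import time

    import memcache

    def acquire_session_lock(client, session_id, ttl=30, poll=0.05):
        # add() is atomic server-side: only one process can create the key
        key = 'lock.' + session_id
        while not client.add(key, 1, time=ttl):
            time.sleep(poll)  # someone else holds the lock; wait and retry

    def release_session_lock(client, session_id):
        client.delete('lock.' + session_id)

    client = memcache.Client(['127.0.0.1:11211'])
    acquire_session_lock(client, 'abc123')
    try:
        pass  # read, modify and store the session data here
    finally:
        release_session_lock(client, 'abc123')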

Substantial methodological errors in the test suite were:

* test determinism
* test dependency
* test isolation

Test determinism
----------------
There were garbage collection tests that were automatically added to every subclass of
``test.helper.CPWebCase``. They are non-deterministic and fail sporadically (there are
other normal tests that have non-deterministic behaviour, see below). Because CPython's reference
counting garbage collector is deterministic, it's a concurrency issue. What was really surprising
for me here was an explanation like: okay, if it has failed, run it once again until you're sure that
the failure persists. This ``test_gc`` supplementing is disabled now until everything else is
resolved, and it is clear how to handle it properly.

Another thing that prevented certain classification into *pass* and *fail* and led to frequent
freezes was request retries and the absence of an HTTP client socket timeout. I can only guess about
the original reasons for such "carefulness" and not letting a test just fail. For instance, you can
take some of the concurrency tests that throw out up to a hundred client threads and imagine the stability
and predictability of the undertaking with 10 retries and no client timeout. These two were
also fixed -- ``test.webtest.openURL`` doesn't retry on socket errors and sets a 10-second timeout
on its HTTP connections by default.

  ``test.helper.CPWebCase`` starts a full in-process CherryPy server and then uses a real
  ``httplib.HTTPConnection`` to connect to it. Most of the CherryPy tests are in fact either
  integration [3]_ or system tests [4]_.
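
The gist of the client-side change can be illustrated like this (a rough sketch, not the actual
``test.webtest`` code): open the connection with an explicit timeout and let socket errors
propagate instead of retrying, so a hung server turns into a prompt, classifiable failure.

.. sourcecode:: python

    try:
        import httplib                     # py2
    except ImportError:
        import http.client as httplib      # py3

    def open_url(host, port, path='/', timeout=10):
        # no retry loop: a socket error or timeout fails the test immediately
        conn = httplib.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request('GET', path)
            response = conn.getresponse()
            return response.status, response.read()
        finally:
            conn.close()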

Test dependency
---------------
How do you like tests named like ``test_0_check_something``, ``test_1_check_something_else``, and
so on? And what if they were named this way intentionally, so they run in order, and fail
otherwise because the latter depend on the former? Specifically, you cannot run a test individually. All
occurrences I've found were renamed and freed from their sibling shackles.

Test isolation
--------------
``test.helper.CPWebCase``'s server startup routine was done in the equivalent of ``unittest``'s
``setUpClass``, so tests in a test case were not isolated from each other, which made it
problematic to deal with tests that interfere with internals, as it was really
hard to say whether such a test failed on its own or as a consequence. I've added a
``per_test_setup`` attribute to ``CPWebCase`` to conditionally do the routine in ``setUp``
and accommodated the tests' code. I expected significant performance degradation, but it has only
become around 20% slower, which was the reason for me to make it true by default. Not sure,
maybe the switch shouldn't exist at all now.
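
In simplified form the switch works roughly like this (hypothetical names, not the exact
``CPWebCase`` implementation):

.. sourcecode:: python

    import unittest

    class WebCaseBase(unittest.TestCase):

        # isolate tests from each other at the cost of a slower run
        per_test_setup = True

        @classmethod
        def setUpClass(cls):
            if not cls.per_test_setup:
                cls.start_server()

        def setUp(self):
            if self.per_test_setup:
                self.start_server()

        @classmethod
        def start_server(cls):
            pass  # start the in-process CherryPy server here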

Unfortunately, this doesn't resolve all isolation issues, because a complete reset
of an in-process threaded server is quite a juggling act. When a tricky test fails badly, it still
has side-effects (see below), like this::

    ./cherrypy/process/wspbus.py:233: RuntimeWarning: The main thread is exiting, but the Bus is
    in the states.STARTING state; shutting it down automatically now. You must either call 
    bus.block() after start(), or call bus.exit() before the main thread exits.

Testing progress
----------------
What has just been said about fixed and improved things is happening in my fork [5]_; it is not yet
pushed upstream. A more detailed description and discussion of this and the above sections is in
the thread in the CherryPy user group [6]_. Because it's only there until the moment Google decides
to shut down Google Groups because of low traffic, abuse or any other reason they use for such
announcements, I will summarise the achievements in improving the CherryPy testing infrastructure.

#. Migrated tests to stdlib ``unittest``, with ``unittest2`` as a fallback for *py26*. Nose is unfriendly
   to *py3* and was probably related to some locking issues.
#. Overall improvement in avoiding freezes in various test cases and testing environments.
   TTY in Docker, disabled interactive mode, various internals' tests wait in a thread, et cetera.
#. Implemented "tagged" Tox 1.8+ configuration for matrix of various environments (see below). 
#. Integrated with Drone.io [7]_ CI service and Codecov.io [8]_ code coverage tracking 
   service.  
#. Made various changes to allow a parallel environment run with Detox to fit in the 15 minutes of the
   Drone.io free tier. This includes running tests in the install directory [9]_, starting the server each
   time on a free port provided by the OS, locking Memcached cases with an atomic operation, and more.
#. Removed global and persistent configuration that prevented mixing HTTP and HTTPS cases.
#. Made it possible to work on the test suite in PyDev. When running tests, the PyDev test runner
   adds another non-daemonic thread that the tests don't expect, which leads to deadlocks or failures
   for 3 tests. They are now just skipped under the PyDev runner.

Here are the Drone.io commands required to run the test suite. They use Deadsnakes [10]_ to install old
or not-yet-stable versions of Python. The ``-dev`` packages are needed to build pyOpenSSL. The rest
comes with the Drone.io container out of the box.

.. sourcecode:: bash

    echo 'debconf debconf/frontend select noninteractive' | sudo debconf-set-selections
    sudo add-apt-repository ppa:fkrull/deadsnakes &> /dev/null
    sudo apt-get update &> /dev/null
    sudo apt-get -y install python2.6 python3.4 &> /dev/null
    sudo apt-get -y install python2.6-dev python3.4-dev &> /dev/null
    
    sudo pip install --quiet detox
    detox
    tox -e post

As you can see, the ``post`` environment closes the build. Unless the build was successful it won't run,
and no coverage will be submitted nor build artifacts become available. The build artifacts, which
also need to be listed in Drone.io, are several quality assurance reports that can be helpful for
further improving the codebase::

    coverage_report.txt
    coverage_report.html.tgz
    maintenance_index.txt
    code_complexity.txt
    
For the Codecov.io integration to work, the environment variable ``CODECOV_TOKEN`` should be set.

Tagged Tox configuration
------------------------
The final Tox configuration looks like the following. It emerged through a series of rewrites driven
by trade-offs between test duration, a combined coverage report and the need to test dependencies.
I came to the last design after I realised the importance of supporting pyOpenSSL and thus the need to
test it (see the SSL issues below). Then the test sampling facility was easy to integrate into it.

.. sourcecode:: ini

    [tox]
    minversion = 1.8
    envlist    = pre,docs,py{26-co,27-qa,33-nt,34-qa}{,-ssl,-ossl}
    
    # run tests from install dir, not checkout dir
    [testenv]
    changedir = {envsitepackagesdir}/cherrypy
    setenv    = 
        qa:   COVERAGE_FILE = {toxinidir}/.coverage.{envname}
        ssl:  CHERRYPY_TEST_SSL_MODULE = builtin
        ossl: CHERRYPY_TEST_SSL_MODULE = pyopenssl
        sa:   CHERRYPY_TEST_XML_REPORT_DIR = {toxinidir}/test-xml-data
    deps = 
        routes
        py26:      unittest2
        py{26,27}: python-memcached
        py{33,34}: python3-memcached
        qa:        coverage
        ossl:      pyopenssl
        sa:        unittest-xml-reporting
    commands =
        python --version
        nt: python -m unittest {posargs: discover -v cherrypy.test}
        co: unit2 {posargs:discover -v cherrypy.test} 
        qa: coverage run --branch --source="." --omit="t*/*" \
        qa: --module unittest {posargs: discover -v cherrypy.test}
        sa: python test/xmltestreport.py
          
    [testenv:docs]
    basepython = python2.7 
    changedir  = docs
    commands   = sphinx-build -q -E -n -b html . build 
    deps       =
        sphinx < 1.3
        sphinx_rtd_theme
    
    [testenv:pre]
    changedir = {toxinidir}
    deps      = coverage
    commands  = coverage erase
    
    # must be run separately after main envlist, because of detox  
    [testenv:post]
    changedir = {toxinidir}
    deps      = 
        coverage
        codecov
        radon
    commands  =
        bash -c 'echo -e "[paths]\nsource = \n  cherrypy" > .coveragerc'
        bash -c 'echo "  .tox/*/lib/*/site-packages/cherrypy" >> .coveragerc'
        coverage combine
        bash -c 'coverage report > coverage_report.txt'
        coverage html
        tar -czf coverage_report.html.tgz htmlcov
        - codecov
        bash -c 'radon mi -s -e "cherrypy/t*/*" cherrypy > maintenance_index.txt'
        bash -c 'radon cc -s -e "cherrypy/t*/*" cherrypy > code_complexity.txt'
    whitelist_externals = 
        /bin/tar
        /bin/bash
    
    # parse and collect XML files of XML test runner    
    [testenv:report]
    deps      = 
    changedir = {toxinidir}
    commands  = python -m cherrypy.test.xmltestreport --parse

This design heavily relies on the great multi-dimensional configuration that appeared in Tox 1.8 [11]_.
Given an environment, say ``py27-nt-ossl``, Tox treats it like a set of tags,
``{'py27', 'nt', 'ossl'}``, where each entry is matched individually. This way any environment
(that makes sense, that is) with the desired qualities can be constructed even if it isn't listed in
``envlist``. Here's a table that explains the purpose of the tags.

====  ================================================
tag   meaning 
====  ================================================
nt    normal test run via ``unittest``
co    compatibility run via ``unittest2``
qa    quality assurance run via ``coverage``
sa    test sampling run via ``xmlrunner``
ssl   run using Python builtin ``ssl`` module
ossl  run using PyOpenSSL
====  ================================================ 

Note the ``{posargs: discover -v cherrypy.test}`` part. It makes it possible to run a subset of the
test suite in desired environments, e.g.:

.. sourcecode:: bash

    tox -e "py{26-co-ssl,27-nt,33-qa,34-nt-ssl}" -- -v cherrypy.test.test_http \
      cherrypy.test.test_conn.TestLimitedRequestQueue \
      cherrypy.test.test_caching.TestCache.test_antistampede

Tackling entropy of threading and in-process server juggling
------------------------------------------------------------
Even though there was progress on making the test suite more predictable, and actually runnable,
it is still far from desired, and I was scratching my head trying to figure out how to gather
test stats from all the environments. The solution was in a lighted place, right where I was
looking for it. There's a package, *unittest-xml-reporting*, whose name is pretty descriptive.
It's a ``unittest`` runner that writes XML files like
``TEST-test.test_auth_basic.TestBasicAuth-20150410005927.xml`` with easy-to-guess content.
Then it was just a matter of writing some glue code for overnight runs and implementing parsing and
aggregation.
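
The aggregation part boils down to something like this (a rough sketch, not the actual
``xmltestreport`` module, assuming the JUnit-style XML that *unittest-xml-reporting* produces):

.. sourcecode:: python

    import collections
    import glob
    import xml.etree.ElementTree as etree

    def aggregate(pattern='TEST-*.xml'):
        # test id -> [failures, errors, skips] accumulated over all runs
        stats = collections.defaultdict(lambda: [0, 0, 0])
        for filename in glob.glob(pattern):
            for case in etree.parse(filename).getroot().iter('testcase'):
                test_id = '{0}.{1}'.format(case.get('classname'), case.get('name'))
                row = stats[test_id]
                row[0] += len(case.findall('failure'))
                row[1] += len(case.findall('error'))
                row[2] += len(case.findall('skipped'))
        return stats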

I've implemented and incorporated a module named ``xmltestreport`` [12]_ into the ``test``
package; the ``sa`` Tox tag corresponds to it. Running the following in a terminal in the evening
can, by morning, give you the following table (except the N/A environments, see below).

.. sourcecode:: bash

    for i in {1..64}; do tox -e "py{26,27,33,34}-sa{,-ssl,-ossl}"; done
    tox -e report

+-------------------------------------------------------------+----+----+----+
| **py26-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| OK                                                                         |
+-------------------------------------------------------------+----+----+----+
| **py26-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 27 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_request_obj.TestRequestObject.testParamErrors          | 1  | 0  | 26 |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 27 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| **py26-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 33 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_config_server.TestServerConfig.testMaxRequestSize      | 0  | 1  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| **py27-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| OK                                                                         |
+-------------------------------------------------------------+----+----+----+
| **py27-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_session.TestSession.testFileConcurrency                | 1  | 0  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_states.TestWait.test_wait_for_occupied_port_INADDR_ANY | 1  | 0  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 2  | 0  | 31 |
+-------------------------------------------------------------+----+----+----+
| **py27-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 33 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_config_server.TestServerConfig.testMaxRequestSize      | 0  | 4  | 29 |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 2  | 0  | 31 |
+-------------------------------------------------------------+----+----+----+
| **py33-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestPipeline.test_HTTP11_pipelining               | 0  | 16 | 17 |
+-------------------------------------------------------------+----+----+----+
| **py33-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| N/A                                                                        |
+-------------------------------------------------------------+----+----+----+
| **py33-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 33 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 3  | 0  | 30 |
+-------------------------------------------------------------+----+----+----+
| **py34-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 0  | 1  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestPipeline.test_HTTP11_pipelining               | 0  | 14 | 19 |
+-------------------------------------------------------------+----+----+----+
| **py34-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| N/A                                                                        |
+-------------------------------------------------------------+----+----+----+
| **py34-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 32 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_config_server.TestServerConfig.testMaxRequestSize      | 0  | 1  | 31 |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 32 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 3  | 0  | 29 |
+-------------------------------------------------------------+----+----+----+

Actually, as you can see, it only had time for half of the iterations, and in the morning I
interrupted it. Sylvain, you see, I named those three tests in the group correctly.
But there are other candidates, and this newly arisen fragility of SSL sockets, if I understand it
correctly -- I mean all the tests with one fail or error. For example,
``test_session.TestSession.testFileConcurrency`` is fine lock-wise, but because it stresses
the networking layer with several dozen client threads, there are two socket errors::

    Exception in thread Thread-1098:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 763, in run
        self.__target(*self.__args, **self.__kwargs)
      File "py27-sa-ossl/lib/python2.7/site-packages/cherrypy/test/test_session.py", line 282, in request
        c.endheaders()
      File "/usr/lib64/python2.7/httplib.py", line 969, in endheaders
        self._send_output(message_body)
      File "/usr/lib64/python2.7/httplib.py", line 829, in _send_output
        self.send(msg)
      File "/usr/lib64/python2.7/httplib.py", line 805, in send
        self.sock.sendall(data)
      File "/usr/lib64/python2.7/ssl.py", line 293, in sendall
        v = self.send(data[count:])
      File "/usr/lib64/python2.7/ssl.py", line 262, in send
        v = self._sslobj.write(data)
    error: [Errno 32] Broken pipe

    Exception in thread Thread-1099:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 763, in run
        self.__target(*self.__args, **self.__kwargs)
      File "py27-sa-ossl/lib/python2.7/site-packages/cherrypy/test/test_session.py", line 282, in request
        c.endheaders()
      File "/usr/lib64/python2.7/httplib.py", line 969, in endheaders
        self._send_output(message_body)
      File "/usr/lib64/python2.7/httplib.py", line 829, in _send_output
        self.send(msg)
      File "/usr/lib64/python2.7/httplib.py", line 805, in send
        self.sock.sendall(data)
      File "/usr/lib64/python2.7/ssl.py", line 293, in sendall
        v = self.send(data[count:])
      File "/usr/lib64/python2.7/ssl.py", line 262, in send
        v = self._sslobj.write(data)
    error: [Errno 104] Connection reset by peer
    
Other error details are available in the XML reports, which retain such details. You may think that
I have found the reason for the "carefulness" I questioned above, but I wouldn't agree, for several
reasons. First, it looks like an SSL-only issue and also needs investigation -- does someone know practical
reasons for the SSL server and client to be more fragile? Second, if a test stresses the server with a
few dozen client threads, regardless of SSL, it should be written in a way that tolerates
failures for some of them rather than relying on magic recovery behind the scenes. And after all,
all this integration testing is quite useless if we assume we're fine with having 9/10 of the clients
fail in any test.

Status indication
-----------------
Long story short, I have changed my mind on status badges. After thinking a lot about
organising the project's status indication and notification -- where the most important thing is regression
notification, given contributors who used to commit without running tests -- I came to the
conclusion that status badges can do the job.

This way every contributor will know that an introduced regression will become public knowledge
in a matter of minutes, and that it won't make other members of the community happy.
Hopefully this form of enforcement of standards, together with the new testing tools, will help us go
smoother.
 
Drawing the line
----------------
As far as I can see, the testing infrastructure is ready. I also made several related changes here
and there. Now I think it is a good time to push it upstream. First, of course, for other
contributors to make use of the new tools. Second, for the sake of granularity, because there's
already a bunch of changes since January in my fork. And third, in the coming months it is likely
that I won't have much time for contribution, thus it's a good time to discuss the changes if we
need to.

Question
--------
For these I can expect answers only from Sylvain, but others' ideas are also welcome.

#. Will you review my changes in the fork [5]_? Or is it easier if I create a pull
   request right away?
#. Do we have any progress on bugtracker?
#. I still want to hear about your ideas about ``test_gc``.
#. We still have this outdated and pointless wiki home as the landing page for the repo.
#. A recent coverage report [8]_ uncovered that we have a dozen modules with zero coverage,
   which are probably obsolete.


saaj

unread,
Apr 23, 2015, 9:21:16 AM4/23/15
to cherryp...@googlegroups.com, Robert Brewer
Hello Robert,


I'm sorry to disturb you, but there's an issue with your code. In both ``wsgiserver2`` and ``wsgiserver3`` we have ``ChunkedRFile.__iter__``, which looks like this:

    def __iter__(self):
        # Shamelessly stolen from StringIO
        total = 0
        line = self.readline(sizehint)
        while line:
            yield line
            total += len(line)
            if 0 < sizehint <= total:
                break
            line = self.readline(sizehint)

This probably indicates that you're not as good at stealing as at programming ;-) Note the undefined ``sizehint``. The coverage report also indicates the method is never reached in
the test suite. Can you verify that it's safe to remove the method, or can you fix it, so we don't have broken code in the core?
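
If the intent was simply to iterate over the remaining lines, a minimal repair could look like this (just a sketch, assuming no size hint is needed in this scope):

    def __iter__(self):
        # Read line by line until the chunked stream is exhausted;
        # there is no sizehint here, so don't pretend to honour one.
        line = self.readline()
        while line:
            yield line
            line = self.readline()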


P.S. It'll be really nice to have you participating in the discussion.

Sylvain Hellegouarch

unread,
Apr 23, 2015, 10:04:52 AM4/23/15
to cherryp...@googlegroups.com
Hi saaj,

Thanks for being so thorough and patient.

I think it's a good call. I'd rather have no test than a false-positive broken test.

 

Another thing that prevented clear classification into *pass* and *fail* and led to frequent
freezes was request retries and the absence of an HTTP client socket timeout. I can only guess at
the original reasons for such "carefulness" and not letting a test just fail. For instance, you can
take some of the concurrency tests that throw out up to a hundred client threads and imagine the
stability and predictability of the undertaking with 10 retries and no client timeout. These two
were also fixed -- ``test.webtest.openURL`` doesn't retry on socket errors and sets a 10-second
timeout on its HTTP connections by default.

  ``test.helper.CPWebCase`` starts a full in-process CherryPy server and then uses a real
  ``httplib.HTTPConnection`` to connect to it. Most CherryPy tests are in fact either
  integration [3]_ or system tests [4]_.


Seems fair. Do we need such a long timeout though? I would have set it to something smaller so that the test suite doesn't drag on.
 



Test dependency
---------------
How do you like tests named like ``test_0_check_something``, ``test_1_check_something_else``, and
so on? And what if they were named this way intentionally, so they run in order, and fail
otherwise because the latter depend on the former? Specifically, you cannot run a test
individually. All occurrences I found were renamed and freed from their sibling shackles.
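
As a toy illustration of the before/after (the names are made up)::

    import unittest

    class SessionDataTest(unittest.TestCase):
        # Before: test_0_store and test_1_load only passed when run in order.
        # After: each test builds the state it needs, so it can run alone.
        def setUp(self):
            self.data = {}

        def test_store(self):
            self.data['counter'] = 1
            self.assertEqual(self.data['counter'], 1)

        def test_load_missing(self):
            self.assertEqual(self.data.get('counter'), None)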


That's great. I don't think tests should depend on each other at all, so ordering them isn't a feature I would expect.


 

Test isolation
--------------
``test.helper.CPWebCase``'s server startup routine was done in the equivalent of ``unittest``'s
``setUpClass``, so tests in a test case were not isolated from each other. That made it
problematic to deal with tests that interfere with internals, as it was really hard to say
whether such a test had failed on its own or as a consequence. I've added a
``per_test_setup`` attribute to ``CPWebCase`` to conditionally do the routine in ``setUp``
and accommodated the tests' code. I expected significant performance degradation, but it only
became around 20% slower, which is why I made it true by default. I'm not sure; maybe the
attribute shouldn't exist at all now.
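
A rough sketch of the idea, not the actual implementation -- ``start_server``/``stop_server``
are placeholders for the real startup and teardown routine::

    import unittest

    class PerTestSetupCase(unittest.TestCase):
        # When per_test_setup is True the server is (re)started around every
        # test; when False it is started once per test case, as before.
        per_test_setup = True

        @classmethod
        def start_server(cls):
            pass   # would start the in-process CherryPy server

        @classmethod
        def stop_server(cls):
            pass   # would stop it and wait for worker threads

        @classmethod
        def setUpClass(cls):
            if not cls.per_test_setup:
                cls.start_server()

        @classmethod
        def tearDownClass(cls):
            if not cls.per_test_setup:
                cls.stop_server()

        def setUp(self):
            if self.per_test_setup:
                self.start_server()

        def tearDown(self):
            if self.per_test_setup:
                self.stop_server()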

Unfortunately, this doesn't resolve all isolation issues, because doing a complete reset of an
in-process threaded server is quite a juggling act. When a tricky test fails badly, it still
has a side effect (see below), like this::

    ./cherrypy/process/wspbus.py:233: RuntimeWarning: The main thread is exiting, but the Bus is
    in the states.STARTING state; shutting it down automatically now. You must either call 
    bus.block() after start(), or call bus.exit() before the main thread exits.


This is in response to the fact we are doing integration tests, not real unit tests of the code itself. I wonder if we shouldn't remove that functional layer and try to re-purpose our tests to make them easier to control.
I often rely on this little recipe I wrote for my own tests:


It makes the request go through the same stack except for the socket layer.

Could this help out? The tests aren't much more unit tests per-se but we avoid some issues.
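
Roughly, the general idea is to call the WSGI application directly with a hand-built environ (this is only an illustration of the approach, not the recipe verbatim):

    import io
    import sys

    def local_request(app, path='/', method='GET'):
        # Build a minimal WSGI environ and call the application (e.g. a
        # mounted CherryPy app) directly: same request/response machinery,
        # no socket involved.
        environ = {
            'REQUEST_METHOD': method,
            'PATH_INFO': path,
            'QUERY_STRING': '',
            'SERVER_NAME': 'localhost',
            'SERVER_PORT': '8080',
            'SERVER_PROTOCOL': 'HTTP/1.1',
            'wsgi.version': (1, 0),
            'wsgi.url_scheme': 'http',
            'wsgi.input': io.BytesIO(b''),
            'wsgi.errors': sys.stderr,
            'wsgi.multithread': False,
            'wsgi.multiprocess': False,
            'wsgi.run_once': False,
        }
        captured = []

        def start_response(status, headers, exc_info=None):
            captured.append((status, headers))

        body = b''.join(app(environ, start_response))
        status, headers = captured[0]
        return status, headers, body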


 

Testing progress
----------------
Everything just said about fixed and improved stuff is happening in my fork [5]_; it is not yet
pushed upstream. A more detailed description and discussion of the current and above sections is in
the thread in the CherryPy user group [6]_. Because it is only there until the moment Google decides
to shut down Google Groups because of low traffic, abuse or whatever other reason they use for such
an announcement, I will summarise the achievements in improving the CherryPy testing infrastructure.

#. Migrated tests to stdlib ``unittest``, with ``unittest2`` as a fallback for *py26* (the usual
   conditional import, sketched below). Nose is unfriendly to *py3* and was probably related to
   some locking issues.
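
A minimal sketch of that fallback::

    import sys

    if sys.version_info < (2, 7):
        import unittest2 as unittest   # backport of the py27 API for py26
    else:
        import unittest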


That's okay with me. 

 
#. Overall improvement in avoiding freezes in various test cases and testing environments:
   TTY in Docker, disabled interactive mode, various internals' tests waiting in a thread, et cetera.
#. Implemented "tagged" Tox 1.8+ configuration for a matrix of various environments (see below).
#. Integrated with the Drone.io [7]_ CI service and the Codecov.io [8]_ code coverage tracking
   service.


That's great, though it won't make much difference if, as maintainers, we don't pay attention to the results anyhow :(

 
#. Made various changes to allow a parallel environment run with Detox to fit in the 15 minutes of
   the Drone.io free tier. This includes running tests in the install directory [9]_, starting the
   server each time on a free port provided by the OS (see the sketch after this list), locking
   Memcached cases with an atomic operation, and more.
#. Removed global and persistent configuration that prevented mixing HTTP and HTTPS cases.
#. Made it possible to work on the test suite in PyDev. When running tests in the PyDev test
   runner, it adds another non-daemonic thread that tests don't expect, which leads to deadlocks
   or failures for 3 tests. They are now simply skipped under the PyDev runner.
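
The free-port trick is the usual bind-to-zero dance (a sketch; the names are illustrative, not the actual helper)::

    import socket

    def free_port():
        # Bind to port 0 so the OS picks an unused ephemeral port, then
        # release it for the test server to bind right after. There is a
        # tiny window in which another process could grab it, which is
        # acceptable for test runs.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(('127.0.0.1', 0))
        port = s.getsockname()[1]
        s.close()
        return port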

Cool stuff. I'm using PyCharm myself but that ought to follow a similar trend. Thanks.
Mmh, the fact we're running integration tests makes our test suite more fragile and complicated than it ought to be.

What are we trying to do with our tests anyway? That's probably the question we want to answer first :)


 

Status indication
-----------------
Long story short, I have changed my mind on status badges. After thinking a lot about
organising the project's status indication and notification, where the most important part is
regression notification, given we have contributors who used to commit without running tests,
I came to the conclusion that status badges can do the job.

This way every contributor will know that an introduced regression will become public knowledge
in a matter of minutes, and that it won't make other members of the community happy. Hopefully
this form of enforcing standards, together with the new testing tools, will help us go more
smoothly.
 
Drawing the line
----------------
As far as I can see, the testing infrastructure is ready. I also made several related changes here
and there. Now I think it is a good time to push it upstream. First, of course, for other
contributors to make use of the new tools. Second, for the sake of granularity, because there's
already a bunch of changes since January in my fork. And third, in the coming months it is likely
that I won't have much time for contribution, thus it's a good time to discuss the changes if we
need to.


I'd be happy to integrate your changes as you've clearly put a lot of effort into them, and it seems to have paid off.
 

Question
--------
For these I can expect answers only from Sylvain, but others' ideas are also welcome.

#. Will you review my changes in the fork [5]_? Or is it easier if I create a pull
   request right away?

I prefer a PR if you can allow it in your schedule.

 
#. Do we have any progress on bugtracker?

You could open a ticket in BitBucket.

 
#. I still want to hear about your ideas about ``test_gc``.

I have no recollection of why we had this one, so it's hard for me to comment.

 
#. We still have this outdated and pointless wiki home as the landing page for the repo.

Indeed.
 
#. A recent coverage report [8]_ uncovered that we have a dozen modules with zero coverage,
   which are probably obsolete.

That's a status at least.

Thanks for all your work, saaj,

saaj

unread,
Apr 24, 2015, 8:49:28 AM4/24/15
to cherryp...@googlegroups.com
Thanks for being so thorough and patient.
:-)
 
Test determinism
----------------
There were garbage collection tests that were automatically added to every subclass of
``test.helper.CPWebCase``. They are non-deterministic and fail sporadically (there are
other normal tests that have non-deterministic behaviour, see below). Because CPython's reference
counting garbage collector is deterministic, it's a concurrency issue. What was really surprising
for me here was an explanation like: okay, if it has failed, run it once again until you're sure
the failure persists. This ``test_gc`` supplementing is disabled for now, until everything else is
resolved and it is clear how to handle it properly.


I think it's a good call. I'd rather have no test than a false-positive broken test.

#. I still want to hear about your ideas about ``test_gc``.

I have no recollection of why we had this one, so it's hard for me to comment.

These ``test_gc``s of course weren't there without a reason. Maybe it is because of them that CherryPy has a good memory profile and doesn't leak. After this
iteration we should get back to them, because they are important. The test sampling facility may help to detect whether the problem is in certain tests.
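
For reference, the general shape of such a check is roughly the following (an illustration, not the actual ``test_gc`` code); with worker and monitor threads still running, the count can be momentarily off, which is exactly the flakiness in question::

    import gc

    def instances_left(cls):
        # Force a full collection, then list live instances of `cls`;
        # an empty result means nothing leaked after the test requests.
        gc.collect()
        return [o for o in gc.get_objects() if isinstance(o, cls)]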
 
Another thing that prevented clear classification into *pass* and *fail* and led to frequent
freezes was request retries and the absence of an HTTP client socket timeout. I can only guess at
the original reasons for such "carefulness" and not letting a test just fail. For instance, you can
take some of the concurrency tests that throw out up to a hundred client threads and imagine the
stability and predictability of the undertaking with 10 retries and no client timeout. These two
were also fixed -- ``test.webtest.openURL`` doesn't retry on socket errors and sets a 10-second
timeout on its HTTP connections by default.

  ``test.helper.CPWebCase`` starts a full in-process CherryPy server and then uses a real
  ``httplib.HTTPConnection`` to connect to it. Most CherryPy tests are in fact either
  integration [3]_ or system tests [4]_.


Seems fair. Do we need such a long timeout though? I would have set it to something smaller so that the test suite doesn't drag on.

I agree it's long, and actually I started with 2 or 4 seconds. But there were additional failures, because tests were written in expectation of an infinite
client timeout and retries; I can't recall the details, though, since my point was "infrastructure". I have just run ``py27-nt`` and ``py33-nt``. With a 2-second timeout
both added a failure of the notorious ``test.test_caching.TestCache.test_antistampede``, which needs to be amended anyway. So yes, ``WebCase.timeout``
can be decreased for the general case, and passed as a parameter to ``CPWebCase.getPage`` on occasion, along the lines of the sketch below.
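
Something like this; the ``timeout`` argument to ``getPage`` is the proposed addition, not existing API, and the path is made up::

    from cherrypy.test import helper

    class CacheTest(helper.CPWebCase):
        # Keep the default client timeout short for the whole test case...
        timeout = 4

        def test_antistampede(self):
            # ...and raise it only for the deliberately slow endpoint.
            self.getPage('/longexpensive', timeout=15)
            self.assertStatus(200)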

 
Test isolation
--------------
``test.helper.CPWebCase``'s server startup routine was done in the equivalent of ``unittest``'s
``setUpClass``, so tests in a test case were not isolated from each other. That made it
problematic to deal with tests that interfere with internals, as it was really hard to say
whether such a test had failed on its own or as a consequence. I've added a
``per_test_setup`` attribute to ``CPWebCase`` to conditionally do the routine in ``setUp``
and accommodated the tests' code. I expected significant performance degradation, but it only
became around 20% slower, which is why I made it true by default. I'm not sure; maybe the
attribute shouldn't exist at all now.

Unfortunately, this doesn't resolve all isolation issues, because doing a complete reset of an
in-process threaded server is quite a juggling act. When a tricky test fails badly, it still
has a side effect (see below), like this::

    ./cherrypy/process/wspbus.py:233: RuntimeWarning: The main thread is exiting, but the Bus is
    in the states.STARTING state; shutting it down automatically now. You must either call 
    bus.block() after start(), or call bus.exit() before the main thread exits.


This is in response to the fact we are doing integration tests, not real unit tests of the code itself. I wonder if we shouldn't remove that functional layer and try to re-purpose our tests to make them easier to control.

To me the thing looks like apples vs oranges. What's more important: high-level assertions that indicate the web server acts like a
web server, or a bunch of low-level assertions about file-like objects and HTTP string parsing, if we can't have both? To me it's the former.
I see there's (inherent?) coupling -- tools, bus, etc. You either end up writing a bunch of mocks or do integration testing anyway. But I don't feel
steady on the point. It may need more discussion.
 
I often rely on this little recipe I wrote for my own tests:


It makes the request go through the same stack except for the socket layer.

Could this help out? The tests aren't much more unit tests per-se but we avoid some issues.

I looked at the recipe several times in the past, but didn't dare to incorporate it into my workflow. Maybe because of the need to implement the assertion
methods that ``webtest.WebCase`` has. On the one hand I think it is good to have this base class out of the box and have a choice. On the other,
I don't know how to tell a user when to use one and when the other.

It's going to be a big rewrite anyway. With this full-blown integration/system approach we have several environments that are fine (plain py2). When we
complicate things, like throwing a hundred threads at it, adding SSL, or running it on py3, it has issues. But in fact there's only a handful of them. I believe
it's easier to fix them than to rewrite the whole suite. It seems like a question for the future.
 

 
 
#. Overall improvement in avoiding freezes in various test cases and testing environments:
   TTY in Docker, disabled interactive mode, various internals' tests waiting in a thread, et cetera.
#. Implemented "tagged" Tox 1.8+ configuration for a matrix of various environments (see below).
#. Integrated with the Drone.io [7]_ CI service and the Codecov.io [8]_ code coverage tracking
   service.


That's great, though it won't make much difference if, as maintainers, we don't pay attention to the results anyhow :(

Absolutely. As I already said, at the very least whoever publishes to the Cheese Shop should look for regressions, that is, check the status badge's
color beforehand. But it's not only for maintainers. For instance, it is good guidance for people who want to contribute to the project.


Other error details are available in the XML reports, which retain them. You may think that
I have found the reason for the "carefulness" I questioned above, but I wouldn't agree, for several
reasons. First, it looks like an SSL-only issue and also needs investigation -- does someone know
practical reasons for the SSL server and client to be more fragile? Second, if a test stresses the
server with a few dozen client threads, regardless of SSL, it should be written in a way that
tolerates failures for some of them rather than relying on magic recovery behind the scenes. And
after all, all this integration testing is quite useless if we assume we're fine with 9/10 of the
clients failing in any test.


Mmh, the fact we're running integration tests makes our test suite more fragile and complicated than it ought to be.
What are we trying to do with our tests anyway? That's probably the question we want to answer first :)

We need tests to detect regressions, obviously. We have issues with determinism. To tackle them we needed a tool. I think test sampling can
be quite helpful for finding patterns and debugging. Normally, of course, you don't need sampling, as a test should just clearly fall into one of two
categories: pass or fail.
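
A crude way to sample a single test, just to illustrate the idea (not the facility in the fork)::

    import subprocess
    import sys

    def sample(test_id, runs=20):
        # Run one test repeatedly and report its pass ratio; anything
        # strictly between 0 and `runs` means the test is non-deterministic.
        passed = 0
        for _ in range(runs):
            rc = subprocess.call([sys.executable, '-m', 'unittest', test_id])
            passed += (rc == 0)
        print('%s: %d/%d passed' % (test_id, passed, runs))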

As I said, it got fragile and complicated not without reason. We either fix individual cases or undertake a rewrite of the whole suite. The question
is which is more achievable.
 


#. Will you review my changes in the fork [5]_? Or is it easier if I create a pull
   request right away?

I prefer a PR if you can allow it in your schedule.
 
#. Do we have any progress on bugtracker?

You could open a ticket in BitBucket.

Hm, for bookkeeping, or just to answer it myself? ;-) I'm asking about the "Bugs bugs bugs" section you had in the original post of Future of CherryPy.

 