On putting the test suite in order


saaj

Jan 30, 2015, 7:16:31 AM
to cherryp...@googlegroups.com
Hi Sylvain, hi folks,


I forked the repo and started to work on making it pass the suite on the discussed environments (py26, py27, py33, py34, pypy). Here are issues I've discovered:
  1. The interactive switch is broken -- remember the back-in-Soviet-Russia story? ;-) Google suggests there was a ``--dumb`` switch back in the pytest days. Now it's always on, and it doesn't hang only because ``old_settings = termios.tcgetattr(fd)`` throws a terminal exception in ``webtest.getchar``. I'll set ``interactive = False`` because I think it's much less important than a stable test suite and the ability to run it. If this goes upstream I'll open a bug for adding a command line switch to explicitly turn it on, in case someone needs the feature. It may be useful in some specific debugging cases, but I'm not interested in implementing it.
  2. tox.ini didn't have the memcache dependency. Because of that, it seems to be completely untested on py3 -- there are byte/string exceptions.
  3. Session tests used ``repr`` for the dict representation of session data, which is key-order indeterministic.
  4. Tests in ``test_session`` depended on each other to run in order. Bad idea, though it wasn't hard to fix.
  5. Concurrency tests fail with counters off by one or two (I'm on a 4-virtual-core laptop). They fail more on py3, probably because of the new GIL.
  6. I can't seem to figure out when ``test_http.test_http_over_https`` should run. It's always skipped.
  7. I have a strong intention to turn it all back to stdlib unittest. We only need unittest2 for py26. I've hit issues on py3 like this, where the nose runner raises strings, which is not allowed on py3. Also, the fewer dependencies the better.
  8. The PyPy suite doesn't even start. It freezes on the first test. I've yet to look at the issue.
I am mostly concerned about #7 now.


Regards,
saaj.


saaj

Jan 30, 2015, 7:27:22 AM
to cherryp...@googlegroups.com
Do we really need ``PostgresqlSession``? It lacks tests. And take a look at this:

    def setup(cls, **kwargs):
        """Set up the storage system for Postgres-based sessions.

        This should only be called once per process; this will be done
        automatically when using sessions.init (as the built-in Tool does).
        """
        for k, v in kwargs.items():
            setattr(cls, k, v)

        self.db = self.get_db()
    setup = classmethod(setup)

I suspect no one is actually using it because of this (note the ``self.db`` reference inside a classmethod that only receives ``cls``).

Sylvain Hellegouarch

Jan 30, 2015, 7:52:50 AM
to cherryp...@googlegroups.com
Hi saaj,

2015-01-30 13:16 GMT+01:00 saaj <i...@saaj.me>:
Hi Sylvain, hi folks,


I forked the repo and started to work on making it pass the suite on the discussed environments (py26, py27, py33, py34, pypy). Here are issues I've discovered:
  1. The interactive switch is broken -- remember the back-in-Soviet-Russia story? ;-) Google suggests there was a ``--dumb`` switch back in the pytest days. Now it's always on, and it doesn't hang only because ``old_settings = termios.tcgetattr(fd)`` throws a terminal exception in ``webtest.getchar``. I'll set ``interactive = False`` because I think it's much less important than a stable test suite and the ability to run it. If this goes upstream I'll open a bug for adding a command line switch to explicitly turn it on, in case someone needs the feature. It may be useful in some specific debugging cases, but I'm not interested in implementing it.

I quite agree. I think it made a lot of sense when CP was under heavy development because it made life easier. These days, not so much.
 
  2. tox.ini didn't have the memcache dependency. Because of that, it seems to be completely untested on py3 -- there are byte/string exceptions.
  3. Session tests used ``repr`` for the dict representation of session data, which is key-order indeterministic.
I haven't looked at those tests but it sounds weird that we would expect ordering from a dict.
 
  4. Tests in ``test_session`` depended on each other to run in order. Bad idea, though it wasn't hard to fix.
  5. Concurrency tests fail with counters off by one or two (I'm on a 4-virtual-core laptop). They fail more on py3, probably because of the new GIL.
  6. I can't seem to figure out when ``test_http.test_http_over_https`` should run. It's always skipped.

Probably a dependency issue -- whether or not you have pyopenssl installed, perhaps?
 
  7. I have a strong intention to turn it all back to stdlib unittest. We only need unittest2 for py26. I've hit issues on py3 like this, where the nose runner raises strings, which is not allowed on py3. Also, the fewer dependencies the better.

I was never a big fan of nose-only either. Even when I use py.test to execute my unit tests, I still write them using the good ol' unittest interface. Depends on how much work this would impose?
 
  8. The PyPy suite doesn't even start. It freezes on the first test. I've yet to look at the issue.

Not too much of an issue straight away. We do a lot of introspection and private module/class handling, so I'm not surprised.


Thanks for the effort!
--

Sylvain Hellegouarch

Jan 30, 2015, 7:53:33 AM
to cherryp...@googlegroups.com
I'd personally remove all the database backends as I think they're entirely broken anyway.

saaj

Jan 30, 2015, 11:55:42 AM
to cherryp...@googlegroups.com
I can't seem to figure out when ``test_http.test_http_over_https`` should run. It's always skipped.

Probably a dependency issue -- whether or not you have pyopenssl installed, perhaps?

Aha, another missing test dependency. Though, as I suspected, installing pyopenssl doesn't change a thing. The test is still skipped. ``test.helper.get_tst_config`` has a fixed http scheme. What have I overlooked?

 
I have a strong intention to turn it all back to stdlib unittest. We only need unittest2 for py26. I've hit issues on py3 like this, where the nose runner raises strings, which is not allowed on py3. Also, the fewer dependencies the better.

I was never a big fan of nose-only either. Even when I use py.test to execute my unit tests, I still write them using the good ol' unittest interface. Depends on how much work this would impose?

I guess it made sense for older Python versions. ``unittest`` from Python 2.7+ is good. I don't know the amount of work required, but since it's possible I'll look at it. Another point here is backward compatibility for user tests, I mean the ``setup_class`` and ``teardown_class`` hooks. We can do something like calling them from the normal ``setUp`` and ``tearDown`` if they exist. It would also be a good idea to have an end-user base class for web tests that doesn't get auto-enhancements like ``test_gc`` in ``CPWebCase``, or at least to have a switch for it.
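
For illustration, here is a minimal sketch of that compatibility idea (hypothetical names, not actual CherryPy code), calling the legacy hooks from the corresponding class-level unittest hooks:

    import unittest

    class CompatWebCase(unittest.TestCase):

        @classmethod
        def setUpClass(cls):
            # call the nose-style hook if a subclass still defines it
            if hasattr(cls, 'setup_class'):
                cls.setup_class()

        @classmethod
        def tearDownClass(cls):
            if hasattr(cls, 'teardown_class'):
                cls.teardown_class()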

 

saaj

Feb 24, 2015, 1:56:11 PM
to cherryp...@googlegroups.com, s...@defuze.org, jar...@jaraco.com
Hi Sylvain, Jason, folks,


Today I had time to work on the subject again. Here is a status update, questions, suggestions and plans. The following changes are in my fork.

  1. As there was no one who could explain the nature of the race condition with removing lock files in the sibling topic, I assume the underlying library deliberately avoids a removal API, which is deemed unsafe. I changed sessions.FileSession behaviour so it only removes expired or obsolete locks in the cleanup monitor thread. I also added the test case from the topic in test.test_lockfile. Now session concurrency, at least in the confines of the implemented tests, seems okay.
  2. As there also was no insight on HTTPS test runs and test.helper.get_tst_config, I assume it was intended for manual change. So some machinery is required to run it twice automatically, once for HTTP and a second time for HTTPS. But before that, I hope to do something about nosetests, see #4.
  3. The latest stable PyPy (2.4.0) doesn't even start with the simplest CherryPy demo (some CPython socket non-spec internals which CherryPy relies upon) and, as I already commented, the test suite freezes on start. So I removed it from setup.py for now. I think such unfounded support claims are dangerous, and once there's someone who wants to tackle PyPy support I hope they can benefit from the established QA infrastructure, which is the priority now.
  4. nosetests caused some more issues, with multiprocessing, so I'm strongly looking forward to replacing it with standard unittest.
So #2 and #4 are in my plans. Here's CI build #12 and some observations.
  1. There are sporadic test_gc failures.
  2. On Python 3, test.test_conn.PipelineTests.test_HTTP11_pipelining fails sporadically.
  3. test.test_states.SignalHandlingTests.test_SIGHUP_tty and test.test_states.ServerStateTests.test_2_KeyboardInterrupt froze the build. Both are currently commented out. I guess it may have something to do with non-interactive mode or Docker, which is used by drone.io. Sylvain, I saw you recently wrote about Docker and CherryPy. Can you verify that you can run the test suite inside a Docker container? It may also be something related to nose output buffering. Does anyone have an idea why they can block?
  4. There are a lot of Sphinx warnings and errors. As fixing them is relatively straightforward, is there someone in the CherryPy community willing to take care of them?
  5. Test code coverage isn't fantastic, 1/3 isn't covered. The wsgiserver package may need some refactoring (not sure how meaningful these maintenance metrics are).
This is it at the moment, and I'm looking forward to your replies.


Regards,
saaj.

Joseph S. Tate

Feb 24, 2015, 2:18:31 PM
to cherryp...@googlegroups.com, s...@defuze.org, jar...@jaraco.com
The sporadic test_gc fails are well known. Don't know about the rest.



--
Joseph Tate

saaj

Feb 25, 2015, 4:39:50 AM
to cherryp...@googlegroups.com
Joseph, could you please expand on it a little if it's something well-known? Is it explained or discussed somewhere?

Google only points to #1144 which I wrote about and agree with.

Joseph S. Tate

Feb 28, 2015, 12:12:07 AM
to cherryp...@googlegroups.com
Eons ago, when I ported the test suite from a home-built test runner
to nose, I got test_gc failures periodically; it's a non-deterministic
test, or at least the determinism is sufficiently complicated that no
matter how many times we call gc.collect(), there's often something
that isn't collected. Improvements would be welcome, but we're
sufficiently used to the flappiness that we haven't wanted to spend
any more time or energy on it.

If you get a failure, run it again. If it passes, you're good. If you
still have failures, you're leaking objects, so fix your code.

saaj

Mar 5, 2015, 8:08:02 AM
to cherryp...@googlegroups.com
That was a methodological mistake. Regression tests exist to, not surprisingly, detect regressions.
Before a change is incorporated in a codebase, tests should pass. If they fail after it, then there's a
regression which should be fixed. If they still pass, then it's all right. If regression tests don't
deterministically detect a regression, and require additional investigation to decide whether it really was a
pass or a fail, then they don't really serve their purpose. It's not even regression testing then but some
memory leak sampling aid. This is even more aggravated in the case of many contributors and many
supported Python versions and implementations.

Also, now I know whom to blame for the lost flag to run the test suite on SSL ;-) However, in retrospect,
in line with the unreliable test suite and lost automated SSL testing we have #1263, #1298 (and
probably some others) and, as the likely outcome, it becomes common knowledge that CherryPy SSL is broken
and users should trade off old and new bugs by choosing versions. This is deadly to the framework, as such
knowledge leads to only one answer for a CherryPy newcomer asking whether to adopt it. So I do hope that we,
as a community, will take the lesson to heart. Maybe eons ago it wasn't that important, but now it's crucial
to have reliable, well-covering and automated tests.

A note about consecutive calls to gc.collect. Reference counting GC (CPython), as far as I know, is pretty
deterministic. It doesn't make sense to call gc.collect several times in a row. I think the source of
non-determinism is threaded execution. I'll take a look at gctools to see if I can do something with it.
Anyhow it should be fixed one way or another, or be replaced by a separate deterministic memory leak
test case.

saaj

Mar 5, 2015, 9:12:38 AM
to cherryp...@googlegroups.com
Hi Sylvain, folks,


Another status update.
Here's the codecov test coverage report. The client library maintainer was quite responsive and quickly fixed running on drone.io and mercurial.
It's a good place for a contributor to start, in the spirit of: see a red line -- make it green.

Here's build #43. Major changes so far:
  • Migrated to unittest. unittest2 fallback for py26.
  • Now we see a whole lot of ResourceWarnings on py3 which were filtered out by careful nosetests. 
    It is another place for a contributor to start.
  • Added a configuration override for CPWebCase.
  • Removed the global and persistent configuration that prevented mixing HTTP and HTTPS cases in a single run.
  • Added an HTTPS case as a subclass for every subclass of CPWebCase. This made a single run last over
    200 seconds and the overall tox run over 20 minutes, whereas drone.io's free tier is limited to 15 minutes.
  • Because every supported environment has the ssl module built in, the builtin adapter is now the default for py2 as well.
  • Removed all reliance on os.getcwd in tests.
  • Tests are run in the install directory, not the checkout directory (see tox.ini).
  • Tests are run on a free port provided by the OS (one test case already had find_free_port).
  • Memcached cases are locked with an atomic operation.
  • Tox is replaced with detox on drone.io. It now runs for 4-5 minutes.
  • It is now possible to work on the suite in PyDev. Its runner adds another non-daemonic thread CherryPy
    doesn't expect, which leads to deadlocks or failures for 3 or 4 cases; these are now just skipped under the PyDev runner.
Sylvain, about those two tests that froze the runner: test_SIGHUP_tty had a very funny comment by Robert,
"This might hang if things aren't working right, but meh." Yes, what can I say, programmers do make assumptions ;-) In fact,
the name is descriptive. On receiving the SIGHUP signal, CherryPy decides to stop or reload by examining whether it has been
daemonized. This is checked with os.isatty(sys.stdin.fileno()), which tests whether stdin is a TTY-like device.
On drone.io (in Docker?) and in the PyDev runner stdin is indeed a non-TTY device, so the only thing to do is to skip
the test under the same condition. test_2_KeyboardInterrupt doesn't block under unittest. Though it still may block,
I guess, as a side-effect of some other failed tests; it happened several times. If it continues I'll run it in a thread. There's
a comment that says it expects the main thread, but I guess that is said in contrast to worker threads.
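
For clarity, the skip condition amounts to something like this (a simplified sketch, not the exact test code):

    import os
    import sys
    import unittest

    # CherryPy decides between stop and reload on SIGHUP by checking whether
    # stdin is a TTY, so the TTY-dependent test is only meaningful when it is.
    HAS_TTY = os.isatty(sys.stdin.fileno())

    class SignalHandlingTests(unittest.TestCase):

        @unittest.skipUnless(HAS_TTY, 'stdin is not a TTY-like device')
        def test_SIGHUP_tty(self):
            pass  # send SIGHUP and assert the bus restarts rather than exits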

There are also several tests that seem to deterministically fail on the HTTPS run, some that fail non-deterministically besides
test_gc, and some probably because of the parallel run. It's up to 10 failures out of a total of ~490 tests. I've yet to analyse them
and will write about them next time.

In general, though, I think that the biggest part of my endeavour, providing QA infrastructure, is completed. The rest is mostly
debugging and normal maintenance.


Regards,
saaj.

Sylvain Hellegouarch

Mar 6, 2015, 10:01:02 AM
to cherryp...@googlegroups.com
Hi saaj,

Thank you so much for all your hard work. I will have to spend more time with everything you did over the weekend. But this looks great! 

Do we have a complete list of the skipped/broken tests somewhere?

Regarding drone.io, would you mind if I create a "cherrypy" account so that we can use it instead from now on? I will obviously share the credentials with those interested.

saaj

Mar 7, 2015, 2:24:34 PM
to cherryp...@googlegroups.com
Thank you so much for all your hard work. I will have to spend more time with everything you did over the weekend. But this looks great! 
Yes, it gets closer to becoming something valuable :-)

I've made some more changes:
  • Finally ran blocking code in a thread for the keyboard interrupt tests. The suite is now less likely to freeze on drone.io.
  • Commented out test_gc until all other tests are fixed.
  • Added drone.io build artefacts for the qa environment. They will appear on a successful build.

Do we have a complete list of the skipped/broken tests somewhere?
As I wrote, I've only planned to look closer at the failed tests and sporadic behaviour to provide details. For now I can report three failures on HTTPS:
  • cherrypy.test.test_static.TestStaticHttps.test_file_stream always fails locally in every env; however, it passed on drone.io last time. I get something like:

    Traceback (most recent call last):
      File ".tox/py27/local/lib/python2.7/site-packages/cherrypy/test/test_static.py", line 294, in test_file_stream
        "chunk size (64k)" % read_so_far)
    AssertionError: The file should have advanced to position 720896, but has already advanced to the end of the file. It may not be streamed as intended, or at the wrong chunk size (64k)

  • cherrypy.test.test_conn.TestLimitedRequestQueueHttps.test_queue_full always fails locally on py2, while py3 is always fine. drone.io is similar, though py33 passed last time.

    Traceback (most recent call last):
      File "/home/saaj/Projects/cherrypy/build/.tox/py27/local/lib/python2.7/site-packages/cherrypy/test/test_conn.py", line 831, in test_queue_full
        raise AssertionError("Overflow conn did not get RST ")
    AssertionError: Overflow conn did not get RST

  • cherrypy.test.test_caching.TestCacheHttps.test_antistampede locally either fails to keep within 6 seconds, or not all threads join within 16, with a lot of threads getting connection refused errors. On drone.io they passed.
You can run them all or individually like this (or with detox instead):

    tox -e py26,py27,py33,py34 -- -v cherrypy.test.test_static.TestStaticHttps.test_file_stream cherrypy.test.test_conn.TestLimitedRequestQueueHttps.test_queue_full cherrypy.test.test_caching.TestCacheHttps.test_antistampede

Nothing is unconditionally skipped (except test_gc, which is commented out). Here are the conditional skips I've added for PyDev:
  • cherrypy.test_states.TestServerState.testNormalStateFlow
  • cherrypy.test_states.TestServerState.testKeyboardInterruptMainThread
  • cherrypy.test_states.TestServerState.testKeyboardInterruptWorkerThread
  • cherrypy.test_bus.BusMethodTests.test_block
And one for when stdin is a non-TTY-like device, for cherrypy.test_states.TestSignalHandling.test_SIGHUP_tty.

What complicates test failure analysis is another methodological issue. We have server setup code in setUpClass rather than in setUp. So when a test interferes with server state, threads and so on, and then fails, it makes all remaining tests in the case fail as well. Generally, tests (a case's methods) should be independent and able to run regardless of side-effects of other tests. But it is obviously a performance trade-off, and when things are normal and failures are rare it may be considered a good trade-off. So it leads me to thinking about several modes, or base classes, for different setups.

Also I would like to know if you have any ideas about test_gc.

 
Regarding drone.io, would you mind if I create a "cherrypy" account so that we can use it instead from now on? I will obviously share the credentials with those interested.

Of course I don't mind; in fact, I'd rather propose it myself. Likewise for codecov.io.

For drone.io we need to decide on a build failure notification rule (this will become relevant once we make the tests pass). On the drone.io free tier I guess we can have at most 5 emails, which may not be enough, so we may need to think of status badges. Btw, Sylvain, we also shouldn't forget to assign a notification email for ReadTheDocs as well.

Here's the latest drone.io build command:

echo 'debconf debconf/frontend select noninteractive' | sudo debconf-set-selections
sudo add-apt-repository ppa:fkrull/deadsnakes &> /dev/null
sudo apt-get update &> /dev/null
sudo apt-get -y install python2.6 python3.4 &> /dev/null

sudo pip install --quiet detox
detox

Put the codecov token in an environment variable named CODECOV_TOKEN.

Here's the artefact list:

coverage_report.txt
coverage_report.html.tgz
maintenance_index.txt
code_complexity.txt

Codecov doesn't need setup as far as I remember.
 

Joseph S. Tate

Mar 9, 2015, 11:19:28 AM
to cherryp...@googlegroups.com
On Thu, Mar 5, 2015 at 8:08 AM, saaj <i...@saaj.me> wrote:
> Also now I know how to blame for the lost flag to run the test suite on SSL
> ;-) However, in retrospective
> in line with unreliable test suite and lost automated SSL testing we have
> #1263, #1298 (and
> probably some others) and, as the likely outcome, it becomes a common
> knowledge that CherryPy SSL is broken
> and user should tradeoff old and new bugs by choosing versions. This is
> deadly to the framework, as such
> knowledge leads to the only answer for a CherryPy newcomer on adoption
> question. So I do hope that we,
> as a community, will take the lesson learned. Maybe eons ago it wasn't that
> important, but now it's crucial
> to have reliable, well-covering and automated tests.

Yes, I ripped out the flag to run the test suite on ssl. If we want
SSL tests, we should have a small set of tests that exercise JUST the
SSL portions. These are supposed to be unit tests after all.

I think we should deprecate SSL support. I liked the idea when it was
first done, but:

1. Can we guarantee that the SSL implementation is secure?
2. That it will be patched quickly if it is determined to be insecure?
3. That all ciphers and hash systems are up to date and can be
added/removed by the operator via config? If this is possible it's not
documented.

It's one thing to use SSL as a client, but to be a server requires a
lot more dedication than we are presently giving it. Unless someone,
with the proper security background, steps forward to do what is
necessary, I don't see it getting better.

saaj

Mar 9, 2015, 12:35:38 PM
to cherryp...@googlegroups.com

Yes, I ripped out the flag to run the test suite on ssl. If we want
SSL tests, we should have a small set of tests that exercise JUST the
SSL portions.

Working on re-enabling SSL in the test suite proves that this isn't a good idea, because under SSL some
presumably non-SSL-related tests fail. It's the same choice as for test_gc: more checks make the test
suite more thorough, which is paid for with slightly longer runs. But unlike test_gc, SSL runs are deterministic and
I see it as mostly advantageous.

 
These are supposed to be unit tests after all.

There are almost no unit tests in the suite. Subclasses of CPWebCase start a full server with threads and all
dependencies. This is called integration/regression testing because it involves many integrated modules
(units) and end-user assertions.

 
I think we should deprecate SSL support.

And this is indeed a serious question. Aside from compatibility and security concerns, it also
affects CherryPy's self-sufficiency, as without SSL support it may no longer be considered a
web server, but rather a backend in a composite setup.

 
I liked the idea when it was
first done, but:

1. Can we guarantee that the SSL implementation is secure?

The adapter relies on the ssl module, which in turn relies on OpenSSL. The naive assumption is:
the adapter is as secure as ssl and OpenSSL are. That is, of course, assuming the adapter
itself is bug-free. Functional doesn't necessarily mean secure.

What are the general means employed for ensuring an SSL implementation (a high-level one, relying
on OpenSSL) is secure, apart from manual audit? How do Nginx and Apache ensure they are
secure?
 
2. That it will be patched quickly if it is determined to be insecure?

First, as long as the adapter is a wrapper over a wrapper and doesn't itself implement crypto
algorithms, patching shouldn't need a cryptographer involved. Second, timeliness was
discussed in the Future of CherryPy thread as the need to put the bugtracker in order, to prioritize
bugs and to provide guidance. It isn't SSL-specific. I guess Sylvain was planning the effort.

 
3. That all ciphers and hash systems are up to date and can be
added/removed by the operator via config? If this is possible it's not
documented.

I guess it's about the FREAK cipher downgrade attack (CVE-2015-0204). As far as I can see,
BuiltinSSLAdapter doesn't enforce ciphers. It uses ssl.PROTOCOL_SSLv23, which
is described in the Python docs as:
 
Selects the highest protocol version that both the client and server support. 
Despite the name, this option can select “TLS” protocols as well as “SSL”.

The compatibility table says it supports: SSLv3, SSLv23, TLSv1, TLSv1.1, TLSv1.2.
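
For reference, on Pythons that ship ssl.SSLContext, restricting protocols and ciphers looks roughly like this (a sketch of what an operator could do; not something BuiltinSSLAdapter currently exposes, and the paths are hypothetical):

    import ssl

    # negotiate the highest mutually supported version, then forbid old SSL
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.options |= ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3
    # allow only modern cipher suites
    context.set_ciphers('ECDHE+AESGCM:!aNULL:!eNULL:!EXPORT:!RC4:!MD5')
    context.load_cert_chain('server.crt', 'server.key')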

Is someone running a public CherryPy with SSL turned on, so we can check it with a FREAK test tool?
 
It's one thing to use SSL as a client, but to be a server requires a
lot more dedication than we are presently giving it. Unless someone,
with the proper security background, steps forward to do what is
necessary, I don't see it getting better.

On one side, as I said above, the SSL adapter is a very high-level wrapper and we don't need a
cryptographer's constant involvement. On the other, I absolutely agree that giving a false sense of security is a
very bad thing.

 

Michiel Overtoom

Mar 9, 2015, 3:06:00 PM
to cherryp...@googlegroups.com

Hi Joseph,

> I think we should deprecate SSL support. [...]


What do you have in mind to replace it with? Putting secure sites behind third party secure proxies like Nginx?

--
"You can't actually make computers run faster, you can only make them do less." - RiderOfGiraffes

Joseph S. Tate

Mar 9, 2015, 3:07:33 PM
to cherryp...@googlegroups.com
On Mon, Mar 9, 2015 at 12:35 PM, saaj <i...@saaj.me> wrote:
>>
>> Yes, I ripped out the flag to run the test suite on ssl. If we want
>> SSL tests, we should have a small set of tests that exercise JUST the
>> SSL portions.
>
>
> Working on re-enabling SSL in the test suite proves that this isn't a good
> idea, because under SSL some,
> presumably non-SSL related tests, fail. It's the same choice as for test_gc.
> More checks make the test
> suite more secure which is paid with a little longer runs. But unlike
> test_gc SSL runs are deterministic and
> I see it mostly advantageous.
>
>
>>
>> These are supposed to be unit tests after all.
>
>
> There's almost no unit tests in the suite. Subclasses of CPWebCase start
> full server with threads and all
> dependencies. This is called integration/regression testing because it
> involves many integrated modules
> (units) and end-user assertions.
>
>

This is a different problem.

>>
>> I think we should deprecate SSL support.
>
>
> And this is indeed a serious question. Aside from compatibility and security
> concerns it also
> affects CherryPy self-sufficiency, as without SSL support it may no longer
> be considered a
> web-server, but rather than a backend in a composite setup.
>
>
>>
>> I liked the idea when it was
>> first done, but:
>>
>> 1. Can we guarantee that the SSL implementation is secure?
>
>
> The adapter relies on ssl module, which in turn relies on OpenSSL. Naive
> assumption is:
> the adapter is as secure is secure are ssl and OpenSSL. Of course, unless
> the adapter
> itself is bug-free. Functional doesn't necessarily mean secure.
>
> What are general means employed for ensuring SSL implementation (high-level
> one, relying
> on OpenSSL) is secure except manual audit? How do Nginx and Apache ensure
> they are
> secure?

Almost everyone relies on openssl; apache and nginx are no exception.
However, openssl is often patched by the distro vendors and affected
consumers are rebuilt. Lots of people run penetration testing against
these servers and bugs are discovered quickly. Using an external
library for SSL is the right thing to do.

>
>>
>> 2. That it will be patched quickly if it is determined to be insecure?
>
>
> First, as long as the adapter is a wrapper on wrapper, and it doesn't itself
> implement crypto
> algorithms, the patching shouldn't need a cryptographer involved. Second,
> timeliness was
> discussed in the Future of Cherrypy as the need to put the bugtracker in
> order. To prioritize
> bugs and to provide guidance. It isn't SSL-specific. I guess Sylvain was
> planning the effort.
>
>
>>
>> 3. That all ciphers and hash systems are up to date and can be
>> added/removed by the operator via config? If this is possible it's not
>> documented.
>
>
> I guess it's about FREAK cipher downgrade attack (CVE-2015-0204). As far as
> I see
> BuiltinSSLAdapter doesn't enforce ciphers. It uses ssl.PROTOCOL_SSLv23 which
> is described in Python docs as:
>

FREAK and HEARTBLEED. Both are fixed with a policy change deactivating
insecure protocols and ciphers. I don't run CherryPy SSL in any way,
so I haven't had to look into whether these can be configured like
they can for NGINX and Apache. Distros have patched openssl to remove
insecure stuff, so maybe this is not as big a concern as I'm making it
out to be, but the lack of an expert on this code is hindering its
viability.

> Selects the highest protocol version that both the client and server
> support.
>
> Despite the name, this option can select “TLS” protocols as well as “SSL”.
>
>
> Compatibility table says it supports: SSLv3, SSLv23, TLSv1, TLSv1.1,
> TLSv1.2.

SSLv3 and SSLv23 are both vulnerable to downgrade attacks
(heartbleed) and have been removed from the recommended list of
protocols. TLSv1 is only marginally safe anymore and isn't recommended
unless you have to support old clients. That leaves TLS 1.1 and 1.2.

>
> Is someone running public CherryPy with SSL turned on, so we can check it
> with FREAK test tool?
>
>>
>> It's one thing to use SSL as a client, but to be a server requires a
>> lot more dedication than we are presently giving it. Unless someone,
>> with the proper security background, steps forward to do what is
>> necessary, I don't see it getting better.
>
>
> From one side, as I said above the SSL adapter is very high-level wrapper
> and we don't need constant
> cryptographer's involvement. From the other, I absolutely agree that giving
> a false sense security is a
> very bad thing.

We are making a bunch of assumptions in our use of openssl, especially
regarding protocols and ciphers, with no warnings about the (lack of)
fitness of those assumptions. Intermediate certificate support is
either not available, needs a 3rd party patch, or isn't documented.

Joseph S. Tate

Mar 9, 2015, 3:10:18 PM
to cherryp...@googlegroups.com
On Mon, Mar 9, 2015 at 3:05 PM, Michiel Overtoom <mot...@xs4all.nl> wrote:
>
> Hi Joseph,
>
>> I think we should deprecate SSL support. [...]
>
>
> What do you have in mind to replace it with? Putting secure sites behind third party secure proxies like Nginx?
>

Well, an SSL terminator, Load Balancer with SSL termination,
nginx/apache reverse proxy, or even a caching proxy with ssl
termination.

I'm almost always behind nginx anyway, because as fast as CP is at
serving static files, nginx is faster. Configuring NGINX to handle the
SSL is easy compared to getting it to serve the static files with CP
handling the dynamic stuff.

saaj

Mar 10, 2015, 5:42:37 AM
to cherryp...@googlegroups.com
Joseph, thank you for raising important questions and providing details. I will summarise the SSL-related points
made here and create a separate thread for discussing CherryPy SSL issues, as it's off-topic here. I will CC
you, and hope you can keep on participating in the discussion.

> There's almost no unit tests in the suite. Subclasses of CPWebCase start
> full server with threads and all
> dependencies. This is called integration/regression testing because it
> involves many integrated modules
> (units) and end-user assertions.
>
>

This is a different problem.

Likely it is. But as long as we can increase test coverage from the current 66% to something over 90%,
we can mostly neglect it. The goal of the tests is to simulate every state and workflow. If that's achieved, it isn't very
important whether it was done directly (unit testing) or not (integration testing). codecov.io will help with
targeting the blind spots either way.

saaj

Apr 23, 2015, 9:05:55 AM
to cherryp...@googlegroups.com
Hi Sylvain, folks,

Rendered version is published here.

Testing
=======
Quality assurance was a glaring problem. After I volunteered and started working on putting
tests back in the right way, I discovered several testing methodological errors that were
made over the years of CherryPy's existence. Also, back in the day CherryPy had its own test
runner and later survived migration to Nose, but lost SSL tests. Not that the tests were purged
from the codebase, but they couldn't be run automatically and required a tester
to change code to run them with SSL. Obviously no one did, and it led to several issues
that rendered SSL functionality broken. Actually it still is, as at the moment of writing,
in the latest CherryPy release, 3.6, SSL doesn't work.

Besides the testing issues, there were issues with CherryPy functionality as such. For instance,
session file locking didn't seem to be tested on machines with more than one execution unit,
so session locks didn't always work on machines with a multi-core CPU.

  You may be tempted to think that as long as there is the GIL [1]_, you shouldn't care about the number
  of CPUs. Unfortunately, serial execution doesn't mean execution in one certain order. Thus, with a
  different number of execution units and different GIL implementations (Python 3.2 has a new
  GIL [2]_), a threaded Python program will likely behave differently.

I've fixed file locking by postponing lock file removal to a cleanup process. Briefly covering
issues with the other session backends: the Postgres session backend was removed
because it was broken, untested and mostly undocumented. The Memcached session backend was mostly
fine, except for a minor ``str``/``bytes`` issue on *py3* and the fact that it isn't very useful in
its current condition. The problem is that it doesn't implement distributed locking, so once you
have two or more CherryPy instances that use a Memcached server as a session backend and your
users are not distributed consistently across the instances, all sorts of concurrent update
issues may arise.
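
The kind of distributed lock the backend would need could be built on Memcached's atomic ``add``,
which only succeeds if the key doesn't exist yet. A rough sketch (hypothetical helper, not part of
CherryPy), using the same *python-memcached* client the tests use:

.. sourcecode:: python

    import time

    import memcache

    def acquire_session_lock(client, session_id, ttl=30, poll=0.05):
        # add() is atomic server-side: only one process can create the key
        key = 'lock.' + session_id
        while not client.add(key, 1, time=ttl):
            time.sleep(poll)  # someone else holds the lock; wait and retry

    def release_session_lock(client, session_id):
        client.delete('lock.' + session_id)

    client = memcache.Client(['127.0.0.1:11211'])
    acquire_session_lock(client, 'abc123')
    try:
        pass  # read, modify and store the session data here
    finally:
        release_session_lock(client, 'abc123')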

Substantial methodological errors in the test suite were:

* test determinism
* test dependency
* test isolation

Test determinism
----------------
There were garbage collection tests that were automatically added to every subclass of
``test.helper.CPWebCase``. They are non-deterministic and fail sporadically (there are
other normal tests that have non-deterministic behaviour, see below). Because CPython's reference
counting garbage collector is deterministic, it's a concurrency issue. What was really surprising
for me here was an explanation like: okay, if it has failed, run it once again until you're sure that
the failure persists. This ``test_gc`` supplementing is disabled now until everything else is
resolved, and it is clear how to handle it properly.

Another thing that prevented certain classification into *pass* and *fail* and led to frequent
freezes was request retries and the absence of an HTTP client socket timeout. I can only guess about
the original reasons for such "carefulness" and not letting a test just fail. For instance, you can
take some of the concurrency tests that throw out up to a hundred client threads and imagine the stability
and predictability of the undertaking with 10 retries and no client timeout. These two were
also fixed -- ``test.webtest.openURL`` doesn't retry on socket errors and sets a 10-second timeout
on its HTTP connections by default.

  ``test.helper.CPWebCase`` starts a full in-process CherryPy server and then uses a real
  ``httplib.HTTPConnection`` to connect to it. Most of the CherryPy tests are in fact either
  integration [3]_ or system tests [4]_.
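
The gist of the client-side change can be illustrated like this (a rough sketch, not the actual
``test.webtest`` code): open the connection with an explicit timeout and let socket errors
propagate instead of retrying, so a hung server turns into a prompt, classifiable failure.

.. sourcecode:: python

    try:
        import httplib                     # py2
    except ImportError:
        import http.client as httplib      # py3

    def open_url(host, port, path='/', timeout=10):
        # no retry loop: a socket error or timeout fails the test immediately
        conn = httplib.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request('GET', path)
            response = conn.getresponse()
            return response.status, response.read()
        finally:
            conn.close()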

Test dependency
---------------
How do you like tests named like ``test_0_check_something``, ``test_1_check_something_else``, and
so on? And what if they were named this way intentionally, so they run in order, and fail
otherwise because the latter depend on the former? Specifically, you cannot run a test individually. All
occurrences I've found were renamed and freed from their sibling shackles.

Test isolation
--------------
``test.helper.CPWebCase``'s server startup routine was done in the equivalent of ``unittest``'s
``setUpClass``, so tests in a test case were not isolated from each other, which made it
problematic to deal with tests that interfere with internals, as it was really
hard to say whether such a test failed on its own or as a consequence. I've added a
``per_test_setup`` attribute to ``CPWebCase`` to conditionally do the routine in ``setUp``
and accommodated the tests' code. I expected significant performance degradation, but it has only
become around 20% slower, which was the reason for me to make it true by default. Not sure,
maybe the switch shouldn't exist at all now.
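
In simplified form the switch works roughly like this (hypothetical names, not the exact
``CPWebCase`` implementation):

.. sourcecode:: python

    import unittest

    class WebCaseBase(unittest.TestCase):

        # isolate tests from each other at the cost of a slower run
        per_test_setup = True

        @classmethod
        def setUpClass(cls):
            if not cls.per_test_setup:
                cls.start_server()

        def setUp(self):
            if self.per_test_setup:
                self.start_server()

        @classmethod
        def start_server(cls):
            pass  # start the in-process CherryPy server here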

Unfortunately, this doesn't resolve all isolation issues, because a complete reset
of an in-process threaded server is quite a juggling act. When a tricky test fails badly, it still
has side-effects (see below), like this::

    ./cherrypy/process/wspbus.py:233: RuntimeWarning: The main thread is exiting, but the Bus is
    in the states.STARTING state; shutting it down automatically now. You must either call 
    bus.block() after start(), or call bus.exit() before the main thread exits.

Testing progress
----------------
What has just been said about fixed and improved things is happening in my fork [5]_; it is not yet
pushed upstream. A more detailed description and discussion of this and the above sections is in
the thread in the CherryPy user group [6]_. Because it's only there until the moment Google decides
to shut down Google Groups because of low traffic, abuse or any other reason they use for such
announcements, I will summarise the achievements in improving the CherryPy testing infrastructure.

#. Migrated tests to stdlib ``unittest``, with ``unittest2`` as a fallback for *py26*. Nose is unfriendly
   to *py3* and was probably related to some locking issues.
#. Overall improvement in avoiding freezes in various test cases and testing environments.
   TTY in Docker, disabled interactive mode, various internals' tests wait in a thread, et cetera.
#. Implemented "tagged" Tox 1.8+ configuration for matrix of various environments (see below). 
#. Integrated with Drone.io [7]_ CI service and Codecov.io [8]_ code coverage tracking 
   service.  
#. Made various changes to allow a parallel environment run with Detox to fit in the 15 minutes of the
   Drone.io free tier. This includes running tests in the install directory [9]_, starting the server each
   time on a free port provided by the OS, locking Memcached cases with an atomic operation, and more.
#. Removed global and persistent configuration that prevented mixing HTTP and HTTPS cases.
#. Made it possible to work on the test suite in PyDev. When running tests, the PyDev test runner
   adds another non-daemonic thread that the tests don't expect, which leads to deadlocks or failures
   for 3 tests. They are now just skipped under the PyDev runner.

Here are the Drone.io commands required to run the test suite. They use Deadsnakes [10]_ to install old
or not-yet-stable versions of Python. The ``-dev`` packages are needed to build pyOpenSSL. The rest
comes with the Drone.io container out of the box.

.. sourcecode:: bash

    echo 'debconf debconf/frontend select noninteractive' | sudo debconf-set-selections
    sudo add-apt-repository ppa:fkrull/deadsnakes &> /dev/null
    sudo apt-get update &> /dev/null
    sudo apt-get -y install python2.6 python3.4 &> /dev/null
    sudo apt-get -y install python2.6-dev python3.4-dev &> /dev/null
    
    sudo pip install --quiet detox
    detox
    tox -e post

As you can see, the ``post`` environment closes the build. Unless the build was successful it won't run,
and no coverage will be submitted nor build artifacts become available. The build artifacts, which
also need to be listed in Drone.io, are several quality assurance reports that can be helpful for
further improving the codebase::

    coverage_report.txt
    coverage_report.html.tgz
    maintenance_index.txt
    code_complexity.txt
    
For the Codecov.io integration to work, the environment variable ``CODECOV_TOKEN`` should be set.

Tagged Tox configuration
------------------------
The final Tox configuration looks like the following. It emerged through a series of rewrites driven
by trade-offs between test duration, a combined coverage report and the need to test dependencies.
I came to the last design after I realised the importance of supporting pyOpenSSL and thus the need to
test it (see the SSL issues below). Then the test sampling facility was easy to integrate into it.

.. sourcecode:: ini

    [tox]
    minversion = 1.8
    envlist    = pre,docs,py{26-co,27-qa,33-nt,34-qa}{,-ssl,-ossl}
    
    # run tests from install dir, not checkout dir
    [testenv]
    changedir = {envsitepackagesdir}/cherrypy
    setenv    = 
        qa:   COVERAGE_FILE = {toxinidir}/.coverage.{envname}
        ssl:  CHERRYPY_TEST_SSL_MODULE = builtin
        ossl: CHERRYPY_TEST_SSL_MODULE = pyopenssl
        sa:   CHERRYPY_TEST_XML_REPORT_DIR = {toxinidir}/test-xml-data
    deps = 
        routes
        py26:      unittest2
        py{26,27}: python-memcached
        py{33,34}: python3-memcached
        qa:        coverage
        ossl:      pyopenssl
        sa:        unittest-xml-reporting
    commands =
        python --version
        nt: python -m unittest {posargs: discover -v cherrypy.test}
        co: unit2 {posargs:discover -v cherrypy.test} 
        qa: coverage run --branch --source="." --omit="t*/*" \
        qa: --module unittest {posargs: discover -v cherrypy.test}
        sa: python test/xmltestreport.py
          
    [testenv:docs]
    basepython = python2.7 
    changedir  = docs
    commands   = sphinx-build -q -E -n -b html . build 
    deps       =
        sphinx < 1.3
        sphinx_rtd_theme
    
    [testenv:pre]
    changedir = {toxinidir}
    deps      = coverage
    commands  = coverage erase
    
    # must be run separately after main envlist, because of detox  
    [testenv:post]
    changedir = {toxinidir}
    deps      = 
        coverage
        codecov
        radon
    commands  =
        bash -c 'echo -e "[paths]\nsource = \n  cherrypy" > .coveragerc'
        bash -c 'echo "  .tox/*/lib/*/site-packages/cherrypy" >> .coveragerc'
        coverage combine
        bash -c 'coverage report > coverage_report.txt'
        coverage html
        tar -czf coverage_report.html.tgz htmlcov
        - codecov
        bash -c 'radon mi -s -e "cherrypy/t*/*" cherrypy > maintenance_index.txt'
        bash -c 'radon cc -s -e "cherrypy/t*/*" cherrypy > code_complexity.txt'
    whitelist_externals = 
        /bin/tar
        /bin/bash
    
    # parse and collect XML files of XML test runner    
    [testenv:report]
    deps      = 
    changedir = {toxinidir}
    commands  = python -m cherrypy.test.xmltestreport --parse

This design heavily relies on the great multi-dimensional configuration that appeared in Tox 1.8 [11]_.
Given an environment, say ``py27-nt-ossl``, Tox treats it like a set of tags,
``{'py27', 'nt', 'ossl'}``, where each entry is matched individually. This way any environment
(that makes sense, that is) with the desired qualities can be constructed even if it isn't listed in
``envlist``. Here's a table that explains the purpose of the tags.

====  ================================================
tag   meaning 
====  ================================================
nt    normal test run via ``unittest``
co    compatibility run via ``unittest2``
qa    quality assurance run via ``coverage``
sa    test sampling run via ``xmlrunner``
ssl   run using Python builtin ``ssl`` module
ossl  run using PyOpenSSL
====  ================================================ 

Note the ``{posargs: discover -v cherrypy.test}`` part. It makes it possible to run a subset of the
test suite in desired environments, e.g.:

.. sourcecode:: bash

    tox -e "py{26-co-ssl,27-nt,33-qa,34-nt-ssl}" -- -v cherrypy.test.test_http \
      cherrypy.test.test_conn.TestLimitedRequestQueue \
      cherrypy.test.test_caching.TestCache.test_antistampede

Tackling entropy of threading and in-process server juggling
------------------------------------------------------------
Even though there was progress on making the test suite more predictable, and actually runnable,
it is still far from desired, and I was scratching my head trying to figure out how to gather
test stats from all the environments. The solution was in a lighted place, right where I was
looking for it. There's a package, *unittest-xml-reporting*, whose name is pretty descriptive.
It's a ``unittest`` runner that writes XML files like
``TEST-test.test_auth_basic.TestBasicAuth-20150410005927.xml`` with easy-to-guess content.
Then it was just a matter of writing some glue code for overnight runs and implementing parsing and
aggregation.
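
The aggregation part boils down to something like this (a rough sketch, not the actual
``xmltestreport`` module, assuming the JUnit-style XML that *unittest-xml-reporting* produces):

.. sourcecode:: python

    import collections
    import glob
    import xml.etree.ElementTree as etree

    def aggregate(pattern='TEST-*.xml'):
        # test id -> [failures, errors, skips] accumulated over all runs
        stats = collections.defaultdict(lambda: [0, 0, 0])
        for filename in glob.glob(pattern):
            for case in etree.parse(filename).getroot().iter('testcase'):
                test_id = '{0}.{1}'.format(case.get('classname'), case.get('name'))
                row = stats[test_id]
                row[0] += len(case.findall('failure'))
                row[1] += len(case.findall('error'))
                row[2] += len(case.findall('skipped'))
        return stats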

I've implemented and incorporated a module named ``xmltestreport`` [12]_ into the ``test``
package; the ``sa`` Tox tag corresponds to it. Running the following in a terminal in the evening
can, by morning, give you the following table (except the N/A environments, see below).

.. sourcecode:: bash

    for i in {1..64}; do tox -e "py{26,27,33,34}-sa{,-ssl,-ossl}"; done
    tox -e report

+-------------------------------------------------------------+----+----+----+
| **py26-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| OK                                                                         |
+-------------------------------------------------------------+----+----+----+
| **py26-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 27 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_request_obj.TestRequestObject.testParamErrors          | 1  | 0  | 26 |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 27 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| **py26-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 33 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_config_server.TestServerConfig.testMaxRequestSize      | 0  | 1  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| **py27-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| OK                                                                         |
+-------------------------------------------------------------+----+----+----+
| **py27-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_session.TestSession.testFileConcurrency                | 1  | 0  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_states.TestWait.test_wait_for_occupied_port_INADDR_ANY | 1  | 0  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 2  | 0  | 31 |
+-------------------------------------------------------------+----+----+----+
| **py27-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 33 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_config_server.TestServerConfig.testMaxRequestSize      | 0  | 4  | 29 |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 2  | 0  | 31 |
+-------------------------------------------------------------+----+----+----+
| **py33-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestPipeline.test_HTTP11_pipelining               | 0  | 16 | 17 |
+-------------------------------------------------------------+----+----+----+
| **py33-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| N/A                                                                        |
+-------------------------------------------------------------+----+----+----+
| **py33-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 33 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 33 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 3  | 0  | 30 |
+-------------------------------------------------------------+----+----+----+
| **py34-sa**                                                 | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestLimitedRequestQueue.test_queue_full           | 0  | 1  | 32 |
+-------------------------------------------------------------+----+----+----+
| test_conn.TestPipeline.test_HTTP11_pipelining               | 0  | 14 | 19 |
+-------------------------------------------------------------+----+----+----+
| **py34-sa-ossl**                                            | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| N/A                                                                        |
+-------------------------------------------------------------+----+----+----+
| **py34-sa-ssl**                                             | F  | E  | S  |
+-------------------------------------------------------------+----+----+----+
| test_caching.TestCache.test_antistampede                    | 0  | 32 | 0  |
+-------------------------------------------------------------+----+----+----+
| test_config_server.TestServerConfig.testMaxRequestSize      | 0  | 1  | 31 |
+-------------------------------------------------------------+----+----+----+
| test_static.TestStatic.test_file_stream                     | 32 | 0  | 0  |
+-------------------------------------------------------------+----+----+----+
| test_tools.TestTool.testCombinedTools                       | 3  | 0  | 29 |
+-------------------------------------------------------------+----+----+----+

Actually, as you can see, it only had time for half of the iterations, and in the morning I
interrupted it. Sylvain, you see, I named those three tests in the group correctly.
But there are other candidates, and this newly arisen fragility of SSL sockets, if I understand it
correctly -- I mean all the tests with one fail or error. For example,
``test_session.TestSession.testFileConcurrency`` is fine lock-wise, but because it stresses
the networking layer with several dozen client threads, there are two socket errors::

    Exception in thread Thread-1098:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 763, in run
        self.__target(*self.__args, **self.__kwargs)
      File "py27-sa-ossl/lib/python2.7/site-packages/cherrypy/test/test_session.py", line 282, in request
        c.endheaders()
      File "/usr/lib64/python2.7/httplib.py", line 969, in endheaders
        self._send_output(message_body)
      File "/usr/lib64/python2.7/httplib.py", line 829, in _send_output
        self.send(msg)
      File "/usr/lib64/python2.7/httplib.py", line 805, in send
        self.sock.sendall(data)
      File "/usr/lib64/python2.7/ssl.py", line 293, in sendall
        v = self.send(data[count:])
      File "/usr/lib64/python2.7/ssl.py", line 262, in send
        v = self._sslobj.write(data)
    error: [Errno 32] Broken pipe

    Exception in thread Thread-1099:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 763, in run
        self.__target(*self.__args, **self.__kwargs)
      File "py27-sa-ossl/lib/python2.7/site-packages/cherrypy/test/test_session.py", line 282, in request
        c.endheaders()
      File "/usr/lib64/python2.7/httplib.py", line 969, in endheaders
        self._send_output(message_body)
      File "/usr/lib64/python2.7/httplib.py", line 829, in _send_output
        self.send(msg)
      File "/usr/lib64/python2.7/httplib.py", line 805, in send
        self.sock.sendall(data)
      File "/usr/lib64/python2.7/ssl.py", line 293, in sendall
        v = self.send(data[count:])
      File "/usr/lib64/python2.7/ssl.py", line 262, in send
        v = self._sslobj.write(data)
    error: [Errno 104] Connection reset by peer
    
Other error details are available in the XML reports, which retain such details. You may think that
I have found the reason for the "carefulness" I questioned above, but I wouldn't agree, for several
reasons. First, it looks like an SSL-only issue and also needs investigation -- does someone know practical
reasons for the SSL server and client to be more fragile? Second, if a test stresses the server with a
few dozen client threads, regardless of SSL, it should be written in a way that tolerates
failures for some of them rather than relying on magic recovery behind the scenes. And after all,
all this integration testing is quite useless if we assume we're fine with having 9/10 of the clients
fail in any test.

Status indication
-----------------
Long story short, I have changed my mind on status badges. After thinking a lot about
organising the project's status indication and notification -- where the most important thing is regression
notification, given contributors who used to commit without running tests -- I came to the
conclusion that status badges can do the job.

This way every contributor will know that an introduced regression will become public knowledge
in a matter of minutes, and that it won't make other members of the community happy.
Hopefully this form of enforcement of standards, together with the new testing tools, will help us go
smoother.
 
Drawing the line
----------------
As far as I can see, the testing infrastructure is ready. I also made several related changes here
and there. Now I think it is a good time to push it upstream. First, of course, for other
contributors to make use of the new tools. Second, for the sake of granularity, because there's
already a bunch of changes since January in my fork. And third, in the coming months it is likely
that I won't have much time for contribution, thus it's a good time to discuss the changes if we
need to.

Question
--------
For these I can expect answers only from Sylvain, but others' ideas are also welcome.

#. Will you review my changes in the fork [5]_? Or is it easier if I create a pull
   request right away?
#. Do we have any progress on bugtracker?
#. I still want to hear about your ideas about ``test_gc``.
#. We still have this outdated and pointless wiki home as the landing page for the repo.
#. A recent coverage report [8]_ uncovered that we have a dozen modules with zero coverage,
   which are probably obsolete.


saaj

unread,
Apr 23, 2015, 9:21:16 AM4/23/15
to cherryp...@googlegroups.com, Robert Brewer
Hello Robert,


I'm sorry to disturb you, but there's an issue with your code. In both ``wsgiserver2`` and ``wsgiserver3`` we have ``ChunkedRFile.__iter__``, which looks like this:

    def __iter__(self):
        # Shamelessly stolen from StringIO
        total = 0
        line = self.readline(sizehint)
        while line:
            yield line
            total += len(line)
            if 0 < sizehint <= total:
                break
            line = self.readline(sizehint)

This probably indicates that you're not as good at stealing as at programming ;-) Note the undefined ``sizehint``. The coverage report also indicates the method is never reached in
the test suite. Can you verify that it's safe to remove the method, or can you fix it, so we don't have broken code in the core?
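
If the intent was simply to iterate over the remaining lines, a minimal repair could look like this (just a sketch, assuming no size hint is needed in this scope):

    def __iter__(self):
        # Read line by line until the chunked stream is exhausted;
        # there is no sizehint here, so don't pretend to honour one.
        line = self.readline()
        while line:
            yield line
            line = self.readline()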


P.S. It'll be really nice to have you participating in the discussion.

Sylvain Hellegouarch

unread,
Apr 23, 2015, 10:04:52 AM4/23/15
to cherryp...@googlegroups.com
Hi saaj,

Thanks for being so thorough and patient.

I think it's a good call. I'd rather have no test than a false-positive broken test.

 

Another thing that prevented clear classification into *pass* and *fail* and led to frequent
freezes was request retries and the absence of an HTTP client socket timeout. I can only guess at
the original reasons for such "carefulness" and not letting a test just fail. For instance, you can
take some of the concurrency tests that throw out up to a hundred client threads and imagine the
stability and predictability of the undertaking with 10 retries and no client timeout. These two
were also fixed -- ``test.webtest.openURL`` doesn't retry on socket errors and sets a 10-second
timeout on its HTTP connections by default.

  ``test.helper.CPWebCase`` starts a full in-process CherryPy server and then uses a real
  ``httplib.HTTPConnection`` to connect to it. Most CherryPy tests are in fact either
  integration [3]_ or system tests [4]_.


Seems fair. Do we need such a long timeout though? I would have set it to something smaller so that the test suite doesn't drag on.
 



Test dependency
---------------
How do you like tests named like ``test_0_check_something``, ``test_1_check_something_else``, and
so on? And what if they were named this way intentionally, so they run in order, and fail
otherwise because the latter depend on the former? Specifically, you cannot run a test
individually. All occurrences I found were renamed and freed from their sibling shackles.
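
As a toy illustration of the before/after (the names are made up)::

    import unittest

    class SessionDataTest(unittest.TestCase):
        # Before: test_0_store and test_1_load only passed when run in order.
        # After: each test builds the state it needs, so it can run alone.
        def setUp(self):
            self.data = {}

        def test_store(self):
            self.data['counter'] = 1
            self.assertEqual(self.data['counter'], 1)

        def test_load_missing(self):
            self.assertEqual(self.data.get('counter'), None)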


That's great. I don't think tests should depend on each other at all, so ordering them isn't a feature I would expect.


 

Test isolation
--------------
``test.helper.CPWebCase``'s server startup routine was done in the equivalent of ``unittest``'s
``setUpClass``, so tests in a test case were not isolated from each other. That made it
problematic to deal with tests that interfere with internals, as it was really hard to say
whether such a test had failed on its own or as a consequence. I've added a
``per_test_setup`` attribute to ``CPWebCase`` to conditionally do the routine in ``setUp``
and accommodated the tests' code. I expected significant performance degradation, but it only
became around 20% slower, which is why I made it true by default. I'm not sure; maybe the
attribute shouldn't exist at all now.
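
A rough sketch of the idea, not the actual implementation -- ``start_server``/``stop_server``
are placeholders for the real startup and teardown routine::

    import unittest

    class PerTestSetupCase(unittest.TestCase):
        # When per_test_setup is True the server is (re)started around every
        # test; when False it is started once per test case, as before.
        per_test_setup = True

        @classmethod
        def start_server(cls):
            pass   # would start the in-process CherryPy server

        @classmethod
        def stop_server(cls):
            pass   # would stop it and wait for worker threads

        @classmethod
        def setUpClass(cls):
            if not cls.per_test_setup:
                cls.start_server()

        @classmethod
        def tearDownClass(cls):
            if not cls.per_test_setup:
                cls.stop_server()

        def setUp(self):
            if self.per_test_setup:
                self.start_server()

        def tearDown(self):
            if self.per_test_setup:
                self.stop_server()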

Unfortunately, this doesn't resolve all isolation issues, because doing a complete reset of an
in-process threaded server is quite a juggling act. When a tricky test fails badly, it still
has a side effect (see below), like this::

    ./cherrypy/process/wspbus.py:233: RuntimeWarning: The main thread is exiting, but the Bus is
    in the states.STARTING state; shutting it down automatically now. You must either call 
    bus.block() after start(), or call bus.exit() before the main thread exits.


This is in response to the fact we are doing integration tests, not real unit tests of the code itself. I wonder if we shouldn't remove that functional layer and try to re-purpose our tests to make them easier to control.
I often rely on this little recipe I wrote for my own tests:


It makes the request go through the same stack except for the socket layer.

Could this help out? The tests aren't much more unit tests per-se but we avoid some issues.
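
Roughly, the general idea is to call the WSGI application directly with a hand-built environ (this is only an illustration of the approach, not the recipe verbatim):

    import io
    import sys

    def local_request(app, path='/', method='GET'):
        # Build a minimal WSGI environ and call the application (e.g. a
        # mounted CherryPy app) directly: same request/response machinery,
        # no socket involved.
        environ = {
            'REQUEST_METHOD': method,
            'PATH_INFO': path,
            'QUERY_STRING': '',
            'SERVER_NAME': 'localhost',
            'SERVER_PORT': '8080',
            'SERVER_PROTOCOL': 'HTTP/1.1',
            'wsgi.version': (1, 0),
            'wsgi.url_scheme': 'http',
            'wsgi.input': io.BytesIO(b''),
            'wsgi.errors': sys.stderr,
            'wsgi.multithread': False,
            'wsgi.multiprocess': False,
            'wsgi.run_once': False,
        }
        captured = []

        def start_response(status, headers, exc_info=None):
            captured.append((status, headers))

        body = b''.join(app(environ, start_response))
        status, headers = captured[0]
        return status, headers, body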


 

Testing progress
----------------
Everything just said about fixed and improved stuff is happening in my fork [5]_; it is not yet
pushed upstream. A more detailed description and discussion of the current and above sections is in
the thread in the CherryPy user group [6]_. Because it is only there until the moment Google decides
to shut down Google Groups because of low traffic, abuse or whatever other reason they use for such
an announcement, I will summarise the achievements in improving the CherryPy testing infrastructure.

#. Migrated tests to stdlib ``unittest``, with ``unittest2`` as a fallback for *py26* (the usual
   conditional import, sketched below). Nose is unfriendly to *py3* and was probably related to
   some locking issues.
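
A minimal sketch of that fallback::

    import sys

    if sys.version_info < (2, 7):
        import unittest2 as unittest   # backport of the py27 API for py26
    else:
        import unittest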


That's okay with me. 

 
#. Overall improvement in avoiding freezes in various test cases and testing environments:
   TTY in Docker, disabled interactive mode, various internals' tests waiting in a thread, et cetera.
#. Implemented "tagged" Tox 1.8+ configuration for a matrix of various environments (see below).
#. Integrated with the Drone.io [7]_ CI service and the Codecov.io [8]_ code coverage tracking
   service.


That's great, though it won't make much difference if, as maintainers, we don't pay attention to the results anyhow :(

 
#. Made various changes to allow a parallel environment run with Detox to fit in the 15 minutes of
   the Drone.io free tier. This includes running tests in the install directory [9]_, starting the
   server each time on a free port provided by the OS (see the sketch after this list), locking
   Memcached cases with an atomic operation, and more.
#. Removed global and persistent configuration that prevented mixing HTTP and HTTPS cases.
#. Made it possible to work on the test suite in PyDev. When running tests in the PyDev test
   runner, it adds another non-daemonic thread that tests don't expect, which leads to deadlocks
   or failures for 3 tests. They are now simply skipped under the PyDev runner.
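
The free-port trick is the usual bind-to-zero dance (a sketch; the names are illustrative, not the actual helper)::

    import socket

    def free_port():
        # Bind to port 0 so the OS picks an unused ephemeral port, then
        # release it for the test server to bind right after. There is a
        # tiny window in which another process could grab it, which is
        # acceptable for test runs.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(('127.0.0.1', 0))
        port = s.getsockname()[1]
        s.close()
        return port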

Cool stuff. I'm using PyCharm myself but that ought to follow a similar trend. Thanks.
Mmh, the fact we're running integration tests makes our test suite more fragile and complicated than it ought to be.

What are we trying to do with our tests anyway? That's probably the question we want to answer first :)


 

Status indication
-----------------
Long story short, I have changed my mind on status badges. After thinking a lot about
organising the project's status indication and notification, where the most important part is
regression notification, given we have contributors who used to commit without running tests,
I came to the conclusion that status badges can do the job.

This way every contributor will know that an introduced regression will become public knowledge
in a matter of minutes, and that it won't make other members of the community happy. Hopefully
this form of enforcing standards, together with the new testing tools, will help us go more
smoothly.
 
Drawing the line
----------------
As far as I can see, the testing infrastructure is ready. I also made several related changes here
and there. Now I think it is a good time to push it upstream. First, of course, for other
contributors to make use of the new tools. Second, for the sake of granularity, because there's
already a bunch of changes since January in my fork. And third, in the coming months it is likely
that I won't have much time for contribution, thus it's a good time to discuss the changes if we
need to.


I'd be happy to integrate your changes as you've clearly put a lot of effort into them, and it seems to have paid off.
 

Question
--------
For these I can expect answers only from Sylvain, but others' ideas are also welcome.

#. Will you review my changes in the fork [5]_? Or is it easier if I create a pull
   request right away?

I prefer a PR if you can allow it in your schedule.

 
#. Do we have any progress on bugtracker?

You could open a ticket in BitBucket.

 
#. I still want to hear about your ideas about ``test_gc``.

I have no recollection of why we had this one, so it's hard for me to comment.

 
#. We still have this outdated and pointless wiki home as the landing page for the repo.

Indeed.
 
#. A recent coverage report [8]_ uncovered that we have a dozen modules with zero coverage,
   which are probably obsolete.

That's a status at least.

Thanks for all your work, saaj,

saaj

unread,
Apr 24, 2015, 8:49:28 AM4/24/15
to cherryp...@googlegroups.com
Thanks for being so thorough and patient.
:-)
 
Test determinism
----------------
There were garbage collection tests that were automatically added to every subclass of
``test.helper.CPWebCase``. They are non-deterministic and fail sporadically (there are
other normal tests that have non-deterministic behaviour, see below). Because CPython's reference
counting garbage collector is deterministic, it's a concurrency issue. What was really surprising
for me here was an explanation like: okay, if it has failed, run it once again until you're sure
the failure persists. This ``test_gc`` supplementing is disabled for now, until everything else is
resolved and it is clear how to handle it properly.


I think it's a good call. I'd rather have no test than a false-positive broken test.

#. I still want to hear about your ideas about ``test_gc``.

I have no recollection of why we had this one, so it's hard for me to comment.

These ``test_gc``s of course weren't there without a reason. Maybe it is because of them that CherryPy has a good memory profile and doesn't leak. After this
iteration we should get back to them, because they are important. The test sampling facility may help to detect whether the problem is in certain tests.
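
For reference, the general shape of such a check is roughly the following (an illustration, not the actual ``test_gc`` code); with worker and monitor threads still running, the count can be momentarily off, which is exactly the flakiness in question::

    import gc

    def instances_left(cls):
        # Force a full collection, then list live instances of `cls`;
        # an empty result means nothing leaked after the test requests.
        gc.collect()
        return [o for o in gc.get_objects() if isinstance(o, cls)]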
 
Another thing that prevented clear classification into *pass* and *fail* and led to frequent
freezes was request retries and the absence of an HTTP client socket timeout. I can only guess at
the original reasons for such "carefulness" and not letting a test just fail. For instance, you can
take some of the concurrency tests that throw out up to a hundred client threads and imagine the
stability and predictability of the undertaking with 10 retries and no client timeout. These two
were also fixed -- ``test.webtest.openURL`` doesn't retry on socket errors and sets a 10-second
timeout on its HTTP connections by default.

  ``test.helper.CPWebCase`` starts a full in-process CherryPy server and then uses a real
  ``httplib.HTTPConnection`` to connect to it. Most CherryPy tests are in fact either
  integration [3]_ or system tests [4]_.


Seems fair. Do we need such a long timeout though? I would have set it to something smaller so that the test suite doesn't drag on.

I agree it's long, and actually I started with 2 or 4 seconds. But there were additional failures, because tests were written in expectation of an infinite
client timeout and retries; I can't recall the details, though, since my point was "infrastructure". I have just run ``py27-nt`` and ``py33-nt``. With a 2-second timeout
both added a failure of the notorious ``test.test_caching.TestCache.test_antistampede``, which needs to be amended anyway. So yes, ``WebCase.timeout``
can be decreased for the general case, and passed as a parameter to ``CPWebCase.getPage`` on occasion, along the lines of the sketch below.
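
Something like this; the ``timeout`` argument to ``getPage`` is the proposed addition, not existing API, and the path is made up::

    from cherrypy.test import helper

    class CacheTest(helper.CPWebCase):
        # Keep the default client timeout short for the whole test case...
        timeout = 4

        def test_antistampede(self):
            # ...and raise it only for the deliberately slow endpoint.
            self.getPage('/longexpensive', timeout=15)
            self.assertStatus(200)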

 
Test isolation
--------------
``test.helper.CPWebCase``'s server startup routine was done in the equivalent of ``unittest``'s
``setUpClass``, so tests in a test case were not isolated from each other. That made it
problematic to deal with tests that interfere with internals, as it was really hard to say
whether such a test had failed on its own or as a consequence. I've added a
``per_test_setup`` attribute to ``CPWebCase`` to conditionally do the routine in ``setUp``
and accommodated the tests' code. I expected significant performance degradation, but it only
became around 20% slower, which is why I made it true by default. I'm not sure; maybe the
attribute shouldn't exist at all now.

Unfortunately, this doesn't resolve all isolation issues, because doing a complete reset of an
in-process threaded server is quite a juggling act. When a tricky test fails badly, it still
has a side effect (see below), like this::

    ./cherrypy/process/wspbus.py:233: RuntimeWarning: The main thread is exiting, but the Bus is
    in the states.STARTING state; shutting it down automatically now. You must either call 
    bus.block() after start(), or call bus.exit() before the main thread exits.


This is in response to the fact we are doing integration tests, not real unit tests of the code itself. I wonder if we shouldn't remove that functional layer and try to re-purpose our tests to make them easier to control.

To me the thing looks like apples vs oranges. What's more important: high-level assertions that indicate the web server acts like a
web server, or a bunch of low-level assertions about file-like objects and HTTP string parsing, if we can't have both? To me it's the former.
I see there's (inherent?) coupling -- tools, bus, etc. You either end up writing a bunch of mocks or do integration testing anyway. But I don't feel
steady on the point. It may need more discussion.
 
I often rely on this little recipe I wrote for my own tests:


It makes the request go through the same stack except for the socket layer.

Could this help out? The tests aren't much more unit tests per-se but we avoid some issues.

I looked at the recipe several times in the past, but didn't dare to incorporate it into my workflow. Maybe because of the need to implement the assertion
methods that ``webtest.WebCase`` has. On the one hand I think it is good to have this base class out of the box and have a choice. On the other,
I don't know how to tell a user when to use one and when the other.

It's going to be a big rewrite anyway. With this full-blown integration/system approach we have several environments that are fine (plain py2). When we
complicate things, like throwing a hundred threads at it, adding SSL, or running it on py3, it has issues. But in fact there's only a handful of them. I believe
it's easier to fix them than to rewrite the whole suite. It seems like a question for the future.
 

 
 
#. Overall improvement in avoiding freezes in various test cases and testing environments:
   TTY in Docker, disabled interactive mode, various internals' tests waiting in a thread, et cetera.
#. Implemented "tagged" Tox 1.8+ configuration for a matrix of various environments (see below).
#. Integrated with the Drone.io [7]_ CI service and the Codecov.io [8]_ code coverage tracking
   service.


That's great, though it won't make much difference if, as maintainers, we don't pay attention to the results anyhow :(

Absolutely. As I already said, at the very least whoever publishes to the Cheese Shop should look for regressions, that is, check the status badge's
color beforehand. But it's not only for maintainers. For instance, it is good guidance for people who want to contribute to the project.


Other error details are available in the XML reports, which retain them. You may think that
I have found the reason for the "carefulness" I questioned above, but I wouldn't agree, for several
reasons. First, it looks like an SSL-only issue and also needs investigation -- does someone know
practical reasons for the SSL server and client to be more fragile? Second, if a test stresses the
server with a few dozen client threads, regardless of SSL, it should be written in a way that
tolerates failures for some of them rather than relying on magic recovery behind the scenes. And
after all, all this integration testing is quite useless if we assume we're fine with 9/10 of the
clients failing in any test.


Mmh, the fact we're running integration tests makes our test suite more fragile and complicated than it ought to be.
What are we trying to do with our tests anyway? That's probably the question we want to answer first :)

We need tests to detect regressions, obviously. We have issues with determinism. To tackle them we needed a tool. I think test sampling can
be quite helpful for finding patterns and debugging. Normally, of course, you don't need sampling, as a test should just clearly fall into one of two
categories: pass or fail.
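
A crude way to sample a single test, just to illustrate the idea (not the facility in the fork)::

    import subprocess
    import sys

    def sample(test_id, runs=20):
        # Run one test repeatedly and report its pass ratio; anything
        # strictly between 0 and `runs` means the test is non-deterministic.
        passed = 0
        for _ in range(runs):
            rc = subprocess.call([sys.executable, '-m', 'unittest', test_id])
            passed += (rc == 0)
        print('%s: %d/%d passed' % (test_id, passed, runs))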

As I said, it got fragile and complicated not without reason. We either fix individual cases or undertake a rewrite of the whole suite. The question
is which is more achievable.
 


#. Will you review my changes in the fork [5]_? Or is it easier if I create a pull
   request right away?

I prefer a PR if you can allow it in your schedule.
 
#. Do we have any progress on bugtracker?

You could open a ticket in BitBucket.

Hm, for bookkeeping, or just to answer it myself? ;-) I'm asking about the "Bugs bugs bugs" section you had in the original post of Future of CherryPy.

 