platform-specific unit test failures in cc.test-complex-numbers

B Smith-Mannschott

Apr 16, 2010, 1:49:01 PM
to clo...@googlegroups.com
I'm building af2a730 "some tests for c.c.io byte-level support" of
clojure-contrib.

I'm seeing clojure.contrib.test-complex-numbers error out the maven
build with 4 failures and 253 errors on two of the five platforms at
my disposal.

Only my linux-based netbooks fail the build. JDK version does not seem
to explain the failures. (Apologies to those not reading this in a
mono-spaced font.)

|---+--------+---------------+------------------------+-------|
|   | Mvn    | JDK           | OS/Kernel              | Build |
|---+--------+---------------+------------------------+-------|
| W | 2.2.1  | 1.6.0_16      | Ubuntu 9.10, 2.6.31-20 | FAIL  |
| P | 2.2.1  | 1.6.0_15      | Ubuntu 9.10, 2.6.31-20 | FAIL  |
| P | 2.2.1  | 1.6.0_19      | Ubuntu 9.10, 2.6.31-20 | FAIL  |
| P | 2.2.1  | OpenJDK 1.6.0 | Ubuntu 9.10, 2.6.31-20 | FAIL  |
|---+--------+---------------+------------------------+-------|
| O | 2.2.1  | 1.6.0_10      | Ubuntu 9.04, 2.6.28-18 | PASS  |
| M | 2.2.1  | 1.6.0_17      | Mac OS X 10.6.3 x86_64 | PASS  |
| G | 2.0.10 | 1.5.0_13      | Mac OS X 10.6.3 x86_64 | PASS  |
|---+--------+---------------+------------------------+-------|

|---+---------------+------------------------+-------+-------|
|   | Model         | Processor              | RAM   | Build |
|---+---------------+------------------------+-------+-------|
| W | Dell Mini 9   | Atom, 1.66 GHz         | 2 GiB | FAIL  |
| P | Eee PC 1005HA | Atom, 1.6 GHz          | 2 GiB | FAIL  |
| O | Desktop       | Core2Duo Quad, 2.5 GHz | 4 GiB | PASS  |
| M | MacBook       | Core2Duo, 2.2 GHz      | 4 GiB | PASS  |
| G | PowerBook     | PowerPC G4, 1 GHz      | 1 GiB | PASS  |
|---+---------------+------------------------+-------+-------|

Commonalities of the failing platforms:

- Ubuntu 9.10
- Linux Kernel 2.6.31-20-generic
- Intel Atom processor (though different models: "Wesley" runs at 1.66 GHz
  while "Pepper" uses a slightly newer model at 1.6 GHz)

These failures don't appear related to the underlying clojure
version. M (which passes) and P (which fails) were both
using clojure-1.2.0-master-20100415.170113-27.jar
(md5:43db78bcc5461156c80fee9434f2ff28)

A few representative examples of the failures I'm seeing
========================================================

ERROR in (complex-sqrt) (run-test1347341999632195512.clj:44)
Uncaught exception, not in assertion.
expected: nil
actual: java.lang.IllegalArgumentException: No method in multimethod
'sqrt' for dispatch value: :clojure.contrib.complex-numbers/complex
...

FAIL in (complex-conjugate) (run-test1347341999632195512.clj:44)
expected: (= (conjugate (complex 1 2)) (complex 1 -2))
actual: false

ERROR in (complex-subtraction) (run-test1347341999632195512.clj:44)
expected: (= (- -1 (complex -3 -7)) (complex 2 7))
actual: java.lang.IllegalArgumentException: No method in multimethod
'-' for dispatch value: :clojure.contrib.complex-numbers/complex
...

ERROR in (complex-division) (run-test1347341999632195512.clj:44)
expected: (= (/ (imaginary 5) (imaginary 5)) 1)
actual: java.lang.IllegalArgumentException: No method in multimethod
'/' for dispatch value:
:clojure.contrib.complex-numbers/pure-imaginary
...

ERROR in (complex-abs) (run-test5245007338919107779.clj:44)
expected: (approx= (* c (conjugate c)) (sqr (abs c)) 1.0E-14)
actual: java.lang.IllegalArgumentException: No method in multimethod
'*' for dispatch value: [:clojure.contrib.complex-numbers/complex
:clojure.contrib.complex-numbers/complex]
at clojure.lang.MultiFn.getFn (MultiFn.java:115)
clojure.lang.MultiFn.invoke (MultiFn.java:161)
clojure.contrib.test_complex_numbers$fn__10168$fn__10173.invoke
(test_complex_numbers.clj:284)
clojure.contrib.test_complex_numbers/fn (test_complex_numbers.clj:284)
clojure.test$test_var__6644$fn__6645.invoke (test.clj:644)
clojure.test/test_var (test.clj:644)
clojure.test$test_all_vars__6649$fn__6650$fn__6657.invoke (test.clj:659)
clojure.test/default_fixture (test.clj:617)
clojure.test$test_all_vars__6649$fn__6650.invoke (test.clj:659)
clojure.test/default_fixture (test.clj:617)
clojure.test/test_all_vars (test.clj:655)
clojure.test/test_ns (test.clj:677)
clojure.core$map__3989$fn__3990.invoke (core.clj:1885)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:56)
clojure.lang.Cons.next (Cons.java:37)
clojure.lang.RT.next (RT.java:566)
clojure.core/next (core.clj:54)
clojure.core/reduce (core.clj:715)
clojure.core/reduce (core.clj:706)
clojure.core$merge_with__4079.doInvoke (core.clj:2061)
clojure.lang.RestFn.applyTo (RestFn.java:140)
clojure.core/apply (core.clj:480)
clojure.test$run_tests__6666.doInvoke (test.clj:691)
clojure.lang.RestFn.invoke (RestFn.java:1261)
user$eval__12082$fn__12085.invoke (run-test5245007338919107779.clj:46)
user/eval (run-test5245007338919107779.clj:44)
clojure.lang.Compiler.eval (Compiler.java:5363)
clojure.lang.Compiler.load (Compiler.java:5773)
clojure.lang.Compiler.loadFile (Compiler.java:5736)
clojure.main/load_script (main.clj:213)
clojure.main/script_opt (main.clj:265)
clojure.main$main__6119.doInvoke (main.clj:346)
clojure.lang.RestFn.invoke (RestFn.java:409)
clojure.lang.Var.invoke (Var.java:365)
clojure.lang.AFn.applyToHelper (AFn.java:165)
clojure.lang.Var.applyTo (Var.java:482)
clojure.main.main (main.java:37)


What's going on here? Any ideas?

// Ben


Konrad Hinsen

Apr 16, 2010, 2:03:31 PM
to clo...@googlegroups.com
On 16.04.2010, at 19:49, B Smith-Mannschott wrote:

> A few representative examples of the failures I'm seeing
> ========================================================
...

> FAIL in (complex-conjugate) (run-test1347341999632195512.clj:44)
> expected: (= (conjugate (complex 1 2)) (complex 1 -2))
> actual: false

What strikes me as odd is this one, as all the others are about failing multimethod resolution. If we assume a common single cause for everything, perhaps it's in equality testing?

Konrad.

B Smith-Mannschott

Apr 17, 2010, 9:45:42 AM
to clo...@googlegroups.com
More oddness:

If I remove all unit tests *except* test-complex-numbers, all is well:

[INFO] [clojure:test {execution: test-clojure}]

Testing clojure.contrib.test-complex-numbers

Ran 8 tests containing 268 assertions.
0 failures, 0 errors.

If I remove only test-complex-numbers, leaving all other tests in
place, all is also well:

...
Ran 351 tests containing 991 assertions.
0 failures, 0 errors.

B Smith-Mannschott

Apr 17, 2010, 12:02:47 PM
to clo...@googlegroups.com
On Sat, Apr 17, 2010 at 15:45, B Smith-Mannschott <bsmit...@gmail.com> wrote:
> More oddness:
>
> If I remove all unit tests *except* test-complex-numbers, all is well:
>
>  [INFO] [clojure:test {execution: test-clojure}]
>
>  Testing clojure.contrib.test-complex-numbers
>
>  Ran 8 tests containing 268 assertions.
>  0 failures, 0 errors.

Actually, that's not true. The unit tests *usually* run
without error, but sometimes crash with an NPE in LazySeq.sval.

Here's a script I'm using:

[[file: .git/this-build-fails]]
#!/bin/bash
# Exits 0 when the build log contains ERROR, 1 when the build is clean.
log=".git/$(date +%F%H%M%S)-$(git log --oneline | head -n 1 | cut -d' ' -f1).log"
(
    mvn clean
    mvn test
) > "$log" 2>&1
if grep -q ERROR "$log"
then
    echo "build had ERROR: $log"
    exit 0
else
    echo "build was OK"
    rm "$log"
    exit 1
fi

Here's what I told my shell to do, and the resulting output:

smithma@pepper:~/w/clojure-contrib$ while true ; do .git/this-build-fails ; done
build was OK
build had ERROR: .git/2010-04-17172643-2bc0dcc.log
build had ERROR: .git/2010-04-17172745-2bc0dcc.log
build was OK
build was OK
build was OK
build was OK
build had ERROR: .git/2010-04-17173302-2bc0dcc.log
build was OK
build had ERROR: .git/2010-04-17173510-2bc0dcc.log
build was OK
build was OK
build was OK
build was OK
build had ERROR: .git/2010-04-17174026-2bc0dcc.log
build was OK
build was OK
build was OK
build was OK
build was OK
build was OK
build was OK
build was OK
build was OK
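
An open-ended loop like the one above has to be tallied by eye. As a rough sketch only, the same experiment could be bounded and counted automatically. The helper name count_failures is hypothetical, and it assumes the command signals failure through a non-zero exit status (note that .git/this-build-fails uses the opposite convention and exits 0 when the build had an ERROR):

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: run a command a
# fixed number of times and report how many runs exited non-zero.
count_failures() {
    local runs=$1; shift
    local failures=0 i
    for ((i = 0; i < runs; i++)); do
        # Output is discarded; only the exit status is inspected.
        if ! "$@" > /dev/null 2>&1; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures of $runs runs failed"
}
```

It would be called as, for example, `count_failures 20 some-build-command`, where the command is whatever build-and-test step is under suspicion.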

The referenced commit 2bc0dcc is one where I've removed all unit
tests, save for test_complex_numbers.clj

smithma@pepper:~/w/clojure-contrib$ tree src/test/
src/test/
`-- clojure
    `-- clojure
        `-- contrib
            `-- test_complex_numbers.clj

The interesting part of the five log files written above always looks
the same: a NullPointerException in LazySeq.sval

[INFO] [clojure:test {execution: test-clojure}]

Testing clojure.contrib.test-complex-numbers
Exception in thread "main" java.lang.RuntimeException:
java.lang.NullPointerException (run-test1269169458415140791.clj:0)
at clojure.lang.Compiler.eval(Compiler.java:5389)
at clojure.lang.Compiler.load(Compiler.java:5784)
at clojure.lang.Compiler.loadFile(Compiler.java:5747)
at clojure.main$load_script__6226.invoke(main.clj:213)
at clojure.main$script_opt__6255.invoke(main.clj:265)
at clojure.main$main__6273.doInvoke(main.clj:346)
at clojure.lang.RestFn.invoke(RestFn.java:409)
at clojure.lang.Var.invoke(Var.java:365)
at clojure.lang.AFn.applyToHelper(AFn.java:165)
at clojure.lang.Var.applyTo(Var.java:482)
at clojure.main.main(main.java:37)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at clojure.lang.LazySeq.sval(LazySeq.java:47)
at clojure.lang.LazySeq.seq(LazySeq.java:56)
at clojure.lang.Cons.next(Cons.java:37)

Color me confused.

Also, this isn't giving me warm fuzzy feelings of confidence for my next
Clojure project as I do a lot of my weekend hacking on my netbooks.

B Smith-Mannschott

Apr 17, 2010, 12:33:13 PM
to clo...@googlegroups.com
I'm now seeing unreliable builds of this configuration even on my
MacBook (configuration "M"eheadable in the first mail on this thread)
which leads me to believe that whatever problem I'm seeing here, it's
more widespread than just my two netbooks.

10 of 28 builds failed on the MacBook. All failures produced the same
stack trace (though this one is somewhat different than what I saw on
my netbook, described in the previous mail.)

Testing clojure.contrib.test-complex-numbers
Exception in thread "main" java.lang.RuntimeException:
java.lang.NullPointerException (run-test1151698185522980091.clj:0)
at clojure.lang.Compiler.eval(Compiler.java:5389)
at clojure.lang.Compiler.load(Compiler.java:5784)
at clojure.lang.Compiler.loadFile(Compiler.java:5747)
at clojure.main$load_script__6226.invoke(main.clj:213)
at clojure.main$script_opt__6255.invoke(main.clj:265)
at clojure.main$main__6273.doInvoke(main.clj:346)
at clojure.lang.RestFn.invoke(RestFn.java:409)
at clojure.lang.Var.invoke(Var.java:365)
at clojure.lang.AFn.applyToHelper(AFn.java:165)
at clojure.lang.Var.applyTo(Var.java:482)
at clojure.main.main(main.java:37)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at clojure.lang.LazySeq.sval(LazySeq.java:47)
at clojure.lang.LazySeq.seq(LazySeq.java:56)
at clojure.lang.Cons.next(Cons.java:37)
at clojure.lang.RT.boundedLength(RT.java:1162)
at clojure.lang.RestFn.applyTo(RestFn.java:131)
at clojure.core$apply__3643.invoke(core.clj:480)
at clojure.test$run_tests__6820.doInvoke(test.clj:691)
at clojure.lang.RestFn.invoke(RestFn.java:409)
at user$eval__1507$fn__1510.invoke(run-test1151698185522980091.clj:9)
at user$eval__1507.invoke(run-test1151698185522980091.clj:7)
at clojure.lang.Compiler.eval(Compiler.java:5373)
... 10 more
Caused by: java.lang.NullPointerException
at clojure.core.protocols$fn__6000$G__5996__6004.invoke(protocols.clj:11)
at clojure.core$reduce__6111.invoke(core.clj:4719)
at clojure.test$join_fixtures__6796.invoke(test.clj:629)
at clojure.test$test_all_vars__6803.invoke(test.clj:653)
at clojure.test$test_ns__6817.invoke(test.clj:677)
at clojure.core$map__4086$fn__4087.invoke(core.clj:1870)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
... 20 more

I notice that clojure.core.protocols makes a showing here, which is
consistent with the first message in this thread. I also know that
protocols are still under development. Judging from the randomness of
the observed brokenness I'd wager that there's a race condition of
some sort buried in protocols or in the abstractions it builds upon.

Any better guesses?

B Smith-Mannschott

Apr 17, 2010, 2:57:54 PM
to clo...@googlegroups.com
I let Meheadable and George (my two Macs) run clojure-contrib builds
while I was watching TV to get an idea of how probable this crash
is.

George, a 1 GHz PowerPC G4 (1 core) fails 37 of 59 builds (circa 63%)
Meheadable, a 2.2 GHz Core2Duo (2 cores) fails 48 of 132 builds (circa 36%)
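
For what it's worth, the percentages above can be reproduced with a few lines of awk. The sketch below also estimates how likely a broken build is to slip through a given number of consecutive runs, under the assumption (not verified here) that runs are independent and fail at the observed rate. The function names failure_rate and miss_chance are made up for illustration:

```shell
#!/bin/bash
# Percentage of failed builds, rounded to one decimal place.
failure_rate() {
    awk -v f="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * f / n }'
}

# Chance, as a percentage, that a broken build passes `runs` times in a
# row, assuming independent runs that each fail with probability f/n.
miss_chance() {
    awk -v f="$1" -v n="$2" -v r="$3" \
        'BEGIN { printf "%.1f\n", 100 * (1 - f / n) ^ r }'
}

failure_rate 37 59      # George:     prints 62.7
failure_rate 48 132     # Meheadable: prints 36.4
miss_chance 48 132 10   # ten clean runs in a row on Meheadable: prints 1.1
```

On those numbers, ten consecutive clean runs would wrongly clear a broken build only about 1% of the time on the MacBook, and less often on the PowerBook.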

It seems clear now that the problem is in Clojure, not clojure-contrib,
so I'm going to try to bisect Clojure and see if I can figure out
when this problem first cropped up.
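
Because the crash is intermittent, a single passing build says little about a commit, so each bisection step would need several runs. A minimal sketch of such a step follows. The name bisect_step is hypothetical, and it relies on the convention of `git bisect run`, where exit status 0 marks a commit good and 1 marks it bad:

```shell
#!/bin/bash
# Hypothetical wrapper for use with `git bisect run`: return 0 (good) only
# if the command passes every time, return 1 (bad) on the first failure.
bisect_step() {
    local runs=$1; shift
    local i
    for ((i = 0; i < runs; i++)); do
        "$@" > /dev/null 2>&1 || return 1
    done
    return 0
}
```

A script that ends by calling `bisect_step 10` with the build-and-test command could then be handed to `git bisect run`. The run count of 10 is a guess; a higher count lowers the odds of marking a broken commit good, at the cost of a slower bisection.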

Stuart Halloway

Apr 17, 2010, 3:32:23 PM
to clo...@googlegroups.com
It's almost certainly the commit that added the InternalReduce
protocol: 5b281880571573c5917781de932ce4789f18daec.

I am slowly pounding my skull against this and would welcome any help.
It appears that the internal-reduce function flakes out and stops
working, but only intermittently.

If there is some way that protocols.clj gets *re*loaded while Clojure
is running, that would definitely be a problem, as reduce depends on it.

B Smith-Mannschott

Apr 17, 2010, 3:50:02 PM
to clo...@googlegroups.com, stuart....@gmail.com
On Sat, Apr 17, 2010 at 21:32, Stuart Halloway
<stuart....@gmail.com> wrote:
> It's almost certainly the commit that added the InternalReduce protocol:
> 5b281880571573c5917781de932ce4789f18daec.
>
> I am slowly pounding my skull against this and would welcome any help. It
> appears that the internal-reduce function flakes out and stops working, but
> only intermittently.
>
> If there is some way that protocols.clj gets *re*loaded while Clojure is
> running, that would definitely be a problem, as reduce depends on it.
>

Yes, I just saw a8e92018ce0ce32fc59fae2072369a8671fdea62 "disable new
reduce" go in on clojure/master and have been running repeated builds
of clojure-contrib against it. It's a marked improvement: I'm now
seeing only 1 build in every 18 fail. It's no longer failing in
complex-numbers but rather in json:

ERROR in (can-print-json-null) (run-test1309997463507545582.clj:44)
expected: (= "null" (json-str nil))
actual: java.lang.NullPointerException: null
at clojure.contrib.json$eval__7586$fn__7587$G__7578__7590.invoke (json.clj:204)
clojure.contrib.json/json_str (json.clj:306)
clojure.contrib.test_json/fn (test_json.clj:148)
clojure.test$test_var__6804$fn__6805.invoke (test.clj:644)
clojure.test/test_var (test.clj:644)
clojure.test$test_all_vars__6809$fn__6810$fn__6817.invoke (test.clj:659)
clojure.test/default_fixture (test.clj:617)
clojure.test$test_all_vars__6809$fn__6810.invoke (test.clj:659)
clojure.test/default_fixture (test.clj:617)
clojure.test/test_all_vars (test.clj:655)
clojure.test/test_ns (test.clj:677)
clojure.core$map__4089$fn__4090.invoke (core.clj:1870)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:56)
clojure.lang.Cons.next (Cons.java:37)
clojure.lang.RT.next (RT.java:540)
clojure.core/next (core.clj:54)
clojure.core/reduce (core.clj:707)
clojure.core/reduce (core.clj:698)
clojure.core$merge_with__4179.doInvoke (core.clj:2046)
clojure.lang.RestFn.applyTo (RestFn.java:140)
clojure.core/apply (core.clj:480)
clojure.test$run_tests__6826.doInvoke (test.clj:691)
clojure.lang.RestFn.invoke (RestFn.java:1261)
user$eval__12082$fn__12085.invoke (run-test1309997463507545582.clj:46)
user/eval (run-test1309997463507545582.clj:44)
clojure.lang.Compiler.eval (Compiler.java:5373)
clojure.lang.Compiler.load (Compiler.java:5784)
clojure.lang.Compiler.loadFile (Compiler.java:5747)
clojure.main/load_script (main.clj:213)
clojure.main/script_opt (main.clj:265)
clojure.main$main__6279.doInvoke (main.clj:346)
clojure.lang.RestFn.invoke (RestFn.java:409)
clojure.lang.Var.invoke (Var.java:365)
clojure.lang.AFn.applyToHelper (AFn.java:165)
clojure.lang.Var.applyTo (Var.java:482)
clojure.main.main (main.java:37)

Line 204 of json.clj is:

 202  ;;; JSON PRINTER
 203
*204  (defprotocol Write-JSON
 205    (write-json [object out]
 206      "Print object to PrintWriter out as JSON"))

So, though a8e92018 is a great improvement, I'm still seeing some
defprotocol related misbehavior.

Rich Hickey

Apr 18, 2010, 10:10:07 AM
to clo...@googlegroups.com
This should all be fixed now, as of
#19dd3c593e7a29cbca514c6ab7424ff22e353cc6.

Thanks for the report,

Rich