To save me digging into git commits, do you (or anyone else) happen to
know for sure when "realized?" was added? It's not in the release
notes below... I *think* it's new in Alpha 7...?
Sean
> To save me digging into git commits, do you (or anyone else) happen to
> know for sure when "realized?" was added? It's not in the release
> notes below... I *think* it's new in Alpha 7...?
Thanx Christopher.
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/
Railo Technologies, Inc. -- http://www.getrailo.com/
"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)
> Changes from 1.3 Alpha 6 to 1.3 Alpha 7 (05/13/2011)
>
> [...]
> * case now handles hash collisions (CLJ-426)
I've just updated my git master checkout, built it, and tried to use it
in my project. But still I get a hash collision error:
java.lang.IllegalArgumentException: No distinct mapping found
at clojure.core$min_hash.invoke (core.clj:5805)
So it seems to me that CLJ-426 is not really fixed...
Bye,
Tassilo
> java.lang.IllegalArgumentException: No distinct mapping found
>     at clojure.core$min_hash.invoke (core.clj:5805)
I'm not on alpha7 yet, but what about (case 49 "1" 'string 49 'int)?
"1" hashes to 49, so there's a hash collision.
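For illustration (this is plain Java interop, not the alpha's internals), the collision is easy to see at the REPL:

```clojure
;; The String "1" and the integer 49 share a Java hashCode: a
;; one-character String hashes to its char value, 49 is the char code
;; of \1, and a boxed 49 hashes to itself.
(.hashCode "1") ;=> 49
(.hashCode 49)  ;=> 49
```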
Hi Christopher,
> Can you supply a small example where this is happening?
No, I don't have a small, standalone example. :-(
The problem appeared when I converted my project from multimethods to
protocols. You can clone my mercurial project and run the tests, which
will trigger the error for the `coupling-by-objects' test.
$ hg clone https://anonymous:sec...@hg.uni-koblenz.de/horn/funql
The project needs some java library which you can download from
http://www.uni-koblenz.de/~horn/jgralab.jar
and put into a local maven repository using:
$ mvn install:install-file \
-Dfile=/path/to/jgralab.jar \
-DgroupId=ist \
-DartifactId=jgralab \
-Dversion=1.0.0 \
-Dpackaging=jar
Then fetch the deps and run the tests with either
$ cake deps
$ cake test
or
$ lein deps
$ lein test
> For reference, this code that failed in alpha6:
> [...]
Well, that works fine now:
user> *clojure-version*
{:interim true, :major 1, :minor 3, :incremental 0, :qualifier "master"}
user> (defn buggy-case [n]
        (case (int n)
          0 :null
          1 :load
          0x70000000 :loproc))
#'user/buggy-case
Bye,
Tassilo
Hi,
> The min-hash function throwing that exception is no longer used by
> case, though it is still used by the protocols internals, so that's
> what's running into the collision.
Ah, ok. I thought protocols dispatched using `case', so I expected my
problem to be fixed as well.
Is there already a bug report for the hash collisions on protocol
dispatch?
Bye,
Tassilo
It's actually worse than that! There are exactly twenty slots there,
so even though it's saying "can't specify more than 20" it's actually
choking on any amount more than NINETEEN.
--
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.
> Is there already a bug report for the hash collisions on protocol
> dispatch?
Hi,
What's the incremental runtime cost of increasing the max number of fields to 40? :-)
Las
sent from my mobile device
Oh, there's no need for that. Just add a little smarts to the
constructor generating macro to detect 19+ fields and then define it
as
(defn ->Foo [& fields]
  (if (not= (count fields) whatever)
    (throw-arity-exception)
    (Foo. (nth fields 0) (nth fields 1) (nth fields 2) ...)))
or maybe for efficiency use a let cascade:
(let [p1 (first fields)
      fields (rest fields)
      p2 (first fields)
      fields (rest fields)
      ...]
  ...)
Of course there's also a JVM limit on the number of constructor
args (255 parameter slots per method).
I'd suggest if you have records that large to consider restructuring
them to have a smaller number of fields that are more composite --
seqs, vectors, sets, maps, even nested records. Surely there's *some*
structure beyond a heterogeneous assortment of twenty-plus objects?
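For instance (all names here are hypothetical), a wide flat record often splits naturally into nested ones:

```clojure
;; Hypothetical restructuring: group related fields into sub-records
;; instead of one record with 20+ flat fields.
(defrecord Address [street city zip])
(defrecord Person [name age address phones])

(def p (->Person "Ada" 36 (->Address "1 Main St" "London" "N1") ["555-0100"]))

(-> p :address :city) ;=> "London"
```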
<rant>
I think this issue reinforces my belief that arbitrary limits are bad.
Re. defrecord - should we put the onus on everyone using defrecord to manipulate wide datasets to remember that there is an arbitrary limit of 19 fields, or put some smarts into the defrecord macro?
I think we are in agreement that we should put some smarts in the defrecord macro.
But, hold on, why do we need smarts in the defrecord macro? Why? Because of an arbitrary limit on the number of params that a function can handle.
So, why don't we put the smarts around function compilation/application to handle any number of params (within the JVM's ability)? That way we can avoid needing smarts everywhere a macro might expand to a function call with more than 19 params. I have direct experience of being bitten by this.
If handling this many params means a performance hit, then by all means let's log a warning about it, but we should not trip up perfectly valid code.
</rant>
Ah! that feels better :-)
Jules
+1. I think this can be done. It just needs to be that 20+ params
routes through .applyTo/applyToHelper instead of the .invoke methods.
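A minimal sketch of why that route has no fixed ceiling: `apply` already packs the arguments into a seq and goes through .applyTo, so a variadic fn accepts any number of arguments today; it is only the fixed-arity .invoke overloads that stop at 20.

```clojure
;; apply -> .applyTo: the 20-parameter .invoke limit never applies on
;; this path, even for 100 arguments.
(defn sum-all [& xs] (reduce + 0 xs))
(apply sum-all (range 100)) ;=> 4950
```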
There will no doubt be a performance hit though, and perhaps a
substantial one.
One of the useful features of lisp programming is that limits are mostly
constrained by the amount of available memory. This isn't always true
but the exceptional cases are very rare.
This becomes important because people generally write automatic code
generator macros to support their domain specific languages. While you
might never code a 20 parameter defrecord by hand it is almost certain
that some macro will generate code well past that limit.
I am working with code that auto-generates variables and collects them
into the "let" form so there can be hundreds of them, mostly temporary
variables that were gensym'ed. If the code were moved to keep the
variables in a defrecord I would expect it to "just work".
I'm surprised that defrecord isn't a hashed trie data structure of
some sort, allowing the usual 2^32 entries.
Tim Daly
Literate Software
da...@literatesoftware.com
For speed, it's a Java class with instance variables for the
predefined keys (and a hash trie for any non-standard-for-that-record
keys that are added to a particular instance).
IMO, records should support anything maps support -- in particular,
they should support near-arbitrary numbers of entries. If that means
that bigger ones (in terms of numbers of predefined keys) switch to a
less performant hash trie for their predefined keys, so be it. It's
not unlike promoting integers to BigIntegers in arithmetic that way.
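The arithmetic analogy can be seen with the auto-promoting operators:

```clojure
;; +' promotes a long result to BigInt rather than overflowing,
;; trading a little speed at the large end for continuing to work at
;; all -- the same trade proposed here for wide records.
(+' Long/MAX_VALUE 1) ;=> 9223372036854775808N
```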
(Similarly, fns should be overloadable for specific arities over 20,
with any call with over 20 parameters going through .applyTo instead
of a .invoke, and then being checked for arity. For instance, if the
fn can take 1, 15, 25, or 30-or-more arguments and you pass it 27,
applyTo gets called and then blows up, since the fn accepts 25 or at
least 30 arguments but not 27. There'll be a performance
penalty in calling it with over 20 arguments without varargs, the same
as calling it with varargs or with apply, caused by having to box the
args into a seq and unbox them again, but that beats it not working at
all.)
> IMO, records should support anything maps support -- in particular,
> they should support near-arbitrary numbers of entries.
The suggestions would make records (and functions) no slower for the
situations where they currently work at all, and would make them work
in situations where they currently fail.
To make them stay fast, without hitting ivar and other limits, you'd
have to do something fundamentally new (and a bit icky): use an array
of Object under the hood to hold the contents/params. There'd still be
a slight penalty for "huge" arglists and records, due to the second
indirection to get from record pointer to array pointer to element in
the case of records, and the indirection to get the arg out of the
array in the case of functions. This also couldn't be done with
deftype (as it would break the semantics of :volatile-mutable fields).
Getting rid of the indirection would require either fundamental JVM
changes (unlikely, at least in the near term) or making records
directly be object arrays "under the hood" rather than deftypes. Then
records aren't their own classes and a lot of how type and .foo
dispatch works for records would have to be changed. Also, records
become thread-unsafe if there's any way to break "hygiene" and
directly get at the array cells to mutate them. I do not recommend
going that far.
Frankly, I'd prefer "large" records just use hash tries. They're
tried, tested, and true (so to speak) and support structure sharing,
unlike anything directly backed by Object arrays. Furthermore, they
won't be any slower to access in the case of <33 fields than Object
arrays and get slower only logarithmically, and it's a very
slow-growing logarithm at that -- in particular, there's only one more
indirection for records with 33-1024 fields. Locality also stays good
(for the field value pointers, rather than the value objects
themselves) for up to 32 fields.
So, one indirection for defined fields in records of up to 20 defined
fields, and for the first 20 defined fields of any record; two for the
next 32 fields of any record; three for the next 992 fields of any
record; past that, seq becomes more efficient for traversal than a
series of direct field accesses that eventually hits them all (and
especially has superior memory locality), and just as efficient as
traversing, say, a vector or a hash-map.
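A quick sanity check of those figures, assuming the 32-way branching of Clojure's hash tries (the helper name is mine):

```clojure
;; Each extra trie level multiplies capacity by 32: depth 1 holds 32
;; entries, depth 2 holds 1024, depth 3 holds 32768 -- so only one
;; extra indirection covers the 33-1024 field range cited above.
(defn trie-depth [n]
  (loop [depth 1, cap 32]
    (if (>= cap n)
      depth
      (recur (inc depth) (* 32 cap)))))

(trie-depth 32)   ;=> 1
(trie-depth 1024) ;=> 2
(trie-depth 1025) ;=> 3
```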