Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map

4 views
Skip to first unread message

Berke Durak

unread,
Apr 21, 2008, 5:39:56 AM4/21/08
to Caml-list List
Hello,

I decided to share my modification of the standard Set and Map modules.

Basically, I took Set and Map, unfunctorized them, passing the comparison
function as an optional argument with Pervasives.compare as the default,
added or exposed some missing functions (reverse_fold, reverse_iter, min_key...)
and added converters for Sexplib.

It is under LGPL and feel free to include it in Extlib, in Ocaml "batteries"
or anywhere else.

http://abaababa.ouvaton.org/caml/xsetxmap-20080421.tar.gz
--
Berke DURAK

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Jean-Christophe Filliâtre

unread,
Apr 21, 2008, 6:56:52 AM4/21/08
to Berke Durak, Caml-list List
Berke Durak wrote:
> I decided to share my modification of the standard Set and Map modules.
>
> Basically, I took Set and Map, unfunctorized them, passing the comparison
> function as an optional argument with Pervasives.compare as the default,

Since your library is allowing the user to pass the comparison function
to any function which requires it

type 'elt cp = 'elt -> 'elt -> int
val add: ?cp:'elt cp -> 'elt -> 'elt t -> 'elt t
val mem: ?cp:'elt cp -> 'elt -> 'elt t -> bool
...

you should really add a *huge* warning in the documentation saying that
it is unsound to insert an element with a comparison function and then
to search for it using another one.

This is precisely where the functor approach is increasing your program
correctness, with a static binding of a single comparison function into
the code of all operations.

Even without functors, you can provide a safer approach where the
comparison function is given when creating the data structure (a la Caml
light), and stored inside. This can be an optional argument of create
for instance, so that you keep your idea of a default comparison function:

val create: ?cp:('elt -> 'elt -> int) -> unit -> 'elt t

(Note that I agree with you that it is sometimes annoying that Set and
Map do not provide polymorphic versions using Pervasives.compare, as
Hashtbl does with the polymorphic hash funtion. I'm only concerned with
the potential risk with your solution...)

--
Jean-Christophe

Berke Durak

unread,
Apr 21, 2008, 7:19:36 AM4/21/08
to Jean-Christophe Filliâtre, Caml List
Yes you are right, I should have put a warning. These modules are for
peculiar cases.
So here it is:

WARNING: The default comparison function is not sound for non-canonical
datastructures such
as Sets and Maps. Xset is not canonical. Do not use Xsets or Xmap without
providing the correct
comparison function. You must always pass the same comparison function.

(That's why I was asking if there was a Set/Map structure with canonical
representation the other day.)

Extlib provides a version where the comparison function for the keys is
stored in the map.
But I chose to not do this for some peculiar reasons.
- I'm serializing Xmap and Xsets using Marshal, and I don't want to have
functions in the structure
- I don't like the overhead of a wrapper structure and I don't want to
complicate the API
--
Berke Durak

Jon Harrop

unread,
Apr 23, 2008, 8:42:27 AM4/23/08
to caml...@yquem.inria.fr
On Monday 21 April 2008 12:19:09 Berke Durak wrote:
> Yes you are right, I should have put a warning. These modules are for
> peculiar cases.

Actually I would say that your style is more useful than the built-in Set and
Map modules because you don't have to jump through hoops defining your
own "Int" module with its own "int" type and its own comparison function over
ints every time you want a set of integers. I would put the comparison
function in the set itself though.

Your modules cover 99% of use cases with the lowest syntactic overhead
possible in OCaml. However, you have inherited a couple of design flaws from
OCaml's standard library:

The "height" function has been inlined which has no significant effect on
performance but makes it harder to refactor the code.

If you special case Node(Empty, v, Empty) to a new Node1(v) type constructor
then you can improve performance by 30% by alleviating the GC.

As Jean-Christophe says, it is dangerous. However, the whole language is
already dangerous in this sense because it is so easy to erroneously apply
polymorphic comparison or equality to the existing sets and maps and get
silent data corruption. The only solution is to fix the language itself by
allowing types to override comparison, equality and hashing.

Another inherent problem is the inefficiency of performing comparisons in this
way in OCaml. Type specialization (e.g. using a JIT) greatly accelerates this
kind of code by allowing the comparison function (which is almost always
trivial) to be inlined and optimized.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Brian Hurt

unread,
Apr 23, 2008, 9:26:00 AM4/23/08
to Jon Harrop, caml...@yquem.inria.fr
Jon Harrop wrote:

>Actually I would say that your style is more useful than the built-in Set and
>Map modules because you don't have to jump through hoops defining your
>own "Int" module with its own "int" type and its own comparison function over
>ints every time you want a set of integers. I would put the comparison
>function in the set itself though.
>
>
>

IMHO, the Int module should be in the standard library, and the Set and
Map modules should have already instantiated sets and maps for the
standard base types (int, float, string, char).

Also, I'm not as down on functors as a lot of programmers seem to be.
While not perfect, they solve a number of problems very well. For
example, there are a number of operations on sets and maps which are
O(N) if and only if you know the two trees are in the same order, but
O(N log N) if you don't know they're in the same order. Functors lift
this check into the type system.

Brian

Berke Durak

unread,
Apr 23, 2008, 9:41:38 AM4/23/08
to Jon Harrop, caml...@yquem.inria.fr
Jon Harrop wrote:
> On Monday 21 April 2008 12:19:09 Berke Durak wrote:
>> Yes you are right, I should have put a warning. These modules are for
>> peculiar cases.
>
> Actually I would say that your style is more useful than the built-in Set and
> Map modules because you don't have to jump through hoops defining your
> own "Int" module with its own "int" type and its own comparison function over
> ints every time you want a set of integers. I would put the comparison
> function in the set itself though.

Well, thanks.

> Your modules cover 99% of use cases with the lowest syntactic overhead
> possible in OCaml. However, you have inherited a couple of design flaws from
> OCaml's standard library:
>

> . The "height" function has been inlined which has no significant effect on

> performance but makes it harder to refactor the code.
>

> . If you special case Node(Empty, v, Empty) to a new Node1(v) type constructor

> then you can improve performance by 30% by alleviating the GC.

That's a good idea.

> As Jean-Christophe says, it is dangerous.

Have you checked out the second version where I use phantom types to prevent
the more severe cases of data corruption? Yes, you can still end up with
multiple copies of the same "set", but at least the tree won't get
inconsistent because you mistakenly called add with different comparison functions.
So it's as safe/unsafe as using List.assoc or the generic hash table to
store sets or hash tables.

That's still a problem, but it's not that much worse than with
functors. After all, nothing prevents you from incorrectly writing

module StringSet = Set.Make(String)
module SetSet = Set.Make(struct type t = StringSet.t let compare = compare)

> However, the whole language is
> already dangerous in this sense because it is so easy to erroneously apply
> polymorphic comparison or equality to the existing sets and maps and get
> silent data corruption. The only solution is to fix the language itself by
> allowing types to override comparison, equality and hashing.

Overriding equality etc. would require run-time type-based dispatch
or implicitly carrying the comparison function (as Haskell apparently does
with type classes) as an extra argument. Could be expensive.

> Another inherent problem is the inefficiency of performing comparisons in this
> way in OCaml. Type specialization (e.g. using a JIT) greatly accelerates this
> kind of code by allowing the comparison function (which is almost always
> trivial) to be inlined and optimized.

Did anyone try the GNU lightning-based ocamljit in the Ocaml CVS?
--
Berke DURAK

David Allsopp

unread,
Apr 23, 2008, 10:17:04 AM4/23/08
to Brian Hurt, Jon Harrop, caml...@yquem.inria.fr
Brian Hurt wrote:

> Jon Harrop wrote:
>
> > Actually I would say that your style is more useful than the built-in
> > Set and Map modules because you don't have to jump through hoops
> > defining your own "Int" module with its own "int" type and its own
> > comparison function over ints every time you want a set of integers. I
> > would put the comparison function in the set itself though.
> >
> >
> >
> IMHO, the Int module should be in the standard library, and the Set and
> Map modules should have already instantiated sets and maps for the
> standard base types (int, float, string, char).

Agreed - then we could also have more sensibly located functions such as
Int.of_string (note that it's the same length as int_of_string!!) and remove
lots of random functions from Pervasives!

All that said, and especially as StdLib changes are reasonably rare, I find
having files IntSet.ml and IntSet.mli containing:

include Set.Make(struct type t = int let compare = Pervasives.compare end)

and

include Set.S with type elt = int

isn't too bad (except that you have to include IntSet.cmo/.cmx when
compiling, obviously)


David

Berke Durak

unread,
Apr 23, 2008, 10:23:21 AM4/23/08
to David Allsopp, caml...@yquem.inria.fr
David Allsopp wrote:
> Brian Hurt wrote:
>
>> Jon Harrop wrote:
>>
>
> Agreed - then we could also have more sensibly located functions such as
> Int.of_string (note that it's the same length as int_of_string!!) and remove
> lots of random functions from Pervasives!

Of course we'll just add an extra Int module and leave Pervasives alone, or at
least make it trigger a warning.

> All that said, and especially as StdLib changes are reasonably rare, I find
> having files IntSet.ml and IntSet.mli containing:
>
> include Set.Make(struct type t = int let compare = Pervasives.compare end)
>
> and
>
> include Set.S with type elt = int
>
> isn't too bad (except that you have to include IntSet.cmo/.cmx when
> compiling, obviously)
>
>

I agree, and someone might write an optimized version for ints or floats.

But Xset & Xmap also provide opportunities to add some missing functions.
In particular, I often use Sets or Maps as a good priority queue/heap substitute, and
reverse_fold / reverse_iter come in handy.

The only problem is that I have an allergy to CamlCaseIdentifiers but I'll
just swallow some antiHistaminics.

--
Berke DURAK

Alexandre Pilkiewicz

unread,
Apr 26, 2008, 10:44:47 AM4/26/08
to caml...@yquem.inria.fr
Le Wednesday 23 April 2008 15:56:47 David Allsopp, vous avez écrit :
> All that said, and especially as StdLib changes are reasonably rare, I find
> having files IntSet.ml and IntSet.mli containing:
>
> include Set.Make(struct type t = int let compare = Pervasives.compare end)
>
> and
>
> include Set.S with type elt = int

One should probably use an other function than Pervasives.compare in that
case. Some "benchmarks" show that 100 000 000 uses of comp with different
definitions lead to some important differences for performance

let comp = compare
-> 1,54s


let comp (i:int) (j:int) = compare i j
-> 0.7s


let comp (i:int) (j:int) =
if i = j then 0 else if i < j then -1 else 1
-> 0.18s

--
Alexandre Pilkiewicz

0 new messages