[erlang-questions] Ordered set and DETS

20 views
Skip to first unread message

Evans, Matthew

unread,
Sep 3, 2010, 4:28:55 PM9/3/10
to Erlang Questions Mailinglist
Hi,

Does anyone know if there are any roadmap plans to implement ordered_set table types in DETS?

Thanks

Tony Rogvall

unread,
Sep 4, 2010, 7:10:26 AM9/4/10
to Evans, Matthew, Erlang Questions Mailinglist
+1 Please

/Tony


________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questio...@erlang.org

Ulf Wiger

unread,
Sep 4, 2010, 1:16:03 PM9/4/10
to erlang-q...@erlang.org

It may be possible to build on the combination of Tokyo Tyrant
and my sortable serialization library, which I mentioned in
this post a while ago:

http://www.erlang.org/cgi-bin/ezmlm-cgi/4/47285

That was part of the justification for adding prefix
matching and matchspec-style functionality in the sext
library in the first place.

BR,
Ulf W

Anthony Molinaro

unread,
Sep 4, 2010, 6:50:32 PM9/4/10
to Evans, Matthew, Erlang Questions Mailinglist
On Fri, Sep 03, 2010 at 04:28:55PM -0400, Evans, Matthew wrote:
> Does anyone know if there are any roadmap plans to implement ordered_set table types in DETS?

It might be a tad out of date now, but a couple years ago a company I worked
for had a consultant write mnesiax

http://code.google.com/p/mnesiaex/

Which made the mnesia storage backend a behaviour IIRC.

Then we used tcerl

http://code.google.com/p/tcerl/

to keep ordered_set disk backed tables in mnesia.

It's probably bit-rotted a bit but maybe worth dusting off?

-Anthony

--
------------------------------------------------------------------------
Anthony Molinaro <anth...@alumni.caltech.edu>

Ulf Wiger

unread,
Sep 6, 2010, 7:56:24 AM9/6/10
to erlang-q...@erlang.org

So when exercising my QuickCheck suite for the sext
(sortable serialization) library, I stumbled upon an intermittent
test case failure, which was particularly vexing.

The failing case was the sorting property, which says that,
for any terms, comparing the encoded terms should yield
the same result as comparing the original terms.

The failing term pairs are {-1.0, 1}, {1.0, 1}, {2.0, 2} etc.
The terms are not strictly equal, so they have to be encoded
differently (otherwise, the encoding property breaks).
That is, decode(encode(2.0)) =:= 2.0 must hold true*.
In this sense, the values in each pair are distinctly different.

* For some definition of true. In the case of sext, I specify that
the IEEE 754 binary64 representation of the float, as used in
the bit syntax notation, must be the same.

Yet, they have no defined ordering!

Eshell V5.7.5 (abort with ^G)
1> 1 < 1.0.
false
2> 1.0 < 1.
false
3> 1.0 =:= 1.
false

This means that the result of lists:sort/1 is not defined in this
particular case:

Eshell V5.7.5 (abort with ^G)
1> lists:sort([1,1.0]).
[1,1.0]
2> lists:sort([1.0,1]).
[1.0,1]

Naturally, lists:usort/1 has a corresponding behaviour:

3> lists:usort([2,2.0]).
[2]
4> lists:usort([2.0,2]).
[2.0]


I think this is a good argument in favour of adding operators
"strictly <" and "strictly >", which I suggest could be :< and >:.

I made a simple hack where the parser recognized these tokens and
expanded it into something similar to:

':<'(A,B) ->
A == B
andalso (is_integer(A)
andalso is_float(B))
orelse A < B.

'>:'(A,B) ->
A == B
andalso (is_float(A)
andalso is_integer(B))
orelse A < B.

but rather as a form rewriting excercise in erl_parse.yrl:

wrap_op({op,Pos,':<',L,R}) ->
{op,Pos,'orelse',
{op,Pos,'andalso',
{op,Pos,'==',L,R},
{op,Pos,'andalso',
{call,13,{atom,Pos,is_integer},[L]},
{call,14,{atom,Pos,is_float},[R]}}},
{op,Pos,'<',L,R}};
wrap_op({op,Pos,'>:',L,R}) ->
{op,Pos,'orelse',
{op,Pos,'andalso',
{op,Pos,'==',L,R},
{op,Pos,'andalso',
{call,13,{atom,Pos,is_float},[L]},
{call,14,{atom,Pos,is_integer},[R]}}},
{op,Pos,'>',L,R}};
wrap_op(Op) ->
Op.


This way, it works in guards as well, and will be BW
compatible with existing tools, but with the disadvantage
that the original source cannot be reconstituted.

I could try to formulate this as an EEP, if you all think
it's a reasonable (although quite marginal) suggestion.

An alternative would be to re-formulate the property of sext
such that the encoded terms sort the same way as the original
terms in all cases where it is reasonable to rely on the sort
order in the first place. This is to some extent already true,
at least in regard to encode/decode symmetry, in other cases
involving floats. This alternative likely has less impact. :)

BR,
Ulf W

Nicholas Frechette

unread,
Sep 6, 2010, 11:03:39 AM9/6/10
to Ulf Wiger, erlang-q...@erlang.org
Is that not the expected behavior per the erlang doc?
http://www.erlang.org/doc/reference_manual/expressions.html
Section 7.11 Term Comparisons (copied below the simplicity)

Lists are compared element by element. Tuples are ordered by size, two
tuples with the same size are compared element by element.

If one of the compared terms is an integer and the other a float, the
integer is first converted into a float, unless the operator is one of =:=
and =/=. If the integer is too big to fit in a float no conversion is done,
but the order is determined by inspecting the sign of the numbers.

Returns the Boolean value of the expression, true or false.
afaik, 1==1.0 -> true and 1=:=1.0 -> false makes sense to me. This means
that of course, you can't trust =:= for ordering elements.
Conceptually speaking, comparing floating points and integers, one can only
infer an ordering if one converts one into the other since ordering is only
defined within each's realm.

Incidentaly, what would you expect of the following:
1 < {foo, 2}
1 > {foo, 2}
1 == {foo, 2}
1 =:= {foo, 2}

IMO a better way to define an ordering in a case like these, when sorting,
would be to pass a function that defines the ordering based on input values.

I'm not sure extending the language makes sense here.
2cents
Nicholas

Ulf Wiger

unread,
Sep 6, 2010, 11:30:02 AM9/6/10
to Nicholas Frechette, erlang-q...@erlang.org
On 09/06/2010 05:03 PM, Nicholas Frechette wrote:
> Is that not the expected behavior per the erlang doc?
> http://www.erlang.org/doc/reference_manual/expressions.html

Well, yes, but that in itself is no guarantee that it is a
good solution. ;-)


> Incidentaly, what would you expect of the following:
> 1 < {foo, 2}
> 1 > {foo, 2}
> 1 == {foo, 2}
> 1 =:= {foo, 2}

There is nothing strange or unambiguous there.
Erlang has a global term comparison order so e.g. 1 < {foo,2} is
perfectly well defined.

The int->float coercion is fine... until it's not. That's why
we have =:= and =/=, because in some cases, 1 and 1.0 really
are distinctly different.

I have managed to get by programming erlang for the better part of
17 years before I was finaly bitten by this, so I'm perfectly willing
to go on happily without ever seeing this corrected.

However, I really don't see the disadvantage of ensuring that the
global comparison order can be guaranteed across the board, without
any surprising corner cases, and I think lists:sort/1 et al should
really be able to provide such guarantees.

BTW, there was another interesting edge case some years ago, that
proved very troublesome for mnesia. Pids did not sort the same
way on all connected nodes in a distributed erlang network. This broke
some fundamental assumptions in mnesia's deadlock detection algorithm.
It has long-since been fixed, but having a truly universal sort order
really is a very good thing.

Nicholas Frechette

unread,
Sep 6, 2010, 7:04:14 PM9/6/10
to Ulf Wiger, erlang-q...@erlang.org
I see what you mean by 'number' < 'tuple' defining an order. Essentially,
there is a category missing to (on that same page/section):

The arguments may be of different data types. The following order is
defined:

number < atom < reference < fun < port < pid < tuple < list < bit string

'number' could be split into two: integer and floating point. Something like:
integer < floating point < atom < reference < etc.
Such that:
1 < 1.0 -> true instead of false?
1 > 1.0 -> false as currently

This would allow lists:sort to sort but then, the behavior would be one that
isn't intuitive. (ie: 1 < 0.5 -> true)
The other way around this is as you suggested to add matching strict
operators (:>, :<, :=>, :<=) but then that still won't fix lists:sort
(without a code change to it obviously). And when one is doing sorting, one
will have to stick to one set of operators or the other, else the order
isn't potentially defined. Though this seems to apply to numbers only as the
order is explicitly defined for any other pair of terms. Your operators
would thus only be required for comparing numbers.
Perhaps it would make more sense to have =:= sometime coerce so that == and
=:= are equivalent for numbers only? Since there isn't any defined ordering
for them without that operator coercing (if you mix <, > and =:=).
Incidentaly, matching would then work with numbers:
case 1.0 of
1 -> currently_doesnt_match;
_ -> currently_always_matches
end.
(Side note: Does the compiler optimize the above and strip it since it's a
constant branch?)

Looking elsewhere, I believe in lisp the following applies:
(< 1 1.0) -> false
(< 1 2.0) -> true
(= 1 1.0) -> true

(eq 3 3) might be true or false, depending on the implementation.
(eq 3 3.0) is false.
(eq 3.0 3.0) might be true or false, depending on the implementation.

(eql 3 3) is true.
(eql 3 3.0) is false.
(eql 3.0 3.0) is true.

(equal 3 3) is true.
(equal 3 3.0) is false.
(equal 3.0 3.0) is true.

(equalp 3 3) is true.
(equalp 3 3.0) is true.
(equalp 3.0 3.0) is true.

This is the case because in lisp +-=<>*/ etc are functions that coerce.
Given a list of terms, a single function would be evaluated on them to sort
them, a function you'd explicitly define (or defined in the library). This
could lead to weird results if you mix types:
(< 1 "foo") -> true (no order defined for these, unlike in erlang)
(< "foo" 1) -> true
(= 1 "foo") -> false
Sadly, my lisp is a bit rusty and I'm not sure I can adequately explain the
above behavior. Nonetheless, when sorting or otherwise extracting an
ordering between terms, in lisp one will most likely end up using the
comparison operators and not the equality predicates (eq, eql, equal,
equalp) thus avoiding the issue you ran into (at least when not mixing
types, I presume when one is mixing types that can't coerce, one has to
provide a function in order to get strict ordering). Maybe someone with more
lisp experience than I have can elaborate further here but that is my
understanding.

Thus imo, when ordering things in erlang, one should use the comparison
operators that coerce and use the exact match operators when one is
attempting to match. Otherwise, you are setting yourself up for a world of
pain.

lists:sort is probably a "stable sort" in that if elements are equal, they
appear in the sorted list in the order they appeared in the unsorted one
which explains the behavior you see:
lists:sort([2, 1, 1.0]) -> [1, 1.0, 2]
lists:sort([2, 1.0, 1]) -> [1.0, 1, 2]
lists:sort([1, 2, 3, 2.0]) -> [1, 2, 2.0, 3]
lists:sort([1, 2.0, 3, 2]) -> [1, 2.0, 2, 3]

Sorry for the lenghty email.
Nicholas

On Mon, Sep 6, 2010 at 11:30 AM, Ulf Wiger
<ulf....@erlang-solutions.com>wrote:

> On 09/06/2010 05:03 PM, Nicholas Frechette wrote:

John Hughes

unread,
Sep 6, 2010, 7:37:16 PM9/6/10
to Ulf Wiger, erlang-q...@erlang.org
I think this is a good idea... it's handy to have a real total ordering on
values, which this would give. Then sort should be modified to use <:
instead of <, which is very unlikely to cause any problems. But note there
would need to be TWO versions of usort, one with (approximately) today's
behaviour, and one giving uniqueness up to matching. musort, maybe?

John

Robert Virding

unread,
Sep 6, 2010, 8:20:10 PM9/6/10
to Ulf Wiger, erlang-q...@erlang.org
The original source of this problem is that we got the comparison
operators wrong. With 20/20 hindsight I think we should have had >,
>=, ... as purely arithmetic comparisons, with or without conversion,
and had a separate set of operators @>, @>=, ... as pure term
comparison with no conversion.

While we can't change the existing operators we can add the pure term
comparison operators. I would prefer to have a complete set with
consistent names even if this means duplicating =:= and =/=.

There is one problem with your definitions of :< and >:, which is that
they don't work if the arguments are structures with embedded numbers
as you only check at the top-level.

Defining a comparison order should be done.

Robert

Ulf Wiger

unread,
Sep 7, 2010, 1:24:10 AM9/7/10
to Robert Virding, erlang-q...@erlang.org
On 09/07/2010 02:20 AM, Robert Virding wrote:
>
> There is one problem with your definitions of :< and >:, which is that
> they don't work if the arguments are structures with embedded numbers
> as you only check at the top-level.

Yes, of course you're right! Let's say it's just a quick-and-dirty
to illustrate the concept (which it certainly was, too).

A proper implementation would have to be done inside the emulator.

Richard O'Keefe

unread,
Sep 7, 2010, 2:27:35 AM9/7/10
to Nicholas Frechette, Ulf Wiger, erlang-q...@erlang.org

On Sep 7, 2010, at 11:04 AM, Nicholas Frechette wrote:
> Incidentaly, matching would then work with numbers:
> case 1.0 of
> 1 -> currently_doesnt_match;
> _ -> currently_always_matches
> end.

Trust me, you *really* don't want to go there.

To start with, while -0.0 and +0.0 compare equal under
IEEE rules, they also BEHAVE DIFFERENTLY. But both of
them are numerically equal to the integer 0.

I've seen what happened in VM/PROLOG, where they had a
"feature" of allowing fuzzy matching in clause heads.
It made trivial cases look kewl, but it made
reasoning about programs incredibly hard, because
X = Y and Y = Z did not imply X = Z. Heck, even the
order in which you wrote the arguments could matter.

Floats have the property that there is an number X
such that X = X+1. Integers do not. Let's *never*
confuse them.

(One of the things I just *love* about Ada and Haskell
is that they do not allow 'mixed mode arithmetic'.)


>
> Thus imo, when ordering things in erlang, one should use the comparison
> operators that coerce

Oh no, no, no. Please NO! That way lies failure of transitivity,
which is a DISASTER for sorting. More precisely, there is a way to
do comparison-by-coercion that works, and there's one that doesn't,
and the one language designers tend to pick is the one that doesn't.

Suppose there are two kinds of numbers: Q and F.
Q numbers are exact (integers, rationals).
F numbers are inexact (floats).
If you compare Q1 <= Q2, no problem.
If you compare F1 <= F2, no problem.
What happens if you compare Q1 <= F2
or F1 <= Q2?

The popular technique is what Fortran, Pascal, and C do:
convert the Q number to F format, and do F comparison:

Q1 <= F2 iff uglify(Q1) <= F2
F1 <= Q2 iff F1 <= uglify(Q2).

The problem with that is that it is possible to find
numbers X Y Z such that
X <= Y
Y <= Z
*and* X > Z

Like I said, disastrous for sorting.

The right way to do it is

Q1 <= F2 iff Q1 <= make_exact(F2)
F1 <= Q2 iff make_exact(F1) <= Q2

This is not particularly easy, and it requires
some care. Arguably we want
(1 bsl 8000) < 1.0/0.0
so that
-infinity < all finite numbers < +infinity

Robert Virding

unread,
Sep 7, 2010, 4:09:01 AM9/7/10
to Richard O'Keefe, Nicholas Frechette, Ulf Wiger, erlang-q...@erlang.org
On 7 September 2010 08:27, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
>
> ...

>
> (One of the things I just *love* about Ada and Haskell
> is that they do not allow 'mixed mode arithmetic'.)

This was also something we got wrong. We tried to be helpful and
wanted to make it easy to use. To be fair though we never saw
arithmetic as an important part of the language and in the beginning
we didn't even have floats.

Robert

David Mercer

unread,
Sep 8, 2010, 9:44:10 AM9/8/10
to Robert Virding, Richard O'Keefe, Nicholas Frechette, Ulf Wiger, erlang-q...@erlang.org
On September 07, Robert Virding wrote:

> On 7 September 2010 08:27, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
> >
> > ...
> >
> > (One of the things I just *love* about Ada and Haskell
> > is that they do not allow 'mixed mode arithmetic'.)
>
> This was also something we got wrong. We tried to be helpful and
> wanted to make it easy to use. To be fair though we never saw
> arithmetic as an important part of the language and in the beginning
> we didn't even have floats.

I wonder if the equality operator should even be applicable to floats.
After all, float F1 really represents a number between F1 - epsilon and F2 +
epsilon.

Richard O'Keefe

unread,
Sep 8, 2010, 11:58:32 PM9/8/10
to David Mercer, Robert Virding, Nicholas Frechette, Ulf Wiger, erlang-q...@erlang.org

On Sep 9, 2010, at 1:44 AM, David Mercer wrote:
> I wonder if the equality operator should even be applicable to floats.
> After all, float F1 really represents a number between F1 - epsilon and F2 +
> epsilon.

This is a popular misconception. There is no fuzziness in the number that
a floating-point number represents. The fuzziness is in the *operations*.
IEEE defines the floating point operations in terms of applying a
dynamically selected rounding algorithm to the infinitely precise result
of a mathematical operation on unambiguous inputs.

In particular, note that floating point add, subtract, multiply, and
compare are *exact* for integers expressed as floating point numbers,
provided the results are not too big. Before 64-bit integers were
readily available, I've used IEEE doubles to get 53-bit integers.

In any case, we _can't_ drop term equality from Erlang, because it's an
essential part of pattern matching. Arithmetic equality, maybe.

Jesper Louis Andersen

unread,
Sep 9, 2010, 7:34:11 AM9/9/10
to Richard O'Keefe, David Mercer, Robert Virding, Nicholas Frechette, Ulf Wiger, erlang-q...@erlang.org
On Thu, Sep 9, 2010 at 5:58 AM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
>
> In any case, we _can't_ drop term equality from Erlang, because it's an
> essential part of pattern matching.  Arithmetic equality, maybe.
>

Fun aside relating to this point: Standard ML defines specific types
as eqtypes or types which can be compared for equality. The 'real'
type of ML is not an eqtype because of the above mentioned trouble:
comparing two reals (IEEE FPs) for equality is almost often wrong. But
as Richard writes, this has ramifications for term equality. A term is
an eqtype iff all its primitive types are eqtypes and all composite
types are eqtypes. That is, you can have terms which you cannot
compare for equality with the '=' operator. For composite values, this
does not rule out partial pattern matches on only some components
however.

The MosML implementation of SML get this wrong and allows reals to be
compared for equality. This also means that programs which compile in
MosML does not compile in other SML implementations. It has bitten me
more than once.


--
J.

Richard O'Keefe

unread,
Sep 9, 2010, 10:27:48 PM9/9/10
to Jesper Louis Andersen, David Mercer, Robert Virding, Nicholas Frechette, Ulf Wiger, erlang-q...@erlang.org

On Sep 9, 2010, at 11:34 PM, Jesper Louis Andersen wrote:
> Fun aside relating to this point: Standard ML defines specific types
> as eqtypes or types which can be compared for equality. The 'real'
> type of ML is not an eqtype because of the above mentioned trouble:
> comparing two reals (IEEE FPs) for equality is almost often wrong.

I see this claim a lot, but I wonder how true it is.
In *beginners'* code, yes, sure, but then, almost everything in
beginners' code is wrong at some point.

Attempts to deal with the problem by banning = don't really help,
because you can always define
fun eql x (y:float) = not (x < y orelse x > y);
after which eql 0.0 ~0.0 is not only defined but true.

If it comes to that, the MLton compiler complains every time you
use polymorphic equality at all.

> The MosML implementation of SML get this wrong and allows reals to be
> compared for equality. This also means that programs which compile in
> MosML does not compile in other SML implementations. It has bitten me
> more than once.

ocaml also does this.

Pascal Chapier

unread,
Sep 10, 2010, 2:05:23 PM9/10/10
to erlang-q...@erlang.org

I didn't know that float appears lately in Erlang.

It should be the reason why we have this weird type of number.

Integer is a mathematical group with a pretty good representation in Erlang where there is no size limit.
+, - and * are defined in that group, and the Erlang algorithms are consistent with the mathematical operations.

Float is nothing but a subset of the rational numbers. In this subset, there is no guarantee that the result of a
mathematical addition between 2 float is a float. IEEE gives the rules to round the math result to the closest float.

IEEE +, -, *, / are perfectly defined functions, but they have not the properties of their math counterpart:

1> Y = 1.7e308.
1.7e308
2> W = -Y.
-1.7e308
3> D = 9.975e291 .
9.975e291
4> Y =:= Y + D.
true %% Hummm !
5> (Y + D*1.1) - Y .
1.99584030953472e292
6> ((Y + D*1.1) - Y) > 2*D .
true %% i.e. D*1.1 > 2*D with D > 0.0 !
7> (D+Y)+W.
0.0 %% really ?
8> D+(Y+W).
9.975e291 %% very good, but different!
9>

In these examples, all variable are floats, this shows that one must be aware of the float representation when using them.
Mixing floats with integer, complex calculation, FPU internal representation and code implementation can even create a bigger mess.
In real life, this kind of problem is not likely to occur, because we generally use float in a small range, far from the maximum value,
and we are generally not interested in very very small values - except 0.0 (on my side I change unit if the values I am using are smaller
than 1/1000 or bigger than 10000).
Nevertheless, the smallest relative difference between two float is around 10e-16 (in 64bits), and this is sometime a problem.

I wrote a small 3D simulation program as a hobby. The float were very useful, but I had to write my own operators for <, > and ==,
based on a global value NEAR, which define a relative limit where float representation is not relevant, knowing my problem and the algorithms I use.

My conclusion is that it is a bad idea to compare floats, a worse idea to check their equality, and an error to mix integers and floats is such operations,
using only implicit conversion and standard IEEE operations. The same conclusion applies to pattern matching.

I think that it is a good idea to say that the target is to have separate float and integer types, and separate them in Erlang terms comparison.

The problem I see, is that it is not possible to detect at compilation a mix in comparison or operation, and this seems be necessary to have
an intermediate phase where numbers and numbers operations or comparisons have a deprecate status (i.e. operation without explicit conversion).

Is it acceptable to have warnings at run time, and is it feasible to have these warnings controlled by a flag, for backward compatibility?

I am not sure that the problem is enough critical to be worth the creation of new syntax operation or the dynamic check .

Pascal.

Reply all
Reply to author
Forward
0 new messages