In the end I went for using most of the dict module API unchanged, with
some new synonyms, and a number of additional useful functions. I also
made the dict module define a new 'dict' behaviour (even though it's
just an interface rather than a complete behaviour).
One particular detail is that gb_trees (with its user-unfriendly name
and rather different calling conventions) can now quite simply be used
through the dict module as an ordered variant of dict, and you can
pretend you never heard of the gb_trees module unless you want to use
one of its specially implementation-dependent functions. An ordered dict
can be created through dict:new([ordered_set]). This also resolved some
major problems with clashing function definitions.
The code (based on the OTP maint branch) can be found here:
https://github.com/richcarl/otp/tree/dict-api
The quickest way of having a look at the suggested API is to follow this
link and scroll down to see the behaviour callback:
https://github.com/richcarl/otp/blob/dict-api/lib/stdlib/src/dict.erl
Reactions, suggestions, and test pilots are most welcome. I'm not ready
to write an EEP until I feel the API is good enough. Things still on the
TODO-list are: updating the docs for ets and dets, figuring out a good
set of iterator functions, and maybe include QLC table/1/2 functions and
match/select functions in the API. And then the same kind of unified API
should be made for sets and probably also for sequences.
A Good Yule to y'all,
/Richard
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
You can see the dictionary signature here
https://github.com/erlware/erlware_commons/blob/master/src/ec_dictionary.erl
which contains what we came up with along those lines.
signatures are our attempt to provide, standard interfaces that are
friendly to tools like dialyzer and xref. The best part about them is
they don't require language changes.
T
I agree dict API is way to go, because it is so used. However I would be
against introdusing any synonymous, it may start confusing.
What I do not link in dict API is how update/4 function behaves.
It is called like update(Key, Fun, Initial, Dict) -> Dict.
Problem is that if Initial is some sort of complex and costly
to compute initial value, then it will be most of the time
wasted, because it will be not needed.
THis is why I often use function like this:
update_full(Key, Fun, FunInit, Dict) when is_function(Fun, 1), is_function(FunInit, 0) ->
case dict:is_key(Key, Dict) of
true ->
dict:update(Key, Fun, Dict);
false ->
Val = FunInit(),
dict:store(Key, Val, Dict)
end;
Of course integrating it directly into dict will make it even faster, by
not needing to traverse datastructure twice.
I'm not really sure if integrating abstract collections into existing
dict module is good idea. It should not break compatibility, but it will
for sure bring some performance hit (due pattern matching in the dict
module and delegation to other modules).
I think dict module should be leaves as is, and new module should be
introduced, like gen_dict. Sure in some sense, it is easier to just
find all dict:new() using simple grep, and change it to dict:new([...]),
where appropriate without worring about other call sites, but if for some
reasons one changes dict:new([dict]), to something else, some functions
may subtelly change how they work, so I think it should not messed too much.
And running sed s/\bdict:\b/gen_dict:/ or similar things, shouldn't be hard,
but will make sure developer understand it can be:
1) slightly slower, due additional level of indirection
2) have slightly different API
In fact I prepared much simpler (but ugly) wrapper years ago, and often
I use it when I know I needs to store and access randomly potentially
lots of elements, but want to test if it is actually faster than using
simple orddict, or proplist or gb_trees or own datastructures:
https://github.com/baryluk/common_collection % I have somewhere on disk even newer version, will need to search
No tests, no real documentation, no nothing, no simple way of adding
collections without editing file (howver it should be simple to fix). I
will for sure write it now in different way (I was using for some time
a parametrized modules as wrappers to dict and proplists also, but because
of unknown future of them Erlang - I personally like them very much! -
i stoped using them, despite being even simpler to use). But it works.
It looks that https://github.com/erlware/erlware_commons/tree/master/src
is more modular and easier to extend by other persons without changing
source code. It probably is faster. But it lacks many very useful
functions, as well is not separate project on its own (as it should be).
--
Witold Baryluk
JID: witold.baryluk // jabster.pl
The synonyms I was referring to are only the old names for certain
functions in the dict module. For backwards compatibility reasons, they
cannot simply be removed from dict.erl, but new names are needed for the
unified API to avoid collisions with functions in other modules (or, in
a few cases, to allow better names and more consistent argument order).
The suggested unified dict API does not in itself contain more than one
name for any function.
> What I do not link in dict API is how update/4 function behaves.
> It is called like update(Key, Fun, Initial, Dict) -> Dict.
> Problem is that if Initial is some sort of complex and costly
> to compute initial value, then it will be most of the time
> wasted, because it will be not needed.
This could be a good point. The function update/4 is part of the
existing dict module and cannot be changed, but my suggested new
function map/4 could be made to take an initialization function (Key) ->
term() instead of just a term to be stored.
> I'm not really sure if integrating abstract collections into existing
> dict module is good idea. It should not break compatibility, but it will
> for sure bring some performance hit (due pattern matching in the dict
> module and delegation to other modules).
And this is definitely not the intention. The only delegation that is
done is in order to save the user from knowing about the gb_trees
module. The dict module becomes more like ets, in that you can choose
between a hashed and an ordered dictionary, but you shouldn't have to
know anything about the implementation. Other kinds of dictionaries such
as ets and orddict only implement the same interface; you are not
supposed to be able to use the dict module to manipulate an orddict or
an ets table. This is not object oriented programming. If you want to do
that kind of wrapper where the data structure knows its implementation
and does its own dispatch, you can build that yourself on top of this API.
> I think dict module should be leaves as is, and new module should be
> introduced, like gen_dict. Sure in some sense, it is easier to just
> find all dict:new() using simple grep, and change it to dict:new([...]),
> where appropriate without worring about other call sites, but if for some
> reasons one changes dict:new([dict]), to something else, some functions
> may subtelly change how they work, so I think it should not messed too much.
You should not have to change anything in your code. Existing calls to
dict:new() work as they are, since the default is still to create a
plain old hashed dictionary. Only if you want to create an ordered
dictionary do you need to call dict:new([ordered_set]). You do not have
to specify a callback module as in your example dict:new([dict]).
I agree that if you want to make a more object oriented implementation
with method dispatch, another module name should be used for that. But
what I have suggested is just a unified API for the existing dictionary
data types.
/Richard
My 2c.
/s
On Dec 29, 11:35 am, Richard Carlsson <carlsson.rich...@gmail.com>
wrote:
> ...but you shouldn't have to know anything about the implementation.
And I basically agree, which is why I'm not aiming for an object
oriented approach. But I think that letting the dict module have an
ordered version, just like ets tables, is a good idea. You still get to
choose between O(n) time for an unordered table and O(log n) time for an
ordered table when you initialize the dict, and from a usability
perspective it means that for most cases where you want a functional
dictionary data type, the dict module is all you need to know.
I couldn't get a good unified dict API until I decided to remove the
gb_trees module from the equation by making it accessible via dict
instead, because its current API is too different from the existing
dict/orddict API and the name clashes got too difficult to resolve. I
could of course also invent a new name for gb_trees, but that seems
unnecessary.
/Richard
I think it is good to have at least one tree based version as well. Though I would much prefer to have it in a separate module (gbdict?) instead of baking it into the dict version. I prefer to be explicit rather then "hiding" behind one module. In most case you do really want to know what is going on. I have two tree versions that I have long been meaning to include in the distribution, rbdict based on Red-Black trees and aadict based on the same algorithms as gb_trees but implemented in a different way. The more the merrier.
But as I said I strongly feel it is much better to have separate modules. It also makes it easier to add versions, which as you can probably understand I think there should be.
I have not yet looked through the suggestion for extending the API, but I will shortly.
Robert
There seems to be a lot of confusion going on here between *interface*
(API) and *dispatch* (redirecting the API calls to the actual
implementation depending on the data structure). I have only suggested a
unified API; not a generic dispatch mechanism.
"Tuple modules" (please call them Parameterized modules or Abstract
modules - the implementation-as-tuples is a temporary hack, and you
should treat them as opaque objects just like funs) are not needed,
because there is nothing here to parameterize. If all dict-like modules
implement the same API, it's enough to use the module's name. For example:
MyDict = orddict,
...
D = MyDict:new(),
...
What you *could* do is to parameterize the module that uses the dictionary:
-module(my_module, [SomeDict]).
...
foo() ->
D = SomeDict:new(),
...
and then you could easily change implementation as you like:
M1 = my_module:new(orddict),
M1:foo(),
M2 = my_module:new(ets),
M2:foo(),
...
That said, I think Clojure's protocols (which are a way of dynamically
getting the same effect that you get with Haskell's type classes) are a
neat idea that would probably work well with Erlang. But that's a
different story.
There seems to be a lot of confusion going on here between *interface* (API) and *dispatch* (redirecting the API calls to the actual implementation depending on the data structure). I have only suggested a unified API; not a generic dispatch mechanism.
"Tuple modules" (please call them Parameterized modules or Abstract modules - the implementation-as-tuples is a temporary hack, and you should treat them as opaque objects just like funs).
That said, I think Clojure's protocols (which are a way of dynamically getting the same effect that you get with Haskell's type classes) are a neat idea that would probably work well with Erlang. But that's a different story.
It is of course possible to *also* give gb_trees a new interface (which
requires a new module name) although I didn't want to do that as part of
this exercise. Still, I think that it's a good thing to just make the
dict module itself offer both a hashed form and an ordered (tree based)
form, because it makes Erlang more approachable. These things are not
mutually exclusive.
The actual implementation used for the hash and tree forms could change
between Erlang releases depending on what is deemed to be the most
efficient; after all, the current hash table implementation is
explicitly said to be opaque and might be replaced with some more
efficient variant one day.
But for the basic Erlang user, I think it ought to be enough to know
that the dict module covers most of his needs, both for ordered and
unordered dictionaries. If he later wants to optimize his code, he can
then familiarize himself with the other N dict-like modules in the
distribution and pick one that might be better for his use case. If they
all speak the same API, swapping them should then be simple.
Yes, the most important thing is definitely that the various dict modules all have the same API. My thought with having separate modules for each implementation would be that it would be easier to add new and different implementations. You could have versions tuned to different types of usage profiles. Even if you add the ordered tree option to 'dict' I think there should still be a version of it in a separate module, like gbdict. For backwards compatibility.
Robert
There seems to be a lot of confusion going on here between *interface* (API) and *dispatch* (redirecting the API calls to the actual implementation depending on the data structure). I have only suggested a unified API; not a generic dispatch mechanism.
"Tuple modules" (please call them Parameterized modules or Abstract modules - the implementation-as-tuples is a temporary hack, and you should treat them as opaque objects just like funs) are not needed, because there is nothing here to parameterize. If all dict-like modules implement the same API, it's enough to use the module's name. For example:
MyDict = orddict,
...
D = MyDict:new(),
...
What you *could* do is to parameterize the module that uses the dictionary:
-module(my_module, [SomeDict]).
...
foo() ->
D = SomeDict:new(),
...
and then you could easily change implementation as you like:
M1 = my_module:new(orddict),
M1:foo(),
M2 = my_module:new(ets),
M2:foo(),
...
That said, I think Clojure's protocols (which are a way of dynamically getting the same effect that you get with Haskell's type classes) are a neat idea that would probably work well with Erlang. But that's a different story.