Custom vectors/maps and sequence functions

156 views
Skip to first unread message

plamen...@gmail.com

unread,
Jan 15, 2019, 8:58:07 AM1/15/19
to Clojure
Hello all

while working on a daily basis where I use Clojure's native vectors/maps I almost never experience the problem and even if - it is easy fixable with something like (into [] ...), I have the following problem with custom data structures and ask here for clarification if my understanding is correct.

Imagine I try on the one side to represent something like a database table in memory, while on the other to make it pluggable into all meaningful sequence and vector/map functions in Clojure. In the most naive implementation a table is a vector of maps. If i would like to add more functionality, while trying to remain transparent to Clojure functions I could implement a custom Record type (IPersistentMap etc.) for the rows (for a custom storage, for column ordering, type checking etc.) and this works. I could also implement a custom Table (IPersistentVector etc.) for the actual collection of records while maintaining internally some indexes, the schema of the table etc. The point is that if possible to apply let say (filter.. or (take 2 ..., (sort .. incl. define/apply transducers. or whatever other Clojure function compatible with these interfaces on the Table/IPersistentVector/IPersistentCollection and get back a Table.

What I know:
1. Clojure's doc actually says that sequence functions return sequences and say nothing about preserving the original type, the implementations of the sequence functions do actually as advertised and return usually Cons-es.
2. monads, yes, but the point is not to force the user to use custom functions instead of the built in ones.
3. IPersistentCollection has first/next/more methods which could return a Table instead of a Record or a collection of them.
4. IPersistentVector has a cons method, but it is used by Clojure's conj for example, but not inside the implementations of filter/take etc. which create actual Cons objects
5. I could attach table like descriptions to each Record object (be it in its metadata or else), but then enforcing that all Records share the same Table data could get penalizing at runtime.
6. I know how Incanter and core.matrix try to solve some similar problems.
7. there are some vector functions which return actual vectors back (like filterv, mapv), but also not the original collection type.


So - do I miss something either in my knowledge or in the Clojure's documentation/implementation and is there a meaningful way to apply Clojure's and not mine filter/map/take/drop/sort etc. functions on a Table and to get a Table back, without going the monads/own functions for everything route or should I take it for granted that I can implement some custom types, but no way to get them be fully transparent to Clojure?

With best regards
Plamen

plamen...@gmail.com

unread,
Jan 15, 2019, 9:01:28 AM1/15/19
to Clojure
P.S. This is a question about clarification/advice, not a critique to Clojure, as it actually works as advertised :)

Alex Miller

unread,
Jan 15, 2019, 9:19:05 AM1/15/19
to Clojure
Well first, there is a conscious split between collection and sequence functions. Collection functions take a collection and return the same collection (and take the collection first) - things like conj, assoc, dissoc, disj, etc. Those functions are all trait-based and "update" operations are performed on the collection (via those trait interfaces) so the collection has the opportunity to return a new variant of its own type.

For sequences, these take and return sequences (really seqables), which is a logical abstraction. Sequence functions inherently make new sequences and you don't have an opportunity to influence how that happens (they mostly build new lazy seqs from scratch). However, this is exactly the thing that transducer operations do - they let you separate the operation from what happens with the results. So using something `into`, with a transducer chain, from your source collection to your target collection is again, only using collection ops (conj) so you're back in the collection world.

Hopefully that confirms/answers what you're thinking through.

James Reeves

unread,
Jan 15, 2019, 9:47:13 AM1/15/19
to clo...@googlegroups.com
On Tue, 15 Jan 2019 at 13:58, <plamen...@gmail.com> wrote:
So - do I miss something either in my knowledge or in the Clojure's documentation/implementation and is there a meaningful way to apply Clojure's and not mine filter/map/take/drop/sort etc. functions on a Table and to get a Table back, without going the monads/own functions for everything route or should I take it for granted that I can implement some custom types, but no way to get them be fully transparent to Clojure?

You could write:

    (into (empty coll) xform coll)

Assuming coll is some implementation of IPersistentCollection and where xform is your transducer.
 
So for example:

    (into (empty table) (comp (filter process?) (map process)) table)

--
James Reeves

plamen...@gmail.com

unread,
Jan 15, 2019, 10:22:33 AM1/15/19
to Clojure
Hello Alex


On Tuesday, January 15, 2019 at 3:19:05 PM UTC+1, Alex Miller wrote:
Well first, there is a conscious split between collection and sequence functions. Collection functions take a collection and return the same collection (and take the collection first) - things like conj, assoc, dissoc, disj, etc. Those functions are all trait-based and "update" operations are performed on the collection (via those trait interfaces) so the collection has the opportunity to return a new variant of its own type.

Yes


For sequences, these take and return sequences (really seqables), which is a logical abstraction. Sequence functions inherently make new sequences and you don't have an opportunity to influence how that happens (they mostly build new lazy seqs from scratch). However, this is exactly the thing that transducer operations do - they let you separate the operation from what happens with the results. So using something `into`, with a transducer chain, from your source collection to your target collection is again, only using collection ops (conj) so you're back in the collection world.

Yes, this confirms my understanding and that there is not only no shortcut, but that it would be wrong to actually look for a shortcut given the way the abstractions are defined.
 

Hopefully that confirms/answers what you're thinking through.

Lot of thanks for taking time to reply!

With best regards
Plamen

plamen...@gmail.com

unread,
Jan 15, 2019, 10:28:12 AM1/15/19
to Clojure

Hello James

Yes. The (into (empty table) (comp (filter process?) (map process)) table) was exactly what I wanted for an end user to avoid to have to write, but just a (filter ... table), otherwise I could provide a wrapper like (defn myfilter [pred table] (into (empty table) (comp (filter process?) (map process)) table)), which I also wanted to avoid, but you and Alex are both right.

Thank you for taking time.

With best regards
Plamen

greybird

unread,
Jan 15, 2019, 12:43:09 PM1/15/19
to Clojure
Hi Plamen, I don't have any advice to offer but I'm curious why you want to bind the table and column type info directly onto the result set. If you associate them in some other way, then you can just use plain maps and vectors. Are you trying to have less total objects in your API?

plamen...@gmail.com

unread,
Jan 15, 2019, 1:22:22 PM1/15/19
to Clojure
Hello Mark

the reason was that I want them to act as if they are maps and vectors for an end user developer and let him work with all the usual Clojure functions for these data structures, while internally they would have some different implementation and additional functionality (for unrelated to Clojure itself reasons). So, for me it is less about having less API objects, it was much more about transparency (e.g. HashMap and SortedMap have essentially the same interface and one just uses assoc/dissoc independent of the implementation except for the actual constructor). In my case this works well for the "Records" as they are the elements of the "Table" collection. But for the sequence functions operating on a "Table" - as Alex mentioned, they return own sequences of elements ("Records") actually and this with reason in Clojure's line of thought. And as both mentioned, this will need restitching at the end back into a "Table". I just hoped that I miss something and there would be a transparent way (e.g. through the Clojure collection/sequence interfaces) to do this restitching "(into (table) ..." without rerouting the user to custom wrappers or let him do it on his own.

This will actually work painlessly, except that the user needs to use then the wrapper functions, where the mapping between a records and tables map more or less directly to maps and vectors. But if one would like to implement something different and more involved (e.g. column/sparse/off-heap store for the table or some indexing which is meaningful not for a single "Record", but for the "Table" as a whole) the a "Record" would be either only a view over specific indexes in the column arrays (in the naive case) or the "Records" would be some derived objects for use only by the end user, but physically detached from the actual storage - and in these more convoluted examples (or need) the transparency will get harder. So - it seems to me that to support both - element, collection and sequence operations, I'll have to go so or so through some own wrappers, which are probably less flexible than the built in functions, but keep the whole thing consistent together (and at the end will fuse the collection/sequence operations to maintain the "Table" abstraction).

Regards
Plamen

greybird

unread,
Jan 15, 2019, 1:41:52 PM1/15/19
to Clojure
Thanks for explaining Plamen. Yes, it seems very difficult to treat a database abstraction as a regular Clojure map/vector. FWIW, in writing an app I think it works well to use SQL queries to return ordinary maps/vectors which can then be manipulated as usual. But I think you are doing something more ambitious. The interface to a database is something that is very tempting to try to improve, and I have tried several times without accomplishing very much in the end. :-)

plamen...@gmail.com

unread,
Jan 15, 2019, 1:49:44 PM1/15/19
to Clojure
P.S. Actually it just a coincidence that in my use case I am not concerned with let say trees, but with "Records" in a "Table", which in contrast map somehow nicely to IPersistentMap (where I can put a lot of additional functionality in a transparent way) and a "Table" (where the sequence functions built in sequence functions actually desintegrate the "Table", which is undesired for me) which _could_ map to a "IPersistentVector", which makes me desire to have them as transparent to the rest of Clojure as possible and let them act as if the user would actually use vectors and maps and invoke additional functionality only when needed, while all the internal accounting/maintenance of the table of records is hidden from the user. I wanted that it is more like a vector of maps and not a Clojure sequence of maps, because the vector abstraction has some useful (not for databases in general, but but my specific use case) properties and because it is already extensible through the Clojure's collection protocols. On the other side (outside of Clojure) I could aim to maintain the other abstraction of lazy/chunked sequences which are produced by Clojure's sequence functions, but then again have to restitch them to "Table"-s and there is no protocol for it. On the third side one could think of a Table which always has at least some primary key - then one would rather use IPersistentMap for the Table itself, but this again doesn't provide a transparent solution for the sequence functions. And so on and so on. I just miss the glue which in all these cases, which would allow me to give to the end user a fully maintained and consistent "Table" without forcing him into wrapper functions or do-it-yourselfs instead of using the built ins - I wanted that for the end user only an implementation is different, but the whole interface remains pure Clojure :)

Herwig Hochleitner

unread,
Jan 16, 2019, 6:52:35 PM1/16/19
to clo...@googlegroups.com
Am Di., 15. Jan. 2019 um 14:58 Uhr schrieb <plamen...@gmail.com>:

Imagine I try on the one side to represent something like a database table in memory, while on the other to make it pluggable into all meaningful sequence and vector/map functions in Clojure. In the most naive implementation a table is a vector of maps. If i would like to add more functionality, while trying to remain transparent to Clojure functions I could implement a custom Record type (IPersistentMap etc.) for the rows (for a custom storage, for column ordering, type checking etc.) and this works. I could also implement a custom Table (IPersistentVector etc.) for the actual collection of records while maintaining internally some indexes, the schema of the table etc.

IPersistentCollection and its derivatives are great to have on a custom data type, as applicable.

To be a good citicen in modern clojure, you should also implement IReduceInit / IReduce. This allows your collection to drive a transducer stack, in order to evaluate whole-collection transformations at top-speed. 
All transducer machinery is based on IReduceInit, so as soon as you implement that, you enable efficient usage of reduce, transduce, into, eduction, ... on your collection.

Also consider parameterizing your collection with a transducer stack, as a basis for composable, postponed-eager transformations:

(table-transform #db{:trafo xf-db :db-data ...} xf-a)
=> #db{:trafo (comp xf-db xf-a) :db-data ...}

The point is that if possible to apply let say (filter.. or (take 2 ..., (sort .. incl. define/apply transducers. or whatever other Clojure function compatible with these interfaces on the Table/IPersistentVector/IPersistentCollection and get back a Table.

Implementing IEmptiableCollection allows the standard idiom:

(into (empty db) xform db)

If you have a more efficient way to transform your collection, e.g. by composing transducers, there is no direct Clojure interface for that, sorry.
Though, you can implement something congruent to #(into (empty %1) %2 %1) and call it table-transform (traverse?).

What I know:
1. Clojure's doc actually says that sequence functions return sequences and say nothing about preserving the original type, the implementations of the sequence functions do actually as advertised and return usually Cons-es.

This is on purpose. As soon as you've called seq on something, you're not supposed to recover its original type and can only use more sequence functions on it.
This allows for any number of intermediate sequence representations, most notably lazy-seq, but also chunked-seq, ...
 
2. monads, yes, but the point is not to force the user to use custom functions instead of the built in ones.

In short, Clojure being a dynamic language and hence not having type-guided expression generation (the return/pure problem), I think the best approach for implementing monads is to hardcode the continuation monad and embed other monads into that. See http://blog.sigfpe.com/2008/12/mother-of-all-monads.html#c6884748521935561127
That's what transducers essentially do, IMO.

5. I could attach table like descriptions to each Record object (be it in its metadata or else), but then enforcing that all Records share the same Table data could get penalizing at runtime.

That's the same trade-off as whether to implement substring by copying or pointing to the original string. I don't think there is a universal answer to that. Depends on your use case.

So - do I miss something either in my knowledge or in the Clojure's documentation/implementation and is there a meaningful way to apply Clojure's and not mine filter/map/take/drop/sort etc. functions on a Table and to get a Table back, without going the monads/own functions for everything route or should I take it for granted that I can implement some custom types, but no way to get them be fully transparent to Clojure?

As detailed above, clojure's seq abstraction is intentionally opaque, so own functions + implementing seq, empty, conj, being reducible and accepting transducers is as transparent as it gets.

For optimizing single-thread performance, implement IEditableCollection

For utilizing multiple cores, implement clojure.core.reducers/CollFold

best regards

Herwig Hochleitner

unread,
Jan 16, 2019, 6:57:54 PM1/16/19
to clo...@googlegroups.com
Am Do., 17. Jan. 2019 um 00:52 Uhr schrieb Herwig Hochleitner <hhochl...@gmail.com>:
5. I could attach table like descriptions to each Record object (be it in its metadata or else), but then enforcing that all Records share the same Table data could get penalizing at runtime.

That's the same trade-off as whether to implement substring by copying or pointing to the original string. I don't think there is a universal answer to that. Depends on your use case.

I misread your point, sorry. Please disregard this comment.
Reply all
Reply to author
Forward
0 new messages