Anyone using the sdb library?

Chas Emerick

Feb 24, 2011, 3:36:33 PM
to Clojure
Is anyone using the sdb (AWS SimpleDB) client library, originally written by Rich Hickey in 2009, and then tweaked in various ways by a couple of others since?

GitHub repo network here: https://github.com/richhickey/sdb/network

I ask because I have some ideas for some changes and enhancements to the library, some of which would be breaking (potentially from both an API and data format standpoint). It seems like having a dialogue with anyone who is actively using it would be productive, if only to ensure that I'm not headed towards the weeds. Beyond that, a collective attempt to coordinate the direction of the project would be good (rather than simply letting off-by-one forks of it proliferate on GitHub).

Cheers,

- Chas

.Bill Smith

Feb 24, 2011, 5:58:00 PM
to clo...@googlegroups.com
I don't know, but if you introduce breaking changes, you'd better name it version 2.0.

Mark Rathwell

Feb 24, 2011, 6:05:22 PM
to clo...@googlegroups.com

I used it as a starting point for an sdb lib a while back, but moved that project to GAE. One note: it uses an outdated version of the AWS Java libraries, so you should probably update that if you're in there.


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Chas Emerick

Feb 24, 2011, 6:23:02 PM
to clo...@googlegroups.com

On Feb 24, 2011, at 5:58 PM, .Bill Smith wrote:

> I don't know, but if you introduce breaking changes, you'd better name it version 2.0.

;-)

It'd actually be nice to establish a stable groupId and artifactId for the project (rather than the constantly shifting `org.clojars.github-username-here`), which, in conjunction with breaking changes, would warrant dropping back to v0.x.y until those changes have proven themselves.

- Chas

Chas Emerick

Feb 26, 2011, 3:38:13 PM
to Clojure
FYI and FWIW, I've jotted down some notes detailing how I'd like to
see the library change:

https://docs.google.com/document/d/1K5p2RRVtvYxBNLEJuWGNf1iZak2ri8cI73joWu9K1W0/edit?hl=en&authkey=CMDR_6AF

If you have feedback, questions, thoughts, whatever, do let me know.

Thanks,

- Chas

Gijs S.

Feb 27, 2011, 5:59:35 AM
to Clojure
Hi,

I have dabbled a bit with both AWS SimpleDB and the sdb library, as
well as with the similar Google App Engine Datastore and related
Clojure libraries.

These two parts are indeed no-brainers:

- Support for consistent reads
- Support for literal string queries

Re: numeric formatting

I am not too familiar with the details of numeric types in SimpleDB
and Clojure 1.3. Congruence between the two through the use of longs and
integers sounds good.
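
To illustrate why numeric formatting matters: SimpleDB stores every value as a string and compares values lexicographically, so numbers need a fixed-width encoding for range queries and sorting to behave. A minimal sketch, assuming hypothetical `encode-long`/`decode-long` functions (not part of the sdb library): each long is shifted by 2^63 so it becomes non-negative, then zero-padded to 20 digits.

```clojure
;; Hypothetical encoders, not part of the sdb library.
;; Shifting by 2^63 maps Long/MIN_VALUE..Long/MAX_VALUE onto 0..2^64-1;
;; zero-padding to 20 digits (the width of 2^64-1) makes lexicographic
;; string order match numeric order.
(def long-offset (.shiftLeft BigInteger/ONE 63))

(defn encode-long
  "Encodes a long as a fixed-width, lexicographically sortable string."
  [n]
  (format "%020d" (.add (biginteger n) long-offset)))

(defn decode-long
  "Inverse of encode-long."
  [^String s]
  (.longValue (.subtract (BigInteger. s) long-offset)))
```

With this scheme, sorting the encoded strings sorts the underlying longs; the trade-off is that stored values are no longer human-readable.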

Re: type indicators in attribute values

Indeed, this causes some trouble when using literal string queries or
different tools and languages. The suggested move to storing type
indicators in attribute names instead of values would improve the
usability of literal string queries and the "LIKE" construct. Below is
another suggestion.

Re: suggestion to be able to customize how the library should
serialize and deserialize values

I have used two Clojure libraries with the Google App Engine Datastore
that provide such functionality. Both libraries need an explicit
definition of an entity. They also allow the specification of a
serialization and deserialization function per attribute, for instance
with appengine-clj:

(ds/defentity Link ()
  ((url :key identity)
   (title)
   (points)
   (date :serialize joda->javadate
         :deserialize java->jodadate)))

The two libraries are:

appengine-clj [https://github.com/r0man/appengine-clj/]
clj-gae-datastore [https://github.com/smartrevolution/clj-gae-datastore]

(Note that the often-mentioned appengine-magic has no support for
defining how values should be serialized and deserialized in the
datastore.)

Talking in terms of entities is perhaps too much structure for
SimpleDB. I would suggest a way to define serialization and
deserialization per attribute per domain in code. This would mean that
the type of an attribute is not encoded in its name or value in
storage. get-attr and put-attr take the domain as an argument, which
can be used to look up the proper serialization and deserialization
function per attribute.

This approach would do away with storing type information in
SimpleDB. It is also possible to provide this functionality on top of
the existing sdb library or the library with the suggested
improvements.

Finally, a (get-attr "domain" "some-id") call returns a
{:sdb/id "some-id"} map when there is no item with that id in the
domain. This means that a get-attr result can always be used as a base
case to merge updates into, but I have also found cases where I would
have preferred a nil result when the item doesn't exist. I would prefer
the nil-returning behavior, as it works nicely with if-let constructs.
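
A minimal sketch of the nil-returning behavior as a wrapper over the current get-attr result, assuming (as described above) that a missing item comes back as a map containing only `:sdb/id`. `item-or-nil` and `describe` are hypothetical names, not library functions.

```clojure
;; Hypothetical wrapper; get-attr itself is not redefined here.
(defn item-or-nil
  "Returns item, or nil when item holds nothing but its :sdb/id key
  (the shape get-attr returns for an item that doesn't exist)."
  [item]
  (when (seq (dissoc item :sdb/id))
    item))

;; Example: branching on existence with if-let.
(defn describe [item]
  (if-let [found (item-or-nil item)]
    (str "found " (:sdb/id found))
    "no such item"))
```

In practice the argument would be the result of a call like (get-attr "domain" "some-id").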

-Gijs

Chas Emerick

Feb 27, 2011, 8:04:58 AM
to clo...@googlegroups.com
Thanks very much for the feedback!


On Feb 27, 2011, at 5:59 AM, Gijs S. wrote:

> To talk in terms of entities is perhaps too much structure for the
> simpledb.

I would agree -- although if the data one is storing in SDB is regular enough, you should be able to build ORM-esque functionality on top of the approach I'm proposing.

> I would suggest a way to define serialization and
> deserialization per attribute per domain in code. This would mean that
> there is no encoding of the type of an attribute in its name or value
> in the storage. get-attr and put-attr take the domain as an argument,
> which can be used to lookup the proper serialization and
> deserialization function per attribute.
>
> This approach would do away with storing type information into
> simpledb. It is also possible to provide this functionality on top of
> the existing simpledb library or the library with the suggested
> improvements.

If I'm understanding you properly, you could build this on top of the :encode/:decode functions in the configuration object as well.  I've added this example to the design notes:

Or, if you wanted to avoid storing any type indicators in SDB, you could explicitly map formats to attribute names (which would then be used on a per-domain basis):

(def config {:sdb-client (AmazonSimpleDBClient. …)
            :encode (fn [[k v]]
                      [(str k)
                       (case k
                         (:mls :asking-price) (encode-integer v)
                         (:address :agent-name) (str v)
                         :listing-date (encode-date v)
                         (str v))])
            :decode …})

If the above is common usage, a helper function that takes a simple map of attribute names -> value types and returns a corresponding configuration object would be nice.
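
A rough sketch of such a helper, for illustration only: `config-from-types` and `type-codecs` are hypothetical names, the `:decode` fn is assumed to receive `[attribute-name string]` pairs, and attribute names are written with `name` rather than `str` so they carry no leading colon. `:integer` uses plain `str` for brevity, so it does not sort numerically.

```clojure
;; Hypothetical helper, not an existing sdb or Rummage function.
;; Each type keyword maps to an encode/decode pair; SDB values are strings.
(def type-codecs
  {:string  {:encode str :decode identity}
   :integer {:encode str :decode #(Long/parseLong %)}})

(defn config-from-types
  "Builds a configuration map whose :encode/:decode fns look up each
  attribute's type in attr-types, defaulting to :string."
  [client attr-types]
  (let [codec-for (fn [k] (type-codecs (get attr-types k :string)))]
    {:sdb-client client
     ;; [keyword value] -> [attribute-name string]
     :encode (fn [[k v]]
               [(name k) ((:encode (codec-for k)) v)])
     ;; [attribute-name string] -> [keyword value]
     :decode (fn [[attr-name s]]
               (let [k (keyword attr-name)]
                 [k ((:decode (codec-for k)) s)]))}))
```

For example, (config-from-types client {:mls :integer, :asking-price :integer}), where client would be an AmazonSimpleDBClient; attributes not listed fall back to :string.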

Is that what you were describing?

- Chas

Gijs S.

Feb 27, 2011, 9:15:30 AM
to Clojure
Yes, my suggestion without type indicators in SDB would look something
like this:

(def config {:sdb-client (AmazonSimpleDBClient. ...)
             :mapping {"Link" {"url" {:encode encode-string :decode decode-string}
                               "points" sdb/integer-encoding
                               "date" {:encode encode-jodadate :decode decode-jodadate}}
                       "AnotherDomain" {...}}})

sdb/integer-encoding would be a library-provided encoding/decoding
map.

In this approach each encoding/decoding pair is defined at the level
of an attribute in a domain.

A (get-attr config "Link" "id") call would go through the mapping in the
config to create a Clojure data structure with each attr properly
decoded.

A mapping map would allow for a bit more composition than one big
encode fn in the config, where a case construct needs to be extended
for each attribute. For instance, the mapping could be constructed like
this: {:mapping (merge link-domain-map another-domain-map ...)}.
Perhaps a library could provide a macro that expands (defdomain Link
[url :string, points :int, date {:encode .. :decode ..}]) into such a
mapping map.
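
A minimal sketch of the suggested defdomain macro, assuming a hypothetical `codecs-by-type` table; it is not an existing library macro. Each spec is either a type keyword or an explicit {:encode f :decode g} map, and the result is a one-domain mapping map that composes with merge.

```clojure
;; Hypothetical codec table for type keywords.
(def codecs-by-type
  {:string {:encode str :decode identity}
   :int    {:encode str :decode #(Long/parseLong %)}})

(defmacro defdomain
  "Defines a var holding {\"DomainName\" {\"attr\" codec}}. Each spec is
  either a type keyword looked up in codecs-by-type or an explicit
  {:encode f :decode g} map."
  [domain-sym specs]
  (let [mapping (into {}
                      (for [[attr spec] (partition 2 specs)]
                        [(name attr)
                         (if (keyword? spec)
                           `(codecs-by-type ~spec)
                           spec)]))]
    `(def ~domain-sym {~(name domain-sym) ~mapping})))

;; Commas are whitespace in Clojure, so this matches the suggested syntax.
(defdomain Link [url :string, points :int])
(defdomain AnotherDomain [title :string])

;; Per-domain maps compose with merge.
(def mapping (merge Link AnotherDomain))
```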

Because this approach doesn't have any type indicators in SDB, every
attribute needs to have an encoding/decoding defined, or there needs to
be a default encoding/decoding, which for Clojure could be
pr-str/read-string.
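
A sketch of how per-domain lookup with a default codec might work, assuming hypothetical `encode-item`/`decode-item` functions and a mapping shaped like the :mapping map above. For the default, clojure.edn/read-string (Clojure 1.5+) is swapped in for clojure.core/read-string, since the latter can evaluate code embedded in the data when *read-eval* is true.

```clojure
(require '[clojure.edn :as edn])

;; Fallback for attributes without an explicit mapping entry.
(def default-codec {:encode pr-str :decode edn/read-string})

(defn- codec
  "Looks up the codec for attr in domain, falling back to default-codec."
  [mapping domain attr]
  (get-in mapping [domain attr] default-codec))

(defn encode-item
  "Turns {:attr value} into {\"attr\" string} using the domain's mapping."
  [mapping domain item]
  (into {} (for [[k v] item]
             [(name k) ((:encode (codec mapping domain (name k))) v)])))

(defn decode-item
  "Turns {\"attr\" string} back into {:attr value}."
  [mapping domain raw]
  (into {} (for [[attr s] raw]
             [(keyword attr) ((:decode (codec mapping domain attr)) s)])))
```

Attributes with an explicit entry use it; everything else round-trips through its printed representation.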

-Gijs

Mark Rathwell

Feb 27, 2011, 1:06:27 PM
to clo...@googlegroups.com

If you have time, I posted a gist containing a data access library I built on top of Rich's sdb library (data.clj), and the modifications I made to his sdb library (sdb.clj) for consistent reads, etc. This is some of the first real Clojure code I wrote, so it's not the prettiest, but maybe you can see some of the pain points I had using the current sdb library.

To summarize:

 - Whether or not to add asynchronous client support
 - Not a nice generic way of building up select dsl maps
 - I believe the select dsl 'where' clause only handles up to two predicates
 - How to account for nil / blank values
 - Automatically sync domains with a specified list

Certainly some of this belongs above the level of the sdb library, but some of it should be handled there.

 - Mark


Mark Rathwell

Feb 27, 2011, 1:08:42 PM
to clo...@googlegroups.com

Chas Emerick

Feb 28, 2011, 3:58:14 PM
to clo...@googlegroups.com
Mark,

Thanks for the input. Some comments inline:

On Feb 27, 2011, at 1:06 PM, Mark Rathwell wrote:

> If you have time, I posted a gist containing a data access library I built on top of Rich's sdb library (data.clj), and the modifications I made to his sdb library (sdb.clj) for consistent reads, etc. This is some of the first real clojure code I wrote, so not the prettiest, but maybe you can see some of the pain points I had using the current sdb library.
>
> To summarize:
>
> - Whether or not to add asynchronous client support

Using the async client within Clojure doesn't make sense IMO. We can make any call asynchronous (at least, as the AWS library defines 'asynchronous') by wrapping it in a future.
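
For illustration, wrapping a blocking call in a future looks like this; `slow-select` is a hypothetical stand-in for any synchronous sdb call, and `call-async` is a hypothetical helper name.

```clojure
;; Hypothetical stand-in for a blocking SDB call.
(defn slow-select [query]
  (Thread/sleep 100)
  [{:sdb/id "1" :query query}])

(defn call-async
  "Runs (apply f args) on another thread and returns a future."
  [f & args]
  (future (apply f args)))

;; Start the call, do other work, then block for the result.
;; deref with a timeout returns :timeout if the call takes too long.
(comment
  (def pending (call-async slow-select "select * from Link"))
  (deref pending 5000 :timeout))
```

A plain @pending also works; it simply blocks until the result is ready.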

> - Not a nice generic way of building up select dsl maps
> - I believe the select dsl 'where' clause only handles up to two predicates

I'll attempt to verify and potentially resolve that when I dig into the implementation work.

> - How to account for nil / blank values

That's a tough one. As you might have seen, I'm strongly leaning towards eliminating the type tags in formatted values, which would make representing nil pretty difficult.

Is this really a desired feature to begin with? As it stands, distinguishing between a nil value and no entry for a key requires using `find` -- in my experience, that has made it very uncommon for nil values to be used at all.
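
A quick REPL illustration of the get/find distinction, using a plain map (the attribute names are just examples):

```clojure
(def item {:address "1 Main St", :agent-name nil})

(get item :agent-name)         ;=> nil
(get item :asking-price)       ;=> nil (indistinguishable from the line above)
(find item :agent-name)        ;=> [:agent-name nil]
(find item :asking-price)      ;=> nil
(contains? item :agent-name)   ;=> true
(contains? item :asking-price) ;=> false
```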

> - Automatically sync domains with a specified list

Definitely next layer up. Really, it's just (dorun (map (partial sdb/create-domain client) […coll of domain names…])).
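
The same idea written with doseq, which is the usual idiom when only side effects matter. `create-domains!` is a hypothetical name, and create-domain is passed in as an argument (an assumption about its (create-domain client name) shape, taken from the one-liner above) so the sketch runs without AWS credentials.

```clojure
(defn create-domains!
  "Calls (create-domain client d) for each domain name d, purely for
  side effects. create-domain stands in for the library's function."
  [create-domain client domain-names]
  (doseq [d domain-names]
    (create-domain client d)))
```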

> Certainly some of this belongs above the level of the sdb library, but some of it should be handled there.

Thanks for the example. Talking about "entities" is a little disorienting for me, at least insofar as maps coming out of SDB *are* Clojure entities. It'd be interesting to see what would reasonably be required to make persisting records to SDB more satisfying than record in, map out. Presumably, there are people that are working on this problem for use with more traditional databases; I'd hope that those results could be adapted to use SDB as just another data source.

- Chas

Mark Rathwell

Mar 1, 2011, 9:58:25 AM
to clo...@googlegroups.com


>  - How to account for nil / blank values

> That's a tough one.  As you might have seen, I'm strongly leaning towards eliminating the type tags in formatted values, which would make representing nil pretty difficult.
>
> Is this really a desired feature to begin with?  As it stands, distinguishing between a nil value and no entry for a key requires using `find` -- in my experience, that has made it very uncommon for nil values to be used at all.


Probably not a desired feature. There are times when I need to distinguish between nil and no entry (though I'm having trouble coming up with examples), but there are certainly better ways to do that.

Thanks, looking forward to the results.

Chas Emerick

Mar 10, 2011, 1:26:27 PM
to clo...@googlegroups.com
I've now pushed v0.0.1 of Rummage to github and maven central. This is my massive refactoring / rewrite of Rich's original SDB client implementation:

https://github.com/cemerick/rummage

It addresses all of the issues I noted in the Google doc linked earlier in this thread, and hopefully pushes things forward somewhat w.r.t. data encoding issues and such.

Feedback and suggestions most welcome.

Thanks,

- Chas
