ANN: DataScript, in-memory database and datalog queries in ClojureScript


Nikita Prokopov

Apr 25, 2014, 2:18:06 AM
to clojur...@googlegroups.com
Hi!

I’m glad to announce my new library, DataScript.

It’s an open-source, from-scratch implementation of an in-memory immutable database aimed at ClojureScript, with an API and data model designed after Datomic. Full-featured Datalog queries are included.

Library is here: https://github.com/tonsky/datascript

Also check out this blog post about why you may need a database in a browser: http://tonsky.me/blog/decomposing-web-app-development/
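For a first impression, here is a minimal usage sketch (API names follow the DataScript README; the namespace was originally plain `datascript` and is `datascript.core` in later releases, so treat the require as an assumption):

```clojure
;; Minimal usage sketch (namespace per later releases; early versions
;; used plain `datascript`).
(require '[datascript.core :as d])

;; Schema is only needed for refs / cardinality; other attributes are free-form.
(def schema {:friend {:db/valueType   :db.type/ref
                      :db/cardinality :db.cardinality/many}})

(def db
  (-> (d/empty-db schema)
      (d/db-with [{:db/id -1 :name "Ivan" :age 19}
                  {:db/id -2 :name "Oleg" :age 28 :friend -1}])))

;; Datalog query, Datomic-style; returns a set of result tuples.
(d/q '[:find ?n ?a
       :where [?e :name ?n]
              [?e :age ?a]]
     db)
;; => #{["Ivan" 19] ["Oleg" 28]}
```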

Feedback welcome!

Mike Haney

Apr 25, 2014, 8:00:33 AM
to clojur...@googlegroups.com
This is a really neat idea - thank you for taking the time to write this and share it.

For a while I've been telling people to look at Datomic Free because it provides such a nice way to structure and query your data, even if you don't use the full Datomic solution.

It looks like this library will give us the same thing on the ClojureScript side, and that's pretty cool. I can't wait to try it out.

Daniel Neal

Apr 25, 2014, 2:31:52 PM
to clojur...@googlegroups.com
Is this something that could potentially work *with* Om or is it going down a different road?

I'm really enjoying Om for client side work but totally love the idea of being able to do database-like queries over the application state as you describe.

David Nolen

Apr 25, 2014, 8:19:24 PM
to clojur...@googlegroups.com
Nikita,

Wow this is really awesome work, starting to toy around with Om integration :)

David



--
Note that posts from new members are moderated - please be patient with your first post.
---
You received this message because you are subscribed to the Google Groups "ClojureScript" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojurescrip...@googlegroups.com.
To post to this group, send email to clojur...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojurescript.

Nikita Prokopov

Apr 26, 2014, 1:50:02 AM
to clojur...@googlegroups.com
On Saturday, April 26, 2014 at 1:31:52 UTC+7, Daniel Neal wrote:
> Is this something that could potentially work *with* Om or is it going down a different road?
>
> I'm really enjoying Om for client side work but totally love the idea of being able to do database-like queries over the application state as you describe.

I think DataScript is something that can be used instead of Om cursors. Of course you can mix them; it’s just a library. But for proper integration, cursors would just be dead weight, so it would be easier to integrate with pure React or some thin wrapper around React like Quiescent.

But that’s just my opinion, David already started to toy with Om integration :)

David Nolen

Apr 26, 2014, 2:07:57 AM
to clojur...@googlegroups.com
This is not completely accurate. Cursors abstract away the particular state strategy. Even with DataScript you will want some indirection for reusable components. I suspect widgets will take entities or composites, and for rendering to be efficient you will want entity caching.

Dylan Butman

Apr 26, 2014, 4:06:24 PM
to clojur...@googlegroups.com
Very interested to see how this pans out. Being able to write real queries for views is a big leap forward. In a current project I'm dynamically filtering on 3 criteria and then sorting 5000+ items, and while ClojureScript offers a pretty rich toolset, a Datalog query would be a pretty significant simplification of that process.

As far as Om integration, I could see separating concerns a little as a possible solution. If you keep the parameters for your queries in app state and pass the database as a shared cursor, any change in app state that would change the query a component uses to retrieve its view of the database would cause a rerender.

The issue would become keeping the app in sync with database changes. You could just trigger a root rerender on any db change, but the whole point of Om is that you don't want to rerender the entire app on change, just the components with affected cursors.

A little wrapper might do the trick. What if you mediate all component queries to the database and cache a list of returned entities? Anytime any of those entities change, the component should be re-rendered (you could even implement a more advanced update of the query result instead of triggering a new query). Anytime a new entity is added to the database, check whether it would be included in the result of any component query, and mark those components as dirty.

Thoughts?

David Nolen

Apr 26, 2014, 9:09:53 PM
to clojur...@googlegroups.com
On Sat, Apr 26, 2014 at 4:06 PM, Dylan Butman <dbu...@gmail.com> wrote:
> A little wrapper might do the trick. What if you mediate all component queries to the database and cache a list of returned entities? Anytime any of those entities change, the component should be re-rendered (you could even implement a more advanced update of the query result instead of triggering a new query). Anytime a new entity is added to the database, check whether it would be included in the result of any component query, and mark those components as dirty.

This is exactly right and the approach I'm exploring. DataScript makes this particularly easy to do since all transactions return a transaction report. Also, with entity caching you can re-render efficiently from the root in the time-travel case.

David

Dave Dixon

Apr 27, 2014, 11:52:39 AM
to clojur...@googlegroups.com
Exploring an alternative approach here: https://gist.github.com/allgress/11348685

This example uses Reagent (just because I wanted to play with Reagent). The bind function binds a DataScript query to a Reagent atom. When a tx-report is received, the query is run against the tx-data, and the atom is only updated with the full query results (against the new version of the db) if the query against tx-data is non-empty. Similarly, undo reverses the db actions (add/retract) and applies a new transaction rather than simply reverting to the previous db value.

Probably heavy-handed, but I wanted to see what it looked like to deal directly with datoms.
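The bind pattern described above might be sketched roughly as follows (the `bind` helper name is hypothetical; `d/listen!` is DataScript's transaction-listener hook, and running the query over tx-data as a plain collection of datoms is the cheap relevance check the gist describes):

```clojure
;; Rough sketch of binding a query to a Reagent atom (the `bind` helper
;; is hypothetical; d/listen! is DataScript's transaction-listener hook).
(require '[datascript.core :as d]
         '[reagent.core :as r])

(def conn (d/create-conn {}))

(defn bind
  "Returns a Reagent atom holding the query result. The full query is
   re-run only when the transaction's own tx-data matches the query."
  [conn query]
  (let [state (r/atom (d/q query @conn))]
    (d/listen! conn :bind
               (fn [{:keys [tx-data db-after]}]
                 ;; Cheap relevance check: query the handful of datoms in
                 ;; tx-data before re-querying the whole db.
                 (when (seq (d/q query tx-data))
                   (reset! state (d/q query db-after)))))
    state))

;; Usage: deref'd inside a Reagent component, it re-renders on change.
(def names (bind conn '[:find ?n :where [?e :name ?n]]))
```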

Sean Grove

Apr 27, 2014, 11:53:38 AM
to clojur...@googlegroups.com
Pretty excited to play with this, seems like a fantastic idea. Thank you Nikita!


Nikita Prokopov

Apr 28, 2014, 2:50:52 AM
to clojur...@googlegroups.com
Thanks Dave!

This approach is almost exactly what I had in mind when thinking about how to glue DB and rendering together (bind/unbind especially).

Actually, the idea of reverse transactions is something I couldn’t have imagined before seeing your code. I now see how it optimizes re-rendering on undo operations.

You probably also need to do `reverse` on the tx-data when reverting a transaction.

Thanks for the inspiration!

Nikita Prokopov

Apr 28, 2014, 3:00:18 AM
to clojur...@googlegroups.com
Dylan,

can you give a little sneak peek at what your app is doing with 5000+ objects on the client side? I feel like that's where DataScript should be aimed, but I've failed to imagine a use case beyond 100-500 entities.

For Om integration, see what Dave Dixon is doing — filtering the tx-report queue with the same Datalog query and triggering a query re-run and component re-render seems like a decent approach.

Mike Haney

Apr 28, 2014, 9:11:46 AM
to clojur...@googlegroups.com
Dylan,

Your app sounds very similar to what I am working on right now and was thinking of using DataScript for. I have ~10k items (construction materials) I need to present in a list with dynamic filtering (category, size/gauge, etc.). Running Datalog queries on the client instead of the server would be a big win.

I'm worried now about performance, since Nikita says he had much smaller data sets in mind. Have you progressed enough yet to determine if performance is a problem with these large data sets?

Dylan Butman

Apr 28, 2014, 9:39:26 AM
to clojur...@googlegroups.com

Nikita, we're rewriting https://www.cosponsor.gov/ in a full Clojure stack. Hoping to be able to open-source the whole project once it's done. We've been able to get the folks in charge (in Congress) excited about immutability and Clojure as a new paradigm, and they seem on board so far.

I'm dynamically sorting and filtering 5000+ bills. Currently supporting search on keyup (Om is crazy fast!), category and status filtering, as well as dynamic sorting.

Mike, I'm hoping to play with integration today or tomorrow, so I'll get back to you. I can't imagine performance will be worse than using my current (-> filter-by-topic filter-by-search filter-by-status sort) pipeline. If anything I'd expect it to be much better, and considerably more fluent.
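As a rough illustration, a filter/sort pipeline like the one above could collapse into a single Datalog query plus a sort (all attribute names here, such as :bill/status, are made up purely for illustration):

```clojure
;; Hypothetical attribute names (:bill/status etc.), purely to illustrate
;; replacing a filter/filter/filter/sort pipeline with one query + sort-by.
(defn visible-bills [db status category]
  (->> (d/q '[:find ?e ?title ?introduced
              :in $ ?status ?category
              :where [?e :bill/status ?status]
                     [?e :bill/category ?category]
                     [?e :bill/title ?title]
                     [?e :bill/introduced ?introduced]]
            db status category)
       (sort-by #(nth % 2))))
```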

Nikita Prokopov

Apr 28, 2014, 9:43:52 AM
to clojur...@googlegroups.com
Guys,

please don’t be too optimistic about performance so far. My goal is definitely to support tens of thousands of datoms, but right now I’ve spent about zero time on performance, so the potential is unknown. My first milestone is to reach near-native filter/group-by/get-in speed and then look at where we can improve from there.

Dave Dixon

Apr 28, 2014, 9:55:28 AM
to clojur...@googlegroups.com
Thank you Nikita for DataScript! I've been wanting something like this for a while but never had the time to dive into it myself, so I was very excited to see your work.

I'm thinking it's possible to do better than simply querying tx-data if we hook into DataScript internals. Doing it through tx-report means every subscriber has to run the query. But I think we should be able to do something like "listen-query", which analyzes the query to see which parts of the index are relevant. After each transaction, a single analysis of tx-data could determine which parts of the indexes were modified, and notify accordingly.

Thoughts?

Mike Haney

Apr 28, 2014, 10:00:49 AM
to clojur...@googlegroups.com
Nikita,

I'll be happy to work with you to improve performance as needed. Anything from providing metrics to giving you access to the code or my data sets to test against.

Nikita Prokopov

Apr 28, 2014, 10:01:09 AM
to clojur...@googlegroups.com
Dave,

yes, that’s again exactly what I was thinking about :)

It’ll mean reverse-analyzing the query, but that looks easier to do with Datalog than with SQL, for example, and may even be possible.

Dave Dixon

Apr 29, 2014, 5:57:14 PM
to clojur...@googlegroups.com
Nikita,

I took a first stab at this, available at https://github.com/allgress/datascript.

Still raw; in particular, the query analysis logic is just a slightly modified version of the query logic. It seems like this should be consolidated so the analysis logic can live in one spot and then be used for whatever purpose. I added some tests, but it probably needs more coverage. Let me know if you have thoughts on where to take this.

Dave

Gijs S.

Apr 30, 2014, 9:38:19 AM
to clojur...@googlegroups.com
Hi all,

Thank you Nikita for DataScript.

It's the missing ClojureScript part I've been looking for, for a web application I've been building. This todo application includes a front-end using DataScript and React.js/Quiescent.

More details here: http://thegeez.net/2014/04/30/datascript_clojure_web_app.html

The code is on github: https://github.com/thegeez/clj-crud

Demo on heroku: http://clj-crud.herokuapp.com/

-Gijs

Dmitry Suzdalev

May 6, 2014, 12:05:09 PM
to clojur...@googlegroups.com
Nikita, thanks for this library! It will make me learn more about Datomic, which I've been wanting to do for some time :)

As I'm currently playing with a desktop app project that uses ClojureScript on top of node-webkit, I want to try DataScript for its DB needs.
I'm interested in a DB that would be in-memory, but also with the ability to be persisted to disk.
I know that you specifically said in the project description:

No facilities to persist, transfer over the wire or sync DB with the server

So I'm thinking that for my use case I can just hand-write this by introducing some kind of import/export functionality on top of DataScript, invoked at the points where I want to load/persist.
And I'm wondering what would be the best way to export a DataScript DB? Is it just that I make some kind of "give me *everything*" query and then serialize as I please?
Or is there maybe something similar already baked in or planned?

Also I'm interested to know if I'm completely wrong in wanting DataScript for this use case :)

And another question.
Is there somewhere an estimated memory footprint, or is it again hard to say at this point (pre-alpha and all)?

Thanks,
Dmitry.


Nikita Prokopov

May 6, 2014, 12:20:53 PM
to clojur...@googlegroups.com
Hi Dmitry!

I cannot tell you the memory footprint because I’m working on a rewrite of the underlying structures at the moment.

The future-proof and simplest way is just to get all datoms and persist them using EDN or a more efficient serializer. On app start, read all the datoms and create a new DB from them.

DataScript was intended for small datasets limited by browser memory, so it shouldn’t be a problem.

There’s currently no API call for such a thing as “get all datoms”. It’ll be something similar to Datomic’s “datoms” or “seek-datoms” calls; I’m planning to add this in the next one or two versions.

If you need this right now, feel free to hack into the implementation; it should be easy to change that later.
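Until such a call exists, the round-trip can be sketched with a "give me *everything*" query, as Dmitry suggested (the helper names are hypothetical, and schema handling is left to the caller):

```clojure
;; Sketch of a "give me *everything*" export/import round-trip
;; (hypothetical helper names; uses only the existing query API).
(defn export-db
  "Serialize every datom of db as an EDN string of [e a v] triples."
  [db]
  (pr-str (d/q '[:find ?e ?a ?v
                 :where [?e ?a ?v]]
               db)))

(defn import-db
  "Rebuild a DB from an EDN string produced by export-db."
  [s schema]
  (d/db-with (d/empty-db schema)
             (map (fn [[e a v]] [:db/add e a v])
                  (cljs.reader/read-string s))))
```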

Dmitry Suzdalev

May 6, 2014, 1:03:15 PM
to clojur...@googlegroups.com
Thanks for all the answers! No rush for now, so I can even experiment with it staying in-memory and add persistence later.
What datasets do you consider small? :) I've seen you say something about tens of thousands of records in previous mails, which seems not too small :)
Personally I'm aiming at 20k-30k records that will store file info in up to 10 fields per record, so it seems quite fitting for my needs if I understood everything right.


Nikita Prokopov

May 6, 2014, 1:11:51 PM
to clojur...@googlegroups.com
Yes, 30k should be fine. But let’s see where optimisations can lead us.

Mike Haney

May 6, 2014, 5:43:09 PM
to clojur...@googlegroups.com
One straightforward approach would be to use tx-listen to listen to all your db transactions and then just append the datoms to a file. This would be very easy with node-webkit, because you can take advantage of Node's asynchronous I/O. Then when your app restarts, just read the file and apply the transactions in order.

This would retain the entire history (like Datomic), which may not be what you want. But you could easily build on this to suit many use cases. You could create separate files for different entities, and/or choose for which data to persist the entire history vs. just the current state. Append-only is easy, but once you start getting fancy there are a ton of edge cases to worry about. In that case, you could look at using a logging framework to manage the gory details.

You could even embed a database to back everything up; there are about a billion choices on npm. Even if you choose to use an embedded db on the node side, tx-listen is probably the easiest place to hook in. Another route would be to add some metadata to your entities in DataScript to make it easier to query what you want to persist.
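The append-only idea might be sketched like this (assumes a ClojureScript build targeting node-webkit; the listener key, file name, and `conn` var are arbitrary, and `d/listen!` is the DataScript listener hook referred to above as tx-listen):

```clojure
;; Sketch: append each transaction's datoms to a log file using Node's fs.
;; Assumes a ClojureScript build running under node-webkit; names arbitrary.
(def fs (js/require "fs"))

(d/listen! conn :persist
           (fn [{:keys [tx-data]}]
             (let [line (pr-str (mapv (fn [d] [(:e d) (:a d) (:v d) (:added d)])
                                      tx-data))]
               ;; Asynchronous append; errors are only logged here.
               (.appendFile fs "tx-log.edn" (str line "\n")
                            (fn [err] (when err (js/console.error err)))))))
```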

Dmitry Suzdalev

May 7, 2014, 3:23:17 PM
to clojur...@googlegroups.com
Mike, thanks for this! I will surely re-read it later when it comes to thinking about storage. I think that by then I'll have more of the app's structure and needs detailed in my head, so I will consider all the options you've mentioned :) At the moment I'm about to dive into the Datomic docs, and then I will experiment with just runtime in-memory usage of DataScript first.


Scott Klarenbach

Oct 27, 2015, 6:05:04 AM
to ClojureScript
On Thursday, April 24, 2014 at 11:18:06 PM UTC-7, Nikita Prokopov wrote:
> Hi!
>
> I’m glad to announce my new library, DataScript.
>
> It’s an open-source, from-scratch implementation of an in-memory immutable database aimed at ClojureScript, with an API and data model designed after Datomic. Full-featured Datalog queries are included.
>
> Library is here: https://github.com/tonsky/datascript
>
> Also check out this blog post about why you may need a database in a browser: http://tonsky.me/blog/decomposing-web-app-development/
>
> Feedback welcome!

Hello,

Very helpful library Nikita.

I wonder what would be required to enable DataScript to act more like a Datomic peer, which holds, say, part of the db in browser memory and then goes back to the server to fetch new segments if it can't serve a query from the browser cache. Older entries in the cache get GC'd, and the system can operate on DBs of any size.

I realize this wasn't a design goal of DataScript, and I'm just looking for insights into how it might be possible. I guess a more fundamental question is: how does a Datomic peer know when the segments needed to complete its query are not present in its cache? And can we emulate this in DataScript?

Robin Heggelund Hansen

Oct 27, 2015, 7:40:39 AM
to ClojureScript
One thing to keep in mind is that a Datomic peer retrieves "the index" when it first connects to a transactor. The peer can use this to get a picture of what is available, and thus what it can retrieve when running queries. Another thing to keep in mind is that a Datomic peer receives every change from the transactor, so it is always up to date.

Browsers don't need access to the entire database, just the relevant data for a particular client. What you can do is connect to the backend using websockets. When you first connect, you ask for all relevant data for this particular user. After that, the backend can push updates for this particular user to keep the client up to date.

kovas boguta

Oct 27, 2015, 11:39:00 AM
to clojur...@googlegroups.com
Checking in with DataScript and impressed with the progress. Very cool
that it's cljc now.

I'd love to see DataScript being able to query large static datasets.
In general DataScript is interesting for doing the things Datomic
doesn't do.

Forget all the transactor/coordination/live-merge stuff. Let's assume I
just map-reduced some dataset into various datom indices, and now I
want to query and navigate it. Wikidata comes to mind for me.

Looking at the code, it looks like there is some pluggability wrt the
source of datoms, so it seems doable for someone who knows the
codebase ;)

I've definitely encountered a number of other people with this use
case. Love Datomic Datalog, love navigation, love direct data access,
don't love the 10 billion datom limit, and don't need transactions.

kovas boguta

Oct 27, 2015, 11:45:54 AM
to clojur...@googlegroups.com
On Mon, Oct 26, 2015 at 7:02 PM, Scott Klarenbach
<doyouun...@gmail.com> wrote:


> I realize this wasn't a design goal of datascript, and am just looking for insights into how it might be possible. I guess a more fundamental question is: how does a datomic peer know when the segments to complete its query are not present in cache? And can we emulate this in datascript?

Remember that a Datomic query always happens against a specific database
at a specific basis-t. There's a shallow hierarchy of indices that indexes
the particular segments associated with that t. So you know exactly
what segments you need and can compare that with what's in the cache.

Emulating this in DataScript would be a big undertaking and you're
better off using Datomic, or figuring out how to integrate DataScript
into Datomic. You'd have to reinvent the transactor (you want multiple
peers committing transactions, right?) and all the associated
coordination and indexing.

Alan Moore

Oct 27, 2015, 2:20:42 PM
to ClojureScript
FYI: Om Next has new built-in affordances for using pull syntax to resolve local/remote datom fetching that play nicely with local DataScript instances. Nothing in Om Next is Datomic- or DataScript-specific, but there are working examples of using both, including the proverbial TODO app.

If you can't use Om Next, you could do something similar in your own code. David Nolen has put a lot of thought into this and seems to have hit a sweet spot in this area. Replicating all the features/aspects he has designed in might be a challenge.

Alan

Dylan Butman

Oct 27, 2015, 2:52:09 PM
to ClojureScript
Does anyone know David's thinking about doing server push to update local data? 

In the case of Datomic/DataScript, it'd be pretty straightforward to push datoms to a client that are within a user's scope. For user-scoped data, it should be possible (although not always desirable) to structure your data so that all data relevant to that user is contained within a single graph (the user node is an edge that transitively connects to all data). This means that if you take the initial set of entity ids relevant to the user, any new datoms with entity ids in the initial set are relevant changes and can be pushed directly to that particular client. Datoms not in the set can be ignored.

I've been thinking about how this can be efficiently translated into triggering the correct rereads on the client. I haven't spent enough time yet with the Om Next internals to fully understand what's going on, but my current understanding is that queries are registered, and that mutations must return queries as :value, which then trigger reads.

If you just transact the datoms, you won't get rereads, because you're not returning a specific query :value to be reread. If you're able to register the entities used to produce the result of a query (not sure how to do this unless your queries always return entity ids directly or in some parseable format), then you could do the same entity-id-set contains? check to see if the query needs to be reread. However, this will only work if the query is a single graph like the above-mentioned user data graph. If you have a query that gets a set of edges, for example all users in the system as opposed to all todos of a particular user, then the set of entity ids returned by the query is dynamic and can't be used as described above. Perhaps you could just override the default behavior for edge queries?


David Nolen

Oct 27, 2015, 3:40:37 PM
to clojur...@googlegroups.com
Server push just falls directly out of the Om Next design. It doesn't require anything special.

David


Nikita Prokopov

Nov 22, 2015, 4:01:57 AM
to ClojureScript
Requesting segments on demand is, in theory, possible. DataScript is written to support queries over a data-source protocol, which can be a DB (we can then utilize indexes), a collection (full scans), or something user-specific. I don’t have a working example of that at the moment, but from a design perspective, plugging in a custom data source with, e.g., an on-demand segment loader is possible.

Nikita Prokopov

Nov 22, 2015, 4:03:36 AM
to ClojureScript

In theory, yes, queries should work on any dataset provided it implements some basic protocol. In practice, I'd have to build an example of that to see what pieces are missing at the moment.
