Using a dynamic var for my database connection for implicit connections+transactions


J. Pablo Fernández

unread,
Jul 30, 2015, 8:44:31 PM7/30/15
to Clojure
Hello Clojurians,

I found passing the database connection to each function that uses it very error prone when using transactions: passing the wrong one could mean a query runs outside the transaction even though, in the source code, it sits inside the with-db-transaction function. So I ended up defining the db namespace like this:

(ns db)

(defonce ^:dynamic conn (atom nil))

(defn connect! []
  (reset! conn (generate-new-connection)))

(defn run-query
  ([query] (run-query query @conn))
  ([query conn] (run-the-query-in-connection query conn)))


This is pseudo-code of course, simplified to highlight the part that I'm most unfamiliar with:

(defonce ^:dynamic conn (atom nil))

The reason why it's an atom is so that connect! can *set* it and the reason why it's a dynamic var is so I can do this:

(jdbc/with-db-transaction [db-connection-with-transaction @db/conn]
  (binding [db/conn (atom db-connection-with-transaction)]
    (db/run-query "SELECT *")))

and the query will be implicitly run inside the transaction. Does it make sense? Is this wrong? Will it fail in unexpected ways? Is there a better way?

Thanks.


James Reeves

unread,
Jul 30, 2015, 10:17:40 PM7/30/15
to clo...@googlegroups.com
On 31 July 2015 at 01:44, J. Pablo Fernández <pup...@pupeno.com> wrote:
I found passing the database connection to each function that uses it very error prone when using transactions: passing the wrong one could mean a query runs outside the transaction even though, in the source code, it sits inside the with-db-transaction function. So I ended up defining the db namespace like this:

(ns db)

(defonce ^:dynamic conn (atom nil))

(defn connect! []
  (reset! conn (generate-new-connection)))

(defn run-query
  ([query] (run-query query @conn))
  ([query conn] (run-the-query-in-connection query conn)))

This style of code is generally considered to be unidiomatic in Clojure. The reason for this is that it significantly increases complexity, and Clojure is about reducing complexity where possible.

Consider a function like:

  (defn find-by-id [conn id]
    (sql/query conn ["SELECT * FROM foo WHERE id = ?" id]))

The output of this function is affected by its arguments (and by the state of the database the connection is associated with), which are passed in by its caller.

Now consider a function like:

  (defn find-by-id [id]
    (sql/query @conn ["SELECT * FROM foo WHERE id = ?" id]))

The output of this function is affected by its arguments... and by anything that touches the global conn var, which could literally be anything in your program, in any namespace, in any function, in any thread.

The more ways a function's output can be affected, the more "complex" it is. This is why Clojure prefers immutable data over mutable data, and why function arguments are generally preferred over dynamic vars.

The problem of accidentally using the original database connection inside a transaction is a difficult one, but I don't think the solution is to add more complexity. An alternative solution would be to take the original database connection out of scope, by moving your transaction code to a separate function:

   (defn do-things* [tx]
     (do-foo tx)
     (do-bar tx)
     (do-baz tx))

   (defn do-things [db-spec]
     (sql/with-db-transaction [tx db-spec]
       (do-things* tx)))

If this is still too prone to error, you could also automate this pattern with a function:

  (defn wrap-transaction [f]
    (fn [db-spec & args]
      (sql/with-db-transaction [tx db-spec]
        (apply f tx args))))

  (def do-things
    (wrap-transaction do-things*))

- James

J. Pablo Fernández

unread,
Jul 31, 2015, 4:55:01 AM7/31/15
to clo...@googlegroups.com
Hello James,

Thanks for your answer. I do understand your point. Pure functions are easier to reason about, and my use of a dynamic var here breaks that purity. I'm not doing it lightly. It has already happened to me that one of those functions running inside the transaction was not passed the transaction connection and instead got the global one; the failure was silent and very hard to debug, and that was in a project with fewer than 200 lines of code. I'm trying to find patterns that will still work when this project has 200k lines of code.

For me, the thing is, I have a traditional relational database here, this is already far from pure. For example, calling (db/create-user "pup...@pupeno.com") twice will not only not return the same thing the second time, it'll actually raise an exception the second time. Also, the database connection is *global state* unless each function creates its own connection, which would be terrible. So, this global state also breaks functional purity.

The problem with the second aspect of breaking purity as far as I can see is this: at some point, this global state has to be picked up and used, so at some point a function will *not* get a database connection passed to it but *will* access the database by using this global connection. I haven't advanced this project enough to say this with 100% certainty, but I think there's going to be more than one function like that, and at some point I'll need to have one inside the other, so I need them to be composable. Let me show you a naive example:

- db/create-user is the low-level database function that creates a record in the user table.
- user/create is the function used to create a user; it takes care of, for example, encrypting the password.
- account/register is the function to register a new user; it takes care of creating a user but also validation, sending a welcome email and so on.

So each function calls its predecessor and passes the database connection along; account/register, being the entry point, grabs it from the global state, so it doesn't get a connection passed to it. So far, a lot of it looks like pure functions (let's ignore the fact that a database breaks that purity). The problem arises when I get another function, account/invite, that is used to register a bunch of people one after the other, so account/invite would call account/register many times. The problem is that account/invite *can't* start a transaction and have account/register and all its inner functions use that transaction, even though that would make a lot of sense.

To make account/register composable it needs to accept an optional database connection and use that one if it's present, or the global one if it's not. Every time a function does that there's a high risk of picking the wrong database, and account/invite and account/register shouldn't be dealing with database connection management. That feels to me like lower-level details leaking into higher-level abstractions.
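
Roughly, that optional-connection variant would look something like this (a sketch only; create, send-welcome-email and db/conn are the placeholders from before):

  (defn register
    ([user] (register @db/conn user))        ; entry point: falls back to the global connection
    ([conn user]
     (create conn user)
     (send-welcome-email user)))

  (defn invite [users]
    (jdbc/with-db-transaction [tx @db/conn]  ; one transaction for the whole batch
      (doseq [u users]
        (register tx u))))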

Now, I know this is a naive example and you could push the grabbing of the global connection higher and higher as long as the example stays naive and simple like this, but it does represent what in my experience is the reality of web application development, at least in other languages, and I haven't seen anything to make me think Clojure will be radically different here (at least when using patterns such as Compojure).

So yes, it's not purely functional, but with a database that's already impossible, and if I wanted purely functional I would probably be using Haskell instead of Clojure. What I like about Clojure is this:

Clojure is a practical language that recognizes the occasional need to maintain a persistent reference to a changing value and provides 4 distinct mechanisms for doing so in a controlled manner - Vars, Refs, Agents and Atoms.

I'm just trying to be practical here. But I'm new and I'm not sure if an atom that is a dynamic var has some hidden issues that I'm not seeing (other than the fact of it being state that changes and that I have to manage explicitly because the language is not protecting me from shooting myself in the foot with it).

Does it make sense?






--
J. Pablo Fernández <pup...@pupeno.com> (http://pupeno.com)

Colin Yates

unread,
Jul 31, 2015, 6:48:54 AM7/31/15
to clo...@googlegroups.com
This is always tricky - the other dimension is that the transaction can span multiple components which should all be ignorant of each other, so a collecting fn (like James’ ‘do-things*’) isn’t feasible.

For me, I have the service create a tx and then either I pass in the tx instead of the db param (e.g. (defn my-comp [tx …])) or I bind the tx to a dynamic var but only allow access to it via something like (defn in-tx [f] (binding …. (f *tx*))). Both of these styles are restrictive, but good enough.
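
Roughly, the in-tx style looks something like this (a sketch; the exact names and where the db spec comes from are simplified):

  (def ^:dynamic *tx* nil)

  (defn in-tx [f]
    (jdbc/with-db-transaction [tx db-spec]   ; db-spec assumed to be in scope
      (binding [*tx* tx]
        (f *tx*))))

  ;; components never see connection management:
  (defn my-comp []
    (in-tx (fn [tx] (jdbc/query tx ["SELECT 1"]))))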

Another style I run into is when I have a Protocol which doesn’t care at all about persistence. For example, my event store has an in-memory implementation and a JDBC backed one. What can the signature look like:

 - everything takes in a db which some implementations ignore - yuck
 - the signature includes a magical ‘environment’ param - yuck
 - the signature is focused and the JDBC implementation uses (in-tx #(…))

This is definitely an area where pragmatism meets principle.

I am away from the code at the moment, but is there any reason why the dynamic connection can’t be private? This goes some way to providing safety.


J. Pablo Fernández

unread,
Jul 31, 2015, 9:24:09 AM7/31/15
to Clojure, colin...@gmail.com


On Friday, 31 July 2015 11:48:54 UTC+1, Colin Yates wrote:
I am away from the code at the moment, but is there any reason why the dynamic connection can’t be private? This goes some way to providing safety.

No, I don't think there's a problem with it being private, I just didn't think of it, and my code to handle transactions is in a different namespace. But it should go in the db namespace anyway, and then it can easily be made private.

James Reeves

unread,
Jul 31, 2015, 9:25:28 AM7/31/15
to clo...@googlegroups.com
On 31 July 2015 at 09:54, J. Pablo Fernández <pup...@pupeno.com> wrote:
For me, the thing is, I have a traditional relational database here, this is already far from pure. For example, calling (db/create-user "pup...@pupeno.com") twice will not only not return the same thing the second time, it'll actually raise an exception the second time. Also, the database connection is *global state* unless each function creates its own connection, which would be terrible. So, this global state also breaks functional purity.

The problem with the second aspect of breaking purity as far as I can see is this: at some point, this global state has to be picked up and used, so at some point a function will *not* get a database connection passed to it but *will* access the database by using this global connection.

Why does the database connection have to be global state? Why not pass it through your functions as an argument?

So in your example scenario, you might write something like:

  (defn create-user [db user]
    (sql/insert! db "users" user))

  (defn create [db user]
    (create-user db (update user :password encrypt-password)))

  (defn register [db user]
    (create db user)
    (send-welcome-email user))
  
  (defn make-handler [{:keys [db]}]
    (routes
     (POST "/users" [& user]
       (register db (select-keys user [:email :password])))))

This style ensures that the functions only have access to the database you want them to have access to, and also allows you to deal with multiple databases simultaneously. For example, often I'll have a dev database and a test database, and ideally I want to be able to run my tests in my REPL (or via Cider) without clearing the data in my dev database.
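
For instance, something like this (the db-spec maps here are only illustrative):

  (def dev-db  {:subprotocol "postgresql" :subname "//localhost/myapp_dev"})
  (def test-db {:subprotocol "postgresql" :subname "//localhost/myapp_test"})

  ;; a clojure.test test is pointed only at the test database:
  (deftest creating-a-user
    (is (some? (create-user test-db {:email "test@example.com"}))))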

Another advantage of this approach is in test performance. Frequently I'll place my lowest database functions in a protocol:

  (defprotocol UserDatabase
    (create-user [db user]))

This allows me to implement a mock database for performing fast integration tests.
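
For example, a quick in-memory implementation for tests might look something like this (just a sketch):

  (defrecord MockUserDatabase [storage]
    UserDatabase
    (create-user [_ user]
      (swap! storage assoc (:email user) user)))

  (defn mock-user-database []
    (->MockUserDatabase (atom {})))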

While most databases are stateful, so we obviously sacrifice purity, that doesn't mean we shouldn't continue limiting the use of other dangerous tools like dynamic or global vars. Clojure is a pragmatic language, but this means that it just trusts us to be disciplined.

- James

jongwon.choi

unread,
Jul 31, 2015, 8:38:33 PM7/31/15
to Clojure
Cut & paste from my own code base:

(def ^:dynamic *db-conn*)

(defmacro with-db-connection [& body]
  `(jdbc/with-db-connection [con# (get-db-conn)]
     (binding [*db-conn* con#]
       ~@body)))

A few points: do not give any value to the dynamic var; leave it unbound so it fails early on unexpected usage. Set a value via binding, make a macro, and then use the dynamic var only through the macro (i.e., hide its existence).

That's how I normally use dynamic vars. Some people are afraid of using them, but a dynamic var is like a knife: if you know how to use it, it is useful for removing unnecessary complexity.
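
For example (get-db-conn being whatever returns your db spec):

  ;; inside the macro the bound connection is used implicitly;
  ;; outside it, *db-conn* is unbound, so any accidental use fails
  ;; immediately instead of silently hitting the wrong connection.
  (with-db-connection
    (jdbc/query *db-conn* ["SELECT 1"]))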

Jo Geraerts

unread,
Aug 1, 2015, 2:32:49 AM8/1/15
to Clojure
Hello,

I think you are coming to a point where you might want to consider Stuart Sierra's component library. 

A dynamic var is a kind of thread-bound global state, which is to be avoided.

There is a nice talk to go with it

Kr,

Jo



On Friday, 31 July 2015 at 02:44:31 UTC+2, J. Pablo Fernández wrote:

Colin Yates

unread,
Aug 1, 2015, 4:29:12 AM8/1/15
to clo...@googlegroups.com

That is indeed a useful library for wiring things up at init-time, but it doesn't help at all for run-time wiring, like transactions.

James Reeves

unread,
Aug 1, 2015, 8:28:24 AM8/1/15
to clo...@googlegroups.com
On 1 August 2015 at 01:38, jongwon.choi <oz.jong...@gmail.com> wrote:
That's how I normally use dynamic vars. Some people are afraid of using them, but a dynamic var is like a knife: if you know how to use it, it is useful for removing unnecessary complexity.

Well, it's more that dynamic vars trade convenience for greater complexity. You make one argument implicit, at the cost of having it available to all code evaluated within the binding.

- James

Rob Lally

unread,
Aug 1, 2015, 1:30:04 PM8/1/15
to clo...@googlegroups.com
Hey Pablo,

I could be wrong, but it seems that the key problem here is the existence of the global transaction. If the global transaction didn’t exist then any time you failed to pass in a transaction the code would fail: immediately and loudly.

I appreciate what you’re trying to do, but it seems like you’re on a path to solve the problems caused by one shared, implicit, global variable by creating more shared, implicit, slightly less global variables, and that doesn’t seem like it is going to end well. Implicit connections in a binding will fail in perhaps mysterious ways if you ever include any sort of concurrency: even things as simple as asynchronous logging can start logging wrong values or missing values.

Can you render the global connection inoperable in some way? Perhaps redefine it, point it at a data source that doesn’t exist, or by some other hook-or-crook have it fail in a loud, grotesque manner if it is touched?
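
For instance, something along these lines might work (a completely untested sketch, reusing the names from your original post; do-transactional-work is made up):

  (jdbc/with-db-transaction [tx @db/conn]
    (binding [db/conn (atom ::no-global-connection-inside-a-transaction)]
      ;; anything in here that dereferences db/conn gets a useless sentinel,
      ;; so the mistake blows up loudly instead of silently running outside
      ;; the transaction
      (do-transactional-work tx)))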


R.





James Gatannah

unread,
Aug 3, 2015, 1:00:05 AM8/3/15
to Clojure


On Thursday, July 30, 2015 at 7:44:31 PM UTC-5, J. Pablo Fernández wrote:
Hello Clojurians,

I found passing the database connection to each function that uses it very error prone when using transactions: passing the wrong one could mean a query runs outside the transaction even though, in the source code, it sits inside the with-db-transaction function.

I'll go ahead and make the point that it's error-prone for different reasons.

Pretty much by definition, that database connection is a system boundary. It's all about something that's *way* more complex than random global state changes inside your program. This is a thing that interacts with the outside world, with all the nastiness that implies.

Everything that everyone else has already written about this approach is true, but I don't think they've gone far enough.

Even if you pass that database connection around as a parameter everywhere, you're talking about throwing away a huge part of the benefit of using a functional language.

Isolate your side-effects.

Think of a castle. You have a moat surrounding it, and a few gates that you use to allow your peasants to enter/exit. This particular gate opens up to a swamp full of alligators.

Your approach amounts to letting the gators wander around loose.

Passing the connection around to each function in the call chain is like tying a ribbon around the gator's neck and hoping you can use that as a leash.

You can use either approach to great effect. If you're really, really good. And so is everyone else on your team (you did mention a 200 KLOC project).

One of the main benefits of functional programming is that admitting you aren't really, really good is incredibly liberating. I don't have the time/energy to dedicate to trying to maintain this sort of code. (Yes, I've spent lots of time recently thinking about how Java was designed for very average programmers, but it takes much better programmers to actually write correct programs in that kind of language than in a functional one.) Even if I were that good... I'd rather be focused on the problems that make my customers happy.

I'm going to appeal to authority here for the right answer: http://prog21.dadgum.com/23.html (in my defense, it's a great blog). Have your web response handler (which is another system boundary...this one is next to an active volcano populated by fire-breathing dragons) build up a list of all the nasty side-effects that will eventually have to happen.

Don't just isolate your side-effects. Quarantine those suckers as if each and every one means you're dealing with the most diabolical hacker you can imagine.
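
As a sketch of the shape I mean (all the names here are made up):

  ;; the handler only *describes* the effects...
  (defn register-request [user]
    [[:insert-user (select-keys user [:email :password])]
     [:send-welcome-email (:email user)]])

  ;; ...and one small, well-guarded function at the boundary performs them
  (defn execute! [conn effects]
    (doseq [[op arg] effects]
      (case op
        :insert-user        (sql/insert! conn :users arg)
        :send-welcome-email (send-welcome-email! arg))))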

J. Pablo Fernández

unread,
Aug 3, 2015, 6:15:11 AM8/3/15
to clo...@googlegroups.com
James,

I'm not new to functional programming and I understand the principles and why they are good. I worked in Haskell, Erlang and other Lisps before. Even if only a tiny portion of my codebase deals with the database, I still need a pattern for that part of the codebase.

It is very easy to say "this is dirty, you should not do this" without offering an alternative solution and by alternative solution I mean one that applies to the pattern of making web applications that are database heavy. Think of your traditional CRUD application, you can say "quarantine the code that deals with the database" and yes, you can do that, but it still is 90% of the code.






Colin Yates

unread,
Aug 3, 2015, 6:19:34 AM8/3/15
to clo...@googlegroups.com
I have heard this approach before, but I have never seen how it works in
real life. For example, what about 'selects' - where do they happen?
What about if my updates are not independent (e.g. if the first
update works then do the second update otherwise do this completely
different thing?).

For simple workflows I can see the elegance. For non-trivial workflows
the problem is that _accessing_ the DB and _switching_ on the DB logic
tends to be all through the workflow.

I am genuinely interested in the answers to this as yes, the described
approach has some great benefits.
--
Sent with my mu4e

J. Pablo Fernández

unread,
Aug 3, 2015, 6:20:07 AM8/3/15
to clo...@googlegroups.com
Rob,

The transactions are not global, the transactions are local. Connections are global and there's no way around it within the constraints of a traditional RDBMS. Your idea of making the global connection unavailable for code that's in the context of a transaction would prevent the errors I'm trying to prevent, but you would still be required to pass the connection around, which in a webapp means that the whole chain of function calls, pretty much everything, would have a database connection. That is ugly in some cases, impossible in others.

A piece of code that would be impossible is to enclose each test in a transaction using clojure.test: http://stackoverflow.com/questions/31735423/how-to-pass-a-value-from-a-fixture-to-a-test-with-clojure-test

Furthermore, I don't think this solution gains you any purity; you still have global state and you are hiding it away when starting a transaction. My proposal was to make it work instead of hiding it. They are rather equivalent from the complexity point of view.
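
For reference, that Stack Overflow answer boils down to exactly the dynamic-var trick; roughly (db-spec standing in for whatever connection info you have):

  (def ^:dynamic *test-conn* nil)

  (defn with-rollback [f]
    (jdbc/with-db-transaction [tx db-spec]
      (jdbc/db-set-rollback-only! tx)        ; everything the test does gets rolled back
      (binding [*test-conn* tx]
        (f))))

  (use-fixtures :each with-rollback)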

J. Pablo Fernández

unread,
Aug 3, 2015, 6:23:00 AM8/3/15
to clo...@googlegroups.com
Other than the fact that this approach doesn't reach the level of functional purity that some people want, after playing with it for a while we found it very productive, and it leads to clean/readable code and tests, so we decided to turn it into a library:



You are still in control of the connection so you can do other things with it and have more than one if necessary.


Rob Lally

unread,
Aug 3, 2015, 6:48:05 PM8/3/15
to clo...@googlegroups.com
Hey Pablo,

Sorry, you are completely correct: I accidentally typed transaction when I meant connection. My bad.

I don’t understand what you mean by connections having to be global when dealing with an RDBMS. It is true that you normally want to pool them to conserve resources, and that pooling may be global in nature - but it also may not be. Quality-of-service demands often necessitate multiple pools to the same data-source, created and accessed independently. You also sometimes want to access different databases from the same application; that gets much harder, and your code becomes less general, if you need to reference specific global connection-pools.

I’m also a little confused by your suggestion that it would be impossible to enclose each test in a transaction. The article you point to shows one way. Another way I’ve used often is to declare a var holding a ref and, in a fixture, initialise/close a connection around the test. The test function then dereferences the var holding the connection, passing it into the function under test. This does make it impossible to run tests in parallel, but that’s not something I’ve ever tried. Creating tests that access shared resources is a little bit uglier than in an OO language, where you could more easily reference an instance variable, but it isn’t much worse, and a little macro pixie-dust can make it all go away.

I will say that it has been my experience that you don’t find yourself passing database connection pools around everywhere. As your code grows you naturally extract components/protocols that wrap all that up. Your ring-handler doesn’t need a connection; it needs some sort of data-access component. That component needs a connection/connection pool, but that can be passed in at construction time. I have feared the creation of functions that need an ever-increasing number of parameters to be passed on, but it isn’t something I find happens in practice. Of course, YMMV.
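
Concretely, something like this (names invented; UserDatabase is the protocol James mentioned earlier):

  (defrecord UserStore [pool]                ; the pool is injected at construction time
    UserDatabase
    (create-user [_ user]
      (sql/insert! pool :users user)))

  ;; the ring handler is built against the component, never the raw pool:
  (defn make-handler [user-store]
    (POST "/users" [& user]
      (create-user user-store user)))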

I would respectfully suggest that the solutions are not the same. My way has no global state and functional purity-lite (those connections are rarely idempotent), which are obviously only theoretical benefits. But it does work when I do it; I like my code, I don’t have problems working with it, and I genuinely don’t face any of the challenges you’re struggling with.



R.

Dmitri

unread,
Aug 3, 2015, 9:19:19 PM8/3/15
to Clojure
While I generally agree that purity is something to strive for, I think it's also important to consider the problem that's being solved in each particular scenario. While creating stateless database query functions and passing the connection around explicitly achieves purity, it's not entirely clear what practical problem that's addressing. The practical disadvantages, however, are that you end up with an additional variable being passed around and more opportunities for user error, as is the case with transactions. While there is mental overhead in having to understand that the functions use an implicit connection, there's conversely additional mental overhead in having to remember which connection to use when with the explicit approach.

In some applications it might be valuable to pass the database around explicitly. Yet many of the applications in the wild tend to deal with a single database instance, where a connection is created when the application starts and retained for the lifecycle of the application. Creating additional hoops for the user to jump through for a purely theoretical benefit seems unwarranted in this scenario.

Rob Lally

unread,
Aug 3, 2015, 10:16:28 PM8/3/15
to clo...@googlegroups.com
Everything you say is true. And programming is always about picking the right trade-offs at the right time, making this sort of conversation one that can never come to a *right* answer.

But… in this case Pablo was experiencing concrete problems: those problems stemmed from how easy it was to make mistakes having both a global connection pool and the option to pass around specific connections which had transactions open. Having two conflicting ways of doing the same thing was causing problems. Eliminating at least one of them is a solution. Pablo felt that he might want to eliminate both and add a new mechanism. A number of people (including me) felt that Pablo’s proposed solution would, potentially, lead to new problems and that eliminating only one of the original mechanisms would be sufficient to solve his problems.

In a very real sense, the pragmatic solution and the most “pure” solution seem - to me at least - to be the same.


R.


Dmitri

unread,
Aug 3, 2015, 11:21:09 PM8/3/15
to Clojure
My understanding is that the problem is actually caused by the stateless nature of the functions. Since the function accepts the connection as a parameter it's up to the user of the function to ensure that it's passed the correct connection. Every functional solution presented in this thread suffers from this same fundamental problem that the function is not aware of the context it's being run in.

J. Pablo Fernández

unread,
Aug 4, 2015, 4:47:49 AM8/4/15
to clo...@googlegroups.com

On 3 August 2015 at 23:47, Rob Lally <rob....@gmail.com> wrote:
I’m also a little confused by your suggestion that it would be impossible to enclose each test in a transaction. The article you point to shows one way.

It shows how to pass data to the test by using a dynamic var, which is what we are using to track the connection, and the connection is what I'm interested in passing in to the test. If my solution of using a dynamic var to track the connection is dirty and unacceptable, so is that one. The only difference I can think of is that using a dynamic var only for tests dirties only the tests, whereas using a dynamic var to track connections is consistent across the code-base.

James Gatannah

unread,
Aug 5, 2015, 6:27:25 AM8/5/15
to Clojure


On Monday, August 3, 2015 at 5:15:11 AM UTC-5, J. Pablo Fernández wrote:
James,

I'm not new to functional programming and I understand the principles and why they are good.

This isn't even really about functional programming. This gets into fundamental architectural decisions that are at the core of pretty much every code base I've ever seen.

I can't tell you how to architect your system. Yours may be different than every other massive system on the planet.

 
I worked in Haskell, Erlang and other Lisps before. Even if only a tiny portion of my codebase deals with the database, I still need a pattern for that part of the codebase.

It is very easy to say "this is dirty, you should not do this" without offering an alternative solution and by alternative solution I mean one that applies to the pattern of making web applications that are database heavy.

I'm sorry if my link/real answer at the bottom wasn't clear enough:

In general, one part of your code should come up with data that describes the various side-effects that need to happen.

As needed (and this shouldn't happen often), that data structure gets funneled through some kind of gate keeper to the part that actually has access to "the" database.


 
Think of your traditional CRUD application, you can say "quarantine the code that deals with the database" and yes, you can do that, but it still is 90% of the code.

Your original question was about scaling up to 200 KLOC, which is massive even for a legacy code base that started in (and is still based around) COBOL back in the 60s.
I'm really curious what you're planning that involves that much Clojure code.

A lot of patterns that work and seem to make sense in a 200 line toy project don't scale. This kind of issue is precisely how we wound up with things like SOAP and
WSDL. The web server presents the pretty UI and tells the back-end servers enough about what's going on so they can do the real work.

If you're planning on writing 180 KLOC that touch the database directly, you're probably doing it wrong.

I have taken the sort of approach that you're describing. It's easy. It's fast. It's dirty and dangerous. It's almost always wrong, but there are
times when you have to make that trade-off. When I have, we've always planned for it as a stop-gap to try to deal with ridiculous deadlines imposed by
clueless managers. We've always regretted it, and the company didn't last very long after we hit that point.

Respectfully,
James

James Gatannah

unread,
Aug 5, 2015, 7:52:16 AM8/5/15
to Clojure


On Monday, August 3, 2015 at 5:19:34 AM UTC-5, Colin Yates wrote:
I have heard this approach before, but I have never seen how it works in
real life.

Interesting. I don't think I've ever worked anywhere that didn't take this
kind of approach. Even if we weren't doing it formally. 

 
For example, what about 'selects' - where do they happen?

They're just part of the overall transaction description.

It's [thankfully] been a long time since I worked on a project that worked in raw strings
to build SQL. Even working at that level, the SELECT pieces were just
a subquery in the main query we were building.

The web server(s) would pre-process the incoming request, hand that
to the middle-tier layers that did this sort of business logic, those would pass
the data description (in this case raw SQL, in others something like XML-RPC)
off to the servers that actually have pools of database connections.

Even in companies that only needed one virtual host for their tiny little web
server and couldn't be bothered to break it into multiple processes, we split
this out into multiple logical layers.
 
What about if my updates are not independent (e.g. if the first
update works then do the second update otherwise do this completely
different thing?).

We're working in lisp. This sort of thing is far easier than it is for the
poor schmucks who are doing exactly this sort of thing in, say, C# with
LINQ or building/parsing/translating XML using python.


For simple workflows I can see the elegance. For non-trivial workflows
the problem is that _accessing_ the DB and _switching_ on the DB logic
tends to be all through the workflow.

I'll argue that this sort of isolation gets more important when your system
gets more complex. You don't build a modern automatic transmission with
sloppier tolerances that were used for a manual 50 years ago. You spend
more time engineering the Eiffel Tower than you would a back-yard shed.
You don't skimp on O-rings when you're building a space shuttle.

 
I am genuinely interested in the answers to this as yes, the described
approach has some great benefits.

Let's see. Specific example...

This comes from a startup that was bought for IP around 2001. I think it's
safe to describe what I remember of their general architecture, since it seems
to be extinct.

Big picture was warehouses where workers are loading up pallets of goods to
be shipped to retail stores.

End users wore computers that had headsets and mics. Input was by
speech recognition, output through text-to-speech. The wearables didn't have
enough horse-power to actually do the speech recognition, so compressed .wav
files were transmitted to the server over wifi.

We probably should have been using a web interface, but most web apps were
written in perl. This was a serious technology shop using C++ (well, MFC).

We had a ton of different modules/classes that covered different possibilities for
any given scenario at various customer sites. They got initialized and wired
together by rules defined in an .ini file. (That should have come from a database,
but that's a different topic).

Most of our database interaction happened when a worker logged in: we cached
the various phrases that would be used throughout her shift.

And we'd select a list of work. As each work item was completed, the front-end
code would notify our database layer. I don't have any idea how that notification
worked. It went through some library routine that had a database connection
buried far away from any front-end code.

"Our" database was really just a cache for interaction phrases in various languages
and the work queues.

Periodically, some job would upload our work status reports and download new
pending work queues to/from the "real" database. IIRC, this interaction happened
over FTP. We absolutely did not have (or want!) handles to work directly with
whatever ancient server that was their authoritative source of all knowledge.

Our portion of the system was ~70-80 KLOC. Since it was C++ (and most of
that was boilerplate), it was *far* simpler (in terms of business logic)
than the 200 KLOC of clojure that we're talking about. Although the big iron
side may have pushed it into that realm.

You obviously aren't going to be able to predict every possible interaction in
the next few minutes. Much less the next 4 hours. But you should be able to predict
most of what might happen for any given web request.

Life changes when you're working on a highly interactive app like, say, an IDE
or a photo editor. At my job today we have issues with multiple users adding
and removing the same artifact to a document at something resembling the same
time.

That problem has a lot to do with the fact that it's a distributed system with at
least (I haven't dug into the tier beneath the web server) 4 layers between the
end-users and the actual database connections. I'm not sure it would be possible
to get rid of more than one of those layers to get us closer to the database.

I'm absolutely positive that it would not be desirable.

My "database connection" consists of data structures that I put onto and read
from a message queue.

This system is about as far from functional purity as you can possibly get.

It's quite possible that the database guys at the very bottom of the stack are
using exactly the kind of global connections that were originally suggested.

The difference is that, even if they are (and I certainly hope not!), I can't touch
them. When (and it's never "if") a hacker gets control over the front-end web
server, that's just the first layer of defense.

Regards,
James

James Gatannah

unread,
Aug 5, 2015, 8:19:38 AM8/5/15
to Clojure


On Monday, August 3, 2015 at 10:21:09 PM UTC-5, Dmitri wrote:
My understanding is that the problem is actually caused by the stateless nature of the functions.

You're talking about database interactions. By definition, those are not stateless. The trick is to isolate this statefulness as much as possible. Expanding it just compounds the problem. 

 
Since the function accepts the connection as a parameter it's up to the user of the function to ensure that it's passed the correct connection.

That depends on what you mean by "user of the function." If I'm working on the web server front-end, I shouldn't have any idea about the database connection. I should describe what I
want to happen. If I have any concept that multiple databases are involved (I shouldn't, but abstractions leak), then, yes, I have to specify which one I mean. This isn't
complicated.

It's up to the person writing the database back-end code (also possibly me, if we're talking about a start-up, but then we aren't talking about the 200 KLOC scenario) to turn that
description into the side-effects.

 
Every functional solution presented in this thread suffers from this same fundamental problem that the function is not aware of the context it's being run in.

That isn't a problem: it's a feature.

It doesn't matter what language or programming paradigm you're using in this context. This cuts across problem domains and architectures.

The actual database interaction code should be as brain-dead simple (not "easy") and bullet-proof as you can possibly make it. And it should be as
isolated from the front-end code as you can get away with making it.

That isn't a "right" answer. But it's a good rule of thumb. And you should have very hefty reservations (and very good reasons) about violating it.

Respectfully,
James

Colin Yates

unread,
Aug 5, 2015, 8:43:29 AM8/5/15
to clo...@googlegroups.com
I think we are talking at cross purposes - I thought you were talking
about delaying the mutation until the 'main' logic has finished, whereby
the main logic would push the mutation into a queue which is then
executed later. Something like queueing up a bunch of commands to be
executed later.

If you are simply talking about isolating/abstracting database logic
(but not the _timeliness_ of when that logic is executed) from the
mainline code (the Repository pattern, the DataAccess pattern), then
yes, of course, I think we are all on the same page.

The original point was do you/don't you pass in the DB as an explicit
arg. I still haven't seen a recommendation for handling protocols whose
implementation is persistence agnostic.

I also think you might be making a few unwarranted assumptions ;-) - who
mentioned building up SQL in raw strings?! for example.

Dmitri

unread,
Aug 5, 2015, 9:03:56 AM8/5/15
to Clojure
What I'm talking about is whether it's a better pattern to leave a repetitive and error prone task to the user or encapsulate it in a single place. The whole discussion boils down to the following options.

The first option is that we keep functions pure and the connection is passed as a parameter by the person writing the code (the user). With this approach the burden is squarely on the user to remember to pass the correct connection to the function. This becomes error prone for things like transactions where the developer has to pass the transaction connection as opposed to the normal database connection. The worst part in this scenario is that the code will run except it won't run transactionally. This is a bug that is difficult to catch as it only surfaces in cases where the transaction should be rolled back.

The alternative is to encapsulate the database connection management in the initialization logic in the namespace managing the connection. This way the query functions can be context aware and ensure that the correct connection is used automatically.

I simply disagree that leaving repetitive and error prone tasks to the developer as opposed to encapsulating them centrally is a better pattern. I certainly don't see that as a feature.

James Reeves

unread,
Aug 5, 2015, 11:11:19 AM8/5/15
to clo...@googlegroups.com
On 5 August 2015 at 14:03, Dmitri <dmitri....@gmail.com> wrote:
What I'm talking about is whether it's a better pattern to leave a repetitive and error prone task to the user or encapsulate it in a single place. The whole discussion boils down to the following options.

The first option is that we keep functions pure and the connection is passed as a parameter by the person writing the code (the user). With this approach the burden is squarely on the user to remember to pass the correct connection to the function. This becomes error prone for things like transactions where the developer has to pass the transaction connection as opposed to the normal database connection. The worst part in this scenario is that the code will run except it won't run transactionally. This is a bug that is difficult to catch as it only surfaces in cases where the transaction should be rolled back.

It's worth pointing out that you don't need to use dynamic or global vars to avoid this scenario. You could just remove the original database connection from scope:

  (defn foobar* [tx]
    (foo tx)
    (bar tx))

  (defn foobar [db]
    (sql/with-db-transaction [tx db] (foobar* tx)))

Or shadow the original binding:

  (defn foobar [db]
    (sql/with-db-transaction [db db]
      (foo db)
      (bar db)))

Ideally you also want a way of testing this behaviour, even with dynamic or global scope. If it's critical to your application that database operations run in a transaction, you should have a way of verifying that.

For instance, you might use a protocol to factor out the operations on the database:

  (defprotocol Database
    (wrap-transaction [db f])
    (foo [db])
    (bar [db]))

  (defn foobar [db]
    (wrap-transaction db
      (fn [tx]
        (foo tx)
        (bar tx))))

This allows tests to be written to verify that foo and bar are called within wrap-transaction, and to verify our production implementation of the protocol correctly wraps the function f in a SQL transaction.
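
For example, a test double can record the order of calls (a sketch, using clojure.test):

  (defrecord RecordingDatabase [calls]
    Database
    (wrap-transaction [db f]
      (swap! calls conj :begin-tx)
      (let [result (f db)]
        (swap! calls conj :end-tx)
        result))
    (foo [db] (swap! calls conj :foo))
    (bar [db] (swap! calls conj :bar)))

  (deftest foobar-runs-inside-a-transaction
    (let [calls (atom [])]
      (foobar (->RecordingDatabase calls))
      (is (= [:begin-tx :foo :bar :end-tx] @calls))))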

If you're writing something that depends upon behaviour that can't be verified, you're going to run into problems no matter how you structure your application.
 
The alternative is to encapsulate the database connection management in the initialization logic in the namespace managing the connection. This way the query functions can be context aware and ensure that the correct connection is used automatically.

Do you mean storing the database connection in a global var, not just a dynamically scoped one?

In such a case, how do you run tests without wiping the development database? Do you run the tests in a separate process, or shut down the dev server before running tests, or do you not mind if your development database is cleared by your tests?

- James

Dmitri

unread,
Aug 5, 2015, 1:04:05 PM8/5/15
to Clojure, ja...@booleanknot.com
I agree that wrapping the functions is a sensible approach. Using wrap-transaction is precisely how I ended up doing it with the conman yesql wrapper https://github.com/luminus-framework/conman

The approach I took there is to have the generated functions use the connection atom, and have with-transaction rebind it to the transactional connection within its scope. However, the functions also accept an explicit connection, and with-transaction also provides explicit access to the transactional connection:

(with-transaction [t-conn conn]
  (jdbc/db-set-rollback-only! t-conn)
  (create-user!
    {:id         "foo"
     :first_name "Sam"
     :last_name  "Smith"
     :email      "sam....@example.com"})
  (get-user {:id "foo"}))



This approach works for testing, since the connection can be passed explicitly, but I would argue that in practice it's better to use a separate test database instead of the dev instance.

Colin Yates

unread,
Aug 5, 2015, 1:25:38 PM8/5/15
to clo...@googlegroups.com, ja...@booleanknot.com
This.

I would repeat with as strong an emphasis as possible the sanity
of using a dedicated test database - how else can you test schema
creation for example...


James Reeves

unread,
Aug 5, 2015, 2:34:31 PM8/5/15
to Dmitri, Clojure
On 5 August 2015 at 18:04, Dmitri <dmitri....@gmail.com> wrote:
I agree that wrapping the functions is a sensible approach. Using wrap-transaction is precisely how I ended up doing it with the conman yesql wrapper https://github.com/luminus-framework/conman

The approach I took there is to have the generated functions use the connection atom, and have with-transaction rebind it to the transactional connection within its scope. However, the functions also accept an explicit connection, and with-transaction also provides explicit access to the transactional connection

So when you're testing, presumably you use a dynamic binding to override the global connection to the test database?

- James

Dmitri

unread,
Aug 5, 2015, 3:45:08 PM8/5/15
to Clojure, dmitri....@gmail.com, ja...@booleanknot.com
There are a number of options here, depending on your workflow. You could override the dynamic var, create a separate connection for tests and pass it around explicitly in test code, or get the connection from the environment. The last option is what I tend to do: the profiles.clj will contain separate URLs for the testing and dev databases:

{:profiles/dev  {:env {:database-url "jdbc:postgresql://localhost/myapp_dev?user=db_user_name_here&password=db_user_password_here"}}
 :profiles/test {:env {:database-url "jdbc:postgresql://localhost/myapp_test?user=db_user_name_here&password=db_user_password_here"}}}

The tests run in the test profile and so get the test database connection from the environment.
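
The connection code then just reads the URL from the environment; roughly (using environ, with the namespace details purely illustrative):

  (ns myapp.db
    (:require [environ.core :refer [env]]
              [clojure.java.jdbc :as jdbc]))

  (def db-spec {:connection-uri (env :database-url)})

  (defn get-users []
    (jdbc/query db-spec ["SELECT * FROM users"]))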

J. Pablo Fernández

unread,
Aug 6, 2015, 12:56:49 PM8/6/15
to clo...@googlegroups.com, Dmitri

On 5 August 2015 at 19:33, James Reeves <ja...@booleanknot.com> wrote:
So when you're testing, presumably you use a dynamic binding to override the global connection to the test database?

The wrap transaction always overrides the dynamic binding, whether it's in tests or not.

Eric Normand

unread,
Aug 6, 2015, 11:58:50 PM8/6/15
to Clojure, dmitri....@gmail.com
Hi Pablo!!!!

I think the reason you've got so many responses is because most people share the same problem. I do think this is a fruitful area of discussion. There are multiple ways to go about it.

Based on your original post, I'll share my two cents:

In my experience, mixing an atom and a dynamic var like this is unnecessary and brings complexity that is not worth it. It would be better to separate the two. That is, use the atom, which allows you to change the state (to reconnect, etc.), but also use a dynamic var so that the dynamic scope can be changed. That means that you may want to define a macro (but you can also do without):

(def current-connection (atom nil))
(def ^:dynamic db nil)

;; somewhere later
(binding [db @current-connection]
  (do-some-db-operations))

Transactions could re-bind `db` in their dynamic scope.

Rebinding a dynamic var to a new atom is a little weird; it mixes mutation with dynamic scope, and I'm sure there are weird edge cases that are hard to reason about. This way unifies the operations (even the first db operation needs a binding) and separates them from the mutation, which appears to be necessary in your code.

The other thing is that you can make a nice Ring middleware like this:

(defn wrap-db [handler]
  (fn [req]
    (binding [db @current-connection]
      (handler req))))

The connection it gets at the beginning will be with it throughout, so you won't ever have a weird case where the connection atom is changed in the middle.

Thanks
Eric