Thoughts and opinions about TDD and database-related parts of an application

Boas Enkler

Jul 1, 2013, 5:56:53 AM

to clean-code...@googlegroups.com

Hi

I love TDD and I use it whenever possible.

Now I'm thinking about the design for an application. I want to separate the logic from the data access as far as possible.

For this I have different components, some handling logic and some handling database access.

Now, how do you design the data access to maximize the testability of the code? (Ignoring integration tests for the moment.)

Very often I see the repository pattern, which seems to be quite OK.

But there are also often methods like "GetRemindersForDate" or "InvoicesByCustomer" which are mostly bound directly to the data source. So this is not so nice for unit testing, and it is also some kind of logic, isn't it?

Or is it just some kind of mapping from the DB structure to a state (all invoices for customer X have the given ID in field Y)?

The other way could be that the repositories have only generic data access methods like Add and Remove, or something like the IQueryable implementation in .NET.

But then would repositories really be needed anymore? Or would this mean that it would be better to just have a generic DataAccess for each database technology?

So how do you design data access for testability?

I'd love to hear about your real-life experiences and best practices regarding TDD and data access.

Sebastian Gozin

Jul 3, 2013, 5:02:41 AM

to clean-code...@googlegroups.com

I think you want your data access abstraction to provide a way to express arbitrary queries for data which you can translate into the appropriate database access calls when integrating.
So to use your example of GetRemindersForDate you could have something like this:

repository.list([
    date: [
        eq: now()
    ]
])

Which you could then transform into the SQL SELECT * FROM reminders WHERE date = ?
Or, if you are using an in-memory store, reminders.findAll { it.date == expectedDate }

The same goes for the InvoicesByCustomer query:

repository.list([
    customer: [
        eq: customerId
    ]
])
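
As a rough illustration of the translation step, here is a minimal Python sketch (Python is used only for brevity; the snippets above are Groovy-style). The names `to_sql`, `matches`, and `InMemoryRepository` are hypothetical, and so are the `lt` and `gt` operators; only `eq` appears in the examples above. It shows one possible shape for such a DSL, not a complete one.

```python
import operator

# Hypothetical operator tables: only "eq" comes from the thread, the rest is illustrative.
_SQL_OPS = {"eq": "=", "lt": "<", "gt": ">"}
_PY_OPS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


def to_sql(table, criteria):
    """Translate {"field": {"op": value}} into a parameterized SELECT.

    Table and field names are interpolated directly, so they must come
    from trusted code, never from user input. Values go through "?".
    """
    clauses, params = [], []
    for field, conditions in criteria.items():
        for op, value in conditions.items():
            clauses.append(f"{field} {_SQL_OPS[op]} ?")
            params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def matches(record, criteria):
    """Evaluate the same criteria against an in-memory record (a dict)."""
    return all(
        _PY_OPS[op](record[field], value)
        for field, conditions in criteria.items()
        for op, value in conditions.items()
    )


class InMemoryRepository:
    """In-memory store that answers the same criteria without any database."""

    def __init__(self, records):
        self._records = tuple(records)

    def list(self, criteria):
        return [r for r in self._records if matches(r, criteria)]
```

The business code only builds the criteria; which of the two back ends interprets them is decided when the application is wired together.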

I find this is not unlike attempts made in libraries such as Hibernate, JPA, and GORM, with the exception that all of these are tightly coupled to their respective frameworks and therefore nearly impossible to use without an actual SQL database. (GORM does make a reasonable attempt to support unit testing, but not to remove the dependency on Grails.)

So I tend to come up with my own DSL for expressing queries such as the ones above. I don't claim it is feature complete, but I can build the features I need as I go.

Robert Snyder

Jul 3, 2013, 9:21:52 AM

to clean-code...@googlegroups.com

I believe Uncle Bob had a good write-up on this topic, which might get you going in the right direction, in a book he co-authored called Agile Principles, Patterns, and Practices in C#. From what I remember, it is almost exactly as Sebastian Gozin says. You make an abstract class that has all those funny methods in it: "GetRemindersForDate" or "InvoicesByCustomer". Then you make classes that inherit from this class and implement it accordingly. This way you can make a SQL one, or a memory one, or one that uses hash tables, etc. This will allow you to unit test. Just remember to start with all the basics first in your tests, then work your way down to all the nitty-gritty stuff.
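
A minimal Python sketch of that shape, assuming a single query method. The class and method names are illustrative and are not taken from the book; a SQL-backed subclass would implement the same two methods.

```python
from abc import ABC, abstractmethod
from datetime import date


class ReminderRepository(ABC):
    """Abstract repository: business code depends only on this class."""

    @abstractmethod
    def add(self, text, day):
        ...

    @abstractmethod
    def get_reminders_for_date(self, day):
        ...


class InMemoryReminderRepository(ReminderRepository):
    """Memory-backed implementation, suitable for fast unit tests."""

    def __init__(self):
        self._reminders = []

    def add(self, text, day):
        self._reminders.append({"text": text, "date": day})

    def get_reminders_for_date(self, day):
        return [r for r in self._reminders if r["date"] == day]
```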

Uncle Bob

Jul 4, 2013, 9:32:46 AM

to clean-code...@googlegroups.com

I prefer methods like GetRemindersForDate. Indeed, I want all queries to be represented by methods in the repository interfaces. This makes unit testing the business rules very easy, since the mocks that implement the repository interfaces can return exactly what the business rules expect. It's also pretty easy to unit test the repository, because you mock out the lower-level data access layer and just make sure that calls to the query functions generate the appropriate SQL (if you are using a SQL database). In the best of all cases, the unit tests never quite get to the database itself.
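
Both kinds of test can be sketched in Python with the standard `unittest.mock` module. Everything named here is an assumption made for illustration: `SqlReminderRepository`, the `query(sql, params)` method of the lower-level data access object, and the stand-in business rule `count_reminders_due`.

```python
from unittest.mock import Mock


class SqlReminderRepository:
    """Repository implementation that delegates to a lower-level data access object."""

    def __init__(self, data_access):
        # Hypothetical collaborator exposing query(sql, params).
        self._data_access = data_access

    def get_reminders_for_date(self, day):
        return self._data_access.query(
            "SELECT * FROM reminders WHERE date = ?", [day]
        )


def count_reminders_due(repository, day):
    """Stand-in business rule: it depends only on the repository interface."""
    return len(repository.get_reminders_for_date(day))


# 1) Business rule test: the mock repository returns exactly what the rule expects.
stub_repository = Mock()
stub_repository.get_reminders_for_date.return_value = ["pay invoice", "call Bob"]
due_count = count_reminders_due(stub_repository, "2013-07-04")

# 2) Repository test: the data access layer is mocked, so no database is touched.
#    The assertion checks that the query method generates the expected SQL.
data_access = Mock()
SqlReminderRepository(data_access).get_reminders_for_date("2013-07-04")
data_access.query.assert_called_once_with(
    "SELECT * FROM reminders WHERE date = ?", ["2013-07-04"]
)
```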

Boas Enkler

Jul 6, 2013, 7:55:28 AM

to clean-code...@googlegroups.com

Thanks for your reply.
Uncle Bob, your point of view is quite like mine.

But there are some points I don't feel sure about and want to mention. Perhaps you can give me some orientation on whether I'm right?

The first thing is that these methods tend to hide some kind of logic. But I assume that it is only "mapping logic" for the database.

For example, if you want to show popup windows for:
- new unread entries,
- entities with a past reminder,
- and due entities

Now these could be entries in a table "tasks" with different flag fields (e.g. IsNew, ReminderDate, DueDate).

In the worst case these could also be placed in different tables.

Then would you make one "GetPopUpRelatedItems" method, or three methods and call them one by one?
The second possibility seems cleaner to me, but more expensive in terms of performance.
Also, what should be done if they are placed in different tables?

I think a repository should not be tied to the current database structure, but rather to semantic groups of items, right?

But perhaps (be careful with optimizations) that is not the point.

Another problem I see is technology-related.

For example, in .NET one main technology for accessing data is the Entity Framework. But it has a lot of state-based functions. For example, entities read from the database are bound to a context and cannot be inserted into another database context (even if it is on the same database). And entities for which an update statement should be generated have to be bound to a context (preferably the one they were read from).

So there has to be logic to handle binding and unbinding entities and so on.

I tend not to use this technology because it brings too much complexity (I'm also not sure that I need the benefits, like complex transaction handling, complex caching strategies and so on).
But I fear the moment I could need them. (Yeah, I know: YAGNI ;-) and defer decisions.)

But this kind of change would end up in changes that hit the repositories, their structure, and perhaps also the components that use the repositories. Sure, I have my tests, but I don't know how big the changes would be when changing the kind of database. I've seen some projects where some components had to be aware of the state of an entity (is it a new one, was it read and is it tied to the database, and so on...).
The code was new to me and I couldn't see whether this was a symptom of brownfield or whether these things were really needed to get that database stuff working correctly.

Document-based databases look much cleaner and easier to use.

Especially in this case, and looking at Episode 10 of architectures, it is hard to defer the database decision.
My project isn't very big, so it is a risk I can take. But I'm not sure how to handle this issue in big projects.

Could you give me some advice from your experience?

Uncle Bob

Jul 11, 2013, 5:29:36 PM

to clean-code...@googlegroups.com

Part of the problem of separating the repositories from database implementation is that there are efficiencies that you can lose. Your example of the three pulldown lists is a good one. On the repository side, you'd like to have three different methods. On the implementation side you'd like to use one query.

One way to resolve this is to allow the implementation to make assumptions about what the repository side is doing. The three repository methods could all use the same query; and could memoize the result so that the query is only invoked once. Those kinds of optimizations can be inserted later if they are needed.
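
A minimal Python sketch of that memoization, under several assumptions: the single query is represented by a hypothetical callable `fetch_popup_rows`, rows are plain dicts with `is_new`, `reminder_date`, and `due_date` keys, "due" is taken to mean a due date on or before today, and one repository instance lives for one request so the cached rows do not go stale.

```python
from datetime import date


class TaskRepository:
    """Three intent-revealing query methods backed by one memoized query."""

    def __init__(self, fetch_popup_rows, today):
        self._fetch_popup_rows = fetch_popup_rows  # hypothetical: runs the one real query
        self._today = today
        self._rows = None  # memoized result of the single query

    def _popup_rows(self):
        # The underlying query is invoked on first use only.
        if self._rows is None:
            self._rows = self._fetch_popup_rows()
        return self._rows

    def new_unread_tasks(self):
        return [t for t in self._popup_rows() if t["is_new"]]

    def tasks_with_past_reminder(self):
        return [
            t for t in self._popup_rows()
            if t["reminder_date"] is not None and t["reminder_date"] < self._today
        ]

    def due_tasks(self):
        return [
            t for t in self._popup_rows()
            if t["due_date"] is not None and t["due_date"] <= self._today
        ]
```

Callers still see three separate methods; whether one query or three sit behind them remains an implementation detail that can change later.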

Andreas Schaefer

Nov 13, 2013, 3:49:26 PM

to clean-code...@googlegroups.com

The discussion so far covers only read and query operations, and mostly concerns repository/gateway implementations that hold no business rules. (I mean implementations, not interfaces: the repository interfaces belong to the "core" entity and interactor "business rule" components and only dictate the methods to be implemented by the repository plug-ins.) So I'm wondering whether the method design mentioned above, a larger number of fine-grained interface methods that contain business rules in some way (in these read scenarios, specialized "hardcoded" filter criteria expressed through the method names), should be used for update operations as well.

To put it more clearly: should the business rules define repository interface methods like "setDueDate(someDate)" on a TaskRepository?

I'm asking because I'm having a debate with a colleague who would prefer it that way. As far as I understand Ivar, Bob, the DDD community and many others, I would almost always deal with entities when it comes to updates. In the above example I could certainly have a TaskInteractor method "setDueDate", but it would ...
1) request the task entity from the repository
2) validate the new DueDate
3) set the DueDate in the task entity object
4) give the task entity over to a general "save(taskEntity)" method of the taskRepository (which does no validations)

This approach also provides a single point of mutability: the entity, or the aggregate root if you prefer a more DDD-oriented point of view.
And if a validation rule needs other properties of the entity to validate against, you already have the entity at hand.
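
The four steps can be sketched in Python as follows. This is only an illustration: the `Task` fields, the in-memory repository, and the validation rule (no due dates in the past) are assumptions, not something stated in the thread.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass
class Task:
    task_id: int
    subject: str
    due_date: Optional[date] = None


class InMemoryTaskRepository:
    """Generic get/save only; the repository performs no validation."""

    def __init__(self):
        self._tasks = {}

    def get(self, task_id):
        return replace(self._tasks[task_id])  # hand out a copy of the stored entity

    def save(self, task):
        self._tasks[task.task_id] = replace(task)


class TaskInteractor:
    def __init__(self, repository, today):
        self._repository = repository
        self._today = today

    def set_due_date(self, task_id, new_due_date):
        task = self._repository.get(task_id)      # 1) request the entity
        if new_due_date < self._today:            # 2) validate (hypothetical rule)
            raise ValueError("due date must not be in the past")
        task.due_date = new_due_date              # 3) set it on the entity
        self._repository.save(task)               # 4) hand over to the general save
```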

Otherwise the repository itself would potentially need to validate, which would be a violation of the clean architecture, wouldn't it?

Please correct me, but shouldn't repositories always deal with entities anyway, even for read operations? Even the read methods mentioned above would all return entities, or (potentially read-only) "ViewEntities" if you will. But these return types would still be part of the business rules and be defined as entities, even if they are simple(r) value types or read-only views (where you would perhaps do JOINs in the database, or map-reduces, etc.).

I'd love to hear your thoughts on that.

regards
Andreas

Uncle Bob

Nov 14, 2013, 8:38:56 AM

to clean-code...@googlegroups.com

Andreas,

Overall, I think it's better if the repository interfaces deal with whole entities. Setting individual attributes through the repositories couples those interfaces to the internals of the entities. Such coupling should be avoided where possible.

Someone might point out that the query methods of the repository interface already couple to the entity internals. For example, getTasksWithDueDatesAfter(Date). However, this kind of coupling is a lot softer than offering setters on every attribute.

Boas Enkler

Nov 14, 2013, 9:37:00 AM

to clean-code...@googlegroups.com

Thanks for your reply.

When you say

"Overall, I think it's better if the repository interfaces deal with whole entities."

Do you mean getting the business rules entities or some kind of special entities / DTOs for this concern?

For example:

SetAppointmentDueDate(DueDateUpdateModel)

The main thing I'm worried about when using the business rules entities is that I will get more complexity in the interactors, the tests for the interactors, and the gateway implementations.

I've described my thoughts in more detail in another post because I saw your reply too late, sorry. But it would be really great if you could point out what you think about that.

See https://groups.google.com/forum/#!topic/clean-code-discussion/2SYLpxmg3I0

I think having the whole entity as a parameter would result in having more than data layer abstraction in the gateway. The entity always has a lot of properties which should perhaps not be updated.

Ignoring this would result in data collisions and unexpected behavior.

For example, a "Remind me in 5 minutes" could result in losing subject changes other people have made.

A simpler gateway interface would eliminate that, be more expressive, and be easier to test.

I agree with you that having simple types there, for example

- SetDueDate(newDueDate)

smells.

but

- SetDueDate(UpdateAppointmentDueModel)

looks much easier and more readable to me.
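
A minimal Python sketch of such a narrow gateway method, using the standard `sqlite3` module as a stand-in database. The table and column names and the fields of `UpdateAppointmentDueModel` are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateAppointmentDueModel:
    appointment_id: int
    new_due_date: str  # ISO date string, to keep the sketch short


class SqliteAppointmentGateway:
    def __init__(self, connection):
        self._conn = connection

    def set_due_date(self, update):
        # Only the due_date column is written; every other column stays untouched,
        # so a concurrent subject change by someone else is not overwritten.
        self._conn.execute(
            "UPDATE appointments SET due_date = ? WHERE id = ?",
            (update.new_due_date, update.appointment_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appointments (id INTEGER PRIMARY KEY, subject TEXT, due_date TEXT)"
)
conn.execute("INSERT INTO appointments VALUES (1, 'Review', '2013-11-14')")

# Another person changes the subject in the meantime...
conn.execute("UPDATE appointments SET subject = 'Design review' WHERE id = 1")

# ...and the "remind me later" update still touches only the due date.
SqliteAppointmentGateway(conn).set_due_date(UpdateAppointmentDueModel(1, "2013-11-15"))
row = conn.execute("SELECT subject, due_date FROM appointments WHERE id = 1").fetchone()
```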

Looking forward to your reply and to good discussions with Andreas :-)

Andreas Schaefer

Nov 15, 2013, 3:15:42 PM

to clean-code...@googlegroups.com

"Do you mean getting the business rules entities or some kind of special entities / DTOs for this concern?"

Aren't all entities business rules, "special" or not!?

(Remote facade / [web] transport layer) DTOs aren't part of the "inner core" EBI, unless you mean the interactor requests and responses. But in that case the gateways must not have a dependency on them, must they?
So you'd need separate classes to shove the interactor request data into. What would one call them if not entities!?

"I think having the whole entity as a parameter would result in having more than data layer abstraction in the gateway. The entity always has a lot of properties which should perhaps not be updated.
Ignoring this would result in data collisions and unexpected behavior.

For example, a "Remind me in 5 minutes" could result in losing subject changes other people have made."

- do just a field update of the changed values
- don't do "last write wins"; have some kind of concurrency control, maybe optimistic concurrency control (e.g. by leveraging revisions or HTTP ETags, see: http://en.wikipedia.org/wiki/Optimistic_concurrency_control)
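
Both suggestions can be sketched together in Python: an update carries only the changed fields plus the revision the caller last read, and is rejected if the stored revision has moved on. The store and exception names are hypothetical, and an in-memory dict stands in for the database.

```python
class ConcurrentUpdateError(Exception):
    """Raised when the stored revision differs from the one the caller read."""


class InMemoryAppointmentStore:
    def __init__(self):
        self._rows = {}  # appointment id -> (revision, fields dict)

    def insert(self, appointment_id, fields):
        self._rows[appointment_id] = (1, dict(fields))

    def read(self, appointment_id):
        revision, fields = self._rows[appointment_id]
        return revision, dict(fields)

    def update(self, appointment_id, expected_revision, changes):
        revision, fields = self._rows[appointment_id]
        if revision != expected_revision:
            # Someone else wrote in between: refuse instead of "last write wins".
            raise ConcurrentUpdateError(
                f"expected revision {expected_revision}, found {revision}"
            )
        # Field update: only the changed values are merged in.
        self._rows[appointment_id] = (revision + 1, {**fields, **changes})
        return revision + 1
```

A caller whose update is rejected would re-read the appointment, reapply its change on top of the fresh state, and try again.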

Jop van Raaij

Nov 24, 2013, 10:18:35 AM

to clean-code...@googlegroups.com

I'm at the point where I have a memory-based version of the repository interface: fast unit tests to test the use cases! But how do I know things will work when using the real database? Mocking out lower-level data access is not an option, because I do not own that part of the system. The implementation of the repository interface is bound to the external database API.

I currently solve this by creating a test class which runs the interface methods for each implementation. Using a Java parameterized test, I am sure all implementations pass the exact same tests. Thoughts on the following three points are appreciated:

1) In one of your last episodes on advanced TDD you (Uncle Bob) mention that you should not test your interfaces, but you don't say why (or do I misunderstand)?
2) When creating tests in a use case involving a database, I need to think about putting a test in the test class for the repository interface implementation. It is hard to move tests written for use cases to the class testing the interface implementations. Changes to use cases (and their tests) now have to take changes to the interface test into account. Otherwise only my memory-based implementation, tested indirectly by the use cases, is tested. I might forget to update the tests for the interface implementation.
3) The tests in the repository interface test class are not really behavioral tests, but rather test methods for the methods in the repository interface, although the behavior from the use-case test is driving me to write these tests. Maybe it is more the other way around:

- the use-case tests drive me to add a method to the repository interface,
- which drives me to write a test in the repository interface test,
- eventually driving me to implement the real database repository code.

Concluding: it feels tricky to move from tests specifying behavior for use cases to tests guarding interface 'behavior'.
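
For reference, the parameterized approach described above (one set of tests run against every implementation of the repository interface) might look roughly like this in Python, using `unittest` subtests in place of a Java parameterized test. The repository classes are hypothetical, and an in-memory SQLite database stands in for the real external database.

```python
import sqlite3
import unittest


class InMemoryReminderRepository:
    def __init__(self):
        self._rows = []

    def add(self, text, day):
        self._rows.append((text, day))

    def get_reminders_for_date(self, day):
        return [text for text, d in self._rows if d == day]


class SqliteReminderRepository:
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE reminders (text TEXT, date TEXT)")

    def add(self, text, day):
        self._conn.execute("INSERT INTO reminders VALUES (?, ?)", (text, day))

    def get_reminders_for_date(self, day):
        rows = self._conn.execute(
            "SELECT text FROM reminders WHERE date = ? ORDER BY rowid", (day,)
        )
        return [text for (text,) in rows]


# Every implementation listed here must pass exactly the same tests.
IMPLEMENTATIONS = [InMemoryReminderRepository, SqliteReminderRepository]


class ReminderRepositoryContract(unittest.TestCase):
    def test_returns_only_reminders_for_requested_date(self):
        for make_repository in IMPLEMENTATIONS:
            with self.subTest(implementation=make_repository.__name__):
                repository = make_repository()
                repository.add("pay invoice", "2013-11-24")
                repository.add("call Bob", "2013-11-25")
                self.assertEqual(
                    repository.get_reminders_for_date("2013-11-24"), ["pay invoice"]
                )

    def test_returns_empty_list_when_nothing_matches(self):
        for make_repository in IMPLEMENTATIONS:
            with self.subTest(implementation=make_repository.__name__):
                repository = make_repository()
                self.assertEqual(repository.get_reminders_for_date("2013-11-24"), [])
```

Saved as a module, this runs with `python -m unittest`. A new repository method then needs one contract test, which every implementation inherits automatically.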

Another way I can think of to solve this, would be to run a separate integration test, injecting the real database implementation instead of the memory based version. But as I don't write any code without a test forcing me to, I won't have done any implementation for the real database implementation yet.

Which method (would) you use, and why?
