No, basically you only denormalize when it makes sense, and that is usually in one of two scenarios: sharding under some circumstances, and persisting a point-in-time view of data. In those two scenarios it actually makes sense to denormalize, hence it's not a sacrifice (for example, you WANT the product price to be denormalized into the order object so future price changes won't affect past orders).
We have multi-maps and includes to handle 99% of all other cases.
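As a rough illustration of the point-in-time case, here is a minimal sketch in plain C#. The `Product`, `Order` and `OrderLine` classes and their members are assumptions made up for this example, not types from RavenDB or from the thread; the only point is that the order line copies the price when the order is placed, so later price changes leave past orders alone.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical product document; only the fields this sketch needs.
public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class OrderLine
{
    public string ProductId { get; set; }   // reference to the live product document
    public string ProductName { get; set; } // copied when the order is placed
    public decimal UnitPrice { get; set; }  // copied when the order is placed
    public int Quantity { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();

    // Snapshot the product's current name and price into the order.
    public void AddLine(Product product, int quantity)
    {
        Lines.Add(new OrderLine
        {
            ProductId = product.Id,
            ProductName = product.Name,
            UnitPrice = product.Price,
            Quantity = quantity
        });
    }

    // The total is computed from the snapshot, never from the live product.
    public decimal Total()
    {
        decimal total = 0m;
        foreach (var line in Lines)
        {
            total += line.UnitPrice * line.Quantity;
        }
        return total;
    }
}
```

If the product's `Price` is changed after `AddLine` has been called, `Total()` still reflects the price at order time, which is the behaviour described above.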
I'm not sure that technically is denormalization?
You may be storing the same piece of data, but you're actually storing two different pieces of information. For example, "the customer's current address" is not the same information as "the customer's address at the time he placed the order" - even if the data you're persisting is identical in those two cases, the information it conveys is different when you're preserving historical information.

My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.
Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?
I saw a custom key example somewhere, where the user's e-mail address was being used as the primary key, and references to that user would be stored as "users/j...@doe.com", which seems clever at a glance - but what happens if the user changes their e-mail address? Seems like an extremely bad idea, I don't know why such an example would even be cited...
On Thu, Apr 12, 2012 at 1:23 AM, mindplay <ras...@mindplay.dk> wrote:
> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

You are breaking apart the key in your head. That isn't how it works in RavenDB. This is a _single value_. It is structured this way for one major reason: readability. Having it in this fashion makes it easier to work with in your code.
So what do you call a "denormalization"?
> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?
We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.
Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.
> Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.
As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.
On Wednesday, April 11, 2012 6:46:02 PM UTC-4, Itamar Syn-Hershko wrote:
> So what do you call a "denormalization"?

The introduction of any "accidental" state - when you're storing copies of data to increase performance, for example. Ideally, you should never have to do that.

The only time you should have to make copies of data is when copying the data is part of an operation that isn't "accidental" - that is, it satisfies a real requirement, not just something you have to do because the underlying systems suffer from technical limitations that make them unable to handle normalized models with sufficient performance.

>> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.
> Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what are the 2 pieces of redundant information?

The word "categories" is redundant - storing the ID as a string, for that matter, is redundant, if you know every key is going to be a number.

> We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.

But the model you're storing contains relations - so the data you're storing is relational in nature, and "categories/123" is a foreign key to a specific document.

I understand that RavenDB itself is not "relational" in the traditional sense, but clearly a lot of work went into providing means of dealing with relational data. I can think of very, very few applications that would not need extensive relation management. Certainly every example I've seen so far makes use of relations and foreign keys, or "document ids", if that's what you prefer to call them.

If you're going to store a list of category-document ids in a property, storing strings like "categories/123" is denormalization in some sense - you could just as well store the integer "123", since you know this list is going to contain strictly "category" document ids.

> Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.

Well, yes, a complete object-graph is persisted in one shot, but the complete model-graph is not automatically persisted - and you have to design around this when designing your model.

Bear with me:

Where I would normally have IList<Category> you have IList<string> instead - so now the model itself is not directly traversable.

In a sense, this is an indirect means of defining the boundaries of self-contained documents within your model, as a means of explaining to the data-mapper (Raven) which relations cross boundaries between documents, to prevent it from traversing outside the scope of your document-object-graph.

This adds complexity, because you have to go back to the database and fetch another piece of the model-graph when needed - and this complexity is accidental, because this is not a true expression of your model-graph.

This is what I was getting at early on when we connected on Twitter.

>> Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?
> No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.

I understand that you can't perform a query without providing the full document ID - but if you know you're storing only categories in a specific collection, that ID is easily reconstructed by adding "categories/" in front of the number itself, so there isn't technically any reason why you would need to store the whole string, other than convention, is that correct?

> As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

It still seems to me that a lot of your considerations are not business-related, but technical.

And it may be that this actually works better in practice - that it's more realistic and honest to deal with storage-mechanisms as what they really are.

For me personally, as mentioned, working with the AR implementation in Yii was definitely a lot more transparent and productive than NHibernate, which attempts to fully abstract and hide almost every aspect of persistence.

It may be that attempting to run and hide from any aspects of persistence is somewhat delusional.

I am definitely still very interested in RavenDB on account of its simplicity. I'm just trying to gauge whether that simplicity will scale well to a model and requirements as complex as the ones in the application I'm building.
No, actually, that isn't how you would model this.

This is how you are _used_ to modeling this, because you are thinking about how NHibernate does this.

You have to understand that this has been an explicit design choice with RavenDB. I have seen the problems that you get into when you try to go the magic route.

Since you noted the issue with `is` and `GetType()`, and you probably know about the SELECT N+1 issues, you are probably familiar with those problems.

Instead of trying to imagine a world where everything is in memory (the abstraction that NHibernate is trying to create), RavenDB follows the Aggregate model, where there are clear boundaries between different entities. That matches well with the way things actually work, because you can rely on being able to cheaply access anything inside the aggregate, and there is an explicit step that you have to take to access anything that isn't in the aggregate.

Note that RavenDB contains a lot of features, like `Include()` and `Live Projections`, that allow you to easily get the related data, but again, we do that as an explicit step because you _have_ to respect the boundary.

> So now there's an aspect of persistence to this entity, which suggests to me that the intention is to write dedicated DTO's rather than "business"-entities, and persist those?

Nope, it is just that you model your entities in a different way than you would using a relational database.

> but my concern here is not really performance, but transparency.

So was mine when designing this. But instead of pretending that "oh, it doesn't matter, let us deal with this in the OR/M layer", I decided that we need to be transparent about the actual implications of what you are doing.

The end result is a much better application, because you don't have hidden snares waiting for you.

> In an ideal world, I would just write completely persistence-ignorant models, optimizing for the problem-domain of the software itself, without regard for persistence, perhaps other than specifying which properties are persistent or transient.

No, you won't. Because even if we assume that your entire model is in memory, that is _still_ a bad way to design things. You need to think about things like concurrency, you need to think about transaction boundaries, you need to think about how to actually _deal_ with things. What you are saying is valid if you had only one user, only one time. But it falls apart once you start to consider what is actually going on.
Thank you for taking the time to elaborate on this.

I guess I don't see the practical reason why aspects like persistence and transaction-boundaries should affect the OOP design-patterns you choose?

For example, the following models Stores that are closed on particular days:

    public class Closing
    {
        public DateTime ClosedFrom { get; set; }
        public DateTime ClosedTo { get; set; }
    }

    public class Store
    {
        public IList<Closing> Closings { get; set; }
    }

This would be harder to model with a relational database, where this would actually persist in two tables, say, "stores" and "closings". That's a lot of complexity just to store something that is really truly composite - and in a sense, it's "wrong", because a closing-date is unique to a Store, it's not an independent thing that has any meaning outside the context of the Store it belongs to. Yet, with an RDBMS, we're forced to store them as independent units.

Much simpler with RavenDB, where this is stored as one unit, a document. "It just works." :-)

Now let's say that stores are listed in a number of cities.

    public class Store
    {
        public IList<Closing> Closings { get; set; }
        public IList<City> Cities { get; set; }
    }

Now a City does not belong to a Store, and a Store does not belong to a City - they're related of course, but neither has ownership of the other. They are independent units.

Let's be clear about the fact that I didn't choose this design-pattern because I'm thinking about persistence - this is basic, traditional, persistence-ignorant OO.

And now you want me to define the transaction-boundary by changing the model:

    public class Store
    {
        public IList<Closing> Closings { get; set; }
        public IList<string> Cities { get; set; }
    }
My problem with this approach is, you did a lot more than just defining a transaction-boundary for persistence, and it has far-reaching consequences.
Why can't you just declare the document-boundary instead? For example:

    public class Store
    {
        public IList<Closing> Closings { get; set; }

        [Documents]
        public IList<City> Cities { get; set; }
    }

And then let the persistence layer do the work?
There are at least a few common patterns that cover probably 90-95% of common relations in any given model.Why can't we model those cases using declarations instead of code?
> Why can't we model those cases using declarations instead of code?

The way I see it, you want to start writing a lot of complex stuff that will most probably be prone to a lot of bugs just to solve what you consider to be a problem, which is gone in 3 lines of code. I think before we move on with this conversation, you need to explain WHY this actually bugs you, and why you REALLY need that?...
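To make the "explicit step" under discussion concrete, here is a small sketch built on the Store/Closing/City example from earlier in the thread. `InMemoryDocuments`, `Put` and `LoadMany` are hypothetical names invented for this illustration; this is an in-memory stand-in, not the RavenDB client API. It only shows the shape of the idea: closings live inside the Store aggregate, cities are referenced by id, and fetching them is one visible call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Closing
{
    public DateTime ClosedFrom { get; set; }
    public DateTime ClosedTo { get; set; }
}

public class City
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Store
{
    public string Id { get; set; }

    // Inside the aggregate: stored with the Store and always available.
    public List<Closing> Closings { get; set; } = new List<Closing>();

    // Outside the aggregate: only the ids of independent City documents.
    public List<string> CityIds { get; set; } = new List<string>();
}

// Hypothetical in-memory stand-in for a document database.
public class InMemoryDocuments
{
    private readonly Dictionary<string, object> _documents = new Dictionary<string, object>();

    // Counts how often the caller went back to the "database".
    public int RoundTrips { get; private set; }

    public void Put(string id, object document)
    {
        _documents[id] = document;
    }

    // One explicit, batched fetch for all referenced documents.
    public List<T> LoadMany<T>(IEnumerable<string> ids) where T : class
    {
        RoundTrips++;
        return ids.Select(id => (T)_documents[id]).ToList();
    }
}
```

With this shape, reading `store.Closings` never touches the store of documents, while resolving the cities is a single call such as `documents.LoadMany<City>(store.CityIds)`, so the cost of crossing the boundary is visible at the call site.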
On Thursday, April 12, 2012 11:12:48 AM UTC-4, Itamar Syn-Hershko wrote:
>> Why can't we model those cases using declarations instead of code?
> The way I see it, you want to start writing a lot of complex stuff that will most probably be prone to a lot of bugs just to solve what you consider to be a problem, which is gone in 3 lines of code. I think before we move on with this conversation, you need to explain WHY this actually bugs you, and why you REALLY need that?...

3 lines of code here and 3 lines of code there. And every time, code that most likely has nothing to do with the task it's performing.

That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.
There's also a maintenance issue - suppose your model changes, and something that was previously a component is now an independent document, so you change the collection from IList<Category> to IList<string> and this breaks all of your existing business-procedures. Suppose you have 100s of business-procedures that require a list of categories.
In light of this last consideration alone, I'm tempted to consistently add methods for every collection to every model object, essentially duplicating all of my collections:

    class Store
    {
        public IList<string> Categories { get; set; }

        public IEnumerable<Category> GetCategories()
        {
            // fetch and return Categories...
        }
    }
> That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.

Tough luck, persistence IS part of the problem that you are trying to solve, and you have to take it into account.

May I suggest, go and write an idiomatic RavenDB application, then come back and tell us what the experience was like.

Right now, you don't have valid reasons, you have a gut feeling based on experience in a completely different technology and methodology.
On Thursday, April 12, 2012 12:44:20 PM UTC-4, Oren Eini wrote:
>> That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.
> Tough luck, persistence IS part of the problem that you are trying to solve, and you have to take it into account

Of course, but I still feel like it should be handled in isolation and not mixed into business-procedures.

> May I suggest, go and write an idiomatic RavenDB application, then come back and tell us what the experience was like. Right now, you don't have valid reasons, you have gut feeling based on experience in completely different technology and methodology.

That's going to be a hard sell, since I don't understand your methodology - probably the only way I would get to do that is if I had the time to do it on my own dime. I can't really go to management or to my client and try to sell them on an idea I don't understand.

If it works as well as you claim, hopefully it will become popular enough to warrant a book - not just on the software, but on the methodology.

The examples and tutorials available at the moment are all very trivial and show individual features working well in isolation, but I don't feel like there's enough depth to provide a big picture.
My main concern is that this won't scale in terms of complexity. If you could show me a complex app that leverages the features and applies the methodology in practice, perhaps this would be more accessible.
RavenDB runs msnbc.com and pluralsight, among others.
We have several big sample apps; RaccoonBlog is a good example.
As both Oren and I suggested - spend a few days trying to build a PoC for your problem with RavenDB, and ping us here for help whenever you need it. From past experience, it would be MUCH faster to set it up and resolve prospective problems there than to continue down the RDBMS route, especially if your app is as complex as you say it is.
Seeing that the number of trips to the server disturbs you that much, as does the hypothetical problem of converting a List<Category> to List<string>, it is clear that relational modeling is still guiding you. This is completely fine, but if you want to REALLY try RavenDB you need to try and ignore it for a couple of days.
On Sat, Apr 14, 2012 at 6:21 PM, Stijn Volders <stijn....@gmail.com> wrote:
> 1) Maybe you find Vaughn Vernon's essay on Effective Aggregate Design
> helpful (google it, there are 3 pdf's and 2 video's). It was an eye opener
> for me and will help you understand how to model your domain and the AR
> pattern.
thanks, I will definitely take a look at that. I wasn't aware I had
anything left to learn in that area, but I'm starting to think maybe
there's some important pieces I'm still missing...
> 2) the way your current application is designed will cause you trouble, no
> matter what abstraction layer you use. As you mentioned, NHibernate is
> giving you also a lot of pain.
It is - but, for the most part, once you get it working, it actually
just works and you don't have to think much more about it. I'm just
talking about loading/saving/traversing the model here - queries are
and remain a real pain. That's probably the worst part about NH, I
think - HQL isn't as verbose as SQL, and it respects your defined
relations and writes joins for you, but everything above and beyond
that seems to be just as much work and causes just as many problems as
old-school SQL...
(for the most part I actually find SQL to be easier and more
transparent, and it seems that the most common questions on forums etc
revolve around "how to do this SQL query in HQL" or the criteria
API...)
> 3) while not related to your questions, you might want to split up your
> application. Working in such a large monolithic application isn't going to
> bring anything other than friction and pain IMHO
I would if I could - I am not the first or only person to suggest that
to the client, but they insist on very large iterations. Neither I nor
management have been able to convince them of the fact that it's
safer, wiser and more cost-effective, and less disruptive to their
business process to build in smaller iterations and introduce the
software progressively...
Thanks for the discussion, I think I learned a lot.

I'm working on a really simple example to get the hang of it. What I want to do is create a website/mobile app on top of MVC Web API.

Say I want to model a FAQ list:

    public class Faq
    {
        public string Id { get; set; }
        public string Question { get; set; }
        public string Answer { get; set; }
        public List<Category> Categories { get; set; }
    }

    public class Category
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public int Order { get; set; } // need order when presenting for example list of categories
    }

I started first with List<string> CategoryIds, but from the above discussion I learned not to do that.
OK, I still have the question: what if we rename the category? Should we go through all Faqs containing the category? What happens in a cloud-hosted scenario when you have to update a lot of Faqs? If each request costs money, we would have an O(n) approach?
I also see advantages: I assume full-text search can index all the info in your document, so you get more meaningful results instead of searching over documents having references!

Now I want to put an MVC Web API on top. I will get requests like:

GET REQUEST: http://myserver/API/Faqs/123

Will I return the same FAQ class used for persistence, or create a Web API "view model" class, and translate?
I don't want the user of my API to get a category Id like categories/123; she should only see 123, so a new GET REQUEST http://myserver/API/Categories/123 can be done.
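One possible way to keep full document ids in storage while exposing only the short form in URLs is a small translation helper at the API boundary. The sketch below is plain C#; `CategoryIds`, `ToPublic` and `ToDocumentId` are hypothetical names made up for this example and are not part of RavenDB or ASP.NET Web API. It assumes the "categories/" prefix convention discussed above.

```csharp
using System;

// Hypothetical helper that converts between stored document ids
// ("categories/123") and the short ids shown to API users ("123").
public static class CategoryIds
{
    private const string Prefix = "categories/";

    // "categories/123" -> "123", for building a public URL.
    public static string ToPublic(string documentId)
    {
        if (documentId == null || !documentId.StartsWith(Prefix, StringComparison.Ordinal))
        {
            throw new ArgumentException("Not a category document id: " + documentId, nameof(documentId));
        }
        return documentId.Substring(Prefix.Length);
    }

    // "123" -> "categories/123", before loading the document.
    public static string ToDocumentId(string publicId)
    {
        if (string.IsNullOrEmpty(publicId) || publicId.Contains("/"))
        {
            throw new ArgumentException("Not a public category id: " + publicId, nameof(publicId));
        }
        return Prefix + publicId;
    }
}
```

A view-model mapper could call `CategoryIds.ToPublic` on the way out and a controller could call `CategoryIds.ToDocumentId` on the way in. The same pair works for custom ids such as "categories/mobile", as long as the short form contains no slash.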
Another question: Should I maintain a separate set of Category documents? I need a picker for available categories, so I need to track them somewhere... Should I have each category as a separate document, or create a single document with all categories?
Any ideas on this?
> I'm still not sure about list of ids ... although one category could be used
> in one or more Faqs, we are getting relational. What is the impact on full
> text search?.
Even in nosql-land we have relations between documents ;)
When defining an index you can get related documents into the index as if
all information was located in that one single document. So that
should not be an issue.
> I know RavenDB can do custom id's, don't know about web api. User only sees
> web api, so should use it in web api, but to use it in RavenDB as well means
> transforming between categories/mobile and mobile just like categories/123
> and 123.
It really depends how you construct your API. You could setup routes
that take the id as 'categories/123' instead of just '123'.
Personally I would not like to construct the id for RavenDB, I'd
prefer to have the full id from the API.
You could encounter situations where you do not know whether an id
points to a category or to, say, a faq document. In those cases a
'full' id has a lot more advantages.
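As a small illustration of that last point, here is a plain-Python sketch with a made-up in-memory store (not RavenDB code): a bare key can match documents in several collections, while a full id names exactly one.

```python
# Hypothetical store keyed by full document id.
store = {
    "categories/123": {"name": "Mobile"},
    "faqs/123": {"question": "Which phones are supported?"},
}


def split_id(document_id: str):
    """Split 'categories/123' into ('categories', '123')."""
    collection, separator, key = document_id.partition("/")
    if not separator or not collection or not key:
        raise ValueError(f"not a full id: {document_id!r}")
    return collection, key


def candidates_for_bare_key(documents, bare_key: str):
    """List every full id that a bare key such as '123' could refer to."""
    return sorted(doc_id for doc_id in documents if split_id(doc_id)[1] == bare_key)


print(candidates_for_bare_key(store, "123"))  # two candidates: the bare key is ambiguous
print(split_id("categories/123"))             # the full id carries its collection
```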
// Ryan
Three things came to mind:
1) Maybe you'll find Vaughn Vernon's essay on Effective Aggregate Design helpful (google it; there are 3 PDFs and 2 videos). It was an eye-opener for me and will help you understand how to model your domain and the AR pattern.
2) The way your current application is designed will cause you trouble, no matter what abstraction layer you use. As you mentioned, NHibernate is also giving you a lot of pain.
3) While not related to your questions, you might want to split up your application. Working in such a large monolithic application isn't going to bring anything other than friction and pain, IMHO.
On 14-apr.-2012, at 22:59, mindplay wrote:
Thank you for pointing me to those 3 PDFs - I read the first one last night, and I'm starting to see what's going on here. Let me explain what I'm thinking, so you can verify (or not) if I'm starting to "see the light" ;-)
Back when I took my education in systems development, basically, I was taught to build aggregates as large, as complete and as connected as possible. But that was 14 years ago, and I'm starting to think that what they taught me back then was based on the kind of thinking that works for single-user, typically desktop applications, where the entire model was assumed to be in-memory, and therefore had to be traversable, since there was no "engine" you could go back to and ask for another piece of the model.
I can see now why that doesn't make sense for concurrent applications with large models persisted in the background. It just never occurred to me, and looked extremely wrong to me, because that's not how I was taught to think.
Furthermore, I'm starting to see why NHibernate doesn't really work well for me. So here's the main thing that's starting to dawn on me, and please confirm or correct me on this:
It seems that the idea behind NH is to configure the expected data-access strategies for the model itself. You write configuration-files that define the expected data-access strategies, but potentially, you're doing this based on assumptions about how you might access the data in this or that scenario.
The problem I'm starting to see, is that you're defining these assumptions statically - and while it is possible to deviate from these defined patterns, it's easy to think that once you've defined your access strategies, you're "done", and the model "just works" and you can focus on writing business logic, which too frequently turns out to be untrue in practice.
This contrasts with RavenDB, where you formally define the access strategies for specific scenarios - rather than for the model itself. And of course the same access strategy may work in different scenarios, but you're not tempted to assume that a single access strategy is going to work for all scenarios.
You're encouraged to think and make choices about what you're accessing and updating in each scenario, rather than just defining one overriding strategy and charging ahead blindly on the assumption that it'll always just work, or always perform well, or always make updates that are sufficiently small to not cause concurrency problems.
Am I catching on?
I will definitely read the other two articles, and I understand the author has a book coming out on the subject, too.
Is there a book already available that I should read, that can help teach me this kind of thinking?
Hi Stijn,
I've been reading those three articles - very useful stuff, thanks.
Just one question - in parts 2 and 3 he talks a lot about eventual consistency, and mentions that "developers are usually indoctrinated with an atomic change mentality".
Depending on the circumstances, I'm fine with the idea of eventual consistency at the database level. But eventual consistency in the software itself sounds like an extremely risky approach, and a bad idea all round. To do this safely, you would basically have to predict and know all possible/expected uses and extension points in the software, into the future. I don't understand how you could possibly allow for delayed updates and rely on "retrying", and I don't understand how you could possibly permit, and even design for, possible eventual failure as described in this article.
Eventual consistency has to be eventual, doesn't it? As in, it will happen, sooner or later. According to Wikipedia:
"It means that given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent"
Alright, so, if eventual failure is possible, that's not eventual consistency, that's "maybe, possibly consistency, who knows" ;-)
If you had to design with eventual failure in mind, that's going to add a lot of complexity. There's nothing worse than having to debug periodic errors, where the reasons for failure are so complex that you spend the first day just trying to consistently reproduce the error.
I can accept eventual consistency at the DBMS-level, because the DBMS does eventually achieve consistency, assuming it was designed correctly - but such design adds complexity, and I don't want that kind of complexity in my software.
I am glad that RavenDB is ACID by default, and that was one of the attractions for me. If you design software on top of that, that isn't atomic, you've really just thrown away that feature, haven't you?
I don't feel like eventual consistency is an acceptable solution to the problem. You often cannot predict application growth, and for applications that grow large, going back to review the entire codebase when adding a new feature is probably unacceptable.