No, basically you only denormalize when it makes sense, and that is usually in one of two scenarios: sharding, under some circumstances, and persisting a point-in-time view of data. In those two scenarios it actually makes sense to denormalize, hence it's not a sacrifice (for example, you WANT the product price to be denormalized into the order object so future price changes won't affect past orders).
We have multi-maps and includes to handle 99% of all other cases.
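The order-price case mentioned above could be sketched roughly as follows. This is only an illustration in plain C#, not code from RavenDB or from anyone in this thread; the `Product`, `OrderLine` and `Order` classes and the `UnitPriceAtPurchase` property are assumed names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical catalog entry; its price may change at any time.
public class Product
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

// An order line copies the price on purpose: it records what was charged.
public class OrderLine
{
    public string ProductId { get; set; } = "";
    public decimal UnitPriceAtPurchase { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public string Id { get; set; } = "";
    public List<OrderLine> Lines { get; } = new List<OrderLine>();

    public void Add(Product product, int quantity)
    {
        Lines.Add(new OrderLine
        {
            ProductId = product.Id,
            UnitPriceAtPurchase = product.Price, // snapshot, not a live reference
            Quantity = quantity
        });
    }

    // The total depends only on the prices captured when the lines were added.
    public decimal Total() => Lines.Sum(line => line.UnitPriceAtPurchase * line.Quantity);
}
```

Changing `Product.Price` after `Add` has been called leaves `Order.Total()` untouched, which is the point-in-time behaviour being described.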
I'm not sure that technically is denormalization?
You may be storing the same piece of data, but you're actually storing two different pieces of information. For example, "the customer's current address" is not the same information as "the customer's address at the time he placed the order" - even if the data you're persisting is identical in those two cases, the information it conveys is different when you're preserving historical information.
My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.
Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?
I saw a custom key example somewhere, where the user's e-mail address was being used as the primary key, and references to that user would be stored as "users/j...@doe.com", which seems clever at a glance - but what happens if the user changes their e-mail address? Seems like an extremely bad idea, I don't know why such an example would even be cited...
On Thu, Apr 12, 2012 at 1:23 AM, mindplay <ras...@mindplay.dk> wrote:
> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

You are breaking apart the key in your head. That isn't how it works in RavenDB.
This is a _single value_. It is structured this way for one major reason: readability. Having it in this fashion makes it easier to work with in your code.
So what do you call a "denormalization"?
> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?
We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.
Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.
> Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.
As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.
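To make the ID convention discussed above concrete, here is a minimal sketch in plain C#. It assumes nothing beyond what was said in this thread (IDs are strings, and by convention they carry the collection name); `DocumentIds.For`, `Category` and `Article` are hypothetical names for illustration and are not part of RavenDB.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT a RavenDB API: builds an ID that follows the
// "collection/number" convention described in this thread.
public static class DocumentIds
{
    public static string For(string collection, int number) => collection + "/" + number;
}

public class Category
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public class Article
{
    public string Id { get; set; } = "";

    // Each reference holds the complete document ID as one opaque string,
    // e.g. "categories/123", rather than a bare number.
    public List<string> CategoryIds { get; } = new List<string>();
}
```

For example, `article.CategoryIds.Add(DocumentIds.For("categories", 123))` stores the single value `"categories/123"`; the application never has to take that string apart again.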
On Wednesday, April 11, 2012 6:46:02 PM UTC-4, Itamar Syn-Hershko wrote:
> So what do you call a "denormalization"?

The introduction of any "accidental" state - when you're storing copies of data to increase performance, for example. Ideally, you should never have to do that.
The only time you should have to make copies of data is when copying the data is part of an operation that isn't "accidental" - that is, it satisfies a real requirement, not just something you have to do because the underlying systems suffer from technical limitations that make them unable to handle normalized models with sufficient performance.

>> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.
> Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?

The word "categories" is redundant - storing the ID as a string, for that matter, is redundant, if you know every key is going to be a number.

> We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.

But the model you're storing contains relations - so the data you're storing is relational in nature, and "categories/123" is a foreign key to a specific document.
I understand that RavenDB itself is not "relational" in the traditional sense, but clearly a lot of work went into providing means of dealing with relational data. I can think of very, very few applications that would not need extensive relation management. Certainly every example I've seen so far makes use of relations and foreign keys, or "document ids", if that's what you prefer to call them.
If you're going to store a list of category-document ids in a property, storing strings like "categories/123" is denormalization in some sense - you could just as well store the integer "123", since you know this list is going to contain strictly "category" document ids.

> Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.

Well, yes, a complete object-graph is persisted in one shot, but the complete model-graph is not automatically persisted - and you have to design around this when designing your model.
Bear with me:
Where I would normally have IList<Category> you have IList<string> instead - so now the model itself is not directly traversable.
In a sense, this is an indirect means of defining the boundaries of self-contained documents within your model, as a means of explaining to the data-mapper (Raven) which relations cross boundaries between documents, to prevent it from traversing outside the scope of your document-object-graph.
This adds complexity, because you have to go back to the database and fetch another piece of the model-graph when needed - and this complexity is accidental, because this is not a true expression of your model-graph.
This is what I was getting at early on when we connected on Twitter.

>> Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?
> No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.

I understand that you can't perform a query without providing the full document ID - but if you know you're storing only categories in a specific collection, that ID is easily reconstructed by adding "categories/" in front of the number itself, so there isn't technically any reason why you would need to store the whole string, other than convention, is that correct?

> As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

It still seems to me that a lot of your considerations are not business-related, but technical.
And it may be that this actually works better in practice - that it's more realistic and honest to deal with storage-mechanisms as what they really are.
For me personally, as mentioned, working with the AR implementation in Yii was definitely a lot more transparent and productive than NHibernate, which attempts to fully abstract and hide almost every aspect of persistence.
It may be that attempting to run and hide from any aspects of persistence is somewhat delusional.
I am definitely still very interested in RavenDB on account of its simplicity. I'm just trying to gauge whether that simplicity will scale well to a model and requirements as complex as the ones in the application I'm building.
No, actually, that isn't how you would model this.
This is how you are _used_ to modeling this, because you are thinking about how NHibernate does this.
You have to understand that this has been an explicit design choice with RavenDB. I have seen the problems that you get into when you try to go the magic route.
Since you noted the issue with `is` and `GetType()`, and you probably know about the SELECT N+1 issues, you are probably familiar with those issues.
Instead of trying to imagine a world where everything is in memory (the abstraction that NHibernate is trying to create), RavenDB follows the Aggregate model, where there are clear boundaries between different entities. That matches well to the way things actually work, because you can rely on being able to cheaply access anything inside the aggregate, and there is an explicit step that you have to take to access anything that isn't in the aggregate.
Note that RavenDB contains a lot of features, like `Include()` and `Live Projections`, that allow you to easily get the related data, but again, we do that as an explicit step because you _have_ to respect the boundary.

> So now there's an aspect of persistence to this entity, which suggests to me that the intention is to write dedicated DTO's rather than "business"-entities, and persist those?

Nope, it is just that you model your entities in a different way than you would using a relational database.

> but my concern here is not really performance, but transparency.

So was mine when designing this. But instead of pretending that "oh, it doesn't matter, let us deal with this in the OR/M layer", I decided that we need to be transparent about the actual implications of what you are doing.
The end result is a much better application, because you don't have hidden snares waiting for you.

> In an ideal world, I would just write completely persistence-ignorant models, optimizing for the problem-domain of the software itself, without regard for persistence, perhaps other than specifying which properties are persistent or transient.

No, you won't. Because even if we assume that your entire model is in memory, that is _still_ a bad way to design things. You need to think about things like concurrency, you need to think about transaction boundaries, you need to think about how to actually _deal_ with things. What you are saying is valid if you had only one user, only one time. But it falls apart once you start to consider what is actually going on.
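The boundary idea described above can be illustrated with a small sketch. It is deliberately written against a hypothetical in-memory stand-in (`InMemorySession`, with assumed `Put` and `Load` methods) rather than the real RavenDB client, so nothing here should be read as the library's API; it only shows that data inside the aggregate arrives with the document, while anything outside it takes a visible, separate load.

```csharp
using System;
using System.Collections.Generic;

public class Closing
{
    public DateTime ClosedFrom { get; set; }
    public DateTime ClosedTo { get; set; }
}

public class City
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public class Store
{
    public string Id { get; set; } = "";

    // Inside the aggregate: arrives together with the store document.
    public List<Closing> Closings { get; } = new List<Closing>();

    // Outside the aggregate: only the IDs of the related documents are kept.
    public List<string> CityIds { get; } = new List<string>();
}

// Hypothetical stand-in for a document session; it is NOT the RavenDB API.
public class InMemorySession
{
    private readonly Dictionary<string, object> _documents = new Dictionary<string, object>();

    // Counts how many times a document boundary was crossed.
    public int LoadCount { get; private set; }

    public void Put(string id, object document) => _documents[id] = document;

    public T Load<T>(string id) where T : class
    {
        LoadCount++; // every load is an explicit, countable step
        return _documents.TryGetValue(id, out var document) ? document as T : null;
    }
}
```

Loading a `Store` costs one call and its `Closings` are immediately available; reading a related `City` requires a second `Load<City>(store.CityIds[0])`, so the cost of crossing the boundary shows up in the calling code instead of behind a lazy-loading proxy.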
Thank you for taking the time to elaborate on this.
I guess I don't see the practical reason why aspects like persistence and transaction-boundaries should affect the OOP design-patterns you choose?
For example, the following models Stores that are closed on particular days:

    public class Closing
    {
        public DateTime ClosedFrom { get; set; }
        public DateTime ClosedTo { get; set; }
    }

    public class Store
    {
        public IList<Closing> Closings { get; set; }
    }

This would be harder to model with a relational database, where this would actually persist in two tables, say, "stores" and "closings". That's a lot of complexity just to store something that is really truly composite - and in a sense, it's "wrong", because a closing-date is unique to a Store, it's not an independent thing that has any meaning outside the context of the Store it belongs to. Yet, with an RDBMS, we're forced to store them as independent units.
Much simpler with RavenDB, where this is stored as one unit, a document. "It just works." :-)
Now let's say that stores are listed in a number of cities.

    public class Store
    {
        public IList<Closing> Closings { get; set; }
        public IList<City> Cities { get; set; }
    }

Now a City does not belong to a Store, and a Store does not belong to a City - they're related of course, but neither has ownership of the other. They are independent units.
Let's be clear about the fact that I didn't choose this design-pattern because I'm thinking about persistence - this is basic, traditional, persistence-ignorant OO.
And now you want me to define the transaction-boundary by changing the model:

    public class Store
    {
        public IList<Closing> Closings { get; set; }
        public IList<string> Cities { get; set; }
    }

My problem with this approach is, you did a lot more than just defining a transaction-boundary for persistence, and it has far-reaching consequences.
Why can't you just declare the document-boundary instead? For example:

    public class Store
    {
        public IList<Closing> Closings { get; set; }

        [Documents]
        public IList<City> Cities { get; set; }
    }

And then let the persistence layer do the work?
There are at least a few common patterns that cover probably 90-95% of common relations in any given model.
Why can't we model those cases using declarations instead of code?
> Why can't we model those cases using declarations instead of code?

The way I see it, you want to start writing a lot of complex stuff that will most probably be prone to a lot of bugs just to solve what you consider to be a problem, which is gone in 3 lines of code. I think before we move on with this conversation, you need to explain WHY this actually bugs you, and why you REALLY need that?...
On Thursday, April 12, 2012 11:12:48 AM UTC-4, Itamar Syn-Hershko wrote:
>> Why can't we model those cases using declarations instead of code?
> The way I see it, you want to start writing a lot of complex stuff that will most probably be prone to a lot of bugs just to solve what you consider to be a problem, which is gone in 3 lines of code. I think before we move on with this conversation, you need to explain WHY this actually bugs you, and why you REALLY need that?...

3 lines of code here and 3 lines of code there. And every time, code that most likely has nothing to do with the task it's performing.
That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.
There's also a maintenance issue - suppose your model changes, and something that was previously a component is now an independent document, so you change the collection from IList<Category> to IList<string> and this breaks all of your existing business-procedures. Suppose you have 100s of business-procedures that require a list of categories.
In light of this last consideration alone, I'm tempted to consistently add methods for every collection to every model object, essentially duplicating all of my collections:

    class Store
    {
        public IList<string> Categories { get; set; }

        public IEnumerable<Category> GetCategories()
        {
            // fetch and return Categories...
        }
    }

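One way the accessor idea above might be fleshed out, purely as a sketch: keep the ID list on the entity and put the lookup in an extension method that receives the fetching logic as a parameter. The `load` delegate is an assumption made for this example (it stands for whatever turns a document ID into a `Category`); no one in this thread proposed this exact shape, and it is not a RavenDB feature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public class Store
{
    // Document IDs of the related categories, e.g. "categories/123".
    public List<string> Categories { get; } = new List<string>();
}

public static class StoreExtensions
{
    // 'load' is a hypothetical callback supplied by the caller; the entity
    // itself stays unaware of how documents are fetched.
    public static IEnumerable<Category> GetCategories(this Store store, Func<string, Category> load)
    {
        return store.Categories.Select(load);
    }
}
```

As written, this performs one lookup per ID when the sequence is enumerated, which is the SELECT N+1 shape mentioned earlier in the thread, so a real application would presumably want to fetch the IDs in one batch instead.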
> That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.

Tough luck, persistence IS part of the problem that you are trying to solve, and you have to take it into account.
May I suggest, go and write an idiomatic RavenDB application, then come back and tell us what the experience was like.
Right now, you don't have valid reasons, you have gut feeling based on experience in completely different technology and methodology.
On Thursday, April 12, 2012 12:44:20 PM UTC-4, Oren Eini wrote:
>> That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.
> Tough luck, persistence IS part of the problem that you are trying to solve, and you have to take it into account

Of course, but I still feel like it should be handled in isolation and not mixed into business-procedures.

> May I suggest, go and write an idiomatic RavenDB application, then come back and tell us what the experience was like.
> Right now, you don't have valid reasons, you have gut feeling based on experience in completely different technology and methodology.

That's going to be a hard sell, since I don't understand your methodology - probably the only way I would get to do that is if I had the time to do it on my own dime. I can't really go to management or to my client and try to sell them on an idea I don't understand.
If it works as well as you claim, hopefully it will become popular enough to warrant a book - not just on the software, but on the methodology.
The examples and tutorials available at the moment are all very trivial and show individual features working well in isolation, but I don't feel like there's enough depth to provide a big picture.
My main concern is that this won't scale in terms of complexity. If you could show me a complex app that leverages the features and applies the methodology in practice, perhaps this would be more accessible.
RavenDB runs msnbc.com and Pluralsight, among others.
We have several big sample apps; RaccoonBlog is a good example.