introductions. and: is RavenDB right for me?

mindplay

Apr 11, 2012, 3:26:12 PM
to rav...@googlegroups.com
Hello List,

I've been looking at RavenDB for a while now, and recently decided to install and play around with it.

Just to provide a bit of background, I'm currently working on a very large business application, which we decided to build using NHibernate and Fluent. This has been about two years in the making at this point.

NH breaks standard C# language features such as GetType() and the "is" keyword, which gets really problematic when combined with the ASP.NET/MVC framework, which makes extensive use of reflection. 

About a year into the project, we hit so many walls we eventually gave up using the criteria API, which appears to be an incomplete abstraction. We also had to give up Fluent, which also appears to be incomplete. We dropped down to generating HQL queries - so much for strong typing. Feels like 1998 all over again. In some cases we even hit limitations with HQL, and had to drop down to raw SQL and views. And here I was thinking we had come further than that.

In past years, I built several apps using PHP and the Yii framework - it has a nice, simple implementation of AR (Active Record) that doesn't make things overly complicated or get in the way of doing simple things the simple way. No, it's not a full, clean abstraction of SQL databases by any means - but it doesn't attempt to be either, which makes it thinner, easier to understand, and easier to work with.

Even after two years of intense day-to-day work with NH, it still seems to be full of surprises. In my honest opinion, I believe it's the Black Monolith of O/RM, and I can never truly know all of its secrets ;-)

After years of looking with envy at various new graph and object DBMS, I have become increasingly convinced that even the best efforts of the smartest people (with all due respect to the authors of NHibernate) cannot make RDBMS really truly work well for complex object graphs. I like to say, and only half-jokingly, that the one thing a relational database can't handle, is relations. ;-)

Anyway, so much for introductions. As said, I've decided to take a look at RavenDB and see for myself first-hand what it's about.

I'm going to ask some questions, and please don't take these the wrong way - I'm not trying to criticize or provoke or thwart anybody's efforts, I'm just simply trying to understand the value proposition from my own point of view.

Taking one thing at a time, the first major surprise to me, was the fact that there is no real support for traversal of a model as such. And what I mean by that is...

public class Product
{
    public string Name { get; set; }
    public float Price { get; set; }
    public IList<string> Categories { get; set; }
}

This Product-class has a list of categories, probably strings like "categories/123", etc.

That's not how you would model an entity in a typical "business"-model - you'd have something like IList<Category> containing references to the actual Category objects, or simulating the presence of that collection using a proxy and lazy-loading pattern.

So now there's an aspect of persistence to this entity, which suggests to me that the intention is to write dedicated DTOs rather than "business"-entities, and persist those?

Or maybe the intent is actually for me to write "business"-entities and just accept the fact that persistence aspects are going to get mixed in? The NHibernate community (and the software itself) has gone to great lengths to teach me that this is "wrong" and "bad" for various reasons. And my hope/dream after seeing the early RavenDB videos was that it would "just work", as you keep saying - but it seems like you still need to quite carefully design with persistence in mind? And that there is no clean/direct way to separate these concerns?

I understand that you have advanced indexing and querying features, allowing you to prefetch related entities in advance and so forth - but my concern here is not really performance, but transparency.

In an ideal world, I would just write completely persistence-ignorant models, optimizing for the problem-domain of the software itself, without regard for persistence, perhaps other than specifying which properties are persistent or transient.

I realize this is not an ideal world, and perhaps the philosophy of RavenDB is to just accept that fact and deal with it?

But storing full keys like "categories/123" almost seems worse to me than just storing "123" in a category-id-column in a relational database - at least then you have things like cascading updates/deletes without having to deal with that housekeeping aspect of persistence. How is this better?

How are you using RavenDB, or what is the intended use? Do you write dedicated DTOs alongside your business-model, or do you just write business-entities, mix in your storage concerns, and live with that?

As said, please don't take these questions the wrong way - I'm not trying to provoke or attack your efforts, your work, your ideas or your ideals. I'm just trying to understand whether or how I can adopt your ideas/values and work productively that way.

Bottom line, my concern is scalability in terms of complexity - not in terms of performance. I don't build Twitter or Facebook, and for the most part, RDBMS perform acceptably for the applications I need to build.

Thanks!

-- Rasmus Schultz <http://mindplay.dk>

Chris Marisic

Apr 11, 2012, 4:03:16 PM
to rav...@googlegroups.com
Your questions here could literally lead to hours' worth of discussion. The short answer: RavenDB is designed specifically to solve ALL of those problems you mentioned you face.

That being said, you have new problems to overcome: dealing with transaction boundaries, and deciding where and when to denormalize data.

Generally with RavenDB you want to specifically avoid designs that require "cascade"-type behavior; needing it implies an incorrect usage of a NoSQL DB. In some scenarios these operations can't be avoided, and they just need to be limited as much as possible, especially since a cascade of that sort could theoretically require mutating every document in your database (or at least every document in a collection).

mindplay

Apr 11, 2012, 5:02:54 PM
to rav...@googlegroups.com
I realize this is no small topic, but what I'm fishing for here is the intended use of RavenDB specifically, more so than a general debate about right and wrong - I guess I'm looking for the "happy path" that will lead to maximum joy and fewest possible headaches with RavenDB specifically.

It sounds like denormalization is one of the first facts of life with RavenDB that one needs to accept? I'm going to have a hard time with that - I generally have been taught to avoid denormalization, particularly for performance, and over the years have gotten good at avoiding this with SQL-server/MySQL without sacrificing performance.

It sounds like one is going to have to make other sacrifices with RavenDB in terms of denormalization, for other reasons, but unavoidably, and by design?

Itamar Syn-Hershko

Apr 11, 2012, 5:31:15 PM
to rav...@googlegroups.com
No, basically you only denormalize when it makes sense, and that is usually in one of two scenarios: sharding under some circumstances, and persisting a point-in-time view of data. In those two scenarios it actually makes sense to denormalize, hence it's not a sacrifice (for example, you WANT the product price to be denormalized into the order object so future price changes won't affect past orders).

We have multi-maps and includes to handle 99% of all other cases
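
A minimal sketch of the point-in-time case described above, assuming hypothetical Product, Order and OrderLine classes (none of these names come from RavenDB itself). The order line copies the name and price at the moment the order is placed, while still keeping the product's document ID as a plain string:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document classes, for illustration only.
public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class OrderLine
{
    public string ProductId { get; set; }    // reference to the product document, by ID
    public string ProductName { get; set; }  // copied when the order is placed
    public decimal UnitPrice { get; set; }   // copied when the order is placed
    public int Quantity { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();

    // Snapshots the product's current name and price into the order document.
    public void AddLine(Product product, int quantity)
    {
        Lines.Add(new OrderLine
        {
            ProductId = product.Id,
            ProductName = product.Name,
            UnitPrice = product.Price,
            Quantity = quantity
        });
    }

    // Computed from the snapshot only, so later price changes do not affect it.
    public decimal Total => Lines.Sum(line => line.UnitPrice * line.Quantity);
}
```

Changing Product.Price after AddLine has been called leaves Order.Total untouched, which is the behavior the price example asks for.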

Itamar Syn-Hershko

Apr 11, 2012, 5:33:30 PM
to rav...@googlegroups.com
Re "I generally have been taught to avoid denormalization" - the best practices with Raven and any other non-relational DB are COMPLETELY different than those of RDBMSes. Specifically, we don't try to persist normal forms, hence normalization is not a cardinal sin.

Itamar Syn-Hershko

Apr 11, 2012, 5:34:48 PM
to rav...@googlegroups.com
Let me rephrase:  hence denormalization is not a cardinal sin

mindplay

Apr 11, 2012, 6:23:46 PM
to rav...@googlegroups.com
I'm not sure that technically is denormalization?

You may be storing the same piece of data, but you're actually storing two different pieces of information. For example, "the customer's current address" is not the same information as "the customer's address at the time he placed the order" - even if the data you're persisting is identical in those two cases, the information it conveys is different when you're preserving historical information.

My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

I saw a custom key example somewhere, where the user's e-mail address was being used as the primary key, and references to that user would be stored as "users/j...@doe.com", which seems clever at a glance - but what happens if the user changes their e-mail address? Seems like an extremely bad idea, I don't know why such an example would even be cited...

Itamar Syn-Hershko

Apr 11, 2012, 6:46:02 PM
to rav...@googlegroups.com
inline

On Thu, Apr 12, 2012 at 1:23 AM, mindplay <ras...@mindplay.dk> wrote:
> I'm not sure that technically is denormalization?

So what do you call a "denormalization"?
 
> You may be storing the same piece of data, but you're actually storing two different pieces of information. For example, "the customer's current address" is not the same information as "the customer's address at the time he placed the order" - even if the data you're persisting is identical in those two cases, the information it conveys is different when you're preserving historical information.

> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?

We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.


Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.
 

> Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.
 

> I saw a custom key example somewhere, where the user's e-mail address was being used as the primary key, and references to that user would be stored as "users/j...@doe.com", which seems clever at a glance - but what happens if the user changes their e-mail address? seems like an extremely bad idea, I don't know why such an example would even be cited...

You should take that into account when designing your system. On the RavenDB website we did the same, since we assume the email address won't change, or at least we don't care if it does. Doing this makes it very easy to enforce a unique constraint over the email address. If you have a website where you want to support changes to email addresses, you'd probably want to go another route.

As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

Itamar.

Beyers

Apr 11, 2012, 7:06:03 PM
to rav...@googlegroups.com
Rob Ashton has a post I found helpful that explains the whole transaction boundary concept and how it translates to good document design:  http://codeofrob.com/entries/ravendb---document-design-with-collections.html 
In some cases it is perfectly fine to store child documents as part of the parent document, in other cases storing references is better. The above post helped me a lot to clarify which design to use when, maybe it will help you as well.

Justin A

Apr 12, 2012, 2:59:22 AM
to rav...@googlegroups.com
Hi Mindplay - welcome to the secret ninja DoJo.

i'm one of the more challenged individuals around here - and i'm slowly getting the hang of it. It == NoSql and RavenDb's implementation of it.

The hardest thing I struggled with initially was rewiring my brain to -stop- thinking like an RDBMS / SQL language and now thinking more about documents / my domain model.

What i've found is this : Using RavenDb i don't really have to worry about the database any more. Meaning, here's my solution/product .. now how do I need to stick this crap into a Sql Server? PITA.

Once i broke free of that personal constraint (ie. modelling everything for a fricking database instead of modelling for my solution/domain) .. things started working better for me.

And then all these scenarios (like why is an -identity- a string? OMG! etc..) just became: question => reason and solution.

And development life is much easier and quicker :)

Quote Itamar: As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

Drink the Kool-Aid... u won't regret it :)

Oren Eini (Ayende Rahien)

Apr 12, 2012, 4:23:50 AM
to rav...@googlegroups.com
> That's not how you would model an entity in a typical "business"-model - you'd have something like IList<Category> containing references to the actual Category objects, or simulating that presence of that collection using a proxy and lazy-loading pattern.


No, actually, that isn't how you would model this.
This is how you are _used_ to modeling this, because you are thinking about how NHibernate does this.
You have to understand that this has been an explicit design choice with RavenDB. I have seen the problems that you get into when you try to go the magic route.
Since you noted the issue with `is` and `GetType()`, and you probably know about the SELECT N+1 issues, you are probably familiar with those issues.

Instead of trying to imagine a world where everything is in memory (the abstraction that NHibernate is trying to create), RavenDB follows the Aggregate model, where there are clear boundaries between different entities. That matches well to the way things actually work, because you can rely on being able to cheaply access anything inside the aggregate, and there is an explicit step that you have to take to access anything that isn't in the aggregate.

Note that RavenDB contains a lot of features, like `Include()` and `Live Projections`, that allow you to easily get the related data, but again, we do that as an explicit step because you _have_ to respect the boundary.
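
To make the "explicit step" concrete, here is a rough in-memory sketch. FakeSession, RoundTrips and LoadProductWithCategories are hypothetical names invented for this illustration; this is NOT the RavenDB client API, only a stand-in that counts how often the code would have to go back to the server. Anything inside the Product document is free to access, while every referenced Category costs either its own load or one combined load up front:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
    public IList<string> Categories { get; set; } = new List<string>(); // document IDs
}

// Hypothetical in-memory stand-in for a document session (not the RavenDB API).
public class FakeSession
{
    private readonly Dictionary<string, object> _server;  // pretend remote store
    private readonly Dictionary<string, object> _tracked = new Dictionary<string, object>();

    public int RoundTrips { get; private set; }

    public FakeSession(Dictionary<string, object> server)
    {
        _server = server;
    }

    // Explicit load: costs one round trip unless the session already tracks the document.
    public T Load<T>(string id) where T : class
    {
        if (!_tracked.TryGetValue(id, out var document))
        {
            RoundTrips++;
            document = _server[id];
            _tracked[id] = document;
        }
        return (T)document;
    }

    // Loads a product and the category documents it references in a single round trip.
    public Product LoadProductWithCategories(string id)
    {
        RoundTrips++;
        var product = (Product)_server[id];
        _tracked[id] = product;
        foreach (var categoryId in product.Categories)
        {
            _tracked[categoryId] = _server[categoryId];
        }
        return product;
    }
}
```

With this stand-in, calling Load once for the product and once per category costs one round trip each, whereas LoadProductWithCategories followed by the same Load calls costs one in total. The boundary stays visible in the code either way.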

> So now there's an aspect of persistence to this entity, which suggests to me that the intention is to write dedicated DTO's rather than "business"-entities, and persist those?

Nope, it is just that you model your entities in a different way than you would using a relational database.

> but my concern here is not really performance, but transparency.

So was mine when designing this. But instead of pretending that "oh, it doesn't matter, let us deal with this in the OR/M layer", I decided that we need to be transparent about the actual implications of what you are doing. The end result is a much better application, because you don't have hidden snares waiting for you.

> In an ideal world, I would just write completely persistence-ignorant models, optimizing for the problem-domain of the software itself, without regard for persistence, perhaps other than specifying which properties are persistent or transient.

No, you won't. Because even if we assume that your entire model is in memory, that is _still_ a bad way to design things. You need to think about things like concurrency, you need to think about transaction boundaries, you need to think about how to actually _deal_ with things. What you are saying is valid if you had only one user, only one time. But it falls apart once you start to consider what is actually going on.

Oren Eini (Ayende Rahien)

Apr 12, 2012, 4:27:41 AM
to rav...@googlegroups.com
inline

On Thu, Apr 12, 2012 at 1:23 AM, mindplay <ras...@mindplay.dk> wrote:

> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.


You are breaking apart the key in your head. That isn't how it works in RavenDB.
This is a _single value_. It is structured this way for one major reason: readability. Having it in this fashion makes it easier to work with in your code.
 
> Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

That will work, yes. It isn't recommended. Same readability argument.
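
For anyone who does go the IList<int> route, the conversion in both directions is only a few lines. This is a sketch with a hypothetical DocumentIds helper (not part of RavenDB), assuming the "categories/123" convention used in this thread. It also shows what is lost: a bare number no longer says which collection it points at, so every caller has to know the prefix.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class DocumentIds
{
    // Rebuilds the full document ID, e.g. ("categories", 123) -> "categories/123".
    public static string Build(string collection, int number)
    {
        return collection + "/" + number.ToString(CultureInfo.InvariantCulture);
    }

    // Extracts the numeric part, and fails loudly if the ID belongs to another collection.
    public static int NumberOf(string collection, string id)
    {
        var prefix = collection + "/";
        if (!id.StartsWith(prefix, StringComparison.Ordinal))
        {
            throw new ArgumentException("'" + id + "' is not an ID in collection '" + collection + "'");
        }
        return int.Parse(id.Substring(prefix.Length), CultureInfo.InvariantCulture);
    }
}
```

Storing the full string avoids this bookkeeping entirely, which is the readability argument made above.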
 

> I saw a custom key example somewhere, where the user's e-mail address was being used as the primary key, and references to that user would be stored as "users/j...@doe.com", which seems clever at a glance - but what happens if the user changes their e-mail address? seems like an extremely bad idea, I don't know why such an example would even be cited...

You don't do that, then, if that is an option.
The document key cannot be changed, that is a cardinal rule in RavenDB. You cannot "rename" a document.
If you have the option of changing emails, you don't use the email as the document id.

Chris Marisic

Apr 12, 2012, 9:03:35 AM
to rav...@googlegroups.com


On Thursday, April 12, 2012 4:27:41 AM UTC-4, Oren Eini wrote:
> inline
>
> On Thu, Apr 12, 2012 at 1:23 AM, mindplay <ras...@mindplay.dk> wrote:

>> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

> You are breaking apart the key in your head. That isn't how it works in RavenDB.
> This is a _single value_. It is structured this way for one major reason: readability. Having it in this fashion makes it easier to work with in your code.
 

We use RavenDB with almost no HiLo keys. We've been able to create natural keys for our documents, so that documents related to a customer/# get keys of the form customer/#/someresource/someidentifier. This results in natural RESTful URL structures you can carry over to your app. It also allows full-text search using multi-maps across the ID in addition to the content fields; boosting the value of the ID then gives you very relevant searches, even across multiple collections.


>> I saw a custom key example somewhere, where the user's e-mail address was being used as the primary key, and references to that user would be stored as "users/j...@doe.com", which seems clever at a glance - but what happens if the user changes their e-mail address? seems like an extremely bad idea, I don't know why such an example would even be cited...

> You don't do that, then, if that is an option.
> The document key cannot be changed, that is a cardinal rule in RavenDB. You cannot "rename" a document.
> If you have the option of changing emails, you don't use the email as the document id.

[Well, I guess we break that in one scenario, where we actually let staff change a natural identifier in the rare cases it needs to change. It was fairly easy to do that using straight RavenDB CommandData and the DatabaseCommands.StartsWith, and the benefits of having the natural keys in our scenario are definitely worth it. Related documents are only small groupings.]
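
Since a document cannot be renamed, changing a natural key amounts to writing each affected document under a new key and deleting the old one. Below is a rough in-memory sketch of that idea, using a plain dictionary as a stand-in for the database. KeyRename and MoveSubtree are hypothetical names, and this does not use or imitate the CommandData or DatabaseCommands calls mentioned above. It assumes hierarchical keys such as customers/acme/orders/1, as in the natural-key scheme described earlier in this message:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper operating on an in-memory key -> document map.
public static class KeyRename
{
    // Moves the document at oldRoot, and every document keyed below it, to newRoot.
    // Returns the number of documents moved.
    public static int MoveSubtree(IDictionary<string, string> store, string oldRoot, string newRoot)
    {
        // Work out all moves first, so nothing is changed if a collision is found.
        var moves = store.Keys
            .Where(key => key == oldRoot || key.StartsWith(oldRoot + "/", StringComparison.Ordinal))
            .ToDictionary(key => key, key => newRoot + key.Substring(oldRoot.Length));

        foreach (var newKey in moves.Values)
        {
            if (store.ContainsKey(newKey))
            {
                throw new InvalidOperationException("Key already exists: " + newKey);
            }
        }

        // "Rename" = write under the new key, then delete the old key.
        foreach (var move in moves)
        {
            var document = store[move.Key];
            store.Remove(move.Key);
            store[move.Value] = document;
        }
        return moves.Count;
    }
}
```

The sketch leaves out the harder part: any other document that stores one of the old keys in a property still points at the old value and has to be found and updated separately. That is presumably why this only stays manageable when the related documents are small groupings.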

mindplay

Apr 12, 2012, 9:43:01 AM
to rav...@googlegroups.com
On Wednesday, April 11, 2012 6:46:02 PM UTC-4, Itamar Syn-Hershko wrote:

> So what do you call a "denormalization"?

the introduction of any "accidental" state - when you're storing copies of data to increase performance, for example. Ideally, you should never have to do that.

the only time you should have to make copies of data, is when copying the data is part of an operation that isn't "accidental" - that is, it satisfies a real requirement, not just something you have to do because the underlying systems suffer from technical limitations that make them unable to handle normalized models with sufficient performance.
 
>> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

> Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?

the word "categories" is redundant - storing the ID as a string, for that matter, is redundant, if you know every key is going to be a number.
 
> We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.

But the model you're storing contains relations - so the data you're storing is relational in nature, and "categories/123" is a foreign key to a specific document. 

I understand that RavenDB itself is not "relational" in the traditional sense, but clearly a lot of work went into providing means of dealing with relational data. I can think of very, very few applications that would not need extensive relation management. Certainly every example I've seen so far makes use of relations and foreign keys, or "document ids", if that's what you prefer to call them.

If you're going to store a list of category-document ids in a property, storing strings like "categories/123" is denormalization in some sense - you could just as well store the integer "123", since you know this list is going to contain strictly "category" document ids.
 
> Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.

Well, yes, a complete object-graph is persisted in one shot, but the complete model-graph is not automatically persisted - and you have to design around this when designing your model.

Bear with me:

Where I would normally have IList<Category> you have IList<string> instead - so now the model itself is not directly traversable.

In a sense, this is an indirect means of defining the boundaries of self-contained documents within your model, as a means of explaining to the data-mapper (Raven) which relations cross boundaries between documents, to prevent it from traversing outside the scope of your document-object-graph.

This adds complexity, because you have to go back to the database and fetch another piece of the model-graph when needed - and this complexity is accidental, because this is not a true expression of your model-graph.

This is what I was getting at early on when we connected on Twitter
 
>> Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

> No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.

I understand that you can't perform a query without providing the full document ID - but if you know you're storing only categories in a specific collection, that ID is easily reconstructed by adding "categories/" in front of the number itself, so there isn't technically any reason why you would need to store the whole string, other than convention, is that correct?
 
> As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

It still seems to me that a lot of your considerations are not business-related, but technical.

And it may be that this actually works better in practice - that it's more realistic and honest to deal with storage-mechanisms as what they really are.

For me personally, as mentioned, working with the AR implementation in Yii was definitely a lot more transparent and productive than NHibernate, which attempts to fully abstract and hide almost every aspect of persistence.

It may be that attempting to run and hide from any aspects of persistence is somewhat delusional.

I am definitely still very interested in RavenDB on account of its simplicity. I'm just trying to gauge whether that simplicity will scale well to a model and requirements as complex as the ones in the application I'm building.

I appreciate your willingness to discuss this! a lot!

Thanks :-)

Oren Eini (Ayende Rahien)

Apr 12, 2012, 10:00:28 AM
to rav...@googlegroups.com
inline

On Thu, Apr 12, 2012 at 4:43 PM, mindplay <ras...@mindplay.dk> wrote:
> On Wednesday, April 11, 2012 6:46:02 PM UTC-4, Itamar Syn-Hershko wrote:

>> So what do you call a "denormalization"?

> the introduction of any "accidental" state - when you're storing copies of data to increase performance, for example. Ideally, you should never have to do that.

> the only time you should have to make copies of data, is when copying the data is part of an operation that isn't "accidental" - that is, it satisfies a real requirement, not just something you have to do because the underlying systems suffer from technical limitations that make them unable to handle normalized models with sufficient performance.
 

Then don't do it when you don't have to. There is nothing that requires it.
 
>>> My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

>> Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?

> the word "categories" is redundant - storing the ID as a string, for that matter, is redundant, if you know every key is going to be a number.
 

You _can_ make it just a number, you are aware, right?
And the actual representation is meaningless, the _only_ reason the key is there is to make it human readable.
 
>> We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.

> But the model you're storing contains relations - so the data you're storing is relational in nature, and "categories/123" is a foreign key to a specific document.


No, it isn't a relation. It is a property holding the value of another document's key, and that is quite different. It isn't a relation, there are no FKs, etc.
  

> If you're going to store a list of category-document ids in a property, storing strings like "categories/123" is denormalization in some sense - you could just as well store the integer "123", since you know this list is going to contain strictly "category" document ids.

You can do that if you want; it is easier if you store the full ID.
 
 
>> Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.

> Well, yes, a complete object-graph is persisted in one shot, but the complete model-graph is not automatically persisted - and you have to design around this when designing your model.

> Bear with me:

> Where I would normally have IList<Category> you have IList<string> instead - so now the model itself is not directly traversable.

Yes, intentionally so. 
Let me try something else: try imagining a User <<-->> Group scenario. If you try to do strong references this way, you end up having to access all the users if you load just one. And all their groups as well.
We put explicit boundaries for a reason.

If you want that, feel free to check out an OODB instead of a document DB. They have much of the behavior that you seem to want, but RavenDB is not an OODB, and it behaves differently.
 

> In a sense, this is an indirect means of defining the boundaries of self-contained documents within your model, as a means of explaining to the data-mapper (Raven) which relations cross boundaries between documents, to prevent it from traversing outside the scope of your document-object-graph.

You confuse the data mapper with the actual physical structure of the documents. 
You aren't trying to map data, you are defining the actual physical structure of the documents. 
 

> This adds complexity, because you have to go back to the database and fetch another piece of the model-graph when needed - and this complexity is accidental, because this is not a true expression of your model-graph.


I would disagree that this is accidental. It is quite intentional. 

 
This is what I was getting at early on when we connected on Twitter
 
 Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.

I understand that you can't perform a query without providing the full document ID - but if you know you're storing only categories in a specific collection, that ID is easily reconstructed by adding "categories/" in front of the number itself, so there isn't technically any reason why you would need to store the whole string, other than convention, is that correct?
 
As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

It still seems to me that a lot of your considerations are not business-related, but technical.

Try building an actual application using this, and I think that you'll come up with a different conclusion. 
 

And it may be that this actually works better in practice - that it's more realistic and honest to deal with storage-mechanisms as what they really are.

For me personally, as mentioned, working with the AR implementation in Yii was definitely a lot more transparent and productive than NHibernate, which attempts to fully abstract and hide almost every aspect of persistence.


You are actually feeling this way mostly because you have been writing SQL apps for a very long time. 

Chris Marisic

Apr 12, 2012, 10:12:12 AM4/12/12
to rav...@googlegroups.com


On Thursday, April 12, 2012 9:43:01 AM UTC-4, mindplay wrote:
On Wednesday, April 11, 2012 6:46:02 PM UTC-4, Itamar Syn-Hershko wrote:
My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?

the word "categories" is redundant - storing the ID as a string, for that matter, is redundant, if you know every key is going to be a number.

No it is not redundant. The database has no notion WHATSOEVER of "123" in isolation.
 
If you're going to store a list of category-document ids in a property, storing strings like "categories/123" is denormalization in some sense - you could just as well store the integer "123", since you know this list is going to contain strictly "category" document ids.

This is inaccurate. You can say these are "category" document ids, but what will tell the server that? Nothing can. That's why these full IDs matter.
 


Well, yes, a complete object-graph is persisted in one shot, but the complete model-graph is not automatically persisted - and you have to design around this when designing your model.

Bear with me:

Where I would normally have IList<Category> you have IList<string> instead - so now the model itself is not directly traversable.

In a sense, this is an indirect means of defining the boundaries of self-contained documents within your model, as a means of explaining to the data-mapper (Raven) which relations cross boundaries between documents, to prevent it from traversing outside the scope of your document-object-graph.

This is inaccurate. IList<Category> and IList<string> are 2 incredibly different things. If you use IList<Category>, you are saying you want to store all of this data as part of the same document. IList<string> is no different, except that here you're specifically referring to an IList of CategoryIds.


This adds complexity, because you have to go back to the database and fetch another piece of the model-graph when needed - and this complexity is accidental, because this is not a true expression of your model-graph.

Dealing with relationships adds complexity. RavenDB is not adding complexity here; this is inherent complexity that most ORMs just let people forget about, even though it actually does matter. This is not accidental: RavenDB was specifically built to make 1-to-many relationship traversal explicit, not implicit as ORMs do it, which automatically eliminates the most common problem in data access with ORMs, N+1. RavenDB specifically provides methods that allow you to get all of the data with a single request to the server when this is truly needed.

Our applications have almost no usages of List<string> foreignKeys.
 

This is what I was getting at early on when we connected on Twitter
 
 Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.

I understand that you can't perform a query without providing the full document ID - but if you know you're storing only categories in a specific collection, that ID is easily reconstructed by adding "categories/" in front of the number itself, so there isn't technically any reason why you would need to store the whole string, other than convention, is that correct?

No, this is also incorrect. Collections do not exist, they are synthetic. When you talk to RavenDB in any fashion you are not querying collections; you are querying indexes, or using IDs to talk directly to the store. There is no way for the ID to be "easily reconstructed by adding "categories/" in front of the number itself". This could only be done in limited fashions in the client API. This is why, given a Category { Id = "categories/123" } document, RavenDB supports Session.Load<Category>((int)123) (cast for extra clarity, not needed). HERE, RavenDB can guess "categories/123".
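To illustrate why the guess is only possible on the client side, here is a minimal sketch. It is not RavenDB's implementation: `IdConvention`, `CollectionPrefix` and `FullId` are hypothetical names, and the pluralization rule is a deliberately naive assumption. The point is that the prefix is derived from the type argument, which the caller knows, whereas a bare number stored inside a document carries no type at all.

```csharp
using System;

public class Category { }

// Hypothetical helper, for illustration only (not part of the RavenDB client API).
public static class IdConvention
{
    // Naive pluralization: "category" -> "categories", otherwise append "s".
    // This rule is an assumption made for the sketch.
    public static string CollectionPrefix(Type type)
    {
        var name = type.Name.ToLowerInvariant();
        if (name.EndsWith("y", StringComparison.Ordinal))
            return name.Substring(0, name.Length - 1) + "ies";
        return name + "s";
    }

    // Builds a full string ID from the type the caller asked for and a number.
    public static string FullId<T>(int number)
    {
        return CollectionPrefix(typeof(T)) + "/" + number;
    }
}
```

Calling `IdConvention.FullId<Category>(123)` yields "categories/123" only because `Category` was supplied by the caller; nothing comparable is available for a plain `123` sitting in an IList<int>.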
 
 
As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

It still seems to me that a lot of your considerations are not business-related, but technical.

I don't agree with this. You're experiencing tunnel vision: you are not separating your expectations of how you would solve problems when targeting an RDBMS from the question "how would I solve this problem". I have yet to ever specifically change my domain design to accommodate Raven; there have been many times I've had to drastically alter domain design because of SQL Server.

Itamar Syn-Hershko

Apr 12, 2012, 10:15:07 AM4/12/12
to rav...@googlegroups.com
inline

On Thu, Apr 12, 2012 at 4:43 PM, mindplay <ras...@mindplay.dk> wrote:
On Wednesday, April 11, 2012 6:46:02 PM UTC-4, Itamar Syn-Hershko wrote:

So what do you call a "denormalization"?

the introduction of any "accidental" state - when you're storing copies of data to increase performance, for example. Ideally, you should never have to do that.

the only time you should have to make copies of data, is when copying the data is part of an operation that isn't "accidental" - that is, it satisfies a real requirement, not just something you have to do because the underlying systems suffer from technical limitations that make them unable to handle normalized models with sufficient performance.

Great, then by your definition of "denormalization", you should NEVER denormalize with RavenDB. Correctly using Includes, MultiMaps and the TransformResults function, you will be able to both respect transaction boundaries AND have all the data you need in a VERY efficient manner, without any cost similar to an RDBMS JOIN...
 
 
My concern is that storing a string like "categories/123" seems to be a general practice - and as mentioned, in some sense, this is worse than just storing the foreign key "123", as you would typically do with an RDBMS, since you're now storing two pieces of redundant information.

Why is that? "categories/123" is a document ID. With RavenDB, document IDs are strings. And I'm not sure what the 2 pieces of redundant information are?

the word "categories" is redundant - storing the ID as a string, for that matter, is redundant, if you know every key is going to be a number.

It is not, it is part of the ID, by convention. If you had stored a category with an ID "123" and then a product under the same ID "123", the product will overwrite the category. The reason for this is IDs are unique per-database, hence the convention which also makes it very readable.
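The overwrite scenario can be made concrete with a small in-memory sketch. This is not RavenDB code: `TinyStore` is a hypothetical stand-in that, like a database with a single ID space, keeps exactly one document per string key.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a database with one ID space (not the RavenDB API).
public class TinyStore
{
    private readonly Dictionary<string, object> documents = new Dictionary<string, object>();

    // Storing under an existing ID replaces whatever was there.
    public void Put(string id, object document)
    {
        documents[id] = document;
    }

    // Returns the document stored under the ID, or null if there is none.
    public object Get(string id)
    {
        object document;
        return documents.TryGetValue(id, out document) ? document : null;
    }

    public int Count { get { return documents.Count; } }
}

public static class IdCollisionDemo
{
    // Bare numeric IDs: the product replaces the category.
    public static TinyStore WithBareIds()
    {
        var store = new TinyStore();
        store.Put("123", "a category");
        store.Put("123", "a product");
        return store;
    }

    // Prefixed IDs: both documents survive side by side.
    public static TinyStore WithPrefixedIds()
    {
        var store = new TinyStore();
        store.Put("categories/123", "a category");
        store.Put("products/123", "a product");
        return store;
    }
}
```

With bare IDs the store ends up holding a single document (the product); with the prefixed convention it holds two.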

And you don't always know every key is going to be a number. Consider IDs like "tags/ravendb" in a blog for a Tag entity for example, or the users example you brought up. With RavenDB IDs are always strings; the client API makes it transparent for you in some occasions, handling HiLo and other stuff for you.
 
 
We don't have the notion of foreign keys - RavenDB is NOT relational. "123" is not a document in a table of "categories"; "categories/123" is a document, and internally we group similar documents by their Entity-Name under a logical unit called a Collection.

But the model you're storing contains relations - so the data you're storing is relational in nature, and "categories/123" is a foreign key to a specific document. 

I understand that RavenDB itself is not "relational" in the traditional sense, but clearly a lot of work went into providing means of dealing with relational data. I can think of very, very few applications that would not need extensive relation management. Certainly every example I've seen so far makes use of relations and foreign keys, or "document ids", if that's what you prefer to call them.

A foreign-key is a term coming from the relational world, meaning there is an index built on it. That is not the case with RavenDB, and there's a big difference. You can store a reference to another document and load it efficiently using Includes, or "join" them while indexing using multi-maps, but that's a completely different thing.

Since we use DDD concepts, and relations between aggregate roots are allowed - and that is also the case in real-world scenarios - we ought to support that, hence all the work. This is not relational-data per se, think of it more as relations between aggregate roots, which are - as mentioned before - VERY different than what your objects look like in an RDBMS ER diagram.
 

If you're going to store a list of category-document ids in a property, storing strings like "categories/123" is denormalization in some sense - you could just as well store the integer "123", since you know this list is going to contain strictly "category" document ids.

Yes, you could do that, but I wouldn't call it denormalization in that case. For full readability of my object graph I'd probably still go with categories/123. Don't fear the "added cost" of reading a few more bytes, it's completely negligible.
 
 
Coming from a relational background, it's important to remember 2 things: the complete object graph is persisted, and we don't care about repeating ourselves where it makes sense to do so ("denormalization"). The question we ask when modeling is "what is an object" and "where does it make sense to repeat ourselves". Both are answered by Domain-Driven-Design concepts like an aggregate root (using the transactional boundaries to define discrete objects) and by expected usage patterns.

Well, yes, a complete object-graph is persisted in one shot, but the complete model-graph is not automatically persisted - and you have to design around this when designing your model.

Bear with me:

Where I would normally have IList<Category> you have IList<string> instead - so now the model itself is not directly traversable.

In a sense, this is an indirect means of defining the boundaries of self-contained documents within your model, as a means of explaining to the data-mapper (Raven) which relations cross boundaries between documents, to prevent it from traversing outside the scope of your document-object-graph.

This adds complexity, because you have to go back to the database and fetch another piece of the model-graph when needed - and this complexity is accidental, because this is not a true expression of your model-graph.

This is what I was getting at early on when we connected on Twitter

Again, a question of modeling and use cases

You will have a List<Category> if the Category object you persist has no meaning outside the scope of your object. Like List<OrderLine> within an Order object. Since your category is probably going to be referenced from somewhere else, and contain some more data unique to it, it should be stored on its own and given its own document ID, which you later reference.
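As a rough illustration of that rule of thumb, the sketch below shows the two shapes side by side. All class and property names are assumptions made up for this example, not taken from any real application: an order line is embedded because it has no meaning outside its order, while a category gets its own document and is referenced by ID.

```csharp
using System.Collections.Generic;

// Has no meaning outside its order, so it is embedded in the Order document.
public class OrderLine
{
    public string ProductName { get; set; }
    public int Quantity { get; set; }
}

// One aggregate: the order and its lines are stored together as one document.
public class Order
{
    public string Id { get; set; }               // e.g. "orders/1"
    public List<OrderLine> Lines { get; set; }   // embedded: part of this document
}

// Referenced from many places, so it lives in its own document with its own ID.
public class Category
{
    public string Id { get; set; }               // e.g. "categories/123"
    public string Name { get; set; }
}

// Another aggregate: it points at categories by document ID instead of embedding them.
public class Product
{
    public string Id { get; set; }                  // e.g. "products/7"
    public List<string> CategoryIds { get; set; }   // referenced: IDs of other documents
}
```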

It is going to be traversable quite easily by using Includes. And note that using includes will not generate extra network traffic. Or you can use multi-maps and transform results to search on partial data and project custom objects, even view models, directly from the index.
 
 
 Although I suppose there's no reason you'd be forced to store keys in such a long form? You could use an IList<int> for category-ids, for example, if you wanted to, and still make use of multi-maps and includes, as far as I can tell, correct?

No, you need the full ID there. As I said, the ID is a string that by convention holds the collection name as well.

I understand that you can't perform a query without providing the full document ID - but if you know you're storing only categories in a specific collection, that ID is easily reconstructed by adding "categories/" in front of the number itself, so there isn't technically any reason why you would need to store the whole string, other than convention, is that correct?

See my comment above.
 
 
As you can see, using RavenDB is all about considering your business model closely. It may be confusing at first.

It still seems to me that a lot of your considerations are not business-related, but technical.

And it may be that this actually works better in practice - that it's more realistic and honest to deal with storage-mechanisms as what they really are.

For me personally, as mentioned, working with the AR implementation in Yii was definitely a lot more transparent and productive than NHibernate, which attempts to fully abstract and hide almost every aspect of persistence.

Exactly what RavenDB doesn't do. It doesn't abstract anything, it just works with your model. And we are convinced that's a better route to go in. That being said, considerations are not all too technical, they involve a LOT of business logic and expected use cases. Try modeling something with the DDD approaches and you'll see.
 

It may be that attempting to run and hide from any aspects of persistence is somewhat delusional.

I am definitely still very interested in RavenDB on account of its simplicity. I'm just trying to gauge whether that simplicity will scale well to a model and requirements as complex as the ones in the application I'm building.

Much more than any RDBMS will, that I can assure you. Try it, we are here to assist.

Itamar Syn-Hershko

Apr 12, 2012, 10:15:26 AM4/12/12
to rav...@googlegroups.com
Oren and I in a race condition. Oren wins.

Oren Eini (Ayende Rahien)

Apr 12, 2012, 10:19:34 AM4/12/12
to rav...@googlegroups.com
Isn't that last write wins?

Itamar Syn-Hershko

Apr 12, 2012, 10:22:16 AM4/12/12
to rav...@googlegroups.com
LOL

mindplay

Apr 12, 2012, 10:39:27 AM4/12/12
to rav...@googlegroups.com
Thank you for taking the time to elaborate on this.

I guess I don't see the practical reason why aspects like persistence and transaction-boundaries should affect the OOP design-patterns you choose?

For example, the following models Stores that are closed on particular days:

public class Closing
{
    public DateTime ClosedFrom { get; set; }
    public DateTime ClosedTo { get; set; } 
}

public class Store
{
    public IList<Closing> Closings { get; set; }
}

This would be harder to model with a relational database, where this would actually persist in two tables, say, "stores" and "closings". That's a lot of complexity just to store something that is really truly composite - and in a sense, it's "wrong", because a closing-date is unique to a Store, it's not an independent thing that has any meaning outside the context of the Store it belongs to. Yet, with an RDBMS, we're forced to store them as independent units.

Much simpler with RavenDB, where this is stored as one unit, a document. "It just works." :-)

Now let's say that stores are listed in a number of cities.

public class Store
{
    public IList<Closing> Closings { get; set; }
    public IList<City> Cities { get; set; }
}

Now a City does not belong to a Store, and a Store does not belong to a City - they're related of course, but neither has ownership of the other. They are independent units.

Let's be clear about the fact that I didn't choose this design-pattern because I'm thinking about persistence - this is basic, traditional, persistence-ignorant OO.

And now you want me to define the transaction-boundary by changing the model:

public class Store
{
    public IList<Closing> Closings { get; set; }
    public IList<string> Cities { get; set; }
}

My problem with this approach is, you did a lot more than just defining a transaction-boundary for persistence, and it has far-reaching consequences.

Why can't you just declare the document-boundary instead? For example:

public class Store
{
    public IList<Closing> Closings { get; set; }
    
    [Documents]
    public IList<City> Cities { get; set; }
}

And then let the persistence layer do the work?

There are at least a few common patterns that cover probably 90-95% of common relations in any given model.

Why can't we model those cases using declarations instead of code?

Since we're no longer dealing with all the cases like the list of store closing-dates above, that should eliminate a lot of the more complex scenarios that were so difficult to handle in NH - I bet you would need only a few declarations to make it all the way to full persistence-abstraction with RavenDB...

Or maybe a lot, maybe I'm delusional ;-)


On Thursday, April 12, 2012 4:23:50 AM UTC-4, Oren Eini wrote:

No, actually, that isn't how you would model this.
This is how you are _used_ to modeling this, because you are thinking about how NHibernate does this.
You have to understand that this has been an explicit design choice with RavenDB. I have seen the problems that you get into when you try to go the magic route.
Since you noted the issue with `is` and `GetType()`, and you probably know about the SELECT N+1 issues, you are probably familiar with those issues.

Instead of trying to imagine a world where everything is in memory (the abstraction that NHibernate is trying to create), RavenDB follows the Aggregate model, where there are clear boundaries between different entities. That matches well to the way things actually work, because you can rely on being able to cheaply access anything inside the aggregate, and there is an explicit step that you have to take to access anything that isn't in the aggregate.

Note that RavenDB contains a lot of features, like `Include()` and `Live Projections`, that allow you to easily get the related data, but again, we do that as an explicit step because you _have_ to respect the boundary.

> So now there's an aspect of persistence to this entity, which suggests to me that the intention is to write dedicated DTO's rather than "business"-entities, and persist those?

Nope, it is just that you model your entities in a different way than you would using a relational database.

> but my concern here is not really performance, but transparency.

So was mine when designing this. But instead of pretending that "oh, it doesn't matter, let us deal with this in the OR/M layer", I decided that we need to be transparent about the actual implications of what you are doing. The end result is a much better application, because you don't have hidden snares waiting for you.

> In an ideal world, I would just write completely persistence-ignorant models, optimizing for the problem-domain of the software itself, without regard for persistence, perhaps other than specifying which properties are persistent or transient.

No, you won't. Because even if we assume that your entire model is in memory, that is _still_ a bad way to design things. You need to think about things like concurrency, you need to think about transaction boundaries, you need to think about how to actually _deal_ with things. What you are saying is valid if you had only one user, only one time. But it falls apart once you start to consider what is actually going on.


Chris Marisic

Apr 12, 2012, 11:11:27 AM4/12/12
to rav...@googlegroups.com
You think you want those features, but they really wouldn't be beneficial. They would just allow developers to make badly designed non-relational databases behave exactly the same as relational databases, and manifest all of the common problems RDBMS create, with even more problems on top because you're trying to make a non-relational system behave relationally.

Itamar Syn-Hershko

Apr 12, 2012, 11:12:48 AM4/12/12
to rav...@googlegroups.com
inline

On Thu, Apr 12, 2012 at 5:39 PM, mindplay <ras...@mindplay.dk> wrote:
Thank you for taking the time to elaborate on this.

I guess I don't see the practical reason why aspects like persistence and transaction-boundaries should affect the OOP design-patterns you choose?

They don't. Transactional boundaries are completely business related, and come from DDD, which is OOP-next-gen if you wish. And persistence is never being considered as a factor. As I said - as far as we are concerned, you just drop your objects into Raven.
 

For example, the following models Stores that are closed on particular days:

public class Closing
{
    public DateTime ClosedFrom { get; set; }
    public DateTime ClosedTo { get; set; } 
}

public class Store
{
    public IList<Closing> Closings { get; set; }
}

This would be harder to model with a relational database, where this would actually persist in two tables, say, "stores" and "closings". That's a lot of complexity just to store something that is really truly composite - and in a sense, it's "wrong", because a closing-date is unique to a Store, it's not an independent thing that has any meaning outside the context of the Store it belongs to. Yet, with an RDBMS, we're forced to store them as independent units.

Much simpler with RavenDB, where this is stored as one unit, a document. "It just works." :-)

Now let's say that stores are listed in a number of cities.

public class Store
{
    public IList<Closing> Closings { get; set; }
    public IList<City> Cities { get; set; }
}

Now a City does not belong to a Store, and a Store does not belong to a City - they're related of course, but neither has ownership of the other. They are independent units.

Let's be clear about the fact that I didn't choose this design-pattern because I'm thinking about persistence - this is basic, traditional, persistence-ignorant OO.

And now you want me to define the transaction-boundary by changing the model:

public class Store
{
    public IList<Closing> Closings { get; set; }
    public IList<string> Cities { get; set; }
}

My problem with this approach is, you did a lot more than just defining a transaction-boundary for persistence, and it has far-reaching consequences.

True, since here you've taken a decision to have a City in its own object (probably rightly so). When in memory you can keep references to the same object, but we can't do that with json, hence the List<string>.

With about 3 lines of code (Include + a foreach loop) you can overcome this, so I don't really see the problem here.
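To show the shape of "Include + a foreach loop" without depending on a running server, here is a self-contained sketch. `FakeSession` and `LoadStoreIncludingCities` are hypothetical stand-ins written for this example, not the RavenDB client API; they only simulate the idea that referenced documents arrive in the same round trip and are then served from the session.

```csharp
using System.Collections.Generic;

public class City
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Store
{
    public string Id { get; set; }
    public IList<string> Cities { get; set; }   // document IDs, e.g. "cities/1"
}

// Hypothetical in-memory stand-in for a document session (not the RavenDB API).
public class FakeSession
{
    private readonly Dictionary<string, object> server = new Dictionary<string, object>();
    private readonly Dictionary<string, object> loaded = new Dictionary<string, object>();

    // Number of simulated requests to the server.
    public int RoundTrips { get; private set; }

    public void Seed(string id, object document) { server[id] = document; }

    // Loads a store and, in the same simulated round trip, every city it references.
    public Store LoadStoreIncludingCities(string storeId)
    {
        RoundTrips++;
        var store = (Store)server[storeId];
        loaded[storeId] = store;
        foreach (var cityId in store.Cities)
            loaded[cityId] = server[cityId];
        return store;
    }

    // Served from the session when already loaded; otherwise costs a round trip.
    public T Load<T>(string id) where T : class
    {
        object document;
        if (!loaded.TryGetValue(id, out document))
        {
            RoundTrips++;
            document = server[id];
            loaded[id] = document;
        }
        return (T)document;
    }
}

public static class IncludeDemo
{
    // Returns how many simulated round trips the load-plus-loop needed.
    public static int CountRoundTrips()
    {
        var session = new FakeSession();
        session.Seed("cities/1", new City { Id = "cities/1", Name = "Oslo" });
        session.Seed("cities/2", new City { Id = "cities/2", Name = "Bergen" });
        session.Seed("stores/1", new Store { Id = "stores/1", Cities = new List<string> { "cities/1", "cities/2" } });

        var names = new List<string>();
        var store = session.LoadStoreIncludingCities("stores/1");
        foreach (var cityId in store.Cities)
            names.Add(session.Load<City>(cityId).Name);   // already in the session, no extra trip

        return session.RoundTrips;
    }
}
```

In this simulation the whole traversal costs one round trip; dropping the include step and calling `Load<City>` directly would cost one trip per city, which is the N+1 pattern.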

But I see a different problem here - how can a store be in several places at once? Perhaps you're looking for something along those lines instead (StoreCompany and ActualStore)? http://ayende.com/blog/84993/document-based-modeling-auctions
 
Why can't you just declare the document-boundary instead? For example:

public class Store
{
    public IList<Closing> Closings { get; set; }
    
    [Documents]
    public IList<City> Cities { get; set; }
}

And then let the persistence layer do the work?

How would that work?
 

There are at least a few common patterns that cover probably 90-95% of common relations in any given model.

Why can't we model those cases using declarations instead of code?

The way I see it, you want to start writing a lot of complex stuff that will most probably be prone to a lot of bugs just to solve what you consider to be a problem, which is gone in 3 lines of code. I think before we move on with this conversation, you need to explain WHY this actually bugs you, and why you REALLY need that?...

mindplay

Apr 12, 2012, 12:00:17 PM4/12/12
to rav...@googlegroups.com
On Thursday, April 12, 2012 11:12:48 AM UTC-4, Itamar Syn-Hershko wrote:
Why can't we model those cases using declarations instead of code?

The way I see it, you want to start writing a lot of complex stuff that will most probably be prone to a lot of bugs just to solve what you consider to be a problem, which is gone in 3 lines of code. I think before we move on with this conversation, you need to explain WHY this actually bugs you, and why you REALLY need that?...

3 lines of code here and 3 lines of code there. And every time, code that most likely has nothing to do with the task it's performing.

That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.

I don't want somebody else reading the code getting distracted by it either - having 3 lines of persistence-related code in the middle of a business-procedure can be confusing or misleading.

There's also a maintenance issue - suppose your model changes, and something that was previously a component is now an independent document, so you change the collection from IList<Category> to IList<string> and this breaks all of your existing business-procedures. Suppose you have 100s of business-procedures that require a list of categories.

In light of this last consideration alone, I'm tempted to consistently add methods for every collection to every model object, essentially duplicating all of my collections:

class Store
{
    public IList<string> Categories { get; set; }

    public IEnumerable<Category> GetCategories()
    {
        // fetch and return Categories...
    }
}

I don't know, maybe this is truer to the actual data-model, but it seems like a lot of boiler-plate. Maybe in some sense it's actually better than simply IList<Category> though, since this enables you to choose in each case: do you need just the category document IDs, or do you want to fetch the actual objects? Not something that is easily possible using the other approach.

Thoughts?

Chris Marisic

Apr 12, 2012, 12:09:10 PM4/12/12
to rav...@googlegroups.com
I would never build a model that behaved as such. I do not ever want my persistence mechanism coupled to my business objects.  Even with ORMs it was tough enough for me to accept the virtual keyword everywhere for dynamic proxy implementations, I certainly wouldn't tolerate this in my model.

Oren Eini (Ayende Rahien)

Apr 12, 2012, 12:44:20 PM4/12/12
to rav...@googlegroups.com
inline

On Thu, Apr 12, 2012 at 7:00 PM, mindplay <ras...@mindplay.dk> wrote:
On Thursday, April 12, 2012 11:12:48 AM UTC-4, Itamar Syn-Hershko wrote:
Why can't we model those cases using declarations instead of code?

The way I see it, you want to start writing a lot of complex stuff that will most probably be prone to a lot of bugs just to solve what you consider to be a problem, which is gone in 3 lines of code. I think before we move on with this conversation, you need to explain WHY this actually bugs you, and why you REALLY need that?...

3 lines of code here and 3 lines of code there. And every time, code that most likely has nothing to do with the task it's performing.

That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.


Tough luck, persistence IS part of the problem that you are trying to solve, and you have to take it into account
 
There's also a maintenance issue - suppose your model changes, and something that was previously a component is now an independent document, so you change the collection from IList<Category> to IList<string> and this breaks all of your existing business-procedures. Suppose you have 100s of business-procedures that require a list of categories.


This is NOT just a minor change. This has a lot of implications. You WANT the code to break.
 
In light of this last consideration alone, I'm tempted to consistently add methods for every collection to every model object, essentially duplicating all of my collections:

class Store
{
    public IList<string> Categories { get; set; }

    public IEnumerable<Category> GetCategories()
    {
        // fetch and return Categories...
    }
}

Absolutely horrible. You will end up with a lot of pain for absolutely no gain.
May I suggest, go and write an idiomatic RavenDB application, then come back and tell us what the experience was like.
Right now, you don't have valid reasons, you have gut feeling based on experience in completely different technology and methodology. 

mindplay

Apr 12, 2012, 3:06:59 PM4/12/12
to rav...@googlegroups.com
On Thursday, April 12, 2012 12:44:20 PM UTC-4, Oren Eini wrote:
That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.

Tough luck, persistence IS part of the problem that you are trying to solve, and you have to take it into account

of course, but I still feel like it should be handled in isolation and not mixed into business-procedures.

May I suggest, go and write an idiomatic RavenDB application, then come back and tell us what the experience was like.
Right now, you don't have valid reasons, you have gut feeling based on experience in completely different technology and methodology. 

that's going to be a hard sell, since I don't understand your methodology - probably the only way I would get to do that, is if I had the time to do it on my own dime. I can't really go to management or to my client and try to sell them on an idea I don't understand.

if it works as well as you claim, hopefully it will become popular enough to warrant a book - not just on the software, but on the methodology. 

The examples and tutorials available at the moment are all very trivial and show individual features working well in isolation, but I don't feel like there's enough depth to provide a big picture.

My main concern is that this won't scale in terms of complexity. If you could show me a complex app that leverages the features and applies the methodology in practice, perhaps this would be more accessible.

As you probably know, it's much harder to un-learn than it is to learn - and it sounds like these ideas go against much of the established, mainstream software theory I was taught. I'm afraid you'd have to take out my brain and reset it - at the moment it's stuck screaming "NO NO NO" to some of these ideas.

I dipped my toes, and the water doesn't feel too cold, but I still fear there may be sharks ;-)

Maybe I can make it to one of your courses later this year. Do you teach the methodology or do these courses mainly focus on programming?

Oren Eini (Ayende Rahien)

Apr 12, 2012, 3:16:05 PM4/12/12
to rav...@googlegroups.com
inline

On Thu, Apr 12, 2012 at 10:06 PM, mindplay <ras...@mindplay.dk> wrote:
> On Thursday, April 12, 2012 12:44:20 PM UTC-4, Oren Eini wrote:
>>> That's mainly what bugs me - when I'm writing code that works with the model, I don't want to have to think about persistence, I want to focus on the task I'm trying to perform.
>>
>> Tough luck, persistence IS part of the problem that you are trying to solve, and you have to take it into account
>
> of course, but I still feel like it should be handled in isolation and not mixed into business-procedures.

40 years of RDBMS experience says that it isn't a good way to go.

>> May I suggest, go and write an idiomatic RavenDB application, then come back and tell us what the experience was like.
>> Right now, you don't have valid reasons, you have gut feeling based on experience in completely different technology and methodology.
>
> That's going to be a hard sell, since I don't understand your methodology - probably the only way I would get to do that is if I had the time to do it on my own dime. I can't really go to management or to my client and try to sell them on an idea I don't understand.
>
> If it works as well as you claim, hopefully it will become popular enough to warrant a book - not just on the software, but on the methodology.
>
> The examples and tutorials available at the moment are all very trivial and show individual features working well in isolation, but I don't feel like there's enough depth to provide a big picture.

There is a book in the works, yes :-)

> My main concern is that this won't scale in terms of complexity. If you could show me a complex app that leverages the features and applies the methodology in practice, perhaps this would be more accessible.

RavenDB runs msnbc.com and Pluralsight, among others.
We have several big sample apps; RaccoonBlog is a good example.

Oren Eini (Ayende Rahien)

Apr 12, 2012, 3:16:49 PM
to rav...@googlegroups.com
Oh, and in the courses I don't focus very much on the API (that is what IntelliSense is for); we focus much more on the actual semantics and the zen: how to think about building applications using RavenDB.


Troy

Apr 12, 2012, 3:39:36 PM
to rav...@googlegroups.com
What other big sample apps are there, and where are they located?
 
> RavenDB runs msnbc.com and Pluralsight, among others.
> We have several big sample apps; RaccoonBlog is a good example.
 
 
Also, when is the book in progress going to be released? Any ETA? 

mindplay

Apr 12, 2012, 4:17:08 PM
to rav...@googlegroups.com
> We have several big sample apps; RaccoonBlog is a good example.

the application I'm building has 100+ entities with thousands of properties and several hundred relationships, factory-classes with hundreds of query-parameters, and will be twice as big when it's done.

I'm still trying to imagine what that would look like if, every time I had to traverse one of these relationships, I had to go back to the database and retrieve them manually... and in some cases, having to traverse across five or ten document-boundaries...

Chris Marisic

Apr 12, 2012, 4:38:33 PM
to rav...@googlegroups.com

This is just inherently poor modeling. Or at least, inherently poor modeling for a non-relational system.

You also keep ignoring that you don't need to make multiple trips to the database and retrieve them manually; RavenDB provides that for you. The only time you'd have to make successive trips is if you load an object, include its many-to-1 relationship, and then need to load more objects based on the objects in that many-to-1 relationship. If you're encountering scenarios like this, you're not doing document modeling right. Maybe there's some valid case for needing to traverse more than a single layer down, but I doubt there are many, and I expect there are exactly zero valid reasons for traversing another iteration or more... unless you're doing a graph DB, which RavenDB has support for.

My current major project, built on top of Raven, has 7 aggregate roots currently. Outside of that we have fewer than 25 supporting business models that are used by those 7 aggregate roots. Our documents contain almost no relationships whatsoever.

alwin

Apr 12, 2012, 5:28:25 PM
to ravendb
"IList<string> Categories" versus "IList<Category> Categories" is just
an explicit boundary between entities versus an implicit/hidden one.

What I miss from this discussion are the indexes and the separation
between read/write that indexes provide in a natural way.

At first, I found RavenDB clumsy to use because you can't easily "hop"
from one object to a related object. But most of the time you need
this in read scenarios.
And then it clicked for me, with indexes you separate all the reading
from the actual model. Your document boundaries disappear for queries
(of course you need to write an index definition for that).
With all the querying separated, the actual write model becomes much
simpler. And then you only have a few times where you cross the
boundary.
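
To make the read/write separation above concrete, here is a minimal in-memory sketch in plain C#. It does not use the RavenDB client at all: the `FaqReadModel.Build` method is a hypothetical stand-in for what an index definition would do, and the `Faq`, `Category` and `FaqSearchEntry` shapes are assumptions made for illustration. The point is only that references are resolved once, when the read model is built, so queries never hop across document boundaries themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Write model: two separate "documents" that reference each other by id.
public class Category
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Faq
{
    public string Id { get; set; }
    public string Question { get; set; }
    public List<string> CategoryIds { get; set; } = new List<string>();
}

// Read model: one flat entry per Faq, roughly what an index could expose.
public class FaqSearchEntry
{
    public string FaqId { get; set; }
    public string Question { get; set; }
    public List<string> CategoryNames { get; set; }
}

public static class FaqReadModel
{
    // Hypothetical stand-in for an index definition (not a RavenDB API):
    // joins each Faq with the names of the categories it references.
    public static List<FaqSearchEntry> Build(IEnumerable<Faq> faqs, IEnumerable<Category> categories)
    {
        var namesById = categories.ToDictionary(c => c.Id, c => c.Name);

        return faqs.Select(f => new FaqSearchEntry
        {
            FaqId = f.Id,
            Question = f.Question,
            // Ids that point at no known category are skipped.
            CategoryNames = f.CategoryIds
                .Where(id => namesById.ContainsKey(id))
                .Select(id => namesById[id])
                .ToList()
        }).ToList();
    }
}
```

With this split, the write side only ever touches a single `Faq` or `Category`, while search screens read from the flattened entries.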

Chris Marisic

Apr 13, 2012, 8:10:49 AM
to rav...@googlegroups.com
Well said alwin. Solving problems with RavenDB lets you directly solve problems. RDBMS abstractions pervasively leak deep into your design.

Justin A

Apr 13, 2012, 8:55:09 AM
to rav...@googlegroups.com
+1 Chris :)

Itamar Syn-Hershko

Apr 14, 2012, 1:53:30 PM
to rav...@googlegroups.com
Troy,

Sample apps, or real-world open-source apps using RavenDB (some may not be using latest recommended approaches):

https://github.com/PureKrome/RavenOverflow (and the RavenOverflow vid on our YT channel)
The Actya CMS

No ETA on the book currently

Itamar Syn-Hershko

Apr 14, 2012, 2:03:27 PM
to rav...@googlegroups.com
Rasmus,

Without discussing concrete and complete business models, and weighing different considerations, we can't really hand you strict guidelines of DOs and DON'Ts for your scenario. Correct modeling is a process that comes after days of thinking and trial and error.

As both Oren and I suggested - spend a few days trying to build a PoC for your problem with RavenDB, and ping us here for help whenever you need it. From past experience, it would be MUCH faster to set it up and resolve prospective problems there than continuing the RDBMS route, especially if your app is as complex as you say it is.

Seeing that the number of trips to the server disturbs you that much, as does the hypothetical problem of converting a List<Category> to List<string>, it is clear that relational modeling is still guiding you. This is completely fine, but if you want to REALLY try RavenDB you need to try and ignore it for a couple of days.

HTH,

Itamar.

mindplay

Apr 14, 2012, 4:59:21 PM
to rav...@googlegroups.com
inline


On Saturday, April 14, 2012 2:03:27 PM UTC-4, Itamar Syn-Hershko wrote:
> As both Oren and I suggested - spend a few days trying to build a PoC for your problem with RavenDB, and ping us here for help whenever you need it. From past experience, it would be MUCH faster to set it up and resolve prospective problems there than continuing the RDBMS route, especially if your app is as complex as you say it is.

It is proprietary and under a strict non-disclosure agreement with the client, but I suppose I could attempt to model some of the trickier parts to see how that works out. In particular, this application uses a lot of hierarchical models, and a hierarchical security model - you can grant certain permissions to specific users for entire sets under a hierarchy.

So in other words, a collection of documents might be associated with different areas in a hierarchy - so if you grant somebody permission to view documents for an area, only documents within that area are visible to them on the document search screen. So basically, the query parameter itself is hierarchical, and the documents are stored in hierarchical categories, so it's a hierarchical search against hierarchical data.

There are a lot of "inheritance" aspects to this application - inheriting rights from a parent area to which you've been granted specific rights is just one example; there are lots of examples of various entities "inheriting" bits of information from parent entities.

This definitely does not lend itself well to an RDBMS by any means, I am well aware of that, and that's why I'm looking at RavenDB.
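
One possible way to picture the "hierarchical search against hierarchical data" described above is to encode each area as a path and treat a grant on an area as covering everything beneath it. The sketch below is plain in-memory C#, not a RavenDB feature, and every name in it (`AreaDocument`, `AreaPath`, `AreaPermissions`) is hypothetical; the actual application may well model this differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AreaDocument
{
    public string Id { get; set; }
    // Assumed convention: the path encodes the hierarchy, e.g. "areas/europe/denmark".
    public string AreaPath { get; set; }
}

public static class AreaPermissions
{
    // True when 'path' is the granted area itself or lies anywhere beneath it.
    // The "/" guard keeps "areas/europeana" from matching a grant on "areas/europe".
    public static bool IsWithin(string path, string grantedArea)
    {
        return path == grantedArea
            || path.StartsWith(grantedArea + "/", StringComparison.Ordinal);
    }

    // Filters documents down to those inside at least one granted area.
    public static List<AreaDocument> VisibleTo(IEnumerable<AreaDocument> docs, IEnumerable<string> grantedAreas)
    {
        var grants = grantedAreas.ToList();
        return docs.Where(d => grants.Any(g => IsWithin(d.AreaPath, g))).ToList();
    }
}
```

A grant on "areas/europe" would then make documents filed under "areas/europe" and "areas/europe/denmark" visible, but not those under "areas/asia".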
 
> Seeing that the number of trips to the server disturbs you that much, as does the hypothetical problem of converting a List<Category> to List<string>, it is clear that relational modeling is still guiding you. This is completely fine, but if you want to REALLY try RavenDB you need to try and ignore it for a couple of days.

I know. I just wish there was a book I could pick up - I find that books help me abstract from what I already know, while sitting in front of the computer has a tendency to make me think in familiar patterns, you know? :-)

I'm thinking about attending a course...

Matt Warren

Apr 14, 2012, 5:16:55 PM
to rav...@googlegroups.com
Until a "proper" book comes out you could always download the old e-book from http://builds.hibernatingrhinos.com/Builds/RavenDBBook. It's a little bit out of date, but some of the fundamentals still apply.

Also look through the archive of posts on Ayende's blog; there are posts on RavenDB, modeling, document databases, etc.

Stijn Volders

Apr 14, 2012, 6:21:35 PM
to rav...@googlegroups.com
Three things came to mind:

1) Maybe you'll find Vaughn Vernon's essay on Effective Aggregate Design helpful (google it, there are 3 PDFs and 2 videos). It was an eye-opener for me and will help you understand how to model your domain and the AR pattern.

2) The way your current application is designed will cause you trouble, no matter what abstraction layer you use. As you mentioned, NHibernate is also giving you a lot of pain.

3) While not related to your questions, you might want to split up your application. Working in such a large monolithic application isn't going to bring anything other than friction and pain IMHO.

Rasmus Schultz

Apr 14, 2012, 11:43:39 PM
to rav...@googlegroups.com
inline

On Sat, Apr 14, 2012 at 6:21 PM, Stijn Volders <stijn....@gmail.com> wrote:

> 1) Maybe you'll find Vaughn Vernon's essay on Effective Aggregate Design
> helpful (google it, there are 3 PDFs and 2 videos). It was an eye-opener
> for me and will help you understand how to model your domain and the AR
> pattern.

thanks, I will definitely take a look at that. I wasn't aware I had
anything left to learn in that area, but I'm starting to think maybe
there are some important pieces I'm still missing...

> 2) the way your current application is designed will cause you trouble, no
> matter what abstraction layer you use. As you mentioned, NHibernate is
> giving you also a lot of pain.

It is - but, for the most part, once you get it working, it actually
just works and you don't have to think much more about it. I'm just
talking about loading/saving/traversing the model here - queries are
and remain a real pain. That's probably the worst part about NH, I
think - HQL isn't as verbose as SQL, and it respects your defined
relations and writes joins for you, but everything above and beyond
that seems to be just as much work and causes just as many problems as
old-school SQL...

(for the most part I actually find SQL to be easier and more
transparent, and it seems that the most common questions on forums etc
revolve around "how to do this SQL query in HQL" or the criteria
API...)

> 3) while not related to your questions, you might want to split up your
> application. Working in such a large monolithic application isn't going to
> bring anything other than friction and pain IMHO

I would if I could - I am not the first or only person to suggest that
to the client, but they insist on very large iterations. Neither I nor
management has been able to convince them that it's
safer, wiser, more cost-effective, and less disruptive to their
business process to build in smaller iterations and introduce the
software progressively...

Serge van den Oever

Apr 15, 2012, 7:56:47 AM
to rav...@googlegroups.com
Thanks for the discussion, I think I learned a lot.

I'm working on a really simple example to get the hang of it. What I want to do is create a website/mobile app on top of MVC Web API.
Say I want to model a FAQ list:

public class Faq
{
  public string Id { get; set; }
  public string Question { get; set; }
  public string Answer { get; set; }
  public List<Category> Categories { get; set; }
}

public class Category
{
  public string Id { get; set; }
  public string Name { get; set; }
  public int Order { get; set; } // needed for ordering when presenting, for example, a list of categories
}

I started first with List<string> CategoryIds, but from the above discussion I learned not to do that. OK, I still have the question: what if we rename the category? Should we go through all Faqs containing the category? What happens in a cloud-hosted scenario when you have to update a lot of Faqs? If each request costs money, would we have an O(n) approach?

I see also advantages, I assume full text search can index all info on your document, so you get more meaningful results instead of searching over documents having references!

Now I want to put an MVC web api on top.
I will get requests like:


Will I return the same Faq class used for persistence, or create a Web API "view model" class, and translate?
I don't want the user of my API to get a category id like categories/123; she should only see 123, so that a new GET request to http://myserver/API/Categories/123 can be made.

Another question: Should I maintain a separate set of Category documents? I need a picker for available categories, so I need to track them somewhere... Should I have each category as a separate document, or create a single document with all categories?

Any ideas on this?

Ryan Heath

Apr 15, 2012, 8:30:14 AM
to rav...@googlegroups.com
Inline ***


On Sunday, April 15, 2012, Serge van den Oever wrote:
Thanks for the discussion, I think I learned a lot.

I'm working on a really simple example to get the hang of it. What I want to do is create a website/mobile app on top of MVC Web API.
Say I want to model a FAQ list:

public class Faq
{
  public string Id { get; set; }
  public string Question { get; set; }
  public string Answer { get; set; }
  public List<Category> Categories { get; set; }
}

public class Category
{
  public string Id { get; set; }
  public string Name { get; set; }
  public int Order { get; set; } // needed for ordering when presenting, for example, a list of categories
}

I started first with List<string> CategoryIds, but from the above discussion I learned not to do that.

*** In this case I would use a list of ids, since one category could be used in one or more Faqs.

OK, I still have the question: what if we rename the category? Should we go through all Faqs containing the category? What happens in a cloud-hosted scenario when you have to update a lot of Faqs? If each request costs money, would we have an O(n) approach?

*** You can issue patch commands which operate on many documents at a time.
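
To make the rename question concrete, here is a small in-memory comparison of the two modeling options, in plain C# with no RavenDB calls. The class names `FaqWithEmbeddedCategories`, `FaqWithCategoryIds` and `CategoryRename` are hypothetical, and the dictionary merely stands in for a collection of Category documents. It only illustrates how many documents a rename touches in each model; it says nothing about how RavenDB's patching is invoked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Option A: each Faq embeds its own copy of every category.
public class FaqWithEmbeddedCategories
{
    public string Id { get; set; }
    public List<Category> Categories { get; set; } = new List<Category>();
}

// Option B: each Faq only stores category ids.
public class FaqWithCategoryIds
{
    public string Id { get; set; }
    public List<string> CategoryIds { get; set; } = new List<string>();
}

public static class CategoryRename
{
    // Embedded copies: every Faq holding the category has to be rewritten.
    // Returns the number of Faq documents touched, which grows with the data.
    public static int RenameEmbedded(IEnumerable<FaqWithEmbeddedCategories> faqs, string categoryId, string newName)
    {
        var touched = 0;
        foreach (var faq in faqs)
        {
            var copies = faq.Categories.Where(c => c.Id == categoryId).ToList();
            if (copies.Count == 0) continue;
            foreach (var copy in copies) copy.Name = newName;
            touched++;
        }
        return touched;
    }

    // Referenced by id: only the single Category document changes,
    // and no Faq needs to be written at all.
    public static void RenameReferenced(IDictionary<string, Category> categoriesById, string categoryId, string newName)
    {
        categoriesById[categoryId].Name = newName;
    }
}
```

With embedded copies the rename is the O(n) case asked about above (which is where a set-based patch helps), while with ids it is a single document write.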
 

I see also advantages, I assume full text search can index all info on your document, so you get more meaningful results instead of searching over documents having references!

Now I want to put an MVC web api on top.
I will get requests like:


Will I return the same Faq class used for persistence, or create a Web API "view model" class, and translate?

*** Yes, I would separate them where it makes sense. Think how your persistence model could change while you still need to support version 1.0 of your API.

I don't want the user of my API to get a category id like categories/123; she should only see 123, so that a new GET request to http://myserver/API/Categories/123 can be made.

*** One small thing: you are not required to have integer ids; you might as well have categories/mobile, categories/desktop, etc. as ids, making your API look more 'natural' or 'discoverable'.


Another question: Should I maintain a separate set of Category documents? I need a picker for available categories, so I need to track them somewhere... Should I have each category as a separate document, or create a single document with all categories?

*** In this case, yes, I would separate Category from Faq.
If a category was nothing more than a name, then I would have put it into the Faq.
 

Any ideas on this?

// Ryan


Serge van den Oever

Apr 15, 2012, 11:40:49 AM
to rav...@googlegroups.com
@Ryan, thanks for your reply!

I'm still not sure about the list of ids... although one category could be used in one or more Faqs, we are getting relational. What is the impact on full text search?

I think you are right about separating the persistence model and the Web API model, although in this case they are much alike. This means introducing code to convert from the persistence model to the Web API model.

The usage of string ids like categories/mobile, categories/desktop sounds good. That still leaves the issue of categories/123 versus 123, or categories/mobile versus mobile.
I know RavenDB can do custom ids; I don't know about Web API. The user only sees the Web API, so I should use it in the Web API, but to use it in RavenDB as well means transforming between categories/mobile and mobile, just like categories/123 and 123.
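
The conversion code mentioned above can stay quite small. The following is a rough sketch in plain C#, under the assumption that the persistence model stores full document ids such as "categories/123" while the API exposes only the short key "123". `FaqDto` and `FaqMapper` are hypothetical names invented for this illustration, not part of RavenDB or Web API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Persistence model, as stored in the database.
public class Faq
{
    public string Id { get; set; }            // e.g. "faqs/7"
    public string Question { get; set; }
    public string Answer { get; set; }
    public List<string> CategoryIds { get; set; } = new List<string>(); // e.g. "categories/123"
}

// Web API model (hypothetical): exposes only the short keys.
public class FaqDto
{
    public string Key { get; set; }
    public string Question { get; set; }
    public string Answer { get; set; }
    public List<string> CategoryKeys { get; set; }
}

public static class FaqMapper
{
    // "categories/123" -> "123"; an id without a slash is returned unchanged.
    public static string ToKey(string documentId)
    {
        return documentId.Substring(documentId.LastIndexOf('/') + 1);
    }

    // ("categories", "123") -> "categories/123"
    public static string ToDocumentId(string collection, string key)
    {
        return collection + "/" + key;
    }

    // Translates the persistence model into the API model.
    public static FaqDto ToDto(Faq faq)
    {
        return new FaqDto
        {
            Key = ToKey(faq.Id),
            Question = faq.Question,
            Answer = faq.Answer,
            CategoryKeys = faq.CategoryIds.Select(id => ToKey(id)).ToList()
        };
    }
}
```

Keeping the translation in one place also means the stored model can change later without breaking version 1.0 of the API.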

I know that if category was just a name it's better to put it in Faq, but the purpose is to investigate these issues. :-)

Regards,
Serge van den Oever

Ryan Heath

Apr 15, 2012, 12:44:15 PM
to rav...@googlegroups.com
inline ***

> I'm still not sure about the list of ids... although one category could be used
> in one or more Faqs, we are getting relational. What is the impact on full
> text search?

Even in nosql-land we have relations between documents ;)
When defining an index you can get related documents into the index as if
all information was located in that one single document. So that
should not be an issue.

> I know RavenDB can do custom ids; I don't know about Web API. The user only sees
> the Web API, so I should use it in the Web API, but to use it in RavenDB as well means
> transforming between categories/mobile and mobile, just like categories/123
> and 123.

It really depends how you construct your API. You could set up routes
that take the id as 'categories/123' instead of just '123'.
Personally I would not like to construct the id for RavenDB, I'd
prefer to have the full id from the API.
You could encounter situations where you do not know whether an id
points to a category or to, say, a faq document. In those cases a
'full' id has a lot more advantages.

// Ryan

mindplay

Apr 16, 2012, 1:03:58 PM
to rav...@googlegroups.com
Thank you for pointing me to those 3 PDFs - I read the first one last night, and I'm starting to see what's going on here. Let me explain what I'm thinking, so you can verify (or not) if I'm starting to "see the light" ;-)

Back when I took my education in systems development, basically, I was taught to build aggregates as large, as complete and as connected as possible. But that was 14 years ago, and I'm starting to think that what they taught me back then was based on the kind of thinking that works for single-user, typically desktop applications, where the entire model was assumed to be in-memory, and therefore had to be traversable, since there was no "engine" you could go back to and ask for another piece of the model.

I can see now why that doesn't make sense for concurrent applications with large models persisted in the background. It just never occurred to me, and looked extremely wrong to me, because that's not how I was taught to think.

Furthermore, I'm starting to see why NHibernate doesn't really work well for me. So here's the main thing that's starting to dawn on me, and please confirm or correct me on this:

It seems that the idea behind NH is to configure the expected data-access strategies for the model itself. You write configuration-files that define the expected data-access strategies, but potentially, you're doing this based on assumptions about how you might access the data in this or that scenario.

The problem I'm starting to see, is that you're defining these assumptions statically - and while it is possible to deviate from these defined patterns, it's easy to think that once you've defined your access strategies, you're "done", and the model "just works" and you can focus on writing business logic, which too frequently turns out to be untrue in practice.

This contrasts with RavenDB, where you formally define the access strategies for specific scenarios - rather than for the model itself. And of course the same access strategy may work in different scenarios, but you're not tempted to assume that a single access strategy is going to work for all scenarios.

You're encouraged to think and make choices about what you're accessing and updating in each scenario, rather than just defining one overriding strategy and charging ahead blindly on the assumption that it'll always just work, or always perform well, or always make updates that are sufficiently small to not cause concurrency problems.

Am I catching on?

I will definitely read the other two articles, and I understand the author has a book coming out on the subject, too.

Is there a book already available that I should read, that can help teach me this kind of thinking?

Thank you folks, for putting up with all my questions and pushing me in the right direction! - I'm starting to see that there's something important I need to learn here :-)


Chris Marisic

Apr 16, 2012, 2:28:54 PM
to rav...@googlegroups.com


On Monday, April 16, 2012 1:03:58 PM UTC-4, mindplay wrote:
> Am I catching on?


Nailed it.

Chris Marisic

Apr 16, 2012, 2:35:55 PM
to rav...@googlegroups.com
When I started working with RavenDB nearly 2 years ago, the Id system seemed the most wrong to me. I've come full circle and absolutely love the way Raven pushes you to identify things.

All the data going into the system has very few relationships; in cases where relationships exist they're all structured like: cars/hyundai/model/tiburon. This maps to a RESTful server phenomenally: the child object, Model in this example, would have CarsId: cars/hyundai physically on the document, but it's also embedded in the route structure of the resource.

If I'm on the controller action for cars/hyundai/model/tiburon I can easily get the parent Car document, the child Model document, or both using includes in a single HTTP request without any queries.
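
As a rough illustration of why this id scheme is convenient, the parent id can be derived from the child id with plain string handling, before the database is even contacted. The helper below is a hypothetical sketch (`NestedIds` is not a RavenDB type), and it assumes ids always follow the {collection}/{key}/{childCollection}/{childKey} pattern used in the example above.

```csharp
using System;

public static class NestedIds
{
    // "cars/hyundai/model/tiburon" -> "cars/hyundai"
    // Drops the last two segments (child collection and child key).
    public static string ParentIdOf(string childId)
    {
        var parts = childId.Split('/');
        if (parts.Length < 4)
            throw new ArgumentException("Expected an id with at least four segments.", nameof(childId));
        return string.Join("/", parts, 0, parts.Length - 2);
    }
}
```

A controller action for cars/hyundai/model/tiburon would then know both document ids up front and could ask for the two documents together.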

Stijn Volders (ONE75)

Apr 16, 2012, 2:40:17 PM
to rav...@googlegroups.com
Inline


On Monday, April 16, 2012 7:03:58 PM UTC+2, mindplay wrote:
> Thank you for pointing me to those 3 PDFs - I read the first one last night, and I'm starting to see what's going on here. Let me explain what I'm thinking, so you can verify (or not) if I'm starting to "see the light" ;-)
>
> Back when I took my education in systems development, basically, I was taught to build aggregates as large, as complete and as connected as possible. But that was 14 years ago, and I'm starting to think that what they taught me back then was based on the kind of thinking that works for single-user, typically desktop applications, where the entire model was assumed to be in-memory, and therefore had to be traversable, since there was no "engine" you could go back to and ask for another piece of the model.
>
> I can see now why that doesn't make sense for concurrent applications with large models persisted in the background. It just never occurred to me, and looked extremely wrong to me, because that's not how I was taught to think.

This is part of the problem. Almost everybody was taught otherwise and the www is full of examples of aggregate clusters (as a best practice)
 

> Am I catching on?

Yes. RavenDB is built on the AR pattern. You have to think about your model, your consistency boundaries and therefore the interaction and dependencies between entities. You don't model based on their "real life" structure but on behavior.




Oren Eini (Ayende Rahien)

Apr 16, 2012, 2:59:52 PM
to rav...@googlegroups.com
DAMN
That is a pretty awesome description
Can I use this in the blog?

mindplay

Apr 16, 2012, 10:23:29 PM
to rav...@googlegroups.com
Thanks, that's really reassuring! And yes, of course, use whatever you want :-)

Thank you everyone for taking the time to discuss this. You've got a great group of people here :-)

BAM

Apr 17, 2012, 10:41:44 AM
to rav...@googlegroups.com
Thank you, mindplay, for driving the conversation and to all who participated. Taken all together it makes for a VERY helpful/educational thread. I've suggested that everyone on my team read this thread as I think it highlights many of the shifts in perspective required to go from the NH SQL world to working with Raven. +1 all around.