Joins

Ayende Rahien

Aug 9, 2010, 3:41:45 AM
to ravendb
I am thinking about adding document joins, for m:1 associations.

Given the following documents

// users/oren
{
     "Name": "Oren",
     "Company": { "Id": "companies/hibernating-rhinos" }
}

// companies/hibernating-rhinos
{
   "Name": "Hibernating Rhinos"
}

GET /docs/users/oren?include=Company

Will result in:

{
   "Name": "Oren",
   "Company": { "Name": "Hibernating Rhinos" }
}

Problems:

How do we do that in the client API, given that the document you write is different from the document you read?

How do we avoid writing the invalid document (invalid because it embeds another root document)?

Thoughts?
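
A minimal sketch of what the proposed ?include=Company could do on the server, assuming documents are plain dictionaries and a reference always has the shape { "Id": "<document key>" }. IncludeSketch and LoadWithInclude are hypothetical names for illustration only, not part of Raven's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the document store; not Raven's real API.
public static class IncludeSketch
{
    public static Dictionary<string, object> LoadWithInclude(
        IDictionary<string, Dictionary<string, object>> store,
        string id,
        string includeProperty)
    {
        // Copy the document so the stored version is never modified.
        var result = new Dictionary<string, object>(store[id]);

        // Assumption: a reference looks like { "Id": "<document key>" }.
        if (result.TryGetValue(includeProperty, out var value) &&
            value is Dictionary<string, object> reference &&
            reference.TryGetValue("Id", out var referencedId) &&
            referencedId is string key &&
            store.TryGetValue(key, out var referenced))
        {
            // Replace the reference with a copy of the referenced document.
            result[includeProperty] = new Dictionary<string, object>(referenced);
        }

        return result;
    }
}
```

The returned document no longer matches what was written, which is exactly the read/write mismatch raised in the problems above.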

Matt

Aug 9, 2010, 5:47:11 AM
to ravendb
Maybe treat them as a special index. Obviously not the same as a
Lucene index.

But then it probably applies to single-document loading too:

doc.Load(x)
doc.Load(x, ResolveMode.None)
doc.Load(x, ResolveMode.AllManyToOne)


doc.Save(entity, saveStrategy);

Where saveStrategy tells the client API how to persist the docs.

Ayende Rahien

Aug 9, 2010, 6:21:19 AM
to rav...@googlegroups.com
Too complex, IMO.
I _love_ that Raven is simple.
I am thinking about making people have two models, one that is persistable, the second for reads.
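
A minimal sketch of the two-model idea, assuming one set of classes mirrors the stored document and a second set mirrors the joined result. All class names here (CompanyReference, UserView, CompanyView) are hypothetical and chosen only for illustration.

```csharp
// Persistable (write) model: mirrors the stored document and holds
// only a reference to the company document.
public class CompanyReference
{
    public string Id { get; set; }
}

public class User
{
    public string Name { get; set; }
    public CompanyReference Company { get; set; }
}

// Read model: mirrors the joined result returned by an include.
// It is never written back, so it cannot embed another root document
// into the stored user by accident.
public class CompanyView
{
    public string Name { get; set; }
}

public class UserView
{
    public string Name { get; set; }
    public CompanyView Company { get; set; }
}
```

The cost of this approach is that every joined shape needs its own read class.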

SHSE

Aug 9, 2010, 7:36:45 AM
to ravendb
You could change result format to:

{
    "Name": "Oren",
    "Company": {
        "Id": "companies/hibernating-rhinos",
        "Name": "Hibernating Rhinos"
    }
}

so the "Id" property would be an indicator that this is a reference to
another entity.

Client API:

session.Load<JObject>("users/oren", "Company")
session.Load<User>("users/oren", user => user.Company) // could be an extension method

session.Query<JObject>("IndexName").Include("Company").Where(...)
session.Query<User>("IndexName").Include(user => user.Company).Where(...) // could be an extension method

Ayende Rahien

Aug 9, 2010, 7:44:29 AM
to rav...@googlegroups.com
Problems:

How do we maintain a consistent model in the face of this?

What does User look like?

SHSE

Aug 9, 2010, 8:12:09 AM
to ravendb
class User {
    public string Id { get; set; }
    public string Name { get; set; }
    public EntityRef<Company> Company { get; set; }
}

class EntityRef<T> {
    public EntityRef(string entityId) { this.Id = entityId; }

    public T Entity { get; set; }
    public bool IsLoaded { get; set; }
    public string Id { get; set; }
}

The Raven client API should support cascading create/update/delete.

Ernst Naezer

Aug 9, 2010, 8:14:37 AM
to rav...@googlegroups.com
Or maybe this is something that can be solved using dynamics? After all, it's a dynamic object you're constructing.

Ayende Rahien

Aug 9, 2010, 8:18:53 AM
to rav...@googlegroups.com
This leads you right back to the problem outlined here (association management):

Ayende Rahien

Aug 9, 2010, 8:19:11 AM
to rav...@googlegroups.com
Can you suggest an API for this?

Ernst Naezer

Aug 9, 2010, 8:23:09 AM
to rav...@googlegroups.com
If we separate loading from writing, the API could be something like this:



class User
{
    public string Name { get; set; }
    public Company Company { get; set; }
}

class Company
{
}

public dynamic LoadWithReferences(string id)

public dynamic Save(object entity)

Ernst Naezer

Aug 9, 2010, 8:23:41 AM
to rav...@googlegroups.com
sorry... my bad... shouldn't try to code in my mail...

Ernst Naezer

Aug 9, 2010, 8:32:26 AM
to rav...@googlegroups.com
What if we keep the client document models the way you would expect them,

i.e.:

class User {
    public string Name { get; set; }
    public Company Company { get; set; }
}

class Company {
    public string Name { get; set; }
    public List<User> Users { get; set; }
}

and tell raven to split them up:

//configure denormalized entities:
DocumentStore.DenormalizeReferenceStorage.For<User>(u => u.Company)

and for loading something like:

session.LoadWithReferences<User>(u=> u.Company, "users/ernst")

Ernst Naezer

Aug 9, 2010, 8:46:34 AM
to rav...@googlegroups.com
or maybe:

session.Load<User>("users/ernst").Include( e => e.Company)

Ayende Rahien

Aug 9, 2010, 8:50:56 AM
to rav...@googlegroups.com
Ernst,
This leads to a design that requires a lot of choices from the user.
That is not a good thing.

I view the join as an optimization, and from what I see, we can deal with that using 2 different models.
While dynamic is possible, it isn't (yet) common enough in the .NET world for us to accept it, I think.

SHSE

Aug 9, 2010, 9:32:57 AM
to ravendb
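using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class User {
    public string Id { get; set; }
    public string Name { get; set; }
    public EntityRef<Company> Company { get; set; }
}

public class Company {
    public string Id { get; set; }
    public string Name { get; set; }
}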

public class EntityRef {
    private dynamic entity;

    public EntityRef(string entityId) {
        this.Id = entityId;
    }

    public EntityRef(dynamic entity) {
        this.Entity = entity;
    }

    // The loaded entity is never serialized; only the Id is persisted.
    [JsonIgnore]
    public dynamic Entity {
        get { return this.entity; }
        set {
            this.entity = value;
            this.Id = this.entity.Id;
            this.HasEntity = true;
        }
    }

    [JsonIgnore]
    public bool HasEntity { get; private set; }

    public string Id { get; set; }

    public static implicit operator EntityRef(string entityId) {
        return new EntityRef(entityId);
    }
}

public class EntityRef<T> : EntityRef {
    public EntityRef(T entity) : base(entity) {}
    public EntityRef(string entityId) : base(entityId) {}

    [JsonIgnore]
    public new T Entity {
        get { return base.Entity; }
        set { base.Entity = value; }
    }

    public static implicit operator EntityRef<T>(T entity) {
        return new EntityRef<T>(entity);
    }

    public static implicit operator EntityRef<T>(string entityId) {
        return new EntityRef<T>(entityId);
    }

    public static implicit operator T(EntityRef<T> entityRef) {
        return entityRef.Entity;
    }
}

internal class Program {
    private static void Main(string[] args) {
        var user = new User {
            Id = "users/oren",
            Name = "Oren",
            Company = "companies/hibernating-rhinos"
        };

        var company = new Company {
            Id = "companies/hibernating-rhinos",
            Name = "Hibernating Rhinos"
        };

        user.Company = company;

        // Program stands in for the session in this demo.
        var session = new Program();

        session.Store(user);

        session.Delete(user);

        Console.WriteLine(user.Company.Entity.Name);

        Console.WriteLine(JObject.FromObject(user));

        Console.ReadLine();
    }

    public void Store(object entity) {
        // If entity is already stored in current session then return

        // Store entity

        Console.WriteLine("Stored " + entity);

        // Cascade to every referenced entity that is loaded.
        foreach (var referencedEntity in GetReferencedEntities(entity))
            this.Store(referencedEntity);
    }

    public void Delete(object entity) {
        // If entity is already deleted in current session then return

        // Delete entity

        Console.WriteLine("Delete " + entity);

        // Cascade to every referenced entity that is loaded.
        foreach (var referencedEntity in GetReferencedEntities(entity))
            this.Delete(referencedEntity);
    }

    private static IEnumerable<dynamic> GetReferencedEntities(object entity) {
        return from property in entity.GetType().GetProperties()
               where property.PropertyType.IsSubclassOf(typeof(EntityRef))
               let reference = (EntityRef) property.GetValue(entity, null)
               // Skip unset references as well as references without a loaded entity.
               where reference != null && reference.HasEntity
               select reference.Entity;
    }
}

On Aug 9, 4:19 pm, Ayende Rahien <aye...@ayende.com> wrote:
> Can you suggest an API for this?

slav

Aug 9, 2010, 9:36:50 AM
to ravendb
So does using a special attribute on a property, to indicate whether I
want it saved separately or not, require too many choices?

class User {
    string Name
    [StoreSeparate] // or something like that
    Company Company
}

class Company {
    string Name
    List<User> Users
}
It's not really POCO anymore, but you don't have to use the attribute.
Only if you want to customize how documents are stored / loaded.

You could also make it configurable through Conventions. I'm not sure
if I like that idea though.

On Aug 9, 8:50 am, Ayende Rahien <aye...@ayende.com> wrote:
> This leads to a design that is requiring a lot of choices from the user.
> That is not a good thing.

Daniel Steigerwald

Aug 9, 2010, 9:55:42 AM
to ravendb
It reminds me of my first steps with Raven. I even started a thread here
somewhere asking about embedded roots/aggregated documents. This change
would have an immense impact on client-side code, especially classes.
Should I have { CompanyId: '..', or { Company:, or even both?
And then there is writing. You may remember my surprise that embedded value
objects were saved without an id.
Now I have to save Company Channels as value objects with an id, which I
have to generate myself (not nice):
Company = { LastChannelId: 545, Channels = [channel1, channel2]...
Maybe a graph database approach (http://en.wikipedia.org/wiki/Graph_database)
would fit better.
But here is what would fix things for me right now:

[Aggregate]
class Channel {}

[Aggregate]
class Organization { Channels: [] }

[Aggregate]
class FlatOrganization {
[Aggregate(off)]
Channels: []
}

var org = session.Load<Organization>(email);

One attribute could handle both loading and saving.

SHSE

Aug 9, 2010, 10:10:39 AM
to ravendb
Here is the converter:

using System;
using System.Dynamic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

[JsonConverter(typeof(EntityRefConverter))]
public class EntityRef { ... }

public class EntityRefConverter : JsonConverter {
    // Always write only the reference ({ "Id": ... }), never the embedded entity.
    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer) {
        var entityRef = (EntityRef)value;
        writer.WriteStartObject();
        writer.WritePropertyName("Id");
        writer.WriteValue(entityRef.Id);
        writer.WriteEndObject();
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer) {
        var referenceData = JObject.Load(reader);

        // More than just "Id" means the full entity was included.
        if (referenceData.Properties().Count() > 1) {
            object entity;

            if (objectType.IsGenericType) {
                var entityType = objectType.GetGenericArguments().Single();
                entity = serializer.Deserialize(referenceData.CreateReader(), entityType);
            } else {
                entity = new ExpandoObject();
                // JObject.Load has already consumed the original reader,
                // so populate from the loaded JObject instead.
                serializer.Populate(referenceData.CreateReader(), entity);
            }

            return (EntityRef)Activator.CreateInstance(objectType, entity);
        }

        return (EntityRef)Activator.CreateInstance(objectType, referenceData["Id"].Value<string>());
    }

    public override bool CanConvert(Type objectType) {
        return objectType.IsSubclassOf(typeof(EntityRef));
    }
}

So if the client receives:
{
    "Id": "users/oren",
    "Name": "Oren",
    "Company": { "Id": "companies/hibernating-rhinos" }
}

then it creates a User instance with an empty reference. If the client
receives:
{
    "Id": "users/oren",
    "Name": "Oren",
    "Company": {
        "Id": "companies/hibernating-rhinos",
        "Name": "Hibernating Rhinos"
    }
}

it creates a User instance whose reference holds the loaded entity.

Ayende Rahien

Aug 9, 2010, 12:54:45 PM
to rav...@googlegroups.com
Guys,

I think that you are missing an important aspect. It is _supposed_ to be hard to create associations.
You _want_ to have a strict separation between documents.

fschwiet

Aug 9, 2010, 2:18:45 PM
to ravendb
Maybe rather than joining the documents into one result, such a request
would cause the 'joined' entities to be preloaded instead.

From the client API perspective, I would do the joined load of
the user object, and the user is returned in its original form. But
now the session has the company object preloaded, so when I try to
load the company object via the client API no request is made to the
server. From the caller's perspective, the only change to usage has
been the preload hint passed in the original request.
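
A minimal sketch of the preloading idea described above, assuming a session that keeps loaded documents in a dictionary keyed by id and a server that can return the referenced document in the same response. FakeServer, Session, and the flat CompanyId property are hypothetical names for illustration, not Raven's actual API.

```csharp
using System;
using System.Collections.Generic;

public class Company { public string Id; public string Name; }
public class User { public string Id; public string Name; public string CompanyId; }

// Hypothetical stand-in for the server; counts requests so the saving is visible.
public class FakeServer
{
    private readonly Dictionary<string, object> documents = new Dictionary<string, object>();
    public int RequestCount { get; private set; }

    public void Put(string id, object document) => documents[id] = document;

    // One request returns the requested document plus the document it references, if asked.
    public Dictionary<string, object> Get(string id, Func<object, string> includeId = null)
    {
        RequestCount++;
        var result = new Dictionary<string, object> { [id] = documents[id] };
        var referencedId = includeId?.Invoke(documents[id]);
        if (referencedId != null && documents.TryGetValue(referencedId, out var referenced))
            result[referencedId] = referenced;
        return result;
    }
}

public class Session
{
    private readonly FakeServer server;
    private readonly Dictionary<string, object> loaded = new Dictionary<string, object>();

    public Session(FakeServer server) { this.server = server; }

    public T Load<T>(string id, Func<T, string> include = null) where T : class
    {
        // Anything already in the session, including preloaded documents, costs no request.
        if (loaded.TryGetValue(id, out var cached))
            return (T)cached;

        Func<object, string> includeId = null;
        if (include != null)
            includeId = document => include((T)document);

        // Track everything the server sent back; the caller still gets the original document.
        foreach (var pair in server.Get(id, includeId))
            loaded[pair.Key] = pair.Value;

        return (T)loaded[id];
    }
}
```

With this shape, session.Load<User>("users/oren", u => u.CompanyId) followed by session.Load<Company>("companies/hibernating-rhinos") issues a single request, and the User class is the same for reads and writes.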

I'm not sure how that would be accomplished in terms of the HTTP
API.

So if I load a user by ID, I get the original document with the
reference to the company. But

Ayende Rahien

Aug 9, 2010, 2:24:34 PM
to rav...@googlegroups.com
You are a genius!!!!

Ernst Naezer

Aug 9, 2010, 3:36:08 PM
to rav...@googlegroups.com
very nice!

Brian Vallelunga

Aug 9, 2010, 7:01:22 PM
to ravendb
How would caching of objects work in this scenario? I'm using
AppFabric to cache objects quite frequently. In this scenario, when I
pull them out of cache, the original document session is gone. Does
this mean the child documents are gone as well? Or would those be
serialized along with the parent entity when cached?

I really like the concept of child documents, and being able to
reference/update a child document is a very useful feature. For my
needs, I'd almost always want the child documents loaded with the
parent document. For me, this is really a subset of the association
issue.

A follow-up issue would be relating to arrays/lists of child
documents. Will those be preloaded as well?

If a child document is not preloaded, are you looking to add lazy
loading? I thought that was something you really wanted to avoid with
Raven.

Ajai Shankar

Aug 9, 2010, 7:31:58 PM
to rav...@googlegroups.com
Very nice solution!

Similar I think to $expand for OData http://www.odata.org/developers/protocols/uri-conventions#ExpandSystemQueryOption

So, include would prevent round trips from the client to the server.

But on the server, would the initial call to load the included entities still be N + 1?
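
One possible way to bound that server-side work, sketched under the assumption that the server can collect the referenced ids from all N results first and then look up each distinct id once. IncludeBatching and ResolveIncludes are hypothetical names, and nothing in this thread confirms how Raven would actually do it.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IncludeBatching
{
    // Looks up each distinct referenced id once, however many results point at it.
    public static Dictionary<string, TDoc> ResolveIncludes<TDoc>(
        IEnumerable<string> referencedIds,
        IReadOnlyDictionary<string, TDoc> store)
    {
        var includes = new Dictionary<string, TDoc>();
        foreach (var id in referencedIds.Where(id => id != null).Distinct())
        {
            // Dangling references are skipped rather than treated as errors.
            if (store.TryGetValue(id, out var document))
                includes[id] = document;
        }
        return includes;
    }
}
```

The lookups still happen, but they are local to the server and their count depends on the number of distinct references rather than on N.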

Ajai

fschwiet

Aug 10, 2010, 1:03:34 AM
to ravendb
If you are caching multiple objects from RavenDb, you would need to do
the traversal yourself then store it in whatever manner interests
you. If you are using the objects this traversal is more natural.

What kind of throughput were you running against RavenDb before you
decided you needed Velocity? Both scale out horizontally so I wonder
if that's the right thing to do. Velocity doesn't even have batch
gets, that I've seen.

Interesting point about preloading lists. It'd be nice to have them
paged, just say "Load cart" and the products are preloaded, paged.
Consider if you can invert the query though, "Load products by cartId"
and then the cart is preloaded. This lets you use paging in the usual
manner.