[objectify-appengine] Simple (?) Datastore Question about Relationships

Jake

May 20, 2010, 2:40:54 PM
to objectify-appengine
Hello all,

I've long been confused about the relative usefulness of Parent-Child
relationships vs. just storing a Long parentId in the child. So, I
have a real-life example that I would love feedback on...

Our application asks questions (a Prompt) and the user can respond
with many Response objects, which reference the Prompt. A user can
later edit a single Response (e.g. I've answered a Prompt with a
drawing Response and a text Response, but I want to edit the text).
For research purposes, we want a record of these changes. So, I've
created a ResponseData object that, conceptually, has a Response
object as a parent (many ResponseData to one Response), only one of
which is "current".

I've used nothing but Long id's to maintain relationships thus far.
Should I be using this whole Parent Entity group thing here? Here are
my two thoughts:

--#1--

public class Response {
    Long id;
    Long currentResponseDataId;
}

public class ResponseData {
    Long id;
    Long owningResponseId;
}

--#2--

public class Response {
    Long id;
    List<ResponseData> responses;
    /* "current" would be at position 1 */
}

public class ResponseData {
    Long id;
    /* Do I need something else here? */
}

I can guarantee that a ResponseData will never need a different
Response parent. My question, essentially, is: what is the
difference?

One relevant bit is that in #1, there are a lot of steps involved in
creating a brand new Response (e.g. Create Response, Store Response,
Create ResponseData, Set ResponseData.owningResponseId, Store
ResponseData, Set Response.currentResponseDataId, Store Response
again). Does Option #2 avoid that? How does Option #2 actually
work? Any pitfalls? Any way to make Option #1 easier?

Also, querying for ALL ResponseData objects will be rare or non-
existent. We just want the data there.

Thank you in advance for your help. I'm not a datastore expert, but
I'm not horribly dumb - just need help understanding why entity groups
are useful.

Jake

Martin Algesten

May 20, 2010, 4:46:51 PM
to objectify...@googlegroups.com

Hi Jake,

The only reason that I've found is when you want to make updates transactionally. A transaction can only span objects belonging to the same entity group. You will get an error if you try to make a transaction that mixes objects from different entity groups. In your example this feature doesn't sound relevant, so I'd stick with just the Long (or alternatively an objectify Key-object) as the reference.
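A minimal plain-Java sketch of the rule Martin describes, assuming only that an entity group is identified by the root of a key's ancestor path. `SimpleKey` and `EntityGroups` are hypothetical names for illustration, not App Engine or Objectify classes; the check just answers whether a set of keys could share one single-group transaction.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a datastore key: kind + numeric id + optional parent.
// This is NOT the App Engine Key class; it only models the ancestor path.
final class SimpleKey {
    final String kind;
    final long id;
    final SimpleKey parent; // null for a root entity

    SimpleKey(String kind, long id, SimpleKey parent) {
        this.kind = kind;
        this.id = id;
        this.parent = parent;
    }

    // Walk up the ancestor path; the root key identifies the entity group.
    SimpleKey root() {
        SimpleKey k = this;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimpleKey)) return false;
        SimpleKey other = (SimpleKey) o;
        return kind.equals(other.kind) && id == other.id
                && Objects.equals(parent, other.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, id, parent);
    }
}

final class EntityGroups {
    // True when every key shares the same root, i.e. the keys could be
    // written together in one single-group transaction.
    static boolean sameEntityGroup(List<SimpleKey> keys) {
        if (keys.isEmpty()) return true;
        SimpleKey first = keys.get(0).root();
        for (SimpleKey k : keys) {
            if (!k.root().equals(first)) return false;
        }
        return true;
    }
}
```

For example, a ResponseData keyed under Response 1 shares a group with Response 1 but not with Response 2. With plain Long references (option #1) every entity is its own root, so each one sits in its own entity group.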

Parent-child may also look like referential integrity, which it sort of is - but if you design heavily with them you will most likely end up with something that performs really badly, where each write needs to update the whole entity group. The kind of referential integrity you get in an RDBMS has no equivalent in GAE. Using parent-child to achieve it is mostly a bad idea.

Most apps can be designed to programmatically ensure data integrity in the DAO-layer. The same goes for most transactions.

The whole reason GAE scales so well is that the database only needs to consider one entity group at a time; this is how it achieves the distributed nature of the storage. When designing your data model, it's good to keep that in mind - avoid having many users updating the same entity group.

Martin

Jeff Schnitzer

May 24, 2010, 4:04:15 PM
to objectify...@googlegroups.com
I'll try to add a little to Martin's comments:

Jake, I think you're asking a lot of big questions about data modeling
that go beyond whether or not to use the parent-child mechanism. GAE
(and thus Objectify) has three ways of modeling a one-to-many
relationship:

* Storing a pointer from many-side to one-side.
* Storing a pointer from many-side to one-side as a @Parent.
* Storing a list of pointers from one-side to many-side.
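For comparison, here is a plain-Java sketch of the three layouts applied to Jake's Response/ResponseData model. The class names `ResponseDataByField`, `ResponseDataByParent` and `ResponseWithList` are hypothetical, ids are raw Longs, and the Objectify annotations (@Id, @Parent) are deliberately left out so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// 1. Many-side points at the one-side with an ordinary field.
class ResponseDataByField {
    Long id;
    Long owningResponseId; // plain pointer to the Response
}

// 2. Many-side points at the one-side as its parent (ancestor).
//    In Objectify this field would carry @Parent; here it is just a field,
//    and the (parent, id) pair together identifies the entity.
class ResponseDataByParent {
    Long parentResponseId; // would be the @Parent key in Objectify
    Long id;
}

// 3. One-side holds a list of pointers to the many-side.
class ResponseWithList {
    Long id;
    List<Long> responseDataIds = new ArrayList<>(); // grows with every edit
}
```

Layouts 1 and 3 differ in which entity gets rewritten when a new ResponseData appears: in layout 1 only the new child is written, while in layout 3 the parent's list must be updated (and keeps growing).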

In addition, there is a question of exactly how to represent a
pointer. You can use a Key<?> object or you can simply embed a raw
Long id. Note that you cannot (currently) use a full entity object
(i.e. a ResponseData) as a pointer in itself - these need to be Key<?>
or Long or String or some other simple type.

As Martin said, unless you have specific needs for transactional
integrity, you are best off avoiding @Parent. It complicates everything.

When you say "querying for ALL ResponseData objects will be rare or non-
existent. We just want the data there", this strongly suggests to me
that your initial intuition (option #1) is the right path. Make the
child point to the parent, and let the parent point to the "current"
value. This will be especially important if there are likely to be
more than 5,000 ResponseData entities.

The next question is - store just the Long id or store a
Key<ResponseData> or Key<Response>. I am partial to Key<?> objects
because they are more explicit and can be easier to work with, but I
don't think there's anything wrong with manipulating Long ids as you
describe.
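To show what "more explicit" can buy you, here is a small sketch using a hypothetical generic `Ref<T>` class. It is NOT Objectify's `Key<T>`; it only illustrates that a typed reference lets the compiler reject pointing a field at the wrong kind, which a bare Long cannot do.

```java
import java.util.Objects;

// Hypothetical typed reference: the kind travels with the id.
final class Ref<T> {
    final Class<T> kind;
    final long id;

    Ref(Class<T> kind, long id) {
        this.kind = kind;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Ref)) return false;
        Ref<?> other = (Ref<?>) o;
        return kind.equals(other.kind) && id == other.id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, id);
    }

    @Override
    public String toString() {
        return kind.getSimpleName() + "(" + id + ")";
    }
}

class Response {
    Long id;
    Ref<ResponseData> current; // can only hold a ResponseData reference
}

class ResponseData {
    Long id;
    Ref<Response> owner; // assigning a Ref<ResponseData> here will not compile
}
```

With Long ids, `owner = current.id` would compile silently even though it mixes up the two kinds; with typed references that mistake becomes a compile error.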

If enough people requested that Objectify support uninitialized
entities as surrogates for the Key (as Twig does), I'd probably add
the feature. Then you could use Response as a field type in
ResponseData. However, automatic activation is pretty much out of the
question (too much magic). If this paragraph doesn't make sense don't
worry about it.

Jeff

Jake

May 25, 2010, 12:17:30 PM
to objectify-appengine
Hello,

Thank you for the feedback. I figured #1 would be the way to go, but
I've been working with GAE for a while and have never seen a need to use
Entity Groups or formal relationships at all - I just use Long
referenceId everywhere. (Note: I prefer Long so that someday the code
might be portable. I know, I know... it's silly).

Anyways, I've been working on the rule of thumb that, in BigTable,
getObjectById(), rather than queries, is the way to go for as much as
possible - and why I've stuck with this somewhat convoluted data
structure. My question about the complexities of initializing Option
#1 still remains, though. Here is my implementation for #1 (a user is
responding to a global Prompt):

Response r = ofy.query(Response.class)
        .filter("promptId", prompt.getId())
        .filter("userId", user.getId())
        .get();

if (r == null) {
    r = new Response(user, type, prompt);
    ofy.put(r); // Generate ID
}

ResponseData rd = r.getNewResponseDataObject();
rd.setText(text);
ofy.put(rd); // Generate ID and save
r.setResponseDataId(rd.getId());
ofy.put(r);

It just seems like 3 puts to create the first response is too many.
Any thoughts?

Thanks!

Jake

Jeff Schnitzer

May 25, 2010, 12:38:00 PM
to objectify...@googlegroups.com
You can explicitly allocate the two ids in advance:

long responseId = ofy.getDatastoreService()
        .allocateIds(fact.getKind(Response.class), 1).getStart().getId();
long responseDataId = ofy.getDatastoreService()
        .allocateIds(fact.getKind(ResponseData.class), 1).getStart().getId();

Not the most elegant syntax, but nobody has asked for an
ObjectifyFactory method to allocate ids before. If you'd like, we can
add one that looks like this:

long responseId = fact.allocateIds(Response.class, 1).getStart().getId();
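A minimal in-memory sketch of why pre-allocating helps with Jake's three-put sequence. `FakeStore`, its `allocateId()`/`put()` methods and `CreateFirstResponse` are hypothetical stand-ins, not Objectify or App Engine APIs; the point is only that once both ids are known up front, each entity is written exactly once, already pointing at the other. (For simplicity the fake store uses one shared id sequence; the real datastore allocates per kind, as in the code above.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the datastore: hands out ids
// (like allocateIds) and counts put() calls.
final class FakeStore {
    private final AtomicLong nextId = new AtomicLong(1);
    final Map<Long, Object> rows = new HashMap<>();
    int puts = 0;

    long allocateId() {
        return nextId.getAndIncrement();
    }

    void put(long id, Object entity) {
        puts++;
        rows.put(id, entity);
    }
}

final class Response {
    long id;
    long currentResponseDataId;
}

final class ResponseData {
    long id;
    long owningResponseId;
    String text;
}

final class CreateFirstResponse {
    // Allocate both ids first, wire the pointers, then store each entity once.
    static Response create(FakeStore store, String text) {
        Response r = new Response();
        r.id = store.allocateId();
        ResponseData rd = new ResponseData();
        rd.id = store.allocateId();
        rd.text = text;
        rd.owningResponseId = r.id;
        r.currentResponseDataId = rd.id;
        store.put(rd.id, rd);
        store.put(r.id, r);
        return r;
    }
}
```

Creating the first response this way costs two puts instead of three, and no entity is ever stored with a missing pointer.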

Jeff

Jake

May 26, 2010, 9:17:37 AM
to objectify-appengine
Hey,

Excellent! Exactly what I was looking for. Syntax isn't all that
important since I'm only doing it in a few places. However, now that
I realize an ID can be generated without a put(), I'll probably end up
using it more frequently. Instead of returning a long, I would almost
prefer some sort of pre-put() that fills that field directly. E.g.

ResponseData rd = new ResponseData(...);
ofy.generateId(rd); // Could return the id that was generated
// Do stuff with ID
ofy.put(rd);

Anyways, thanks again! Most helpful list ever.

Jake

Jeff Schnitzer

May 26, 2010, 1:42:53 PM
to objectify...@googlegroups.com
I just checked in some friendlier allocateIds() methods on
ObjectifyFactory and a typesafe KeyRange<?> class, which should help
out.

Jeff

btoc

Jun 2, 2010, 3:54:49 PM
to objectify-appengine
The usefulness of pre-generating your keys cannot be overstated. It is how
one can prevent index explosion when using List indexes. If you are
using a typical publisher/subscriber model, then you would usually want
to sort most recent first. This means creating an index on top of a
date property, which, when combined with a List index (as in the
million-user fan-out example), will cause major index overhead.

I use the pre-generated id and *-1 (multiply by minus one), which results in
ids like -1, -2, -3, ...

Now the entities are returned most recent first without any explicit
ordering.
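A small in-memory sketch of the negation trick. The `NewestFirst` class and its counter (standing in for allocateIds()) are hypothetical, and a `TreeMap` stands in for a key-ordered datastore scan. It only models the ordering argument; it says nothing about how the real datastore validates ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Negate each freshly allocated (ascending) id so that plain
// ascending-key order yields newest-first results.
final class NewestFirst {
    // TreeMap iterates in ascending key order, like a key-ordered scan.
    private final TreeMap<Long, String> byKey = new TreeMap<>();
    private long allocated = 0; // hypothetical stand-in for allocateIds()

    void add(String payload) {
        long positiveId = ++allocated;   // 1, 2, 3, ...
        byKey.put(-positiveId, payload); // stored as -1, -2, -3, ...
    }

    List<String> scanAscending() {
        return new ArrayList<>(byKey.values());
    }
}
```

After adding "first", "second" and "third", an ascending scan returns them in reverse insertion order, because -3 < -2 < -1.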

btoc

Jeff Schnitzer

Jun 2, 2010, 4:29:44 PM
to objectify...@googlegroups.com
I don't quite grasp what you're trying to say. Is there an issue?

Objectify get() methods always try to return results in the same order
the keys were specified.

Jeff

btoc

Jun 2, 2010, 9:58:11 PM
to objectify-appengine
There is no issue. The problem is that the normal auto assignment of keys
(for example when you use @Id Long) is an upward increment. So for
example if you are creating a blog entry (e.g.
http://code.google.com/p/objectify-appengine/wiki/AdvancedPatterns#Ancestor_Queries
), the id will be 1 for the first entry, 2 for the second etc.

If one does a query for BlogEntry then the normal app engine ordering
is by ascending id, so you will get the first entry then the second
entry. For a blog this would not be the typical way to return entries.
You would want the entries ordered most recent first.

You can do this by using an index on a date property, but again it is
an unnecessary index. To ensure most recent first ordering you want
the id to start high and get lower. Using allocateIds as I described
above, the first entry will be -1, then -2, etc. IOW, the most recent
entry will be lower than the prior (-2 is less than -1).

This becomes important in the million-user fan-out example. So,


import com.googlecode.objectify.Key;
import java.util.List;

class MessageIndex {
    private long id;
    private Key<Message> message;
    private List<Long> subscribers;
}

If you do not use allocateIds to generate the id in a descending way,
then App Engine will always return the entries FIFO (First In
First Out). Again, this is not how one would typically consume a publish/
subscribe model (you want LIFO - Last In First Out).

You could add a date to sort appropriately, but that would require a
custom index on subscribers, date desc.

One can avoid using date and an index by allocating the id in a
descending manner.
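To illustrate, here is an in-memory sketch of a MessageIndex whose keys are negated allocated ids. The `Inbox` class, the `String` standing in for `Key<Message>`, and the `TreeMap` standing in for the datastore's key-ordered results are all assumptions for illustration. Filtering on the subscribers list and reading results in ascending key order returns the newest messages first, with no date property involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a MessageIndex row: negative key, list of subscribers.
final class MessageIndex {
    final long id;                 // negative: newer messages get smaller ids
    final String message;          // stands in for Key<Message>
    final List<Long> subscribers;  // list property used as an equality filter

    MessageIndex(long id, String message, List<Long> subscribers) {
        this.id = id;
        this.message = message;
        this.subscribers = subscribers;
    }
}

final class Inbox {
    private final TreeMap<Long, MessageIndex> byKey = new TreeMap<>();
    private long allocated = 0; // hypothetical stand-in for allocateIds()

    void publish(String message, Long... subscribers) {
        long id = -(++allocated);
        byKey.put(id, new MessageIndex(id, message, Arrays.asList(subscribers)));
    }

    // Roughly filter("subscribers", user) with no explicit sort order:
    // rows come back in ascending key order, which is newest first here.
    List<String> inboxFor(long user) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, MessageIndex> e : byKey.entrySet()) {
            if (e.getValue().subscribers.contains(user)) {
                out.add(e.getValue().message);
            }
        }
        return out;
    }
}
```

The design choice is that the key itself carries the recency, so the only index involved is the one on the subscribers list.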

btoc

Joost Bloemsma

Jun 3, 2010, 3:05:36 AM
to objectify-appengine
I don't quite get the problem here.
If you simply let the id be auto generated, it will be incremental
like you said. Why not simply sort the result descending on id?
There already is an index on the id so you can use that, and it would
avoid having to pre-generate ids (which will obviously incur some
overhead in its turn).

- Joost

btoc

Jun 3, 2010, 7:10:34 PM
to objectify-appengine
First of all GAE does not work that way. If you are going to sort the
result descending on id, GAE will need to manage another index.
Something like:

<datastore-index kind="Foo" ancestor="false" source="auto">
    <property name="__key__" direction="desc"/>
</datastore-index>

Of course, composite indexes containing list properties really make this
undesirable.

Secondly, if this query is used on a class and the entity has a
@Parent, then you cannot use the id to order (I thought you could not
just use filter, but it seems order is also restricted).

btoc