Problem
=======
I've got 3 main types of Entity objects. I'll call them A, B & C.
For each A, there are many B's. For each B, there are many C's.
Each user may have access rights to many A's, and each A may have many
users that can access it.
If a user has access to an A, then it has access rights to the related
B's and C's.
Queries
============
- List A's that a user has access to
- List B's that a user has access to (without filtering to a single B)
- List C's that a user has access to (without filtering to a single A
or B)
- List users that have access to an A
Proposed Entity Model
=====================
@Entity
public class A {
@Id Long id;
}
@Entity
public class B {
@Id Long id;
Key<A> a;
}
@Entity
public class C {
@Id Long id;
Key<B> b;
Key<C> a; // denormalized data
}
@Entity
public class UserAccount {
@Id Long id;
com.google.appengine.api.users.User user;
}
@Entity
public class UserAccess {
@Id Long id;
Key<A> a;
Key<UserAccount> userAccount;
}
List all A's that a user has access to
----------------------------------------
- query(UserAccess.class).filter("userAccount", userAccount).fetch(),
then for each item, get(A.class, item.a)
- add @Parent on UserAccess::a, then
query(UserAccess.class).filter("userAccount",
userAccount).fetchKeys(), then for each key, get(A.class,
key.getParent())
- Would instead putting @Parent on UserAccess::userAccount improve
performance, as all UserAccess entities for a user would be stored in
the same entity group.
- Perhaps eliminate UserAccess entity all together, and add
'Key<UserAccount>[] users' to A. Then I could go
query(A.class).filter("users", userAccount); There probably wont be
more that 5000 users per A.
List all B's that a user has access to
----------------------------------------
- query(UserAccess.class).filter("userAccount",
userAccount).fetchKeys(), then query(B.class).filter("a in", keys).
This will fail if a user has access to more than 30 A's (which is not
a normal use case, but could happen).
Instead could just go query(B.class).fetch(), and then check each B
and see if b.a is in the list keys. Doesn't sound scalable.
* List all C's that a user has access to
----------------------------------------
- pretty much the same queries as B
* List users that have access to an A
----------------------------------------
- query(UserAccess.class).filter("a", a).fetch()
- Could add "Key<A>[] as" to the UserAccount entity, meaning I don't
have to run so many queries about user access. Also, could put @Cache
on UserAccount as i'll probably be accessing it quite a bit.
Your comments please?
In SQL I would do something like
SELECT ...
FROM B, A, USER_ACCESS
WHERE B.A = A.ID AND A.ID = USER_ACCESS.A AND USER_ACCESS.ID = ?
How do I perform that query on AppEngine with Objectify?
There is no support for joins in the datastore, but you can do this
stuff in your own code with multiple queries. There are some tricks
with parents (ancestors) but that may cause other problems if you ever
want to change parentage or with high volume writes in an entity
group.
class SecureData {
@Id Long id;
List<Key<UserAccount>> allowedUser = new ArrayList<Key<UserAccount>>();
String secureData;
}
class MoreSecureData {
@Id Long id;
Key<SecureData> secureRoot;
String moreSecureData;
}
//Querying
UserAccount me = ...;
Key<Account> meKey = dao.fact().getKey(me);
//returns SecureData instances you have access to
Iterable<SecureData> it =
dao.ofy().query(SecureData.class).filter("allowedUser =", meKey);
//checking MoreSecureData
Long id = someIdiWant;
//get the actual data, then check if the user has access
MoreSecureData msd = dao.ofy().get(MoreSecureData.class, id)
SecureData sd = dao.ofy().get(msd.getSecureRoot);
if(sd.getAllowedUsers().contains(meKey)) return msd;
You can optimize this a bit, but you get the idea.
I'm writing up a separate email that has some advice for doing it
strictly with queries, not precalculating anything.
Jeff
One helpful way to think about the problem is that it's not true that
AppEngine doesn't have joins. It just doesn't do joins for you. You
can do as many joins as you want, you just have to write them
yourself. How would an RDBMS implement that query? Just do the same.
You basically spelled it out:
* You start with a user id
* Get all A's that a user has access to
* For each A:
* Get all B's that are related to the A.
* For each B:
* Get all C's that are related to B.
Yeah, this might perform terribly depending on how many As and Bs and
Cs there are. The only improvement on your code I would make is to
put UserAccess in A's entity group:
public class UserAccess {
@Id Long id;
@Parent Key<A> a;
Key<UserAccount> userAccount;
}
Now your initial translation of userAccount -> Key<A> can be a
fetchKeys() query which will be more efficient than fetching the
UserAccess object.
You might consider caching the results, either in memcache or in the
database itself. This isn't all that different than Scott's
suggestion of ACLs, although I don't think you would want to store the
ACL in the A,B,C objects themselves since loading/saving them now
becomes very expensive. Create a separate index entity that holds the
permission:
class BAccess {
@Id Long id;
@Parent Key<B> b;
Key<UserAccount> userAccount;
}
class CAccess {
@Id Long id;
@Parent Key<C> c;
Key<UserAccount> userAccount;
}
Now, whenever a user is granted access to a new A (or access is
revoked from an A), fire off a batch job that creates or deletes all
the relevant Access entities.
This is the tradeoff of appengine - you can have fast queries by
precalculating, but this makes writes expensive. You can have
(relatively) cheap writes by leaving off indexes, but they your
queries are slow (or impossible). You might find a middle ground with
caching. But there's no magic bullet.
Jeff
I'm wondering about the class BAccess that you suggested.
class BAccess {
@Id Long id;
@Parent Key<B> b;
Key<UserAccount> userAccount;
}
I notice it's in the same same entity group as B. This could be good,
because then:
- I'll be able to save all the BAccess entities for a B in a single
transaction
- I'll be able to to query BAccess by just fetching keys and then
getting the parent
You said you don't think I would want to store the ACL in the A,B,C
objects themselves since loading/saving them would becomes very
expensive.
If the BAccess entities are stored in the same entity group, any
change to B will require and the entire entity group is rewritten for
each change. Is this an expense I need to worry about?
Brendan
This sounds like a conceptual misunderstanding - entity groups are
entities that live on the same server in the cluster and can be
transacted upon. They are still separate entities and can be written
independently.
Jeff
On Fri, Feb 26, 2010 at 11:37 PM, brendan <bre...@doherty.net.nz> wrote:
> Thank you. I'm just implementing some of your suggestions now and
> seeing how it goes.
[snip]
> You said you don't think I would want to store the ACL in the A,B,C
> objects themselves since loading/saving them would becomes very
> expensive.
This depends on how many entries each ACL might contain. If it is
small, and unique to each A/B/C, then keeping the data there might
make more sense than attaching it to a separate object, which you will
have to get in addition datastore request. This all depends on your
usage patterns and query needs.
> If the BAccess entities are stored in the same entity group, any
> change to B will require and the entire entity group is rewritten for
> each change. Is this an expense I need to worry about?
No, the whole entity group does not have to be re-written when any
member object changes. But as you have read, you can only have
transactions within an entity group (entities that all share the same
root ancestor -- the topmost parent).
I went back and watched the videos on datastores and confirmed it was
my mistake.
After listening to Google I/O 2008 App Engine Datastore Under the
Covers, only the root entity and then BAccess entity modified is
written when a change to a BAccess occurs.
On Feb 28, 7:54 am, Jeff Schnitzer <j...@infohazard.org> wrote: