Datastore design question

11 views
Skip to first unread message

Phil McDonnell

unread,
Jun 27, 2010, 1:39:50 PM6/27/10
to google-a...@googlegroups.com
I'm using JDO to connect to the datastore and wondering how I should design my tables.  I have users who sign up.  Users have items.  In the RDMS world I'd have a User table and a Items table and the Items table would have a field of user_id which would join with a primary key in the User table.

However, with bigtable I thought you could append values to a cell.  In this situation I might just have my User table with a column "items".  For each item that gets added to the user over time I could then write something to that user's "items" column and it would be appended to the other items also written to that column.  Is there an atomic way to do this in the AppEngine datastore?  I could imagine doing it through a transactional read-write operation, but locking on every write like that seems like a bad idea.

I'm looking to squeeze everything into the Users table because I assume the read will be much faster if I can do a lookup on a particular row as opposed to a table scan query on a separate items table, even if I indexed the items table on user_id.

Thanks,
Phil

Luís Marques

unread,
Jun 27, 2010, 9:53:42 PM6/27/10
to Google App Engine
Hello Phil,

Anyone correct me if I'm wrong but you use one of the following
options, among others.

You can have a "User" model, with a "items" ListProperty. The pros: if
you have the key for the user (e.g. you can use the email), then with
a simple get (faster than a query) you can retrieve the entity
including the items. Writing a new object is transactional. The cons:
you must always get a set the entire user entity. The entity might get
large if there are a lot of items. If you use custom indexes with the
"items" property the index might get large quickly, especially if you
use more than one ListProperty (exploding indexes).

You can have a "User" model, and a "UserItem" whose parent is the
User, making it part of a transaction group. Cons: you have limited
QPS writes to the transaction group (about 1-10 QPS), but that's the
same limitation you have to a standalone User with items. You have to
query (seek) the items, although I wouldn't call that a problem
without profile data. Pros: you can write individual UserItems. You
can have transactions for the user and the items s/he has.

You can have a "User" model and a "UserItem" with a User key
reference. Cons: you can no longer have a transaction depending on the
user and the items s/he has. Pros: you can write individual items
without write collisions to the User.

You can have a "User" and a flat list (e.g. text, non-indexed) of the
corresponding items. Cons: limits the queries you can make, have to
update all items at once. Pros: better index performance.

Etc. I hope it was helpful and correct?

Best regards,
Luís
Message has been deleted

Robert Kluin

unread,
Jun 28, 2010, 2:17:16 AM6/28/10
to google-a...@googlegroups.com
The answer to this question depends on several other questions too.
For example:
1) how many items are you expecting?
2) how often do a users items change, and how do they get changed
(the user or something else)?
3) what type of querying / reporting do you need to be able to do?
4) what type of information would you need to store about each users item?

Robert

On Sun, Jun 27, 2010 at 9:59 PM, Tom Wu <servic...@gmail.com> wrote:
> Hi Luís,
>
> If you have a "User" model, with a "items" heavy Property like TextProperty
> which is not necessary for every query.
> Which db structure is better ?
>
> Best Regards
> Tom Wu
>
>
> 2010/6/28 Luís Marques <luism...@gmail.com>

>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to google-a...@googlegroups.com.
>> To unsubscribe from this group, send email to
>> google-appengi...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengi...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>

Alon Carmel

unread,
Jun 28, 2010, 6:30:18 AM6/28/10
to google-a...@googlegroups.com
i got alot of user meta data, i save them on another db model called usermeta which is referencekey to the user model key. under the usermeta i write duplicated data from the user table too such as user key id as string and much more. i usually pull the usermeta only and use the user model just for authentication keeping it clean and to the purpose.

-
Cheers,
public static function AlonCarmel() {
//Contact me
var email = 'a...@aloncarmel.me';
var twitter = '@aloncarmel';
var web = 'http://aloncarmel.me';
var phone = '+972-54-4860380';
}

* If you received an unsolicited email from by mistake that wasn't of your matter please delete immediately. All E-mail sent from Alon Carmel is copyrighted to Alon Carmel 2008. Any details revealed in e-mails sent by Alon Carmel are owned by the Author only. Any attempt to duplicate or imitate any of the Content is prohibited under copyright law 2008.

Siva P Thumma

unread,
Jun 28, 2010, 1:39:15 AM6/28/10
to Google App Engine
You have to read thoroughly how to code for the one-to-many in
bigtable implementation.
Instead of fluffing all the content in one column, use that column to
hold an object which fluffs the given objects into it. (Object of
Objects, a list in python, a collection in Java.)

When code, you can say, "my_object.that_list". (After getting the
my_object through GQL )
Which will give you no worry over queries per second and the likes.
The whole operation would be atomic, as we should group the objects in
subject with 'parent-child' concepts.

But Phil, You may have to forget RDBMS if you wish to use Bigtable and
vice-versa.
Message has been deleted

Phil McDonnell

unread,
Jun 29, 2010, 3:41:06 AM6/29/10
to Google App Engine
It sounds like the list property can't be appended to without a
transaction? If so, it seems it'll be better to create two tables,
one for Users and one for Items.

Just to answer the above questions... I expect many reads of the data
and few writes. I also expect the normal case to have few items (1-2
on avg) and it won't change often. The key to the Items table would
be user_id-item_name. Are prefix scans in GQL efficient? Also, in
terms of data I'm storing about the items, I have ~5-10 fields of
metadata.

For now I'm assuming that the right thing to do here is to split the
data between tables. Even if it seems to be an unlikely problem now,
I don't want to put myself in a situation where I'm limited by the QPS
that transactions can support.

Thanks,
Phil


On Jun 27, 11:17 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
> The answer to this question depends on several other questions too.
> For example:
>   1) how many items are you expecting?
>   2) how often do a users items change, and how do they get changed
> (the user or something else)?
>   3) what type of querying / reporting do you need to be able to do?
>   4) what type of information would you need to store about each users item?
>
> Robert
>
>
>
> On Sun, Jun 27, 2010 at 9:59 PM, Tom Wu <service.g2...@gmail.com> wrote:
> > Hi Luís,
>
> > If you have a "User" model, with a "items" heavy Property like TextProperty
> > which is not necessary for every query.
> > Which db structure is better ?
>
> > Best Regards
> > Tom Wu
>
> > 2010/6/28 Luís Marques <luismarq...@gmail.com>

Luís Marques

unread,
Jun 29, 2010, 7:44:34 AM6/29/10
to Google App Engine
On Jun 29, 8:41 am, Phil <phil.a.mcdonn...@gmail.com> wrote:
> It sounds like the list property can't be appended to without a
> transaction?  If so, it seems it'll be better to create two tables,
> one for Users and one for Items.

The list property can be appended without an explicit transaction. It
just might not be wise, since between a read of the entity and the
write with the appended list another request can update the entity,
and that update would be lost. The entity write itself is atomic
though. You won't have some of the properties written from the
original request and some written from the other concurrent request.
So just be careful of race conditions, if you can be sure there aren't
concurrent requests or that they are idempotent then I guess you
should be OK without a transaction.

> Just to answer the above questions... I expect many reads of the data
> and few writes.  I also expect the normal case to have few items (1-2
> on avg) and it won't change often.  The key to the Items table would
> be user_id-item_name.  Are prefix scans in GQL efficient?  Also, in
> terms of data I'm storing about the items, I have ~5-10 fields of
> metadata.

If you know the key then that's the fastest way to retrieve the
entity, instead of using a query. I don't know the Java syntax by
heart, but it's easy to find.

> For now I'm assuming that the right thing to do here is to split the
> data between tables.  Even if it seems to be an unlikely problem now,
> I don't want to put myself in a situation where I'm limited by the QPS
> that transactions can support.

Someone more experienced might be able to elucidate, but on your
scenario I don't know what you'd gain with that, other than smaller
RPC requests for writing to the user items (at least performance-wise,
but it might be a cleaner design). If you have to make a transaction
then the User and UserItems must be part of a transaction group (items
have the user as a parent), so the transaction locks out the entire
group anyway. That's the same as locking out a group made of a single
entity, a user with a list of items property.

See if you can avoid transactions with your design, if you ever run
into write performance problems.
Reply all
Reply to author
Forward
0 new messages