You asked a good question. I've been thinking about it for a couple
days, and really, I was hoping you would get some more responses,
because you've described a common scenario.
As you point out, with the GAE datastore, you can't query for
"missing" data the way you can with the LEFT OUTER JOIN. (Your MySQL
query will show the view count for *all* cards, even when there's no
view count record for a particular card.)
Your approach (creating a view record where view_count = 0 for each
card for each user) is probably best, but here's another idea to
consider/test:
For each user, create an unseen_cards list of db.keys, where the each
db.key points to a Card that the user hasn't viewed.
class User(db.Model):
user = db.UserProperty()
...
unseen_cards = db.ListProperty(db.key)
The upside:
* When you add a user, you do 1 write (a single list of 1000
unseen_cards), rather than 1000 writes (1 write for each card). This
means that adding a user should be faster, especially as the number of
flashcards grows.
The downsides:
* Generating the "sorted by view_count" flashcard list is slightly
more complicated, since you have to query on views, then query on
unseen_cards and append the unseen_cards to the list.
* How many db.keys can you keep in a ListProperty? At some point,
you'll have too many cards to handle efficiently in a ListProperty,
but I have no idea when that is.
* Viewing the card for the first time is more complicated, since you
have to remove the db.key from the unseen_cards list and create the
view record. For data consistency, you should do this operation in a
transaction.
Tom
--
My blog: http://offermann.us/
Yes. In both Drew's original idea and in my suggestion, you would have
to set the view_count to 0 for each user for every new card.
Obviously, you wouldn't be able to complete millions of updates in a
single Web request without timing out. So, until Google allows
background processes, you would have to do something like add recently
added cards to a user's account when that user logs in, rather than
adding them to all users when you first create the flashcard.
The argument is that somehow you've got to initialize view_count for
every card for every user, otherwise you can't efficiently query for
all flashcards for a user sorted by view_count (including unseen
cards). Unless someone can show a better way to do this query that
doesn't rely on initializing view_count...
Tom
--
Blog: http://offermann.us