Data schema for showing a user 5 random questions they haven't already answered?

70 views
Skip to first unread message

DurhamG

unread,
Jan 12, 2012, 10:35:36 PM1/12/12
to google-a...@googlegroups.com
I've been trying to figure this one out for a few hours now:

I have an ever growing table of questions, and an ever growing table of users.  For a given user, I would like to query for 5 random questions that they have not already answered.  What kind of model schema would allow this?

I've seen how to query for N (semi)random entities (which uses a > query filter), and I've seen the presentation on how to do microblogging style schemas (which uses = on lists as a query filter), which together might allow me to query for '5 random questions the user has already answered'.  To do the opposite though would require > and != on different properties which isn't allowed.

The best I can come up with so far is to keep a list of the answered questions on the user entity, then query for batches of random questions until I find 5 which aren't in their list.  This assumes the number of questions answered by a single user remains under 5000 (the list size limit if I recall) and that there are more unanswered questions than answered ones for any given user (so that I don't have to pull too many batches in looking for questions).  These limitations might be reasonable, but this approach still seems less than optimal.

Any ideas on how to accomplish this?


Someone asked this question a couple years ago with no response, so I'm hoping some changes have occurred in the mean time to make this possible.

Thanks!

Brandon Wirtz

unread,
Jan 12, 2012, 10:54:35 PM1/12/12
to google-a...@googlegroups.com

For real you aren’t just getting us to do your homework?  This is a very common question in Comp Sci Classes.

 

Just in case this is just logic in a mish mash of PHP, python, and java syntax J

 

$RandomNumberOne = RND(1,100000000000000)  // we have lots of questions

 

Array($AnsweredQustions) = Query

 

While ( $RandomNumberOne iscontainedin $AnsweredQustions )

{

$RandomNumberOne++

}

 

Fetch_QuestionByID($RandomNumberOne)

 

///////////////

 

If you need it to be more random take the random number and pick a number half way between it and the max rather than just incrementing.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/SId0PjB1xtcJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

DurhamG

unread,
Jan 12, 2012, 11:36:34 PM1/12/12
to google-a...@googlegroups.com
This is an app engine group so I thought it was obvious I was asking how to format my data to deal with the limitations imposed by the app engine datastore, not how to solve the problem in general.  I don't believe app engine assigns id's going from 1 to N, so I don't think I could use that as my range anyways.  I could assign my own id that fits the range, but if multiple questions are added at the same time I risk adding two items with the same id.  Not to mention potentially having to deal with deleted questions.  That approach seems very un-appengine like and ideally it'd be nice to do it in one query for 5 items instead of 5 queries for 1 item each.

Thanks anyways though.

Brandon Wirtz

unread,
Jan 12, 2012, 11:48:58 PM1/12/12
to google-a...@googlegroups.com

1 Query for all answered Items

Then calculate 5 to answer.

 

I don’t see any other way to do it.  You can’t query for “Not” X and get randoms.  You are going to have to pick 5 ids.

You should assign a key an increment I mean if you want to pick from a list.  Each question needs an ID.

--

You received this message because you are subscribed to the Google Groups "Google App Engine" group.

To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/ZQHxi2TuuZoJ.

jon

unread,
Jan 14, 2012, 2:24:23 AM1/14/12
to Google App Engine
Durham, any time you're facing this kind of problem remember to always
include memcache in your solution because datastore operations are
expensive.

I think the ideal arrangement is like this:
1. Memcache is where you query for data. Think of it as a "database
that can die any time". Your app should hit this guy a lot.
2. Datastore is a "backup for re-creating the dead database".
Basically you constrain datastore to fallback use only.

How about this:

1. As users post new questions, save the question IDs in memcache,
keeping at most 1000 of them (you decide what the limit should be).

2. When you need to present random questions, take those 1000 IDs, and
do an IN query against the questions which the user has answered. The
diff between the query result and those 1000 IDs are questions
unanswered by the user.

3. You randomly pick 5 unanswered IDs and then map these IDs into
entities. This mapping should use data from memcache as well.
Objectify's caching is great for this (Java).

Pros and cons of suggested solution:
1. Steps 1 and 3 skip datastore completely. Step 2 would fall into
small datastore operations (I think) therefore cheaper.
2. This solution would only present random questions that are new-ish.
Generally this isn't a big problem.

On Jan 13, 2:35 pm, DurhamG <durham.go...@gmail.com> wrote:
> I've been trying to figure this one out for a few hours now:
>
> I have an ever growing table of questions, and an ever growing table of
> users.  For a given user, I would like to query for 5 random questions that
> they have not already answered.  What kind of model schema would allow this?
>
> I've seen how to query for N (semi)random entities<http://stackoverflow.com/questions/3002999/fetching-a-random-record-f...> (which
> uses a > query filter), and I've seen the presentation
> <http://www.google.com/events/io/2009/sessions/BuildingScalableComplex...>on
> how to do microblogging style schemas (which uses = on lists as a query
> filter), which together might allow me to query for '5 random questions the
> user *has* already answered'.  To do the opposite though would require >
> and != on different properties which isn't allowed.
>
> The best I can come up with so far is to keep a list of the answered
> questions on the user entity, then query for batches of random questions
> until I find 5 which aren't in their list.  This assumes the number of
> questions answered by a single user remains under 5000 (the list size limit
> if I recall) and that there are more unanswered questions than answered
> ones for any given user (so that I don't have to pull too many batches in
> looking for questions).  These limitations might be reasonable, but this
> approach still seems less than optimal.
>
> Any ideas on how to accomplish this?
>
> Someone asked this question a couple years ago with no response<http://groups.google.com/group/google-appengine/browse_thread/thread/...>,
Reply all
Reply to author
Forward
0 new messages