Long Running Tasks


acl...@thoughtworks.com

Sep 2, 2014, 4:56:37 AM
to rapi...@googlegroups.com
Hi everyone,

When we tested the matching functionality, too many invalid matches were returned because our naive approach matched all Enquiry data against all indexed Child data. We decided to limit which fields from an Enquiry are used to find potentially matching children. For example, we would not use the nationality of an enquirer when trying to find a matching child (a completely made-up example).

To implement our new feature, we gave admins the option to mark fields as "matchable." While developing the feature, we realized that marking or unmarking a field would affect the saved potential matches, so we would need to update all the pre-existing enquiries. Currently, whenever an enquiry is changed or added, we search the Solr instance for matching children and save the potential matches as an array on the enquiry object. Updating all the pre-existing enquiries raised performance concerns, and we're trying to think of different ways to implement the feature or the update process in an efficient manner.
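To make the "matchable" idea concrete, here is a minimal sketch in plain Ruby. The `Field` struct, the `matchable_criteria` helper, and the field names are all hypothetical and are not taken from the RapidFTR codebase; the sketch only shows enquiry data being filtered down to the fields an admin has marked as matchable before any search is run.

```ruby
# Hypothetical sketch: Field and matchable_criteria are illustrative names,
# not RapidFTR's actual classes or methods.
Field = Struct.new(:name, :matchable)

# Keep only the enquiry values whose field is marked matchable and non-blank.
def matchable_criteria(enquiry_data, fields)
  matchable_names = fields.select(&:matchable).map(&:name)
  enquiry_data.select do |key, value|
    matchable_names.include?(key) && !value.to_s.strip.empty?
  end
end

fields = [
  Field.new("child_name", true),
  Field.new("enquirer_nationality", false) # admin left this unmarked
]

enquiry_data = {
  "child_name" => "Jane Doe",
  "enquirer_nationality" => "Exampleland"
}

# Only the matchable field survives and would be sent to the search.
p matchable_criteria(enquiry_data, fields)
```

Toggling the `matchable` flag on a field changes the output of this filter for every enquiry, which is why the saved matches on existing enquiries go stale.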

Options that we've come up with so far:
  1. Update all enquiries during the web request:
    - Con: This appears to be much too slow, and would cause timeouts once we get to several hundred enquiries (see data below)
    + Pro: Very simple approach
  2. Update all enquiries as a scheduled task:
    - Con: This seems wasteful if we only need to do bulk match updates occasionally
    + Pro: Already exists as a pattern in the RapidFTR codebase
  3. Update all enquiries as a background task:
    - Con: New pattern, new libraries, adds complexity to app and code
    + Pro: Only runs when we need it, doesn't affect web request timing
We've begun implementing the feature using approach #1, but it does not appear scalable based on the data below.  We've looked at Solr update times in production, and they appear very similar to the update times on the Macbook used.  We don't believe the differences between machines will improve performance enough to make #1 viable in production.  Right now, we're leaning towards creating a background task, even though it makes life in RapidFTR more complex.
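As a rough illustration of option #3, here is a minimal sketch of the background-task pattern using only Ruby's standard `Thread` and `Queue`. The `MatchUpdateWorker` class is a hypothetical name, and the block passed to it stands in for whatever would re-run the match search for one enquiry. In a real Rails app this would more likely be handled by a dedicated job library; the sketch only shows the shape of the idea: the web request enqueues work and returns, and a worker processes it separately.

```ruby
# Hypothetical sketch of a background worker; not RapidFTR code.
class MatchUpdateWorker
  STOP = :stop

  # The block is the job to run for each enqueued enquiry id.
  def initialize(&job)
    @queue = Queue.new
    @job = job
    @thread = Thread.new { run }
  end

  # Called from the web request: returns immediately.
  def enqueue(enquiry_id)
    @queue << enquiry_id
  end

  # Process everything already queued, then stop the worker thread.
  def shutdown
    @queue << STOP
    @thread.join
  end

  private

  def run
    loop do
      item = @queue.pop # blocks until work is available
      break if item == STOP
      @job.call(item)
    end
  end
end

updated = []
worker = MatchUpdateWorker.new { |enquiry_id| updated << enquiry_id }
[1, 2, 3].each { |enquiry_id| worker.enqueue(enquiry_id) }
worker.shutdown
p updated
```

An in-process thread like this loses queued work if the server restarts, which is one reason a persistent job queue is usually preferred despite the extra moving parts.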

Does anyone have any solutions other than these three?  Also, any thoughts on the three approaches listed above?



DATA

Time to run "Enquiry.update_all_child_matches" in the Rails console in Vagrant/VirtualBox on a MacBook Pro.
The following data was created using child name fields only (i.e. enquiries only had the first and last name of the child filled in, and children only had the name filled in).

107 Enquiries, 0 Children
50.9 seconds

107 Enquiries, 101 Children
48.6 seconds

507 Enquiries, 101 Children
236.0 seconds

507 Enquiries, 501 Children
268.0 seconds
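
Dividing each timing above by its enquiry count gives a rough per-enquiry cost. The short Ruby sketch below does that arithmetic on the four measurements; the constant and method names are just illustrative. The figures come out between roughly 0.45 and 0.53 seconds per enquiry, which suggests the total time grows with the number of enquiries and depends much less on the number of children, at least at these sizes.

```ruby
# Measurements copied from the data above.
TIMINGS = [
  { enquiries: 107, children: 0,   seconds: 50.9 },
  { enquiries: 107, children: 101, seconds: 48.6 },
  { enquiries: 507, children: 101, seconds: 236.0 },
  { enquiries: 507, children: 501, seconds: 268.0 }
].freeze

# Average time spent per enquiry, rounded to milliseconds.
def seconds_per_enquiry(row)
  (row[:seconds] / row[:enquiries]).round(3)
end

TIMINGS.each do |row|
  puts format("%3d enquiries, %3d children: %.3f s per enquiry",
              row[:enquiries], row[:children], seconds_per_enquiry(row))
end
```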

acl...@thoughtworks.com

Sep 8, 2014, 10:02:54 AM
to rapi...@googlegroups.com
Okay, we are going to take a stab at the "Update all enquiries as a background task" approach sometime in the near future.