"Rapid" Examine Replication in a Load Balanced Environment using Hot Backups?


Matt Muller

Mar 5, 2014, 1:31:28 PM
to umbra...@googlegroups.com
Hello,

Apologies for the lengthy post - I wanted to give as much background as possible while also offering some solution points for "dumb front end servers".

We are in the process of building a proof of concept for a client.  Our test site has 30,000 nodes, and we are now standing the site up in a dev environment to look for performance issues.

The production instance is "beefy" (I think): it is load balanced across 4 servers (4 cores, 32 GB RAM each).  The back office is on a separate 5th server, which connects to a 6th/7th box pair where SQL Server lives (2 mirrored servers) that all 5 servers access.  The load balanced setup does not (cannot, due to client ops requirements) allow us to access the 4 front end boxes directly via HTTP.  After the initial setup of the 4 front end boxes, the only "access" we have is via the file system (DFSR) and the common database - essentially this means that the standard Umbraco load balancing approach of distributed calls over HTTP is not an option for us.

With the above in mind (no distributed calls), we have been trying to come up with a way for the 4 front end boxes to pick up updated content once something is published on the separate authoring server (which updates the database).  So far we have this (a rough sketch follows the list):
  • Custom table in the database that "tracks" the 4 servers by name
  • Scheduled background task running every 5 minutes on each front end box that does the following:
    • Pass in the server name and get back the date/time stamp of the latest published node that server has seen (stored in the custom table, which only has 4 rows)
    • Use that date/time stamp to query umbracoLog (looking only at the latest Publish/UnPublish entry per node) for nodes that have been published/unpublished since the task last ran
    • For each node id in that list, call UpdateDocumentCache
    • Update the custom table with the latest timestamp so the next run only grabs newer entries
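A rough sketch of that task, for the curious - the ServerSync table and its columns are our own (names here are illustrative), and connectionString/serverName are assumed to be in scope; umbracoLog and umbraco.library.UpdateDocumentCache are the standard Umbraco pieces:

// Rough sketch of the per-server refresh task. ServerSync/ServerName/LastStamp
// are illustrative names for our custom tracking table.
var nodeIds = new System.Collections.Generic.List<int>();
var newStamp = System.DateTime.MinValue;
using (var conn = new System.Data.SqlClient.SqlConnection(connectionString))
{
    conn.Open();

    // 1. Latest timestamp this server has processed.
    System.DateTime lastStamp;
    using (var cmd = new System.Data.SqlClient.SqlCommand(
        "SELECT LastStamp FROM ServerSync WHERE ServerName = @server", conn))
    {
        cmd.Parameters.AddWithValue("@server", serverName);
        lastStamp = (System.DateTime) cmd.ExecuteScalar();
    }

    // 2. Nodes published/unpublished since then (latest log entry per node).
    using (var cmd = new System.Data.SqlClient.SqlCommand(
        @"SELECT NodeId, MAX(Datestamp) FROM umbracoLog
          WHERE logHeader IN ('Publish', 'UnPublish') AND Datestamp > @last
          GROUP BY NodeId", conn))
    {
        cmd.Parameters.AddWithValue("@last", lastStamp);
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                nodeIds.Add(reader.GetInt32(0));
                if (reader.GetDateTime(1) > newStamp) newStamp = reader.GetDateTime(1);
            }
        }
    }

    // 3. Refresh the local cache (this is what fires the Examine reindex).
    foreach (var id in nodeIds)
        umbraco.library.UpdateDocumentCache(id);

    // 4. Store the new high-water mark for the next run.
    if (nodeIds.Count > 0)
    {
        using (var cmd = new System.Data.SqlClient.SqlCommand(
            "UPDATE ServerSync SET LastStamp = @stamp WHERE ServerName = @server", conn))
        {
            cmd.Parameters.AddWithValue("@stamp", newStamp);
            cmd.Parameters.AddWithValue("@server", serverName);
            cmd.ExecuteNonQuery();
        }
    }
}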
For the most part the above works fine on our "dumb" front end boxes; however, we can only process ~27 newly published nodes per second on each box.  This is because UpdateDocumentCache fires off an Examine reindex of the node, and we rely heavily on Examine: we use the gathering event (GatheringNodeData) to grab a bunch of other data and shove it into the index so our front end code has everything it needs from a "single" call into Examine.  We believe the client will be publishing upwards of 300 nodes at a time, so ~11 seconds for each front end server to refresh during the scheduled task is too long, even on a background thread.
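For reference, our gathering hook looks something like this (the indexer name and extra field are illustrative, and BuildSummary is a hypothetical helper):

// Sketch of the gathering event we use to denormalise extra data into the
// index at index time. "ExternalIndexer" and "resourceSummary" are illustrative.
var indexer = Examine.ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"];
indexer.GatheringNodeData += (sender, e) =>
{
    // e.Fields holds the values about to be indexed for this node; anything
    // added here comes back in search results. BuildSummary is a hypothetical
    // helper that assembles the extra data the front end needs.
    e.Fields["resourceSummary"] = BuildSummary(e.NodeId);
};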

Essentially, 11 seconds is too long (tested on a dev machine, so it may be faster on the above-specced boxes).  I've read Ismail's blog post on hot backups, which seemed like a potential solution, but with limited/no access to the front end boxes to remove one from load balancing, reset the app pool, and then copy files across, we hit a dead end.  Has anyone got experience with getting Examine to update more rapidly using hot backups?  Is there a way to copy over a current Examine index while still getting around the reader/writer lock?  (I think we can disable the writer on the front end boxes?)


When a new server comes on-line:
Less important, but still on our radar, is the scenario where a new front end server comes online with nothing in the cache and nothing in the Examine index.  "Spinning up" this server takes about 20 minutes, mostly spent on the Examine index build for the 30,000 nodes.  If we solve the above issue, then this 20 minutes will also be (vastly) reduced.

Regards,
Matt

Matt Muller

Mar 10, 2014, 12:33:40 PM
to umbra...@googlegroups.com
I managed to get a solution working for this part of my original question:

Has anyone got experience with getting Examine to update more rapidly using hot backups?  Is there a way to copy over a current Examine index while still getting around the reader/writer lock?  (I think we can disable the writer on the front end boxes?)

Essentially my PoC does this:
  1. Take a hot backup of the Examine index, following the instructions in Ismail's blog post, into a custom folder name - "backup.yyyyMMddHHmmss" works well (a naive file-copy sketch follows the code below)
  2. Copy that folder across to the machine you want to restore on
  3. Store the backup folder name from step 1 in the database
  4. Change the way Examine is instantiated in your code from the "standard way" to the code below, which specifies the location of the index - we build the location from a standard base path + the name pulled out of the database
// Point a searcher at the restored snapshot folder instead of the configured index.
var analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
// The final part of this path would come from the db (the name stored in step 3).
string folder = "~/App_Data/TEMP/ExamineIndexes/External/snapshot.20140305210528/";
var path = new DirectoryInfo(Server.MapPath(folder));
// Instantiate the searcher directly against that folder, bypassing config.
var searcher = new UmbracoExamine.UmbracoExamineSearcher(path, analyzer);
// Then query as usual.
var searchCriteria = searcher.CreateSearchCriteria(Examine.SearchCriteria.BooleanOperation.And);
IBooleanOperation query = searchCriteria.Field("nodeTypeAlias", "Resource");
var searchResults = searcher.Search(query.Compile());
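For completeness, here is a naive sketch of the backup copy in step 1 - Ismail's post covers the proper mechanics; this version just assumes no writes are in flight while the files are copied, and the folder layout is illustrative:

// Naive sketch of step 1: copy the live index files to a timestamped backup
// folder. Assumes the index is quiescent (no publishes running) during the copy.
var source = new System.IO.DirectoryInfo(
    Server.MapPath("~/App_Data/TEMP/ExamineIndexes/External/Index/"));
var backupName = "backup." + System.DateTime.Now.ToString("yyyyMMddHHmmss");
var target = System.IO.Directory.CreateDirectory(
    System.IO.Path.Combine(source.Parent.FullName, backupName));
foreach (var file in source.GetFiles())
    file.CopyTo(System.IO.Path.Combine(target.FullName, file.Name), true);
// backupName is what gets stored in the database in step 3.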

We have not yet timed the solution above against large (300 node) publish operations; however, we know for certain that it is faster than the 20+ minutes it takes to rebuild our Lucene index on each machine, so it looks like a viable option for bringing a new server in the pool online.

Regards,
Matt

Shannon Deminick

Mar 11, 2014, 6:54:19 PM
to umbra...@googlegroups.com
Just for anyone referencing this topic, there's more information here:

If you aren't writing on the front end boxes then you could remove the writers, but that probably isn't going to help you much, as the writers do not keep an open lock on the indexes (in the future they might, though).  The searchers do keep an open lock on the indexes once a search has been initiated.  Here's a zany trick that might work - this is un-tested, unsupported, may cause threading issues, etc... but you can try :)

You can get a reference to your provider like this:

var provider = (LuceneSearcher) ExamineManager.Instance.SearchProviderCollection["myIndex"];

You can close the reader, which will remove the lock - though if there's currently a search taking place, the reader may be instantly re-opened again.

// Grab the underlying Lucene searcher/reader from the provider...
var indexSearcher = (IndexSearcher) provider.GetSearcher();
var indexReader = indexSearcher.GetIndexReader();
// ...and close both, which releases the lock on the index files.
indexSearcher.Close();
indexReader.Close();

The only issue with your implementation is that the media cache inside Examine is referenced directly by the Examine provider name "InternalSearcher", so unless you can get that specific provider configured to use your custom folder, the media will not be coming from the Examine cache, or it will be coming from a stale Examine cache.

Examine v2 will let you initialize all Examine providers via code (without config), so you can dynamically re-configure them on startup.  I'm not sure how often you are shipping your Examine backups to your servers, but could you potentially just update the Examine config file in code to point to your new folder?  This will restart the app pool, and Examine would then be configured with the newer, up-to-date folder.
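Something like this, perhaps - a sketch assuming the standard ExamineIndex.config layout from a default install (set name and snapshot folder are illustrative; in practice the folder name would come from the database):

// Sketch: repoint the External index set at the newest snapshot folder by
// rewriting ExamineIndex.config. Saving the file is what triggers the app
// pool restart mentioned above, so Examine re-reads the new path on startup.
var configPath = System.Web.Hosting.HostingEnvironment.MapPath("~/config/ExamineIndex.config");
var doc = System.Xml.Linq.XDocument.Load(configPath);
foreach (var indexSet in doc.Root.Elements("IndexSet"))
{
    if ((string) indexSet.Attribute("SetName") == "ExternalIndexSet")
        indexSet.SetAttributeValue("IndexPath",
            "~/App_Data/TEMP/ExamineIndexes/External/snapshot.20140305210528/");
}
doc.Save(configPath);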

Otherwise, without restarting the app pool, changing the folder at runtime for the provider's IndexReader would be an interesting challenge, and might actually be possible if you implement your own provider by inheriting from the default one.  In your own provider you could detect somehow that a new snapshot folder is available; when it is, you could attempt to close the current reader.  The next time a search takes place, the code will automatically try to re-create the reader, and you can override the GetLuceneDirectory() method to return the new folder.  You can see how this works in the ValidateSearcher method (the ReaderStatus.Closed case) here: https://github.com/Shandem/Examine/blob/master/Projects/Examine/LuceneEngine/Providers/LuceneSearcher.cs
There is a forceReopen parameter overload for ValidateSearcher, but unfortunately it never gets used.
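A sketch of that override, assuming a hypothetical helper that reads the latest snapshot name out of the database:

// Sketch: a custom searcher that re-targets the index folder on reader
// re-creation. GetLatestSnapshotPath() is hypothetical - it would read the
// stored snapshot name and map it to a full path on disk.
public class SnapshotSearcher : UmbracoExamine.UmbracoExamineSearcher
{
    protected override Lucene.Net.Store.Directory GetLuceneDirectory()
    {
        var folder = new System.IO.DirectoryInfo(GetLatestSnapshotPath());
        return Lucene.Net.Store.FSDirectory.Open(folder);
    }

    private static string GetLatestSnapshotPath()
    {
        // Hypothetical: pull "snapshot.yyyyMMddHHmmss" from the database and
        // map it under ~/App_Data/TEMP/ExamineIndexes/External/.
        throw new System.NotImplementedException();
    }
}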

Let me know if that helps at all!