Strange query problem with replication

60 views
Skip to first unread message

Gluber

unread,
Mar 26, 2015, 6:26:36 PM3/26/15
to rav...@googlegroups.com
Hi !

We're currently developing a solution for a client and have moved from running on local development machines only to RavenHQ yesterday.

The only difference in environments is that locally we ran RavenDB in development mode as a single instance of the native window service, and on RavenHQ we use the "High Performance" plan with replication.

Our is a simple restful WEBAPI application, that uses 19 indices and a few document types ( its basically a proxy for the old internal service our client had that could not cope with the load ) 

Now to our problem: One endpoint only for our application randomly ( in about 90 % of cases ) returns zero results ( even though there are actually about a 1000 matching entries, manually verified ) 
However from time to time the query succeeds and returns normal results..... 

This only happens in the replicated environment not the local single machine one..... 

I tried to diagnose as much as i could, in the replicated server setup: All documents are properly replicated, all indices are non stale and valid, and there are no errors anywhere. This is true for both servers. ( because at first i thought replication was not working and sometimes we would return invalid results from a secondary server ) but all seems fine to me.....

Here is the relevant document:

 public class Leaflet
    {
          public string   Id          { get; set; }

        public DateTime CreatedOn   { get; set; }

        public DateTime UpdatedOn   {  get; set; }

        public string               Name                { get; set; }
        public string               IndustryId          { get; set; }

        public string               IndustryName        { get; set; }

        public string               RetailerId          { get; set; }
        public string               RetailerName        { get; set; }

        public string               RegionId            { get; set; }

        public string               RegionName          { get; set; }

        public DateTime             ValidFrom           { get; set; }

        public DateTime             ValidTo             { get; set; }

        public DateTime             TimeStamp           { get; set; }

        public int                  TotalPages          { get; set; }

        public bool                 IsPromoted          { get; set; }

        public bool                 IsTagged            { get; set; }

        public string               ImageBasePath       { get; set; }

        public string               PreviewImageLink    { get; set; }

        public int[]                PreviewImageSize    { get; set; }
        public string               ShareLink           { get; set; }
    }   

( please note the model is not very good, because it has to satisfy strange legacy requirements ) 


The index definition is:

 public class LeafletSearchIndex : AbstractIndexCreationTask<Leaflet,LeafletIndexEntry>
    {
        public LeafletSearchIndex()
        {
            Map = leaflets => leaflets.Select(l => new
            {
                l.Name,
                l.IndustryId,
                l.IndustryName,
                l.RetailerId,
                l.RetailerName,
                l.RegionId,
                l.RegionName,
                l.ValidFrom,
                l.ValidTo,
                l.TimeStamp,
                l.TotalPages,
                l.IsPromoted,
                PromotionOrder = l.IsPromoted ? 1 : 0,

                FullText = new[]
                {
                    l.Name,
                    l.IndustryName,
                    l.RetailerName
                }
            });

            Index(c => c.FullText,FieldIndexing.Analyzed);
        }
    }

    public class LeafletIndexEntry : Leaflet
    {
        public int                  PromotionOrder  { get; set; }

        public IEnumerable<string>  FullText        { get; set; } 
    }

And one of the failing queries looks like this:

   var regionIds = new  ["Region/2147483647", "Region/4", "Region/12", "Region/13", "Region/16", "Region/17", "Region/19", "Region/20", "Region/21"]; // Note this is extracted and dynamically generated normally...... but our diagnostics logging provides me with the information so i pasted it here 

   RavenQueryStatistics statistics1 = null;

                var topLeaflets = await session.Query<LeafletIndexEntry, LeafletSearchIndex>()
                    .Statistics(out statistics1)
                    .Where(l => l.RegionId.In(regionIds))
                    .Where(l => l.IsPromoted)
                    .TimeContrainedByDay()
                    .OrderByDescending(l => l.TimeStamp)
                    .As<Leaflet>()
                    .Take(1024)
                    .ToListAsync().ConfigureAwait(false);

                LogQueryResult("TopLeaflets", statistics1, topLeaflets);

The query log ( LogQueryResult ) 

 "TopLeaflets": 0, statistics RavenQueryStatistics { IsStale: False, DurationMilliseconds: 8, TotalResults: 0, SkippedResults: 0, Timestamp: 03/26/2015 20:44:33, IndexName: "LeafletSearchIndex", IndexTimestamp: 03/26/2015 20:44:33, IndexEtag: Etag { Restarts: 72057594037927937, Changes: 361398 }, NonAuthoritativeInformation: False, LastQueryTime: 01/01/0001 00:00:00, TimingsInMilliseconds: [], ResultSize: 456, ScoreExplanations: [] }

The only thing that looks of to me are the TimingsInMilliseconds and the LastQueryTime ( since the query has been executed before without client restarts ) ....

Anybody has an idea what could be going wrong here ????

Gluber

unread,
Mar 26, 2015, 6:36:39 PM3/26/15
to rav...@googlegroups.com
One missing piece of information all versions involved are at 3.0.3599 ( RavenHQ servers, local dev server and client library )

Tal Weiss

unread,
Mar 26, 2015, 9:36:05 PM3/26/15
to rav...@googlegroups.com

What is your replication topology? Are you guys using multiple masters?

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gluber

unread,
Mar 26, 2015, 9:56:40 PM3/26/15
to rav...@googlegroups.com
No its the standard RavenHQ topology... One master replicating to one slave
But at the point of query the indices on both servers are non stale and all documents are there on both servers as well

Tal Weiss

unread,
Mar 27, 2015, 1:52:24 AM3/27/15
to rav...@googlegroups.com

I see, this indeed sounds strange. Is it possible that you export the database so I could examine it? If you can't share the content can you provide me with the classes used in the database and the average size of each collection so I could try and replicate it locally. I would also want the index definitions and invocation code of one of the faulty query.
You can also set the slave master topology locally and see if it reproduce.
I'm sorry I can't give you a straight answer but this is something I need to first be able to reproduce.

Tal Weiss

unread,
Mar 29, 2015, 7:01:53 AM3/29/15
to rav...@googlegroups.com
Just wondering what is TimeContrainedByDay, is this an extension method you guys wrote?
If so can you please provide its implementation?
--

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Tal Weiss l Core Team Developer Mobile:+972-54-802-4849

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/ 

t...@ayende.com

unread,
Mar 29, 2015, 7:45:04 AM3/29/15
to rav...@googlegroups.com
It is a validated bug, we will work to fixed it ASAP.
You can track the status of this issue here:

Bernhard Glück

unread,
Mar 29, 2015, 6:35:59 PM3/29/15
to rav...@googlegroups.com
Ups sorry, i did not think about that 

Here is the implementation of TimeConstrianedByDay and its helper methods

      public static IRavenQueryable<T> TimeContrainedByDay<T>(this IRavenQueryable<T> input, DateTime? cutTime = null)
            where T : ITimeContrainedDocument
        {
            return input
                .Where(r => r.ValidTo >= GetLowerBound(cutTime));

        }


        private static DateTime GetLowerBound(DateTime? cutoffTime)
        {
            var realTime    = cutoffTime ?? DateTime.UtcNow;

            var lowerBound  = realTime - TimeSpan.FromDays(1.0);
            var result      = new DateTime(lowerBound.Year, lowerBound.Month, lowerBound.Day, 23, 59, 59);

            return result;
        }

Bernhard Glück

unread,
Mar 29, 2015, 8:33:22 PM3/29/15
to rav...@googlegroups.com
Thanks by the way for your time and effort !

Tal Weiss

unread,
Mar 30, 2015, 9:52:44 AM3/30/15
to rav...@googlegroups.com
i'm assuming i also need to add:
        public interface ITimeContrainedDocument
        {
            DateTime? ValidTo { get; set;}
        }


        public class Leaflet : ITimeContrainedDocument

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Tal Weiss l Core Team Developer Mobile:+972-54-802-4849

Tal Weiss

unread,
Apr 1, 2015, 9:03:57 AM4/1/15
to rav...@googlegroups.com
After fixing the test i'm unable to reproduce this.
I have ran this query many times and it consistently returned the correct amount of results.
Can you guys please send the fiddler log containing two request one that returns 0 and one that returns the correct amount of results?

Gluber

unread,
Apr 1, 2015, 11:12:14 PM4/1/15
to rav...@googlegroups.com

Oren Eini (Ayende Rahien)

unread,
Apr 2, 2015, 5:24:12 AM4/2/15
to ravendb
Am I missing something? They are all being done against the same db against the same server.
Inline image 1

Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Tal Weiss

unread,
Apr 2, 2015, 5:43:45 AM4/2/15
to rav...@googlegroups.com
Hi Gluber,
look the the results it seems you have two version of the query, one with ispromoted and one without (the one with is promoted returns empty results.
 @in<RegionId>:(Region/2147483647,Region/4,Region/12,Region/13,Region/16,Region/17,Region/19,Region/20,Region/21) ) AND (IsPromoted:true) AND ( ValidTo:[2015-03-26T23:59:59.0000000 TO NULL])
vs
( @in<RegionId>:(Region/2147483647,Region/4,Region/12,Region/13,Region/16,Region/17,Region/19,Region/20,Region/21) ) AND ( ValidTo:[2015-03-26T23:59:59.0000000 TO NULL])
Tt is fixed but you can't upgrade the RavenHQ server so what you can do is change the order of the Where clauses so the where region in regions will come last.


On Thu, Apr 2, 2015 at 6:12 AM, Gluber <glube...@gmail.com> wrote:

Gluber

unread,
Apr 2, 2015, 8:44:26 PM4/2/15
to rav...@googlegroups.com
1. Oren, this is done against the first server all the time as a step in our debugging process.... since the problem only shows up on the replicated environment, we thought maybe there is a problem with one of the servers, so we fixed it to one server only.
2. Sorry to make it more complex, yes there are multiple queries in there, since maybe the sequence of queries had anything to do with it.. Here is the file with only the two offending queries ( both exactly the same ) one returning 1700 results the other zero. Note that to me it looks like as if the one that returns 0 is answered from the client side caching ... maybe a race condition ?

Here is the link
https://drive.google.com/file/d/0B1fLVs-63PEqRjJwWXc2clZCblU/view?usp=sharing

I am not sure how the mentioned bug applies to us, since we don't have this problem in general in dev, it only shows up in the replicated environment of RavenHQ

Oren Eini (Ayende Rahien)

unread,
Apr 3, 2015, 1:37:48 AM4/3/15
to ravendb
Hm... the two requests are identical. I don't see any reason why they would return different results.

What is more, I don't see any indication of caching here, and the etags are the same.


Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


Gluber

unread,
Apr 8, 2015, 11:27:51 AM4/8/15
to rav...@googlegroups.com
Any news on this ? 

Oren Eini (Ayende Rahien)

unread,
Apr 8, 2015, 11:29:26 AM4/8/15
to ravendb
No,do you have any way that we can try reproducing this locally?
Reply all
Reply to author
Forward
0 new messages