NSPredicate IN Operator, Limit and Speed

716 views
Skip to first unread message

Kunal Sood

unread,
Sep 28, 2014, 6:09:42 AM9/28/14
to realm...@googlegroups.com
I have an NSArray (arrayOfUnreadArticleIdentifiers) with 8249 strings, which are NSString (articleIdentifier) properties on my Article object, which is an RLMObject subclass. I need to fetch all articles with articleIdentifier that is in arrayOfUnreadArticleIdentifiers. So, my first thought was to do the following :-

[Article objectsInRealm:realm withPredicate:[NSPredicate predicateWithFormat:@"articleIdentifier IN %@", arrayOfUnreadArticleIdentifiers]];


However that throws an invalid predicate/invalid query exception.

After a bit of trial & error, I find that there is a limit to the number of strings I can have in arrayOfUnreadArticleIdentifiers (2507 in my case), so, doing the following makes my predicate/query valid again :-

[Article objectsInRealm:realm withPredicate:[NSPredicate predicateWithFormat:@"articleIdentifier IN %@", [arrayOfUnreadArticleIdentifiers objectsAtIndexes:[NSIndexSet indexSetWithIndexesInRange:NSMakeRange(0, 2507)]]]];


My first question/request: I would love some details about the limits of the IN Operator in Realm.

Another observation I have made while retrofitting my existing Core Data app with Realm..

The following method, with my current dataset, takes approximately 80 seconds to complete the query when articleIdentifier is not indexed, and  approximately 78 seconds when it is indexed :-

[Article objectsInRealm:realm withPredicate:[NSPredicate predicateWithFormat:@"articleIdentifier IN %@", [arrayOfUnreadArticleIdentifiers objectsAtIndexes:[NSIndexSet indexSetWithIndexesInRange:NSMakeRange(0, 2507)]]]];


While..
I have a RLMArray category, which has a method which looks like this :-

-(NSSet *)setOfAllObject
{
   
NSMutableSet *set = [NSMutableSet new];
   
for (id object in self)
   
{
       
[set addObject:object];
   
}
   
return [set copy];
}


And, if I use the following method and do the filtering in-memory, it takes approximately 13 seconds, with the same dataset (Note: I don't limit the array I pass in the IN predicate) :-

[[[Article allObjectsInRealm:realm] setOfAllObject] filteredSetUsingPredicate:[NSPredicate predicateWithFormat:@"articleIdentifier IN %@", arrayOfUnreadArticleIdentifiers]];


So, I guess my second question is: Is this something I could be doing wrong where in-memory filtering is significantly faster?

Brian Munkholm

unread,
Sep 28, 2014, 7:00:02 PM9/28/14
to Kunal Sood, realm...@googlegroups.com
Hi Kunai!

Using IN with so many values is not a good solution (for any database).
It is equivalent to roughly this (pseudo code):
for object in Realm:
    for id in arrayOfUnreadArticleIdentifier:
        if object.id == id

I don't know how many objects you got in your Realm, but you can multiply that with 8249 to calculate the number of comparisons that has to be made.

I think the best solution is to address your data model first.
Where do the NSArray (arrayOfUnreadArticleIdentifiers) come from? Couldn't you just update your data model to include the information? So if you add a property "unread" in realm, the query would be very fast compared to above, since it be basically remove the inner loop. Would that be feasable?
 
Cheers?
//Brian

--
You received this message because you are subscribed to the Google Groups "Realm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to realm-cocoa...@googlegroups.com.
To post to this group, send email to realm...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/realm-cocoa/c5698d08-ed9c-40c8-9e70-fe75fd1b03c9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kunal Sood

unread,
Sep 29, 2014, 1:51:15 AM9/29/14
to realm...@googlegroups.com, kunals...@gmail.com
Hi Brian,

Thanks a lot for your response!

8249 is an extreme case, yes, but, it is one that needs to be handled in my app. My articles already have an "isUnread" BOOL property. The NSArray (arrayOfUnreadArticleIdentifiers) comes from a Rest API (Sync server), they are identifiers of articles that need to be marked as unread if they aren't already. So, I first filter out all articles that're already unread, and then run the query with the IN operator (either by using chained queries or by using 'AND' in my predicate). What do you think would be the most efficient way to handle this?

Thanks again for your response, and help.

Best,

Kunal

Ari Lazier

unread,
Sep 29, 2014, 12:17:43 PM9/29/14
to realm...@googlegroups.com, kunals...@gmail.com
Have you tried creating an index in Realm on the 'articleIdentifier' property? I would think this should improve performance dramatically. You can see how to add an index here http://realm.io/docs/cocoa/0.85.0/api/Classes/RLMObject.html#//api/name/attributesForProperty: (this only works for string properties at the moment).

Ari Lazier

unread,
Sep 29, 2014, 12:19:09 PM9/29/14
to realm...@googlegroups.com, kunals...@gmail.com
Clicking on that link doesn't work for some reason as it ignores the ':' at the end. What I was trying to point you at was the sample code using 'attributesForProperty:'.

Kunal Sood

unread,
Sep 29, 2014, 1:02:14 PM9/29/14
to realm...@googlegroups.com, kunals...@gmail.com
As a matter of fact, I did try indexing the 'articleIdentifier' (which is indeed a string property). Performance on that query did improve, but, I wouldn't call the improvement dramatic, query time came down from 80 seconds when the 'articleIdentifier' property is not indexed to 78 seconds when it is indexed.

I'm currently relying on doing the filtering in-memory as that seems to be the fastest, and I'm not seeing any spikes in memory usage while it is happening (I'm guessing that's because how RLMObject properties are loaded lazily).

Brian Munkholm

unread,
Sep 29, 2014, 1:33:00 PM9/29/14
to Kunal Sood, realm...@googlegroups.com
Did you also try:
for id in arrayOfUnreadArticleIdentifier:
     lookup id in realm, and set id.unRead=true

Btw, I also think that it will actually be possible significantly speed up the IN operator if we make support for it in the core. Currently it's emulated by replacing it with a lot of or()'s to the underlying core database.





On Mon, Sep 29, 2014 at 7:02 PM, Kunal Sood <kunals...@gmail.com> wrote:
As a matter of fact, I did try indexing the 'articleIdentifier' (which is indeed a string property). Performance on that query did improve, but, I wouldn't call the improvement dramatic, query time came down from 80 seconds when the 'articleIdentifier' property is not indexed to 78 seconds when it is indexed.

I'm currently relying on doing the filtering in-memory as that seems to be the fastest, and I'm not seeing any spikes in memory usage while it is happening (I'm guessing that's because how RLMObject properties are loaded lazily).

--
You received this message because you are subscribed to the Google Groups "Realm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to realm-cocoa...@googlegroups.com.
To post to this group, send email to realm...@googlegroups.com.

Kunal Sood

unread,
Sep 30, 2014, 3:13:26 AM9/30/14
to realm...@googlegroups.com, kunals...@gmail.com
I hadn't tried it before, but, tried it now, it turns out, it's a lot faster than most other techniques, but, still about 2 seconds slower than the in-memory way.

Kunal Sood

unread,
Sep 30, 2014, 8:11:11 AM9/30/14
to realm...@googlegroups.com, kunals...@gmail.com
If it counts for anything, I vote for making support for the IN operator in the core, if that will speed things up significantly. There are times where I might also need to use predicates like: [NSPredicate predicateWithFormat:@"NOT (articleIdentifier IN %@)", arrayOfUnreadArticleIdentifiers], which is even slower with the currently possible techniques.

Thanks again for all your help, guys!

Brian Munkholm

unread,
Sep 30, 2014, 8:35:45 AM9/30/14
to Kunal Sood, realm...@googlegroups.com
Thanks for YOUR help in pointing this out!
We actually already identified an obvious optimization for the current concatenation of a lot of conditions - originally we did not expect anyone to make 4000 or()'s :-)
This optimization currently improves 2000 multiple or() conditions (that's used in IN) with about a factor of 250. It's not released yet though. Would be awesome to get your feedback on that ones released.


On Tue, Sep 30, 2014 at 2:11 PM, Kunal Sood <kunals...@gmail.com> wrote:
If it counts for anything, I vote for making support for the IN operator in the core, if that will speed things up significantly. There are times where I might also need to use predicates like: [NSPredicate predicateWithFormat:@"NOT (articleIdentifier IN %@)", arrayOfUnreadArticleIdentifiers], which is even slower with the currently possible techniques.

Thanks again for all your help, guys!

--
You received this message because you are subscribed to the Google Groups "Realm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to realm-cocoa...@googlegroups.com.
To post to this group, send email to realm...@googlegroups.com.

Kunal Sood

unread,
Oct 1, 2014, 2:30:36 AM10/1/14
to realm...@googlegroups.com, kunals...@gmail.com
Sounds awesome! I'll be on the lookout for the update that adds this optimization.

Kunal Sood

unread,
Oct 5, 2014, 1:57:29 AM10/5/14
to realm...@googlegroups.com, kunals...@gmail.com
Just an update after trying 0.86.1: Query time is down to a second from 78 seconds, which is super!

One thing I'm still somewhat unclear about is the size of the collection I can pass in an IN predicate. Sometimes it throws an exception with an array with 2506 objects, other times at 2501.

I would love some documentation on this limit, so that I can decide how to work around it.

Thanks for all your work. Realm is awesome!

t...@realm.io

unread,
Oct 5, 2014, 10:05:19 PM10/5/14
to realm...@googlegroups.com, kunals...@gmail.com
The limiting factor is that there's a place where the query engine is using recursion to loop over the clauses of the query, so with too many items it overflows the stack. As a result, the exact limit depends on the platform and the depth of the call stack at the point where you call the query. I'm looking into eliminate the recursion to remove the limit, but it won't be a trivial change.

Kunal Sood

unread,
Oct 6, 2014, 7:54:04 AM10/6/14
to realm...@googlegroups.com, kunals...@gmail.com
That makes sense. Appreciate the explanation. I'll just continue using my existing workarounds.

t...@realm.io

unread,
Oct 14, 2014, 1:51:01 PM10/14/14
to realm...@googlegroups.com, kunals...@gmail.com
I've now committed a fix to the core db for the stack overflow, so you shouldn't see any more crashes from large IN queries in the next version of Realm.

Kunal Sood

unread,
Oct 15, 2014, 4:34:27 AM10/15/14
to realm...@googlegroups.com, kunals...@gmail.com
Thanks for letting me know. Can't wait to try it out.
Reply all
Reply to author
Forward
0 new messages