Hi all,
I am new to CQEngine and am trying to get good performance for a query with an ORDER BY on a timestamp field over 1 million objects. Filtering is fast; it's the ORDER BY that is killing performance.
public class Job {
    private long type;      // notnull required field
    private int priority;   // notnull required field
    private long timestamp; // notnull required field, milliseconds
    int status;             // notnull required field
    // Getters and setters omitted here -- assume they are available.
    // All fields have SimpleAttributes declared as static finals as follows (omitted here):
    // type - TYPE, priority - PRIORITY, timestamp - TIMESTAMP, status - STATUS.
}
IndexedCollection<Job> jobs = new ConcurrentIndexedCollection<Job>();
jobs.addIndex(NavigableIndex.onAttribute(TYPE));
jobs.addIndex(NavigableIndex.onAttribute(PRIORITY));
jobs.addIndex(NavigableIndex.onAttribute(TIMESTAMP));
QueryOptions queryOptions = queryOptions(orderBy(descending(PRIORITY), ascending(TIMESTAMP)));
// for given STATUS s, and a list of TYPEs t, find all matching jobs
Query<Job> query = and(equal(STATUS, s), in(TYPE, t));
// Retrieve the matching jobs ordered by PRIORITY desc and TIMESTAMP asc.
ResultSet<Job> resultSet = jobs.retrieve(query, queryOptions);
// Iterate on the result set to return Top N jobs.
When I run this with 2 million jobs, it takes a few seconds to retrieve the top 10 jobs.
When I run the same query without the queryOptions, it completes in under 100 microseconds.
I tested this with various combinations of one or more of the following, but none of them has any effect as long as I pass the queryOptions to the retrieve method:
1. Setting queryOptions for TIMESTAMP with Order By Ascending
2. Setting queryOptions for PRIORITY with Order By Descending
3. Setting EngineThresholds.INDEX_ORDERING_SELECTIVITY to 1.0, 0.75, 0.5, 0.25, etc.
4. Trying quantizers on the TIMESTAMP index with compression factors of 1000 (seconds), 60000 (minutes), and 600000 (10 minutes)
I also tried storing the compressed timestamp value directly in the Job's constructor (using integer division) instead of using a quantized index as mentioned above, but no luck.
I would appreciate any clues on how to make this faster.
Note that the jobs are inserted into the indexed collection in more or less the same order in which their timestamps would sort. Is there any way to tell the indexed collection to preserve insertion order, like a linked hash map does? I checked the code and I see that internally it's a ConcurrentHashMap. If I implement my own ObjectStore with, say, a ConcurrentLinkedHashMap implementation, could I then remove the ORDER BY clauses and get better performance? I am going to try this with Guava, Caffeine, or some other library and see how it works.
Are there any other partitioning strategies that would work in this case?
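For reference, one fallback outside CQEngine would be to retrieve without queryOptions and select the top N in application code with a bounded heap, which costs roughly O(M log N) for M matches instead of sorting all of them. The following is only a minimal sketch under that assumption, in plain Java and not a CQEngine feature; the class name TopNJobs, the method topN, and the simplified Job stand-in (just the two ordering fields) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNJobs {
    // Simplified, hypothetical stand-in for the Job class above:
    // only the two fields used for ordering are kept.
    static final class Job {
        final int priority;
        final long timestamp;

        Job(int priority, long timestamp) {
            this.priority = priority;
            this.timestamp = timestamp;
        }
    }

    // Desired order: PRIORITY descending, then TIMESTAMP ascending.
    static final Comparator<Job> ORDER = (a, b) -> {
        int byPriority = Integer.compare(b.priority, a.priority);
        return byPriority != 0 ? byPriority : Long.compare(a.timestamp, b.timestamp);
    };

    // Scans the matches once and keeps at most n jobs in a heap.
    static List<Job> topN(Iterable<Job> matches, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // The heap uses the reversed order, so its head is the worst job kept so far.
        PriorityQueue<Job> heap = new PriorityQueue<>(n, ORDER.reversed());
        for (Job job : matches) {
            if (heap.size() < n) {
                heap.add(job);
            } else if (ORDER.compare(job, heap.peek()) < 0) {
                // The new job ranks ahead of the current worst, so it replaces it.
                heap.poll();
                heap.add(job);
            }
        }
        List<Job> result = new ArrayList<>(heap);
        result.sort(ORDER); // only n elements are sorted here
        return result;
    }
}
```

Since a CQEngine ResultSet is Iterable, the unordered result of jobs.retrieve(query) could be passed in directly, e.g. topN(resultSet, 10). This still visits every matching job once, so it only helps if the filter itself is selective enough.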
Thanks & Regards,
Chakra