CQEngine 2.1 released

Niall

Aug 23, 2015, 7:42:59 PM
to cqengine-discuss
Hi All,

CQEngine 2.1 has been released, and is now available in Maven Central.

Headline features are:
  • Support for running SQL queries on the collection (see example below).
  • Support for running CQN (CQEngine Native) string-based queries on the collection (queries with the same syntax as programmatic queries, but in string form - see example below).
  • Significant performance improvements for complex queries.
  • Bulk import support for Off-heap and Disk persistence.
  • More fine-grained control over the ordering of objects by attributes where some objects might not have values for the attribute: orderBy(missingFirst(attribute)) and orderBy(missingLast(attribute)).
  • Nearly all indexes (On-heap, Off-heap and Disk) can now accelerate standing queries; StandingQueryIndex, which was on-heap only, is deprecated.
  • The statistics APIs exposed by indexes now provide additional statistics on the distribution of values in the index, and allow applications to traverse indexes directly (for advanced use cases).
  • The performance of the "index" ordering strategy, useful for time-series queries, is improved; details below.
Example of running an SQL query on a collection (full source here):
public static void main(String[] args) {
    SQLParser<Car> parser = SQLParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
    IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
    cars.addAll(CarFactory.createCollectionOfCars(10));

    ResultSet<Car> results = parser.retrieve(cars, "SELECT * FROM cars WHERE (" +
                                    "(manufacturer = 'Ford' OR manufacturer = 'Honda') " +
                                    "AND price <= 5000.0 " +
                                    "AND color NOT IN ('GREEN', 'WHITE')) " +
                                    "ORDER BY manufacturer DESC, price ASC");
    for (Car car : results) {
        System.out.println(car); // Prints: Honda Accord, Ford Fusion, Ford Focus
    }
}

Example of running a CQN query on a collection (full source here):
public static void main(String[] args) {
    CQNParser<Car> parser = CQNParser.forPojoWithAttributes(Car.class, createAttributes(Car.class));
    IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
    cars.addAll(CarFactory.createCollectionOfCars(10));

    ResultSet<Car> results = parser.retrieve(cars,
                                    "and(" +
                                        "or(equal(\"manufacturer\", \"Ford\"), equal(\"manufacturer\", \"Honda\")), " +
                                        "lessThanOrEqualTo(\"price\", 5000.0), " +
                                        "not(in(\"color\", GREEN, WHITE))" +
                                    ")");
    for (Car car : results) {
        System.out.println(car); // Prints: Ford Focus, Ford Fusion, Honda Accord
    }
}


Overview of the "index" ordering strategy:
  • This "index" ordering strategy causes CQEngine to use an index on an attribute by which results must be ordered, to drive its search. No other indexes will be used.
    • This strategy can be useful when results must be ordered in time series (most recent first, for example), and the objects which match the query will be stored consecutively in the index used for ordering.
    • It also makes sense to use this strategy, when a query matches a large fraction of the collection - because it avoids the need to sort a large fraction of the collection afterwards.
  • The "materialize" ordering strategy allows CQEngine to use other indexes to locate objects matching the query, and then to sort those results afterwards.
    • This strategy is useful for general queries, where ultimately the objects to be returned will not necessarily be stored consecutively in any particular index used for ordering.
    • It also makes sense to retrieve results and sort them afterwards, when a small number of results would need to be sorted, and when other indexes can narrow the candidate set of objects more effectively than the index used for ordering.
  • Because neither strategy suits every query, in CQEngine 2.1 the application enables the "index" ordering strategy by setting a threshold value (instead of a flag) via query options: applyThresholds(threshold(INDEX_ORDERING_SELECTIVITY, 1.0))
    • Threshold 1.0 tells CQEngine to always use the index ordering strategy, if the required indexes are available.
    • Threshold 0.0 (the default for now) tells CQEngine to never use the "index" ordering strategy, and to always use the regular "materialize" strategy instead.
  • Setting a threshold between 0.0 and 1.0, such as 0.5, causes CQEngine to choose between the "index" strategy and the "materialize" strategy automatically, based on the selectivity of the query.
    • The selectivity of the query is a measure of how "selective" (or "specific") the query is, or in other words how big the fraction of the collection it matches is.
      • A query with high selectivity (approaching 1.0) is specific: it matches a small fraction of the collection.
      • A query with a low selectivity (approaching 0.0) is vague: it matches a large fraction of the collection.
    • If a threshold between 0.0 and 1.0 is specified, then CQEngine will compute the selectivity of the query automatically.
    • It will then automatically use the "index" strategy if the selectivity is below the given threshold, and the "materialize" strategy if the selectivity is above the given threshold.
    • However, computing the selectivity of the query itself introduces computation overhead.
    • For certain types of query, forcing the use of a particular strategy can perform better than incurring the overhead of computing the best strategy on the fly.
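
As a rough illustration of the threshold rule described above, here is a minimal sketch in plain Java. It is not CQEngine's actual code: the class, the enum, and the way selectivity is estimated (1.0 minus the fraction of the collection matched) are assumptions made for the example, and it ignores whether the required indexes are available. The selectivity is passed as a supplier so that it is only computed when the threshold is strictly between 0.0 and 1.0.

```java
import java.util.function.DoubleSupplier;

class OrderingStrategyChooser {

    enum Strategy { INDEX, MATERIALIZE }

    // Hypothetical estimate: 1.0 minus the fraction of the collection matched.
    // A query matching few objects is highly selective (close to 1.0).
    static double selectivity(int matchingObjects, int collectionSize) {
        if (collectionSize == 0) {
            return 1.0; // assumption: treat an empty collection as fully selective
        }
        return 1.0 - ((double) matchingObjects / collectionSize);
    }

    static Strategy choose(double threshold, DoubleSupplier querySelectivity) {
        if (threshold <= 0.0) {
            return Strategy.MATERIALIZE; // never use "index"; selectivity is not computed
        }
        if (threshold >= 1.0) {
            return Strategy.INDEX; // always use "index"; selectivity is not computed
        }
        // Only here do we pay the cost of computing the selectivity.
        return querySelectivity.getAsDouble() < threshold
                ? Strategy.INDEX
                : Strategy.MATERIALIZE;
    }
}
```

With a threshold of 0.5, a query matching 900 of 1000 objects (selectivity about 0.1) would be ordered via the index, while a query matching 10 of 1000 objects (selectivity 0.99) would be materialized and sorted afterwards.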
CQEngine 2.1 has some minor API changes:
  • The way to enable the "index" ordering strategy introduced in 2.0 has changed.
    • Previously, the application enabled the strategy by setting a flag in query options: orderingStrategy(INDEX)
    • Now, the application enables the strategy by setting a threshold value instead: applyThresholds(threshold(INDEX_ORDERING_SELECTIVITY, 1.0))
    • This will require (minor) code changes in any applications which explicitly enabled this strategy previously.
    • No code changes will be required in applications which did not request index ordering explicitly.

Feel free to post any questions or problems in the forum!

Best regards,
Niall

PSI A

Aug 27, 2015, 4:38:37 PM
to cqengine-discuss
Great work Niall !!

Quick question: can you give me more examples of SQL queries involving dates? For example, 'shipDate < today's date'.

Thanks,
Anil

PSI A

Aug 27, 2015, 4:59:45 PM
to cqengine-discuss
In addition, does it support something like : shipDate > OrderDate + 4

Niall Gallagher

Aug 27, 2015, 6:26:14 PM
to cqengine...@googlegroups.com
Hi Anil,

Yes, sort of. You could register a virtual attribute called 'shippingDelay' as follows:

parser.registerAttribute(new SimpleAttribute<Order, Long>("shippingDelay") {
    @Override
    public Long getValue(Order order, QueryOptions queryOptions) {
        // Whole days between the order date and the ship date (86400000 ms per day)
        return (order.shipDate.getTime() - order.orderDate.getTime()) / 86400000;
    }
});

And then use it in SQL or CQN queries as follows:

    ResultSet<Order> results = parser.retrieve(orders, "SELECT * FROM orders WHERE (shippingDelay > 4)");
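
To make the arithmetic in the shippingDelay attribute concrete, here is a small standalone sketch using only the JDK. The class and method names are hypothetical. It computes the same whole-day difference with TimeUnit instead of the literal 86400000, and shows that partial days are truncated, so an order shipped 4.5 days after it was placed has a delay of 4 and would not match 'shippingDelay > 4'.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

class ShippingDelay {

    // Whole days elapsed between orderDate and shipDate; partial days are truncated.
    static long shippingDelayDays(Date orderDate, Date shipDate) {
        long elapsedMillis = shipDate.getTime() - orderDate.getTime();
        return TimeUnit.MILLISECONDS.toDays(elapsedMillis);
    }
}
```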



For queries relative to the current time, such as 'shipDate < today's date', you could register the DateMathParser as the value parser for type Date with the SQL parser:

parser.registerValueParser(Date.class, new DateMathParser());


And then use it in SQL queries as follows:

    ResultSet<Order> results = parser.retrieve(orders, "SELECT * FROM orders WHERE (shipDate < +0DAY)");


Basically, you can register a custom value parser for any type of value. In the case of 'shipDate' here, the SQL or CQN parser will look for an attribute called 'shipDate' which has been registered with the parser, determine that the type of the shipDate attribute is Date, and then look for a value parser which can parse the string '+0DAY' into a Date object. It finds DateMathParser, and invokes that to parse the string.

DateMathParser is just one parser. You could register your own SuperFlexibleDateParser if you wanted to support say several allowed date formats, and possibly date math statements as well, in the same query.
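
As a sketch of what the parsing logic inside such a custom date parser might look like, here is a standalone example that accepts several date formats, using only java.time. The class name and the list of formats are assumptions for illustration. To use it with CQEngine it would still need to be wrapped in a CQEngine value parser and registered via registerValueParser; that wrapper is not shown here, because its exact signature should be checked against the CQEngine source.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class FlexibleDateParsing {

    // Formats which include a time of day (an assumed, illustrative list).
    private static final List<DateTimeFormatter> DATE_TIME_FORMATS = Arrays.asList(
            DateTimeFormatter.ofPattern("d MMM yyyy HH:mm:ss", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH));

    // Formats with a date only; these resolve to midnight at the start of that day.
    private static final List<DateTimeFormatter> DATE_FORMATS = Arrays.asList(
            DateTimeFormatter.ofPattern("d MMM yyyy", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ENGLISH));

    // Tries each known format in turn and returns the first successful parse.
    static LocalDateTime parse(String text) {
        String trimmed = text.trim();
        for (DateTimeFormatter format : DATE_TIME_FORMATS) {
            try {
                return LocalDateTime.parse(trimmed, format);
            } catch (DateTimeParseException ignored) {
                // fall through and try the next format
            }
        }
        for (DateTimeFormatter format : DATE_FORMATS) {
            try {
                return LocalDate.parse(trimmed, format).atStartOfDay();
            } catch (DateTimeParseException ignored) {
                // fall through and try the next format
            }
        }
        throw new IllegalArgumentException("Unrecognised date: " + text);
    }
}
```

If the attribute type is java.util.Date, the result could be converted with Date.from(result.atZone(ZoneId.systemDefault()).toInstant()); which time zone to use is a decision for the application.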

HTH!
Niall



PSI A

Aug 27, 2015, 7:07:57 PM
to cqengine-discuss
Hi Niall,

Thanks for the quick reply.

I have another question, about field-to-field comparison. Do you have any strategy or plan to support this type of comparison, instead of adding a 'virtual function'? If this scenario is possible, let me know which areas of CQEngine would need to be changed.

We need field-to-field comparison because our product uses lots of custom attributes which are defined at deployment time, and users of the application can construct any type of field-to-field comparison. With the current CQEngine design, there would be a large number of possible permutations to cover.

Let me know,

Thanks,
Anil

Niall Gallagher

Aug 28, 2015, 7:23:59 AM
to cqengine...@googlegroups.com
Hi Anil,

Good question, and good idea. 

To be honest I didn't have a need for field-to-field comparison until now, but I can see how it would be useful.

I haven't given it enough thought yet. But the idea is now queued in my subconscious and typically that means it will irritate me until I find a solution for it!
So I will try to add support for that in future.

I guess it could be implemented as a core feature in CQEngine's native (programmatic) query trees (and then accessible to SQL and CQN queries as well), or it could be implemented in the SQL or CQN parser land only.

My first thoughts are it should be a core feature. I'd envisage a parallel set of classes which implement the Query interface, and might be called AttributesEqual, AttributesLessThan and so on, and these would accept as parameters literal values, OR references to other attributes. It might be useful to keep these as a parallel set of classes initially, so as not to change the performance of existing queries which don't need it. The SQL and CQN parsers could then transparently create the new attribute-reference based Query objects, when they see string queries refer to other attributes, instead of providing literal values.

If you have any ideas or code to help here, it would be welcome. But I will start thinking about it as a feature for the future anyway.

Thanks!
Niall

PSI A

Aug 28, 2015, 11:39:15 AM
to cqengine-discuss
Hi Niall,

Is it possible for you to prepare a design document for field-to-field comparison support, so that I can ask my management to assign a developer to work on it and get the feature in? How much work would you estimate?

Niall

Sep 1, 2015, 9:31:00 PM
to cqengine-discuss
Hi Anil,

Sorry for the delay. Thinking about a design doc itself requires free time!

AFAICT it would be reasonably straightforward. 
Most of the work will involve adding one class for each type of query for which you require field-to-field comparison support. However, the developer will need to write unit tests for each of the new classes as well. I try to have at least 90% test coverage in general for CQEngine, but in particular we don't want any logic bugs to get into query classes which might cause incorrect results to be returned.

You can find a list of all of the built-in queries here. I guess that you would only be interested in adding field-to-field comparison support for basic query types such as LessThan, GreaterThan, Equal and Between? So this would involve writing 4 new classes, plus unit tests for each class.

I'd divide the task into two steps: (1) adding programmatic support for field-to-field queries, and (2) adding SQL and CQN support.

If you could implement part (1) which is the main part, then I think I could easily do part (2) myself.

I'd estimate between 2 and 5 man days for part 1. Part 2 is probably 1 hour's work.

==Part 1 design spec==
Add four new Query classes which support field-to-field comparison: AttributeLessThan, AttributeGreaterThan, AttributeEqual, AttributeBetween. 

Process:
  1. Learn how queries work:
    1. Take a look at the Query interface, note that there is only one method: boolean matches()
    2. Take a look at the abstract class SimpleQuery, which implements Query, and note that its implementation of matches() simply calls one of two abstract methods, matchesSimpleAttribute or matchesNonSimpleAttribute, depending on whether the query is on a SimpleAttribute, or on a MultiValueAttribute or similar.
    3. Take a look at class LessThan, which is a concrete implementation of SimpleQuery. 
      1. It is the object which QueryFactory instantiates when you construct a query programmatically using lessThan() or lessThanOrEqualTo().
      2. Note that 3 arguments are provided to its constructor: an attribute, a value provided in the query, and a boolean flag which will be false if the query was constructed as lessThan(), and true if it was constructed as lessThanOrEqualTo(). 
      3. Take a look at matchesSimpleAttribute in this class, and note that it simply invokes the attribute which was provided to the constructor and supplies it with a candidate object to be tested, which returns to it the value of that attribute in the candidate object. The matchesSimpleAttribute method then proceeds to compare the value provided in the query, with the value returned for the candidate object. If the value from the candidate object is indeed lessThan() or lessThanOrEqualTo(), the method accordingly returns true, otherwise false.
      4. Note that matchesNonSimpleAttribute operates in a similar way. The difference is, the attribute may return more than one value for a candidate object, and the method will then return true if any of those values match the query. 
        1. For example if a Car had a list of features ["sunroof", "radio"] and a query requested Cars which had radios, then that Car matches the query because one of its values for the features attribute matches the query.
  2. Now, start to implement your AttributeLessThan query
    1. It will be almost identical to the LessThan query above, except that instead of its constructor taking three arguments: [an attribute, a value provided in the query, and a boolean flag], it will take these three arguments instead: [an attribute, a referenced attribute, and a boolean flag]. That is, instead of a value being provided in the query, a second attribute will be provided instead.
    2. You will need to implement matchesSimpleAttribute as in the LessThan query, except that it obtains the value to be used in the comparison from the referenced attribute, by supplying that attribute with the candidate object to be tested as well. At that point you will have a value from the first attribute and a value from the referenced attribute. Compare these values and return the result using the same logic as in LessThan.
    3. You will need to implement matchesNonSimpleAttribute in a similar way.
  3. Now use the same approach to implement AttributeGreaterThan, AttributeEqual, AttributeBetween
  4. Finally, add static factory methods to the QueryFactory class. This will make the new types of query first-class citizens amongst the built-in queries.
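
The shape of step 2 can be sketched in plain Java as follows. This is a simplified, standalone model and not CQEngine code: the Query interface here is a stand-in reduced to a single matching test, attributes are modelled as java.util.function.Function, and the factory method names are hypothetical. A real implementation would extend SimpleQuery and implement matchesSimpleAttribute and matchesNonSimpleAttribute as described in the steps above.

```java
import java.util.function.Function;

class FieldToFieldSketch {

    /** Stand-in for CQEngine's Query interface, reduced to the matching test (assumption). */
    interface Query<O> {
        boolean matches(O candidate);
    }

    /** Matches objects whose attribute value is less than the value of a second attribute. */
    static final class AttributeLessThan<O, A extends Comparable<A>> implements Query<O> {
        private final Function<O, A> attribute;
        private final Function<O, A> referencedAttribute;
        private final boolean valueInclusive;

        AttributeLessThan(Function<O, A> attribute, Function<O, A> referencedAttribute,
                          boolean valueInclusive) {
            this.attribute = attribute;
            this.referencedAttribute = referencedAttribute;
            this.valueInclusive = valueInclusive;
        }

        @Override
        public boolean matches(O candidate) {
            // Both values are read from the same candidate; neither is supplied in the query.
            // Assumes both attributes return non-null values.
            A value = attribute.apply(candidate);
            A referencedValue = referencedAttribute.apply(candidate);
            int comparison = value.compareTo(referencedValue);
            return valueInclusive ? comparison <= 0 : comparison < 0;
        }
    }

    // Factory methods in the style of QueryFactory (hypothetical names).
    static <O, A extends Comparable<A>> Query<O> attributeLessThan(
            Function<O, A> attribute, Function<O, A> referencedAttribute) {
        return new AttributeLessThan<>(attribute, referencedAttribute, false);
    }

    static <O, A extends Comparable<A>> Query<O> attributeLessThanOrEqualTo(
            Function<O, A> attribute, Function<O, A> referencedAttribute) {
        return new AttributeLessThan<>(attribute, referencedAttribute, true);
    }

    /** Example domain object with two comparable fields, expressed as day numbers. */
    static final class Order {
        static final Function<Order, Long> ORDER_DAY = order -> order.orderDay;
        static final Function<Order, Long> SHIP_DAY = order -> order.shipDay;

        final long orderDay;
        final long shipDay;

        Order(long orderDay, long shipDay) {
            this.orderDay = orderDay;
            this.shipDay = shipDay;
        }
    }
}
```

For example, attributeLessThan(Order.ORDER_DAY, Order.SHIP_DAY) matches an order placed on day 10 and shipped on day 14, but not one placed and shipped on the same day; the inclusive variant matches both.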
General notes:
  • You might find that the new classes could reuse a lot of the code in the existing query classes. If so feel free to move existing code into static helper methods in the original classes, and call the helper methods from the new classes.
  • You'll need to decide if you want to support field-to-field comparison in situations where both fields could be multi-valued. CQEngine supports the case where the first field is multi-valued already.
    • If the referenced attribute is not a SimpleAttribute, then it may return multiple values from a given candidate object as well.
    • So in the case that both the main attribute, and the referenced attribute were MultiValueAttributes, then you'd have two sets of values to compare with each other, and you would need to return true if any combination of values from those sets matched the query.
    • I won't object if you prefer to restrict your support for field-to-field comparison, to simple attributes only (but it would be awesome if you could support multiple values!)
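
For the case where both attributes are multi-valued, the "any combination matches" rule could be sketched as below. Again this is a standalone illustration with hypothetical names, not CQEngine code; the two Iterables stand for the values each attribute returns for a single candidate object.

```java
class MultiValueComparisonSketch {

    // Returns true if any value from the first attribute is less than (or equal to,
    // when inclusive) any value from the referenced attribute.
    static <A extends Comparable<A>> boolean anyLessThan(
            Iterable<A> values, Iterable<A> referencedValues, boolean inclusive) {
        for (A value : values) {
            for (A referencedValue : referencedValues) {
                int comparison = value.compareTo(referencedValue);
                if (inclusive ? comparison <= 0 : comparison < 0) {
                    return true; // one matching combination is enough
                }
            }
        }
        return false; // no combination matched (also the case if either side is empty)
    }
}
```

The nested loop compares every pair of values. For less-than specifically, it would be equivalent to compare the smallest value on the left with the largest value on the right, which may be worth considering if attributes can return many values.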
Thanks!
Niall

PSI A

Sep 4, 2015, 6:46:11 PM
to cqengine-discuss
Thanks Niall for input.

shashi kant

Feb 10, 2016, 5:57:48 AM
to cqengine-discuss
Hi Niall,

I need to query on a specific date and/or time. Please let me know if it is possible to do something like "select * from XXX where someDateField = '26 Jan 2015'" or "select * from XXX where someDateField > '26 Jan 2015 08:45:25'".