Optimal way for indexing Objects with Map values


Hans

Jun 13, 2014, 8:28:59 AM
to cqengine...@googlegroups.com
Hi,

I have objects which have not only plain attributes like
int type
int scope

etc.

but also keep their most interesting values in a parameter map, e.g.

Map<String, String> params;

These params are then something like PARAM_FIRSTNAME with the value Peter, for example.

Not every object has every param key.

I have about 20k - 100k of these objects. What is the best approach to add indexing and searching for these params?

Thanks

Niall

Jun 15, 2014, 5:10:34 PM
Hi Hans,

I would usually handle this by defining virtual attributes which read from the map and act as functions over its data, which you can then reference in your queries. You can then optionally also build indexes on those virtual attributes.

Something like this:
package com.googlecode.cqengine.testutil;

import com.googlecode.cqengine.*;
import com.googlecode.cqengine.attribute.*;
import com.googlecode.cqengine.resultset.ResultSet;
import java.util.Collections;
import java.util.Map;
import static com.googlecode.cqengine.query.QueryFactory.equal;

public class Person {

    static final Attribute<Person, String> FIRSTNAME = new SimpleNullableAttribute<Person, String>() {
        @Override
        public String getValue(Person person) {
            return person.properties.get("PARAM_FIRSTNAME");
        }
    };

    final Map<String, String> properties = Collections.emptyMap(); // Might contain PARAM_FIRSTNAME -> Peter

    public static void main(String[] args) {
        IndexedCollection<Person> people = CQEngine.newInstance();
        ResultSet<Person> peopleWithFirstNamePeter = people.retrieve(equal(FIRSTNAME, "Peter"));
    }
}

If the content of these maps might change a lot, the other option would be to put the map data in a separate indexed collection, and do a join at runtime.

HTH,
Niall

Hans

Jun 16, 2014, 4:19:55 AM
Thanks for your answer and suggestion.

However, this would be troublesome because the parameters can be chosen freely, and I cannot possibly know every possible parameter, nor do I want to create a function for each one.
This is user data and can be set in any way.

Isn't there any other way?

The SQL equivalent is a separate table of key/value pairs per person ID, where I could simply use

select personID from person_params where param_key = 'FirstName' and param_value = 'Steve';
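
For comparison, the key/value table behind that SQL can be sketched in plain Java as a nested map. This is only an illustration of the lookup pattern and not CQEngine API; ParamIndex and its methods are hypothetical names, and person IDs are assumed to be ints.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not CQEngine API): param key -> param value -> ids of matching persons.
class ParamIndex {
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();

    // Register every key/value pair of one person; keys the person lacks produce no entry.
    void add(int personId, Map<String, String> params) {
        for (Map.Entry<String, String> e : params.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(e.getValue(), v -> new TreeSet<>())
                 .add(personId);
        }
    }

    // Roughly: select personID from person_params where param_key = ? and param_value = ?
    Set<Integer> find(String key, String value) {
        return index.getOrDefault(key, Collections.emptyMap())
                    .getOrDefault(value, Collections.emptySet());
    }
}
```

New param keys need no code changes here, because the key is data rather than a predefined field.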

Niall

Jun 16, 2014, 5:43:41 AM
Hi Hans,

I see. You can do that with CQEngine the same way as you would do in SQL. You can look up or create attributes dynamically as follows:

package com.googlecode.cqengine.testutil;

import com.googlecode.cqengine.*;
import com.googlecode.cqengine.attribute.*;
import com.googlecode.cqengine.resultset.ResultSet;
import java.util.Collections;
import java.util.Map;
import static com.googlecode.cqengine.query.QueryFactory.equal;

public class Person {

    public static Attribute<Person, String> dynamicAttribute(final String param) {
        return new SimpleNullableAttribute<Person, String>() {
            @Override
            public String getValue(Person person) {
                return person.properties.get(param);
            }
        };
    }

    final Map<String, String> properties = Collections.emptyMap(); // Might contain PARAM_FIRSTNAME -> Peter

    public static void main(String[] args) {
        IndexedCollection<Person> people = CQEngine.newInstance();
        ResultSet<Person> peopleWithFirstNamePeter = people.retrieve(equal(dynamicAttribute("PARAM_FIRSTNAME"), "Peter"));
    }
}

HTH,
Niall

Hans

Jun 16, 2014, 6:11:28 AM
Thanks.

But why isn't this added to an index, like

cars.addIndex(HashIndex.onAttribute(Car.FEATURES));

in the example?

Does this mean this lookup is not indexed at all?

Niall Gallagher

Jun 16, 2014, 6:58:15 AM

If you know the full set of possible param names, then iterate them and add an index for each of the dynamic attributes.


Hans

Jun 16, 2014, 7:55:51 AM
Ah, thank you, so it's basically

addIndex(NavigableIndex.onAttribute(Person.PARAMETER("FIRST_NAME"))) ?

Thanks



Niall

Jun 16, 2014, 8:21:38 AM
Yes that will add an index on the dynamic attribute for FIRSTNAME.

You can do the same for other dynamic attributes too if you wish.

Hans

Jun 17, 2014, 6:40:51 AM
I think I found a bug, or I don't understand the indexer.

When I add 2 params to the indexer, the search no longer works.

Just using the given example Person.PARAMETER here:

entities.addIndex(NavigableIndex.onAttribute(Person.PARAMETER(Person.FIRST_NAME)));
entities.addIndex(NavigableIndex.onAttribute(Person.PARAMETER(Person.LAST_NAME)));

No search on these parameters works (however, searching on primitive int attributes such as person getAge() etc. works).

When I remove the second parameter index, I can search for

entities.retrieve(equal(Person.PARAMETER(Person.FIRST_NAME), "Hans"))

again and it returns the expected results. Otherwise, simply adding one more parameter index to the collection means it never returns anything.

Hans

Jun 17, 2014, 7:09:46 AM
After some further tests, I found that the search fails only partially: it still works for 1 indexed param.

If I index 2 params, a search on param 1 never returns any results, while a search on param 2 returns the expected results.

Niall Gallagher

Jun 17, 2014, 7:12:53 AM
Hi Hans,

It's probably something related to the equals() or hashCode() method in the attribute. It's possibly not considering the param name. You might need to implement equals & hashCode in the attribute -- see if that helps.
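
One way this could happen, assuming the engine keeps its indexes in a map keyed by attribute, is sketched below in plain Java. NamedAttribute and ParamAttribute are hypothetical stand-ins that only mimic the suspected behaviour; they are not CQEngine classes.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a base class whose equals()/hashCode() only look at the attribute name.
class NamedAttribute {
    final String name;

    NamedAttribute(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedAttribute && name.equals(((NamedAttribute) o).name);
    }

    @Override
    public int hashCode() { return name.hashCode(); }
}

// Dynamic attribute that adds a parameter but inherits equals()/hashCode() unchanged.
class ParamAttribute extends NamedAttribute {
    final String param;

    ParamAttribute(String param) {
        super("<unnamed>"); // every instance ends up with the same name
        this.param = param;
    }
}

class CollisionDemo {
    // Registers one "index" per param, then looks up the index for the first param.
    static String lookupFirst() {
        Map<NamedAttribute, String> indexes = new HashMap<>();
        indexes.put(new ParamAttribute("FIRST_NAME"), "index on FIRST_NAME");
        indexes.put(new ParamAttribute("LAST_NAME"), "index on LAST_NAME"); // keys are equal, so this replaces the entry
        return indexes.get(new ParamAttribute("FIRST_NAME"));
    }

    public static void main(String[] args) {
        System.out.println(lookupFirst());
    }
}
```

Under that assumption, the lookup for the first param lands on the index built for the second one, which would match the symptom of param 1 returning nothing while param 2 works.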



Hans

Jun 17, 2014, 8:23:31 AM
That's a nice pitfall. It would be good if you could supply a key or something like that in the constructor for the Index type.

This is my workaround:

import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

import com.googlecode.cqengine.attribute.SimpleNullableAttribute;

// Workaround for the default SimpleNullableAttribute, which collides if you add
// multiple indexes on String params; hence we need to override equals and hashCode.
public abstract class MySimpleNullableAttribute<O, A> extends SimpleNullableAttribute<O, A> {

    private final String paramName;

    public MySimpleNullableAttribute(String paramName) {
        this.paramName = paramName;
    }

    public String getParamName() {
        return paramName;
    }

    @Override
    public abstract A getValue(O object);

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 31) // two randomly chosen prime numbers
                .append(paramName)
                .toHashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this)
            return true;
        // Check against this class (not SimpleNullableAttribute) so the cast below cannot fail
        if (!(obj instanceof MySimpleNullableAttribute))
            return false;

        MySimpleNullableAttribute<?, ?> rhs = (MySimpleNullableAttribute<?, ?>) obj;
        return new EqualsBuilder()
                // if deriving: appendSuper(super.equals(obj)).
                .append(paramName, rhs.getParamName())
                .isEquals();
    }
}



Niall

Jun 17, 2014, 9:14:41 AM
Hi Hans,

Yes I agree it could be improved. The problem is in the definition of the attribute, not in the index. I'll try to make it easier to define parameterized dynamic attributes in future.

You should also account for super.equals()/hashCode() in your subclass. So my earlier suggestion would look like this:

package com.googlecode.cqengine.testutil;

import com.googlecode.cqengine.*;
import com.googlecode.cqengine.attribute.*;
import com.googlecode.cqengine.resultset.ResultSet;
import java.util.Collections;
import java.util.Map;
import static com.googlecode.cqengine.query.QueryFactory.equal;

public class Person {

    final Map<String, String> properties = Collections.emptyMap(); // Might contain PARAM_FIRSTNAME -> Peter

    // Attributes....

    public static Attribute<Person, String> dynamicAttribute(final String param) {
        return new ParameterizedAttribute(param);
    }

    public static class ParameterizedAttribute extends SimpleNullableAttribute<Person, String> {

        final String parameter;

        public ParameterizedAttribute(String attributeName, String parameter) {
            super(attributeName);
            this.parameter = parameter;
        }

        public ParameterizedAttribute(String parameter) {
            this.parameter = parameter;
        }

        @Override
        public String getValue(Person person) {
            return person.properties.get(parameter);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ParameterizedAttribute)) return false;
            if (!super.equals(o)) return false;
            ParameterizedAttribute that = (ParameterizedAttribute) o;
            return parameter.equals(that.parameter);
        }

        @Override
        public int hashCode() {
            int result = super.hashCode();
            result = 31 * result + parameter.hashCode();
            return result;
        }
    }

    // Usage...

    public static void main(String[] args) {
        IndexedCollection<Person> people = CQEngine.newInstance();
        ResultSet<Person> peopleWithFirstNamePeter = people.retrieve(equal(dynamicAttribute("PARAM_FIRSTNAME"), "Peter"));
    }
}

Thanks,
Niall

Hans

Jun 18, 2014, 6:40:35 AM
> You should also account for super.equals()/hashCode() in your subclass. So my earlier suggestion would look like this:

Can you elaborate on why this is needed? My implementation is from this popular Stack Overflow post: http://stackoverflow.com/questions/27581/overriding-equals-and-hashcode-in-java

I don't see a need to add super.

Niall Gallagher

Jun 18, 2014, 7:33:41 AM
Hi Hans,

It reduces the likelihood of two dynamic attributes evaluating equal when fields in the subclass are equal but fields in the superclass are not.

The superclass of most attribute implementations is AbstractAttribute, and this includes fields in its calculation of equals() and hashCode() which are not present in the subclass: the attribute type, object type and attribute name. So the subclass should basically add to the superclass's equals()/hashCode() implementation, instead of replacing it entirely.
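
To make the difference concrete, here is a small plain-Java sketch. BaseAttribute is a simplified, hypothetical stand-in for a superclass that compares attribute type and name (the object type is left out for brevity); none of these classes are CQEngine code.

```java
import java.util.Objects;

// Simplified stand-in for a superclass whose equality covers attribute type and name.
class BaseAttribute {
    final Class<?> attributeType;
    final String attributeName;

    BaseAttribute(Class<?> attributeType, String attributeName) {
        this.attributeType = attributeType;
        this.attributeName = attributeName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BaseAttribute)) return false;
        BaseAttribute that = (BaseAttribute) o;
        return attributeType.equals(that.attributeType) && attributeName.equals(that.attributeName);
    }

    @Override
    public int hashCode() { return Objects.hash(attributeType, attributeName); }
}

// Replaces the superclass equality: only the parameter is compared.
class ReplacingAttribute extends BaseAttribute {
    final String param;

    ReplacingAttribute(Class<?> attributeType, String param) {
        super(attributeType, "dynamic");
        this.param = param;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ReplacingAttribute && param.equals(((ReplacingAttribute) o).param);
    }

    @Override
    public int hashCode() { return param.hashCode(); }
}

// Adds to the superclass equality: superclass fields and the parameter must all match.
class ExtendingAttribute extends BaseAttribute {
    final String param;

    ExtendingAttribute(Class<?> attributeType, String param) {
        super(attributeType, "dynamic");
        this.param = param;
    }

    @Override
    public boolean equals(Object o) {
        return super.equals(o) && o instanceof ExtendingAttribute
                && param.equals(((ExtendingAttribute) o).param);
    }

    @Override
    public int hashCode() { return 31 * super.hashCode() + param.hashCode(); }
}
```

With the same param name but different attribute types (say String and Integer), two ReplacingAttribute instances compare equal, while two ExtendingAttribute instances do not.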

Niall





Hans

Jun 18, 2014, 8:07:27 AM
Hi,

Thank you for your explanation.

I have found another issue with the index.

As soon as I add a specific parameter to the index, all other values are looked up again through the object's getter, e.g.

  entities.addIndex(NavigableIndex.onAttribute(Person.TYPE)); // int
  entities.addIndex(NavigableIndex.onAttribute(Person.PARAMETER(Person.FIRST_NAME)));

  // search
  for (Person entity : entities.retrieve(and(equal(Person.TYPE, 1), equal(Person.PARAMETER(Person.FIRST_NAME), "Peter")))) {
      System.out.println(entity);
  }

In this case person.getType() is always looked up again for ALL person objects. I added an output line to TYPE:

  public static final Attribute<Person, Integer> TYPE = new SimpleAttribute<Person, Integer>("type") {
      public Integer getValue(Person entity) { System.out.println("getTYPE Called"); return entity.getType(); }
  };

As soon as I remove the index for the parameter, getType() is no longer called.

If I understand this correctly, when I add objects to the collection, the indexed attributes should be read ONCE for each object upon indexing and never again during search.

This only works for me without parameter indexes added (and funnily enough, even though primitive types are looked up again, the parameter in question (FirstName) is not).

Niall Gallagher

Jun 18, 2014, 8:24:24 AM

Hi Hans,

getType() belongs to the attribute, not to the objects in the collection. I expect that it might be called once or possibly a few times during query processing, as CQEngine does that to look up indexes.


Hans

Jun 18, 2014, 8:48:20 AM
But the attribute function will call entity.getType() each time, thus calling the object's getType().

This is done for every object in the index.

Why would it need to be called more than once if the index builds its tables based on that value?

Otherwise it would be the same as if no index existed: a slow lookup, because the search has to iterate over every object.

In my example above, getType() is called for every object even though the search should limit it to exactly 1 object.

Niall Gallagher

Jun 18, 2014, 9:13:38 AM
Attributes are used for multiple things.

The main purpose is that attributes encapsulate the logic to read field(s) from the type of object stored in the collection. Note "read field(s) from the type of object": a single attribute can read the value of a field from any instance of the type of object it is programmed to access. There is not a one-to-one relationship between an attribute and each object in the collection; one attribute can read from all objects in the collection. So when attribute.getObjectType() or attribute.getAttributeType() is called, it does not read this from objects in the collection; it is actually stored in the superclass, AbstractAttribute.

Attributes are also used internally in the query engine as keys in a map where the values of the map are the indexes on that attribute. For each attribute referenced in a query, attribute.equals() and attribute.hashCode() might therefore be called multiple times to locate the relevant indexes.

I hope I've clarified it. Basically you don't need to worry about these internal details. The query engine intentionally avoids scanning objects in the collection when indexes are available.
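
As a rough model of that last point, the plain-Java sketch below counts how often a getter-style function runs while a simple hash index is built and then queried. It is a simplified illustration and not how CQEngine is implemented; IndexScanDemo and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class IndexScanDemo {
    static final class Person {
        final int type;
        Person(int type) { this.type = type; }
    }

    // Returns {getter calls while building the index, getter calls during the lookup, number of matches}.
    static int[] countGetterCalls(int collectionSize) {
        AtomicInteger calls = new AtomicInteger();
        // Plays the role of an attribute: one function that can read the field from any Person.
        Function<Person, Integer> typeAttribute = p -> {
            calls.incrementAndGet();
            return p.type;
        };

        // Build: the attribute is applied once per object to decide where it goes in the index.
        Map<Integer, List<Person>> index = new HashMap<>();
        for (int i = 0; i < collectionSize; i++) {
            Person p = new Person(i % 10);
            index.computeIfAbsent(typeAttribute.apply(p), k -> new ArrayList<>()).add(p);
        }
        int duringBuild = calls.get();

        // Query: a hash lookup by value; no object is read through the attribute here.
        List<Person> matches = index.getOrDefault(1, Collections.emptyList());
        int duringLookup = calls.get() - duringBuild;

        return new int[] { duringBuild, duringLookup, matches.size() };
    }
}
```

In this model the getter runs once per object at indexing time and not at all during the indexed lookup.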

