Major Bug ? CQEngine returns ghosts results after removing modified object from index

1,308 views
Skip to first unread message

Hans

unread,
Jul 15, 2014, 4:42:22 AM7/15/14
to cqengine...@googlegroups.com
I have been debugging a weird issue were i set a boolean property to either true or false and the search afterwards would return incorrect results over time.
When i update an objects property i remove it from the index and readd it to the index afterward. The remove method from the cqengine set would always result in true, also the index size would
tell me the correct number of objects should be indexed. This resulted in the object being found with one equal(boolproperty = true) and equal(boolproperty=false) - the same objects lifes in 2 states.

However this is not the case, cqengine will still return the removed object on search queries!

The test program will create 10 entires with ID 0-9, then i add a test object with ID 10. All expected behaviour during search.
Then i modify the object BEFORE i remove it - now things get weird, also the index.remove() method proclaims TRUE e.g. it has removed the object and the index.size also tells me this
the search will still return the removed object.

This behavior stops if you outcomment the testCar.setTestVal(true); line

It seems when you remove the object from your internal list you honor the objects equal/hashcode method, but you do not honor it when removing it from your index sets for each attribute (as you explained me before you do a compare here because its faster) - however - since the object has been modified, you cannot find it anymore (just a wild guess)


/**
 * Copyright 2012 Niall Gallagher
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import com.googlecode.cqengine.CQEngine;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.index.navigable.NavigableIndex;
import com.googlecode.cqengine.index.radixreversed.ReversedRadixTreeIndex;
import com.googlecode.cqengine.index.suffix.SuffixTreeIndex;
import com.googlecode.cqengine.query.Query;
import com.googlecode.cqengine.query.option.DeduplicationStrategy;
import com.googlecode.cqengine.query.option.QueryOption;

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

import static com.googlecode.cqengine.query.QueryFactory.*;
/**
 * An introductory example which demonstrates usage using a Car analogy.
 *
 * @author Niall Gallagher
 */
public class CarBenchmark {
public static Set<CarBenchObj> createCollectionOfCars(int numCars) {
        Set<CarBenchObj> cars = new LinkedHashSet<CarBenchObj>(numCars);
        for (int carId = 0; carId < numCars; carId++) {
       
        String uniqueName = "car"+carId;
            switch (carId % 10) {
                case 0: cars.add( new CarBenchObj(carId, "Ford",   uniqueName,   CarBenchObj.Color.RED,   5, 5000.00) ); break;
                case 1: cars.add( new CarBenchObj(carId, "Ford",   uniqueName,  CarBenchObj.Color.RED,   4, 3999.99) ); break;
                case 2: cars.add( new CarBenchObj(carId, "Ford",   uniqueName,  CarBenchObj.Color.GREEN, 4, 6000.00) ); break;
                case 3: cars.add( new CarBenchObj(carId, "Honda",  uniqueName,   CarBenchObj.Color.WHITE, 5, 4000.00) ); break;
                case 4: cars.add( new CarBenchObj(carId, "Honda",  uniqueName,  CarBenchObj.Color.BLACK, 5, 3000.00) ); break;
                case 5: cars.add( new CarBenchObj(carId, "Honda",  uniqueName, CarBenchObj.Color.GREEN, 3, 5000.00) ); break;
                case 6: cars.add( new CarBenchObj(carId, "Toyota", uniqueName, CarBenchObj.Color.GREEN, 5, 5999.95) ); break;
                case 7: cars.add( new CarBenchObj(carId, "Toyota", uniqueName,   CarBenchObj.Color.BLUE,  3, 8500.00) ); break;
                case 8: cars.add( new CarBenchObj(carId, "Toyota", uniqueName,   CarBenchObj.Color.RED,   5, 7800.55) ); break;
                case 9: cars.add( new CarBenchObj(carId, "BMW",    uniqueName,      CarBenchObj.Color.BLUE,  2, 9000.23) ); break;
                default: throw new IllegalStateException();
            }
        }
        return cars;
    }

    public static void main(String[] args) {
        // Create an indexed collection (note: could alternatively use CQEngine.copyFrom() existing collection)...
        IndexedCollection<CarBenchObj> cars = CQEngine.newInstance();

        // Add some indexes...
        cars.addIndex(NavigableIndex.onAttribute(CarBenchObj.CAR_ID));
        cars.addIndex(NavigableIndex.onAttribute(CarBenchObj.MODEL));
        cars.addIndex(NavigableIndex.onAttribute(CarBenchObj.MANUFACTURER));
        cars.addIndex(NavigableIndex.onAttribute(CarBenchObj.TESTVAL));
        
        cars.addAll(createCollectionOfCars(10));
        Query<CarBenchObj> query = equal(CarBenchObj.TESTVAL, false);
        
        // Prints True
        System.out.println(cars.retrieve(query).size() == 10);
        
        CarBenchObj testCar = new CarBenchObj(10, "BMW",    "car11",      CarBenchObj.Color.BLUE,  2, 9000.23);
        cars.add(testCar);
        
        // Prints True
        System.out.println(cars.retrieve(query).size() == 11);
        System.out.println(cars.retrieve(equal(CarBenchObj.TESTVAL, true)).size() == 0);
        
        // modify object
        testCar.setTestVal(true);
        
        // Prints True, object removed
        System.out.println(cars.remove(testCar));
        
        // Still prints true, should be 10
        System.out.println(cars.retrieve(query).size() == 11);
        
        // Car Object ID #10 still exist
        for(CarBenchObj car:cars.retrieve(query)) {
            System.out.println(car.getCarId());
        }
        
        // Prints true, total is only 10
        System.out.println(cars.size() == 10);
        
        QueryOption<CarBenchObj> deduplication = deduplicate(DeduplicationStrategy.MATERIALIZE);
                
        // Still prints true, should be 10
        System.out.println(cars.retrieve(query,queryOptions(deduplication)).size() == 11);
       

    }
}


The car class

import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.attribute.SimpleAttribute;

/**
 * @author Niall Gallagher
 */
public class CarBenchObj {

    public static final Attribute<CarBenchObj, Integer> CAR_ID = new SimpleAttribute<CarBenchObj, Integer>("carId") {
        @Override
public Integer getValue(CarBenchObj car) { return car.carId; }
    };

    public static final Attribute<CarBenchObj, String> MANUFACTURER = new SimpleAttribute<CarBenchObj, String>("manufacturer") {
        @Override
public String getValue(CarBenchObj car) { return car.manufacturer; }
    };

    public static final Attribute<CarBenchObj, String> MODEL = new SimpleAttribute<CarBenchObj, String>("model") {
        @Override
public String getValue(CarBenchObj car) { return car.model; }
    };

    public static final Attribute<CarBenchObj, Color> COLOR = new SimpleAttribute<CarBenchObj, Color>("color") {
        @Override
public Color getValue(CarBenchObj car) { return car.color; }
    };

    public static final Attribute<CarBenchObj, Integer> DOORS = new SimpleAttribute<CarBenchObj, Integer>("doors") {
        @Override
public Integer getValue(CarBenchObj car) { return car.doors; }
    };

    public static final Attribute<CarBenchObj, Double> PRICE = new SimpleAttribute<CarBenchObj, Double>("price") {
        @Override
public Double getValue(CarBenchObj car) { return car.price; }
    };
    
    public static final Attribute<CarBenchObj, Boolean> TESTVAL = new SimpleAttribute<CarBenchObj, Boolean>("testVal") {
        @Override
        public Boolean getValue(CarBenchObj car) { return car.testVal; }
    };

    public enum Color {RED, GREEN, BLUE, BLACK, WHITE}
    private final int carId;
    private final String manufacturer;
    private final String model;
    private final Color color;
    private final int doors;
    private final double price;
    private boolean testVal = false;

    public CarBenchObj(int carId, String manufacturer, String model, Color color, int doors, double price) {
        this.carId = carId;
        this.manufacturer = manufacturer;
        this.model = model;
        this.color = color;
        this.doors = doors;
        this.price = price;
    }

    public int getCarId() {
        return carId;
    }

    public String getManufacturer() {
        return manufacturer;
    }

    public String getModel() {
        return model;
    }

    public Color getColor() {
        return color;
    }

    public int getDoors() {
        return doors;
    }

    public double getPrice() {
        return price;
    }

    @Override
    public String toString() {
        return "Car{" +
                "carId=" + carId +
                ", manufacturer='" + manufacturer + '\'' +
                ", model='" + model + '\'' +
                ", color=" + color +
                ", doors=" + doors +
                ", price=" + price +
                '}';
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        CarBenchObj car = (CarBenchObj) o;

        if (carId != car.carId) return false;

        return true;
    }

    @Override
    public int hashCode() {
        return carId;
    }

    public boolean isTestVal() {
        return testVal;
    }

    public void setTestVal(boolean testVal) {
        this.testVal = testVal;
    }
    
    
}




Niall Gallagher

unread,
Jul 15, 2014, 7:30:36 AM7/15/14
to cqengine...@googlegroups.com
You should not modify any fields in objects which are used in the equals(), hashCode() or compareTo() methods, while they are stored in the IndexedCollection.

The same rule applies to other Java collections. For example if you define your own object and use it as a key in a java.util.HashMap, and you modify one of its fields after adding to the HashMap, then you will end up changing its hashCode, leaving it stored in an incorrect bucket.

http://stackoverflow.com/questions/7842049/are-mutable-hashmap-keys-a-dangerous-practice



--
-- You received this message because you are subscribed to the "cqengine-discuss" group.
http://groups.google.com/group/cqengine-discuss
---
You received this message because you are subscribed to the Google Groups "cqengine-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cqengine-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hans

unread,
Jul 15, 2014, 7:41:53 AM7/15/14
to cqengine...@googlegroups.com
No.

The equals and hashcode do not reference that field

 @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        CarBenchObj car = (CarBenchObj) o;

        if (carId != car.carId) return false;

        return true;
    }

    @Override
    public int hashCode() {
        return carId;
    }

I only modified the testvalue field which is not used at all here.

Even if i did - cqengine has obviously a bug that its still retaining a ghost entry even tho it reports it has successfully removed it

Niall Gallagher

unread,
Jul 15, 2014, 7:54:20 AM7/15/14
to cqengine...@googlegroups.com
Good point. Okay I should clarify!:

You should not modify any fields in objects which are indexed, or used in the equals(), hashCode() or compareTo() methods, while they are stored in the IndexedCollection.

It's the same reason - HashIndex is backed by a HashMap, so we shouldn't change hash codes of keys stored in the index (attribute values in this case) after they have been indexed.

If you really need to modify one of those fields, you need to remove the object from the collection, modify the field, then re-add the object to the collection.

I know this is not ideal. If you want other threads to always see some version of your object in the collection (you cannot tolerate it disappearing for a couple of milliseconds), you can use the locking approach discussed in TransactionIsolation.
I have designed a better solution which provides lock-free reads and MVCC transactions (with support to replace objects atomically) in the pipeline, but it is not yet finished.

Hans

unread,
Jul 15, 2014, 8:07:49 AM7/15/14
to cqengine...@googlegroups.com
I hope i did not spent the last 2 weeks implementing cqengine for my project and it is now useless to me.

I only receive the info when i need to update the index, i have no control who changed what fields on my objects.

When i receive the object, it is already changed - i overwrote the hashcode/equals method so that any java implementation should find the very same instance again
by comparing an ID field.

My objects have a unique ID each - so i simply want to remove these objects from your index by the unique ID - that must be possible.
No matter what fields i changed!

How can i implement a solution for this ?




Message has been deleted

Hans

unread,
Jul 15, 2014, 8:35:08 AM7/15/14
to cqengine...@googlegroups.com
I have a hack now, if you dont have a better suggestion:

In AbstractMapBasedAttributeIndex i no longer look for the cached attribute, but i simply delete by the object itself which implements the correct hashcode and equals method.

 @Override
    public void notifyObjectsRemoved(Collection<O> objects) {
        ConcurrentMap<A, StoredResultSet<O>> indexMap = this.indexMap;
        for (O object : objects) {
            
            Iterable<A> attributeValues = getAttribute().getValues(object);
            for (A attributeValue : attributeValues) {
                Iterator<Entry<A, StoredResultSet<O>>> it = indexMap.entrySet().iterator();
                
                while(it.hasNext()) {
                    Entry<A, StoredResultSet<O>> entry = it.next();
                    
                    StoredResultSet<O> valueSet = entry.getValue();
                    if(valueSet.contains(object)) {
                        System.out.println("Found object");
                        valueSet.remove(object);
                        if (valueSet.isEmpty()) {
                            indexMap.remove(attributeValue);
                        }
                    }
                }
            }
            
           /* Iterable<A> attributeValues = getAttribute().getValues(object);
            for (A attributeValue : attributeValues) {

                // Replace attributeValue with quantized value if applicable...
                attributeValue = getQuantizedValue(attributeValue);

                StoredResultSet<O> valueSet = indexMap.get(attributeValue);
                if (valueSet == null) {
                    continue;
                }
                valueSet.remove(object);
                if (valueSet.isEmpty()) {
                    indexMap.remove(attributeValue);
                }
            }*/
        }
    }


Niall Gallagher

unread,
Jul 15, 2014, 10:21:27 AM7/15/14
to cqengine...@googlegroups.com
Hi Hans,

Let's say you give an object retrieved from the IndexedCollection to "untrusted code" which might modify its fields without allowing us to update indexes. There might be a problem with that approach to update indexes after the untrusted code has modified the fields. Because how can CQEngine locate the object in the index to remove it, without knowing the previous value of the indexed attribute?

The safest approach is to make all fields in the object stored in the IndexedCollection immutable, and to give untrusted code a mutable copy of the object (if it absolutely must be able to modify fields directly). Then when you get the mutable copy back, check if its fields have been modified. If so, remove the old immutable object from IndexedCollection and insert a new one containing the modifications.

Long story short - many collections in general do not work well if fields in objects will be modified after those objects have been added to the collection. The only fields which are safe to modify are those which are excluded from the calculation of hashCode(), equals(), compareTo(), and in CQEngine, indexes.

Note: there is actually a way to handle your use case more gracefully: don't give untrusted code your actual mutable object - give it a proxy instead. The proxy can record invocations on setter methods and simply record them in a list of modifications to be applied later. Then when you get the object back, you can safely apply all modifications all at once. There's a few nuances but this is probably the route I would take.

HTH,
Niall



Niall Gallagher

unread,
Jul 15, 2014, 10:45:11 AM7/15/14
to cqengine...@googlegroups.com
Or perhaps simpler than a proxy: for each supposedly immutable field in your object, add a second counterpart field of the same type which we designate as mutable. The public setters and getters operate on the mutable variants. CQEngine can index the original supposedly immutable fields. When you get the object back from untrusted code, if any mutable variants have changed, remove the object from the collection, copy mutable values into the 'immutable' fields, re-add the object.

Hans

unread,
Jul 15, 2014, 10:55:03 AM7/15/14
to cqengine...@googlegroups.com
>>Because how can CQEngine locate the object in the index to remove it, without knowing the previous value of the indexed attribute?

It does not need to, since it caches the objects it can simply execute the equal or hashmethod - in my case these implement values from fields that never change (a primary ID for each object).

As i posted above - cqengine is very well able to cope with modification of objects if you simply change the notifyObjectsRemoved
method to not look for the old indexed properties but for the object itself! Why isnt it implemented that way?

Furthermore, cqengine is able to remove the object from its own collection

 public boolean remove(Object object) {
        @SuppressWarnings({"unchecked"})
        O o = (O) object;
        boolean modified = collection.remove(o);
        if  (modified) {
            indexEngine.notifyObjectsRemoved(Collections.singleton(o));
        }
        return modified;
    }

So if you can remove the object from the collection (modified is true) but you cant/want for some design reason to remove the index - at the very least you can also not remove the object or throw and exception or return false. Because thats whats happening, you never really remove the object from the cache.

With the small change i provided above cqengine can cope with the modification of any field that is not used in the hashcode or equals calucation - as it should be per contract with the collection remove() logic

Niall Gallagher

unread,
Jul 15, 2014, 12:50:23 PM7/15/14
to cqengine...@googlegroups.com
Hi Hans,

Well yes that approach will work - as long as you have not modified any fields used by the equals() method.

It's not implemented that way at the moment because that's an O(n) operation, to iterate all objects in the index to find the object you want to remove. But there is nothing wrong with that if it works for you.

The current approach is O(1) -- but of course it requires that the field is not modified before it tries to remove it!

As I mentioned I do want to improve this kind of thing in future.

Thanks,
Niall




--
Message has been deleted

Hans

unread,
Jul 16, 2014, 5:13:20 AM7/16/14
to cqengine...@googlegroups.com
Can you please review my improvement so that it does not take O(N) ?

On notifyObjectsAdded i add a reverse lookup for the object themself.

In notifyObjectsRemoved i no longer query the index map for the result but the reverse lookup.

Do you think this would work fine ?

Also, since i did not want to modify cqengine i added this as an extends class - however im unable to overwrite

 the NavigableIndex.withQuantizerOnAttribute because it uses a protected field filterFirstResultSet - can you remove that restriction in an upcoming release? Otherwise i have to copy&paste the whole class

import java.util.Collection;
import java.util.Map.Entry;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentNavigableMap;

import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.attribute.SimpleAttribute;
import com.googlecode.cqengine.index.common.Factory;
import com.googlecode.cqengine.index.navigable.NavigableIndex;
import com.googlecode.cqengine.resultset.stored.StoredResultSet;


public class MyNavigableIndex<A extends Comparable<A>, O> extends NavigableIndex<A, O> {

    
    protected MyNavigableIndex(Factory<ConcurrentNavigableMap<A, StoredResultSet<O>>> indexMapFactory, Factory<StoredResultSet<O>> valueSetFactory,
            Attribute<O, A> attribute) {
        super(indexMapFactory, valueSetFactory, attribute);
    }

    // ---------- Static factory methods to create NavigableIndexes ----------

    /**
     * Creates a new {@code NavigableIndex} on the given attribute. The attribute can be a {@link SimpleAttribute} or a
     * {@link com.googlecode.cqengine.attribute.MultiValueAttribute}, as long as the type of the attribute referenced
     * implements {@link Comparable}.
     * <p/>
     * @param attribute The attribute on which the index will be built, a {@link SimpleAttribute} or a
     * {@link com.googlecode.cqengine.attribute.MultiValueAttribute} where the type of the attribute referenced
     * implements {@link Comparable}
     * @param <A> The type of the attribute
     * @param <O> The type of the object containing the attribute
     * @return A new HashIndex which will build an index on this attribute
     */
    public static <A extends Comparable<A>, O> NavigableIndex<A, O> onAttribute(Attribute<O, A> attribute) {
        return onAttribute(new DefaultIndexMapFactory<A, O>(), new DefaultValueSetFactory<O>(), attribute);
    }
    
    public static <A extends Comparable<A>, O> NavigableIndex<A, O> onAttribute(Factory<ConcurrentNavigableMap<A, StoredResultSet<O>>> indexMapFactory, Factory<StoredResultSet<O>> valueSetFactory, Attribute<O, A> attribute) {
        return new MyNavigableIndex<A, O>(indexMapFactory, valueSetFactory, attribute);
    }
    
    
    private ConcurrentHashMap<O, ObjectIndex> objectIndexMap = new ConcurrentHashMap<O, ObjectIndex>();
    
    class ObjectIndex {
        private ConcurrentHashMap<A, StoredResultSet<O>> valueSets = new ConcurrentHashMap<A, StoredResultSet<O>>();
        
        public ConcurrentHashMap<A, StoredResultSet<O>> getValueSets() {
            return valueSets;
        }
    }
    
    /**
     * {@inheritDoc}
     */
    @Override
    public void notifyObjectsAdded(Collection<O> objects) {
        ConcurrentMap<A, StoredResultSet<O>> indexMap = this.indexMap;
        for (O object : objects) {
            Iterable<A> attributeValues = getAttribute().getValues(object);
            for (A attributeValue : attributeValues) {

                // Replace attributeValue with quantized value if applicable...
                attributeValue = getQuantizedValue(attributeValue);

                // Look up StoredResultSet for the value...
                StoredResultSet<O> valueSet = indexMap.get(attributeValue);
                if (valueSet == null) {
                    // No StoredResultSet, create and add one...
                    valueSet = valueSetFactory.create();
                    StoredResultSet<O> existingValueSet = indexMap.putIfAbsent(attributeValue, valueSet);
                    if (existingValueSet != null) {
                        // Another thread won race to add new value set, use that one...
                        valueSet = existingValueSet;
                    }
                }
                // Add the object to the StoredResultSet for this value...
                valueSet.add(object);
                
             // Add reverse Lookup
                ObjectIndex oi = objectIndexMap.get(object);
                
                if(oi == null) {
                    oi = new ObjectIndex();
                    objectIndexMap.put(object, oi);
                }

                
                oi.getValueSets().put(attributeValue, valueSet);
            }
        }
    }
  
    /**
     * {@inheritDoc}
     */
    public void notifyObjectsRemoved(Collection<O> objects) {
        ConcurrentMap<A, StoredResultSet<O>> indexMap = this.indexMap;
        for (O object : objects) {
            
            ObjectIndex oi = objectIndexMap.get(object);
            
             if(oi != null) {
                Set<Entry<A, StoredResultSet<O>>> valueSets = oi.getValueSets().entrySet();
        
                for (Entry<A, StoredResultSet<O>> entry : valueSets) {
                    StoredResultSet<O> valueSet = entry.getValue();
        
                    valueSet.remove(object);
        
                    if (valueSet.isEmpty()) {
                        indexMap.remove(entry.getKey());
                    }
                }

                oi.getValueSets().clear();
                objectIndexMap.remove(object);
            }
                      
        }
    }

}


 

Hans

unread,
Jul 18, 2014, 5:50:22 AM7/18/14
to cqengine...@googlegroups.com
Can i get some feedback, please?

Niall

unread,
Jul 18, 2014, 7:08:27 AM7/18/14
to cqengine...@googlegroups.com
Hi Hans,

Sorry I have been busy.

So in the index, you are storing an additional map, from object O -> ObjectIndex? And ObjectIndex stores a map of attribute values to StoredResultSets. I don't think you need to store any results in your reverse lookup map.

Example:
Let's say you have a Car object which has a features field which is a List<String>.
You also have a FEATURES attribute (MultiValueAttribute) which reads the Car.features field.

Let's say a particular car has features ["radio", "sunroof"].

You create an IndexedCollection and you add a NavigableIndex to it for the FEATURES attribute.
You then add this particular car to the collection.

CQEngine notifies the NavigableIndex that the new Car has been added.
The NavigableIndex has been configured to index the FEATURES attribute, so it asks that attribute to give it a List<String> of the values of that attribute for this particular car. The attribute returns ["radio", "sunroof"].

The NavigableIndex then iterates those attribute values, and adds the object to two StoredResultSets in the index, one for key "radio" and one for key "sunroof".
This is how CQEngine builds indexes on objects.

Later, if that particular car is removed from the collection, the NavigableIndex again gets notified. It asks the attribute for the List<String> of the values for this particular car, it gets ["radio", "sunroof"], and it removes the car from those StoredResultSets.
This is how CQEngine removes objects from indexes.

BUT let's say outside of CQEngine's control, some code had modified the list of features in the car before the object was removed. Let's say the feature "radio" was removed. Then when NavigableIndex tries to remove the object from the index, it will remove it from only one of the StoredResultSets - not from both.

So my understanding of how you were planning to work around this, was that you would create a backup of the attribute values at the point when the object is first added to the index, and store those in a reverse lookup map.
Then when trying to remove the object, you would not ask the attribute to provide the values again, you would just look them up in your reverse lookup map.
So I thought that your reverse lookup map would simply be:
Map<O, List<A>> // map of Object, to list of the values it has for the attribute.

This approach will work. Think about multi-threaded requests however - you might need to add synchronization (on the write paths only) because you update two separate data structures.

In general I think it's easier to keep your fields immutable so as not to run into these problems! But there will always be exceptions to the rule. I hope I followed correctly what you were planning. Hope this (somehow) helps. 

Niall

Hans

unread,
Jul 18, 2014, 7:38:17 AM7/18/14
to cqengine...@googlegroups.com

So my understanding of how you were planning to work around this, was that you would create a backup of the attribute values at the point when the object is first added to the index, and store those in a reverse lookup map.
Then when trying to remove the object, you would not ask the attribute to provide the values again, you would just look them up in your reverse lookup map.
So I thought that your reverse lookup map would simply be:
Map<O, List<A>> // map of Object, to list of the values it has for the attribute.

Not really, i dont really understand the attribute part.

I saw in your code at

notifyObjectsRemoved

  valueSet.remove(object);
                 if (valueSet.isEmpty()) {
                     indexMap.remove(attributeValue);
                 }

So i thought it is important to also remove the attributeValue from the index.

The only thing i did really was store an Object -> StoredResultSet reference (and since 1 object can have multiple storedresults its encapsulated in another list).
On delete i simply lookup the Object Map and remove all entries from the StoredResultSet.

Where i wasnt sure was the above part with the attributes - since the fields on the Object may have changed i wouldnt know how to remove them from the indexMap. 

Hence why i stored an object together with its ConcurrentHashMap<A, StoredResultSet<O>>  relation, to be able to call indexMap.remove on all attributes when the result set is empty.

As i said im not sure on the attribute part, i thought simply removing the object reference from the StoredResultSet might not have been enough.

Since my ConcurrentHashMap<O, ObjectIndex> objectIndexMap honors the equals/hashcode impl of my Object it can always find the object no matter what fields have been modified (because its calculated on one field that never changes, a primary ID)

Niall Gallagher

unread,
Jul 18, 2014, 7:49:27 AM7/18/14
to cqengine...@googlegroups.com

Ok I think I understand. That would work, but I guess might leave some empty StoredResultSets in the index.

Currently it also removes empty StoredResultSets, so you might want to do that too.

Sent from my Android

--

Péter Sinóros-Szabó

unread,
Mar 19, 2015, 8:18:42 AM3/19/15
to cqengine...@googlegroups.com
Hi,

is this better solution for the lock-free reads and mvcc implemented already? If yes, which class methods should I use for it?

Thanks,
Peter

Niall Gallagher

unread,
Mar 19, 2015, 8:56:11 AM3/19/15
to cqengine...@googlegroups.com
Hi Péter,

MVCC is implemented in trunk, but it has not been released just yet, and I have not updated the site with usage instructions yet. You can use it now if you want though: the MVCC support has now been tested etc. If you check out CQEngine from source and run mvn install, it will install a snapshot containing MVCC support in your local maven repo. Then see the JavaDocs on class TransactionalIndexedCollection for usage instructions, and see its unit tests for examples - as I have not had time yet to update the site with usage instructions.

The reason it has not been released is that I am finalizing some other unrelated features which I want to include in the next release as well.

The same restrictions about modifying fields while objects are stored in the collection, will apply with MVCC: in other words - don't modify fields! The difference is MVCC will allow you to replace objects in the collection atomically. So you can swap one version of an object for a modified version easily.

HTH,
Niall



Hans

unread,
Mar 31, 2015, 9:51:32 AM3/31/15
to cqengine...@googlegroups.com
Niall,

i found an issue with either my workaround for updating objects in the entry or the cqengine search code.

When i update an object in the index as i described above in the first posts - and search the object via an indexed attributed again - sometimes the object can NOT be found.

This is randomly - i made a small test class where i search and upate the object 100000 times - and in less than 1% of the cases - cqgengine does not find my object

To iterate how i update an object:

1. i remove my object via a primary id (long) from IndexedCollection entityIndex
2. Afterwards i add my object via  entityIndex.add(myObj) again

This is all in the same thread - no multi write/read

It looks like even tho i add an object to the index - it can not be immediately found - is there a way to force CQEngine to wait till the object has been added to the index ?

Niall

unread,
Mar 31, 2015, 10:38:37 AM3/31/15
to cqengine...@googlegroups.com
Hi Hans,

There is no asynchronous processing. As soon as the add() method returns, your object should have been added to all indexes. Therefore if you then search for the object, you should find it.

I still think it's best to avoid these problems which are associated with mutable fields entirely. It's really not recommended with any collection (whether CQEngine, or JDK collections) to modify the fields of objects, while those objects are stored in the collection. Yes you may be able to add workarounds to the collection to support it in some situations, but it's unconventional and more error prone to take this approach.

Have you considered the alternative approach I suggested above, where you give your "untrusted code" a wrapper or proxy around your objects, which intercept and record the modifications instead? As far as I can see it should not be a big deal to do it that way. When you get your objects back from the untrusted code, you could then have a nice list of which fields it changed which would eliminate a lot of the guesswork. You could then update the collection in a more orderly way, without risk of inadvertently changing hashCodes etc.

I'm also removing "Major Bug" from the subject line! This is not a bug in CQEngine, it's a topic which applies to all collections.

Hope that helps, somehow!
Niall

Hans

unread,
Mar 31, 2015, 11:11:51 AM3/31/15
to cqengine...@googlegroups.com
Hi Niall,

again what i want to do is simple - instead of worrying about which field of an object is updated i like to discard the whole object and readd it to the index thus indexing all fields anew.

Is this somehow possible with your latest changes with allowing IndexedCollecion.update() ? 

I have a simply primary key long and an object to it - whenever the object is changed i simply remove the entry associated with that long ID and add it again

Niall

unread,
Mar 31, 2015, 6:32:25 PM3/31/15
to cqengine...@googlegroups.com
Hi Hans,

The new update() method will allow you to replace objects atomically. But technically you should not need that feature, and the current release 1.3.2 should be sufficient, if I understand you correctly.

You mentioned - "whenever the object is changed i simply remove the entry associated with that long ID and add it again"

That is not safe or recommended!
It is risky to allow an object to be changed before you remove it. This might be the cause of the problem you are seeing:
  • If the field which was changed was used in your object's implementation of hashCode() or equals() then it might be impossible to remove that object from any HashMaps or other collections (because it would be lost, in a bucket corresponding to its previous hashCode).
  • Or, in the case of CQEngine, if you had previously added an index on an attribute which read from the field which was changed, then likewise CQEngine might be unable to find the object again to remove it, because it would not know what the previous value of the field was so that it could locate it in the index. So this can leave "ghost" objects in Java collections or indexes, which can only be found by iterating the entire collection.
The safest option is to prevent your application from modifying objects while they are stored in the collection: make all fields final.
Effective Java, Item 13 - "Favor Immutability"! It just avoids all of those problems.

Or, if you can't do that (I'm copying the following from my earlier mail):
  1. Investigate returning a clone or a proxy wrapper around the objects to your application - "the proxy can record invocations on setter methods and simply record them in a list of modifications to be applied later. Then when you get the object back, you can safely apply all modifications all at once"
  1. "Or perhaps simpler than a proxy: for each supposedly immutable field in your object, add a second counterpart field of the same type which we designate as mutable. The public setters and getters operate on the mutable variants. CQEngine can index the original supposedly immutable fields. When you get the object back from untrusted code, if any mutable variants have changed, remove the object from the collection, copy mutable values into the 'immutable' fields, re-add the object."
    HTH,
    Niall

    Vladimir Legkunets

    unread,
    Nov 22, 2016, 6:38:41 AM11/22/16
    to cqengine-discuss
    Thanks, Niall,

    I faced this behavior too and you save some of my time.
    Thank you for cqengine, that's great library!

    Best regards, Vladimir

    Vladimir Legkunets

    unread,
    Nov 22, 2016, 11:31:07 AM11/22/16
    to cqengine-discuss
    Hi, Niall, I have a question.

    You proposed to modify object after remove, but what is more straight forward way to have this remove->applyChanges->add chain atomic?
    I understand your proposal about immutable objects but would like to avoid object cloning.

    Thanks in advance, Vlad.

    Niall Gallagher

    unread,
    Nov 22, 2016, 3:47:37 PM11/22/16
    to cqengine...@googlegroups.com
    No problem Vladimir,

    For reference, the TransactionalIndexedCollection exists now of course, and it can be used to replace objects atomically.

    Best regards,
    Niall

    Niall

    unread,
    Nov 22, 2016, 4:41:53 PM11/22/16
    to cqengine-discuss
    Sorry - I forgot to answer that question..

    Basically the choices are:
    • Use the TransactionalIndexedCollection which provides transaction isolation, atomicity and lock-free reads via MVCC (but which may involve cloning objects if you only want to change one field), or
    • Use the ConcurrentIndexedCollection and use the approach discussed in Transaction Isolation without MVCC to achieve transaction isolation via locking (which can block reads), or
    • Use the ConcurrentIndexedCollection and the remove-modify-add approach to modify an existing field in an object, with the caveat that reading threads might find that the object has disappeared from the collection briefly while it was being modified
    Hope that helps!
    Niall

    Vladimir Legkunets

    unread,
    Nov 23, 2016, 4:55:17 AM11/23/16
    to cqengine-discuss
    Yes, thanks, option 2 is working for me I believe.

    Vladimir Legkunets

    unread,
    Nov 23, 2016, 8:16:14 AM11/23/16
    to cqengine-discuss
    Hi Niall,

    what do you think about idea to add lambda updates? To have method update(objects, change(s)) I beleive it could be useful, especially  for transactional collection.

    If it looks interesting from your point of view I can create feature request on GitHub.

    Vlad


    On Tuesday, November 22, 2016 at 4:41:53 PM UTC-5, Niall wrote:

    Niall Gallagher

    unread,
    Nov 23, 2016, 12:57:38 PM11/23/16
    to cqengine...@googlegroups.com
    A challenge with lambda updates would be the lack of information about which field (attribute) was being modified by the lambda. So no easy way to update specific indexes.
    But on that topic, I have considered adding support for attribute updates though. The idea would be to have an API like: collection.update(objects, attributes, newValues). This would avoid the need to clone objects. 

    However, creating collections of attributes and newValues could be more expensive on GC than supplying a single collection of cloned objectsToAdd. It would also not be compatible with MVCC because MVCC actually needs two versions of the object to exist at the same time. So there's a lot of tradeoffs involved.

    --
    -- You received this message because you are subscribed to the "cqengine-discuss" group.
    http://groups.google.com/group/cqengine-discuss
    ---
    You received this message because you are subscribed to the Google Groups "cqengine-discuss" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to cqengine-discuss+unsubscribe@googlegroups.com.

    Vladimir Legkunets

    unread,
    Dec 2, 2016, 9:07:11 AM12/2/16
    to cqengine-discuss
    Hi Niall,

    but why can't we do remove -> lambda update -> add under single lock in TransactionalIndexCollection or even in ObjectLockingIndexCollection to have this update atomic?

    By the way, what stops us from having update atomic in ObjectLocking? (now it has separate locks in add and remove as far as I understood)

    Best regards, Vladimir
    To unsubscribe from this group and stop receiving emails from it, send an email to cqengine-discu...@googlegroups.com.

    Niall

    unread,
    Dec 19, 2016, 9:54:32 PM12/19/16
    to cqengine-discuss
    Hi Vladimir,

    Sorry for the delay replying.

    Behind the scenes, the TransactionalIndexedCollection actually does need the original object and the replacement object (i.e. both versions of the same object) to be separate objects, because it will show the different versions of the object to different threads which are accessing the collection (the MVCC algorithm requires one object per version). So there's no easy way to get around this requirement. Maybe in future CQEngine could automatically clone the original object, and then apply the lambda to the clone though. But it might need to make assumptions when doing this, and it would need to add a hidden version field for sure, which might require bytecode generation etc. so it would require more investigation.

    The ObjectLockingIndexedCollection doesn't actually hold read locks - the locks only exist to prevent multiple writing threads from adding/removing the same object at the same time.

    If you'd like to experiment with this stuff yourself, you could make an implementation of IndexedCollection yourself which wraps another indexed collection and adds such locking.

    I don't have a need for more sophisticated locking than exists at the moment myself, but the project is open to pull requests! 
    So if you'd like to contribute something in this area I'd support it and try to help as much as I can!

    Best regards,
    Niall
    Reply all
    Reply to author
    Forward
    0 new messages