Partial serialization on read of map with TreeSet as value. Is it possible?


Sabuj Das

Aug 20, 2020, 11:11:42 AM
to Hazelcast
Hi,
I am using Hazelcast 4.0.2 with Oracle-JDK-1.8.

My use-case is:
  • I put a TreeSet of Data, ordered by a score field, in an IMap.
  • The set can grow up to a million records.
  • On every update of the data and score, I add to or remove from the TreeSet.
  • What I want to achieve is a very low-latency read of the Top-X items from the TreeSet.

My data class:
@Getter
public class Data implements IdentifiedDataSerializable {
    private Long identifier;
    private Double score;

    public Data(Long identifier, Double score) {
        this.identifier = identifier;
        this.score = score;
    }
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Data)) {
            return false;
        }
        Data that = (Data) o;
        return identifier.equals(that.identifier);
    }
    @Override
    public int hashCode() {
        return Objects.hash(identifier);
    }
    @Override
    @JsonIgnore
    public int getFactoryId() {
        return SerializerConstants.SERIALIZER_FACTORY_ID_SCORED_LEAD;
    }
    @Override
    @JsonIgnore
    public int getClassId() {
        return SerializerConstants.SERIALIZER_CLASS_ID_SCORED_LEAD;
    }
    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeLong(identifier);
        out.writeDouble(score);
    }
    @Override
    public void readData(ObjectDataInput in) throws IOException {
        identifier = in.readLong();
        score = in.readDouble();
    }
}

Also, a comparator is implemented on the score.
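
The comparator itself is not shown in this post. A minimal sketch of one possibility, assuming descending score order with a tie-break on the identifier; `ScoreOrdering`, `ScoredItem` and `BY_SCORE_DESC` are hypothetical names standing in for the real classes. The tie-break matters because a TreeSet treats two elements that compare as 0 as the same element, so a score-only comparator would keep only one of two items that share a score.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class ScoreOrdering {
    // Hypothetical stand-in for the Data class, reduced to its two fields.
    static final class ScoredItem {
        final long identifier;
        final double score;

        ScoredItem(long identifier, double score) {
            this.identifier = identifier;
            this.score = score;
        }
    }

    // Highest score first; ties are broken by identifier so that two distinct
    // items with the same score never compare as equal.
    static final Comparator<ScoredItem> BY_SCORE_DESC =
            Comparator.comparingDouble((ScoredItem d) -> d.score)
                    .reversed()
                    .thenComparingLong(d -> d.identifier);

    public static void main(String[] args) {
        TreeSet<ScoredItem> set = new TreeSet<>(BY_SCORE_DESC);
        set.add(new ScoredItem(1L, 0.9));
        set.add(new ScoredItem(2L, 0.9)); // same score, different identifier
        set.add(new ScoredItem(3L, 0.5));
        System.out.println("size = " + set.size());
        System.out.println("first = " + set.first().identifier);
    }
}
```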

How I read data:
final IMap<String, TreeSet<Data>> imap = getImap("CACHE_NAME");
final TreeSet<Data> dataSet = imap.get(groupName);
if (CollectionUtils.isEmpty(dataSet)) {
    return null;
}
Set<Data> selected = dataSet.stream()
        .filter(val -> !excluding.contains(val.getIdentifier()))
        .limit(limit)
        .collect(Collectors.toSet());
...

To read the top 100 items from the TreeSet:
If the TreeSet has 50,000 or fewer items, then the time is 25 milliseconds.
If the TreeSet has 100,000 or more items, then it takes more than 200 milliseconds.

The issue I see here is that on every call to final TreeSet<Data> dataSet = imap.get(groupName); Hazelcast deserializes the complete collection and applies the comparator again.

Is there any way to limit how much of a collection is deserialized on read?
And is there a way to avoid applying the Comparator again when a TreeSet is read?

Thanks in advance...

Joe Sherwin

Aug 20, 2020, 11:37:48 AM
to haze...@googlegroups.com
I would consider redesigning how you store this data instead of keeping it in a TreeSet. Use an object key with groupName and identifier as attributes. Then you can simply use a Hazelcast query such as score >= $1 AND groupName = $2. The new data structure will also keep your cache from becoming unbalanced, which happens when very large TreeSets sit on a single node.
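
A minimal plain-JDK sketch of that layout, assuming one entry per (groupName, identifier) with the score as the value. `FlatLayoutSketch`, `GroupKey` and `query` are hypothetical names, and the stream filter only illustrates the semantics of "score >= minScore AND groupName = group"; in Hazelcast the same condition would be passed to the map as a predicate so that it is evaluated on the members rather than on the client.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class FlatLayoutSketch {
    // Hypothetical composite key: one map entry per (groupName, identifier).
    static final class GroupKey {
        final String groupName;
        final long identifier;

        GroupKey(String groupName, long identifier) {
            this.groupName = groupName;
            this.identifier = identifier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof GroupKey)) {
                return false;
            }
            GroupKey that = (GroupKey) o;
            return identifier == that.identifier && groupName.equals(that.groupName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(groupName, identifier);
        }
    }

    // Stand-in for the query: keep entries of one group whose score is at
    // least minScore, and return their identifiers, highest score first.
    static List<Long> query(Map<GroupKey, Double> entries, String group, double minScore) {
        return entries.entrySet().stream()
                .filter(e -> e.getKey().groupName.equals(group) && e.getValue() >= minScore)
                .sorted(Map.Entry.<GroupKey, Double>comparingByValue().reversed())
                .map(e -> e.getKey().identifier)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<GroupKey, Double> entries = new HashMap<>();
        entries.put(new GroupKey("groupA", 1L), 0.9);
        entries.put(new GroupKey("groupA", 2L), 0.4);
        entries.put(new GroupKey("groupB", 3L), 0.95);
        System.out.println(query(entries, "groupA", 0.5));
    }
}
```

Because each entry is small and keyed individually, the entries of one group are spread across partitions instead of living in a single large value.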


Sabuj Das

Aug 21, 2020, 1:23:37 AM
to Hazelcast
Hi,
Thanks for the reply.
If I understand correctly, you suggest storing Data(groupName, identifier) as the IMap key and the score as the value,
then using Predicates for the query.

I shall try.

Sabuj Das

Aug 24, 2020, 9:46:40 AM
to Hazelcast
Hi..
The solution works, however, there is no improvement in performance.

Joe Sherwin

Aug 24, 2020, 9:54:33 AM
to haze...@googlegroups.com
Did you create an index? Please share the code that is not performing well.
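
As a rough illustration of why an index matters for a range condition such as score >= X: without one, a query generally has to examine every entry, while a sorted structure can jump straight to the matching range. The sketch below is plain JDK and is not how Hazelcast implements its indexes; `SortedIndexSketch`, `put` and `atLeast` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SortedIndexSketch {
    // score -> identifiers that currently have that score, kept in score order
    private final TreeMap<Double, Set<Long>> byScore = new TreeMap<>();

    void put(long identifier, double score) {
        byScore.computeIfAbsent(score, s -> new TreeSet<>()).add(identifier);
    }

    // Range lookup: only buckets with score >= minScore are visited,
    // from the highest score down.
    List<Long> atLeast(double minScore) {
        List<Long> result = new ArrayList<>();
        for (Set<Long> ids : byScore.tailMap(minScore, true).descendingMap().values()) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedIndexSketch index = new SortedIndexSketch();
        index.put(1L, 0.2);
        index.put(2L, 0.8);
        index.put(3L, 0.5);
        System.out.println(index.atLeast(0.5));
    }
}
```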



Ahmet Mircik

Aug 26, 2020, 1:43:50 PM
to Hazelcast
Another approach is to use an entry processor: https://docs.hazelcast.org/docs/latest/manual/html-single/#entry-processor
With an entry processor, you can read diffs instead of the full data set.
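
The point of this approach is that the selection runs on the member that owns the entry, so only the small result travels back to the caller. A minimal sketch of the selection logic such a processor might run, written as plain Java so it can be tried without a cluster; the wiring into Hazelcast's EntryProcessor interface is left out, and `TopXSelection`, `Item` and `topX` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class TopXSelection {
    // Hypothetical stand-in for the Data class.
    static final class Item {
        final long identifier;
        final double score;

        Item(long identifier, double score) {
            this.identifier = identifier;
            this.score = score;
        }
    }

    // Assumed ordering: highest score first, ties broken by identifier.
    static final Comparator<Item> BY_SCORE_DESC =
            Comparator.comparingDouble((Item i) -> i.score)
                    .reversed()
                    .thenComparingLong(i -> i.identifier);

    // Walks the already-sorted set in order and stops as soon as `limit`
    // identifiers have been collected, skipping excluded ones.
    static List<Long> topX(SortedSet<Item> sorted, Set<Long> excluding, int limit) {
        List<Long> result = new ArrayList<>();
        for (Item item : sorted) {
            if (result.size() >= limit) {
                break;
            }
            if (!excluding.contains(item.identifier)) {
                result.add(item.identifier);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Item> set = new TreeSet<>(BY_SCORE_DESC);
        set.add(new Item(1L, 0.9));
        set.add(new Item(2L, 0.8));
        set.add(new Item(3L, 0.7));
        System.out.println(topX(set, Collections.singleton(2L), 2));
    }
}
```

Returning a short list of identifiers instead of the whole TreeSet keeps the response size proportional to the requested limit, not to the size of the set.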

Sabuj Das

Aug 31, 2020, 3:43:53 AM
to Hazelcast
Hi All,
Sorry for the late reply.
I have simplified my data structure in the IMap.
I have changed it to IMap<Long, Double>.

Is there anything I can improve on this?

Thanks in advance.