Partial serialization on read of map with TreeSet as value. Is it possible?

Sabuj Das

Aug 20, 2020, 11:11:42 AM

to Hazelcast

Hi,

I am using Hazelcast 4.0.2 with Oracle-JDK-1.8.

My use-case is:

I put a TreeSet of Data, ordered by a score field, in an IMap.
The set can grow up to a million records.
On every update of the data and score, I add to or remove from the TreeSet.
What I want to achieve is a very low-latency read of the top X items from the TreeSet.

My data class:

    @Getter
    public class Data implements IdentifiedDataSerializable {

        private Long identifier;
        private Double score;

        public Data(Long identifier, Double score) {
            this.identifier = identifier;
            this.score = score;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Data)) {
                return false;
            }
            Data that = (Data) o;
            return identifier.equals(that.identifier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(identifier);
        }

        @Override
        @JsonIgnore
        public int getFactoryId() {
            return SerializerConstants.SERIALIZER_FACTORY_ID_SCORED_LEAD;
        }

        @Override
        @JsonIgnore
        public int getClassId() {
            return SerializerConstants.SERIALIZER_CLASS_ID_SCORED_LEAD;
        }

        @Override
        public void writeData(ObjectDataOutput out) throws IOException {
            out.writeLong(identifier);
            out.writeDouble(score);
        }

        @Override
        public void readData(ObjectDataInput in) throws IOException {
            identifier = in.readLong();
            score = in.readDouble();
        }
    }

Also, a comparator is implemented on the score.

How I read data:

    final IMap<String, TreeSet<Data>> imap = getImap("CACHE_NAME");
    final TreeSet<Data> dataSet = imap.get(groupName);
    if (CollectionUtils.isEmpty(dataSet)) {
        return null;
    }
    Set<Data> selected = dataSet.stream()
            .filter(val -> !excluding.contains(val.getIdentifier()))
            .limit(limit)
            .collect(Collectors.toSet());
    ...
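One detail of the read path above: `Collectors.toSet()` makes no ordering guarantee, so the selected items can lose their score order even though the stream over the TreeSet is ordered. A minimal JDK-only sketch of the same selection that keeps the order, assuming that order matters to the caller (the class name `TopXSelection` and method `topX` are hypothetical, not from the original code):

```java
import java.util.LinkedHashSet;
import java.util.NavigableSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public final class TopXSelection {

    private TopXSelection() {
    }

    /**
     * Returns the first `limit` items of `ordered` that are not excluded,
     * in the iteration order of the sorted set.
     */
    public static <T> Set<T> topX(NavigableSet<T> ordered,
                                  Predicate<? super T> isExcluded,
                                  int limit) {
        return ordered.stream()
                .filter(item -> !isExcluded.test(item))
                .limit(limit)
                // LinkedHashSet preserves insertion order, unlike the set from Collectors.toSet().
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

For example, with a `TreeSet<Integer>` built with `Comparator.reverseOrder()` holding 10, 50, 30, 40, 20, the call `topX(scores, s -> s == 40, 3)` yields 50, 30, 20 in that order. This only touches the selection step; it does not change how much of the collection is read from the IMap.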

Timings for reading the top 100 items from the TreeSet:

If the TreeSet has 50,000 or fewer items, the read takes 25 milliseconds.

If the TreeSet has 100,000 or more items, it takes more than 200 milliseconds.

The issue I see here is that on every imap.get(groupName) call, Hazelcast deserializes the complete collection and applies the comparator again.

Is there any way to limit how much of a collection is deserialized on read?

And also to avoid applying the Comparator when a TreeSet is read?

Thanks in advance...

Joe Sherwin

Aug 20, 2020, 11:37:48 AM

to haze...@googlegroups.com

I would consider redesigning how you store this data instead of keeping it in a TreeSet. Use an object key with groupName and identifier as attributes. Then you can simply use a Hazelcast query such as score >= $1 AND groupName = $2. The new data structure will also keep your cache from becoming unbalanced, where very large TreeSets sit on a single node.
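A minimal sketch of what this redesign could look like with the Hazelcast 4.x predicate API, assuming one map entry per (groupName, identifier) with the score stored as the value. The names `ScoreQuerySketch`, `DataKey` and `atLeast` are hypothetical; `__key` and `this` are the Hazelcast query attributes for the entry key and the entry value.

```java
import com.hazelcast.map.IMap;
import com.hazelcast.query.Predicate;
import com.hazelcast.query.Predicates;

import java.io.Serializable;
import java.util.Map;
import java.util.Set;

public class ScoreQuerySketch {

    /** Hypothetical composite key: one map entry per (groupName, identifier). */
    public static class DataKey implements Serializable {
        private final String groupName;
        private final long identifier;

        public DataKey(String groupName, long identifier) {
            this.groupName = groupName;
            this.identifier = identifier;
        }

        // The getter lets the query engine resolve "__key.groupName".
        public String getGroupName() {
            return groupName;
        }

        public long getIdentifier() {
            return identifier;
        }
    }

    /** Returns the entries of one group whose score (the map value) is >= minScore. */
    public static Set<Map.Entry<DataKey, Double>> atLeast(
            IMap<DataKey, Double> map, String groupName, double minScore) {
        Predicate<DataKey, Double> inGroup = Predicates.equal("__key.groupName", groupName);
        Predicate<DataKey, Double> highScore = Predicates.greaterEqual("this", minScore);
        return map.entrySet(Predicates.and(inGroup, highScore));
    }
}
```

With this layout each entry is small and the entries of a group are spread across partitions, so a read returns only the matching entries rather than one large collection. Without an index the query still scans every entry in the map.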


Sabuj Das

Aug 21, 2020, 1:23:37 AM

to Hazelcast

Hi,

Thanks for the reply.

If I understand correctly, you suggest storing Data(groupName, identifier) as the IMap key and the score as the value.

Then use Predicates for a query.

I shall try.

Sabuj Das

Aug 24, 2020, 9:46:40 AM

to Hazelcast

Hi..

The solution works, however, there is no improvement in performance.

Joe Sherwin

Aug 24, 2020, 9:54:33 AM

to haze...@googlegroups.com

Did you create an index? Share the code which is not performing.
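For reference, a minimal sketch of adding an index in Hazelcast 4.x, assuming the score is stored as the map value so that the attribute name is `this`. The class and method names are hypothetical.

```java
import com.hazelcast.config.IndexType;
import com.hazelcast.map.IMap;

public class IndexSketch {

    /**
     * Adds a sorted index on the map value itself ("this"), which range
     * predicates such as "this >= x" can use instead of a full scan.
     */
    public static void addScoreIndex(IMap<?, Double> map) {
        map.addIndex(IndexType.SORTED, "this");
    }
}
```

A call such as `IndexSketch.addScoreIndex(hz.getMap("scores"))` would typically run once at startup, before the queries. If the key is an object with a groupName attribute, an index on that attribute (addressed with the `__key` prefix, for example `__key.groupName`) is presumably worth adding as well for the equality part of the query; that is an assumption about the key layout, not something shown in the thread.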


Ahmet Mircik

Aug 26, 2020, 1:43:50 PM

to Hazelcast

Another approach is using entry processor: https://docs.hazelcast.org/docs/latest/manual/html-single/#entry-processor

With an entry processor, you can read diffs instead of the full data set.
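A minimal sketch of that idea against the original IMap<String, TreeSet<...>> layout, assuming the Hazelcast 4.x `EntryProcessor<K, V, R>` interface. The processor runs on the member that owns the key and returns only the first `limit` items, so the full set does not travel to the caller. `TopNProcessor` is a hypothetical name.

```java
import com.hazelcast.map.EntryProcessor;

import java.util.ArrayList;
import java.util.Map;
import java.util.TreeSet;

public class TopNProcessor<T> implements EntryProcessor<String, TreeSet<T>, ArrayList<T>> {

    private final int limit;

    public TopNProcessor(int limit) {
        this.limit = limit;
    }

    @Override
    public ArrayList<T> process(Map.Entry<String, TreeSet<T>> entry) {
        ArrayList<T> top = new ArrayList<>();
        TreeSet<T> set = entry.getValue();
        if (set == null) {
            return top; // no entry stored under this key
        }
        // Iterate in comparator order and stop after `limit` items.
        for (T item : set) {
            if (top.size() == limit) {
                break;
            }
            top.add(item);
        }
        return top;
    }

    @Override
    public EntryProcessor<String, TreeSet<T>, ArrayList<T>> getBackupProcessor() {
        return null; // read-only: nothing to apply on backup replicas
    }
}
```

Usage would be along the lines of `List<Data> top = imap.executeOnKey(groupName, new TopNProcessor<>(100));`. With the default BINARY in-memory format the owning member still has to deserialize the stored set to run the processor; configuring the map with the OBJECT in-memory format is the usual way to avoid that step, though whether it helps here would need measuring.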


Sabuj Das

Aug 31, 2020, 3:43:53 AM

to Hazelcast

Hi All,

Sorry for replying late.

I have since simplified my data structure in the IMap.

I have changed it to IMap<Long, Double>.

Here is the full code: https://github.com/Consolefire/hazelcast-spring-boot-sample