Set arithmetic on two or more subindexed sets of a PState

Sashidhar T

Jan 8, 2025, 12:20:08 PM
to rama-user
Hi all,

I have a PState that maps a term to a set of document IDs (basically an inverted index). I would like to perform set arithmetic (intersection, union) on two or more subindexed sets belonging to the same PState.

MicrobatchTopology indexing = topologies.microbatch("indexing");

indexing.pstate("$$termToDocIdPostings", PState.mapSchema(String.class, PState.setSchema(String.class)).subindexed());

How can I achieve this?

One rough idea I have is to use an in-memory PState inside a query topology to perform the set operations incrementally. I'm not sure whether this is the right approach. Here's what I have tried so far:

topologies.query("intersectPostings", "*term1", "*term2").out("*res")
        .hashPartition("*term1")
        .localSelect("$$termToDocIdPostings",
                Path.collect(
                        Path.multiPath(
                                Path.key("*term1"),
                                Path.key("*term2"))
                            .all())).out("*allPostings")
        .localTransform("$$intersectPostings$$", Path.termVal("*allPostings"))
        .each(Ops.PRINTLN, "Intersection:", "*allPostings");

I'm not sure how to proceed from here to intersect two or more sets.

Thanks,
Sashi



Nathan Marz

Jan 8, 2025, 4:46:53 PM
to rama...@googlegroups.com
You can do this in a foreign select with a path like this:

Path.subselect(Path.multiPath(Path.key("*term1"), Path.key("*term2")))
       .view(SomeClass::unionSets)

SomeClass::unionSets would be a function you declare, on the classpath of both the client and the module, that takes in a list of sets and combines them (a union here; an intersection function would have the same shape).
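As a rough illustration of such a function, here is a minimal sketch in plain Java. It assumes the view function receives a java.util.List of java.util.Set<String> (String taken from the schema above) and that a missing term shows up as null; neither assumption is confirmed in this thread. The intersectSets name is hypothetical, and the class is package-private only to keep the sketch in one file (in a real module it would need to be public).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; must be on the classpath of both client and module.
final class SomeClass {
    private SomeClass() {}

    // Union: every doc id that appears in at least one posting set.
    // Assumption: a null entry means "term not indexed" and is skipped.
    static Set<String> unionSets(List<Set<String>> sets) {
        Set<String> result = new HashSet<>();
        for (Set<String> s : sets) {
            if (s != null) {
                result.addAll(s);
            }
        }
        return result;
    }

    // Intersection: doc ids present in every posting set.
    // An empty list or any null entry yields an empty result.
    static Set<String> intersectSets(List<Set<String>> sets) {
        if (sets.isEmpty() || sets.get(0) == null) {
            return new HashSet<>();
        }
        Set<String> result = new HashSet<>(sets.get(0));
        for (Set<String> s : sets.subList(1, sets.size())) {
            if (s == null) {
                return new HashSet<>();
            }
            result.retainAll(s);
        }
        return result;
    }
}
```

Both methods copy into a new HashSet rather than mutating their inputs, so swapping one for the other in the view call does not change anything else in the path.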

You control the partition that's queried in the foreign select with the partitioning key argument.

