Efficient Range Query in Rama?


jacob franco

Aug 16, 2024, 4:55:56 PM
to rama-user

I'm working on a time-based query in Rama and need some advice on the most efficient approach. Here's my current setup:

My PState looks like this:
```
// timestamp (Long) -> item ID (Integer) -> Item
stream.pstate("$$timeToItems",
    PState.mapSchema(Long.class, PState.mapSchema(Integer.class, Item.class)));
```
This represents a mapping where:

  • The outer key is a Long timestamp
  • The inner key is an Integer item ID
  • The value is the Item object itself

Each Item object contains both the id field (Integer) and the timestamp (Long).

Here's how I'm processing the data from the depot:
```
stream.source("*itemDepot").out("*item")
    .macro(Helpers.extractFields("*item", "*id", "*start"))
    .localTransform("$$itemIdToItem", Path.key("*id").termVal("*item"))
    .localTransform("$$timeToItems", Path.key("*start").key("*id").termVal("*item"));
```
I'm trying to query all items that fall within a given timeframe. Here's my current attempt:
```
topologies.query("getItemsByTimeRange", "*startTime", "*endTime").out("*result")
    .hashPartition("*startTime")
    .localSelect("$$timeToItems",
        Path.subselect(
            Path.sortedMapRange("*startTime", "*endTime")
                .filterEqual(new Expr(Ops.AND,
                    new Expr(Ops.GREATER_THAN_OR_EQUAL,
                        Path.key(),
                        "*startTime"),
                    new Expr(Ops.LESS_THAN_OR_EQUAL,
                        Path.key(),
                        "*endTime")))))
    .out("*filteredItems")
    .originPartition()
    .each(Helpers::sortItems, "*filteredItems")
    .out("*result");
```

However, I'm encountering issues with this approach. I initially tried using Ops.RANGE, but got an overflow error, possibly due to working with Long values.

What I'm trying to achieve is:

  1. Select all items within the given time range
  2. Return a sorted list of these items

What's the most efficient way to accomplish this in Rama, given my current PState structure? I feel like I might have overcomplicated things and would appreciate any insights or suggestions for a more straightforward approach.

Thank you in advance for your help!

Nathan Marz

Aug 16, 2024, 5:13:05 PM
to rama...@googlegroups.com
Your schema structure is fine, as long as those inner maps don't get big. If those inner maps can have more than a few hundred elements, I would subindex those.

To do your range query, all you need is the sortedMapRange navigator or one of its variants. Your usages of "subselect" and "key" aren't correct, and they're unnecessary. Here's an example of getting the sorted submap between times "*startTime" and "*endTime":

.localSelect("$$timeToItems", Path.sortedMapRange("*startTime", "*endTime")).out("*range")
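
For readers who want to see what a sorted submap between two timestamps looks like, here is a minimal plain-Java analogy using java.util.TreeMap. It is not Rama code: the class TimeRangeSketch, the nested Item stand-in, and the helpers itemsInRange and put are all hypothetical names made up for this sketch. It treats the range as start-inclusive and end-exclusive, which is how TreeMap.subMap behaves and, as far as I know, how sortedMapRange behaves by default; check the Rama docs if you need the end time included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TimeRangeSketch {
    // Hypothetical stand-in for the Item class from the thread.
    static final class Item {
        final int id;
        final long start;
        Item(int id, long start) { this.id = id; this.start = start; }
    }

    // Collects items with startTime <= timestamp < endTime.
    // Both map levels are sorted, so the list comes out ordered by
    // timestamp and then by item ID without a separate sort step.
    static List<Item> itemsInRange(
            NavigableMap<Long, NavigableMap<Integer, Item>> timeToItems,
            long startTime, long endTime) {
        List<Item> result = new ArrayList<>();
        for (Map<Integer, Item> itemsAtTime
                : timeToItems.subMap(startTime, true, endTime, false).values()) {
            result.addAll(itemsAtTime.values());
        }
        return result;
    }

    // Mirrors the two-level write: timestamp -> item ID -> item.
    static void put(NavigableMap<Long, NavigableMap<Integer, Item>> timeToItems, Item item) {
        timeToItems.computeIfAbsent(item.start, k -> new TreeMap<>()).put(item.id, item);
    }

    public static void main(String[] args) {
        NavigableMap<Long, NavigableMap<Integer, Item>> timeToItems = new TreeMap<>();
        put(timeToItems, new Item(3, 200L));
        put(timeToItems, new Item(2, 200L));
        put(timeToItems, new Item(1, 100L));
        put(timeToItems, new Item(4, 300L));
        for (Item item : itemsInRange(timeToItems, 100L, 300L)) {
            System.out.println(item.start + " " + item.id);
        }
    }
}
```

The point of the analogy is that the range navigation already yields the timestamps in order, so flattening the inner maps is the only work left after the select.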

Let me know if that helps.



jacob franco

Aug 16, 2024, 5:48:18 PM
to rama-user
Okay yeah, that worked. I don't even know what I was thinking, just an abomination of code created after hacking at it all day lol. Thanks for the clarity! I do have another question, though, about what you mean by the inner maps being big. Do you just mean to be careful when a few hundred Items are associated with a specific timestamp?

Nathan Marz

Aug 16, 2024, 6:36:01 PM
to rama...@googlegroups.com
Yes, that's right. If a data structure isn't subindexed, it gets read/written as a whole from disk. After a few hundred elements it's much more efficient to subindex it.
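
For anyone landing here later, a sketch of what subindexing the inner maps might look like for the "$$timeToItems" PState from this thread. This is a fragment rather than a runnable program: it assumes the same module definition as the original post (so stream and Item already exist) and assumes the subindexed() call on the inner schema, which is how I understand Rama's Java API marks a nested structure as subindexed. Verify against the Rama documentation before relying on it.

```java
// Sketch only: the thread's declaration with the inner map marked as subindexed,
// so a large per-timestamp map is no longer read and written as a single value.
stream.pstate("$$timeToItems",
    PState.mapSchema(Long.class,
        PState.mapSchema(Integer.class, Item.class).subindexed()));
```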


