Efficient Range Query in Rama?

jacob franco

Aug 16, 2024, 4:55:56 PM

to rama-user

I'm working on a time-based query in Rama and need some advice on the most efficient approach. Here's my current setup:

My PState looks like this:
```
// Timestamp -> (item ID -> Item)
stream.pstate("$$timeToItems",
    PState.mapSchema(Long.class, PState.mapSchema(Integer.class, Item.class)));
```
This represents a mapping where:

- The outer key is a Long timestamp
- The inner key is an Integer item ID
- The value is the Item object itself

Each Item object contains both the id field (Integer) and the timestamp (Long).

Here's how I'm processing the data from the depot:
```
stream.source("*itemDepot").out("*item")
    .macro(Helpers.extractFields("*item", "*id", "*start"))
    .localTransform("$$itemIdToItem", Path.key("*id").termVal("*item"))
    .localTransform("$$timeToItems", Path.key("*start").key("*id").termVal("*item"));
```
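To make the data flow concrete, here is a rough in-memory analog of the two `localTransform` writes, with plain `java.util` maps standing in for the PStates. It is a sketch only: the `Item` record (an `id` plus a `start` timestamp) is inferred from the `extractFields` call, `ItemIndexes` and `ingest` are hypothetical names, and Rama's partitioning and durability are not modeled at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the poster's Item class: an ID plus a start timestamp.
record Item(int id, long start) {}

// In-memory analog of the two PStates the ETL writes; plain Java, not Rama code.
final class ItemIndexes {
    // Analog of $$itemIdToItem: item ID -> Item.
    final Map<Integer, Item> itemIdToItem = new HashMap<>();
    // Analog of $$timeToItems: timestamp -> (item ID -> Item), outer keys kept sorted.
    final TreeMap<Long, Map<Integer, Item>> timeToItems = new TreeMap<>();

    // Mirrors the two localTransform calls for a single depot record.
    void ingest(Item item) {
        itemIdToItem.put(item.id(), item);
        timeToItems.computeIfAbsent(item.start(), t -> new HashMap<>())
                   .put(item.id(), item);
    }
}
```

Items that share a timestamp land in the same inner map, so each inner map grows with the number of items recorded at that exact timestamp.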
I'm trying to query all items that fall within a given timeframe. Here's my current attempt:
```
topologies.query("getItemsByTimeRange", "*startTime", "*endTime").out("*result")
    .hashPartition("*startTime")
    .localSelect("$$timeToItems",
        Path.subselect(
            Path.sortedMapRange("*startTime", "*endTime")
                .filterEqual(new Expr(Ops.AND,
                    new Expr(Ops.GREATER_THAN_OR_EQUAL,
                        Path.key(),
                        "*startTime"),
                    new Expr(Ops.LESS_THAN_OR_EQUAL,
                        Path.key(),
                        "*endTime")))))
    .out("*filteredItems")
    .originPartition()
    .each(Helpers::sortItems, "*filteredItems")
    .out("*result");
```

However, I'm encountering issues with this approach. I initially tried using Ops.RANGE, but got an overflow error, possibly due to working with Long values.
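We haven't verified how `Ops.RANGE` handles `Long` arguments, but two plain-Java facts are consistent with the guess: an epoch-millisecond timestamp from the date of this thread doesn't fit in a 32-bit `int`, and any approach that generates every millisecond in a range produces tens of millions of values for a single day. A sorted-map range scan sidesteps both, because it only visits keys that are actually stored. The class and method names below are hypothetical, and this is not Rama code.

```java
import java.time.Duration;
import java.time.Instant;

// Plain-Java check of the "overflow with Long values" guess; not Rama code.
final class TimestampRangeMath {
    // Epoch milliseconds for midnight UTC on the date of this thread.
    static long threadDateMillis() {
        return Instant.parse("2024-08-16T00:00:00Z").toEpochMilli();
    }

    // True if the value can be narrowed to an int without losing information.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    // Number of distinct millisecond keys in [startMillis, endMillis), i.e. how many
    // values an approach that enumerates every possible timestamp would generate.
    static long candidateKeys(long startMillis, long endMillis) {
        return endMillis - startMillis;
    }

    // End of a one-day window beginning at startMillis.
    static long oneDayLater(long startMillis) {
        return startMillis + Duration.ofDays(1).toMillis();
    }
}
```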

What I'm trying to achieve is:

- Select all items within the given time range
- Return a sorted list of these items
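The query's last step calls a `Helpers::sortItems` helper whose body isn't shown. Here is a minimal sketch of what such a helper might do, assuming it receives the nested timestamp → (ID → Item) map and should return one flat list ordered by timestamp and then by ID. The `Item` record, the `ItemSorting` class, and this `sortItems` signature are all hypothetical, and the code is plain Java rather than Rama.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the poster's Item class.
record Item(int id, long start) {}

final class ItemSorting {
    // Flattens timestamp -> (id -> Item) into one list sorted by (start, id).
    // Sorting explicitly means the result doesn't depend on either map's iteration order.
    static List<Item> sortItems(Map<Long, Map<Integer, Item>> byTime) {
        List<Item> result = new ArrayList<>();
        for (Map<Integer, Item> itemsAtTime : byTime.values()) {
            result.addAll(itemsAtTime.values());
        }
        result.sort(Comparator.comparingLong(Item::start).thenComparingInt(Item::id));
        return result;
    }
}
```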

What's the most efficient way to accomplish this in Rama, given my current PState structure? I feel like I might have overcomplicated things and would appreciate any insights or suggestions for a more straightforward approach.

Thank you in advance for your help!

Nathan Marz

Aug 16, 2024, 5:13:05 PM

to rama...@googlegroups.com

Your schema structure is fine, as long as those inner maps don't get big. If those inner maps can have more than a few hundred elements, I would subindex those.
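For reference, a sketch of what subindexing the inner maps could look like in the poster's `$$timeToItems` declaration, assuming the same `stream` topology variable and `Item` class as in the question. It's a fragment in the same style as the question's code, not a complete module.

```java
stream.pstate("$$timeToItems",
    PState.mapSchema(Long.class,
        // Subindexed inner map: individual entries can be read or written
        // without loading the whole inner map from disk
        PState.mapSchema(Integer.class, Item.class).subindexed()));
```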

To do your range query, all you need is the sortedMapRange navigator or one of its variants. Your uses of "subselect" and "key" aren't correct, and they're unnecessary. Here's an example of getting the sorted submap between times "*startTime" and "*endTime":

```
.localSelect("$$timeToItems", Path.sortedMapRange("*startTime", "*endTime")).out("*range")
```
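As we understand Rama's documentation, `Path.sortedMapRange` navigates to a sorted submap whose keys run from the start key (inclusive) to the end key (exclusive); the variants mentioned above change how the range is bounded. The closest plain-Java analog is `NavigableMap.subMap`. The sketch below uses a `TreeMap` in place of the PState, with a hypothetical `Item` record and `RangeQuerySketch` class, to show which timestamps land in the range. It is not Rama code.

```java
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical stand-in for the poster's Item class.
record Item(int id, long start) {}

final class RangeQuerySketch {
    // Analog of Path.sortedMapRange(start, end): keys in [start, end), still sorted.
    // NavigableMap.subMap returns a view backed by the original map, not a copy.
    static NavigableMap<Long, Map<Integer, Item>> range(
            NavigableMap<Long, Map<Integer, Item>> timeToItems, long start, long end) {
        return timeToItems.subMap(start, true, end, false);
    }
}
```

Because the outer map is already sorted by timestamp, the result comes back in time order with no separate filtering step.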

Let me know if that helps.

--
You received this message because you are subscribed to the Google Groups "rama-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rama-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rama-user/d6841fad-91e5-462a-8121-e7dcbecc8ce7n%40googlegroups.com.

jacob franco

Aug 16, 2024, 5:48:18 PM

to rama-user

Okay yeah that worked. I don't even know what I was thinking, just an abomination of code created after hacking at it all day lol. Thanks for the clarity! I do have another question, though, about what you mean by the inner maps being big. Do you just mean to be careful when a few hundred Items are associated with a single timestamp?

Nathan Marz

Aug 16, 2024, 6:36:01 PM

to rama...@googlegroups.com

Yes, that's right. If a data structure isn't subindexed, it's read from and written to disk as a whole. After a few hundred elements, it's much more efficient to subindex it.
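A toy model can give some intuition for this. It counts elements rather than bytes and ignores everything else about Rama's storage: if an inner map that isn't subindexed is read and rewritten whole on every insert, adding n items one at a time touches n² elements in total, while a subindexed map touches roughly one element per insert. The `SubindexCostModel` class and its counting rule are assumptions for illustration, not a description of Rama's internals.

```java
// Toy cost model (an assumption, not Rama internals): count how many elements are
// read or written when n items are inserted one at a time into an initially empty map.
final class SubindexCostModel {
    // Not subindexed: each insert reads the existing k elements and writes back k + 1.
    static long wholeStructureCost(int n) {
        long total = 0;
        for (int k = 0; k < n; k++) {
            total += k + (k + 1);
        }
        return total; // the sum of (2k + 1) for k = 0..n-1, which equals n * n
    }

    // Subindexed: each insert touches only the one new element.
    static long subindexedCost(int n) {
        return n;
    }
}
```

Under this model, 300 single-item inserts cost 90,000 element reads and writes without subindexing versus 300 with it. The model ignores the per-access overhead a subindexed structure has, which is one reason small structures are fine without it.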