partbyenum for ints

JerLucid

unread,

Nov 5, 2024, 7:43:02 AM11/5/24

to Data Intellect kdb+/TorQ

Currently, partbyenum works for symbol type columns, but could you add

an option to TorQ to cater for cases where the user wishes to part by an existing

column of integer/short type?

Thanks

Jer

JerLucid

unread,

Nov 7, 2024, 7:03:41 AM11/7/24

to Data Intellect kdb+/TorQ

Along a related line, you could think about adding a template mechanism that would enable users to create dictionary encodings for each table column, as an alternative to symbol type columns

with the standard enumeration. This approach would allow users to encode strings as char, short or int types, on a per column basis.

The main benefit is a reduced in-memory and on-disk table size, faster writes and improved query speed.

Memory is reduced because you could switch out long ints (symbol type) for potentially char, short or ints. Writers are faster because you could potentially remove .Q.en if you

have no symbols. Lookups using char or short are also faster I find.

Currently, in the latest TorQ, when you query the intraday DB, you already provide a mechanism (maptopoint) for mapping symbol to their corresponding long.

neg[gwHandle](`.gw.asyncexec;"select from trade where int=maptoint[`GOOG]";`idb);gwHandle[]

The idea would be to provide a general mechanism/framework so that users can define their encodings for a range of columns.

"select from ExecutionReport where symbol = SymbolMap[`GOOG], order_type = OrderTypeMap[`Limit], side = SideMap[`BUY]"

While this approach does make the query syntax a bit more verbose, it can lead to substantial memory savings.

Additionally, feed handlers publishing data to kdb+ often prefer to avoid strings and work exclusively with integers, making it essential to have a place to store the word associated with each integer.

Peter Murphy

unread,

Apr 3, 2025, 8:58:07 AMApr 3

to Data Intellect kdb+/TorQ

Hi JerLucid,

Thank you for your suggestions!

A change has been merged that allows a WDB in partbyenum mode to partition by "raw" column values if the single p# column in sort.csv has type short, int, or long.

When mapping column values to partition values, the only modification is to clamp the column values between 0 and 2147483647. This clamping can be handled by the existing maptoint function in the IDB.

Negative/null column values and column values above 2147483647 are therefore allowed, but the user would need to take care that the 0 and 2147483647 partitions respectively don't become disproportionately large, and additional constraints may be needed on the partition column in IDB queries.

E.g. if the column "parcol" was used to partition the IDB, then these queries:

select from tab where int=maptoint[999]
select from tab where int=maptoint[-100], parcol=-100
select from tab where int=maptoint[0N], parcol=0N
select from tab where int=maptoint[2147483648], parcol=2147483648

would be equivalent to:

select from tab where int=999
select from tab where int=0, parcol=-100
select from tab where int=0, parcol=0N
select from tab where int=2147483647, parcol=2147483648

An issue has also been raised to track your feature request for custom column encodings.

Kind regards

Peter

Jeremy Lucid

unread,

Apr 3, 2025, 11:10:03 AMApr 3

to Peter Murphy, Data Intellect kdb+/TorQ

That's great Peter, thanks for letting me know.

I've had a look through your change and appreciate the work and testing you put into this one.

Good idea clamping the range between 0 and 2,147,483,647

That range should be more than enough. It's great now that the partitioning

supports "hijs" types

--
dataintellect.com
dataintellect.com/thoughts
---
You received this message because you are subscribed to a topic in the Google Groups "Data Intellect kdb+/TorQ" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/kdbtorq/x1p4nzPvXZc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to kdbtorq+u...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/kdbtorq/5f07d6cf-87e2-46aa-bd26-68ed3cdc554bn%40googlegroups.com.

Reply all

Reply to author

Forward