partbyenum for ints

53 views
Skip to first unread message

JerLucid

unread,
Nov 5, 2024, 7:43:02 AM11/5/24
to Data Intellect kdb+/TorQ
Currently, partbyenum works for symbol type columns, but could you add
an option to TorQ to cater for cases where the user wishes to part by an existing
column of integer/short type?

Thanks
Jer







JerLucid

unread,
Nov 7, 2024, 7:03:41 AM11/7/24
to Data Intellect kdb+/TorQ
Along a related line, you could think about adding a template mechanism that would enable users to create dictionary encodings for each table column, as an alternative to symbol type columns 
with the standard enumeration. This approach would allow users to encode strings as char, short or int types, on a per column basis. 
The main benefit is a reduced in-memory and on-disk table size, faster writes and improved query speed.

Memory is reduced because you could switch out long ints (symbol type) for potentially char, short or ints. Writers are faster because you could potentially remove .Q.en if you
have no symbols. Lookups using char or short are also faster I find. 

Currently, in the latest TorQ, when you query the intraday DB, you already provide a mechanism (maptopoint) for mapping symbol to their corresponding long.

neg[gwHandle](`.gw.asyncexec;"select from trade where int=maptoint[`GOOG]";`idb);gwHandle[]

The idea would be to provide a general mechanism/framework so that users can define their encodings for a range of columns.

"select from ExecutionReport where symbol = SymbolMap[`GOOG], order_type = OrderTypeMap[`Limit], side = SideMap[`BUY]"

While this approach does make the query syntax a bit more verbose, it can lead to substantial memory savings. 
Additionally, feed handlers publishing data to kdb+ often prefer to avoid strings and work exclusively with integers, making it essential to have a place to store the word associated with each integer.

Peter Murphy

unread,
Apr 3, 2025, 8:58:07 AMApr 3
to Data Intellect kdb+/TorQ
Hi JerLucid,

Thank you for your suggestions!

A change has been merged that allows a WDB in partbyenum mode to partition by "raw" column values if the single p# column in sort.csv has type short, int, or long.

When mapping column values to partition values, the only modification is to clamp the column values between 0 and 2147483647. This clamping can be handled by the existing maptoint function in the IDB.

Negative/null column values and column values above 2147483647 are therefore allowed, but the user would need to take care that the 0 and 2147483647 partitions respectively don't become disproportionately large, and additional constraints may be needed on the partition column in IDB queries.

E.g. if the column "parcol" was used to partition the IDB, then these queries:

select from tab where int=maptoint[999]
select from tab where int=maptoint[-100], parcol=-100
select from tab where int=maptoint[0N], parcol=0N
select from tab where int=maptoint[2147483648], parcol=2147483648


would be equivalent to:

select from tab where int=999
select from tab where int=0, parcol=-100
select from tab where int=0, parcol=0N
select from tab where int=2147483647, parcol=2147483648


An issue has also been raised to track your feature request for custom column encodings.

Kind regards
Peter

Jeremy Lucid

unread,
Apr 3, 2025, 11:10:03 AMApr 3
to Peter Murphy, Data Intellect kdb+/TorQ
That's great Peter, thanks for letting me know.
I've had a look through your change and appreciate the work and testing you put into this one. 
Good idea clamping the range between 0 and 2,147,483,647
That range should be more than enough. It's great now that the partitioning
supports "hijs" types




--
dataintellect.com
dataintellect.com/thoughts
---
You received this message because you are subscribed to a topic in the Google Groups "Data Intellect kdb+/TorQ" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/kdbtorq/x1p4nzPvXZc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to kdbtorq+u...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/kdbtorq/5f07d6cf-87e2-46aa-bd26-68ed3cdc554bn%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages