Hi Jeremy, Edward,
Thanks for outlining the use cases for multiple partitioning levels. DI will flag the request with KX.
In the meantime, regarding the partbyattr/partbenum disparity, partbyenum can theoretically work with multiple p# columns, like partbyattr can - the int partitioning would just have to be done on the most granular p# column. This is not the way it is currently implemented though, and TorQ will raise an error if it detects multiple p# sort columns in sort.csv.
We could consider lifting this restriction, but the p# columns that are not used for int partitioning may be slow to filter/group, e.g. if tab has p# on columns par1 and par2, and is int partitioned by par2, then these queries
select from tab where par1=`foo
select max val by par1 from tab
will be slow, and would need to be restructured like
select from tab where int in exec int from (select first par1 by int from tab) where par1=`foo
select max val by par1 from select max val, first par1 by int from tab
which obviously isn't seamless (i.e. in the grouping example we have to implement our own map-reduce operation rather than relying on the built-in version).
Instead, we could consider int partitioning by the first p# column and applying p# to the next (and potentially g# to any others). This would make filtering or grouping fast on all attribute columns, but then the disk partitions would be larger and we would need to re-sort and re-apply p# after every WDB write. Ultimately, re-sorting is not practical, and even using all g# is not practical, since all attributes are lost when upserting to a splayed table on disk.
For these reasons there are no plans to expand the current functionality in TorQ.
The "canonical" way to do arbitrary partitioning in kdb would be
anymap, but even this approach would suffer the same problem in that it doesn’t fit naturally with the map-reduce constructs in the usual select statement, e.g.
// simple table example
select sum size by lp, sym from quotes where date=2019.01.01, sym in `c`d
// anymap table example
raze exec {[tab;s] select sum size by lp, sym:s from tab}'[prices;sym] from select from quotesnested where date=2019.01.01, sym in `c`d
Any response from KX will go in this thread.
Kind regards
Peter