WDB partbyattr memory limits


JerLucid

Jun 8, 2023, 7:46:15 AM
to AquaQ kdb+/TorQ
Hi team,

I have been looking at the EOD Merge process where the WDB
has already written the data down intraday using partbyattr mode.

In particular, I have been looking at the getpartchunks function within merge.q.
If I am understanding it correctly, it tries to group partitions
whose row sum, or byte sum, is less than the mergelimit,
so that in the next step,

mergebypart[tableinfo[0];(` sv dest,`)]'[partchunks];

it can use get to load multiple partitions into memory at the same time.

However, in the situation where an individual partition is larger than the
mergelimit, will get try to load it directly into memory?
It won't read and write that large partition sequentially in chunks of size mergelimit.
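
To illustrate what I think is happening, here is a rough Python sketch of that grouping step. This is my own reconstruction, not the TorQ source; the function name, the greedy strategy, and the input shape are all assumptions:

```python
# Hypothetical sketch (not the TorQ source): greedily group partitions
# into chunks whose summed size stays under mergelimit.
# `sizes` maps partition name -> row count (or byte count).
def getpartchunks(sizes, mergelimit):
    chunks, current, total = [], [], 0
    for part, size in sizes.items():
        if size >= mergelimit:
            # an oversized partition becomes a chunk on its own
            if current:
                chunks.append(current)
                current, total = [], 0
            chunks.append([part])
            continue
        if total + size > mergelimit and current:
            # current chunk is full; start a new one
            chunks.append(current)
            current, total = [], 0
        current.append(part)
        total += size
    if current:
        chunks.append(current)
    return chunks

print(getpartchunks({"a": 3, "b": 4, "c": 10, "d": 2}, 8))
# -> [['a', 'b'], ['c'], ['d']]
```

In this sketch an oversized partition (like "c" above) still ends up as a chunk on its own, which is exactly the case I am asking about.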

I am worried that if some of my partitions are very large, then
memory will spike during this merge process.

Please let me know if I am understanding this process correctly
and if there is anything I can do to minimise the memory peaks.

Thanks
JerLucid

Jun 8, 2023, 8:07:55 AM
to AquaQ kdb+/TorQ
I must be completely blind: I see there is already a method to merge by column if the limit is exceeded.

Cianan Richman

Jun 9, 2023, 7:18:48 AM
to AquaQ kdb+/TorQ

Hi JerLucid,

 

Yes, you are correct: if the limit is exceeded, it is possible to merge data from memory to disk column by column to minimise memory peaks. The wdb.q script contains three merge functions: mergebypart is optimal when you have many small partitions, mergebycol is optimal when the partition chunks are big, and mergehybrid works out the best way to merge each partition chunk.

I would recommend the mergehybrid function, as it covers both scenarios. If a partition chunk is over the rowcount/bytesize limit, the chunk will be merged column by column; otherwise, the entire partition chunk will be merged in one pass. Using the hybrid function should minimise any memory peaks. If you have any further questions, feel free to reach out.
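
As a rough illustration of that decision, here is a Python sketch with hypothetical names (this is not the actual wdb.q code):

```python
# Hypothetical sketch of the hybrid merge decision: within the
# rowcount/bytesize limit, merge the whole chunk in one pass;
# over the limit, merge one column at a time.
def hybrid_plan(chunk_size, mergelimit, columns):
    """Return the merge steps for one partition chunk."""
    if chunk_size <= mergelimit:
        # within the limit: load and write the entire chunk at once
        return [("whole", None)]
    # over the limit: one pass per column, so peak memory is bounded
    # by roughly the largest single column rather than the whole chunk
    return [("column", c) for c in columns]

print(hybrid_plan(500, 1000, ["time", "sym", "price"]))
# -> [('whole', None)]
print(hybrid_plan(5000, 1000, ["time", "sym", "price"]))
# -> [('column', 'time'), ('column', 'sym'), ('column', 'price')]
```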

 

Thanks,

Cianan Richman
Data Intellect



Jeremy Lucid

Jun 9, 2023, 7:54:25 AM
to Cianan Richman, AquaQ kdb+/TorQ
Thanks Cianan, that all makes sense. I am switching to the hybrid mode; that ticks all the boxes.




Jeremy Lucid

Jun 12, 2023, 3:07:16 AM
to Cianan Richman, AquaQ kdb+/TorQ
Actually Cianan, one thing I did notice in part mode is that if there is a large number of small partitions, the grouping can result in the age-old error of "Too many open files".

My system's file descriptor limit was low at 4096, and increasing it fixed the issue, but it's one of those things that could go unnoticed until you get a day with small sizes across many partitions. I had about 1000 partitions, with 20 columns per table.
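
For anyone hitting the same thing, the limit can be checked (and raised, up to the hard limit) from a process launcher; a minimal Python sketch using the standard resource module, with 65536 as an illustrative target:

```python
# Inspect and raise the per-process open-file limit (the same limit
# `ulimit -n` reports in the shell). The 65536 target is illustrative;
# raising the soft limit above the hard limit requires privileges.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")  # e.g. soft=4096

# Raise the soft limit as far as the hard limit allows:
target = 65536 if hard == resource.RLIM_INFINITY else min(65536, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

For a persistent change the limit usually needs to be set in the service manager or /etc/security/limits.conf rather than per process.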

James Massey

Jun 12, 2023, 4:03:54 AM
to AquaQ kdb+/TorQ
Hi Jeremy, 

Thanks for the information on this. We are currently looking into the issue on TorQ; see link below.

Thanks,
James

Jeremy Lucid

Jun 12, 2023, 9:05:54 AM
to James Massey, AquaQ kdb+/TorQ
OK, brilliant, thanks for the update.
