MvStore cache leak


Mark McKeown

Oct 23, 2020, 13:13:01
to h2-da...@googlegroups.com
Hi,
     We had an OOM event that seems to have been triggered by the MvStore cache. The heap dump contained many more objects than expected; these objects are stored in MvStore, so we would not expect to see so many of them in memory. The MvStore cache size is configured to 4 GB.

Looking at the heap dump, and in particular at the MvStore cache, I can see that the "usedMemory" of a Segment in the cache is a negative number, -8011050513476. This would mean the cache is not evicting entries, which would explain why there are many more objects in memory than expected.

The other segments in the cache also seem to have negative "usedMemory".
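
To illustrate why a negative counter would stop eviction, here is a minimal sketch. It is not the MvStore code: `SegmentSketch`, `needsEviction`, and the 256 MiB per-segment limit (4 GB split evenly over 16 segments) are assumptions made for illustration, as is the rule that a segment evicts only while its usage exceeds its limit.

```java
// Hypothetical model of one cache segment; not the actual H2 class.
final class SegmentSketch {
    final long maxMemory;   // eviction threshold for this segment, in bytes
    long usedMemory;        // running total of entry sizes, in bytes

    SegmentSketch(long maxMemory) {
        this.maxMemory = maxMemory;
    }

    // Assumed eviction rule: evict only while usage exceeds the limit.
    boolean needsEviction() {
        return usedMemory > maxMemory;
    }

    public static void main(String[] args) {
        SegmentSketch s = new SegmentSketch(256L * 1024 * 1024);
        s.usedMemory = -8011050513476L;        // value seen in the heap dump
        System.out.println(s.needsEviction()); // false
        s.usedMemory += 1L << 30;              // another 1 GiB of entries
        System.out.println(s.needsEviction()); // still false
    }
}
```

Under that assumed rule, a counter this far below zero would need terabytes of additions before the comparison became true again.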

I have attached a screenshot from VisualVM to illustrate what I am seeing. The heap is too big to calculate retained size.

Any pointers on next steps to find out what is causing this? The Segments in the cacheChunkRef have positive values for usedMemory.

cheers
Mark

--
MARK MC KEOWN ARCHITECT

mvstore.cache.leak.png

Noel Grandin

Oct 23, 2020, 14:15:44
to H2 Database
That looks like an integer overflow somewhere. 

I would tweak the source code in the places that modify that field, to dump stack traces, and work backwards.
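
A rough sketch of that kind of instrumentation follows. `TracedCounter` is a hypothetical stand-in for the real field and its update sites, not H2 code; the idea is only that each update which drives the total negative prints the call stack that caused it.

```java
// Hypothetical stand-in for the field being debugged; not H2 code.
final class TracedCounter {
    private long usedMemory;
    int suspiciousUpdates; // how many updates left the total negative

    void add(long delta) {
        long before = usedMemory;
        usedMemory += delta;
        if (usedMemory < 0) {
            suspiciousUpdates++;
            // A Throwable captures the current call stack without being thrown.
            new Throwable("usedMemory went negative: " + before
                    + " + " + delta + " = " + usedMemory).printStackTrace();
        }
    }

    long get() {
        return usedMemory;
    }

    public static void main(String[] args) {
        TracedCounter c = new TracedCounter();
        c.add(1024);
        c.add(-4096); // prints a stack trace pointing at this call
    }
}
```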

Andrei Tokar

Oct 23, 2020, 22:18:46
to H2 Database
I guess your next step could be to reduce the configured cache size to some reasonable number.
What makes you believe that you actually need such a huge cache?
If you have configured the MVStore cache size to be 4 GB, how can you expect the usedMemory field (which is a Java int) to always stay positive?
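
For reference, the arithmetic behind that question can be checked in a few lines. This assumes, as stated above, that the counter is an `int`; `IntLimitDemo` and `toInt` are names made up for the example.

```java
final class IntLimitDemo {
    // Narrowing a long to int keeps only the low 32 bits.
    static int toInt(long bytes) {
        return (int) bytes;
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;    // 4294967296
        System.out.println(Integer.MAX_VALUE);     // 2147483647
        System.out.println(toInt(fourGiB));        // 0
        System.out.println(Integer.MAX_VALUE + 1); // -2147483648
    }
}
```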

Mark McKeown

Oct 26, 2020, 04:14:55
to h2-da...@googlegroups.com
Thanks Noel, Andrei,
        The cache is split into 16 Segments; each Segment tracks memory used and max cache size using longs.

Looking at the code, the overflow would happen if the size of a page or a Value is over 2 GB, because the memory used is stored as an int. In the case of a page, it could also happen if two Values of 1 GB each are added to the same page: a page holding a single 1 GB Value cannot be split, and the second Value is added before the size is checked, which could cause the overflow.
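
A minimal sketch of the two-Values case, assuming the page's memory counter is a plain `int` that is simply incremented by each Value's size. `PageMemoryDemo`, `addPlain`, and `addChecked` are hypothetical helpers, not H2 methods.

```java
final class PageMemoryDemo {
    // Adds a value's size to a page's int memory counter with plain int math.
    static int addPlain(int pageMemory, int valueSize) {
        return pageMemory + valueSize;               // wraps silently on overflow
    }

    // Same addition, but fails loudly instead of wrapping.
    static int addChecked(int pageMemory, int valueSize) {
        return Math.addExact(pageMemory, valueSize); // throws ArithmeticException
    }

    public static void main(String[] args) {
        int oneGiB = 1 << 30;
        int page = addPlain(0, oneGiB);  // first 1 GiB value still fits
        page = addPlain(page, oneGiB);   // second one wraps around
        System.out.println(page);        // -2147483648
    }
}
```

Once the counter is negative, a check of the form "memory > max page size" no longer fires, so nothing downstream would notice the oversized page.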

I suspect someone has added very large values, and that this is what triggered the overflow.

cheers
Mark

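    case PUT: {
        value = decisionMaker.selectValue(result, value);
        p = p.copy();
        if (index < 0) {
            // the value is inserted before the page size is checked
            p.insertLeaf(-index - 1, key, value);
            int keyCount;
            // split while the page has too many keys, or is too large and
            // still has more than the minimum number of keys
            while ((keyCount = p.getKeyCount()) > store.getKeysPerPage()
                    || p.getMemory() > store.getMaxPageSize()
                    && keyCount > (p.isLeaf() ? 1 : 2)) {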

To view this discussion on the web visit https://groups.google.com/d/msgid/h2-database/eaf4273c-c2d7-41dc-b77b-5e76ff9c2aafn%40googlegroups.com.

Noel Grandin

Oct 26, 2020, 05:17:04
to h2-da...@googlegroups.com
We should either

(a) throw an exception if the cache size is so big we can't handle it

(b) use BigInteger or store the cache sizes in kilobytes or megabytes to avoid overflow.
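
The two options could look roughly like this. It is a sketch only: `CacheSizeOptions`, `validateBytes`, and `toKiB` are hypothetical names, and it assumes the counter stays an `int`.

```java
final class CacheSizeOptions {
    // Option (a): reject sizes that an int byte counter cannot represent.
    static int validateBytes(long cacheSizeBytes) {
        if (cacheSizeBytes < 0 || cacheSizeBytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "cache size out of range: " + cacheSizeBytes);
        }
        return (int) cacheSizeBytes;
    }

    // Option (b): track sizes in KiB, so an int covers sizes up to about 2 TiB.
    static int toKiB(long bytes) {
        return Math.toIntExact((bytes + 1023) / 1024); // rounds up
    }

    public static void main(String[] args) {
        long fourGiB = 4L << 30;
        System.out.println(toKiB(fourGiB)); // 4194304
        validateBytes(fourGiB);             // throws IllegalArgumentException
    }
}
```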

pwagland

Nov 1, 2020, 20:47:30
to H2 Database
Regarding (b), would using a `long` instead of an `int` be an option? A `long` has far less overhead than `BigInteger` and should handle even the craziest cache sizes that people will want to throw at the code!
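
As a quick illustration of the headroom a `long` gives, here is a small sketch. `LongCounterDemo` and `total` are hypothetical names, and `Math.addExact` is used only so that an overflow would fail loudly rather than wrap.

```java
final class LongCounterDemo {
    // Sums entry sizes in a long; throws ArithmeticException if even that overflows.
    static long total(long... sizes) {
        long sum = 0;
        for (long s : sizes) {
            sum = Math.addExact(sum, s);
        }
        return sum;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        // Four 1 GiB entries: already too large for an int, trivial for a long.
        System.out.println(total(oneGiB, oneGiB, oneGiB, oneGiB)); // 4294967296
        System.out.println(Long.MAX_VALUE);                        // 9223372036854775807
    }
}
```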