Neo4j Lucene index lookup performance when memory-constrained


Zongheng Yang

Jul 28, 2015, 8:33:14 PM
to Neo4j, Mattias Persson
Hi Neo4j devs,

My application does the following: it continuously performs Lucene index lookups, then loops over the resulting nodes and collects their IDs:

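        // Index-backed lookup by label and property; assumes an open transaction.
        Set<Long> userIds = new HashSet<Long>();
        // try-with-resources closes the iterator even if the loop exits early.
        try (ResourceIterator<Node> nodes = graphDb.findNodes(
                label, "name" + attr, search)) {
            while (nodes.hasNext()) {
                userIds.add(nodes.next().getId());
            }
        }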

Environment. Linux box, 15GB RAM, 2GB JVM heap. The Neo4j store files total 29GB on-disk; the Lucene indexes total 6GB. Using Neo4j 2.2 embedded; cache_type is set to none.

Symptom 1. When the Neo4j page cache size (dbms.pagecache.memory) is set low enough (<= 8.5GB), hence leaving enough space for the Lucene indexes, the latency looks good enough.

Symptom 2. However, when it is set slightly larger (9.5GB or 10GB), the following starts to happen during the queries: constant high IO wait; the OS constantly reads in tens of MBs; and the Java process shows a constant stream of 3k+ major page faults (maj_flt). It seems as if the index pages could not evict the Neo4j pages, or in other words, as if the index pages were being LRU-cached independently. The CPU is constantly waiting for IO to bring in pages (most likely all Lucene pages, I'd guess) before it can do any work (1% usr usage every ~10 seconds).

This is very surprising to me. Even in a memory-constrained case like this, I'd expect the Lucene indexes to compete against, and eventually win over, the Neo4j store pages (brought into memory by the full warmup done at start time) in the OS page cache, so that the high IO would occur initially but drop to none later (5.8GB of indexes should fit comfortably in 15GB of RAM).

Could someone explain why the above would be happening?

Zongheng


Chris Vest

Jul 29, 2015, 3:31:06 AM
to ne...@googlegroups.com
Neo4j has its own page cache, the size of which is controlled by dbms.pagecache.memory. This page cache is not backed by the OS page cache, and it is not used for Lucene’s memory mapping. Lucene does its own independent IO, and thus benefits from the index files fitting in the OS page cache.

--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]
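
Editor's note: the arithmetic behind this answer can be sketched roughly as follows. This is only a back-of-the-envelope illustration using the figures from the original post (15GB RAM, 2GB heap, 5.8GB of indexes); the class and method names are hypothetical, and the calculation assumes that JVM non-heap overhead and the kernel's own memory use are negligible, which they are not in practice.

```java
public class MemoryBudget {
    /**
     * RAM left over for the OS page cache once the JVM heap and the Neo4j
     * page cache are subtracted. Simplifying assumption: nothing else
     * (JVM non-heap memory, kernel, other processes) uses memory.
     */
    public static double osCacheHeadroomGb(double ramGb, double heapGb, double pageCacheGb) {
        return ramGb - heapGb - pageCacheGb;
    }

    public static void main(String[] args) {
        double ramGb = 15.0;    // from the original post
        double heapGb = 2.0;    // from the original post
        double luceneGb = 5.8;  // index size quoted in the original post

        // The three dbms.pagecache.memory settings mentioned in the thread.
        for (double pageCacheGb : new double[] {8.5, 9.5, 10.0}) {
            double headroom = osCacheHeadroomGb(ramGb, heapGb, pageCacheGb);
            System.out.printf(
                "pagecache=%.1fGB -> OS cache headroom=%.1fGB (%.0f%% of the Lucene index)%n",
                pageCacheGb, headroom, 100.0 * headroom / luceneGb);
        }
    }
}
```

Under these assumptions the headroom is 4.5GB with an 8.5GB page cache and 3.5GB with a 9.5GB page cache. Neither covers the full index, so presumably only the frequently used part of the index stays resident in the good configuration; the thread itself does not confirm this.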



Zongheng Yang

Jul 29, 2015, 3:56:25 AM
to ne...@googlegroups.com
Hmm, I thought Neo4j's page cache uses java.nio.ByteBuffer, which eventually does mmap, and hence it is technically backed by the OS page cache?  I understand all of your other points.


Chris Vest

Jul 29, 2015, 5:14:53 AM
to ne...@googlegroups.com
Nope, we do our own page caching: we allocate native memory and swap pages back and forth between memory and files, depending on the fluctuating needs of the system. It has been like this since 2.2. We have our own Page, PagedFile and PageCursor concepts, which we use where one would normally use FileChannel and ByteBuffer.
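
Editor's note: the distinction being drawn here can be illustrated with plain JDK calls. The sketch below is not Neo4j's implementation and does not use its Page, PagedFile or PageCursor types; the class name, method names and page size are made up for illustration, and a direct ByteBuffer stands in for natively allocated memory. The first method memory-maps a file, so the bytes are served from the OS page cache. The second copies a page into memory that the process owns, which the OS accounts as ordinary application memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedVsOwnBuffer {
    public static final int PAGE_SIZE = 8192; // illustrative page size (assumption)

    /** Memory-maps the file: reads are served from pages in the OS page cache. */
    public static byte firstByteViaMmap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return mapped.get(0);
        }
    }

    /** Copies one page into an off-heap buffer owned by this process. */
    public static byte firstByteViaOwnBuffer(Path file) throws IOException {
        ByteBuffer page = ByteBuffer.allocateDirect(PAGE_SIZE);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Positional reads until the page is full or the file ends.
            while (page.hasRemaining()) {
                int n = ch.read(page, page.position());
                if (n <= 0) {
                    break;
                }
            }
        }
        // The copy stays in the buffer for as long as the process keeps it,
        // independent of what the OS page cache later evicts.
        return page.get(0);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("store", ".db");
        file.toFile().deleteOnExit();
        byte[] data = new byte[PAGE_SIZE];
        data[0] = 42;
        Files.write(file, data);

        System.out.println(firstByteViaMmap(file));
        System.out.println(firstByteViaOwnBuffer(file));
    }
}
```

Both calls return the same byte; the difference is only in who owns the memory holding it. That ownership is why, per the answer above, memory given to dbms.pagecache.memory is not available to the OS page cache that Lucene depends on.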




Zongheng Yang

Jul 30, 2015, 4:24:27 PM
to ne...@googlegroups.com
Thanks, Chris!  Basically, the Neo4j page cache is malloc'd through Unsafe on native memory, and counts as normal application memory from the OS's perspective; is this right?

I also found the doc you initially committed [1], which has been helpful.  Leaving it here for others.

Chris Vest

Jul 31, 2015, 4:59:49 AM
to ne...@googlegroups.com

That’s right.