Hey folks,
I've been experimenting with C++ extraction, and one problem I ran into is that indexes can get very large: from a few hundred MB to 1GB+ for a single TU. On a desktop machine, this caused my SSD to overheat.
I was looking into using the claiming mechanism, but haven't been able to get either the static or dynamic claiming to work, and they don't seem to be well-documented (and the static_claim tool isn't bundled with the standard distribution). I've tried the following:
For static claiming:
- What I did:
  Ran extraction using:
    KYTHE_CORPUS=llvm-project KYTHE_ROOT_DIRECTORY=/home/varun/llvm-project KYTHE_OUTPUT_DIRECTORY=/home/varun/kythe-output KYTHE_DIR=/home/varun/kythe-v0.0.60 bash -c '$KYTHE_DIR/tools/runextractor compdb -extractor $KYTHE_DIR/extractors/cxx_extractor'
  Then I tried running the static_claim tool after building it from source:
    ls kythe-output/*.kzip | ~/kythe/bazel-bin/kythe/cxx/tools/static_claim --show_stats
- The result: the number of claimants is reported as just 1, even though there are 4500+ TUs. This is because all the units have the same v_name(). Is that because I didn't pass KYTHE_VNAMES when running the extractor? It's documented as "Optional", so I assumed it wasn't necessary.
Later, when I ran the indexer, the index size was unchanged compared to a run without claiming, which suggests that claiming isn't taking effect.
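To make the symptom concrete, here is my mental model of why identical unit VNames would collapse to a single claimant. This is only a toy sketch in Python, not Kythe's actual algorithm: assign_claims, count_claimants, and the (corpus, path) tuples are hypothetical names I made up for illustration, and the real static_claim tool may balance claims differently.

```python
from collections import defaultdict

def assign_claims(units):
    """Toy model of static claiming (an assumption, not Kythe's code).

    units: list of (unit_vname, [required_input_paths]).
    Returns {input_path: claimant_vname}, handing each input to the
    least-loaded distinct VName among the units that require it.
    """
    # Collect the set of distinct VNames that could claim each input.
    candidates = defaultdict(set)
    for vname, inputs in units:
        for path in inputs:
            candidates[path].add(vname)

    load = defaultdict(int)  # number of inputs already given to each VName
    claims = {}
    for path in sorted(candidates):
        # Pick the candidate with the fewest claims so far (ties: sorted order).
        owner = min(sorted(candidates[path]), key=lambda v: load[v])
        claims[path] = owner
        load[owner] += 1
    return claims

def count_claimants(units):
    """Number of distinct VNames that ended up owning at least one input."""
    return len(set(assign_claims(units).values()))

# Two TUs that share one VName (what I seem to be getting today).
same_vname = ("llvm-project", "")
same_units = [
    (same_vname, ["a.h", "b.h"]),
    (same_vname, ["a.h", "c.h"]),
]

# The same two TUs, but with a per-TU path in the VName.
distinct_units = [
    (("llvm-project", "x.cc"), ["a.h", "b.h"]),
    (("llvm-project", "y.cc"), ["a.h", "c.h"]),
]
```

In this model, count_claimants(same_units) is 1 no matter how many TUs there are, because the claimant is keyed on the VName, while count_claimants(distinct_units) is 2. If the real tool behaves similarly, nothing would get split across TUs in my setup.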
For dynamic claiming (this was tested separately):
1. Started a memcached server at the required port.
2. Passed the --experimental_dynamic_claim_cache=--SERVER=localhost:11211 flag when running the indexer.
In this case, I watched the memory usage of the memcached server as well as the size of the indexes. I didn't notice a change in either.
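A more direct check than memory usage would be to compare memcached's own counters before and after an indexer run. Below is a minimal Python sketch that uses the standard memcached text protocol "stats" command; the helper names (parse_stats, fetch_stats, claim_activity) are mine, and I'm assuming that the indexer's claim traffic would show up in cmd_get, cmd_set, and curr_items.

```python
import socket

def parse_stats(raw):
    """Parse the text reply to memcached's "stats" command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        # Data lines look like: STAT <name> <value>
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats

def fetch_stats(host="localhost", port=11211, timeout=2.0):
    """Send "stats" to a memcached server and return the parsed counters."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        # The reply is terminated by a line containing only END.
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))

def claim_activity(before, after, keys=("cmd_get", "cmd_set", "curr_items")):
    """Difference in selected counters between two stats snapshots."""
    return {k: int(after.get(k, 0)) - int(before.get(k, 0)) for k in keys}

# Intended usage (needs a running server, so left as comments):
#   before = fetch_stats()
#   ... run the indexer with --experimental_dynamic_claim_cache ...
#   after = fetch_stats()
#   print(claim_activity(before, after))
```

If all three deltas stay at zero after indexing a few units, the indexer is presumably never talking to the server at all, which would point to a flag or connection problem rather than to claiming being ineffective.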
Any pointers on what I'm missing, or on how I should debug further to get claiming to work? What kind of index size reduction can I expect from either form of claiming? My current expectation is quite high because of the ~100x difference between SLOC and the size of the pre-processed files, but I'm wondering if that is misplaced.
Varun