Looking for examples on how to use static/dynamic claiming correctly



Varun Gandhi


Dec 8, 2022, 10:06:12 AM

to Kythe

Hey folks,

I was experimenting with C++ extraction, and one problem I ran into is that indexes can get very large, from a few hundred MB to over 1 GB for a single TU. On my desktop machine, this caused my SSD to overheat.

I looked into using the claiming mechanism, but I haven't been able to get either static or dynamic claiming to work. Neither seems to be well documented, and the static_claim tool isn't bundled with the standard distribution. Here's what I tried:

For static claiming:

- What I did:

Ran extraction using:

KYTHE_CORPUS=llvm-project \
  KYTHE_ROOT_DIRECTORY=/home/varun/llvm-project \
  KYTHE_OUTPUT_DIRECTORY=/home/varun/kythe-output \
  KYTHE_DIR=/home/varun/kythe-v0.0.60 \
  bash -c '$KYTHE_DIR/tools/runextractor compdb -extractor $KYTHE_DIR/extractors/cxx_extractor'

Then I tried running the static_claim tool after building it from source:

ls kythe-output/*.kzip | ~/kythe/bazel-bin/kythe/cxx/tools/static_claim --show_stats

- The result: The number of claimants shown is just 1, even though there are 4,500+ TUs, because all the units have the same v_name(). Is this because I didn't pass KYTHE_VNAMES when running the extractor? It's documented as "Optional", so I assumed it wasn't necessary.

Later, when I ran the indexer, the index size was unchanged from the run without claiming, which suggests that claiming isn't taking effect.
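To make the claimant count concrete, here is a minimal Python sketch of the grouping effect, assuming (hypothetically) that static_claim treats units with equal VNames as a single claimant. The `VName` tuple, its field values, and the `group_by_claimant` helper are illustrative stand-ins, not Kythe's actual data structures or code.

```python
from collections import defaultdict
from typing import NamedTuple


class VName(NamedTuple):
    """Hypothetical, simplified stand-in for a Kythe VName."""
    signature: str
    corpus: str
    root: str
    path: str
    language: str


def group_by_claimant(units):
    """Group (unit_vname, main_source) pairs by unit VName.

    Units whose VNames compare equal land in the same bucket,
    so together they count as one claimant.
    """
    claimants = defaultdict(list)
    for unit_vname, main_source in units:
        claimants[unit_vname].append(main_source)
    return claimants


# Every unit carries an identical VName (hypothetical field values).
same = VName(signature="", corpus="llvm-project", root="", path="", language="c++")
units = [(same, f"file_{i}.cpp") for i in range(4500)]
print(len(group_by_claimant(units)))  # 1

# Making each unit's VName distinct (here via its main source path)
# yields one claimant per unit.
distinct = [(same._replace(path=src), src) for _, src in units]
print(len(group_by_claimant(distinct)))  # 4500
```

If the real tool behaves like this, distinct unit VNames would be the thing to look for before expecting claiming to split work across TUs.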

For dynamic claiming (this was tested separately):

1. Started a memcached server on the required port.

2. Passed the --experimental_dynamic_claim_cache=--SERVER=localhost:11211 flag when running the indexer.

I monitored the memcached server's memory usage as well as the index sizes, and didn't notice a change in either.
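A more direct check than memory usage might be the server's own counters. The Python sketch below speaks memcached's plain-text `stats` command; `parse_stats` and `fetch_stats` are hypothetical helper names, and it assumes the server from step 1 is listening on localhost:11211.

```python
import socket


def parse_stats(reply: str) -> dict:
    """Parse a memcached text-protocol 'stats' reply into a dict."""
    stats = {}
    for line in reply.splitlines():
        # Each stat line looks like: "STAT <name> <value>"
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host: str = "localhost", port: int = 11211, timeout: float = 5.0) -> dict:
    """Send 'stats' to a memcached server and read until the END line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            data += chunk
    return parse_stats(data.decode("ascii", errors="replace"))


# Offline demonstration on a canned reply (no server needed).
sample = "STAT pid 1234\r\nSTAT curr_items 0\r\nSTAT cmd_set 0\r\nEND\r\n"
print(parse_stats(sample)["curr_items"])  # 0
```

Calling `fetch_stats()` before and after an indexer run and comparing `total_connections`, `cmd_get`, and `cmd_set` should show whether the indexer reaches the server at all. If none of them move, that would point at the connection or the flag rather than at the claiming logic itself.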

Any pointers on what I'm missing, or how I should debug further to get claiming working? Also, what kind of index size reduction can I expect from either form of claiming? My expectations are quite high because of the ~100x difference between SLOC and the size of the preprocessed files, but I'm wondering if that's misplaced.
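A rough way to sanity-check the ~100x expectation: under the simplifying (and hypothetical) assumption that index size is proportional to the bytes of source indexed, perfect claiming turns "sum over every TU's preprocessed input" into "sum over unique files". With N TUs that each have s bytes of their own source plus h bytes of headers shared by all of them, the ratio is N(s + h) / (Ns + h). This approaches (s + h) / s, the preprocessing blow-up, only as N grows and every header is shared. The `ideal_claiming_ratio` function below is an illustrative sketch of that arithmetic, not anything Kythe provides.

```python
def ideal_claiming_ratio(tu_files, file_sizes):
    """Upper-bound size reduction if each file is indexed exactly once.

    tu_files: one set of file names per translation unit.
    file_sizes: size (any unit) per file name.
    Assumes output size is proportional to source size (a simplification).
    """
    # Without claiming, every TU re-indexes every file it includes.
    without_claiming = sum(file_sizes[f] for files in tu_files for f in files)
    # With perfect claiming, each distinct file is indexed once.
    unique_files = set().union(*tu_files)
    with_claiming = sum(file_sizes[f] for f in unique_files)
    return without_claiming / with_claiming


# Toy example: three TUs, each with a 1 kB .cpp file plus a shared 50 kB header.
sizes = {"a.cpp": 1, "b.cpp": 1, "c.cpp": 1, "common.h": 50}
tus = [{"a.cpp", "common.h"}, {"b.cpp", "common.h"}, {"c.cpp", "common.h"}]
print(round(ideal_claiming_ratio(tus, sizes), 2))  # 2.89
```

With 4,500 TUs and s = 1, h = 99, the same formula gives 450000 / 4599 ≈ 97.8, so ~100x is a best-case ceiling; headers shared by only a few TUs pull the real figure down.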