Hi,
I'm trying to use Kythe on a large C++ project with multiple third-party libraries, and I've run into a few issues with the size and contents of the Kythe output.
I'm running extraction using:
${KYTHE_DIR}/tools/runextractor cmake -extractor=${KYTHE_DIR}/extractors/cxx_extractor -sourcedir=${CMAKE_PROJECT_DIR}
This produces a large but reasonable output of ~3300 kzip files totaling 9.1 GB, which is in line with the number of C++ source files in the project. Extraction is also relatively fast, especially compared to indexing.
I've got some problems when it comes to indexing. Running:
${KYTHE_DIR}/indexers/cxx_indexer --ignore_unimplemented -- ${KYTHE_OUTPUT_DIRECTORY}/*.kzip >> entries
produces a massive output of over 3 TB, most of it duplicates that I then tried to remove (either with a custom script or using /tools/entrystream).
Indexing the kzips in batches (e.g. 100 files at a time) and removing duplicates within each batch keeps the disk space used by indexing manageable, but the process is painfully slow.
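For reference, the batching I describe is roughly the sketch below. It is a simplified illustration rather than my exact script: the function name index_in_batches is made up for this post, and the DEDUP command is a placeholder (I am assuming a tool that reads an entry stream on stdin and writes the deduplicated stream to stdout, such as tools/dedup_stream; substitute whatever you actually use). Duplicates are only removed within a batch, so entries repeated across batches still end up in the output.

```shell
#!/usr/bin/env bash
# Sketch: index kzips in fixed-size batches and deduplicate each batch
# before appending it to the output file.
# INDEXER and DEDUP are assumptions; override them to match your setup.
INDEXER=${INDEXER:-"${KYTHE_DIR}/indexers/cxx_indexer --ignore_unimplemented --"}
DEDUP=${DEDUP:-"${KYTHE_DIR}/tools/dedup_stream"}

# index_in_batches BATCH_SIZE OUTPUT_FILE KZIP...
index_in_batches() {
  local batch_size=$1 out=$2
  shift 2
  while [ "$#" -gt 0 ]; do
    # Take up to batch_size files from the front of the argument list.
    local batch=("${@:1:batch_size}")
    shift "${#batch[@]}"
    # INDEXER and DEDUP are left unquoted on purpose so that their
    # arguments are split into separate words.
    $INDEXER "${batch[@]}" | $DEDUP >> "$out"
  done
}
```

Usage would be something like: index_in_batches 100 entries ${KYTHE_OUTPUT_DIRECTORY}/*.kzip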
To help with the size of the output I have a few questions:
1. Is there a way to exclude specific paths/directories from extraction/indexing? Most of the entries come from third-party libraries that live in one directory, so excluding them would help, as I don't need a graph for those.
2. I found some caching parameters in the cxx_indexer documentation, but I'm struggling to find details on how to use them and how they actually work. Could caching help with this specific issue? Can you point me to any examples of how to use it?
3. Is there a way to exclude specific nodes or edges from the output?
Please let me know if you have any other suggestions to speed up the process and/or use fewer resources.
Regards,
Filip Szewczyk