--
You received this message because you are subscribed to the Google Groups "Kythe" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kythe+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kythe/88effce7-b8f6-41d3-82ea-ba5d208f993do%40googlegroups.com.
'Evan Martin' via Kythe <ky...@googlegroups.com> wrote (Wed, Jun 10, 2020, 11:24 AM):

This doesn't answer your actual question, but just in case it's helpful: Chrome's code search has the v8 code xref'ed via Kythe.

On Wed, Jun 10, 2020 at 11:15 AM <tyle...@gmail.com> wrote:
> I have been attempting to use Kythe to index a large source tree. As an example, I am trying it out on Google's V8 JavaScript engine.
>
> First off, I had a little bit of difficulty setting up the extraction. For example, Kythe did not understand what to do with compiler commands that included response files with the @./response_file syntax, and would simply error out. My understanding is that projects like V8 and Chromium are set up for indexing/extraction with Kythe (which is part of why I was using V8 as a test), but I couldn't find any information on that. It might be a helpful piece of documentation for those trying to integrate Kythe with similarly large-scale projects. I've mostly been following https://kythe.io/examples to try to understand how to do things, but it seems likely I've been messing something up.
>
> Eventually I got extraction working and ended up with about 2GB of kzip files, which seems pretty reasonable. I then merged these with the merge tool, which produced a shockingly small ~90MB kzip file.
>
> Now I've gone to run /opt/kythe/indexers/cxx_indexer to produce... well, I'm not exactly sure what yet. After running out of free disk space the first time, I restarted cxx_indexer and, rather than writing to disk, piped its output directly to /opt/kythe/tools/write_entries, with pv in between so I get a little bit of introspection. It's been running for over 20 hours now and has output about 440GB(!) of data to write_entries, which in turn has so far produced a pretty reasonable 4GB database.
>
> So I guess my questions are:
>
> 1) In general, how does one run Kythe on large-scale projects? Is there some clean/easy interface I have missed? Even if it requires integration work for other projects, just seeing how Kythe integrates with V8 or Chromium as an example would be useful.
>
> 2) What should I be expecting from cxx_indexer? How much data should I expect it to output from a large extraction? Are several hundred gigabytes normal?
>
> 3) How long should one expect Kythe to take to index a large-scale project? Are dozens of hours normal? Is there some option, setting, or alternate way to use the data that would make it faster?
>
> I appreciate Kythe's interface and accuracy compared to other indexing tools, so I would love to use it. I assume I messed something up on my end that is causing it to be so slow, but I'm not sure what it could be.
>
> Cheers,
> Tyler
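For concreteness, the merge step Tyler describes might look roughly like the sketch below. The /opt/kythe paths follow the thread's release layout, but the exact `kzip merge` flag spelling is an assumption worth checking against `kzip --help`; the script is guarded so it is a no-op where the tool isn't installed.

```shell
# Sketch: merge per-compilation kzips into a single archive.
KZIP=/opt/kythe/tools/kzip
if [ -x "$KZIP" ]; then
  # Merging dedups shared dependencies across compilation units, which is
  # why ~2GB of per-unit archives can shrink to a ~90MB merged archive.
  "$KZIP" merge --output merged.kzip kzips/*.kzip
else
  echo "kzip tool not found at $KZIP; skipping merge"
fi
```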
Shahms wrote:

On Wed, Jun 10, 2020 at 11:24 AM 'Evan Martin' via Kythe <ky...@googlegroups.com> wrote:
> This doesn't answer your actual question, but just in case it's helpful: Chrome's code search has the v8 code xref'ed via Kythe.

On Wed, Jun 10, 2020 at 11:15 AM <tyle...@gmail.com> wrote:
> First off, I had a little bit of difficulty setting up the extraction. For example, Kythe did not understand what to do with compiler commands that included response files with the @./response_file syntax, and would simply error out. [...]

Whether or not @-style parameter files are supported (and which syntax; there is no single standard for such files) depends a great deal on which extractor you're using. Which extractor were you having issues with?

> Eventually I got extraction working and ended up with about 2GB of kzip files, which seems pretty reasonable. I then merged these with the merge tool, which ended up producing a shockingly small ~90MB kzip file. [...] It's been running for over 20 hours now, and has output about 440GB(!) of data to write_entries, which in turn so far has produced a pretty reasonable 4GB database.

2GB -> 90MB seems pretty reasonable. The bulk of the individual .kzip files is generally the dependencies, and those are frequently heavily duplicated between compilation units. As a result, merging them reduces the size dramatically.

C++ takes a long time to index and generally results in a *lot* of duplication, for a variety of reasons, but mostly templates. Asking the C++ indexer to index a single large kzip with many compilations will serialize indexing and be very slow. You can speed things up by invoking the indexer in parallel on the individual compilation units (rather than merging them) and combining the output. Additionally, there are a variety of flags available on the C++ indexer itself which can help dramatically, the biggest being the claiming flags (particularly --cache and --experimental_dynamic_claim_cache) and the template flags (especially --experimental_alias_template_instantiations). Some of these flags do reduce or simplify the output graph somewhat, though. Unfortunately, the flags themselves are a bit scattered around the code.

> 1) In general, how does one run Kythe on large-scale projects? Is there some clean/easy interface I have missed? Even if it requires integration work for other projects, just seeing how Kythe integrates with V8 or Chromium as an example would be useful.

See above (I'm not sure what I'm allowed to discuss about the internal deployment, so apologies for being vague).

> 2) What should I be expecting from cxx_indexer? How much data should I expect it to output from a large extraction? Are several hundred gigabytes normal?

Yes, especially if you aren't using claiming and are exhaustively indexing templates.

> 3) How long should one expect Kythe to take to index a large-scale project? Are dozens of hours normal? Is there some option, setting, or alternate way to use the data that would make it faster?

There are many that can help, but the biggest are likely to be claiming and parallelization.

--Shahms
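A minimal sketch of the parallel per-unit indexing suggested above, assuming the /opt/kythe release layout mentioned earlier in the thread (binary locations and the graphstore path are assumptions; adjust to your install). GNU parallel is used rather than xargs because it buffers each job's output, so concurrent indexers don't interleave their entry streams:

```shell
# Sketch: index un-merged compilation units in parallel, dedup the combined
# entry stream, and write the graphstore once. Guarded so this is a no-op
# where Kythe isn't installed.
IDX=/opt/kythe/indexers/cxx_indexer
if [ -x "$IDX" ]; then
  # Add --cache="--SERVER=localhost:11211" here for memcached-backed claiming.
  find kzips -name '*.kzip' \
    | parallel -L1 "$IDX" --ignore_unimplemented \
    | /opt/kythe/tools/dedup_stream \
    | /opt/kythe/tools/write_entries -workers 8 -graphstore ./graphstore
else
  echo "cxx_indexer not found at $IDX; nothing to run"
fi
```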
Hi Tyler!

I'm also "struggling" with C++, in that I'm trying to manage an extraction. Let me share my commands so far (many inspired by kythe/release/kythe.sh in the repo). I didn't know about the merge tool; I'll try it (maybe I wouldn't need the memcached-based claiming?).

Extraction:
- Run bazel with "--output_base /mnt/data/kythe/output_base" so your home directory doesn't run out of disk space. Also, your cache won't get trashed if you change compiler options (if you keep the output_base separate... maybe).
- Also run with "--bazelrc=/opt/kythe/extractors.bazelrc", maybe after commenting out mistyped extractor names and disabling the proto toolchains (which don't seem to work for me even after fiddling, but maybe it is just me).
- find /mnt/data/kythe/output_base/execroot/io_kythe/bazel-out/k8-opt/extra_actions/ -name '*.cxx.kzip' | sort > units

Indexing:
- Start "memcached" after apt-getting it.
- time cat units | parallel --gnu --tmpdir /mnt/data/tmp -L1 ./bazel-bin/kythe/cxx/indexer/cxx/indexer --ignore_unimplemented --experimental_index_lite --experimental_dynamic_claim_cache="--SERVER=localhost:11211" -cache="--SERVER=localhost:11211" -cache_stats | ./bazel-bin/kythe/go/platform/tools/dedup_stream/dedup_stream | ./bazel-bin/kythe/go/storage/tools/write_entries/write_entries -workers 12 -graphstore /mnt/data/kythe/kythe_repo/gs2

Serving tables:
- You will struggle. The Beam pipeline may work on GCE, but not (efficiently) locally, and not yet on Flink. The legacy pipeline will still be somewhat slow (but at least it uses concurrent disk sorting? Maybe I'm wrong.)
- I'm working on some code for an alternative serving representation as a hobby. Others on the list have mentioned other improvements in the pipeline.
- UI: well, there's no good UI. My own project Underhood (https://github.com/TreeTide/underhood) is going through a facelift in the robinp-uispeed branch, but I wouldn't advertise it as production-ready (or anything-ready). We'll see whether the alternative serving representation helps with iterating on it.
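The indexing one-liner above is easier to read broken into stages (same commands and flags, reflowed only; it assumes a bazel-built Kythe tree, GNU parallel, and a memcached running on localhost:11211, as in the commands above):

```shell
# The indexing pipeline from above, one stage per line; identical flags.
# Guarded so this is a no-op without a bazel-built Kythe tree.
INDEXER=./bazel-bin/kythe/cxx/indexer/cxx/indexer
if [ -x "$INDEXER" ]; then
  cat units \
    | parallel --gnu --tmpdir /mnt/data/tmp -L1 \
        "$INDEXER" \
        --ignore_unimplemented \
        --experimental_index_lite \
        --experimental_dynamic_claim_cache="--SERVER=localhost:11211" \
        -cache="--SERVER=localhost:11211" \
        -cache_stats \
    | ./bazel-bin/kythe/go/platform/tools/dedup_stream/dedup_stream \
    | ./bazel-bin/kythe/go/storage/tools/write_entries/write_entries \
        -workers 12 -graphstore /mnt/data/kythe/kythe_repo/gs2
else
  echo "indexer not built; build Kythe with bazel first"
fi
```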