Generated proto code

John Edmonds

Jun 3, 2020, 10:14:22 AM

to Kythe

This document https://www.kythe.io/docs/schema/indexing-generated-code.html describes how proto code generation can work with other languages. It mentions a use case where users browsing source code might want to jump from generated code (e.g., calls to proto getters/setters in Java) directly to the code it was generated from (e.g., the .proto files). I'm looking to implement this use case.

The document says that it's left up to Kythe clients to interpret and follow the edges. Does "Kythe clients" include http_server/write_tables (i.e., should the Kythe serving pipeline ultimately take care of following the edges)? Or would it make more sense for external clients like me to add a post-processing step that outputs /kythe/edge/ref edges (or otherwise manipulates the resulting graph to work well with write_tables/http_server) by following the generates edge from the Java semantic node back to the proto anchor node?
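
One way to read "follow the generates edge back to the proto anchor" is as a two-hop reverse lookup: from the Java semantic node, go backwards over generates to the proto semantic node, then backwards over defines/binding to the proto anchor. Below is a minimal Python sketch, assuming the edges have already been decoded into (source, kind, target) triples and that generates points from the proto node to the generated Java node. The node strings such as "proto:Person" are hypothetical stand-ins for real VNames.

```python
from typing import Iterable, List, NamedTuple

GENERATES = "/kythe/edge/generates"
DEFINES_BINDING = "/kythe/edge/defines/binding"


class Edge(NamedTuple):
    source: str  # simplified stand-in for a Kythe VName (hypothetical format)
    kind: str    # edge kind, e.g. "/kythe/edge/generates"
    target: str


def proto_anchors_for(java_node: str, edges: Iterable[Edge]) -> List[str]:
    """Return the anchors that define the node(s) which generated java_node."""
    edges = list(edges)
    # Hop 1: reverse generates, i.e. which semantic nodes generated java_node?
    generators = {e.source for e in edges
                  if e.kind == GENERATES and e.target == java_node}
    # Hop 2: reverse defines/binding, i.e. which anchors define those nodes?
    return sorted({e.source for e in edges
                   if e.kind == DEFINES_BINDING and e.target in generators})


# Hypothetical sample graph: person.proto defines Person, which generates a Java class.
SAMPLE_EDGES = [
    Edge("anchor:person.proto[8,14)", DEFINES_BINDING, "proto:Person"),
    Edge("proto:Person", GENERATES, "java:Person"),
    Edge("anchor:Main.java[120,126)", "/kythe/edge/ref", "java:Person"),
]

if __name__ == "__main__":
    # prints ['anchor:person.proto[8,14)']
    print(proto_anchors_for("java:Person", SAMPLE_EDGES))
```

A UI could use a lookup like this to offer a "jump to .proto definition" action from a Java reference; a real client would query an index instead of scanning every edge.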

Robin Palotai

Jun 3, 2020, 11:29:02 AM

to John Edmonds, Kythe

AFAIU write_tables would make the connection.

Note that the 'imputes' edge might be easier to start with.

I don't know whether the legacy and Beam pipelines behave the same way in this regard.

Note that for large (?) codebases the Beam pipeline is practically unusable: the local executor is slow, and the Flink portable runner doesn't work yet. GCE might work, but that's quite a lock-in.

Robin

--
You received this message because you are subscribed to the Google Groups "Kythe" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kythe+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kythe/59537097-12ea-4947-a7ea-a1bf9f76aab2%40googlegroups.com.

John Edmonds

Jun 3, 2020, 12:01:11 PM

to Kythe

Yeah, that's what I thought, but write_tables doesn't seem to follow the edges, and the Java indexer only outputs generates edges, not imputes edges. I'm guessing I have to compute the imputes edges in my own post-processing step after extraction/indexing. Or maybe it could be part of the pipeline, and I could send patches to add it.
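
If the imputes edges are computed in a separate post-processing step, the core of that step is a join: for every `P generates J` and every `A defines/binding P`, emit `A imputes J`. Below is a minimal Python sketch, assuming decoded (source, kind, target) triples, that generates points from the proto node to the generated Java node, and that proto anchors bind their nodes with defines/binding. The node strings are hypothetical stand-ins for VNames, and serializing the derived edges back into Kythe entries is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set

GENERATES = "/kythe/edge/generates"
DEFINES_BINDING = "/kythe/edge/defines/binding"
IMPUTES = "/kythe/edge/imputes"


class Edge(NamedTuple):
    source: str  # simplified stand-in for a Kythe VName (hypothetical format)
    kind: str
    target: str


def derive_imputes(edges: Iterable[Edge]) -> List[Edge]:
    """For every `P generates J` and `A defines/binding P`, emit `A imputes J`."""
    edges = list(edges)
    # Index: semantic node -> anchors that define it (one pass over the graph).
    anchors_by_node: Dict[str, Set[str]] = defaultdict(set)
    for e in edges:
        if e.kind == DEFINES_BINDING:
            anchors_by_node[e.target].add(e.source)
    # Join each generates edge against the index; the set removes duplicates.
    derived = {
        Edge(anchor, IMPUTES, e.target)
        for e in edges if e.kind == GENERATES
        for anchor in anchors_by_node.get(e.source, ())
    }
    return sorted(derived)


# Hypothetical sample: one proto message generates two Java types.
SAMPLE_EDGES = [
    Edge("anchor:person.proto[8,14)", DEFINES_BINDING, "proto:Person"),
    Edge("proto:Person", GENERATES, "java:Person"),
    Edge("proto:Person", GENERATES, "java:PersonOrBuilder"),
    Edge("anchor:Person.java[30,36)", DEFINES_BINDING, "java:Person"),
]

if __name__ == "__main__":
    for edge in derive_imputes(SAMPLE_EDGES):
        print(edge.source, edge.kind, edge.target)
```

Building the index first keeps the join linear in the number of edges; at Beam scale the same join would be expressed as a keyed grouping on the proto node.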

I have a bunch of internal patches that make the write_tables Beam pipeline work well for our large codebase. I just have to find the time to do the paperwork to start releasing them.

Robin Palotai

Jun 3, 2020, 12:52:55 PM

to John Edmonds, Kythe

Hm, after grepping the Go code, it does seem that not much is done with generates (and even less with imputes). So my understanding was quite wrong.

> I'm guessing that I have to compute the imputes edges as part of my own post-processing step after the extraction/indexing.

'Imputes' sounds like a fully external workaround for when the more involved approach of shipping metadata for 'generates' is not feasible. 'generates' requires tight interaction between the proto code generator (which emits the metadata) and the indexer (which picks up the metadata and emits generates edges).
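
To make that division of labor concrete: the generator records which spans of the generated file came from which proto element, and the indexer, when it defines a node at one of those spans, emits a generates edge. Below is a simplified Python sketch of the indexer side. The `Annotation` and `Definition` types and the node strings are hypothetical. Real metadata (for example, protobuf's `GeneratedCodeInfo` annotations) identifies the proto element by source file and descriptor path rather than by a ready-made node, so a real indexer also has to resolve that to a VName.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

GENERATES = "/kythe/edge/generates"


class Annotation(NamedTuple):
    """Hypothetical, simplified metadata rule emitted by the code generator."""
    begin: int        # byte offset in the generated file
    end: int
    source_node: str  # proto semantic node the span was generated from


class Definition(NamedTuple):
    """A binding anchor the indexer found in the generated file."""
    begin: int
    end: int
    node: str         # semantic node defined at this span


def generates_edges(defs: Iterable[Definition],
                    annotations: Iterable[Annotation]) -> List[Tuple[str, str, str]]:
    """Emit `source_node generates node` wherever a definition's span matches an annotation."""
    by_span: Dict[Tuple[int, int], List[str]] = {}
    for a in annotations:
        by_span.setdefault((a.begin, a.end), []).append(a.source_node)
    return [(src, GENERATES, d.node)
            for d in defs
            for src in by_span.get((d.begin, d.end), [])]


# Hypothetical generated Java text and the spans involved.
GENERATED = "public final class Person { public String getName() { return name_; } }"
NAME_BEGIN = GENERATED.index("getName")
NAME_END = NAME_BEGIN + len("getName")
CLASS_BEGIN = GENERATED.index("Person")
CLASS_END = CLASS_BEGIN + len("Person")

ANNOTATIONS = [Annotation(NAME_BEGIN, NAME_END, "proto:Person.name")]
DEFINITIONS = [
    Definition(CLASS_BEGIN, CLASS_END, "java:Person"),  # no annotation, so no edge
    Definition(NAME_BEGIN, NAME_END, "java:Person.getName()"),
]

if __name__ == "__main__":
    print(generates_edges(DEFINITIONS, ANNOTATIONS))
```

Matching on exact spans is the simplest possible rule. It shows why both sides must agree on offsets: if the generator's annotations drift from the text the indexer sees, no generates edges come out.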

> I have a bunch of patches internally that make the write_tables beam pipeline work well for our large codebase.

Sounds interesting! Is it about reinstating concurrency? I was wondering about doing a PoC that loads the graphstore content into SQL tables and queries those directly (instead of precomputed tables). That could make iterating on exploratory/experimental batch queries faster.
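
A PoC along those lines could start small. The Python sketch below uses the standard-library `sqlite3` module with one table whose columns mirror the fields of a Kythe entry (source, edge_kind, target, fact_name, fact_value). Flattening VNames to strings, the table and index names, and the sample entries are all assumptions for illustration. A codebase large enough to need Beam would need a bigger database engine, but an ad hoc batch query, here one that derives imputes pairs, is just a self-join.

```python
import sqlite3
from typing import Iterable, List, Tuple

# (source, edge_kind, target, fact_name, fact_value), mirroring the fields of a
# Kythe entry; VNames are flattened to strings here for simplicity (assumption).
Entry = Tuple[str, str, str, str, bytes]

SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    source     TEXT NOT NULL,
    edge_kind  TEXT NOT NULL,  -- '' for node facts
    target     TEXT NOT NULL,  -- '' for node facts
    fact_name  TEXT NOT NULL,
    fact_value BLOB
);
CREATE INDEX IF NOT EXISTS entries_by_kind_target ON entries (edge_kind, target);
"""

# Ad hoc batch query: pair each anchor with the nodes generated from the
# semantic node it defines (generates edges joined with defines/binding edges).
IMPUTES_QUERY = """
SELECT DISTINCT d.source, g.target
FROM entries AS g
JOIN entries AS d ON d.target = g.source
WHERE g.edge_kind = '/kythe/edge/generates'
  AND d.edge_kind = '/kythe/edge/defines/binding'
ORDER BY d.source, g.target
"""


def load_entries(conn: sqlite3.Connection, entries: Iterable[Entry]) -> None:
    """Create the table and index if needed, then bulk-insert the entries."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", entries)
    conn.commit()


# Hypothetical sample entries (edges use fact_name "/" and an empty value).
SAMPLE: List[Entry] = [
    ("anchor:person.proto[8,14)", "/kythe/edge/defines/binding", "proto:Person", "/", b""),
    ("proto:Person", "", "", "/kythe/node/kind", b"record"),
    ("proto:Person", "/kythe/edge/generates", "java:Person", "/", b""),
    ("anchor:Main.java[120,126)", "/kythe/edge/ref", "java:Person", "/", b""),
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_entries(conn, SAMPLE)
    for anchor, node in conn.execute(IMPUTES_QUERY):
        print(anchor, "imputes", node)
```

The index on (edge_kind, target) serves the join's `d.target = g.source` lookup with the edge kind fixed; other exploratory queries would likely want their own indexes.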

Robin

Michael Fromberger

Jun 3, 2020, 1:21:15 PM

to Kythe

On Wednesday, June 3, 2020 at 9:52:55 AM UTC-7, Robin Palotai wrote:

> 'Imputes' sounds like a fully external workaround, if the more involved way of shipping metadata for 'generates' is not feasible. 'generates' involves a tight interaction between the proto code generator (to emit the metadata) and the indexer (to pick up the metadata and emit generates).

We also had a couple of cases where tools would read protobuf schema files and interpret them directly, without any generated intermediate code. GCL is like this, for example. So the relationship can be useful not only as a workaround.

> Sounds interesting! Is it around reinstating concurrency? I was wondering about doing a PoC to load the graphstore content into SQL tables and querying that (instead of precomputed tables). Could make iteration on explorative / experimental batch queries faster.

This is definitely an area where it would be interesting to experiment—post-processing is quite resource-hungry, and I don't think we ever really had a tool that bridges the gap between trivial example corpora (e.g., a file and a handful of dependencies) and more "interesting" projects (e.g., where we'd use Beam). I'm not sure what the right approach is, if you don't want to tie it to a particular execution framework. It's (too) easy to fall down the rabbit-hole of reinventing distributed workflow management.

–M
