Converting Protobuf to Swagger

3,996 views
Skip to first unread message

garyla...@gmail.com

unread,
Nov 17, 2017, 12:15:22 PM11/17/17
to GA4GH Cloud Technical Work Stream
Trying to convert TRS protobuf to swagger using grpc, currently encountering issues converting enum path parameters.  Made an issue for it already: https://github.com/grpc-ecosystem/grpc-gateway/issues/486  . Does any have a workaround for this?

garyla...@gmail.com

unread,
Dec 1, 2017, 5:26:13 PM12/1/17
to GA4GH Cloud Technical Work Stream

Additionally, here's a list of issues I've encountered during the conversion process from OpenAPI2 (Swagger) to Protocol Buffers Version 3 for TRS.  This does not include issues involving the tools used to convert between them.

  1. Error HTTP responses
    Not sure if it's a design decision or a limitation, but there's currently no way to define specific HTTP responses with an expected JSON schema in Protocol Buffers.  There's only 1 response defined but no clear indicator which response code that this is for (whether it is 200, 204, 404, etc).  It's assumed to be the most common successful response.  This issue currently affects the 4 endpoints that do have an error response in OpenAPI2 (Swagger):
    1. /api/ga4gh/v1/tools/{id}/versions/{version-id}/dockerfile
    2. /api/ga4gh/v1/tools/{id}/versions/{version-id}/{type}/descriptor
    3. /api/ga4gh/v1/tools/{id}/versions/{version-id}/{type}/descriptor/{relative-path}
    4. /api/ga4gh/v1/tools/{id}/versions/{version-id}/{type}/tests
    An example how swagger defines it: https://github.com/ga4gh/tool-registry-schemas/blob/develop/src/main/resources/swagger/ga4gh-tool-discovery.yaml#L170-L173

  2. Required and optional properties
    Swagger can define which properties in a JSON schema are required and which ones are optional.  Proto3 defines all of them as optional. This is a Google design decision for backward compatibility, see https://github.com/google/protobuf/issues/2497. TES currently puts whether it's required or optional in the comments.  
  3. Header information
    Protobuf does not specify any header information such as content-type or even things like next_page, last_page, offset, etc.  It is likely a limitation with protobuf.  This currently affects /api/ga4gh/v1/tools
  4. Protobuf 3 does not allow 2 enums with properties that are present in both to be defined in the global scope.  Currently we may have two enums in Swagger
    1. DescriptorType
      1. CWL
      2. WDL
    2. DescriptorRequestType
      1. CWL
      2. WDL
      3. PLAIN_CWL
      4. PLAIN_WDL
    Since CWL and WDL are both present in both enums, this is not valid in protobuf.  An error appears: Note that enum values use C++ scoping rules, meaning that enum values are siblings of their type, not children of it.  Therefore, "CWL" must be unique within "GA4GH", not just within "DescriptorRequestType".
    This means when a Swagger.yaml with these 2 enums are converted into protobuf with Openapi2proto, a prefix will be prepended onto the enums so it will not conflict.  Currently it gets converted to something like this:

    enum GetToolsIdVersionsVersion_idDescriptorRelative_pathRequest_Type {

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_CWL = 0;

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_WDL = 1;

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_PLAIN_CWL = 2;

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_PLAIN_WDL = 3;

        }

    instead of the expected:

    enum DescriptorRequestType {

           CWL = 0;

           WDL = 1;

           PLAIN_CWL = 2;

           PLAIN_WDL = 3;

        }

Additional notes for OpenAPI3 (not that important):

  • OpenAPI3 is not backwards compatible with OpenAPI2 (Swagger)
  • Swagger editor and swagger-ui both support OpenAPI3
  • Lossless conversion process from OpenAPI2 (Swagger) to OpenAPI3 although currently no automated tools to do it
  • No stable version of swagger-codegen that supports OpenAPI3

Alex Buchanan

unread,
Dec 1, 2017, 5:44:45 PM12/1/17
to GA4GH Cloud Technical Work Stream
On Friday, 1 December 2017 14:26:13 UTC-8, garyla...@gmail.com wrote:
  1. Error HTTP responses
    Not sure if it's a design decision or a limitation, but there's currently no way to define specific HTTP responses with an expected JSON schema in Protocol Buffers.  There's only 1 response defined but no clear indicator which response code that this is for (whether it is 200, 204, 404, etc).  It's assumed to be the most common successful response.  This issue currently affects the 4 endpoints that do have an error response in OpenAPI2 (Swagger):
    1. /api/ga4gh/v1/tools/{id}/versions/{version-id}/dockerfile
    2. /api/ga4gh/v1/tools/{id}/versions/{version-id}/{type}/descriptor
    3. /api/ga4gh/v1/tools/{id}/versions/{version-id}/{type}/descriptor/{relative-path}
    4. /api/ga4gh/v1/tools/{id}/versions/{version-id}/{type}/tests
    An example how swagger defines it: https://github.com/ga4gh/tool-registry-schemas/blob/develop/src/main/resources/swagger/ga4gh-tool-discovery.yaml#L170-L173

Not sure I follow. Do you want to explicitly state which each response codes that can be expected for each endpoint? Isn't it easier to assume 404 can be expected from most endpoints
 
  1. Header information
    Protobuf does not specify any header information such as content-type or even things like next_page, last_page, offset, etc.  It is likely a limitation with protobuf.  This currently affects /api/ga4gh/v1/tools

We have an example of how we're doing pagination here: https://github.com/ga4gh/task-execution-schemas/blob/master/task_execution.proto#L422  
Do you have an example of where you need to specify content type?
 
  1. Protobuf 3 does not allow 2 enums with properties that are present in both to be defined in the global scope.  Currently we may have two enums in Swagger
    1. DescriptorType
      1. CWL
      2. WDL
    2. DescriptorRequestType
      1. CWL
      2. WDL
      3. PLAIN_CWL
      4. PLAIN_WDL
Maybe those can be simplified into just CWL and WDL with view field requesting plain vs wrapped? We have an example of views here: https://github.com/ga4gh/task-execution-schemas/blob/master/task_execution.proto#L417
Or, combine the two enums into one?
 
  1. Since CWL and WDL are both present in both enums, this is not valid in protobuf.  An error appears: Note that enum values use C++ scoping rules, meaning that enum values are siblings of their type, not children of it.  Therefore, "CWL" must be unique within "GA4GH", not just within "DescriptorRequestType".
    This means when a Swagger.yaml with these 2 enums are converted into protobuf with Openapi2proto, a prefix will be prepended onto the enums so it will not conflict.  Currently it gets converted to something like this:

    enum GetToolsIdVersionsVersion_idDescriptorRelative_pathRequest_Type {

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_CWL = 0;

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_WDL = 1;

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_PLAIN_CWL = 2;

            GETTOOLSIDVERSIONSVERSION_IDDESCRIPTORRELATIVE_PATHREQUEST_TYPE_PLAIN_WDL = 3;

        }

    instead of the expected:

    enum DescriptorRequestType {

           CWL = 0;

           WDL = 1;

           PLAIN_CWL = 2;

           PLAIN_WDL = 3;

        }

I guess these need manual conversion. Are you trying to set up an automated process that continuously converts swagger to proto?


-Alex

garyla...@gmail.com

unread,
Dec 5, 2017, 11:14:23 AM12/5/17
to GA4GH Cloud Technical Work Stream
In general, I'm merely trying to convert the existing TRS swagger yaml to protobuf 3 without making any changes to the original swagger.yaml or its current implementations with the exception of styles (json properties naming, etc).

1. Currently TRS swagger.yaml defines a specific Error response object to be returned with each 404 response.  Ideally, all TRS would comply with this.  In general, here's a list of responses that swagger could define if needed:  http://developers.zenodo.org/?python#http-status-codes
3. From what I can tell, it looks like your example's request and response has the relevant info in the body.  Whereas TRS yaml currently has info in the request's query parameters and the header of the response https://github.com/ga4gh/tool-registry-schemas/blob/develop/src/main/resources/swagger/ga4gh-tool-discovery.yaml#L121-L133 .  In the future, it might've been better to switch to https://tools.ietf.org/html/rfc5988#section-5 but it won't be possible in protobuf if it doesn't define header info.  As for the content-type, validation checks if the response is "application/json" for most endpoints.  However, there is also an endpoint that returns "text/plain" which should also be validated but it doesn't look like protobuf defines content-type anywhere.
4. Adding an additional field would require modifications to the original TRS swagger yaml and all current implementations of it.  It's a possible solution but the issue of protobuf not being able to define 2 global enums with overlapping properties still remain.  Combining two enums into one is unlikely to work as it would appear as if the endpoint that should accept/return only CWL and WDL also accepts/returns PLAIN_CWL and PLAIN_WDL. 

I've attempted both manual and automatic conversion to protobuf.  The automatic conversion changes the enum properties by uniquely prepending it but is valid protobuf.  The manual conversion I had defined the enums globally as:

enum DescriptorRequestType {

       CWL = 0;

       WDL = 1;

       PLAIN_CWL = 2;

       PLAIN_WDL = 3;

}

enum DescriptorType {

       CWL = 0;

       WDL = 1;

}

which is not valid protobuf apparently due the warning I mentioned originally.  In the mean time, we're switching it to a string with allowable values that will be mentioned in its comment.

Alex Buchanan

unread,
Dec 5, 2017, 2:42:42 PM12/5/17
to GA4GH Cloud Technical Work Stream
Ah, I understand. Thanks.

1.  The spec could have notes to implementors recommending consistent error messages for status codes. Also, we could look at "options" for defining custom endpoint metadata to be used by code generators: https://developers.google.com/protocol-buffers/docs/proto3#options . Personally, I think the HTTP status codes are well-defined and consistent messages aren't necessary, but that's just me.

3. The server implementation (grpc-gateway in the case of Funnel) uses the request message to parse fields from the URL parameters. http://localhost/v1/tasks?view=FULL&page_token=1234 is valid in Funnel.

4. A string with docs seems like a decent compromise. Possibly the "right" solution here is to plan a breaking change to TRS. PLAIN vs WRAPPED is a better description and is more extensible to future descriptor languages.

Alex Buchanan

unread,
Dec 6, 2017, 2:21:29 PM12/6/17
to GA4GH Cloud Technical Work Stream

Alex Buchanan

unread,
Dec 6, 2017, 2:22:50 PM12/6/17
to GA4GH Cloud Technical Work Stream

denis.yuen

unread,
Dec 6, 2017, 2:32:23 PM12/6/17
to GA4GH Cloud Technical Work Stream
No apologies needed, we're specifically asking for some useful resources from protobuf experts.
Between our experience with OpenAPI/Swagger and the effort in the TES repo https://github.com/ga4gh/task-execution-schemas/pull/97 we should be able to work out something consistent.

David Steinberg

unread,
Dec 6, 2017, 6:46:08 PM12/6/17
to GA4GH Cloud Technical Work Stream
Hi all!

After over a year of playing the proto-swagger game I want to encourage switching over to a pure OpenAPI approach. It is more clear in our case and better supported. 

There are two main reasons for this: first is this thread, protobuf is for defining a grpc and the problems of defining an HTTP1.1 api are secondary features. 

JSON serialization of protobuf does not match the grpc-gen-swagger pipeline we are currently using (see how big ints are serialized. This means to use swagger based on the generated proto one has to make modifications to the library to override this type (http://github.com/david4096/bravado-core). Not to mention that google CamelCases their JSON while grpc-gen-swagger does not. 

The second problem is that to define validation or errors one has to use and abuse options and comments. OpenAPI has better control and we are inducing an impedance mismatch with protobuf with no observable gain. 

Thanks for all of the great suggestions! Sorry for the double send Denis!

Best!
David

On Wed, Dec 6, 2017 at 11:32 AM, denis.yuen <denis...@gmail.com> wrote:
No apologies needed, we're specifically asking for some useful resources from protobuf experts.
Between our experience with OpenAPI/Swagger and the effort in the TES repo https://github.com/ga4gh/task-execution-schemas/pull/97 we should be able to work out something consistent.

--
You received this message because you are subscribed to the Google Groups "GA4GH Cloud Technical Work Stream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ga4gh-cloud+unsubscribe@genomicsandhealth.org.



--

David Steinberg
Bioinformatics Programmer
UC Santa Cruz Genomics Institute
linkedin - github - twitter

Angel Pizarro

unread,
Dec 6, 2017, 7:00:28 PM12/6/17
to David Steinberg, GA4GH Cloud Technical Work Stream
+1
To unsubscribe from this group and stop receiving emails from it, send an email to ga4gh-cloud...@genomicsandhealth.org.

Erik van den Bergh

unread,
Dec 7, 2017, 4:29:58 AM12/7/17
to Angel Pizarro, David Steinberg, GA4GH Cloud Technical Work Stream
We've had our own issues converting the grpc spec to 'proper' swagger that can be used further downstream. Also from reading experiences with protobuf, I agree that defining an API is only a secondary feature. So from the EMBL-EBI side we would fully support the switch to OpenAPI.

Best,
Erik

+1
To unsubscribe from this group and stop receiving emails from it, send an email to ga4gh-cloud+unsubscribe@genomicsandhealth.org.

Alex Buchanan

unread,
Dec 7, 2017, 7:07:32 PM12/7/17
to Erik van den Bergh, Angel Pizarro, David Steinberg, GA4GH Cloud Technical Work Stream
I don't doubt that converting proto to swagger to code is problematic, that's a spec on top of a spec on top of a code generator on top of a library :)  I would recommend using one or the other, not both. Can you cut out the swagger middleman? Or would that require rebuilding a lot of code?

If I understand correctly, Google defines all their APIs in protobuf and serves them via HTTP 1.1. If that's true, are APIs really a secondary feature?

David, I guess you mean int64 is serialized as strings? I believe that's protecting you from data loss. Or am I misunderstanding?

Required properties was dropped from protobuf after years of experience using them for hundreds of APIs. Do we really want to ignore that?

I don't see why response descriptions are necessary. HTTP status codes are well defined. For example, in TRS, are these valuable descriptions for 200 and 404? https://github.com/ga4gh/tool-registry-schemas/blob/develop/src/main/resources/swagger/ga4gh-tool-discovery.yaml#L165

To be fair, we have also spent some time scratching our heads over protobuf/JSON, such as how empty/zero-value fields are handled. Won't OpenAPI will have its own quirks and pitfalls though?

All that said, I don't care if we write in YAML or protobuf. The important part is having a design guide for representing resources consistently, and avoiding pitfalls that aren't easy to learn, such as converting int64 to a JS Number. The Google API design guide is a great resource: https://cloud.google.com/apis/design/ . YAML or protobuf, we should extract as much value from that guide as possible.

-A


To unsubscribe from this group and stop receiving emails from it, send an email to ga4gh-cloud...@genomicsandhealth.org.

denis.yuen

unread,
Dec 8, 2017, 12:19:59 PM12/8/17
to GA4GH Cloud Technical Work Stream, evdb...@ebi.ac.uk, dela...@gmail.com, dav...@ucsc.edu
Hi,

Sorry, I probably opened a bit of a Pandora's box. I just want to make sure I understand everyone's perspectives before sharing ours.

TES: primarily defined in proto buffers. I've been monitoring https://github.com/ga4gh/task-execution-schemas/pull/97 which is attempting to auto-generate swagger for client use or possibly interactive documentation (like editor.swagger.io)
WES: looks like Erik is either working with this/referring to it, but this is also defined in protocol buffers

TRS: this is our project, we've implemented this initially in swagger (and we subsequently auto-generated server and client stubs to use in our code).
Our initial plan was to convert this into protoc either manually or automatically to conform with the previous vote. Then convert it back again through an automatic process for use in our code.
The protobuf definition could be treated as the "master copy" and we could continue on our merry way generating code from swagger.
We're going to have to make some breaking changes anyway to match the style guide recommendations from the previous vote (which I think are good).

Those were my initial thoughts. Gary has been doing most of the leg work on this and it seems that many of the incompatibilities between what openAPI can represent and what protoc can represent can be dealt with by comments.
For example, we do things like control pagination using headers like the GitHub API https://developer.github.com/v3/guides/traversing-with-pagination/  , maybe that could be explained as a comment in the protobuf representation?
It does mean that we might need to treat the OpenAPI definition as the "master copy' and the protoc as a best effort conversion. but I still think that may be helpful for folks that are more familiar with protobuf.
I wouldn't hold TRS up as a paragon of APIs, definitely the contrary. But I do think it has been a good learning experience although annoying in the sense that we're running into the edge(?) cases that Gary listed earlier.

My intent in getting Gary to share his experiences was to get tips on how to represent things in protobuf just in case the automatic converter was missing features or was incorrect.
I think the resources that have been shared so far will definitely be helpful. Hopefully that sheds light on a few things.
+1
To unsubscribe from this group and stop receiving emails from it, send an email to ga4gh-cloud...@genomicsandhealth.org.



--

David Steinberg
Bioinformatics Programmer
UC Santa Cruz Genomics Institute
linkedin - github - twitter

--
You received this message because you are subscribed to the Google Groups "GA4GH Cloud Technical Work Stream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ga4gh-cloud...@genomicsandhealth.org.

--
You received this message because you are subscribed to the Google Groups "GA4GH Cloud Technical Work Stream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ga4gh-cloud...@genomicsandhealth.org.

Alex Buchanan

unread,
Dec 8, 2017, 12:41:20 PM12/8/17
to denis.yuen, GA4GH Cloud Technical Work Stream, evdb...@ebi.ac.uk, dela...@gmail.com, dav...@ucsc.edu
As you said, no apologies necessary here. Everyone is making really good, valuable, valid points. There are just a lot of points to sort through. Consistency is a worthy goal, but we won't get there overnight either.

Pagination is a great example of a design point we are likely diverging on, and should considering having a style guide for.
Reply all
Reply to author
Forward
0 new messages