RFC: support of multiple language aliases/dialects per driver


Alex Bzz

Feb 22, 2019, 6:39:27 AM
to bblfsh-dev
Hi,

here is a request for comments on an informal proposal for a new feature for the bblfsh project.
Below I'll try to describe the problem context and a possible solution.
All prior discussion on this subject is limited mostly to [9].

Problem:
 1. the language names that enry uses are different from the names that bblfsh drivers use,
    e.g. bash-driver parses a number of *sh languages, while linguist has "shell" (which groups zsh, bash and sh);
    the same goes for csharp-driver vs. C# in linguist, and cpp-driver vs. C++ in linguist

 2. a single driver may already support multiple "languages" (to different extents),
    e.g. javascript-driver parses JSX+JS+Flow (but not JSON), and cpp-driver parses C and C++

Out of scope
This proposal intentionally does not touch on the more general problem of supporting "mixed script" documents,
like HTML+JS in a single document, inside the same driver.

Context
Language names in Enry
come from, and are re-used directly from, GitHub Linguist;
any change on this side would complicate maintenance.

Language names in Bblfsh
 - are used in the input, as argument to the Parse() API requests [2]
 - are used in the output, in protocol.v1 parseResponse.SupportedLanguages() [3]
 - are persisted in each driver's manifest file [1]
 - are used by `bblfshd` in every request to route it to a driverPool with the right driver image [5]; the name comes
   * either from user input
   * or from calling enry
   * in both cases, results are "normalized" to mitigate problem #1 [4]
 - some clients (e.g. scala-client) perform the same "normalization"
   even before sending the request to bblfshd [6]
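
To make the "normalization" step concrete, here is a minimal Python sketch. The function name and the exact replacement rules are assumptions made for illustration; the rules actually in use are the ones behind [4].

```python
def normalize_language(name: str) -> str:
    """Rewrite a Linguist-style name into a driver-style name.

    The rules below are illustrative assumptions, not a copy of bblfshd's code.
    """
    s = name.strip().lower()
    s = s.replace(" ", "-")      # "Emacs Lisp" -> "emacs-lisp"
    s = s.replace("+", "p")      # "C++" -> "cpp"
    s = s.replace("#", "sharp")  # "C#" -> "csharp"
    return s
```

Under these assumed rules "C#" becomes "csharp" and "C++" becomes "cpp", but "Shell" only becomes "shell", never "bash", so string rewriting alone does not cover every case from problem 1.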

Language names for the downstream users:
 - usage pattern 1:
   send every request to bblfshd. This is not affected by problems 1/2.
 - usage pattern 2:
   to reduce the load on bblfshd when processing many files (e.g. gitbase, spark-connector):
     * detect the language of a file with enry,
     * check if it is in the list of SupportedLanguages() for this bblfshd instance,
     * only if it is, send the request.
   This hits problem 1 hard, as right now language names are "normalized" only on input to bblfshd.
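
A rough Python sketch of usage pattern 2 shows where the mismatch bites. Everything here is hypothetical: `detect_language` stands in for an enry call, `supported` stands in for the result of SupportedLanguages(), and the file names and language lists are made-up sample data.

```python
from typing import Callable, Dict, Iterable, List


def files_to_parse(
    paths: Iterable[str],
    detect_language: Callable[[str], str],  # hypothetical stand-in for enry
    supported: Iterable[str],               # hypothetical SupportedLanguages() result
) -> List[str]:
    """Return only the paths whose detected language the server claims to support."""
    supported_set = {name.lower() for name in supported}
    return [p for p in paths if detect_language(p).lower() in supported_set]


# Made-up sample data: Linguist-style names on the client side,
# driver-style names on the server side.
detected: Dict[str, str] = {"a.cs": "C#", "b.py": "Python", "c.sh": "Shell"}
supported_by_server = ["csharp", "python", "bash"]

sent = files_to_parse(detected, detected.__getitem__, supported_by_server)
# sent == ["b.py"]: the C# and Shell files are skipped although drivers exist
```

Even with case-insensitive comparison, only the Python file passes the client-side check, because "C#" and "Shell" never appear in the server's list under those names.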

Proposal
 - on problem 2: document the intended language/dialect support.
   Where would be the best place?
   This is a bit different from aliases for the same language, as it should be generated from the manifest and include manually curated links to the original language feature specs,
   to clarify the level of support (e.g. cases/features that are not yet mapped to the Semantic UAST shape). Flow support in the JS driver is the best case at hand here.

 - on problems 1 & 2: add a new array field to the driver manifest that lists all the supported language aliases/names,
   e.g. for C#, language_aliases: ["c#", "csharp"], and retrofit the bblfshd input/output code paths to use it for finding the driver pool.
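
As a sketch of how an alias list could be used to find a driver, the Python below builds a lookup table from alias to driver. The function names are hypothetical, and apart from the C# aliases given above, the alias lists are made-up examples.

```python
from typing import Dict, List, Optional


def build_alias_index(drivers: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every lower-cased alias to the driver that declares it."""
    index: Dict[str, str] = {}
    for driver, aliases in drivers.items():
        for alias in aliases:
            key = alias.lower()
            # Two drivers claiming the same alias would make routing ambiguous.
            if key in index and index[key] != driver:
                raise ValueError(f"alias {alias!r} claimed by {index[key]} and {driver}")
            index[key] = driver
    return index


def find_driver(index: Dict[str, str], language: str) -> Optional[str]:
    """Resolve a user- or enry-supplied name to a driver, or None if unknown."""
    return index.get(language.lower())


# Hypothetical alias lists; only the C# one comes from the proposal text.
index = build_alias_index({
    "csharp-driver": ["c#", "csharp"],
    "bash-driver": ["shell", "bash"],
    "cpp-driver": ["c++", "cpp", "c"],
})
```

With such a table, both `find_driver(index, "C#")` and `find_driver(index, "csharp")` resolve to the same driver, and the same table can be used on the output side to report every accepted name.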

Backward compatibility
 - a new version of bblfshd should be able to work with old versions of drivers:
   if "language_aliases" is not present, use the language name, as before
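
The fallback rule could look roughly like this in Python. The manifest is modelled as a plain dict, and the "language" key name and the function name are assumptions made for this sketch.

```python
from typing import Any, Dict, List


def effective_names(manifest: Dict[str, Any]) -> List[str]:
    """Names a driver answers to.

    New manifests carry "language_aliases"; old ones only have the single
    language name, which is then used as before.
    """
    aliases = manifest.get("language_aliases")
    if aliases:
        return list(aliases)
    return [manifest["language"]]


old_manifest = {"language": "csharp"}  # driver released before the change
new_manifest = {"language": "csharp", "language_aliases": ["c#", "csharp"]}
```

Here `effective_names(old_manifest)` yields only the language name, while `effective_names(new_manifest)` yields the full alias list, so old and new drivers can be served side by side.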

Rollout plan (does not need to be atomic)
  - all drivers need to be released
  - a new SDK + bblfshd version needs to be released,
    with the hard-coded user input "normalization" removed
  - language-related documentation needs to be updated/re-generated

Side notes
  - This task could be an ideal test bed for the upcoming release automation [7]
    that is discussed in that issue, and can be a motivation to build a first version of it

  - This task could also be a forcing function to remove v1 protocol leftovers from master and add
    native support for the SupportedVersion request


Links:

Denys Smirnov

Mar 12, 2019, 11:50:42 AM
to Alex Bzz, bblfsh-dev
Thanks for the detailed overview, Alex! The proposal makes total sense and is pretty straightforward to implement (at least the language_aliases part).

The SupportedVersion request should also be taken care of, since we intentionally omitted it from the v2 scope. To implement it properly we need to discuss what information we want to include in the driver manifest. Currently, there are a few unused fields, notably the "runtime version" and "Go runtime version", which have no effect because the build system has changed since they were introduced.

And as was pointed out, the problem is a bit broader, and we should think more about defining some boundaries between languages. JSX and C/C++ are not the only examples. The Python driver also supports the Python2 and Python3 "dialects", which are incompatible, so this raises the question of language versioning as well.

Of course, we can discuss this specific issue later and proceed with language aliases.


Denys Smirnov

May 15, 2019, 10:25:04 AM
to Alex Bzz, bblfsh-dev
We are moving forward with the language aliases proposal:
https://github.com/bblfsh/sdk/pull/400

The change also makes some initial preparations to support language dialects, although we do not expect them to land anytime soon. The dialects problem is much broader so we will have a separate discussion for it.

As Alex mentioned, we will first update bblfshd, drivers and clients with aliases in a backward-compatible way, and after some time we will deprecate the Languages API v1. 