Google Code Prettify automatic programming language identification

Shaul Zevin

Aug 20, 2016, 1:33:54 PM
to js-code-prettifier
Would it be fair to say that Google Code Prettify bypasses the identification by generalizing the highlighting so it will work independent of language? 

Is this generalization applied to all programming languages? How does it work?

Mike Samuel

Aug 20, 2016, 2:59:46 PM
to js-code-prettifier
On Sat, Aug 20, 2016 at 12:17 PM, Shaul Zevin <shaul...@gmail.com> wrote:
> Would it be fair to say that Google Code Prettify bypasses the
> identification by generalizing the highlighting so it will work independent
> of language?

It has two broad categories of languages: C-like and HTML-like.
For languages that don't fall into either category (Perl, OCaml, Lua), it
performs only marginally well unless given an explicit language hint.


> Is this generalization applied to all programming languages? How does it
> work?

It's applied to all fragments of code that don't have a language hint.

It will classify any passage of text that has a first non-whitespace
character which is '<' as HTML-like, and otherwise it treats it as
C-like.
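
As a rough illustration of the rule just described, here is a minimal JavaScript sketch. It is not Prettify's actual source: the function name `guessDefaultGrammar`, the `hint` parameter, and the labels `'html-like'` and `'c-like'` are made up for this example.

```javascript
// Hypothetical sketch of the fallback rule described above; not Prettify's real code.
// `hint` stands in for an explicit language hint; when present, it wins.
function guessDefaultGrammar(source, hint) {
  if (hint) {
    return hint; // explicit instruction, so no guessing is needed
  }
  // A first non-whitespace character of '<' means HTML-like; anything else is C-like.
  return /^\s*</.test(source) ? 'html-like' : 'c-like';
}

console.log(guessDefaultGrammar('  <p>hello</p>'));           // html-like
console.log(guessDefaultGrammar('int main() { return 0; }')); // c-like
console.log(guessDefaultGrammar('print "hi"', 'perl'));       // perl
```

Under a rule like this, a Perl or Lua fragment with no hint still lands in the C-like bucket, which is why those languages only highlight marginally well without one.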

The C-like language grammar tweaks a few rules so it works reasonably
well for languages like Python and bash.

It works reasonably well for code that is written by a human who is
trying to write readable code, but would probably fail badly on
certain kinds of machine-generated code, and I'm pretty sure the C-like
grammar would do badly given obfuscated-C-contest entries.

It will get horribly confused by programs that exploit subtle
differences in lexical conventions, and sometimes there is no single
*language* for a program:

//\u000aclass Example {public static void main(String...a) { System.out.println("I am a java program"); } }/*
alert('I am a JavaScript program')
// */

is both a valid JS and a valid Java program.
When a language is defined as a set of strings, there's no guarantee
that the set of strings that constitute valid programs in one language
is disjoint from the set of strings that constitute a valid program in
another language.
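
The polyglot above depends on one lexical difference: Java translates `\uXXXX` escapes before it tokenizes, so `\u000a` ends the `//` comment, while JavaScript leaves escapes inside comments alone. The sketch below models that Java step in JavaScript under a simplifying assumption (it ignores Java's rule about escaped backslashes); `translateUnicodeEscapes` is a name invented for this example.

```javascript
// Simplified model of Java's pre-tokenization step that replaces \uXXXX
// escapes with the characters they name.
// Assumption: ignores Java's special handling of escaped backslashes.
function translateUnicodeEscapes(source) {
  return source.replace(/\\u+([0-9a-fA-F]{4})/g,
    (match, hex) => String.fromCharCode(parseInt(hex, 16)));
}

// Start of the polyglot's first line, kept raw so the backslash survives.
const firstLine = String.raw`//\u000aclass Example {`;

// As JavaScript sees it: no line break, so the whole line is one comment.
console.log(firstLine.split('\n').length); // 1

// As Java sees it after translation: the comment is just '//', and
// 'class Example {' starts on a new line as real code.
console.log(translateUnicodeEscapes(firstLine).split('\n'));
```

A highlighter that looks only at the raw text, with no language hint, cannot tell which of the two readings the author intended.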


