Default styling in lexers


thomas_li...@hotmail.com

May 27, 2017, 10:39:29 AM
to scintilla-interest
A huge overhead in using a lexer for a certain language (besides finding keywords, which I have addressed in another mail) is defining styles corresponding to the lexer states.

Some lexer states have names whose meaning is easy to guess, but others are more cryptic, especially if you don't have in-depth knowledge of the particular language.

The programmer of the lexer, on the other hand, must have in-depth knowledge of the lexer and (hopefully) also of the language in question.  There are more than 1600 lexer states in Scintilla, meaning that if I want to provide support for all the languages, I would need to figure out how 1600+ states, in lexers for languages that I may never have heard of, should be styled.  This is an "impossible" task, and it is one of the reasons that (for example) Notepad++ supports only a fraction of the languages that Scintilla supports.

Many lexers have states that seem to refer to the same lexical class (e.g. SCE_<language>_COMMENT) but there are more than 600 state-names (i.e. if you remove the SCE_<language>_ from the state names you end up with more than 600 different names).  Some of them are most likely not intended to be different:
SCE_<language>_DOUBLEQUOTE_STRING
SCE_<language>_DOUBLEQUOTEDSTRING
SCE_<language>_DOUBLEQUOTESTRING
SCE_<language>_DOUBLESTRING
But in any case there seems to be a huge number of individual states.  If there were only a "tractable" number of different states, it would probably make sense to map from states to more conceptual global "lexical classes".
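
The counting described above (drop the SCE_<language>_ part and see which names remain) can be sketched roughly as follows. This is only an illustration in Python: the helper names strip_language and group_by_suffix are made up here, the sample list is a tiny hand-picked one, and a real run would need the full list of names extracted from SciLexer.h. It also assumes the language tag itself contains no underscore.

```python
import re
from collections import defaultdict

# Matches the leading "SCE_<language>_" part, e.g. "SCE_LUA_".
# Assumption: the language tag itself contains no underscore.
_PREFIX = re.compile(r"^SCE_[A-Z0-9]+_")


def strip_language(state_name):
    """Return the state name without its SCE_<language>_ prefix."""
    return _PREFIX.sub("", state_name)


def group_by_suffix(state_names):
    """Group full state names by what is left after stripping the prefix."""
    groups = defaultdict(list)
    for name in state_names:
        groups[strip_language(name)].append(name)
    return dict(groups)


# Tiny hand-picked sample; the real input would be every SCE_* name.
sample = ["SCE_C_COMMENT", "SCE_LUA_COMMENT", "SCE_P_COMMENTLINE"]
groups = group_by_suffix(sample)
print(len(groups), "distinct names:", sorted(groups))
```

Grouping this way only finds names that are exactly equal after stripping; near-duplicates such as the four DOUBLE...STRING variants listed above would still need a manual or fuzzy comparison.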

The point here is that even though COMMENT is part of a large number of languages there are also many things that seem to exist only in one or very few languages.  
This is unfortunate, because it means that it is very difficult to control styling across languages (e.g. to set all major keywords in any language to "blue").

Because there are so many language specific lexical classes it seems that the only one that can easily provide styling for a certain state/lexical-class is the lexer-programmer.

Therefore I suggest that all lexers have default styles for their states (I guess that the defaults should not cover the font-face and the font-size, because these are normally end user preferences).

I do not have any precise proposal for how to handle this, but one possible way would be to let each lexer provide an additional function that sets the styles to their defaults (i.e. as defined by the lexer programmer).  Programs can then call this function, if they wish to set the default styling.  Once the styles have been set the program can also inspect them and for example list them in a dialog.

For end-users (and middle-programmers) it would of course also be nice if each state had a description: "The style used for major keywords: class, static, etc".
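
To make the suggestion a little more concrete, here is a minimal sketch of what per-lexer defaults with descriptions might look like. Everything in it is hypothetical: StateDefault, EXAMPLE_DEFAULTS, apply_defaults and describe are invented names, the state numbers and colours are placeholders, and nothing like this is part of Scintilla's API. The two setter callbacks stand in for whatever the application uses to send SCI_STYLESETFORE and SCI_STYLESETBOLD to the editor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateDefault:
    state: int          # numeric lexer state (style number)
    name: str           # symbolic name of the state
    description: str    # human-readable text, e.g. for a dialog
    fore: str           # foreground colour as "#RRGGBB"
    bold: bool = False


# Hypothetical defaults a lexer author might ship; all values are invented.
EXAMPLE_DEFAULTS = [
    StateDefault(1, "COMMENT", "Comments", "#007F00"),
    StateDefault(5, "KEYWORD", "Major keywords: class, static, etc.",
                 "#00007F", bold=True),
]


def apply_defaults(defaults, set_fore, set_bold):
    """Push each default to the editor through caller-supplied setters."""
    for d in defaults:
        set_fore(d.state, d.fore)
        set_bold(d.state, d.bold)


def describe(defaults):
    """Return (name, description) pairs, e.g. for listing in a dialog."""
    return [(d.name, d.description) for d in defaults]


# Usage with stand-in setters that just record what would be sent.
sent = []
apply_defaults(EXAMPLE_DEFAULTS,
               lambda state, colour: sent.append(("fore", state, colour)),
               lambda state, flag: sent.append(("bold", state, flag)))
```

Font face and size are deliberately left out of the record, in line with the idea that those stay end user preferences.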

Regards Thomas Linder Puls

Neil Hodgson

May 27, 2017, 8:21:17 PM
to scintilla-interest
thomas_linder_puls:

> Many lexers have states that seem to refer to the same lexical class (e.g. SCE_<language>_COMMENT) but there are more than 600 state-names (i.e. if you remove the SCE_<language>_ from the state names you end up with more than 600 different names).

See earlier discussion in
https://groups.google.com/d/topic/scintilla-interest/eW7kYpMnClA/discussion

Neil

Lex Trotman

May 27, 2017, 8:25:00 PM
to scintilla...@googlegroups.com
On 28 May 2017 at 00:39, <thomas_li...@hotmail.com> wrote:
> A huge overhead in using a lexer for a certain language (besides finding
> keywords, which I have addressed in another mail) is to define styles
> corresponding to the lexer-states.
>
> Some lexer states have names that are easy to guess the meaning, but others
> are more cryptic especially if you don't have an in-depth knowledge of the
> particular language.

Well, if nobody has in-depth knowledge of the language then nobody is
using the editor for it, so it's probably not worth adding it to your
editor. If people are using the language then they should contribute
their expertise to the mapping; it's for their own benefit.

>
> The programmer of the lexer on the other hand must have an in-depth
> knowledge of the lexer and (hopefully) also of the language in question.
> There are more than 1600 lexer states in Scintilla meaning that if I want to
> provide support for all the languages then I would need to figure out how
> 1600+ states in lexers for languages that I may never have heard about
> should be styled. This is an "impossible" task and it is one of the reasons
> that (fx) Notepad++ only supports a fragment of the languages that Scintilla
> supports.

See above. Not including languages can also be dependent on your
target platform and target users.

>
> Many lexers have states that seem to refer to the same lexical class (e.g.
> SCE_<language>_COMMENT) but there are more than 600 state-names (i.e. if you
> remove the SCE_<language>_ from the state names you end up with more than
> 600 different names). Some of them are most likely not intended to be
> different:
>
> SCE_<language>_DOUBLEQUOTE_STRING
> SCE_<language>_DOUBLEQUOTEDSTRING
> SCE_<language>_DOUBLEQUOTESTRING
> SCE_<language>_DOUBLESTRING
>
> But it seems that there is in any case a huge number of individual states.
> If there was only a "tractable" amount of different states it would probably
> make sense to map from states to more conceptual global "lexical classes".
>

Geany does this, but it causes issues when users want the style of a
language to look like a common editor for that language. Python users
for example. But because the individual lexer styles are mapped to
"class" styles only one language can match the favourite. Scintilla
should provide maximal flexibility, not make constraining decisions
for its users.
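
The trade-off can be pictured with a small sketch: a shared table of "class" styles, plus a per-language override layer so one language can still look different. This is an assumption-laden illustration, not Geany's actual mechanism; the tables, the state names used as keys, and the colours are all invented.

```python
# Shared styles for conceptual "lexical classes" (colours invented).
CLASS_STYLES = {"comment": "#808080", "keyword": "#0000FF", "string": "#008000"}

# Which class each (language, state name) maps to; entries are illustrative.
STATE_TO_CLASS = {
    ("python", "COMMENTLINE"): "comment",
    ("python", "WORD"): "keyword",
    ("cpp", "COMMENT"): "comment",
    ("cpp", "WORD"): "keyword",
}

# Per-language exceptions that win over the shared class style.
OVERRIDES = {("python", "WORD"): "#FF7700"}


def resolve_colour(language, state_name, default="#000000"):
    """Look up a colour: override first, then shared class, then default."""
    key = (language, state_name)
    if key in OVERRIDES:
        return OVERRIDES[key]
    cls = STATE_TO_CLASS.get(key)
    return CLASS_STYLES.get(cls, default)
```

Without the override table every language mapped to "keyword" gets the same colour, which is the constraint described above; with it, the application carries an extra layer to configure and store.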

> The point here is that even though COMMENT is part of a large number of
> languages there are also many things that seem to exist only in one or very
> few languages.
> This is unfortunate, because it means that it is very difficult to control
> styling across languages (e.g. to set all major keywords in any language to
> "blue").

See above, making languages look similar is why Geany does it the way
it does, but as I said it has now removed the possibility of a
language looking different. It should be a decision by the user
application, preferably by user configuration, not Scintilla.

>
> Because there are so many language specific lexical classes it seems that
> the only one that can easily provide styling for a certain
> state/lexical-class is the lexer-programmer.
>
> Therefore I suggest that all lexers have default styles for their states (I
> guess that the defaults should not cover the font-face and the font-size,
> because these are normally end user preferences).

Sounds like a great bike-shed opportunity over what those defaults should be :-)

>
> I do not have any precise proposal for how to handle this, but one possible
> way would be to let each lexer provide an additional function that sets the
> styles to their defaults (i.e. as defined by the lexer programmer).
> Programs can then call this function, if they wish to set the default
> styling. Once the styles have been set the program can also inspect them
> and for example list them in a dialog.
>
> For end-users (and middle-programmers) it would of course also be nice if
> each state had a description: "The style used for major keywords: class,
> static, etc".

Documentation always helps, but people who know the language seem to
be able to recognise them ok without, and as I said above, people who
know the language should be contributing to the mapping.

Cheers
Lex

>
> Regards Thomas Linder Puls
>

Matthew Brush

May 27, 2017, 9:39:27 PM
to scintilla...@googlegroups.com
On 2017-05-27 05:24 PM, Lex Trotman wrote:
> On 28 May 2017 at 00:39, <thomas_li...@hotmail.com> wrote:
>> A huge overhead in using a lexer for a certain language (besides finding
>> keywords, which I have addressed in another mail) is to define styles
>> corresponding to the lexer-states.
>>
>> Some lexer states have names that are easy to guess the meaning, but others
>> are more cryptic especially if you don't have an in-depth knowledge of the
>> particular language.
>
> Well, if nobody has in-depth knowledge of the language then nobody is
> using the editor for it, so it's probably not worth adding it to your
> editor. If people are using the language then they should contribute
> their expertise to the mapping; it's for their own benefit.
>

If a person/company is trying to make an editor/IDE product that
supports many languages, it's unlikely even with a team of people, that
the union of their language knowledge will cover all the Scintilla
languages. It's kind of hard to say "Buy my editor" and then "Oh and
work on it for free". Of course it's a little easier if the editor is
free and/or open source :)

It's a bit late by now, but the best thing (IMO) would be to have a
minimum requirement for lexers to be added to Scintilla that they come
with complete documentation of the application facing parts. It wouldn't
be terribly onerous as most lexers only expose a dozen or so
states/styles and a handful of keyword lists, and the original
programmer of the lexer is really the only one who knows exactly what
they meant, and not all lexer programmers are around later to answer
questions of people trying to make sense of them.

>>
>> [...]
>>
>> But it seems that there is in any case a huge number of individual states.
>> If there was only a "tractable" amount of different states it would probably
>> make sense to map from states to more conceptual global "lexical classes".
>>
>
> Geany does this, but it causes issues when users want the style of a
> language to look like a common editor for that language. Python users
> for example. But because the individual lexer styles are mapped to
> "class" styles only one language can match the favourite. Scintilla
> should provide maximal flexibility, not make constraining decisions
> for its users.
>

While it's an incredible source of bugs in Geany, it's not really
because the mappings are fixed (they aren't); it's because of the
incredibly complex mechanism required to make it customizable enough to
work right.

>> The point here is that even though COMMENT is part of a large number of
>> languages there are also many things that seem to exist only in one or very
>> few languages.
>> This is unfortunate, because it means that it is very difficult to control
>> styling across languages (e.g. to set all major keywords in any language to
>> "blue").
>
> See above, making languages look similar is why Geany does it the way
> it does, but as I said it has now removed the possibility of a
> language looking different. It should be a decision by the user
> application, preferably by user configuration, not Scintilla.
>

The problem with Geany is that the default mappings were done by me,
without any documentation, and I only know a half dozen of the languages
supported by the application. Most of it was guess work and many turned
out to be wrong.

>>
>> [...]
>>
>> For end-users (and middle-programmers) it would of course also be nice if
>> each state had a description: "The style used for major keywords: class,
>> static, etc".
>
> Documentation always helps, but people who know the language seem to
> be able to recognise them ok without, and as I said above, people who
> know the language should be contributing to the mapping.
>

Not all editors are open-source, and not all users are willing to
reverse engineer C++ lexical analyzers to figure out what lexemes
they're matching.

Regards,
Matthew Brush

Lex Trotman

May 27, 2017, 10:39:11 PM
to scintilla...@googlegroups.com
> It's kind
> of hard to say "Buy my editor" and then "Oh and work on it for free". Of
> course it's a little easier if the editor is free and/or open source :)

Yeah, I was thinking of free open source ones, commercial ones would
be expected to buy/rent the necessary expertise.

[...]
>
> While it's an incredible source of bugs in Geany, it's not really because
> the mappings are fixed (they aren't) it's because of the incredibly complex
> mechanism required to make it customizable enough to work right.

Yep.

[...]
> The problem with Geany is that the default mappings were done by me, without
> any documentation, and I only know a half dozen of the languages supported
> by the application. Most of it was guess work and many turned out to be
> wrong.

It's absolutely no criticism of you, and it supports the contention
that people who know the language should do the mapping. The lexer
writer may be one source of that information of course, but that
assumes that they are still supporting "their" lexer.

>
>>>
>>> [...]
>>>
>>> For end-users (and middle-programmers) it would of course also be nice if
>>> each state had a description: "The style used for major keywords: class,
>>> static, etc".
>>
>>
>> Documentation always helps, but people who know the language seem to
>> be able to recognise them ok without, and as I said above, people who
>> know the language should be contributing to the mapping.
>>
>
> Not all editors are open-source, and not all users are willing to reverse
> engineer C++ lexical analyzers to figure out what lexemes they're matching.

Well, non-$free editors can pay for their expertise, and sure, not all
users will be capable, but it only needs one.

Cheers
Lex

>
> Regards,
> Matthew Brush

thomas_li...@hotmail.com

Jun 6, 2017, 5:55:42 PM
to scintilla-interest


Lex Trotman:
> Well, if nobody has in-depth knowledge of the language then nobody is
> using the editor for it, so it's probably not worth adding it to your
> editor.

As an end-user you are interested in an editor that supports your language(s) out of the box.  This suggestion is about moving the full language support into the primary box, rather than having it in all the second (and maybe even third) level boxes.

> If people are using the language then they should contribute
> their expertise to the mapping; it's for their own benefit.

This is exactly why I think that we who use Scintilla should "contribute our expertise" to Scintilla itself, rather than maintaining it in all the derivatives; "it is for our own benefit".

The problem 

Lex Trotman

Jun 6, 2017, 7:10:53 PM
to scintilla...@googlegroups.com
On 7 June 2017 at 07:55, <thomas_li...@hotmail.com> wrote:
>
>
> Lex Trotman:
>>
>> Well, if nobody has in-depth knowledge of the language then nobody is
>> using the editor for it, so it's probably not worth adding it to your
>> editor.
>
>
> As an end-user you are interested in an editor that supports your
> language(s) out of the box. This suggestion is about moving the full
> language support into the primary box, rather than having it in all the
> second (and maybe even third) level boxes.

I don't think it will save anything, just add another place where a
set of styles are defined.

The secondary or tertiary boxes will not save much effort. Any mature
tool using Scintilla is going to offer its users a way of changing
the styling to suit their own preferences: the 10% of males with
colour vision defects may wish to avoid indistinguishable reds and greens,
and others may want the editor to look like their favourite one.

So the application using Scintilla still has to do as much work to map
its settings to Scintilla styles and to store and restore them by
whatever method it uses for its other settings. Yes, Scintilla could
offer to store and restore settings, but since different apps use
different formats (JSON, INI, SciTE, something else) for
storing their settings, they will not want to have just the Scintilla
styles in some other format that Scintilla defined. And for those
applications where editing the stored files is the way to modify
the settings, the user would then have to use two different settings
languages with their application.

So the secondary and tertiary "boxes" likely still need to do the same
amount of work as now, result https://xkcd.com/927/

Cheers
Lex

>
>>
>> If people are using the language then they should contribute
>> their expertise to the mapping; it's for their own benefit.
>
>
> This is exactly why I think that we who use Scintilla should "contribute
> our expertise" to Scintilla itself, rather than maintaining it in all the
> derivatives; "it is for our own benefit".
>
> The problem
>

thomas_li...@hotmail.com

Jun 20, 2017, 7:39:51 AM
to scintilla-interest
Providing means for configuring and persisting the configuration in an application is done one time for all lexers.
Only if a user wants something different than the default you will need to persist anything.
And if you only persist deviations from the default then changes in the default will still be reflected.
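
A minimal sketch of the "persist only deviations" idea, assuming settings are a flat mapping from a style name to a value (the function names and sample colours are invented for illustration):

```python
def deviations(current, defaults):
    """Return only the settings that differ from the defaults."""
    return {k: v for k, v in current.items() if defaults.get(k) != v}


def effective(defaults, saved_deviations):
    """Rebuild the full settings from defaults plus saved deviations."""
    merged = dict(defaults)
    merged.update(saved_deviations)
    return merged


# The user changed only KEYWORD, so only that entry is persisted.
defaults_v1 = {"COMMENT": "#007F00", "KEYWORD": "#00007F"}
user_styles = {"COMMENT": "#007F00", "KEYWORD": "#FF0000"}
saved = deviations(user_styles, defaults_v1)

# A later release changes the COMMENT default; the user still sees it,
# while the customised KEYWORD colour is kept.
defaults_v2 = {"COMMENT": "#008800", "KEYWORD": "#00007F"}
styles_now = effective(defaults_v2, saved)
```

One consequence of this scheme is that a user who explicitly wants to keep an old default has to be recorded as a deviation too, which this sketch does not attempt.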

Whether a customization facility is necessary for a tool to be "mature" clearly depends on the purpose of the tool (GitHub and Wikipedia both provide un-configurable syntax highlighting).

More important for a "mature tool" is that the default settings are good, so that users do not have to spend their precious time on tweaking settings.

All in all secondary and tertiary tools will need to do at most one task for all lexers, rather than one task for all lexers plus one task for each lexer.

Regards Thomas Linder Puls

Lex Trotman

Jun 20, 2017, 8:33:17 AM
to scintilla...@googlegroups.com
On 20 June 2017 at 21:39, <thomas_li...@hotmail.com> wrote:
> Providing means for configuring and persisting the configuration in an
> application is done one time for all lexers.
> Only if a user wants something different than the default you will need to
> persist anything.
> And if you only persist deviations from the default then changes in the
> default will still be reflected.
>
> Whether a customization facility is necessary for a tool to be "mature"
> clearly depends on the purpose of the tool (GitHub and Wikipedia both
> provide un-configurable syntax highlighting).
>
> More important for a "mature tool" is that the default settings are good, so
> that users do not have to spend their precious time on tweaking settings.

In an ideal world this is about the only thing I agree with you on,
but it seems that in the real world many people disagree, if the
number of different colour schemes people create for Geany (which is
only one of the Scintilla based editors) is any indication.
(Not to mention colour schemes for the various non-Scintilla editors
that we will not speak of http://colorsublime.com/
http://vimcolors.com/ :)

You could possibly say it's because the defaults are bad, and I'd agree
if the themes created were similar, but they are very different, see
https://github.com/codebrainz/geany-themes and those are just the ones
submitted. Questions on the ML and IRC suggest there are many
personal ones out there.

So how would you decide the "standard", and would it be for a dark
background or a light background, lots of varying colours, or soft and
muted, artistic or based on some science of colour perception? It
would be just a bikeshed.

>
> All in all secondary and tertiary tools will need to do at most one task for
> all lexers, rather than one task for all lexers plus one task for each
> lexer.
>
> Regards Thomas Linder Puls
>

thomas_li...@hotmail.com

Jun 20, 2017, 9:39:08 AM
to scintilla-interest
Some create complete programmer editor systems and may very well put a lot of effort into providing customization settings, etc.

We happen to offer a programming language in which you can embed Scintilla editors in any form/dialog you create.
People may create "mature" tools in which the Scintilla editor is a central player that can be customized in all possible (but often silly) ways.
But it could also be that the Scintilla editor is just a little side-thing for previewing and/or editing some code in some rare situation, and then a complete configuration system will be total overkill.

With the proposed default styling and default keywords, Scintilla would be useful for such "small scale uses" out of the box, while not ruining or even changing anything for the full and "mature" tools.

Default styling will most likely be somewhat inhomogeneous because many different people will do the styling. But I still think that default styles are better than no styles at all.

Since Scintilla's background is white by default, I suggest that the default styles be designed for a white background.

Regards Thomas Linder Puls

Neil Hodgson

Jun 20, 2017, 8:03:52 PM
to scintilla...@googlegroups.com
Embedding styles and keyword lists in the Scintilla binary is an increase in project scope and will increase the maintenance work load. It may also lead to more work for downstream projects that use Scintilla and want to remain on this ‘easy path’ since an update to new settings will require updating to a newer Scintilla which may have other unwanted changes. Users that want to update a project on the easy path will need to use compilers and other build tools which many are unfamiliar with.

This information could be provided in an easier to update manner by distributing it as text files. SciTE’s existing .properties files could be a starting point although they encode SciTE’s view of languages and its settings hierarchy. This probably needs a stronger concept of language than SciTE which instead uses a combination of lexer identity and file name matching wildcards.

This settings data distributable should have its own release schedule not tied to Scintilla. While a new release of Scintilla may include new states and thus cause a settings data release, changes to keyword lists or style refinements should not have to wait for a Scintilla release. There would also need to be some personnel to manage it.
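
As a rough idea of how little code a text-based distributable would need on the consuming side, here is a sketch of a reader for a deliberately simplified format. The line shape is loosely modelled on SciTE's style.<lexer>.<number>=... settings, but this is an assumption for illustration only: it ignores SciTE's $(...) variable substitution, imports and conditionals, and the function name parse_style_lines is invented.

```python
def parse_style_lines(text):
    """Parse lines like 'style.<lexer>.<number>=fore:#RRGGBB,bold'.

    Returns a dict keyed by (lexer, style number) with a dict of
    attributes; flag attributes such as 'bold' map to True.
    """
    styles = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        parts = key.split(".")
        if (not sep or len(parts) != 3 or parts[0] != "style"
                or not parts[2].isdigit()):
            continue  # not a style line in this simplified format
        attrs = {}
        for item in value.split(","):
            item = item.strip()
            if not item:
                continue
            name, colon, arg = item.partition(":")
            attrs[name] = arg if colon else True
        styles[(parts[1], int(parts[2]))] = attrs
    return styles


example = """\
# Keyword
style.python.5=fore:#00007F,bold
style.python.1=fore:#007F00
"""
parsed = parse_style_lines(example)
```

Keying on the lexer name alone mirrors the SciTE view mentioned above; a stronger concept of language would need an extra level in the key or a separate mapping from language to lexer.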

There will be pressure here to support additional colour schemes: a dark scheme and a high-contrast scheme will probably be wanted.

Neil