Hi Neil,
On 19 June 2012 15:32, Neil Hodgson <nyama...@me.com> wrote:
> A recurring request pattern for Scintilla has been to increase the number of keywords for particular lexers in order to highlight sets of identifiers in different colours. The keyword feature was initially designed for true language keywords and has been stretched beyond that design for sets of identifiers. There are sometimes multiple lexeme types that could benefit from multiple styles: for C++, identifiers and documentation comment keywords are candidates with preprocessor macro names a future possibility. For HTML there are lists of tags and attributes as well as identifiers for JavaScript, PHP and other server- and client-side scripts.
>
> This produces conflicting needs since the valid range of style values has to be split up between these different lexeme types and different users will have different needs. I have rejected patches that add more identifier styles in order to preserve freedom to change here.
>
> To allow more identifier styles, a pool of unallocated styles could be maintained with allocations performed on demand from the application. An allocation would extend an existing style with a set of new styles. Only existing styles that are coded to be extensible would be valid. An API extension to ILexer could look like this and be exposed as SCI_ALLOCATEIDENTIFIERSTYLES, ...
Since this is a general feature, why use a name that implies only one
of its uses? You already note documentation comment keywords below, and
subsets of operators also come to mind. Perhaps something more
general like SCI_ALLOCATESUBSTYLES, since that is what it is really
doing: allocating styles for subsets of the specified style (where the
style is being used as an alias for a token class, I guess).
>
> // Returns start of new allocation, -1 on failure
> int AllocateIdentifierStyles(int styleBase, int numberStyles);
> void SetIdentifiers(int style, const char *identifiers);
> void FreeIdentfierStyles();
>
Clearly a lexer could do anything it wanted (or rather, was coded to
do) with these new styles, not just a new "keyword" list. Perhaps
AllocateSubStyles, SetValues (which is what it is doing) and
FreeSubStyles.
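
To make the naming suggestion concrete, here is a minimal sketch of how the three calls might behave behind a lexer. It is only an illustration under my own assumptions: the class name SubStylePool, the ValuesFor helper, the constructor arguments and the single contiguous pool are all hypothetical, and none of this is the actual ILexer interface.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch of the proposed calls; not the real ILexer API.
class SubStylePool {
    int poolStart;
    int poolEnd;                        // one past the last style in the pool
    int nextFree;                       // first style not yet handed out
    std::set<int> extensible;           // base styles the lexer is coded to extend
    std::map<int, std::string> values;  // sub-style -> space-separated value list
public:
    SubStylePool(int start, int count, std::set<int> bases)
        : poolStart(start), poolEnd(start + count), nextFree(start),
          extensible(std::move(bases)) {}

    // Returns start of new allocation, -1 on failure.
    int AllocateSubStyles(int styleBase, int numberStyles) {
        if (numberStyles <= 0 || extensible.count(styleBase) == 0)
            return -1;                  // only extensible base styles are valid
        if (numberStyles > poolEnd - nextFree)
            return -1;                  // pool exhausted
        const int first = nextFree;
        nextFree += numberStyles;
        return first;
    }

    // Attaches a value list to an already allocated sub-style.
    void SetValues(int style, const char *list) {
        if (style >= poolStart && style < nextFree)
            values[style] = list;
    }

    // Returns every allocated sub-style to the pool.
    void FreeSubStyles() {
        nextFree = poolStart;
        values.clear();
    }

    // Helper for inspection only; returns an empty string if nothing is set.
    std::string ValuesFor(int style) const {
        auto it = values.find(style);
        return it == values.end() ? std::string() : it->second;
    }
};
```

Nothing in the sketch cares whether the values are identifiers, doc comment keywords or operators, which is the point of the more general names.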
> From the application, this may look like:
>
> Call(SCI_FREEIDENTFIERSTYLES);
> int idents = 8;
> identStyleBase = Call(SCI_ALLOCATEIDENTIFIERSTYLES,
> SCE_C_IDENTIFIER, idents);
> if (identStyleBase >= 0) {
> for (int i=0;i< idents;i++) {
> Call(SCI_STYLESETFORE, identStyleBase + i, colourList[i]);
> Call(SCI_SETIDENTIFIERS, identStyleBase + i, identList[i]);
> }
> }
> dcStyleBase = Call(SCI_ALLOCATEIDENTIFIERSTYLES,
> SCE_C_COMMENTDOCKEYWORD, 3);
> // …
>
> Over time, more styles are defined for each lexer, reducing the pool of styles available for identifiers. Applications should be prepared to handle failure of an SCI_ALLOCATEIDENTIFIERSTYLES call, possibly by merging less important sets of identifiers. The pool of styles may not be contiguous due to the fixed styles 32..39 and other factors.
Although your example above implies that a contiguous range is
allocated, is that the intention? Perhaps a contiguous set of as many
as possible could be allocated each call and the number be the return
value. Then the application can have another go until it has as many
as it wants.
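
Roughly what I have in mind, as a sketch only. The names FragmentedPool, AllocateRun and CollectStyles are made up for illustration, and the out-parameter for the start of the run is my assumption, since the return value is used for the count. For simplicity it hands out the first free run it finds rather than the largest.

```cpp
#include <utility>
#include <vector>

// Hypothetical pool where some style numbers are already taken,
// so the free styles form several separate runs.
class FragmentedPool {
    std::vector<bool> used;  // used[s] is true when style s is unavailable
public:
    explicit FragmentedPool(std::vector<bool> initiallyUsed)
        : used(std::move(initiallyUsed)) {}

    // Allocates from the first free run, at most `wanted` styles.
    // Returns the number allocated (0 if none) and sets `start`.
    int AllocateRun(int wanted, int &start) {
        if (wanted <= 0)
            return 0;
        const int size = static_cast<int>(used.size());
        int i = 0;
        while (i < size && used[i])
            i++;                       // skip styles already in use
        if (i == size)
            return 0;                  // nothing left
        start = i;
        int count = 0;
        while (i < size && !used[i] && count < wanted) {
            used[i] = true;
            i++;
            count++;
        }
        return count;
    }
};

// Application side: keep asking until satisfied or the pool runs dry.
std::vector<int> CollectStyles(FragmentedPool &pool, int wanted) {
    std::vector<int> styles;
    while (static_cast<int>(styles.size()) < wanted) {
        int start = 0;
        const int remaining = wanted - static_cast<int>(styles.size());
        const int got = pool.AllocateRun(remaining, start);
        if (got == 0)
            break;                     // caller must cope with fewer styles
        for (int s = start; s < start + got; s++)
            styles.push_back(s);
    }
    return styles;
}
```

The application then has to keep a list of style numbers rather than a base plus an index, which is the cost of allowing a non-contiguous result.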
> Another possibility is to define a fixed range of identifier style numbers per lexer, although it's likely this will just cause the current problem to recur with requests to expand the range.
As you say, it's not future-proof.
>
> The C++ lexer duplicates each style to allow different styling of active code and code that is inactive due to preprocessor directives. The inactive style is defined by adding 64 to the active style. Adding a new identifier style for C++ will require allocating an active and inactive style and for simplicity, these should be 64 apart. A single call to AllocateIdentifierStyles will allocate both active and inactive styles and the set of identifiers used for an active identifier style will also be used for the corresponding inactive identifier style. Since an application may not know that a lexer supports active/inactive (or other similar features) another API should be provided to return the distance or -1 if there are no secondary styles.
>
> int DistanceToSecondaryStyles();
>
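
If I read this right, the pairing could be sketched as below. PairedStyles and IdentifiersFor are hypothetical names of mine, and passing the distance to the constructor is just a convenient way to show both the C++ case (64) and a lexer with no secondary styles (-1).

```cpp
#include <map>
#include <string>

// Hypothetical sketch: one identifier list shared by an active style
// and its secondary (inactive) twin at a fixed distance.
class PairedStyles {
    int distance;                      // 64 for C++ as described; -1 if none
    std::map<int, std::string> lists;  // style -> identifier list
public:
    explicit PairedStyles(int distance_) : distance(distance_) {}

    // Distance from a primary style to its secondary style, or -1.
    int DistanceToSecondaryStyles() const { return distance; }

    // Setting identifiers on the active style also sets the inactive one.
    void SetIdentifiers(int style, const std::string &identifiers) {
        lists[style] = identifiers;
        if (distance > 0)
            lists[style + distance] = identifiers;
    }

    // Returns an empty string when the style has no list.
    std::string IdentifiersFor(int style) const {
        auto it = lists.find(style);
        return it == lists.end() ? std::string() : it->second;
    }
};
```

An application that wants different colours for the inactive twins would then call DistanceToSecondaryStyles once and style `allocated + distance` as well, skipping that step when it gets -1.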
> From the point of view of lexers, there will be new support class(es) to allocate identifier styles and map identifiers to style numbers. Something like
> sc.ChangeState(classifier->classify(SCE_C_IDENTIFIER, ident)|activitySet);
>
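
For what it's worth, I imagine the support class being something as simple as the following. This is a guess at the shape, not a proposal for the real code: IdentifierClassifier and its SetIdentifiers signature are hypothetical, and identifiers are assumed to arrive as a whitespace-separated list, like the existing keyword lists.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical classifier: maps an identifier to an allocated sub-style,
// falling back to the base style when the identifier is in no list.
class IdentifierClassifier {
    // base style -> (identifier -> sub-style)
    std::map<int, std::map<std::string, int>> wordToStyle;
public:
    // Registers every word of a whitespace-separated list under subStyle.
    void SetIdentifiers(int styleBase, int subStyle,
                        const std::string &identifiers) {
        std::istringstream words(identifiers);
        std::string word;
        while (words >> word)
            wordToStyle[styleBase][word] = subStyle;
    }

    // Returns the sub-style for ident, or styleBase if it is not listed.
    int classify(int styleBase, const std::string &ident) const {
        auto base = wordToStyle.find(styleBase);
        if (base == wordToStyle.end())
            return styleBase;
        auto hit = base->second.find(ident);
        return hit == base->second.end() ? styleBase : hit->second;
    }
};
```

Because unlisted identifiers fall back to the base style, the lexer call quoted above can stay a single expression, with the activity flag ORed onto whatever classify returns.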
> Currently this is just planning - I haven't written any code although it appears quite easy. Since it will require adding APIs to the externally visible ILexer interface and I don't want to have many different versions of this interface in use, it may be delayed until other changes to ILexer are finished. Unicode line end support may also require additions to ILexer.
>
> Neil
>
Cheers
Lex