Substyles for Lua and HTML can highlight more sets of identifiers

Neil Hodgson

Apr 7, 2024, 10:57:40 PM
to scintilla-interest
Substyles are now implemented for the Lua and HTML lexers to allow styling many sets of identifiers.

For Lua, this is simple: it allows additional sets of identifiers to be styled, all based on SCE_LUA_IDENTIFIER. The Lua lexer already offered multiple keyword lists, so this mostly enhances consistency between lexers.

For HTML, it is much more significant as the lexer only provides 6 keyword lists where each is for one specific type of content: tags & attributes, JavaScript, Basic, Python, PHP, and SGML. The new substyles feature allows each of these, except for SGML, to have multiple identifier lists with different visual styles. Two of these cases are further split with separate sets for tags and attributes (instead of a combined set) and separate sets for client-side and server-side JavaScript as the different execution locations may have different APIs available.

As client-side Basic and Python are not popular, client-side code in those languages reuses the server-side sets.

Both lexers allow 64 substyles. For Lua, these start at the standard value 128 but for HTML, they start at 192 to allow a larger contiguous range of base styles.
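As a rough illustration of the numbers above, the sketch below computes the style ranges implied by a start value and a count of 64. It only models the arithmetic; the names `SUBSTYLE_START`, `substyle_range`, and `is_substyle` are made up for this example and are not part of Scintilla or Lexilla.

```python
# Sketch of the substyle ranges described in the message.
# Only the numbers (64 substyles, starting at 128 for Lua and 192 for HTML)
# come from the message; all names here are hypothetical.
SUBSTYLE_COUNT = 64
SUBSTYLE_START = {"lua": 128, "html": 192}


def substyle_range(lexer: str) -> range:
    """Return the range of style numbers available as substyles for a lexer."""
    start = SUBSTYLE_START[lexer]
    return range(start, start + SUBSTYLE_COUNT)


def is_substyle(lexer: str, style: int) -> bool:
    """True if the style number falls inside the lexer's substyle range."""
    return style in substyle_range(lexer)


if __name__ == "__main__":
    for name in SUBSTYLE_START:
        r = substyle_range(name)
        print(f"{name}: substyles {r[0]}..{r[-1]}")
```

Under these assumptions, starting HTML at 192 leaves styles below 192 free as one contiguous block for base styles while the 64 substyles still fit in a single byte.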

While it would make sense to support substyles for SGML, this is not done as it is more difficult to implement than the other cases.

The substyles are checked when one of these base styles would be applied to a lexeme: SCE_H_TAG, SCE_H_ATTRIBUTE, SCE_HJ_WORD, SCE_HJA_WORD, SCE_HB_WORD, SCE_HP_WORD, and SCE_HPHP_WORD. The previously-defined keyword lists have priority over the substyle sets.
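The priority rule can be pictured with a small model: look in the old keyword list first, then in the substyle sets, and otherwise fall back to whatever style the lexer would use for an unrecognised word. This is a simplified sketch, not the lexer's code. The function `classify` and its parameters are hypothetical; the value 3 for SCE_H_ATTRIBUTE is taken from the follow-up message, and 192 is the first HTML substyle mentioned above.

```python
# Simplified model of "keyword lists have priority over substyle sets".
# classify() and its parameters are hypothetical, not Lexilla API.
SCE_H_ATTRIBUTE = 3        # value quoted in the follow-up message
FIRST_HTML_SUBSTYLE = 192  # HTML substyles start here per the message


def classify(word, keywords, keyword_style, substyle_sets, fallback_style):
    """Pick a style for a word.

    keywords       -- the previously-defined keyword list (a set of words)
    keyword_style  -- base style applied on a keyword-list hit
    substyle_sets  -- dict mapping substyle number -> set of words
    fallback_style -- style used when nothing matches (assumed by this sketch)
    """
    if word in keywords:
        return keyword_style          # old keyword list wins
    for style, words in substyle_sets.items():
        if word in words:
            return style              # otherwise the first matching substyle
    return fallback_style


if __name__ == "__main__":
    keywords = {"class", "id"}
    substyles = {FIRST_HTML_SUBSTYLE: {"class", "data-role"}}
    unknown = -1  # placeholder for the lexer's "unknown" style in this sketch
    print(classify("class", keywords, SCE_H_ATTRIBUTE, substyles, unknown))
    print(classify("data-role", keywords, SCE_H_ATTRIBUTE, substyles, unknown))
```

In this model a word present in both places keeps its base style, so moving an identifier into a substyle means removing it from the old keyword list as well.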

While the implementation handles most of the obvious cases, more may be wanted, so it would be best to discuss them now, before this is in common use and hard to change. For example, if client-side Basic is more popular than I thought then different sets for client- and server-side Basic should be defined now as this would be difficult to retrofit in a compatible way. The downside of separate lists is that identifiers may have to be added to both.

See this documentation for the substyles API or examine SciTE's source code:

Committed mostly in these change sets:

Neil

Neil Hodgson

Apr 9, 2024, 10:41:10 PM
to scintilla-interest
Implemented another feature: attributes that are only valid in particular tags now match only inside those tags. This is only implemented for substyles attached to SCE_H_ATTRIBUTE (3). It is not implemented for the old-style keywords.

Tag-scoped attributes are specified as <tag>.<attribute>. Without a <tag>. prefix, the attribute is matched in all tags which is appropriate for global attributes like class and id.

In SciTE this may appear like:
substylewords.3.1.*=class id img.height img.width
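To make the matching rule concrete, here is a small sketch of how a word list like the one above could be interpreted. It is a model of the described behaviour, not the lexer's implementation: `parse_scoped_words` and `attribute_matches` are hypothetical names, and case handling is left out.

```python
# Sketch of tag-scoped attribute matching with '.' as the joiner.
# Function names are hypothetical; only the <tag>.<attribute> form and the
# example word list come from the message.
def parse_scoped_words(value):
    """Split a word list into global attributes and (tag, attribute) pairs."""
    global_attrs = set()
    scoped_attrs = set()
    for word in value.split():
        tag, sep, attr = word.partition(".")
        if sep:
            scoped_attrs.add((tag, attr))   # e.g. "img.height"
        else:
            global_attrs.add(word)          # e.g. "class"
    return global_attrs, scoped_attrs


def attribute_matches(tag, attr, global_attrs, scoped_attrs):
    """True if attr is in the set globally or scoped to this tag."""
    return attr in global_attrs or (tag, attr) in scoped_attrs


if __name__ == "__main__":
    g, s = parse_scoped_words("class id img.height img.width")
    print(attribute_matches("img", "height", g, s))
    print(attribute_matches("div", "height", g, s))
    print(attribute_matches("div", "class", g, s))
```

With this reading, height is matched inside img but not inside div, while class and id are matched in every tag.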

If there are potential problems with using '.' as the joiner, such as '.' being allowed inside tags or attributes, another character could be used.

I'm not about to add a large number of tag-scoped attributes to SciTE manually as it looks like a lot of research work. If there is a machine-readable list of tags, attributes, and their relationships then that could be processed into properties with a script.

There is a small run-time cost which could be mitigated if it impacts applications significantly.

Features like this that are implemented within lexers can be difficult to discover, only really being documented within the code, the change log, and email messages like this one.


Neil