More incompatible changes for 4.0?

Neil Hodgson

Aug 2, 2015, 8:27:25 PM

to scintilla-interest

With the changes for >2GB documents requiring incompatible lexer interface changes, there is an opportunity to make other changes that would normally be too costly.

The SetIdentifiers method should return a Sci_Position instead of void so that it can notify its caller that a definition change requires restyling from that position, often the document start (0) as lexers do not currently keep any record of identifier positions.
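A minimal sketch of the proposed return value, assuming that -1 means "no restyling needed" and 0 means "restyle from the document start". SketchLexer and its members are hypothetical, and Sci_Position is stubbed with ptrdiff_t. Since no identifier positions are recorded, any real change can only report 0.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Stand-in for Scintilla's position type (assumed pointer-sized here).
typedef std::ptrdiff_t Sci_Position;

// Hypothetical lexer fragment that keeps one identifier list per substyle.
class SketchLexer {
	std::map<int, std::string> identifiers;
public:
	// Returns the position restyling must start from, or -1 if nothing changed.
	// No record of where identifiers occur is kept, so a change means restyling
	// from the document start (0).
	Sci_Position SetIdentifiers(int style, const char *ids) {
		assert(ids != nullptr);
		std::string &current = identifiers[style];
		if (current == ids)
			return -1;
		current = ids;
		return 0;
	}
};
```

A lexer that did track identifier positions could return the first affected position instead of 0.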

The lexer and document interfaces currently have an original interface (ILexer, IDocument) and a later addition (ILexerWithSubStyles, IDocumentWithLineEnd). There is version detection code and conditional handling of method availability. With 4.0, it may be better to collapse these interfaces so that the extra methods from IDocumentWithLineEnd are added to IDocument and there is no need to check versioning: lexers can just use the extra methods. IDocumentWithLineEnd would disappear or be available but deprecated with a typedef or #define.

Similarly, ILexerWithSubStyles can be subsumed into ILexer and all lexers must provide the extra methods. A base type may be added to provide empty/simple implementations.
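A sketch of what the collapsed interface plus a base type with simple defaults could look like. The method names are modelled on ILexerWithSubStyles, but the interface is heavily trimmed and all class names are hypothetical. The point is that callers use the extra methods directly, with no version check or downcast.

```cpp
#include <cassert>

// Hypothetical, trimmed single lexer interface: the former extension methods
// are part of the base interface, so every lexer provides them.
class ILexerSketch {
public:
	virtual ~ILexerSketch() = default;
	virtual int LineEndTypesSupported() = 0;
	virtual int AllocateSubStyles(int styleBase, int numberStyles) = 0;
	virtual int SubStylesStart(int styleBase) = 0;
};

// Hypothetical base type with empty/simple implementations so that simple
// lexers only override what they actually support.
class LexerBaseSketch : public ILexerSketch {
public:
	int LineEndTypesSupported() override { return 0; }     // default line ends only
	int AllocateSubStyles(int, int) override { return -1; } // no substyles
	int SubStylesStart(int) override { return -1; }
};

// A lexer with substyles overrides the relevant methods
// (128 is an arbitrary illustrative value).
class LexerWithSubStylesSketch : public LexerBaseSketch {
public:
	int AllocateSubStyles(int, int) override { return 128; }
	int SubStylesStart(int) override { return 128; }
};

// Calling code needs no version detection before using the extra methods.
int FirstSubStyle(ILexerSketch &lexer, int styleBase) {
	return lexer.SubStylesStart(styleBase);
}
```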

Versioning of Scintilla is different to many other libraries in that many releases change lexer behaviour in ways that are not completely compatible. A common occurrence is making distinctions between elements that were previously a single lexical class. One version may only have a style for comments but a later version may differentiate documentation comments. Applications may have been written assuming that all comments are in the original comment style and so may now break. A mechanism could be provided for applications to pin to a particular set of styles. Since lexers commonly allocate new styles after existing styles, applications could specify the last style they know and later styles would be mapped back (through a simple table) to their previous style value.
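A sketch of the "simple table" for pinning, using hypothetical style numbers: an older lexer had styles 0..2, and a later one split a doc-comment style out of comment and a raw-string style out of string. The table and function names are illustrative. It relies on the assumption stated above that new styles are allocated after existing ones.

```cpp
#include <cassert>
#include <vector>

// Hypothetical style numbers: 0..2 existed originally, 3 and 4 were added later.
enum {
	styleDefault = 0,
	styleComment = 1,
	styleString = 2,
	styleCommentDoc = 3,  // later split out of styleComment
	styleStringRaw = 4,   // later split out of styleString
};

// For each style, the style it was split from (itself for original styles).
const std::vector<int> previousStyle = {
	styleDefault, styleComment, styleString, styleComment, styleString,
};

// Map a style from the current lexer back to the set an application knows,
// given the last style number it was written against.
int PinnedStyle(int style, int lastKnownStyle) {
	assert(style >= 0 && style < static_cast<int>(previousStyle.size()));
	while (style > lastKnownStyle) {
		const int earlier = previousStyle[style];
		if (earlier >= style)
			break;  // an original style: nothing older to map back to
		style = earlier;  // follow chains such as 5 -> 3 -> 1
	}
	return style;
}
```

An application pinned at styleString would see doc comments as plain comments and raw strings as plain strings.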

Packaging and deployment of Scintilla is made more difficult because of the possibility of incompatible lexing. This has meant that I have advocated shipping your application with the version of Scintilla you have tested with and avoided packaging Scintilla in Linux distributions. Separating out the lexers from the rest of Scintilla would allow some more possibilities: new versions of Scintilla libraries (not containing lexers) would be more compatible, making it more reasonable to use a system packaged Scintilla. Each toolkit binding currently includes a set of lexers so, on Linux, there may be lexers delivered with the bindings for Qt, GTK+, wxWidgets and Tk as well as different versions of these toolkits. A separate lexer .so would remove this duplication and also simplify the build procedures for each toolkit. It would be easier to change lexers: perhaps using an older version that matches your application code or updating to a newer version for desired changes.

Neil

Matthew Brush

Aug 2, 2015, 9:30:56 PM

to scintilla...@googlegroups.com

On 15-08-02 05:27 PM, Neil Hodgson wrote:
> [...]
>
> Versioning of Scintilla is different to many other libraries in that
> many releases change lexer behaviour in ways that are not completely
> compatible. [...]
>

If there was a call to access the ILexer::Version(), could that be
incremented whenever a "break" occurs?

It would also be useful (for more reasons than this) if the styles could
be introspected at runtime, then applications could be coded to decide
what to do based on the set of styles available. It would be even more
useful if the runtime styles could be introspected by name using common
prefix, for example:

SCI_GETNUMSTYLES(lex) -> n
SCI_GETSTYLENAME(lex, n+0) -> "comment"
SCI_GETSTYLENAME(lex, n+1) -> "comment.doc"
SCI_GETSTYLENAME(lex, n+2) -> "comment.doc.keyword"

Then applications could group all styles sharing a prefix and style them
in a similar fashion without breaking the styling when new ones are
added. A nice bonus is that it would remove some code from applications,
each of which most likely implements some mechanism or hard-coded
mapping like this itself (Geany does, at least).

> [...] A separate lexer .so would remove this duplication and also
> simplify the build procedures for each toolkit. It would be easier to
> change lexers: perhaps using an older version that matches your
> application code or updating to a newer version for desired changes.
>

+1

Though it wouldn't really solve the duplication problem if there was
only one library containing all of the lexers since applications would
each still probably want to provide their own customized libraries to
remove the lexers they don't use (this is the one thing Geany currently
applies patches to Scintilla for).

It would be great if the existing lexers could be loaded as separate
libs at runtime using the existing "external lexer" mechanism for this
purpose. It would have the nice side-effect of reducing the memory
footprint of applications too, since they wouldn't have to load all of
the possible lexers they use up front. For example, if the user is only
programming in C/C++ for a particular instance, it would save loading
all of the code for the other 100+ lexers that will never be used (in
Geany's case we prune it down to around 35-40 supported lexers).

Cheers,
Matthew Brush

Matthew Brush

Aug 2, 2015, 9:40:13 PM

to scintilla...@googlegroups.com

On 15-08-02 06:30 PM, Matthew Brush wrote:
> [...]
>
> SCI_GETNUMSTYLES(lex) -> n
> SCI_GETSTYLENAME(lex, n+0) -> "comment"
> SCI_GETSTYLENAME(lex, n+1) -> "comment.doc"
> SCI_GETSTYLENAME(lex, n+2) -> "comment.doc.keyword"
>

Of course, rather than accessing styles past the last one, I meant that
the style number argument of SCI_GETSTYLENAME() would range from 0 to
`n` - 1.

Cheers,
Matthew Brush

Neil Hodgson

Aug 2, 2015, 11:55:56 PM

to scintilla...@googlegroups.com

Matthew Brush:

> If there was a call to access the ILexer::Version(), could that be incremented whenever a "break" occurs?

Looking back through the history page, I’d expect that version would have to be incremented for most releases.

> It would also be useful (for more reasons than this) if the styles could be introspected at runtime, then applications could be coded to decide what to do based on the set of styles available. It would be even more useful if the runtime styles could be introspected by name using common prefix, for example:
>
>    SCI_GETNUMSTYLES(lex)      -> n
>    SCI_GETSTYLENAME(lex, n+0) -> "comment"
>    SCI_GETSTYLENAME(lex, n+1) -> "comment.doc"
>    SCI_GETSTYLENAME(lex, n+2) -> "comment.doc.keyword"
>
> Then applications could group all same prefixed and style them in a similar fashion without breaking the styling when new ones are added.

It's unlikely that any one ordering will work equally well for every language, and the choice will depend on viewpoint: is literal.quoted.string.interpolated.unicode better than literal.quoted.string.unicode.interpolated? Tags may be more general than a hierarchy.

> Though it wouldn't really solve the duplication problem if there was only one library containing all of the lexers since applications would each still probably want to provide their own customized libraries to remove the lexers they don't use (this is the one thing Geany currently applies patches to Scintilla for).

Adding application-specific lexers is not that difficult as the application can call SCI_LOADLEXERLIBRARY on a local .so.

> It would be great if the existing lexers could be loaded as separate libs at runtime using the existing "external lexer" mechanism for this purpose. It would have the nice side-effect of reducing the memory footprint of applications too, since they wouldn't have to load all of the possible lexers they use up front. For example, if the user is only programming in C/C++ for a particular instance, it would save loading all of the code for the other 100+ lexers that will never be used (in Geany's case we prune it down to around 35-40 supported lexers).

That will depend on your platform and packaging techniques. Building the minimal WordLexer example from http://www.scintilla.org/WordLexer.zip results in a 192K release DLL with my current setup. A similar DLL with all 102 lexers is 750K. So, building all lexers as separate DLLs will likely use about 20MB of disk (102 × 192K ≈ 19.6MB) compared to 750K. In memory, not all the lexers will be needed, but each one that is loaded costs even more than its file size because DLL sections are expanded into whole pages. In SciTE you will likely access 3 lexers at startup (null/text, errorlist, initial file language). Just one more (maybe you want to open some documentation or a build file) and you are worse off than with a monolithic lexers DLL in terms of memory and file reading (4 × 192K = 768K, already more than 750K), plus there is file system (and virus checker) overhead from opening 4 files instead of 1.

This WordLexer DLL was built statically and, I expect, much of the cost is in the duplicated C++ runtime but depending on a shared runtime is difficult on Windows where there is no system C++ ABI (yet). While there are ways to reduce the costs here, and possibly move to some degree of shared runtime, such techniques will normally have costs in complexity, licensing or needing to use installers.

Neil

Matthew Brush

Aug 3, 2015, 2:55:51 AM

to scintilla...@googlegroups.com

On 15-08-02 08:55 PM, Neil Hodgson wrote:
> Matthew Brush:
>
>> If there was a call to access the ILexer::Version(), could that be incremented whenever a "break" occurs?
>
> Looking back through the history page, I’d expect that version would have to be incremented for most releases.
>
>> It would also be useful (for more reasons than this) if the styles could be introspected at runtime, then applications could be coded to decide what to do based on the set of styles available. It would be even more useful if the runtime styles could be introspected by name using common prefix, for example:
>>
>> SCI_GETNUMSTYLES(lex) -> n
>> SCI_GETSTYLENAME(lex, n+0) -> "comment"
>> SCI_GETSTYLENAME(lex, n+1) -> "comment.doc"
>> SCI_GETSTYLENAME(lex, n+2) -> "comment.doc.keyword"
>>
>> Then applications could group all same prefixed and style them in a similar fashion without breaking the styling when new ones are added.
>
> Its unlikely the order defined will work as well for every language and will depend on viewpoints: is literal.quoted.string.interpolated.unicode better than literal.quoted.string.unicode.interpolated? Tags may be more general than a hierarchy.
>

I guess some convention would have to be chosen. I remember reading
about some editors doing something similar (maybe it was
http://manual.macromates.com/en/scope_selectors). Tags would probably
also work.

>> Though it wouldn't really solve the duplication problem if there was only one library containing all of the lexers since applications would each still probably want to provide their own customized libraries to remove the lexers they don't use (this is the one thing Geany currently applies patches to Scintilla for).
>
> Adding application-specific lexers is not that difficult as the application can call SCI_LOADLEXERLIBRARY on a local .so.
>

I meant for removing lexers, like how Geany currently removes all the
lexers that it doesn't have explicit support for
(https://github.com/geany/geany/blob/1.25.0/scintilla/scintilla_changes.patch#L61)

>> It would be great if the existing lexers could be loaded as separate libs at runtime using the existing "external lexer" mechanism for this purpose. [...]
>
> [...] So, building all lexers as separate DLLs will likely use 20MB of disk compared to 750KB.
>

Ok, fair enough if the trade-off isn't worth it. I haven't tested compiling the
lexers individually to see what the ELF overhead would be.

Cheers,
Matthew Brush

kugel

Aug 3, 2015, 4:35:34 AM

to scintilla-interest, nyama...@me.com

On Monday, 3 August 2015 02:27:25 UTC+2, Neil Hodgson wrote:

> With the changes for >2GB documents requiring incompatible lexer interface changes, there is an opportunity to make other changes that would normally be too costly.

Taking that opportunity, I would like to ask you to include the changes I proposed in https://groups.google.com/d/topic/scintilla-interest/Z7lk3SkBkxs/discussion which you deemed too invasive.

Please tell me if you'll consider them; if so, I'll pick up that effort again and rebase onto the current development head.

Jason Haslam

Aug 3, 2015, 12:25:28 PM

to scintilla...@googlegroups.com

How will the change to 64-bit positions affect memory usage? Will there be an option to choose 32-bit positions at compile time?

The change to 4.0.0 might be a good time to kill off the mask parameter to StartStyling.

Jason


Neil Hodgson

Aug 3, 2015, 6:53:29 PM

to scintilla...@googlegroups.com

Jason Haslam:

> How will the change to 64-bit positions affect memory usage?
> Will there be an option to choose 32-bit positions at compile time?

The bulk of the memory increase will only occur when Scintilla is compiled with the SCI_LARGE_FILE_SUPPORTED option.

There will be some size increases even when the option is off. For example, SCNotification will grow from 128 to 152 bytes. This is so that applications and other code that calls Scintilla such as lexers do not have to be built to match the option choice.
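A sketch of the split described above, under stated assumptions: the option name is taken from this message, while InterfacePosition, InternalPosition, NotificationSketch and ToInternal are hypothetical. Interface records always use the wide type, so callers compile the same way whichever option Scintilla was built with; only internal storage changes.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Interface type: always pointer-sized, independent of the build option.
typedef std::ptrdiff_t InterfacePosition;

// Internal storage type: widened only when large-file support is compiled in.
#ifdef SCI_LARGE_FILE_SUPPORTED
typedef std::ptrdiff_t InternalPosition;
#else
typedef int InternalPosition;
#endif

// Hypothetical notification record: its layout is the same in both builds,
// which is why structs like this grow even when the option is off.
struct NotificationSketch {
	InterfacePosition position;
	InterfacePosition length;
	int modificationType;
};

// Positions cross the interface wide and are narrowed for internal use.
InternalPosition ToInternal(InterfacePosition pos) {
	assert(pos >= 0 && pos <= std::numeric_limits<InternalPosition>::max());
	return static_cast<InternalPosition>(pos);
}
```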

Some lexers store line-based history. For example, the C++ lexer stores all the preprocessor macros along with the line they were defined on in PPDefinition structs. Some of these may grow but in most cases the extra space will not be a large increase in comparison to the struct’s total size. For the PPDefinition struct there is no growth as there was already alignment padding before the next field.
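The padding point can be checked with two simplified records that differ only in the type of the line field. These structs are hypothetical, not the actual PPDefinition declaration. On common ABIs, std::string is aligned to at least the size of ptrdiff_t, so a 32-bit int in front of it is followed by padding that a wider position simply fills.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified per-line record with a 32-bit line number.
struct DefinitionInt {
	int line;         // followed by padding on typical 64-bit ABIs
	std::string key;  // usually 8-byte aligned on 64-bit platforms
	bool isUndef;
};

// The same record with a pointer-sized line number.
struct DefinitionWide {
	std::ptrdiff_t line;  // occupies the former padding
	std::string key;
	bool isUndef;
};
```

On such platforms sizeof(DefinitionInt) and sizeof(DefinitionWide) come out equal. A record whose next field has smaller alignment would grow instead.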

> The change to 4.0.0 might be a good time to kill off the mask parameter to StartStyling.

Yes.

Neil

Jason Haslam

Aug 3, 2015, 7:23:04 PM

to scintilla...@googlegroups.com

Sounds good. Thanks!

Jason

Neil Hodgson

Aug 4, 2015, 8:43:46 AM

to scintilla...@googlegroups.com

kugel:

> On that opporunity I would like to ask to include the changes I proposed in https://groups.google.com/d/topic/scintilla-interest/Z7lk3SkBkxs/discussion which you deemed too invasive.

I don’t think I used the term “invasive”. That would be a bit too territorial.

> Please tell me if you'll consider these then I'll pick up that effort again and rebase to the current development head.

I tried to push this forward with the scigirmin.patch which was my attempt to simplify the header by removing any changes from the change set that didn’t appear necessary. The goal here is to enable Scintilla for g-ir-scanner in the simplest way possible, not to change the names of things or to add files.

Neil

Jonathan Hunt

Aug 5, 2015, 3:12:48 PM

to scintilla-interest, nyama...@me.com

Do these changes mean that Scintilla will no longer be easily compatible with languages that do not innately support 64-bit data types in a 32-bit binary of the library? What kind of changes would be needed?

One of the reasons I chose scintilla was how stable the interface was, generally speaking. If it were my call, I'd add all new functions to work with 64-bit, or make the changes in the 64-bit version of the library. That's generally how the rest of the world has dealt with this.

Neil Hodgson

Aug 5, 2015, 6:54:37 PM

to scintilla...@googlegroups.com

Jonathan Hunt:

> Do these changes mean that Scintilla will no longer be easily compatible with languages that do not innately support 64-bit data types in a 32 bit binary of the library?

Scintilla will still support 32-bit builds except on Cocoa which is already 64-bit only. No interfaces have been widened for 32-bit builds.

> One of the reasons I chose scintilla was how stable the interface was, generally speaking. If it were my call, I'd add all new functions to work with 64-bit, or make the changes in the 64-bit version of the library. That's generally how the rest of the world has dealt with this.

There have been 64-bit builds since 2002, and projects already depend on them. So, to avoid imposing extra memory costs on those projects, wider positions are a compile-time option when producing 64-bit executables.

Neil

Neil Hodgson

Aug 5, 2015, 7:14:32 PM

to scintilla...@googlegroups.com

Matthew Brush:

> It would also be useful (for more reasons than this) if the styles could be introspected at runtime, then applications could be coded to decide what to do based on the set of styles available.

Perhaps this, allowing both runtime introspection and build-time tools to grab the block started with LexicalClass.

/*
26 Tags: Less useful marked with '~'.

character
comment
default
documentation
eol
error
escapesequence
~globalclass
hash
identifier
keyword
line
literal
multiline
numeric
operator
preprocessor
quoted
~raw
regex
string
~taskmarker
~triple
user
uuid
~verbatim

Others that may be useful:

obsolete
unicode/wide/narrow
interpolated

*/

// Assumed layout (the struct definition was not shown in the message).
struct LexicalClass {
	int value;                // style number; equals the array index
	const char *name;         // programmatic name, as in SciLexer.h
	const char *description;  // space-separated tags
};

static const LexicalClass lexicalClasses[] = {
	{ 0, "SCE_C_DEFAULT", "default" },
	{ 1, "SCE_C_COMMENT", "comment" },
	{ 2, "SCE_C_COMMENTLINE", "comment line" },
	{ 3, "SCE_C_COMMENTDOC", "comment documentation" },
	{ 4, "SCE_C_NUMBER", "literal numeric" },
	{ 5, "SCE_C_WORD", "keyword" },
	{ 6, "SCE_C_STRING", "literal quoted string" },
	{ 7, "SCE_C_CHARACTER", "literal quoted string character" },
	{ 8, "SCE_C_UUID", "literal quoted uuid" },
	{ 9, "SCE_C_PREPROCESSOR", "preprocessor" },
	{ 10, "SCE_C_OPERATOR", "operator" },
	{ 11, "SCE_C_IDENTIFIER", "identifier" },
	{ 12, "SCE_C_STRINGEOL", "error string eol" },
	{ 13, "SCE_C_VERBATIM", "literal quoted string multiline verbatim" },
	{ 14, "SCE_C_REGEX", "literal quoted regex" },
	{ 15, "SCE_C_COMMENTLINEDOC", "comment line documentation" },
	{ 16, "SCE_C_WORD2", "keyword" },
	{ 17, "SCE_C_COMMENTDOCKEYWORD", "comment documentation keyword" },
	{ 18, "SCE_C_COMMENTDOCKEYWORDERROR", "error comment documentation keyword" },
	{ 19, "SCE_C_GLOBALCLASS", "identifier globalclass" },
	{ 20, "SCE_C_STRINGRAW", "literal quoted string multiline raw" },
	{ 21, "SCE_C_TRIPLEVERBATIM", "literal quoted string multiline triple" },
	{ 22, "SCE_C_HASHQUOTEDSTRING", "literal quoted string multiline hash" },
	{ 23, "SCE_C_PREPROCESSORCOMMENT", "preprocessor comment" },
	{ 24, "SCE_C_PREPROCESSORCOMMENTDOC", "preprocessor comment documentation" },
	{ 25, "SCE_C_USERLITERAL", "literal user" },
	{ 26, "SCE_C_TASKMARKER", "comment taskmarker" },
	{ 27, "SCE_C_ESCAPESEQUENCE", "literal quoted string escapesequence" },
};

// ILexer additions:
int SCI_METHOD LexerCPP::MaximumNamedStyle() {
	// Highest style number with an entry in the table (27 here).
	return static_cast<int>(sizeof(lexicalClasses) / sizeof(lexicalClasses[0])) - 1;
}

const char * SCI_METHOD LexerCPP::NameOfStyle(int style) {
	// Out-of-range style numbers get an empty string rather than reading past the table.
	return (style >= 0 && style <= MaximumNamedStyle()) ? lexicalClasses[style].name : "";
}

const char * SCI_METHOD LexerCPP::DescriptionOfStyle(int style) {
	return (style >= 0 && style <= MaximumNamedStyle()) ? lexicalClasses[style].description : "";
}

The style words were ordered to match my sense of how I’d want styling: error goes first so that it matches strongly and so shows distinctly (if error is defined at all), instead of inheriting from the parent style and hence appearing invisible if that particular sequence doesn’t have a defined appearance.

There could be both informal (candidates for list boxes) and formal parts of the description:

8, "SCE_C_UUID", "IDL UUID like uuid(ba209999-0c6c-11d2-97cf-00c04f8eea45)[literal quoted uuid]",
21, "SCE_C_TRIPLEVERBATIM", "Vala \"\"\"triple-quoted\"\"\" strings[literal quoted string multiline triple]",
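A sketch of how an application might split such a combined description, assuming every entry uses the form "informal text[tag tag ...]". The last '[' is used because the informal part may contain brackets. An entry without the trailing "[...]" is treated as informal text only; entries that are tags only, like the current "comment line", would need a different rule. ParsedDescription and ParseDescription are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct ParsedDescription {
	std::string informal;           // text suitable for a list box
	std::vector<std::string> tags;  // space-separated words inside the final [...]
};

ParsedDescription ParseDescription(const std::string &description) {
	ParsedDescription result;
	const std::string::size_type open = description.rfind('[');
	if (open == std::string::npos || description.back() != ']') {
		result.informal = description;  // no formal part present
		return result;
	}
	result.informal = description.substr(0, open);
	// Text strictly between the final '[' and the closing ']'.
	std::istringstream words(description.substr(open + 1, description.size() - open - 2));
	std::string word;
	while (words >> word)
		result.tags.push_back(word);
	return result;
}
```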

Using strings can result in misspellings, but it's likely that symbolication will be uglier and not extensible.

Neil

Matthew Brush

Aug 5, 2015, 9:45:06 PM

to scintilla...@googlegroups.com

Is "verbatim" effectively the same as "multiline" and "triple"?

I think it would be useful to have one tag to differentiate from regular
"string" for multi-line/triple-quoted/raw string literals.

The list looks pretty good to me for C-like languages. The only one I'm
not sure is present is the one for secondary keywords. In Geany those
are set by users and CTags files as the "types" (using the 2nd keyword
list in cppWordsList). Is it the same as "globalclass"?

Is there a reason/advantage to using the #define name in the second
field, as opposed to stripping the `SCE_C_` prefix (and possibly
lower-casing it)? It barely matters as an application could
demangle/remangle as needed, mostly just curious.

> // ILexer additions:
> int SCI_METHOD LexerCPP::MaximumNamedStyle() {
> return (sizeof(lexicalClasses) / sizeof(lexicalClasses[0]))-1;
> }
>
> const char * SCI_METHOD LexerCPP::NameOfStyle(int style) {
> return lexicalClasses[style].name;
> }
>
> const char * SCI_METHOD LexerCPP::DescriptionOfStyle(int style) {
> return lexicalClasses[style].description;
> }
>

Would there then be SCI_* messages to call these member functions, or
only from C++ through ILexer interface? In any case, looks good.

> The style words were ordered to match my sense of how I’d want styling: error goes first so that it strongly matches and so shows distinctly (if error is defined at all) instead of inheriting from the parent style and hence appear invisible if that particular sequence doesn’t have a defined appearance.
>

Sounds logical.

> There could be both informal (candidates for list boxes) and formal parts of the description:
>
> 8, "SCE_C_UUID", “IDL UUID like uuid(ba209999-0c6c-11d2-97cf-00c04f8eea45)[literal quoted uuid]",
> 21, "SCE_C_TRIPLEVERBATIM", “Vala \”\”\"triple-quoted\”\”\" strings[literal quoted string multiline triple]”,
>

I really like this idea. It might be an improvement to separate the
"description" and "tags" into two separate fields and have another
member function like TagsOfStyle() or some such.

For the descriptions, I wonder if there would be any implications with
respect to internationalization? I guess that's the application's
problem if it wants to put the strings into a user interface, though
I'm not sure whether tools like gettext can catalog these strings
without special markers (I'm far from an expert on this subject).

Off-topic: It's funny that you used "uuid" as the example with a
description, as it's one of the styles I've scratched my head about
the most. Is "uuid" referring to some Win32-specific IDL syntax or something?

> Using strings can result in misspellings but its likely that symbolication will be uglier and not extensible.
>

I'm not sure I fully understand this.

Cheers,
Matthew Brush

kugel

Aug 6, 2015, 3:47:32 AM

to scintilla-interest, nyama...@me.com

Neil,

On Tuesday, 4 August 2015 14:43:46 UTC+2, Neil Hodgson wrote:

> kugel:
>
> I tried to push this forward with the scigirmin.patch which was my attempt to simplify the header by removing any changes from the change set that didn’t appear necessary. The goal here is to enable Scintilla for g-ir-scanner in the simplest way possible, not to change the names of things or to add files.

Thanks for the reply. I replied back on the original thread if that's okay. Please let's discuss again (in the original thread) how we can push this topic forward (I am still interested in it).

Neil Hodgson

Aug 6, 2015, 6:49:18 PM

to scintilla...@googlegroups.com

Matthew Brush:

> Is "verbatim" effectively the same as "multiline" and "triple”?

“verbatim” is pretty much the same as “raw”, with both indicating that escape sequences are not interpreted. “multiline” is orthogonal to that, although they are often found together. In Python there is r"x" for raw, """x""" for multiline, and r"""x""" for raw multiline. String types have several attributes which may be available individually or only in fixed groups, depending on the language. Others include “unicode”, “byte”, “wide”, and “interpolated”. Perl and Ruby may require more classifications: they will need at least “heredoc”.

> I think it would be useful to have one tag to differentiate from regular "string" for multi-line/triple-quoted/raw string literals.

The languages that provide these extra types are doing it to meet the expectations of their field of use. Limiting to some popular set will make this a poor fit for the less common features provided by some languages.

> The list looks pretty good to me for C-like languages. The only one I'm not sure is present is the one for secondary keywords. In Geany those are set by users and CTags files as the "types" (using the 2nd keyword list in cppWordsList). Is it the same as "globalclass"?

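Keyword list 4 is "Global classes and typedefs”, so "globalclass" is not the secondary keywords. Those come from keyword list 2 and are styled as SCE_C_WORD2, which is tagged "keyword" in the table.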

> Is there a reason/advantage to using the #define name in the second field, as opposed to stripping the `SCE_C_` prefix (and possibly lower-casing it)? It barely matters as an application could demangle/remangle as needed, mostly just curious.

That is the programmatic name and, to help using code and documentation between languages, should be the same whether it comes from a static C header or by introspection in the app’s scripting language.

> Would there then be SCI_* messages to call these member functions,

Yes, with the normal string copying semantics of SCI_ messages. They won’t pass out a char*.
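A sketch of a string-returning handler, assuming the common "pass a buffer, get the length back" convention: a null buffer only queries the length (excluding the terminating NUL), otherwise the string is copied with its NUL. CopyStringResult is a hypothetical helper; the exact semantics of each real message are defined in the Scintilla documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `value` into the caller's buffer, which must hold length + 1 bytes,
// and returns the length without the NUL. A null buffer just returns the length.
int CopyStringResult(const char *value, char *buffer) {
	assert(value != nullptr);
	const std::size_t length = std::strlen(value);
	if (buffer)
		std::memcpy(buffer, value, length + 1);
	return static_cast<int>(length);
}
```

A caller would send the message once with a null buffer to learn the size, allocate length + 1 bytes, then send it again to receive the text.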

> I really like this idea. It might be an improvement to separate the "description" and "tags" into to separate fields and having another member function like TagsOfStyle() or some such.

Possibly.

> For the descriptions, I wonder if there would be any implications with respect to internationalization? I guess that's the applications problem, if it wants to put the strings into a user-interface, though I'm not sure if some tools like gettext can catalog these strings without special special markers (I'm far from an expert on this subject).

The format can be understood by a script that extracts all the descriptions into a translation document. The application then looks up the translation for each description at run time.

> Off-topic: It's funny that you used "uuid" as example, giving description, as it's one of the few I've scratched my head about the most; Is "uuid" referring to some win32-specific IDL syntax or something?

It's in DCE IDL as well. For example,
http://www-01.ibm.com/support/knowledgecenter/#!/SSLTBW_1.12.0/com.ibm.zos.r12.euvmn00/euva5a0079.htm

>> Using strings can result in misspellings but its likely that symbolication will be uglier and not extensible.
>>
>
> I'm not sure I fully understand this.

It's easy to misspell in strings. Perhaps I type "literal quoted regez” - there’s no warning. An alternative would use enum { dLiteral=1, dQuoted=2, dRegex=4, …}; and then

12, "SCE_C_STRINGEOL", dError | dString | dEol,
13, "SCE_C_VERBATIM", dLiteral | dQuoted | dString | dMultiline | dVerbatim,
14, "SCE_C_REGEX", dLiteral | dQuoted | dRegez,

The compiler would then warn that dRegez is unknown.
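A fuller sketch of the enum alternative, showing only a subset of the 26 tags with hypothetical bit assignments. A bit set has no order, so TagsToString rebuilds the space-separated form in one fixed word order, with "error" first to keep the strong-match ordering described earlier. That lets introspection still return strings.

```cpp
#include <cassert>
#include <string>

// One bit per tag; a misspelt enumerator such as dRegez fails to compile.
enum StyleTag : unsigned int {
	dError = 1u << 0,
	dLiteral = 1u << 1,
	dQuoted = 1u << 2,
	dString = 1u << 3,
	dRegex = 1u << 4,
	dMultiline = 1u << 5,
	dVerbatim = 1u << 6,
	dEol = 1u << 7,
};

struct LexicalClassFlags {
	int value;
	const char *name;
	unsigned int tags;
};

const LexicalClassFlags lexicalClassFlags[] = {
	{ 12, "SCE_C_STRINGEOL", dError | dString | dEol },
	{ 13, "SCE_C_VERBATIM", dLiteral | dQuoted | dString | dMultiline | dVerbatim },
	{ 14, "SCE_C_REGEX", dLiteral | dQuoted | dRegex },
};

// Rebuild the tag string in a fixed order, so the same word list can be
// offered through string-based introspection.
std::string TagsToString(unsigned int tags) {
	static const struct { unsigned int bit; const char *word; } words[] = {
		{ dError, "error" }, { dLiteral, "literal" }, { dQuoted, "quoted" },
		{ dString, "string" }, { dRegex, "regex" }, { dMultiline, "multiline" },
		{ dVerbatim, "verbatim" }, { dEol, "eol" },
	};
	std::string result;
	for (const auto &w : words) {
		if (tags & w.bit) {
			if (!result.empty())
				result += ' ';
			result += w.word;
		}
	}
	return result;
}
```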

Neil

Neil Hodgson

Aug 11, 2015, 4:08:01 AM

to scintilla...@googlegroups.com

Here’s a script called LexMetadata.py for scavenging lexer metadata from comments in .properties files. It looks for strings in the comments and outputs the comment and a set of tags. Also attached is an updated copy of ScintillaData.py that retrieves more information about lexer number -> (name, file) which is needed for LexMetadata.py. Place both files in scintilla/scripts and run from there. Requires the SciTE source code to be installed next to the scintilla directory as scite.

This is only a way to easily produce a starting set of tags: they should really be checked and added to by a human.

Neil