Lexers written in Lua


Lorenzo Donati

Jun 1, 2020, 9:09:53 AM
to scite-interest
Hi Neil!

I wanted to know if one can create simple Lexers using the Lua engine
embedded in SciTE.

I think I read something about that some years ago in the list, but I
can't find any reference at the moment.

In particular I'd like to know, assuming it is possible, if there is
a tutorial of sorts on how to do that, and if the resulting lexer could
override built-in lexers, even on a directory-by-directory basis.

That is, would I be able to use a custom lexer written in Lua to lex,
say, assembler files just in one directory (i.e. by setting some property
in the local SciTE.properties file) or in a directory tree (using
SciTEdirectory.properties)?


Thanks in advance for any hint!

Cheers!

-- Lorenzo


P.S.: I don't know where you are based, but I hope everything is ok for
you with this COVID19 mess.

Neil Hodgson

Jun 1, 2020, 6:36:18 PM
to scite-i...@googlegroups.com
Lorenzo Donati:

> I wanted to know if one can create simple Lexers using the Lua engine embedded in SciTE.

   https://www.scintilla.org/ScriptLexer.html

> In particular I'd like to know, assuming it is possible, if there is some tutorial of sort on how to do that and if the resulting lexer could possibly override built-in lexers, even on a directory by directory basis.

   Local properties override global and user properties.

   Neil

Michel Sauvard

Jun 2, 2020, 3:38:22 AM
to scite-interest
Hello
I just did that.
I had some problems with more than one "script lexer", because there can be only one OnStyle function.

I declare an "object" with an OnStyle function
zog = {
  OnStyle=function(styler) ... end
}

In OnOpen I store an item on the buffer depending on props['Language']:
  if props['Language'] == 'script_zog' then
    buffer['OnStyle'] = zog.OnStyle
  end

In OnStyle I call the OnStyle stored on the buffer:
  if buffer['OnStyle'] then
    buffer['OnStyle'](styler)
  end

At first I reassigned OnStyle directly, but it seems that changing a callback while it is in use causes a crash.

Hope it is useful.

Best regards

Lorenzo Donati

Jun 2, 2020, 6:56:48 AM
to scite-i...@googlegroups.com
On 02/06/2020 00:36, 'Neil Hodgson' via scite-interest wrote:
> Lorenzo Donati:
>
>> I wanted to know if one can create simple Lexers using the Lua
>> engine embedded in SciTE.
>
> https://www.scintilla.org/ScriptLexer.html
>

Thank you very much!

That's what I was looking for!

>> In particular I'd like to know, assuming it is possible, if there
>> is some tutorial of sort on how to do that and if the resulting
>> lexer could possibly override built-in lexers, even on a directory
>> by directory basis.
>
> Local properties override global and user properties.
>
> Neil
>

-- Lorenzo

Lorenzo Donati

Jun 2, 2020, 8:08:58 AM
to scite-i...@googlegroups.com
On 02/06/2020 00:36, 'Neil Hodgson' via scite-interest wrote:
Hi Neil!

Just a few questions about that example code.

* How can I differentiate between two different lexers so defined?

That is, is there a way to define two different OnStyle functions, one
used for, say, script_zog and another one for script_asm lexers?

Or must I define one single global OnStyle function and select between
the different lexers inside it using the styler.language value?



* What value does styler.language get? The exact value of the lexer.*
property (e.g., "script_zog"), or something else (e.g., the value is
parsed to remove the "script_" prefix)?



* Is OnStyle called only for files whose lexer name begins with
"script_", or is it called also for files lexed using built-in lexers?



* Inside the OnStyle implementation in the example, some variables are defined:

S_DEFAULT
S_IDENTIFIER
S_KEYWORD
S_UNICODECOMMENT
identifierCharacters

and

identifier

deep inside the loop.

Is there a reason why they are globals? Can I just declare them as
locals and possibly change their name to suit my coding style?



* If I get it right, the text is automatically styled in the style
specified by the "style" property associated with the number of the
current state. The association between state numbers and what they
represent is arbitrary. I.e. I could choose, say, state 23 to lex
keywords, for example, as long as style.script_zog.23 is assigned a
value. Right?


Thanks!

-- Lorenzo


Neil Hodgson

Jun 2, 2020, 6:40:03 PM
to scite-interest
Lorenzo:

> * How can I differentiate between two different lexers so defined?
>
> That is, is there a way to define two different OnStyle functions, one used for, say, script_zog and another one for script_asm lexers?

Script lexing was implemented as a basic feature without a design for allowing multiple implementations. If someone develops a good way of supporting multiple script lexers then that can be included.

> * What value does styler:language get? The exact value of the lexer.* property (e.g., "script_zog"), or something else (e.g., the value is parsed to remove the "script_" prefix)?

It's the value of the “Language” property which is the name defined by the lexer.* property.

> * Is OnStyle called only for files whose lexer name begins with "script_", or is it called also for files lexed using built-in lexers?

It is called when Scintilla sends an SCN_STYLENEEDED notification, which occurs when Scintilla sees that the current lexer is SCLEX_CONTAINER. That most commonly happens because SciTE set that lexer, but it could also occur if there has been a failure to set a lexer.
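
Since OnStyle can therefore run for a buffer that was never meant for a script lexer, a script may want to check the language first. The following is only a minimal sketch: it assumes styler.language carries the "Language" value as described above, and is_script_language is a hypothetical helper name, not part of SciTE.

```lua
-- Hypothetical helper: true only for lexer names that use the "script_" prefix.
local function is_script_language(name)
  return type(name) == "string" and name:sub(1, 7) == "script_"
end

function OnStyle(styler)
  -- Skip buffers that ended up on the container lexer by accident,
  -- e.g. because setting the intended lexer failed.
  if not is_script_language(styler.language) then
    return
  end
  -- ... styling for the script_* languages goes here ...
end
```

Whether to return early or to style the requested range with a default style is a design choice that this sketch leaves open.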

> Is there a reason why they are globals? Can I just declare them as locals and possibly change their name to suit my coding style?

There is no reason for making them globals.
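
As an illustration, the same kind of lexer can keep its style numbers and character set in locals. This is a reduced sketch modelled on the example at ScriptLexer.html, not the official code: the styler method names are written as I understand them from that page and should be checked against it, and the keyword list is an assumption chosen for the demonstration.

```lua
-- Style numbers and character set kept as file-level locals (upvalues of
-- OnStyle) instead of globals; the names are free to choose.
local S_DEFAULT = 0
local S_IDENTIFIER = 1
local S_KEYWORD = 2
local identifierCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
local keywords = { ["if"] = true, ["end"] = true }  -- assumed keyword set

function OnStyle(styler)
  styler:StartStyling(styler.startPos, styler.lengthDoc, styler.initStyle)
  while styler:More() do
    -- Leave the identifier state at the first non-identifier character.
    if styler:State() == S_IDENTIFIER then
      if not identifierCharacters:find(styler:Current(), 1, true) then
        local identifier = styler:Token()
        if keywords[identifier] then
          styler:ChangeState(S_KEYWORD)
        end
        styler:SetState(S_DEFAULT)
      end
    end
    -- Enter the identifier state when an identifier character appears.
    if styler:State() == S_DEFAULT then
      if identifierCharacters:find(styler:Current(), 1, true) then
        styler:SetState(S_IDENTIFIER)
      end
    end
    styler:Forward()
  end
  styler:EndStyling()
end
```

Because the constants are upvalues, they are created once when the script loads rather than on every OnStyle call.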

> * If I get it right, the text is automatically styled in the style specified by the "style" property associated with the number of the current state. The association between state numbers and what they represent is arbitrary. I.e. I could choose, say, state 23 to lex keywords, for example, as long as style.script_zog.23 is assigned a value. Right?

Style numbers are chosen in the range 0-255 by lexers except for the global styles 32-39 which are used for other features like line numbers.
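
Read literally, that rule would make a state such as 23 available to a script lexer, provided a matching style.script_zog.23 property exists. A small sketch of the rule as a check, where is_lexer_style is a hypothetical helper name and not a SciTE function:

```lua
-- Hypothetical helper: may a lexer use style number n for its own states?
-- Encodes only the rule stated above: 0-255, excluding the global styles 32-39.
local function is_lexer_style(n)
  if type(n) ~= "number" or n % 1 ~= 0 then
    return false  -- not an integer
  end
  if n < 0 or n > 255 then
    return false  -- outside the overall style range
  end
  return n < 32 or n > 39
end
```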

Neil

Lorenzo Donati

Jun 3, 2020, 5:26:17 AM
to scite-i...@googlegroups.com
Thank you for the prompt reply!

On 03/06/2020 00:39, 'Neil Hodgson' via scite-interest wrote:
> Lorenzo:
>
>> * How can I differentiate between two different lexers so defined?
>>
>> That is, is there a way to define two different OnStyle functions,
>> one used for, say, script_zog and another one for script_asm
>> lexers?

[snip]

>> Is there a reason why they are globals? Can I just declare them as
>> locals and possibly change their name to suit my coding style?
>
> There is no reason for making them globals.
>

May I then suggest modifying that example to make those variables local?
Not only would it spare people doubts like mine, it is also good
programming practice: it could ease understanding, and it would
speed up execution a bit (which is probably relevant in that context, since
OnStyle seems likely to be called quite often).

>> * If I get it right, the text is automatically styled in the style
>> specified by the "style" property associated with the number of the
>> current state. The association between state numbers and what they
>> represent is arbitrary. I.e. I could choose, say, state 23 to lex
>> keywords, for example, as long as style.script_zog.23 is assigned a
>> value. Right?
>
> Style numbers are chosen in the range 0-255 by lexers except for the
> global styles 32-39 which are used for other features like line
> numbers.
>
> Neil
>

Thanks again.

Cheers!

-- Lorenzo

Michel Sauvard

Jun 3, 2020, 7:02:42 AM
to scite-interest
Hello
For different lexers, have you seen my answer?
Best regards

On Monday, June 1, 2020 at 15:09:53 UTC+2, Lorenzo Donati wrote:

Lorenzo Donati

Jun 3, 2020, 8:03:25 AM
to scite-i...@googlegroups.com
On 03/06/2020 13:02, Michel Sauvard wrote:
> Hello
> For different lexers, have you seen my answer?
> Best regards
>
[snip]
>>
>
Hi Michel!

Actually I only skimmed it, since I was still gathering information
on how to write a lexer and some of the things you said didn't yet
make sense to me, as I hadn't studied the topic in depth.

I was waiting for Neil to answer my questions before getting a prototype
working and answering your post.

Moreover I have quite a tight schedule at work - nothing Lua or SciTE
related - so this is something I'm researching in my free time.

From what I understand, given what Neil has just told me in his post,
changing OnStyle once your Lua engine is loaded is a recipe for
disaster, since OnStyle is called so frequently and unpredictably.

That was one of the reasons why I asked Neil about it.

Moreover I think SciTE is a multithreaded application (at least I
observed sometimes it spawns threads on Windows). So there might be
concurrency issues by changing a callback like OnStyle while the script
is running (but I'm just guessing).

I think the best strategy would be to have a piece of code at the
beginning of OnStyle that selects different "lexing engines" according
to props['Language'].

It could be a simple if-elseif chain if you must discriminate among only
a few lexers. Perhaps a more general approach would be something like
(completely untested):


local lexer_engines = {}
lexer_engines.script_zog = function(styler) ... end
lexer_engines.script_asm = function(styler) ... end
lexer_engines.script_c = function(styler) ... end


function OnStyle( styler )
  local lang = props['Language']
  -- Dispatch to the right implementation
  local engine = lexer_engines[ lang ]
  if engine then
    engine( styler )
  else
    -- ignore unsupported language or raise error
  end
end



This, at least, will be my first approach when I get some time to
actually work on this.


Cheers!

-- Lorenzo










Michel Sauvard

Jun 3, 2020, 1:45:16 PM
to scite-interest
Hi Lorenzo

Well, that's why I don't change OnStyle. I just put an OnStyle on the buffer, which the global OnStyle checks.
I found out the hard way that changing OnStyle is dangerous, but we have no problem using the function stored on the buffer.
This is logical, because the buffer contains the text and so must be valid when a style is applied to it.
And this is true for other callbacks: I use the same trick with OnStyle, OnKey, OnDoubleClick and OnUpdateUI, which is called very often.
It is equivalent to your if, because you have to test something that is tied to the buffer, like props['Language'].
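
The per-buffer trick described here could be generalised roughly as follows. This is an untested sketch under assumptions: handlers, attach and dispatch are hypothetical names, the handler bodies are placeholders, and it relies on props['Language'] already being set when OnOpen runs, as in the earlier post.

```lua
-- Hypothetical per-language handler tables; only the callbacks a language
-- needs have to be present.
local handlers = {
  script_zog = {
    OnStyle = function(styler) --[[ zog styling ]] end,
    OnDoubleClick = function() --[[ zog-specific action ]] end,
  },
}

-- Copy the handlers for `language` onto the per-buffer table `buf`.
local function attach(buf, language)
  local h = handlers[language] or {}
  buf.OnStyle = h.OnStyle
  buf.OnDoubleClick = h.OnDoubleClick
end

-- Call the handler stored on `buf` under `name`, if there is one.
local function dispatch(buf, name, ...)
  local f = buf[name]
  if f then
    return f(...)
  end
end

-- SciTE-facing callbacks: defined once and never reassigned afterwards.
function OnOpen(filename)
  attach(buffer, props['Language'])
end

function OnStyle(styler)
  return dispatch(buffer, 'OnStyle', styler)
end

function OnDoubleClick()
  return dispatch(buffer, 'OnDoubleClick')
end
```

The global callbacks stay fixed for the lifetime of the script; only the functions stored on each buffer vary, which is the property that avoided the crash mentioned above.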

Best regards

On Monday, June 1, 2020 at 15:09:53 UTC+2, Lorenzo Donati wrote:

Neil Hodgson

Jun 3, 2020, 6:54:28 PM
to scite-i...@googlegroups.com
Lorenzo:

> Moreover I think SciTE is a multithreaded application (at least I observed sometimes it spawns threads on Windows). So there might be concurrency issues by changing a callback like OnStyle while the script is running (but I'm just guessing).

Lua is always run on the main thread.

Background threads may be used for loading and saving files, performing find in files, and running external programs. In the source code, all threads are started by SciTEBase::PerformOnNewThread.

Neil

Lorenzo Donati

Jun 3, 2020, 9:16:29 PM
to scite-i...@googlegroups.com
Thanks for the info!

Neil Hodgson

Apr 3, 2021, 3:40:10 AM
to scite-i...@googlegroups.com
Lorenzo Donati:

> May I then suggest to modify that examples so to make those vars locals?

   Documentation updated.

   Neil
