LexAsm, LexBasic and LexD converted to lexer objects


dlchnr

Jan 23, 2011, 9:16:13 PM
to scintilla-interest
I've converted LexAsm, LexBasic and LexD to lexer objects
to prepare them for adding further folding features/properties.

LexD.cxx also includes a patch for a minor bug (a line
containing a fold end isn't included in the collapsed fold
if it is the last line of the file).

I haven't declared a PropSetSimple props object in the
lexer object and have also deleted the corresponding include
(#include "PropSetSimple.h"), because it isn't used.
Are there plans to use it for something?
Maybe we should also delete it from LexCPP.cxx?

I've set options.foldCompact to true in LexD.cxx
and LexBasic.cxx to stay compatible with
GetPropertyInt("fold.compact", 1) in existing versions.
Maybe we should also set options.foldCompact to true
in LexCPP.cxx, to stay compatible with versions of Scintilla
from before OptionSet was introduced and to have the same
default value for foldCompact in all lexers?

http://dlchnr.dl.funpic.de/in/abdLex.zip

dlchnr

Jan 26, 2011, 8:23:39 PM
to scintilla-interest
I've now finished implementing further folding features
and properties in LexAsm, LexBasic and LexD.

LexD now offers the same as LexCPP apart from the preprocessor stuff,
including user configurable explicit folding points.

---------------------------------------
# Folding
#fold.d.syntax.based=0
#fold.d.comment.multiline=0
#fold.d.comment.explicit=0
#defaults for fold.d.explicit.start=//{ and fold.d.explicit.end=//}
# can be replaced by defining custom strings, e.g. //[ and //]
#fold.d.explicit.start=//[
#fold.d.explicit.end=//]
#if fold strings are set to something like ///[ and ///] ("///" won't
# be handled as COMMENTLINE, but COMMENTLINEDOC),
# enable fold.d.explicit.anywhere, allowing explicit fold points
# being anywhere, not just in line comments
#fold.d.explicit.anywhere=1
#lexer.d.fold.at.else=1
---------------------------------------
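A minimal sketch of how explicit fold markers like the ones configured above might be recognised. This is not the LexD.cxx code: the function name `ExplicitFoldDelta` and its parameters are made up for illustration, and the `inLineComment` flag stands in for whatever style check the lexer really performs.

```cpp
#include <string>

// Hypothetical helper, not part of Scintilla.
// Returns +1 if the text at 'pos' begins the explicit fold start marker,
// -1 if it begins the end marker, and 0 otherwise.
// 'anywhere' models fold.d.explicit.anywhere: when it is false, markers
// only count if the position lies inside a line comment.
int ExplicitFoldDelta(const std::string &line, std::size_t pos,
                      bool inLineComment,
                      const std::string &start = "//{",
                      const std::string &end = "//}",
                      bool anywhere = false) {
    if (pos > line.size())
        return 0;                       // position outside the line
    if (!anywhere && !inLineComment)
        return 0;                       // markers ignored outside comments
    if (!start.empty() && line.compare(pos, start.size(), start) == 0)
        return 1;
    if (!end.empty() && line.compare(pos, end.size(), end) == 0)
        return -1;
    return 0;
}
```

With custom strings such as `///[` and `///]`, the caller would pass `anywhere = true`, mirroring the comment in the property file above.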

LexBasic has been extended by user configurable explicit
folding points, too.

---------------------------------------
# Folding (BlitzBasic + PureBasic)
#fold.basic.syntax.based=0
#fold.basic.comment.explicit=0
#defaults for fold.basic.explicit.start=;;{ and fold.basic.explicit.end=;;}
# can be replaced by defining custom strings, e.g. ;;[ and ;;]
#fold.basic.explicit.start=;;[
#fold.basic.explicit.end=;;]
#if fold strings are set to something like "REM [" and "REM ]",
# enable fold.basic.explicit.anywhere, allowing explicit fold points being
# anywhere, not just in line comments
#fold.basic.explicit.anywhere=1

# Folding (FreeBasic)
#fold.basic.syntax.based=0
#fold.basic.comment.explicit=0
#defaults for fold.basic.explicit.start=''{ and fold.basic.explicit.end=''}
# can be replaced by defining custom strings, e.g. ''[ and '']
#fold.basic.explicit.start=''[
#fold.basic.explicit.end='']
#if fold strings are set to something like "REM [" and "REM ]",
# enable fold.basic.explicit.anywhere, allowing explicit fold points being
# anywhere, not just in line comments
#fold.basic.explicit.anywhere=1
---------------------------------------

Besides user configurable explicit folding points LexAsm
has been extended by folding multiline comments based on
MASM's COMMENT directive with configurable delimiter
(GNU's SCE_ASM_COMMENTBLOCK (/*...*/) should be folded
too, if implemented in lexer) and by syntax based folding.

---------------------------------------
#COMMENT directive's delimiter defaults to ~
#set lexer.asm.comment.delimiter accordingly, if another delimiter
# will be used
#lexer.asm.comment.delimiter=^

# Folding
#fold.asm.syntax.based=0
#fold.asm.comment.multiline=0
#fold.asm.comment.explicit=0
#defaults for fold.asm.explicit.start=;;{ and fold.asm.explicit.end=;;}
# can be replaced by defining custom strings, e.g. ;;[ and ;;]
#fold.asm.explicit.start=;;[
#fold.asm.explicit.end=;;]
#if fold strings are set to something like "a regi" and "a endr"
# (e.g. for Visual Studio's "#pragma region" and "#pragma endregion"),
# enable fold.asm.explicit.anywhere, allowing explicit fold points being
# anywhere, not just in line comments
#fold.asm.explicit.anywhere=1

# COMMENT Directive multiline comment
style.asm.15=fore:#77CC00
---------------------------------------
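As a rough illustration of folding on MASM's COMMENT directive, the sketch below finds the last line of a comment block for a given delimiter. It assumes the usual MASM rule that the block ends on the line containing the second occurrence of the delimiter; the helper name `CommentBlockEnd` is hypothetical and the real LexAsm.cxx works on styled text rather than on a vector of lines.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not taken from LexAsm.cxx.
// 'first' is the index of a line that starts a "COMMENT <delimiter>" block.
// Returns the index of the last line belonging to the block. If the closing
// delimiter never appears, the block runs to the end of the file.
std::size_t CommentBlockEnd(const std::vector<std::string> &lines,
                            std::size_t first, char delimiter = '~') {
    if (first >= lines.size())
        return first;                       // nothing to scan
    const std::size_t open = lines[first].find(delimiter);
    if (open == std::string::npos)
        return first;                       // no opening delimiter: not a block
    // The closing delimiter may sit on the same line, after the opening one.
    if (lines[first].find(delimiter, open + 1) != std::string::npos)
        return first;
    for (std::size_t i = first + 1; i < lines.size(); ++i) {
        if (lines[i].find(delimiter) != std::string::npos)
            return i;                       // line holding the closing delimiter
    }
    return lines.size() - 1;                // unterminated block
}
```

A folder could then raise the fold level on line `first` and lower it again after the returned line; with `lexer.asm.comment.delimiter=^` the caller would pass `'^'` instead of the default.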

MASM and NASM users are invited to optimize the list of
folding points. Any idea how to support something
like MASM's proc/endp for NASM?

static int CheckFoldPoint(char const *token) {
    if (!strcmp(token, "proc") ||           // group 00
        !strcmp(token, ".if") ||            // group 01
        !strcmp(token, ".repeat") ||        // group 02
        !strcmp(token, "segment") ||        // group 03
        !strcmp(token, "union") ||          // group 03
        !strcmp(token, "struct") ||         // group 03/04
        !strcmp(token, "istruc") ||         // group 05
        !strcmp(token, "%macro") ||         // group 06
        !strcmp(token, "%imacro") ||        // group 06
        !strcmp(token, "macro") ||          // group 07
        !strcmp(token, "for") ||            // group 07
        !strcmp(token, "forc") ||           // group 07
        !strcmp(token, "irpc") ||           // group 07
        !strcmp(token, "repeat") ||         // group 07
        !strcmp(token, "rept") ||           // group 07
        !strcmp(token, "while") ||          // group 07
/*      !strcmp(token, "if") ||             // group 08
        !strcmp(token, "if1") ||            // group 08
        !strcmp(token, "if2") ||            // group 08
        !strcmp(token, "ifb") ||            // group 08
        !strcmp(token, "ifnb") ||           // group 08
        !strcmp(token, "ife") ||            // group 08
        !strcmp(token, "ifdef") ||          // group 08
        !strcmp(token, "ifndef") ||         // group 08
        !strcmp(token, "ifdif") ||          // group 08
        !strcmp(token, "ifdifi") ||         // group 08
        !strcmp(token, "ifidn") ||          // group 08
        !strcmp(token, "ifidni") ||         // group 08
        !strcmp(token, "%if") ||            // group 09
        !strcmp(token, "%ifdef") ||         // group 09
        !strcmp(token, "%ifndef") ||        // group 09
        !strcmp(token, "%ifmacro") ||       // group 09
        !strcmp(token, "%ifnmacro") ||      // group 09
        !strcmp(token, "%ifctk") ||         // group 09
        !strcmp(token, "%ifnctk") ||        // group 09
        !strcmp(token, "%ifidn") ||         // group 09
        !strcmp(token, "%ifnidn") ||        // group 09
        !strcmp(token, "%ifidni") ||        // group 09
        !strcmp(token, "%ifnidni") ||       // group 09
        !strcmp(token, "%ifid") ||          // group 09
        !strcmp(token, "%ifnid") ||         // group 09
        !strcmp(token, "%ifstr") ||         // group 09
        !strcmp(token, "%ifnstr") ||        // group 09
        !strcmp(token, "%ifnum") ||         // group 09
        !strcmp(token, "%ifnnum") ||        // group 09
*/      !strcmp(token, ".while")) {         // group 99
        return 1;
    }
    if (!strcmp(token, "endp") ||           // group 00
        !strcmp(token, ".endif") ||         // group 01
        !strcmp(token, ".until") ||         // group 02
        !strcmp(token, ".untilcxz") ||      // group 02
        !strcmp(token, "ends") ||           // group 03
        !strcmp(token, "endstruc") ||       // group 04
        !strcmp(token, "iend") ||           // group 05
        !strcmp(token, "%endmacro") ||      // group 06
        !strcmp(token, "endm") ||           // group 07
/*      !strcmp(token, "endif") ||          // group 08
        !strcmp(token, "%endif") ||         // group 09
*/      !strcmp(token, ".endw")) {          // group 99
        return -1;
    }
    return 0;
}

http://dlchnr.dl.funpic.de/in/abdLexFF.zip

Neil Hodgson

Jan 28, 2011, 7:08:39 AM
to scintilla...@googlegroups.com
dlchnr:

> I've converted LexAsm, LexBasic and LexD to lexer objects
> to prepare them for adding further folding features/properties.

LexBasic implements three different lexers, with differing keyword
lists, but there is only a single OptionSetBasic class whose
constructor calls DefineWordListSets 3 times, once with each lexer's word
list names, making a composite list of 12 items that does not match any
of the lexers.
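One way to avoid the composite list is to hand each lexer's own descriptions to the option set when it is constructed. The sketch below uses a stand-in class rather than Scintilla's OptionSet, and the description strings are placeholders, so treat it as an outline of the idea and not as the code that was committed.

```cpp
#include <string>

// Stand-in for the keyword-set part of an option set; only the behaviour
// needed for this illustration is modelled.
class WordListSets {
    std::string wordLists;
    std::size_t count = 0;
public:
    // Appends each description on its own line.
    // The array must be terminated by a null pointer.
    void Define(const char *const descriptions[]) {
        for (std::size_t i = 0; descriptions[i]; ++i) {
            if (!wordLists.empty())
                wordLists += "\n";
            wordLists += descriptions[i];
            ++count;
        }
    }
    const std::string &Describe() const { return wordLists; }
    std::size_t Count() const { return count; }
};

// Placeholder descriptions, not necessarily the real LexBasic lists.
static const char *const blitzWordLists[] = {
    "BlitzBasic Keywords", "user1", "user2", "user3", nullptr};
static const char *const pureWordLists[] = {
    "PureBasic Keywords", "PureBasic PreProcessor Keywords", "user defined 1",
    "user defined 2", nullptr};

// Each lexer instance builds its option set from its own list only,
// so the described keyword sets match that lexer.
class OptionSetBasicSketch : public WordListSets {
public:
    explicit OptionSetBasicSketch(const char *const wordListDescriptions[]) {
        Define(wordListDescriptions);
    }
};
```

A BlitzBasic lexer object would construct `OptionSetBasicSketch(blitzWordLists)` and a PureBasic one `OptionSetBasicSketch(pureWordLists)`, each reporting four entries instead of twelve.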

In the D lexer, none of the options shared with the cpp lexer
should have descriptions.

> I haven't declared a PropSetSimple props object in the
> lexer object and also deleted the necessary include file
> (#include "PropSetSimple.h"), because it isn't used.
> Are there plans, to use it for something?

This was used in an earlier version but no longer appears
necessary. The empty definition in Accessor.h is all that is needed by
most lexers now. Committed removal of #include from most lexers and
props in LexCPP.

Neil

dlchnr

Jan 28, 2011, 8:28:04 PM
to scintilla-interest
>    LexBasic implements three different lexers, with differing keyword
> lists but there is only a single OptionSetBasic class whose
> constructor calls DefineWordListSets 3 times with each lexer's word
> list names making a composite list of 12 items that does not match any
> of the lexers.

I'm surprised that coloring worked despite this bug. I think
I've done it better now.

>    In the D lexer, none of the options shared with the cpp lexer
> should have descriptions.

I've changed it and updated the zip-file:
http://dlchnr.dl.funpic.de/in/abdLex.zip

But contradictions will come up when implementing
explicit fold points in LexAsm and LexBasic: fold.comment
can't enable folding of //{ and //} in this case; it has to be
;;{ and ;;} (''{ and ''} for FreeBasic), which is what I've chosen, or
something like that. Should we give up the description
in this case too? Or should I introduce fold.asm.comment
and fold.basic.comment as suggested in the comments?
http://dlchnr.dl.funpic.de/in/abdLexFF.zip

How much time has to pass between the two
patches (patch1, which converts to lexer objects,
and patch2, which adds further folding features)
before they are added to the Scintilla code base?

Neil Hodgson

Jan 29, 2011, 8:05:10 AM
to scintilla...@googlegroups.com
dlchnr:

> I wonder, coloring has worked, despite this bug - I think,
> I've done it better now.

Before lexer objects, client code had to look at compile time
information like SciLexer.h or run a script over the source code of
all the lexers to see which properties they are controlled by. With
lexer objects the client can call SCI_PROPERTYNAMES to find out the
properties or SCI_DESCRIBEKEYWORDSETS to find the keywords. SciTE
doesn't use these but other containers may want to provide a dialog
where you can see the names of the keyword sets and change the
keywords in each set.
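For a container that wants to build such a dialog, the result of either message is a single string with one entry per line. The sketch below only shows splitting such a string; how the string is fetched from the control (sending SCI_PROPERTYNAMES or SCI_DESCRIBEKEYWORDSETS through the container's own message mechanism) is left out, and `SplitLines` is an illustrative name, not a Scintilla function.

```cpp
#include <string>
#include <vector>

// Splits a newline-separated list, as returned by the property and
// keyword-set queries, into individual entries. Empty lines are skipped.
std::vector<std::string> SplitLines(const std::string &text) {
    std::vector<std::string> items;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t nl = text.find('\n', start);
        if (nl == std::string::npos)
            nl = text.size();           // last entry has no trailing newline
        if (nl > start)
            items.push_back(text.substr(start, nl - start));
        start = nl + 1;
    }
    return items;
}
```

Each entry could then label one row of the dialog, with the keyword set's index being its position in the returned vector.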

>>    In the D lexer, none of the options shared with the cpp lexer
>> should have descriptions.
>
> I've changed it and updated the zip-file:
> http://dlchnr.dl.funpic.de/in/abdLex.zip

OK.

> But there will come up contradictions when implementing
> explicit fold points in LexAsm and LexBasic - fold.comment
> can't enable folding //{ and //} in this case, it has to be
> ;;{ and ;;} (''{ and ''} for FreeBasic), what I've chosen or
> something like that. Should we give up the description
> in this case too? Or should I introduce fold.asm.comment
> and fold.basic.comment like suggested in comments?

Yes, these lexers never supported fold.comment so there is no
backward compatibility issue.

> Which timeframe has to be placed between the
> patches (patch1, which converts to lexer object
> and patch2, which add further folding features),
> before adding them to Scintilla code base?

Depends on how long I take to review the code and whether changes
are needed. I'm still a little unhappy about foldAtElseInt having
three values.

There are some unused parameter warnings. To avoid these just don't
give a name to the parameter:

LexBasic.cxx
..\lexers\LexBasic.cxx(391) : warning C4100: 'initStyle' :
unreferenced formal parameter

LexAsm.cxx
..\lexers\LexAsm.cxx(281) : warning C4100: 'pAccess' : unreferenced
formal parameter
..\lexers\LexAsm.cxx(281) : warning C4100: 'initStyle' : unreferenced
formal parameter
..\lexers\LexAsm.cxx(281) : warning C4100: 'length' : unreferenced
formal parameter
..\lexers\LexAsm.cxx(281) : warning C4100: 'startPos' : unreferenced
formal parameter
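A small sketch of the suggested fix, with simplified stand-ins for the Scintilla types: leaving a parameter unnamed (optionally keeping its name in a comment) tells the compiler it is deliberately unused, which is what silences MSVC's C4100. The class and the body of `Fold` here are invented for illustration.

```cpp
// Simplified stand-in for the real Scintilla interface type.
struct IDocument {};

class LexerSketch {
public:
    // initStyle and pAccess are not used in this body, so they are left
    // unnamed; the commented-out names keep the signature readable.
    int Fold(unsigned int startPos, int length,
             int /* initStyle */, IDocument * /* pAccess */) {
        return static_cast<int>(startPos) + length;  // end position of the range
    }
};
```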

Neil

dlchnr

Jan 29, 2011, 10:49:49 AM
to scintilla-interest

>    Yes, these lexers never supported fold.comment so there is no
> backward compatibility issue.

But do you want a fold.xyz.comment property that enables
folding of multiline comments and explicit fold points
(if both are present), plus fold.xyz.comment.multiline and
fold.xyz.comment.explicit to switch them off
separately? Or can we delete fold.comment completely
from these lexers and simply control multiline comment folding
with fold.comment.multiline and explicit fold points
with fold.comment.explicit?

> I'm still a little unhappy about foldAtElseInt having
> three values.

I couldn't think of a way to avoid this while staying completely
compatible with the current version. Can you?
Or should we accept a small incompatibility and simply
evaluate lexer.d.fold.at.else?

dlchnr

Jan 29, 2011, 11:13:02 AM
to scintilla-interest
> LexBasic.cxx
> ..\lexers\LexBasic.cxx(391) : warning C4100: 'initStyle' :
> unreferenced formal parameter
>
> LexAsm.cxx
> ..\lexers\LexAsm.cxx(281) : warning C4100: 'pAccess' : unreferenced
> formal parameter
> ..\lexers\LexAsm.cxx(281) : warning C4100: 'initStyle' : unreferenced
> formal parameter
> ..\lexers\LexAsm.cxx(281) : warning C4100: 'length' : unreferenced
> formal parameter
> ..\lexers\LexAsm.cxx(281) : warning C4100: 'startPos' : unreferenced
> formal parameter

I've updated the zip-file again:
http://dlchnr.dl.funpic.de/in/abdLex.zip

Neil Hodgson

Feb 3, 2011, 8:26:21 AM
to scintilla...@googlegroups.com
dlchnr:

> I have had no idea to avoid this and being completely
> compatible with current version - you?
> Should we have a small incompatibility and simply
> evaluate lexer.d.fold.at.else?

It's difficult to work out what the best path is here. I'd like to
have a simple model for properties where they have primitive values
without using an out of range value to indicate they haven't been set.
A boolean property should be either true or false. On the other hand,
changing the names and meaning of properties will break client code.
Adding an extra boolean to every property value to mark them as set or
not set seems to me to be just making this more complex for the rest
of time.
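To make the trade-off concrete, here is a sketch of the three-valued approach under discussion. It assumes that lexer.d.fold.at.else, when set, overrides a generic fold.at.else setting; the thread does not spell out the exact semantics, and the struct and function names are invented for illustration.

```cpp
// Hypothetical option record; the field names follow the thread.
struct OptionsSketch {
    bool foldAtElse = false;  // generic setting (assumed: fold.at.else)
    int foldAtElseInt = -1;   // lexer.d.fold.at.else: -1 = not set, 0 = off, 1 = on
};

// Resolves the effective value: the lexer-specific property wins when it
// has been set, otherwise the generic one applies.
bool EffectiveFoldAtElse(const OptionsSketch &options) {
    if (options.foldAtElseInt >= 0)
        return options.foldAtElseInt != 0;
    return options.foldAtElse;
}
```

The out-of-range value -1 is exactly the "not set" marker that makes the property more than a plain boolean.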

Neil

Neil Hodgson

Feb 3, 2011, 8:30:32 AM
to scintilla...@googlegroups.com
dlchnr:

>        if (!strcmp(token, "proc") ||            // group 00
>                !strcmp(token, ".if") ||             // group 01
>                !strcmp(token, ".repeat") ||         // group 02
>                !strcmp(token, "segment") ||         // group 03
>                !strcmp(token, "union") ||           // group 03
>                !strcmp(token, "struct") ||          // group 03/04
>                !strcmp(token, "istruc") ||          // group 05
>                !strcmp(token, "%macro") ||          // group 06
>                !strcmp(token, "%imacro") ||         // group 06
>                !strcmp(token, "macro") ||           // group 07
>                !strcmp(token, "for") ||             // group 07
>                !strcmp(token, "forc") ||            // group 07
>                !strcmp(token, "irpc") ||            // group 07

> ...

That is inefficient. Make a std::set<std::string> or similar at
initialisation that contains all the magic words and then just check
if the word is in the set.
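A sketch of that suggestion, using only the words that are active (not commented out) in the posted list. The sets are built once, on first use, and each lookup is then a single set search instead of a chain of strcmp calls. This is an illustration, not the code that ended up in LexAsm.cxx.

```cpp
#include <set>
#include <string>

// Returns 1 for a fold start word, -1 for a fold end word, 0 otherwise.
int CheckFoldPoint(const std::string &token) {
    // Function-local statics are initialised once, the first time through.
    static const std::set<std::string> foldStart = {
        "proc", ".if", ".repeat", "segment", "union", "struct", "istruc",
        "%macro", "%imacro", "macro", "for", "forc", "irpc", "repeat",
        "rept", "while", ".while"};
    static const std::set<std::string> foldEnd = {
        "endp", ".endif", ".until", ".untilcxz", "ends", "endstruc",
        "iend", "%endmacro", "endm", ".endw"};
    if (foldStart.count(token))
        return 1;
    if (foldEnd.count(token))
        return -1;
    return 0;
}
```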

Neil

Neil Hodgson

Feb 3, 2011, 9:37:24 PM
to scintilla...@googlegroups.com
Committed simple conversion to lexer objects of LexAsm.cxx and LexBasic.cxx.

Neil

dlchnr

Feb 6, 2011, 6:16:29 PM
to scintilla-interest


On 4 Feb., 03:37, Neil Hodgson <nyamaton...@gmail.com> wrote:
>    Committed simple conversion to lexer objects of LexAsm.cxx and LexBasic.cxx.
>
>    Neil

I've updated http://dlchnr.dl.funpic.de/in/abdLexFF.zip
including LexAsm.cxx and LexBasic.cxx supporting
the additional folding features and properties.
I've considered your hints also in LexD.cxx,
but the lexer.d.fold.at.else behavior is still open
(as in LexD.cxx without the extensions).

Neil Hodgson

Feb 7, 2011, 6:44:42 PM
to scintilla...@googlegroups.com
dlchnr:

> I've updated http://dlchnr.dl.funpic.de/in/abdLexFF.zip
> including LexAsm.cxx and LexBasic.cxx supporting
> the additional folding features and properties.

The list of folding start and end points is quite long, with many
entries commented out, which indicates uncertainty. In this situation it would be
better to allow the user to control the set of start and end points by
adding two keyword lists. In most languages the set of keywords is
much more certain so it is OK to have these hard coded.
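A rough sketch of what user-controlled fold words could look like. In Scintilla itself the two lists would arrive as keyword sets and be held in WordList objects; here a small stand-in class parses space-separated strings so the example stays self-contained. The class name `FoldWords` and its methods are hypothetical.

```cpp
#include <set>
#include <sstream>
#include <string>

// Holds two user-supplied word lists: words that open a fold and words
// that close one.
class FoldWords {
    std::set<std::string> starts;
    std::set<std::string> ends;

    // Splits a whitespace-separated list into a set of words.
    static std::set<std::string> Parse(const std::string &list) {
        std::istringstream in(list);
        std::set<std::string> words;
        std::string word;
        while (in >> word)
            words.insert(word);
        return words;
    }

public:
    void SetStarts(const std::string &list) { starts = Parse(list); }
    void SetEnds(const std::string &list) { ends = Parse(list); }

    // Returns 1 for a fold start word, -1 for a fold end word, 0 otherwise.
    int Check(const std::string &token) const {
        if (starts.count(token))
            return 1;
        if (ends.count(token))
            return -1;
        return 0;
    }
};
```

A MASM user could then supply `proc macro` and `endp endm`, while a NASM user supplies `%macro` and `%endmacro`, without either touching the lexer source.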

> I've considered your hints also in LexD.cxx,
> but the lexer.d.fold.at.else behavior is still open
> (as in LexD.cxx without the extensions).

Despite my misgivings, I can't see a good way around this so will
include the foldAtElseInt code.

StyleContext::Complete() calls styler.Flush() so there is no need
to include an extra call to styler.Flush in lexers that already call
StyleContext::Complete().

C++ requires two / characters to start a comment whereas Basic only
requires a single ' or ; so using the sequences ''{ or ;;{ does not
really appear to match normal Basic style.

Does the early return if foldSyntaxBased and foldCommentExplicit
are false lead to stale fold points if these are turned off during a
session?

Document consuming loops can be unsafe without checks for end of
file. For example, these in the assembler lexer look problematic
although I'm unsure whether they can loop past the end:

while (IsASpaceOrTab(sc.ch)) {
    sc.ForwardSetState(SCE_ASM_DEFAULT);
}

while (!sc.atLineEnd) {
    sc.Forward();
}

Neil

dlchnr

Feb 8, 2011, 9:11:36 PM
to scintilla-interest
> The lists of folding start and end points is quite long with many
> commented out indicating uncertainty. In this situation it would be
> better to allow the user to control the set of start and end points by
> adding two keyword lists. In most languages the set of keywords is
> much more certain so it is OK to have these hard coded.

It's true that I'm not sure what good fold points are. It's been
25 years since I last worked with MASM (MASM 4.0),
hence my call to optimize the list.
I've implemented the keyword sets and the implementation works,
but I would still prefer fold points provided by an
experienced programmer, because a simple, accidental change
(e.g. I deleted the % before macro when I moved the keywords
to the set) can result in a catastrophically misfolded document.

> Despite my misgivings, I can't see a good way around this so will
> include the foldAtElseInt code.

Does that mean there's nothing for me to change in either version
of LexD.cxx?

> StyleContext::Complete() calls styler.Flush() so there is no need
> to include an extra call to styler.Flush in lexers that already call
> StyleContext::Complete().

My first thought was that StyleContext::Complete() does what is
missing in LexLisp.cxx, but when I saw the sequence
    sc.Complete();
    styler.Flush();
in LexCPP.cxx, I was convinced that idea was wrong, so I simply
copied it from LexCPP.cxx. I will remove it from all files
where I've added it (of course not from LexLisp.cxx :-).

> C++ requires two / characters to start a comment whereas Basic only
> requires a single ' or ; so using the sequences ''{ or ;;{ does not
> really appear to match normal Basic style.

Two ideas:
- the shorter the sequence, the higher the risk of producing
  it accidentally
- using two comment signs + { / } could be the Scintilla "Standard"
  for explicit fold points

Would you prefer '{ and '} or ;{ and ;}?

>
> Does the early return if foldSyntaxBased and foldCommentExplicit
> are false lead to stale fold points if these are turned off during a
> session?

I must admit I didn't expect that changes can occur interactively.
I have a grasp of the lexer, but not of the whole Scintilla /
SciTE project. Would the following solve the problem?

void SCI_METHOD LexerBasic::Fold(unsigned int startPos, int length,
        int /* initStyle */, IDocument *pAccess) {

    foldBasic = !foldBasic ? options.foldSyntaxBased ||
        options.foldCommentExplicit : true;
    if (!(options.fold && foldBasic))
        return;
    :
    loop
    :
    foldBasic = options.foldSyntaxBased || options.foldCommentExplicit;
}

Otherwise I'll have to delete this optimisation!

> Document consuming loops can be unsafe without checks for end of
> file. For example, these in the assembler lexer look problematic
> although I'm unsure whether they can loop past the end:
>
> while (IsASpaceOrTab(sc.ch)) {
> sc.ForwardSetState(SCE_ASM_DEFAULT);
> }
>
> while (!sc.atLineEnd) {
> sc.Forward();
> }

I've checked this: Forward() (which is used by ForwardSetState(..))
doesn't step beyond endPos, and it sets atLineEnd to true if necessary,
so the second while can't loop forever. But my assumption that EOL and EOF
aren't "ASpaceOrTab" was wrong, so I have to change the first while too:
while (IsASpaceOrTab(sc.ch) && !sc.atLineEnd) {

Then it's possible to type "comment" at end of file :-)
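The general pattern behind this fix is that every loop that consumes text should also test for the end of the range. The sketch below uses a much simplified stand-in for StyleContext (the `CursorSketch` type and helper names are invented) to show both loops with an explicit bound check.

```cpp
#include <string>

// Very reduced stand-in for a styling cursor over a document.
struct CursorSketch {
    std::string text;
    std::size_t pos = 0;

    explicit CursorSketch(const std::string &t) : text(t) {}

    bool More() const { return pos < text.size(); }
    char Ch() const { return More() ? text[pos] : '\0'; }
    bool AtLineEnd() const {
        return !More() || text[pos] == '\r' || text[pos] == '\n';
    }
    void Forward() {
        if (More())
            ++pos;                      // never steps past the end
    }
};

inline bool IsSpaceOrTab(char ch) { return ch == ' ' || ch == '\t'; }

// Skips blanks, stopping at the end of the text even if it ends in blanks.
void SkipBlanks(CursorSketch &c) {
    while (c.More() && IsSpaceOrTab(c.Ch()))
        c.Forward();
}

// Advances to the end of the current line or the end of the text.
void SkipToLineEnd(CursorSketch &c) {
    while (c.More() && !c.AtLineEnd())
        c.Forward();
}
```

With the `More()` test in the loop condition, a file that ends in "comment" followed by trailing blanks terminates cleanly instead of relying on what the character past the end happens to be.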

Neil Hodgson

Feb 9, 2011, 5:26:24 PM
to scintilla...@googlegroups.com
dlchnr:

> but I still would prefer fold points provided by an
> experienced programmer, because a simple, accidental change
> (e.g. I've delete the % before macro, when I moved the keywords
> to the set) can result in a catastrophic misfolded document.

With the folding words configurable, eventually a MASM user will
turn up and fix the issue.

> That means, there's nothing to change for me in the both versions
> of LexD.cxx?

There is the early exit from folding issue.

> My first thought was that StyleContext::Complete() does, what in
> LexLisp.cxx is missed, but when I saw the sequence
>        sc.Complete();
>        styler.Flush();
> in LexCPP.cxx, I was convinced, the idea was wrong - I then simply
> copied from LexCPP.cxx. I will remove it from all files,
> where I've added it (of course not in LexLisp.cxx :-).

LexCPP.cxx had the Flush before it was included in StyleContext.

> two ideas
> - the shorter the sequence, the more likely the risk of producing
>  the sequence accidentally
> - using two comment signs + { / } could be the Scintilla "Standard"
>  for explicit fold points
>
> You want to see '{ and '} or ;{ and ;}?

Either find a Basic-specific IDE and copy its choice or just leave
out a default. If a user wants fold points then they have to define
fold.basic.explicit.start/end. Then if any consensus emerges it can be
added as the default.

> I must admit that I didn't expect, changes can occur interactively.

Part of the reason for lexer objects is to allow more interactive
changes to settings and for those changes to be visible in a
consistent way.

> I catch a meaning of the lexer, not of the whole Scintilla /
> SciTE project - would the following solve the problem?

> ...


> Otherwise I've to delete this optimisation!

This is looking like trouble so just drop the optimisation.

Just noticed that you check for c == '\n' for line end but line
ends may be just '\r' for old-style MacOS files.
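A line-end test that copes with all three conventions might look like the sketch below. It works on a plain string rather than on Scintilla's accessor, and the function names are illustrative only.

```cpp
#include <string>

// True if the character at 'pos' is the last character of a line.
// Handles "\n" (Unix), "\r\n" (Windows) and a lone "\r" (old-style Mac OS).
bool AtLineEnd(const std::string &text, std::size_t pos) {
    if (pos >= text.size())
        return false;
    const char ch = text[pos];
    if (ch == '\n')
        return true;
    // A '\r' ends the line unless it is the first half of "\r\n".
    return ch == '\r' && (pos + 1 >= text.size() || text[pos + 1] != '\n');
}

// Counts line ends, treating "\r\n" as a single one.
std::size_t CountLineEnds(const std::string &text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (AtLineEnd(text, i))
            ++count;
    }
    return count;
}
```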

Neil

dlchnr

Feb 10, 2011, 6:25:00 PM
to scintilla-interest
> With the folding words configurable, eventually a MASM user will
> turn up and fix the issue.

Wordlists are implemented now.

> > That means, there's nothing to change for me in the both versions
> > of LexD.cxx?
>
> There is the early exit from folding issue.

I had the "conversion only version" in mind, which isn't
committed yet. The version with user configurable fold points
has been changed.

> > - using two comment signs + { / } could be the Scintilla "Standard"
> > for explicit fold points
>
> > You want to see '{ and '} or ;{ and ;}?
>
> Either find a Basic-specific IDE and copy its choice or just leave
> out a default. If a user wants fold points then they have to define
> fold.basic.explicit.start/end. Then if any consensus emerges it can be
> added as the default.

It's a good idea to use marks that established editors already use. I've
found many marks with only a single comment sign but only one case
where two comment signs are used (LaTeX/WinF %%{{{, %%}}}).
This kills my idea. Furthermore, scanning for a single comment sign
can also detect marks consisting of two comment signs, but not
vice versa. Therefore I've implemented single comment sign
standard fold marks in ASM and Basic.

> > Otherwise I've to delete this optimisation!
>
> This is looking like trouble so just drop the optimisation.

I've deleted the optimisation

>
> Just noticed that you check for c == '\n' for line end but line
> ends may be just '\r' for old-style MacOS files.

That's the original code, but I changed it.

Udo

http://dlchnr.dl.funpic.de/in/abdLexFF.zip

Neil Hodgson

Feb 12, 2011, 7:18:02 PM
to scintilla...@googlegroups.com
Most recent abdLexFF download is committed.

This is creating a long list of language specific properties in the
SciTE documentation. I may move these out into a separate section as
they are not of interest to users of other languages.

Neil
