Semantic Highlight

Jason Haslam

Jan 5, 2012, 4:20:45 PM

to scintilla...@googlegroups.com

In understand (http://scitools.com) we allow the user to set the style of entity kinds (like function/type/macro/etc.) using information from the understand database. We implemented this by hacking scintilla to style the document with a lexer and *also* deliver style notifications as if it were styled by the container. So the lexer first styles a range and then the application restyles the same range changing the style of identifiers as needed. This works okay but it's a pain to maintain. I also think that other users might benefit from proper support for this kind of thing.
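For illustration, the container's half of that two-pass hack reduces to overwriting identifier styles in the range the lexer has just styled. The sketch below is a simplification over a plain one-style-per-byte array, not Scintilla code: `StyleRun` and `RestyleIdentifiers` are hypothetical names, and inside Scintilla the container would apply each run with SCI_STARTSTYLING and SCI_SETSTYLING instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record from the application's database: an identifier span
// and the style its entity kind should receive.
struct StyleRun {
    long start;   // document position of the identifier
    long length;  // length in bytes
    char style;   // style number chosen for the entity kind
};

// Second pass: the lexer has already styled [rangeStart, rangeEnd) in
// `styles` (one entry per document byte). Overwrite only the identifier
// spans the database knows about, clipped to the range just lexed.
void RestyleIdentifiers(std::vector<char> &styles, long rangeStart, long rangeEnd,
                        const std::vector<StyleRun> &runs) {
    rangeEnd = std::min(rangeEnd, static_cast<long>(styles.size()));
    for (const StyleRun &run : runs) {
        long begin = std::max(run.start, rangeStart);
        long end = std::min(run.start + run.length, rangeEnd);
        for (long pos = begin; pos < end; ++pos)
            styles[static_cast<std::size_t>(pos)] = run.style;
    }
}
```

The maintenance cost comes from keeping this second pass in step with every lexer's idea of where identifiers begin and end.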

I don't have any concrete proposal to make, but the general idea is to provide some way for the application to install callbacks that the lexers can access at the point where they classify identifiers as keywords. For example, the IDocument interface could be extended to provide a list of KeywordProvider (or similar) objects to the lexers. Then all of the lexers would have to be modified to take advantage of this feature, but I don't see any way around that.
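A minimal sketch of the shape this could take, assuming a callback that receives only the identifier text: a hypothetical `KeywordProvider` interface plus a toy lexer that consults it at the point where it classifies a word. None of these names exist in Scintilla; the toy lexer stands in for the real ones, each of which would need a similar hook.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface (not part of Scintilla): implemented by the
// application, consulted by a lexer wherever it classifies an identifier.
class KeywordProvider {
public:
    virtual ~KeywordProvider() = default;
    // Returns a style number for `word`, or -1 to keep the lexer's default.
    virtual int Classify(const std::string &word) const = 0;
};

// Example provider backed by a name-to-style table.
class TableKeywordProvider : public KeywordProvider {
public:
    explicit TableKeywordProvider(std::map<std::string, int> table) : table_(std::move(table)) {}
    int Classify(const std::string &word) const override {
        auto it = table_.find(word);
        return it == table_.end() ? -1 : it->second;
    }
private:
    std::map<std::string, int> table_;
};

// Toy style numbers for this sketch only.
constexpr char kStyleDefault = 0;
constexpr char kStyleIdentifier = 1;

// Toy lexer: marks identifier runs, letting the provider (if set) override
// the style at the point where each word is classified.
std::vector<char> LexIdentifiers(const std::string &text, const KeywordProvider *provider) {
    std::vector<char> styles(text.size(), kStyleDefault);
    auto isWordChar = [](char c) {
        unsigned char u = static_cast<unsigned char>(c);
        return std::isalnum(u) || u == '_';
    };
    std::size_t i = 0;
    while (i < text.size()) {
        // Skip non-word characters and digits that cannot start an identifier.
        if (!isWordChar(text[i]) || std::isdigit(static_cast<unsigned char>(text[i]))) {
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < text.size() && isWordChar(text[i]))
            ++i;
        char style = kStyleIdentifier;
        if (provider) {
            int s = provider->Classify(text.substr(start, i - start));
            if (s >= 0)
                style = static_cast<char>(s);
        }
        for (std::size_t j = start; j < i; ++j)
            styles[j] = style;
    }
    return styles;
}
```

With no provider installed the lexer behaves as before, so the hook costs nothing for applications that don't use it.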

So the question (mostly for Neil) is: is there any interest in a feature like this for scintilla?

Jason

Neil Hodgson

Jan 6, 2012, 1:58:19 AM

to scintilla...@googlegroups.com

Jason Haslam:

> We implemented this by hacking scintilla to style the document with
> a lexer and *also* deliver style notifications as if it were styled by the
> container. So the lexer first styles a range and then the application
> restyles the same range changing the style of identifiers as needed.

Glbthnrrk!

> I don't have any concrete proposal to make, but the general idea is
> to provide some way for the application to install callbacks that the
> lexers can access at the point where they classify identifiers as keywords.

I've been thinking about moving in the opposite direction, making
lexers more isolated to support a couple of scenarios: performing
lexing in separate threads or in sandboxed processes. Sandboxing is
becoming more common in OS X 10.7 and Windows 8 and would diminish the
potential for security problems with Scintilla lexers by isolating
them into processes with very restricted capabilities. As
cross-process calls are slow, there would be a need to improve the
caching of the state produced by the lexer and folder so that it was
all returned to the application at the completion of each range.
Calling back to the application to determine the keyword status of
each identifier would be expensive in this situation compared to
keyword classifiers that can be implemented inside the sandbox
process.

Neil

Jason Haslam

Jan 6, 2012, 12:13:43 PM

to scintilla...@googlegroups.com

Okay, I agree that the lexers calling back into application code isn't ideal for a number of reasons (stability, performance, etc.), but it's the best that I could come up with for this situation. Do you have any suggestion for how we can implement this kind of dynamic keyword classification in a more principled manner?

Thanks,
Jason

Neil Hodgson

Jan 6, 2012, 5:36:31 PM

to scintilla...@googlegroups.com

Jason Haslam:

> Okay, I agree that the lexers calling back into application code
> isn't ideal for a number of reasons (stability, performance, etc.),
> but it's the best that I could come up with for this situation.

I don't want to close the door completely but there are competing
directions here.

> Do you have any suggestion for how we can implement this kind
> of dynamic keyword classification in a more principled manner?

In many cases, it is possible to dynamically compile word lists
from the database and use the existing functionality. You'll probably
now tell me that this is too slow for your situation.
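A minimal sketch of this suggestion, assuming the database can enumerate entity names together with a kind: group the names by kind and join each group into the space-separated string that SCI_SETKEYWORDS accepts for one keyword set. `Entity` and `BuildWordLists` are hypothetical; only SCI_SETKEYWORDS is Scintilla's.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record from an application's symbol database.
struct Entity {
    std::string name;
    int kind;  // application-defined kind, used here as a keyword-set index
};

// Builds one space-separated word list per kind, the format that
// SCI_SETKEYWORDS expects. Names are de-duplicated and sorted.
std::map<int, std::string> BuildWordLists(const std::vector<Entity> &entities) {
    std::map<int, std::set<std::string>> byKind;
    for (const Entity &e : entities)
        byKind[e.kind].insert(e.name);

    std::map<int, std::string> lists;
    for (const auto &[kind, names] : byKind) {
        std::string joined;
        for (const std::string &name : names) {
            if (!joined.empty())
                joined += ' ';
            joined += name;
        }
        lists[kind] = joined;
    }
    return lists;
}
```

Each resulting string would be sent with SCI_SETKEYWORDS, using the kind as the keyword-set index (lexers accept only a small, fixed number of sets). Because the lists are keyed by name alone, a name such as foo that belongs to two kinds ends up in both lists.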

Neil

Jason Haslam

Jan 6, 2012, 6:22:51 PM

to scintilla...@googlegroups.com

It's not even that it's too slow; it just doesn't work. Different entity kinds can share the same name. Consider this snippet as styled by understand:

namespace foo {
    class foo {
        foo(int foo);
    };
}

Each occurrence of foo should have a different style. For this to work, the callback would certainly have to provide the document position (or line and column) in addition to the identifier.
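A minimal sketch of a position-aware provider, assuming the application can record, for each identifier, the document position where it starts: entries are keyed by position, so the same name can classify differently at each occurrence. `Kind` and `PositionKeywordProvider` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical entity kinds for this sketch.
enum class Kind { Unknown, Namespace, Class, Constructor, Parameter };

// Hypothetical position-aware provider: entries are keyed by the document
// position where an identifier starts, so one name can map to different
// kinds at different places.
class PositionKeywordProvider {
public:
    void Add(long position, const std::string &word, Kind kind) {
        entries_[position] = {word, kind};
    }
    // Returns Unknown if nothing is recorded at `position`, or if the
    // recorded name differs (for example, a stale entry after an edit).
    Kind Classify(const std::string &word, long position) const {
        auto it = entries_.find(position);
        if (it == entries_.end() || it->second.first != word)
            return Kind::Unknown;
        return it->second.second;
    }
private:
    std::map<long, std::pair<std::string, Kind>> entries_;
};
```

For an unindented copy of the snippet, the four occurrences of foo start at positions 10, 22, 28 and 36. Recording Namespace, Class, Constructor and Parameter at those positions gives each occurrence its own classification, which a name-keyed word list cannot express.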

Jason

Mike Lischke

Jan 7, 2012, 4:15:57 AM

to scintilla...@googlegroups.com

>> In many cases, it is possible to dynamically compile word lists
>> from the database and use the existing functionality. You'll probably
>> now tell me that this is too slow for your situation.
>
> It's not even that it's too slow, it just doesn't work. Different entity kinds can share the same name. Consider this snippet as styled by understand:
> namespace foo {
>     class foo {
>         foo(int foo);
>     };
> }
> Each occurrence of foo should have a different style. For this to work the callback would certainly have to provide the document position (or line and column) in addition to the identifier.

That's an interesting problem, so I'm curious to see what could be done. Since you want semantic styling, not just syntax highlighting, the lexer in Scintilla is probably not a good place to achieve that. Not even a callback or a notification message could provide enough information for that semantic determination (well, if you find a quick way to map a document position to a parse tree node, then it might work).

So I think one solution could be to let the application provide the style info somehow instead of an internal lexer. I'm not sure yet what this could look like, but IMO it is better to generate the style information close to where the source data that determines the styles is stored.

The opposite direction might be a solution too: create a "plugin interface" that allows Scintilla to optionally use a parser too (e.g. via an additional DLL). That would also work quite well with the isolation approach Neil talked about.

Just my 0.02€,

Mike
--
www.soft-gems.net

Jason Haslam

Jan 7, 2012, 7:47:05 PM

to scintilla...@googlegroups.com

I don't propose adding semantic highlighting to scintilla, just a mechanism to facilitate dynamic keyword classification in the lexers. We could do all of the styling in the container but that's exactly what we want to avoid. We'd have to reinvent the wheel for each of the languages (around a dozen) that support semantic highlighting in understand. Not using the lexers is really a non-starter.

In understand we have a performant way to look up the semantic kind of an identifier given its position in a document. I imagine that something similar could be built on ctags or clang's index library, for example. The difficulty is how to take advantage of these libraries without also reproducing everything that the lexers do.
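One way such a position lookup could work, assuming the index can export each identifier reference as a start position, a length and a kind: sort the references by start and binary-search them with std::upper_bound. `Reference` and `ReferenceIndex` are hypothetical names; this is not Understand's, ctags' or libclang's actual API, and it assumes references never overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical reference record standing in for whatever an index
// (Understand's database, ctags output, libclang) can report.
struct Reference {
    long start;   // document position of the first character
    long length;  // identifier length in bytes
    int kind;     // application-defined entity kind
};

// Position-to-kind lookup over non-overlapping references.
class ReferenceIndex {
public:
    explicit ReferenceIndex(std::vector<Reference> refs) : refs_(std::move(refs)) {
        std::sort(refs_.begin(), refs_.end(),
                  [](const Reference &a, const Reference &b) { return a.start < b.start; });
    }

    // Kind of the reference covering `pos`, or -1 if none: binary search for
    // the last reference starting at or before `pos`, then check its extent.
    int KindAt(long pos) const {
        auto it = std::upper_bound(refs_.begin(), refs_.end(), pos,
                                   [](long p, const Reference &r) { return p < r.start; });
        if (it == refs_.begin())
            return -1;
        --it;
        return pos < it->start + it->length ? it->kind : -1;
    }

private:
    std::vector<Reference> refs_;
};
```

A lexer would call something like KindAt(position) when it reaches an identifier, keeping the lookup O(log n). A real index would also have to be updated as the document is edited, which this sketch ignores.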

Jason

Neil Hodgson

Jan 7, 2012, 8:59:51 PM

to scintilla...@googlegroups.com

I don't think that it would be sensible to add this to mainline
Scintilla at this time. It is possible that it would work well but it
is also possible that it would make other functionality more difficult
to implement.

It would be better to treat this as an experimental branch.
ILexer::PrivateCall can be used to implement arbitrary
application-to-lexer communication. Pass your KeywordProvider pointer
through PrivateCall, then call it when needed. There will probably be
three code insertions into each object-based lexer for classifying one
form of word: a declaration of KeywordProvider; an implementation of
PrivateCall to set it; and a call inside the word classification code.
It should not take all that much work to write the code and then create
a set of patches that can be reapplied if the upstream lexers change.
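The three insertions might look roughly like this. It is a standalone sketch: `KeywordProvider`, `OpSetKeywordProvider` and `LexerWithProvider` are hypothetical, and the class only mirrors the shape of ILexer::PrivateCall (declared in Scintilla's ILexer.h as `void * SCI_METHOD PrivateCall(int operation, void *pointer)`). A real lexer would derive from ILexer and include that header.

```cpp
#include <cassert>
#include <string>

// (1) Declaration shared by application and lexer (hypothetical interface).
class KeywordProvider {
public:
    virtual ~KeywordProvider() = default;
    // Returns a style number, or -1 to keep the lexer's own choice.
    virtual int Classify(const std::string &word, long position) = 0;
};

// Hypothetical operation code agreed between the application and the lexer.
constexpr int OpSetKeywordProvider = 1;

// Stand-in for an object-based lexer. A real one would derive from ILexer;
// the Scintilla header is omitted so that this compiles on its own.
class LexerWithProvider {
public:
    // (2) Mirrors ILexer::PrivateCall: store the provider passed by the application.
    void *PrivateCall(int operation, void *pointer) {
        if (operation == OpSetKeywordProvider)
            provider_ = static_cast<KeywordProvider *>(pointer);
        return nullptr;
    }

    // (3) Called from the lexer's word classification code.
    int ClassifyWord(const std::string &word, long position, int defaultStyle) {
        if (provider_) {
            int style = provider_->Classify(word, position);
            if (style >= 0)
                return style;
        }
        return defaultStyle;
    }

private:
    KeywordProvider *provider_ = nullptr;
};
```

The application would reach PrivateCall through SCI_PRIVATELEXERCALL, passing the operation code and the provider pointer. It should pass a `KeywordProvider *` (not a pointer to a derived class) so that the round trip through `void *` lands on the same type.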

The diversity of keyword handling in current lexers due to the
semantic importance of some keywords and keyword-like things (such as
string type prefixes) may make it impossible to write a completely
automatic patching script.

Publishing your derived lexers would then allow others to experiment.

Neil

Jason Haslam

Jan 8, 2012, 11:42:51 PM

to scintilla...@googlegroups.com

Interesting, I never noticed PrivateCall before. I may try this. It would certainly be less horrible than the current hack. My only concern is that it might be even more difficult to maintain going forward. Thanks,

Jason
