Code from CEDET can understand syntax of file content based on
nXML mode parses a RELAX NG schema to find the allowed tags and attributes
in XML files.
I would also like to hear what has been done in this field for Emacs.
To do this in a sensible way you need a real parser, which can be
implemented using e.g. Semantic. Simple regular expressions and
such cannot be used for this purpose in a sensible way.
True, with '(define-generic-mode ...)' you usually specify regexps.
It is also possible to use a function for MATCHER (see doc for
You can implement a rudimentary parser that way,
but you must write it carefully for performance reasons.
Is there any good example of using a function for MATCHER?
> Following Xah Lee's excellent tutorial, I have been able to get the
> basics done - syntax highlighting, indentation, and so on. What I am
> missing is a small part of the syntax highlighting related to variables.
> Declarations work fine - for example
> int x = 0
> is correctly highlighted. What I can't work out how to do is to
> highlight declared variables in the rest of the code, for example when I
> later use x such as
> x = x+1
> Does anyone have any ideas? Ideally I'd like to only highlight those
> variables I have really declared, not something that just looks like it
> *might* be a variable, so I can see immediately if I've made a mistake
> in my coding or typing.
A lot depends on the language, but in general, you cannot do this
reliably unless you have some sort of parsing support. Some have tried
doing this with regexp, but unless the language is /very/ simple, the
regexp will become too complex. To do it correctly, Emacs needs to
understand the code (i.e. parse it) to determine what class a token
represents. This means you need a mechanism to specify the grammar and
an engine to apply that grammar to the code to parse it. Consider
something like the following to see why basic regexps alone will not work:
a = b;
b = foo( a + 1 );
c = bar() + b;
For Emacs to recognise that a, b and c are all variables, it needs to
know how they would be parsed. Worse still, to know that c has not been
declared as a variable, it needs to know/remember the variables that
have been declared and recognise that c has not (or maybe it was in an
earlier context, e.g. as a global). It is fairly evident that regexps
are insufficient in this respect.
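To make that bookkeeping concrete, here is a rough sketch in Emacs Lisp.
It assumes a toy language in which every declaration looks like
`int NAME'; all the `my-' names are hypothetical and not part of any
existing mode. It remembers the declared names and uses a function as the
font-lock MATCHER so that only those names are highlighted. It ignores
scope, comments and strings, which is exactly where a real parser would
be needed.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defvar-local my-declared-vars nil
  "Hypothetical cache: names declared with `int NAME' in this buffer.")

(defun my-collect-declared-vars ()
  "Return the names declared as `int NAME' in the current buffer.
Scope, comments and strings are deliberately ignored."
  (let ((case-fold-search nil)
        names)
    (save-excursion
      (save-match-data
        (goto-char (point-min))
        (while (re-search-forward
                "\\_<int[ \t]+\\([A-Za-z_][A-Za-z0-9_]*\\)" nil t)
          (cl-pushnew (match-string-no-properties 1) names
                      :test #'string=))))
    ;; Return the names in order of first appearance.
    (nreverse names)))

(defun my-match-declared-var (limit)
  "Font-lock MATCHER: find the next declared variable before LIMIT.
Like any function matcher, it moves point, sets the match data and
returns non-nil on success."
  (and my-declared-vars
       (re-search-forward (regexp-opt my-declared-vars 'symbols) limit t)))

;; Keyword list for the mode's `font-lock-defaults'; subexpression 0
;; is the whole matched name.
(defvar my-font-lock-keywords
  '((my-match-declared-var 0 'font-lock-variable-name-face)))
```

The mode would still have to call `my-collect-declared-vars' and store the
result in `my-declared-vars' whenever the buffer changes. In the snippet
above, a name that was never declared is simply left unhighlighted.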
Things become further complicated when you're editing code because the
buffer is often in a state where it cannot be parsed because statements
are incomplete/incorrect. At that point, you then need to make a
decision about what to do with the font-locking of the code - leave it
incorrectly font-locked, remove existing font-locking or something
in-between. To complicate matters further, you also need to consider
performance. Depending on the size of the files being edited,
continuously parsing the buffer is likely to degrade performance and
slow down editing.
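One common way to soften both problems is sketched below, as an
assumption rather than a recommendation: re-scan only once Emacs has been
idle for a moment, and keep the last good result when the scan fails on
half-typed code. `my-scan-function' is a hypothetical stand-in for
whatever parser you use, and the other `my-' names are made up as well.

```elisp
;;; -*- lexical-binding: t -*-

(defvar my-scan-function #'ignore
  "Hypothetical stand-in for a real parser.
Called with no arguments in the buffer to scan.  It should return
the parse result, or signal an error if the buffer cannot be parsed.")

(defvar-local my-scan-result nil
  "Result of the last successful scan of this buffer.")

(defvar-local my-scan-timer nil
  "Pending idle timer for this buffer, or nil.")

(defun my-rescan (buffer)
  "Re-scan BUFFER, keeping the previous result if the scan fails."
  (when (buffer-live-p buffer)
    (with-current-buffer buffer
      (setq my-scan-timer nil)
      (condition-case nil
          (progn
            (setq my-scan-result (funcall my-scan-function))
            ;; Ask font-lock to refontify using the fresh result.
            (when (bound-and-true-p font-lock-mode)
              (font-lock-flush)))
        ;; Unparsable (half-typed) code: leave the old result alone.
        (error nil)))))

(defun my-schedule-rescan (&rest _)
  "Schedule a re-scan once Emacs has been idle for half a second.
Meant to be run from `after-change-functions'."
  (when (timerp my-scan-timer)
    (cancel-timer my-scan-timer))
  (setq my-scan-timer
        (run-with-idle-timer 0.5 nil #'my-rescan (current-buffer))))

(defun my-enable-idle-rescan ()
  "Turn on idle re-scanning in the current buffer."
  (add-hook 'after-change-functions #'my-schedule-rescan nil t))
```

Calling `my-enable-idle-rescan' from the mode's hook would wire this up.
Keeping the stale result corresponds to the "leave it incorrectly
font-locked" choice above; clearing `my-scan-result' in the error handler
instead would correspond to removing the existing font-locking.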
The CEDET tools and Semantic can be used to implement simple parsing of
code, but it is fairly complex and you still have the issue of handling
incomplete code and deciding what to do with it etc.
In general, while it is theoretically possible to do what you want, the
amount of work required is often too high to be worth the effort.
tcross (at) rapttech dot com dot au