
how do I have a mode where '#' is a comment but '.#.' isn't?


stuart

Dec 16, 2008, 6:54:02 PM
I have been working on a mode for a program where a hash mark by
itself is a comment character (#) whereas a hash mark surrounded by
dots (.#.) is not. Currently, I'm using this:

;; Change the interpretation of particular chars in Emacs' syntax table
(defvar fst-mode-syntax-table
  (let ((fst-mode-syntax-table (make-syntax-table)))
    (modify-syntax-entry ?# "<" fst-mode-syntax-table)   ; start comment
    (modify-syntax-entry ?\n ">" fst-mode-syntax-table)  ; end comment
    (modify-syntax-entry ?\\ "_" fst-mode-syntax-table)  ; don't escape quote
    (modify-syntax-entry ?% "/" fst-mode-syntax-table)   ; functions as escape char
    fst-mode-syntax-table)
  "Syntax table for fst-mode")

But it doesn't do the right thing--i.e., it treats '.#.' as a dot
followed by a comment. Is there any easy fix here? Thanks in advance.

P.S. Here's the entire mode file:

;; ---------------------------------------------------------------------------
;; Author: Stuart Robinson
;; Date: 10 September 2007
;; Description: This file provides an fst mode for emacs
;; Versions: 0.1 -- initial implementation for fst files with the suffix
;;                  .infile
;;           0.2 -- cleaned up regex for built-ins, fixed highlighting
;;                  glitches observed by Lauri Karttunen, added .script
;;                  file suffix for mode
;;           0.3 -- fixed built-in highlighting for 'regex' and 'push'
;;           0.4 -- backslashes are handled properly (not as escape char)
;;           0.5 -- various minor fixes (added .p./.P./.o.)
;; Notes: * original implementation based on code found at
;;          http://two-wugs.net/emacs/mode-tutorial.html
;;        * the pre-defined font categories for emacs need to be thought
;;          out in terms of fst: for example, commands and built-in
;;          variables alike are treated as builtins, and it's not clear
;;          what a keyword would be; there are also probably more constants
;; Bugs: * curly braces should be treated as string delimiters (like
;;         quotes)
;;       * pound sign in .#. treated as beginning of comment
;; ---------------------------------------------------------------------------

(defvar fst-mode-hook nil)

(defvar fst-mode-map
  (let ((fst-mode-map (make-keymap)))
    (define-key fst-mode-map "\C-j" 'newline-and-indent)
    fst-mode-map)
  "Keymap for fst major mode")

;; built-ins & variables

(defconst fst-font-lock-keywords-1
  (list
   ;; NOTE: The regex is wrapped in \\< and \\> to ensure that it only matches
   ;; words (i.e., it doesn't match substrings).  It is generated by running
   ;; the program keyword-list-to-emacs-regex.rb on the file
   ;; fst-keywords-no-spaces.txt.
   (cons
    (concat
     "\\<\\(for\\|add properties\\|ambiguity net\\|"
     "\\(apply \\(patterns \\)?\\)?\\(up\\|down\\)\\|cleanup net\\|"
     "clear\\( stack\\)?\\|close sigma\\|collect epsilon-loops\\|"
     "compact \\(net\\|sigma\\)\\|compile-replace \\(upper\\|lower\\)\\|"
     "complete net\\|compose\\( net\\)?\\|compose-apply \\(up\\|down\\)\\|"
     "concatenate\\( net\\)?\\|continue script\\|crossproduct net\\|"
     "determinize net\\|edit properties\\|eliminate flag\\|"
     "epsilon-remove net\\|extract-compile-replace \\(upper\\|lower\\)\\|"
     "factorize \\(up\\|down\\)\\|ignore net\\|inspect net\\|"
     "interrupt script\\|intersect\\( net\\)?\\|invert\\( net\\)?\\|"
     "label net\\|load\\( \\(defined\\|stack\\)\\)?\\|lower-side net\\|"
     "minimize net\\|minus net\\|multi char sigma net\\|name net\\|"
     "negate\\( net\\)?\\|one-plus net\\|optimize net\\|"
     "paste net labels\\|pop stack\\|"
     "print \\(aliases\\|arc-tally\\|defined\\|directory\\|eqv-labels\\|"
     "file-info\\|flags\\|label-maps\\|label-tally\\|labels\\|list\\|"
     "lists\\|longest-string\\(-size\\)?\\|lower-words\\|name\\|net\\|"
     "nth-lower\\|nth-upper\\|num-lower\\|num-upper\\|random-lower\\|"
     "random-upper\\|random-words\\|shortest-string\\(-size\\)?\\|"
     "sigma\\(\\(-word\\)?-tally\\)?\\|size\\|stack\\|storage\\|"
     "upper-words\\|words\\)\\|"
     "prune net\\|push\\( \\(defined\\|epsilons\\)\\)?\\|"
     "\\(read \\)?regex\\|"
     "read \\(lec\\|lexc\\|prolog\\|properties\\|spaced-text\\|text\\)\\|"
     "reduce labelset\\|reverse\\( net\\)?\\|rotate stack\\|"
     "save\\( \\(defined\\|stack\\)\\)?\\|share arcs\\|shuffle net\\|"
     "sigma net\\|single char sigma net\\|sort net\\|sub-string net\\|"
     "substitute \\(defined\\|label\\|symbol\\)\\|substring net\\|"
     "\\(test \\)?equivalent\\|"
     "test \\(lower-bounded\\|lower-universal\\|non-null\\|null\\|"
     "overlap\\|sublanguage\\|unambiguous-down\\|unambiguous-up\\|"
     "upper-bounded\\|upper-universal\\)\\|"
     "turn stack\\|twosided flag-diacritics\\|uncompact net\\|"
     "unfactorize down\\|unfactorize up\\|union\\( net\\)?\\|"
     "unoptimize\\( net\\)?\\|unreduce labelset\\|unshare arcs\\|"
     "unvectorize net\\|upper-side net\\|vectorize net\\|"
     "virtual \\(compose\\|concatenate\\|copy\\|determinize\\|"
     "intersect\\|lower\\|minus\\|negate\\|one-plus\\|option\\|"
     "priority-union\\|union\\|upper\\|zero-plus\\)\\|"
     "write \\(definition\\|dot\\|prolog\\|properties\\|spaced-text\\|"
     "text\\)\\|"
     "zero-plus net\\|alias\\|apropos\\|assert\\|char-encoding\\|"
     "completion\\|copyright-owner\\|count-patterns\\|define\\|"
     "delete-patterns\\|directory\\|echo\\|extract-patterns\\|"
     "fail-safe-composition\\|flag-is-epsilon\\|help\\|label-map\\|"
     "license-type\\|list\\|locate-patterns\\|"
     "mark-\\(patterns\\|version\\)\\|"
     "max-\\(context-length\\|state-visits\\)\\|minimal\\|name-nets\\|"
     "need-separators\\|obey-flags\\|print-\\(pairs\\|sigma\\|space\\)\\|"
     "process-in-order\\|quit\\(-on-fail\\)?\\|quote-special\\|"
     "random-seed\\|recode-cp1252\\|recursive-\\(apply\\|define\\)\\|"
     "retokenize\\|seq-\\(final-arcs\\|intern-arcs\\|string-one\\)\\|"
     "set\\|show\\(-flags\\)?\\|sort-arcs\\|source\\|system\\|undefine\\|"
     "unlist\\|use-mmap\\|use-timer\\|vectorize-n\\|verbose\\|"
     "virtual-to-real\\)\\>")
    'font-lock-builtin-face)
   '("\\('\\w*'\\)" . font-lock-variable-name-face))
  "Minimal highlighting expressions for FST mode")

;; keywords and constants

(defconst fst-font-lock-keywords-2
  (append fst-font-lock-keywords-1
          (list
           '("\\(\\$\\|?\\|~\\|@\\||\\|->\\|<-\\|&\\|_\\|*\\|\\\\\\|0\\|\\.#\\.\\|\\.P\\.\\|\\.p\\.\\|\\.o\\.\\|@->\\|->@\\|@>\\|>@\\)" . font-lock-keyword-face)
           '("\\<\\(ON\\|OFF\\|NONE\\)\\>" . font-lock-constant-face)))
  "Additional Keywords to highlight in FST mode")

;; don't understand why this is necessary...

(defconst fst-font-lock-keywords-3
  (append fst-font-lock-keywords-2
          (list
           '("" . font-lock-constant-face)))
  "Balls-out highlighting in FST mode")

(defvar fst-font-lock-keywords fst-font-lock-keywords-3
  "Default highlighting expressions for FST mode")

;; Indentation not being handled properly--needs to be updated
(defun fst-indent-line ()
  "Indent current line as FST code"
  (interactive)
  (beginning-of-line)
  ;; Check for rule 1
  (if (bobp)
      (indent-line-to 0)
    (let ((not-indented t) cur-indent)
      ;; Check for rule 2
      (if (looking-at "^[ \t]*END_")
          (progn
            (save-excursion
              (forward-line -1)
              (setq cur-indent (- (current-indentation) default-tab-width)))
            (if (< cur-indent 0)
                (setq cur-indent 0)))
        (save-excursion
          (while not-indented
            (forward-line -1)
            ;; Check for rule 3
            (if (looking-at "^[ \t]*END_")
                (progn
                  (setq cur-indent (current-indentation))
                  (setq not-indented nil))
              ;; Check for rule 4
              (if (looking-at "^[ \t]*\\(PARTICIPANT\\|MODEL\\|APPLICATION\\|WORKFLOW\\|ACTIVITY\\|DATA\\|TOOL_LIST\\|TRANSITION\\)")
                  (progn
                    (setq cur-indent (+ (current-indentation) default-tab-width))
                    (setq not-indented nil))
                ;; Check for rule 5
                (if (bobp)
                    (setq not-indented nil)))))))
      ;; If we didn't see an indentation hint, then allow no indentation
      (if cur-indent
          (indent-line-to cur-indent)
        (indent-line-to 0)))))


;; Change the interpretation of particular chars in Emacs' syntax table
(defvar fst-mode-syntax-table
  (let ((fst-mode-syntax-table (make-syntax-table)))
    (modify-syntax-entry ?# "<" fst-mode-syntax-table)   ; start comment
    (modify-syntax-entry ?\n ">" fst-mode-syntax-table)  ; end comment
    (modify-syntax-entry ?\\ "_" fst-mode-syntax-table)  ; don't escape quote
    (modify-syntax-entry ?% "/" fst-mode-syntax-table)   ; functions as escape char
    fst-mode-syntax-table)
  "Syntax table for fst-mode")


(defun fst-mode ()
  "Major mode for editing fst scripts"
  (interactive)
  (kill-all-local-variables)
  (set-syntax-table fst-mode-syntax-table)
  (use-local-map fst-mode-map)
  (set (make-local-variable 'font-lock-defaults) '(fst-font-lock-keywords))
  (set (make-local-variable 'indent-line-function) 'fst-indent-line)
  (setq major-mode 'fst-mode)
  (setq mode-name "FST")
  (run-hooks 'fst-mode-hook))


;; Make FST mode available for .infile and .script files
(provide 'fst-mode)
(add-to-list 'auto-mode-alist '("\\.infile\\'" . fst-mode))
(add-to-list 'auto-mode-alist '("\\.script\\'" . fst-mode))

Xah Lee

Dec 17, 2008, 7:18:13 AM
On Dec 16, 3:54 pm, stuart <stu...@zapata.org> wrote:
> I have been working on a mode for a program where a hash mark by
> itself is a comment character (#) whereas a hash mark surrounded by
> dots (.#.) is not. Currently, I'm using this:
>
> ;; Change the interpretation of particular chars in Emacs' syntax table
> (defvar fst-mode-syntax-table
>   (let ((fst-mode-syntax-table (make-syntax-table)))
>     (modify-syntax-entry ?# "<" fst-mode-syntax-table)   ; start comment
>     (modify-syntax-entry ?\n ">" fst-mode-syntax-table)  ; end comment
>     (modify-syntax-entry ?\\ "_" fst-mode-syntax-table)  ; don't escape quote
>     (modify-syntax-entry ?% "/" fst-mode-syntax-table)   ; functions as escape char
>     fst-mode-syntax-table)
>   "Syntax table for fst-mode")
>
> But it doesn't do the right thing--i.e., it treats '.#.' as a dot
> followed by a comment. Is there any easy fix here? Thanks in advance.
>
> P.S. Here's the entire mode file:
> ...

it is my guess that you can't use syntax table for it.

my feeling of recent study of emacs syntax table is that, it's rather
a hacked up system to address some simple syntax issues. In
particular, you can see how the code in syntax tables for comments
basically just address 3 classes of comments, e.g. A: “/* ... */”,
“(* ... *)” , B: “// ... \n”, C: “# ...\n”. I can't say conclusively,
but it is my guess you won't be able to use syntax table to do
comments for anything more complex. I think perhaps you'll have to
resort to the syntax coloring system itself for the comment syntax of
your lang's mode.

I think that the emacs's syntax table system, besides addressing
things like forward-word, and simple comment syntax, that's about all
its power and use. For any simplest parsing issues, it is not useful.

The syntax of emacs's syntax table system also seems a simple hack. In
particular, quite cryptic.

One particular question i have is, whether ALL chars in unicode must
have a syntax table entry. (unicode has some at least 4 thousand
chars) It appears to me yes. One curiosity question is where can i
find the lisp or C code that defines the default syntax table where
every mode inherits.

Xah
http://xahlee.org/

Juanma Barranquero

Dec 17, 2008, 9:06:47 AM
to Xah Lee, help-gn...@gnu.org
On Wed, Dec 17, 2008 at 13:18, Xah Lee <xah...@gmail.com> wrote:

> (unicode has some at least 4 thousand chars)

http://www.unicode.org/versions/Unicode5.1.0/

"Unicode 5.1.0 contains over 100,000 characters, [...]"

> One curiosity question is where can i
> find the lisp or C code that defines the default syntax table where
> every mode inherits.

Take a look at Vstandard_syntax_table, in src/syntax.c.
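
On the related question of whether every character needs its own entry: as far as I know, a table created with make-syntax-table stores only the entries a mode overrides and looks everything else up in its parent, the standard syntax table. A minimal sketch to check that from Lisp (the two demo- function names are made up for illustration):

```elisp
;; Hypothetical helper names; only the built-in functions they call are real.
(defun demo-syntax-table-inherits-p ()
  "Return non-nil if a fresh syntax table's parent is the standard table."
  (let ((table (make-syntax-table)))
    (eq (char-table-parent table) (standard-syntax-table))))

(defun demo-override-one-entry ()
  "Override only ?# in a fresh table; return the classes of ?# and ?a."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)
    ;; `char-syntax' consults the current table, so switch temporarily.
    (with-syntax-table table
      (list (char-syntax ?#) (char-syntax ?a)))))
```

The second function overrides a single character; the class reported for ?a still comes from the parent table.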

Juanma


rgb

Dec 24, 2008, 12:42:32 PM
I've had similar problems with several major modes I've written.

Cobol, for example, doesn't even have a comment character; anything
after a particular column is a comment.
TAL uses ! as both begin and end, and eol is also an implicit end....

So ! this is a comment!but this isn't! and this is
but this isn't

Anyway, the only really good way to get useful results is by specifying
font-lock-syntactic-keywords in your font-lock-defaults statement.
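
For the .#. case in the original question, a rough sketch of what that could look like is below. It assumes the fst-font-lock-keywords variable from the posted mode file; fst-font-lock-syntactic-keywords and fst-setup-font-lock are hypothetical names, and none of this is tested against real FST scripts. Newer Emacs releases mark font-lock-syntactic-keywords obsolete in favor of syntax-propertize-function, so treat this as a sketch for the Emacs of this thread's era.

```elisp
;; Hypothetical names: `fst-font-lock-syntactic-keywords' and
;; `fst-setup-font-lock' are not part of the posted mode file.
(defvar fst-font-lock-syntactic-keywords
  ;; (MATCHER SUBEXP SYNTAX): group 1 is the # between two dots;
  ;; "_" gives that one character symbol syntax instead of comment-start.
  '(("\\.\\(#\\)\\." 1 "_"))
  "Syntactic keywords that keep the # in .#. from starting a comment.")

(defun fst-setup-font-lock ()
  "Install FST font-lock defaults, including the syntactic keywords."
  ;; Make parsing and motion commands honor the text properties too.
  (set (make-local-variable 'parse-sexp-lookup-properties) t)
  (set (make-local-variable 'font-lock-defaults)
       `(fst-font-lock-keywords
         nil nil nil nil
         (font-lock-syntactic-keywords . ,fst-font-lock-syntactic-keywords))))
```

The syntax table still makes every other # a comment starter; only the # matched between two dots gets the override.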

It's not a terribly simple process.
Some years ago, when I was writing all those modes, I was pretty
fluent and could spout off just exactly how to do it. Fortunately I
answered several how-to questions in several Emacs NGs so my notes are
available.

Try this thread. I think it's pretty complete in covering what you
need to know.

http://groups.google.com/group/comp.emacs/browse_thread/thread/c1b7de4489be181

rgb

Dec 30, 2008, 1:35:23 PM
On Dec 24, 11:42 am, rgb <rbiel...@i1.net> wrote:
> I've had similar problems with several major modes I've written.
>
> Cobol for example doesn't even have a comment character, anything
> after a particular column is a comment
> TAL uses ! as both begin and end and eol is also an implicit end....
>
> So ! this is a comment!but this isn't! and this is
> but this isn't
>
> Anyway, the only really good way to get useful results is by
> specifying
> font-lock-syntactic-keywords in your font-lock-defaults statement.
>
> It's not a terribly simple process.
> Some years ago, when I was writing all those modes, I was pretty
> fluent and could spout off just exactly how to do it. Fortunately I
> answered several how-to questions in several Emacs NGs so my notes are
> available.
>
> Try this thread.  I think it's pretty complete in covering what you
> need to know.
>
> http://groups.google.com/group/comp.emacs/browse_thread/thread/c1b7de...
>

Oddly enough I had to brush up on this myself just now. CTP3 of
Powershell just came out and I needed to add support for an additional
comment syntax.

# to eol was the original comment syntax.
That was easily supported by the syntax table like this.
(modify-syntax-entry ?# "<" powershell-mode-syntax-table)
(modify-syntax-entry ?\n ">" powershell-mode-syntax-table)

Now <# and #> are multi-line comment delimiters and, while I should be
able to support that via
(modify-syntax-entry ?# ".23" powershell-mode-syntax-table)
(modify-syntax-entry ?> ".4" powershell-mode-syntax-table)
(modify-syntax-entry ?< ".1" powershell-mode-syntax-table)
it doesn't leave me with a mechanism for supporting the original
syntax because # can only have ".23" or "<" syntax, not both
simultaneously.
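
To see that limitation in isolation, here is a small sketch that uses only the two-character flags. The names demo-block-comment-table and demo-in-comment-p are invented for the example and are not taken from powershell-mode.

```elisp
;; Hypothetical names, for illustration only.
(defvar demo-block-comment-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< ".1" table)   ; first char of a comment starter
    (modify-syntax-entry ?# ".23" table)  ; second of starter, first of ender
    (modify-syntax-entry ?> ".4" table)   ; second char of a comment ender
    table)
  "Syntax table in which only <# ... #> delimits comments.")

(defun demo-in-comment-p (text pos)
  "Return non-nil if buffer position POS in TEXT is inside a comment."
  (with-temp-buffer
    (set-syntax-table demo-block-comment-table)
    (insert text)
    ;; Element 4 of the parser state is non-nil inside a comment.
    (and (nth 4 (parse-partial-sexp (point-min) pos)) t)))
```

With this table, (demo-in-comment-p "<# x #> y" 4) should come back non-nil, while a lone # running to end of line is no longer treated as a comment at all.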

As you can see below, I wrote a function that sets match-data to mark
the comment delimiters.
I then used font-lock-syntactic-keywords to give only those specific
characters comment-delimiter syntax, rather than all occurrences as a
syntax table does.

(defun powershell-find-syntactic-keywords (limit)
  "Finds PowerShell comment begin and comment end characters.
Returns match 1 or match 2 for <# #> comment sequences respectively.
Returns match 3 and match 4 for #/eol comments."
  (when (search-forward "#" limit t)
    (cond
     ((looking-back "<#")
      (set-match-data (list (match-beginning 0) (1+ (match-beginning 0))
                            (match-beginning 0) (1+ (match-beginning 0)))))
     ((looking-at ">")
      (set-match-data (list (match-beginning 0) (match-end 0)
                            nil nil
                            (match-beginning 0) (match-end 0)))
      (forward-char))
     (t
      (let ((start (point)))
        (if (search-forward "\n" limit t)
            (set-match-data (list (1- start) (match-end 0)
                                  nil nil nil nil
                                  (1- start) start
                                  (match-beginning 0) (match-end 0)))
          (set-match-data (list start (match-end 0)
                                nil nil nil nil
                                (1- start) start))))))
    t))

(defun powershell-setup-font-lock ()
  "Sets up the buffer local value for font-lock-defaults and optionally
turns on font-lock-mode"
  ;; I use font-lock-syntactic-keywords to set some properties and I
  ;; don't want them ignored.
  (set (make-local-variable 'parse-sexp-lookup-properties) t)
  ;; I really can't imagine anyone wanting this off.
  (set (make-local-variable 'parse-sexp-ignore-comments) t)
  ;; This is where all the font-lock stuff actually gets set up.  Once
  ;; font-lock-defaults has its value, setting font-lock-mode true should
  ;; cause all your syntax highlighting dreams to come true.
  (setq font-lock-defaults
        ;; The first value is all the keyword expressions.
        '(powershell-font-lock-keywords
          ;; keywords-only means no strings or comments get fontified
          nil
          ;; case-fold (ignore case)
          nil
          ;; syntax-alist.  Nothing I can think of...
          nil
          ;; syntax-begin - no function defined to move outside syntactic block
          nil
          ;; font-lock-syntactic-keywords
          ;; takes (matcher (match syntax override laxmatch) ...)...
          (font-lock-syntactic-keywords
           . ((powershell-find-syntactic-keywords
               (1 "<" t t) (2 ">" t t)
               (3 "<b" t t) (4 ">b" t t)))))))
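
For readers on a newer Emacs (24.1 or later), the same idea can be expressed with syntax-propertize-function instead of font-lock-syntactic-keywords. This is a swapped-in technique, not what the code above does. The sketch below applies it to the .#. case from the original question; fst-demo-syntax-table and fst-demo-in-comment-p are hypothetical names, and the table is a cut-down stand-in for the posted fst-mode-syntax-table.

```elisp
;; Sketch only: `fst-demo-syntax-table' and `fst-demo-in-comment-p' are
;; hypothetical names used for illustration.
(defvar fst-demo-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)   ; # starts a comment
    (modify-syntax-entry ?\n ">" table)  ; newline ends it
    table)
  "Cut-down stand-in for the FST syntax table.")

(defun fst-demo-in-comment-p (text pos)
  "Return non-nil if buffer position POS in TEXT lies inside a comment."
  (with-temp-buffer
    (set-syntax-table fst-demo-syntax-table)
    ;; Give the # in .#. symbol syntax ("_") through a text property.
    (setq-local syntax-propertize-function
                (syntax-propertize-rules ("\\.\\(#\\)\\." (1 "_"))))
    (insert text)
    (syntax-propertize (point-max))
    (and (nth 4 (syntax-ppss pos)) t)))
```

In a real mode the setq-local line would go in the mode function. Given "a .#. b # c", position 7 (the b) should be reported as outside a comment and position 11 (the c) as inside one.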
