
how do I have a mode where '#' is a comment but '.#.' isn't?


stuart

Dec 17, 2008, 1:54:02 AM
I have been working on a mode for a program where a hash mark by
itself is a comment character (#) whereas a hash mark surrounded by
dots (.#.) is not. Currently, I'm using this:

;; Change the interpretation of particular chars in Emacs' syntax table
(defvar fst-mode-syntax-table
  (let ((fst-mode-syntax-table (make-syntax-table)))
    (modify-syntax-entry ?# "<" fst-mode-syntax-table)  ; start comment
    (modify-syntax-entry ?\n ">" fst-mode-syntax-table) ; end comment
    (modify-syntax-entry ?\\ "_" fst-mode-syntax-table) ; don't escape quote
    (modify-syntax-entry ?% "/" fst-mode-syntax-table)  ; functions as escape char
    fst-mode-syntax-table)
  "Syntax table for fst-mode")

But it doesn't do the right thing--i.e., it treats '.#.' as a dot
followed by a comment. Is there any easy fix here? Thanks in advance.

P.S. Here's the entire mode file:

;; ---------------------------------------------------------------------------
;; Author: Stuart Robinson
;; Date: 10 September 2007
;; Description: This file provides an fst mode for emacs
;; Versions: 0.1 -- initial implementation for fst files with the suffix
;;                  .infile
;;           0.2 -- cleaned up regex for built-ins, fixed highlighting
;;                  glitches observed by Lauri Karttunen, added .script
;;                  file suffix for mode
;;           0.3 -- fixed built-in highlighting for 'regex' and 'push'
;;           0.4 -- backslashes are handled properly (not as escape char)
;;           0.5 -- various minor fixes (added .p./.P./.o.)
;; Notes: * original implementation based on code found at
;;          http://two-wugs.net/emacs/mode-tutorial.html
;;        * the pre-defined font categories for emacs need to be thought
;;          out in terms of fst: for example, commands and built-in
;;          variables alike are treated as builtins, and it's not clear
;;          what a keyword would be; there are also probably more constants
;; Bugs: * curly braces should be treated as string delimiters (like
;;         quotes)
;;       * pound sign in .#. treated as beginning of comment
;; ---------------------------------------------------------------------------

(defvar fst-mode-hook nil)

(defvar fst-mode-map
  (let ((fst-mode-map (make-keymap)))
    (define-key fst-mode-map "\C-j" 'newline-and-indent)
    fst-mode-map)
  "Keymap for fst major mode")

;; built-ins & variables

(defconst fst-font-lock-keywords-1
  (list
   ;; NOTE: The regex is wrapped in \\< and \\> to ensure that it only
   ;; matches words (i.e., it doesn't match substrings). It is generated by
   ;; running the program keyword-list-to-emacs-regex.rb on the file
   ;; fst-keywords-no-spaces.txt.
   ;; Each line of the string below ends in a backslash-newline, which the
   ;; Lisp reader drops, so it is still one regex; the continuation lines
   ;; start in column 0 because leading spaces would become part of it.
   '("\\<\\(for\\|add properties\\|ambiguity net\\|\
\\(apply \\(patterns \\)?\\)?\\(up\\|down\\)\\|cleanup net\\|\
clear\\( stack\\)?\\|close sigma\\|collect epsilon-loops\\|\
compact \\(net\\|sigma\\)\\|compile-replace \\(upper\\|lower\\)\\|\
complete net\\|compose\\( net\\)?\\|compose-apply \\(up\\|down\\)\\|\
concatenate\\( net\\)?\\|continue script\\|crossproduct net\\|\
determinize net\\|edit properties\\|eliminate flag\\|epsilon-remove net\\|\
extract-compile-replace \\(upper\\|lower\\)\\|factorize \\(up\\|down\\)\\|\
ignore net\\|inspect net\\|interrupt script\\|intersect\\( net\\)?\\|\
invert\\( net\\)?\\|label net\\|load\\( \\(defined\\|stack\\)\\)?\\|\
lower-side net\\|minimize net\\|minus net\\|multi char sigma net\\|\
name net\\|negate\\( net\\)?\\|one-plus net\\|optimize net\\|\
paste net labels\\|pop stack\\|\
print \\(aliases\\|arc-tally\\|defined\\|directory\\|eqv-labels\\|\
file-info\\|flags\\|label-maps\\|label-tally\\|labels\\|list\\|lists\\|\
longest-string\\(-size\\)?\\|lower-words\\|name\\|net\\|nth-lower\\|\
nth-upper\\|num-lower\\|num-upper\\|random-lower\\|random-upper\\|\
random-words\\|shortest-string\\(-size\\)?\\|\
sigma\\(\\(-word\\)?-tally\\)?\\|size\\|stack\\|storage\\|upper-words\\|\
words\\)\\|prune net\\|push\\( \\(defined\\|epsilons\\)\\)?\\|\
\\(read \\)?regex\\|\
read \\(lec\\|lexc\\|prolog\\|properties\\|spaced-text\\|text\\)\\|\
reduce labelset\\|reverse\\( net\\)?\\|rotate stack\\|\
save\\( \\(defined\\|stack\\)\\)?\\|share arcs\\|shuffle net\\|\
sigma net\\|single char sigma net\\|sort net\\|sub-string net\\|\
substitute \\(defined\\|label\\|symbol\\)\\|substring net\\|\
\\(test \\)?equivalent\\|\
test \\(lower-bounded\\|lower-universal\\|non-null\\|null\\|overlap\\|\
sublanguage\\|unambiguous-down\\|unambiguous-up\\|upper-bounded\\|\
upper-universal\\)\\|turn stack\\|twosided flag-diacritics\\|\
uncompact net\\|unfactorize down\\|unfactorize up\\|union\\( net\\)?\\|\
unoptimize\\( net\\)?\\|unreduce labelset\\|unshare arcs\\|\
unvectorize net\\|upper-side net\\|vectorize net\\|\
virtual \\(compose\\|concatenate\\|copy\\|determinize\\|intersect\\|\
lower\\|minus\\|negate\\|one-plus\\|option\\|priority-union\\|union\\|\
upper\\|zero-plus\\)\\|\
write \\(definition\\|dot\\|prolog\\|properties\\|spaced-text\\|text\\)\\|\
zero-plus net\\|alias\\|apropos\\|assert\\|char-encoding\\|completion\\|\
copyright-owner\\|count-patterns\\|define\\|delete-patterns\\|\
directory\\|echo\\|extract-patterns\\|fail-safe-composition\\|\
flag-is-epsilon\\|help\\|label-map\\|license-type\\|list\\|\
locate-patterns\\|mark-\\(patterns\\|version\\)\\|\
max-\\(context-length\\|state-visits\\)\\|minimal\\|name-nets\\|\
need-separators\\|obey-flags\\|print-\\(pairs\\|sigma\\|space\\)\\|\
process-in-order\\|quit\\(-on-fail\\)?\\|quote-special\\|random-seed\\|\
recode-cp1252\\|recursive-\\(apply\\|define\\)\\|retokenize\\|\
seq-\\(final-arcs\\|intern-arcs\\|string-one\\)\\|set\\|\
show\\(-flags\\)?\\|sort-arcs\\|source\\|system\\|undefine\\|unlist\\|\
use-mmap\\|use-timer\\|vectorize-n\\|verbose\\|virtual-to-real\\)\\>" . font-lock-builtin-face)
   '("\\('\\w*'\\)" . font-lock-variable-name-face))
  "Minimal highlighting expressions for FST mode")

;; keywords and constants

(defconst fst-font-lock-keywords-2
  (append fst-font-lock-keywords-1
          (list
           '("\\(\\$\\|?\\|~\\|@\\||\\|->\\|<-\\|&\\|_\\|*\\|\\\\\\|0\\|\\.#\\.\\|\\.P\\.\\|\\.p\\.\\|\\.o\\.\\|@->\\|->@\\|@>\\|>@\\)" . font-lock-keyword-face)
           '("\\<\\(ON\\|OFF\\|NONE\\)\\>" . font-lock-constant-face)))
  "Additional Keywords to highlight in FST mode")

;; don't understand why this is necessary...

(defconst fst-font-lock-keywords-3
  (append fst-font-lock-keywords-2
          (list
           '("" . font-lock-constant-face)))
  "Balls-out highlighting in FST mode")

(defvar fst-font-lock-keywords fst-font-lock-keywords-3
  "Default highlighting expressions for FST mode")

;; Indentation not being handled properly--needs to be updated
(defun fst-indent-line ()
  "Indent current line as FST code"
  (interactive)
  (beginning-of-line)
  ;; Check for rule 1
  (if (bobp)
      (indent-line-to 0)
    (let ((not-indented t) cur-indent)
      ;; Check for rule 2
      (if (looking-at "^[ \t]*END_")
          (progn
            (save-excursion
              (forward-line -1)
              (setq cur-indent (- (current-indentation) tab-width)))
            (if (< cur-indent 0)
                (setq cur-indent 0)))
        (save-excursion
          (while not-indented
            (forward-line -1)
            ;; Check for rule 3
            (if (looking-at "^[ \t]*END_")
                (progn
                  (setq cur-indent (current-indentation))
                  (setq not-indented nil))
              ;; Check for rule 4
              (if (looking-at "^[ \t]*\\(PARTICIPANT\\|MODEL\\|APPLICATION\\|WORKFLOW\\|ACTIVITY\\|DATA\\|TOOL_LIST\\|TRANSITION\\)")
                  (progn
                    (setq cur-indent (+ (current-indentation) tab-width))
                    (setq not-indented nil))
                ;; Check for rule 5
                (if (bobp)
                    (setq not-indented nil)))))))
      ;; If we didn't see an indentation hint, then allow no indentation
      (if cur-indent
          (indent-line-to cur-indent)
        (indent-line-to 0)))))


;; Change the interpretation of particular chars in Emacs' syntax table
(defvar fst-mode-syntax-table
  (let ((fst-mode-syntax-table (make-syntax-table)))
    (modify-syntax-entry ?# "<" fst-mode-syntax-table)  ; start comment
    (modify-syntax-entry ?\n ">" fst-mode-syntax-table) ; end comment
    (modify-syntax-entry ?\\ "_" fst-mode-syntax-table) ; don't escape quote
    (modify-syntax-entry ?% "/" fst-mode-syntax-table)  ; functions as escape char
    fst-mode-syntax-table)
  "Syntax table for fst-mode")


(defun fst-mode ()
  "Major mode for editing fst scripts"
  (interactive)
  (kill-all-local-variables)
  (set-syntax-table fst-mode-syntax-table)
  (use-local-map fst-mode-map)
  (set (make-local-variable 'font-lock-defaults) '(fst-font-lock-keywords))
  (set (make-local-variable 'indent-line-function) 'fst-indent-line)
  (setq major-mode 'fst-mode)
  (setq mode-name "FST")
  (run-hooks 'fst-mode-hook))


;; Make FST mode available for .infile and .script files
(provide 'fst-mode)
(add-to-list 'auto-mode-alist '("\\.infile\\'" . fst-mode))
(add-to-list 'auto-mode-alist '("\\.script\\'" . fst-mode))

Xah Lee

Dec 17, 2008, 2:18:13 PM
On Dec 16, 3:54 pm, stuart <stu...@zapata.org> wrote:
> I have been working on a mode for a program where a hash mark by
> itself is a comment character (#) whereas a hash mark surrounded by
> dots (.#.) is not. Currently, I'm using this:
>
> ...
>
> But it doesn't do the right thing--i.e., it treats '.#.' as a dot
> followed by a comment. Is there any easy fix here? Thanks in advance.
>
> P.S. Here's the entire mode file:
> ...

My guess is that you can't use the syntax table for it.

My feeling, from a recent study of the emacs syntax table, is that it's
rather a hacked-up system for addressing some simple syntax issues. In
particular, you can see how the syntax-table code for comments basically
addresses just 3 classes of comments, e.g. A: “/* ... */”,
“(* ... *)”; B: “// ... \n”; C: “# ...\n”. I can't say conclusively,
but my guess is that you won't be able to use the syntax table for
comments that are any more complex. Perhaps you'll have to resort to
the syntax coloring system itself for the comment syntax of your
language's mode.
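A minimal sketch of that font-lock-only route, assuming `#` is no longer given comment-start (`<`) syntax in the mode's syntax table. The name `fst-comment-font-lock-keyword` is hypothetical. The keyword colours a `#` and the rest of its line as a comment when the `#` starts a line or follows any character other than `.`, so the `#` of `.#.` is skipped; as a simplification, a `#` directly after a dot (e.g. `a.# note`) is not treated as a comment either. Because this only changes colouring, comment-aware motion and parsing commands will not know about these comments.

```elisp
;; Hypothetical font-lock keyword (not part of the original mode).
;; MATCHER: a "#" at the start of a line, or preceded by any character
;; other than ".", plus everything up to the end of the line.
;; Group 1 (the "#" and what follows) gets `font-lock-comment-face';
;; the trailing t lets it override highlighting already applied.
(defconst fst-comment-font-lock-keyword
  '("\\(?:^\\|[^.]\\)\\(#.*\\)$" 1 font-lock-comment-face t)
  "Font-lock keyword for # comments that are not part of .#.")
```

It could be appended to the mode's keyword list, for example with `(append fst-font-lock-keywords-2 (list fst-comment-font-lock-keyword))`.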

I think that, besides handling things like forward-word and simple
comment syntax, that's about all the power and use of emacs's syntax
table system. Beyond the simplest parsing tasks, it is not useful.

The notation of emacs's syntax table entries also seems like a simple
hack, and it is quite cryptic.

One particular question I have is whether ALL chars in unicode must
have a syntax table entry. (unicode has some at least 4 thousand
chars) It appears to me that they do. A curiosity question: where can I
find the lisp or C code that defines the default syntax table that
every mode inherits from?

Xah
http://xahlee.org/

Juanma Barranquero

Dec 17, 2008, 4:06:47 PM
to Xah Lee, help-gn...@gnu.org
On Wed, Dec 17, 2008 at 13:18, Xah Lee <xah...@gmail.com> wrote:

> (unicode has some at least 4 thousand chars)

http://www.unicode.org/versions/Unicode5.1.0/

"Unicode 5.1.0 contains over 100,000 characters, [...]"

> One curiosity question is where can i
> find the lisp or C code that defines the default syntax table where
> every mode inherits.

Take a look at Vstandard_syntax_table, in src/syntax.c.
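From Lisp, the table built there is returned by `standard-syntax-table`, and `make-syntax-table` returns a new table whose parent is that standard table. A mode's table therefore only holds the characters it changes, and lookups for every other character fall through to the parent, so a mode never needs an entry for every Unicode character. A small illustrative sketch; `demo-syntax-table` is a hypothetical name, and the (FROM . TO) range form of `modify-syntax-entry` assumes Emacs 23 or later.

```elisp
;; Hypothetical demo table; only the characters changed here get
;; entries of their own.
(defvar demo-syntax-table
  (let ((table (make-syntax-table)))   ; parent is (standard-syntax-table)
    (modify-syntax-entry ?# "<" table)
    ;; A (FROM . TO) cons sets a whole range of code points in one call.
    (modify-syntax-entry '(#x4e00 . #x9fff) "w" table)
    table)
  "Demo syntax table that inherits from the standard syntax table.")

(eq (char-table-parent demo-syntax-table) (standard-syntax-table)) ; => t
(with-syntax-table demo-syntax-table (char-syntax ?#))             ; => ?<
(with-syntax-table demo-syntax-table (char-syntax ?a))             ; => ?w, inherited
```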

Juanma


rgb

Dec 24, 2008, 7:42:32 PM
I've had similar problems with several major modes I've written.

Cobol, for example, doesn't even have a comment character; anything
after a particular column is a comment.
TAL uses ! as both begin and end, and eol is also an implicit end....

So ! this is a comment!but this isn't! and this is
but this isn't

Anyway, the only really good way to get useful results is by
specifying font-lock-syntactic-keywords in your font-lock-defaults
statement.

It's not a terribly simple process.
Some years ago, when I was writing all those modes, I was pretty
fluent and could spout off just exactly how to do it. Fortunately I
answered several how-to questions in several Emacs NGs so my notes are
available.

Try this thread. I think it's pretty complete in covering what you
need to know.

http://groups.google.com/group/comp.emacs/browse_thread/thread/c1b7de4489be181
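Applied to the original `.#.` question, a hedged sketch of this approach: `#` keeps its `<` entry in `fst-mode-syntax-table`, and a syntactic keyword gives only a `#` sitting between two dots punctuation syntax through a `syntax-table` text property. The names `fst-font-lock-syntactic-keywords` and `fst-setup-font-lock` are hypothetical, and `font-lock-syntactic-keywords` was later declared obsolete (Emacs 24.1) in favour of `syntax-propertize-function`.

```elisp
;; Hypothetical syntactic keyword for fst-mode: MATCHER, then
;; (SUBEXP SYNTAX).  Group 1 is the "#" of ".#."; "." = punctuation.
(defconst fst-font-lock-syntactic-keywords
  '(("\\.\\(#\\)\\." (1 "."))))

(defun fst-setup-font-lock ()
  "Install the FST keywords plus the .#. syntactic override."
  ;; Let parsing and motion commands honour `syntax-table' properties.
  (set (make-local-variable 'parse-sexp-lookup-properties) t)
  (set (make-local-variable 'font-lock-defaults)
       (list 'fst-font-lock-keywords nil nil nil nil
             ;; OTHER-VARS entry of font-lock-defaults: (VARIABLE . VALUE)
             (cons 'font-lock-syntactic-keywords
                   fst-font-lock-syntactic-keywords))))
```

In this sketch `fst-mode` would call `(fst-setup-font-lock)` in place of its own `font-lock-defaults` line.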

rgb

Dec 30, 2008, 8:35:23 PM
On Dec 24, 11:42 am, rgb <rbiel...@i1.net> wrote:
> ...

Oddly enough, I had to brush up on this myself just now. CTP3 of
Powershell just came out and I needed to add support for an additional
comment syntax.

# to eol was the original comment syntax.
That was easily supported by the syntax table like this.
(modify-syntax-entry ?# "<" powershell-mode-syntax-table)
(modify-syntax-entry ?\n ">" powershell-mode-syntax-table)

Now <# #> are multi-line comment delimiters and, while I should be
able to support that via
(modify-syntax-entry ?# ".23" powershell-mode-syntax-table)
(modify-syntax-entry ?> ".4" powershell-mode-syntax-table)
(modify-syntax-entry ?< ".1" powershell-mode-syntax-table)
it doesn't leave me with a mechanism for supporting the original
syntax, because # can only have ".23" or "<" syntax, not both
simultaneously.
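The flag syntax quoted above can be checked in isolation. In this hedged sketch the table and helper names are hypothetical; the table holds only the three `<# ... #>` entries, and `syntax-ppss` reports whether a position is inside a comment. The last call illustrates the conflict just described: with `#` set to `.23`, a lone `#` no longer starts a comment.

```elisp
;; Hypothetical table with only the <# ... #> entries.
;; Flags: 1 = first char of a two-char comment starter, 2 = its second
;; char, 3 = first char of a two-char ender, 4 = its second char.
;; The "." class makes each character plain punctuation on its own.
(defvar ps-block-comment-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< ".1" table)
    (modify-syntax-entry ?# ".23" table)
    (modify-syntax-entry ?> ".4" table)
    table)
  "Demo syntax table for <# ... #> block comments.")

(defun ps-block-comment-p (text pos)
  "Return non-nil if position POS of TEXT is inside a comment.
The check uses `ps-block-comment-syntax-table'."
  (with-temp-buffer
    (set-syntax-table ps-block-comment-syntax-table)
    (insert text)
    ;; Element 4 of the parse state is non-nil inside a comment.
    (nth 4 (syntax-ppss pos))))

(ps-block-comment-p "x <# note #> y" 7)  ; => t, inside the block comment
(ps-block-comment-p "x <# note #> y" 14) ; => nil, after the #>
(ps-block-comment-p "x # y" 5)           ; => nil, a lone # starts nothing
```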

As you can see, I wrote a function that returns match-data to find the
comment delimiters, then used font-lock-syntactic-keywords to give
comment-delimiter syntax only to those specific characters, rather than
to all occurrences as a syntax table does.

(defun powershell-find-syntactic-keywords (limit)
  "Finds PowerShell comment begin and comment end characters.
Returns match 1 or match 2 for <# #> comment sequences respectively.
Returns match 3 and match 4 for #/eol comments."
  (when (search-forward "#" limit t)
    (cond
     ((looking-back "<#")
      (set-match-data (list (match-beginning 0) (1+ (match-beginning 0))
                            (match-beginning 0) (1+ (match-beginning 0)))))
     ((looking-at ">")
      (set-match-data (list (match-beginning 0) (match-end 0)
                            nil nil
                            (match-beginning 0) (match-end 0)))
      (forward-char))
     (t
      (let ((start (point)))
        (if (search-forward "\n" limit t)
            (set-match-data (list (1- start) (match-end 0)
                                  nil nil nil nil
                                  (1- start) start
                                  (match-beginning 0) (match-end 0)))
          (set-match-data (list start (match-end 0)
                                nil nil nil nil
                                (1- start) start))))))
    t))

(defun powershell-setup-font-lock ()
  "Sets up the buffer local value for font-lock-defaults and optionally
turns on font-lock-mode"
  ;; I use font-lock-syntactic-keywords to set some properties and I
  ;; don't want them ignored.
  (set (make-local-variable 'parse-sexp-lookup-properties) t)
  ;; I really can't imagine anyone wanting this off.
  (set (make-local-variable 'parse-sexp-ignore-comments) t)
  ;; This is where all the font-lock stuff actually gets set up.  Once
  ;; font-lock-defaults has its value, setting font-lock-mode true should
  ;; cause all your syntax highlighting dreams to come true.
  (setq font-lock-defaults
        ;; The first value is all the keyword expressions.
        '(powershell-font-lock-keywords
          ;; keywords-only means no strings or comments get fontified
          nil
          ;; case-fold (ignore case)
          nil
          ;; syntax-alist.  Nothing I can think of...
          nil
          ;; syntax-begin - no function defined to move outside syntactic block
          nil
          ;; font-lock-syntactic-keywords
          ;; takes (matcher (match syntax override laxmatch) ...)...
          (font-lock-syntactic-keywords
           . ((powershell-find-syntactic-keywords
               (1 "<" t t) (2 ">" t t)
               (3 "<b" t t) (4 ">b" t t)))))))
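In Emacs 24.1 and later, `font-lock-syntactic-keywords` is obsolete in favour of `syntax-propertize-function`. A hedged sketch of the original fst `.#.` problem in that newer style follows; it is not the approach used in this thread, and every `fst-demo-*` name is hypothetical. The rule gives the `#` of `.#.` punctuation syntax, and `syntax-ppss` runs `syntax-propertize` before parsing, so the override applies even without font-lock.

```elisp
;; Hypothetical sketch (Emacs 24.1+): the .#. fix via
;; `syntax-propertize-function' instead of font-lock keywords.
(defvar fst-demo-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)   ; "#" starts a comment...
    (modify-syntax-entry ?\n ">" table)  ; ...that ends at the newline
    table)
  "Demo syntax table with #-to-end-of-line comments.")

(defconst fst-demo-syntax-propertize-function
  ;; Put punctuation syntax (".") on group 1, the "#" inside ".#.".
  (syntax-propertize-rules ("\\.\\(#\\)\\." (1 ".")))
  "Function that marks the # of .#. as punctuation.")

(defun fst-demo-in-comment-p (text pos)
  "Return non-nil if position POS of TEXT is inside an FST comment."
  (with-temp-buffer
    (set-syntax-table fst-demo-syntax-table)
    (set (make-local-variable 'syntax-propertize-function)
         fst-demo-syntax-propertize-function)
    ;; Make the parser consult `syntax-table' text properties.
    (set (make-local-variable 'parse-sexp-lookup-properties) t)
    (insert text)
    ;; `syntax-ppss' calls `syntax-propertize' up to POS before parsing.
    (nth 4 (syntax-ppss pos))))

(fst-demo-in-comment-p "a .#. b # c\n" 7)  ; => nil, the .#. is not a comment
(fst-demo-in-comment-p "a .#. b # c\n" 11) ; => t, after the lone #
```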
