I'm sick of hitting Emacs's (+ ilisp) broken string parsing, which mucks up text entry, subsequent syntax highlighting and indentation, e.g.:
"
(defun Why am I typing indented? Because Emacs thinks this is code!"
It appears that the regexp parser interprets a ( or [ at the start of a line as Lisp code even though I am clearly in the midst of a string. Compare with:
"
 (defun Emacs no longer thinks the string is Lisp code because I put a space before the opening bracket"
I last hit this issue quoting the LGPL (extract):
"Copyright (C) 1991, 1999 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
[This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]
Preamble
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users."
The problem in the above string is the line starting with [. Because of this character placement the paragraph starting with "The licenses" is not highlighted as a string. If such strings are in the middle of code one finds that subsequent indentation is broken, e.g.
(list "
[string
"
why-is-there-no-indent?)
A: Because Emacs thinks the second " starts instead of concludes a string.
Has anyone fixed Emacs string parsing? Or does anyone know of an editor that can correctly parse Lisp strings, syntax highlight and reindent code?
I am not looking for any way to trick Emacs into interpreting a string as a string [e.g. by inserting a backslash before an opening bracket or by using syntax such as #.(format nil "~%[string~%")]. I simply want to be able to use an editor that understands basic Lisp syntax.
* "Adam Warner" <use...@consulting.net.nz>
| Thanks for the tip. BTW you are the first person to mention this setting
| in any newsgroup according to Google Groups!
Do you depend on others to read the documentation (and maybe the source code) and post news articles about it? Is the way to make people read the Emacs documentation to post it all regularly to a lot of newsgroups and on lots and lots of web sites so google can find it and people can search the Net instead of their own computer?
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.

"Adam Warner" <use...@consulting.net.nz> writes:
> I simply want to be able to use an editor that understands basic
> Lisp syntax.
Which is exactly what Emacs doesn't do. You could write a new Lisp-mode based on parsing the content of the buffer, or you could switch to Hemlock, which doesn't use braindamaged regexps for everything. Of course, you'll be tied into a particular platform then (cmucl), but I think that's the general tradeoff -- live with Emacs, or use each vendor's Emacs-alike.
-- 
           /|_     .-----------------------.
         ,'  .\  / | No to Imperialist war |
     ,--'    _,'   | Wage class war!       |
    /       /      `-----------------------'
   (   -.  |
   |     ) |
  (`-.  '--.)
   `. )----'
The problem is that Emacs' indenter does not parse the entire file but instead uses heuristics based on the immediate context and that is easily fooled as you have experienced.
However the alternative is not better, IMHO, as I have tried that in modes for other programming languages. The wait incurred by the reparsing of large portions of a large file for innocent changes quickly becomes intolerable. Nor would I ever want to use a syntax directed editor (or mode) in which you can only edit in terms of legitimate syntactic constructs (a simple way to keep a parse of the syntax up to date).
I have not found adding a backslash in front of parenthesis constructs to be much of a hassle. These kinds of problems only seem to occur in documentation strings.
------------------------+-----------------------------------------------------
Christian Lynbech       | Ericsson Telebit, Skanderborgvej 232, DK-8260 Viby J
Phone: +45 8938 5244    | email: christian.lynb...@ted.ericsson.se
Fax:   +45 8938 5101    | web:   www.ericsson.com
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - peto...@hal.com (Michael A. Petonic)

> The problem is that Emacs' indenter does not parse the entire file but
> instead uses heuristics based on the immediate context and that is
> easily fooled as you have experienced.
The only response so far has been in the vein of "just rewrite the legal syntax".
> However the alternative is not better, IMHO, as I have tried that in
> modes for other programming languages. The wait incurred by the
> reparsing of large portions of a large file for innocent changes quickly
> becomes intolerable.
I would suggest the implementation is poor. With the speed of computers today and the right algorithms even a larger body of code should be able to be understood in close to real time. Especially Lisp code. Understanding what character exits a string (with an escape character proviso) is not difficult: if the start of a string is properly detected then there is simply no excuse for detecting the end of the string with less than 100% accuracy. Strings are the easiest objects to parse in Lisp. If an editor cannot parse them it is either a ten minute bug fix or an ongoing indication of fundamental design problems.
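The point about string termination can be illustrated with a minimal sketch in Emacs Lisp. It assumes plain double-quoted strings in which backslash is the only escape character, and the function name `my-string-end` is hypothetical (it is not part of Emacs or of this thread). It shows one way the scan could work, not how Emacs itself does it.

```elisp
;; Sketch only: assumes "..." strings where a backslash escapes the next character.
(defun my-string-end (text start)
  "Return the index just past the quote that closes the string opening at START.
TEXT is a Lisp string and START is the index of the opening double quote.
Return nil if the string is never closed."
  (let ((i (1+ start))
        (len (length text))
        (result nil))
    (while (and (not result) (< i len))
      (let ((ch (aref text i)))
        (cond ((eq ch ?\\) (setq i (+ i 2)))      ; skip the backslash and the escaped character
              ((eq ch ?\") (setq result (1+ i)))  ; unescaped quote: the string ends here
              (t (setq i (1+ i))))))              ; anything else, including [ or ( after a newline
    result))
```

Nothing in the loop cares whether a bracket happens to follow a newline, which is the whole complaint: once the opening quote is known, only a backslash or another quote matters.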
The main point of highlighting/parsing of Lisp code in an editor is to give visual clues and allow automatic indentation of code. If the parser continually gives false visual clues and can't even understand elementary default syntax it is broken and sometimes more trouble than it's worth.
I suspect there are real-time, accurate syntax highlighting XML editors available. If it could be done with XML it could be done with Lisp.
> Nor would I ever want to use a syntax directed editor (or mode) in which
> you can only edit in terms of legitimate syntactic constructs (a simple
> way to keep a parse of the syntax up to date).
Personally I'd love the opportunity to use an accurate parser written in Lisp that also provides a good way to extend the syntax recognition. After Thomas' comment I'll have another look at Hemlock. It sounds like a decade(s) old editor can better parse Lisp than Emacs 21.
> I have not found adding a backslash in front of parenthesis constructs to
> be much of a hassle. These kinds of problems only seem to occur in
> documentation strings.
In what I'm doing it's a hassle. But even if it wasn't I despair at the general attitude that broken design is good enough. We don't think it's OK if 5% of the time a Lisp interpreter or compiler doesn't detect the end of a string. Yet it's OK if the editor can't. And we tell others to rewrite their code just to work around the editor. And we wax lyrical about how this fuzziness is actually a good thing because one can extend the language's syntax and some things remain as broken as they already were.
I'm writing documents as Lisp code. When I paste in a huge chunk of text I expect that the code should continue to be properly highlighted if I have not made a mistake in the syntax. Yet I can't rely upon this.
Furthermore I'm also using my triple-double-quote macro that has no backslash escaping. It's great to use for verbatim pasting in of any text (even when it includes backslashes and double quotes). While I don't expect Emacs to always get the parsing right without extending the syntax recognition (e.g. unbalanced double quotes will leave Emacs in the wrong state) I shouldn't have to insert backslashes in what was default legal string syntax. And in fact I can't because they then appear in the output.
If Emacs' Lisp parsing was already on a solid foundation I could feel confident that it would be a small step to add accurate triple-double-quote string parsing.
* Christian Lynbech wrote:
> However the alternative is not better, IMHO, as I have tried that in
> modes for other programming languages. The wait incurred by the
> reparsing of large portions of a large file for innocent changes quickly
> becomes intolerable. Nor would I ever want to use a syntax directed
> editor (or mode) in which you can only edit in terms of legitimate
> syntactic constructs (a simple way to keep a parse of the syntax up to
> date).
This is only because these modes are badly written. There is no need to reparse the whole buffer, or even large portions of it in almost all cases - you just need to keep tabs on where the changes occur and what the parsing state is there. PSGML, which has a horrible parsing job to do and does a reasonably good job of it, is perfectly fine for even quite large SGML files.
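The "keep tabs on the parsing state" idea can be sketched with `parse-partial-sexp`, which accepts a previously returned state as its OLDSTATE argument and resumes from there. Everything named `my-...` below is hypothetical, and the single-entry cache is an assumption made to keep the sketch short; it is not how PSGML or any existing mode is implemented.

```elisp
;; Sketch only: the names below are hypothetical, not an Emacs API.
(defvar-local my-parse-cache nil
  "Cons (POSITION . STATE) from the most recent parse, or nil.")

(defun my-state-at (pos)
  "Return the `parse-partial-sexp' state at POS.
Resume from the cached state when the cache lies at or before POS;
otherwise parse from the start of the buffer."
  (let* ((cached (and my-parse-cache
                      (<= (car my-parse-cache) pos)
                      my-parse-cache))
         (from (if cached (car cached) (point-min)))
         (state (save-excursion
                  (parse-partial-sexp from pos nil nil (cdr cached)))))
    (setq my-parse-cache (cons pos state))
    state))

(defun my-invalidate-cache (beg _end _len)
  "Forget the cached state if a change begins before the cached position."
  (when (and my-parse-cache (< beg (car my-parse-cache)))
    (setq my-parse-cache nil)))

(defun my-in-string-p (pos)
  "Return non-nil if POS is inside a string."
  (nth 3 (my-state-at pos)))
```

A mode using this would presumably install the invalidation function with `(add-hook 'after-change-functions #'my-invalidate-cache nil t)`. Because the state is always derived from a real parse rather than from a guess about column 0, a bracket at the start of a line inside a string does not confuse it. Later Emacs releases provide `syntax-ppss`, which caches parse states in a similar spirit.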
Emacs's `paren in first column is beginning of defun' trick might have been a reasonable compromise on a vax, but it's just annoyingly stupid now.

> Emacs's `paren in first column is beginning of defun' trick might have
> been a reasonable compromise on a vax, but it's just annoyingly stupid
> now.
I've spent a few hours coming to understand how the font-lock mode etc. works but didn't figure out how to improve its reliability. I was searching in vain trying to find how square brackets were being matched. I finally realised that the regular expression \s( also matches square brackets in lisp mode.
The .el sources are very tidy and wonderfully commented. I can't be as enthusiastic for some of the design decisions.
At this stage to work around the string parsing problem I've created a function to escape any region and make it font-lock friendly:
(defun escape-and-font-lock-friendly-region (start end)
  "Escapes any backslashes and double quotes within the region and inserts
semantically neutral backslashes at the start of any line that includes
a bracket so Emacs' font-lock doesn't get confused."
  (interactive "r")
  (save-excursion
    (goto-char start)
    (while (search-forward "\\" end t)
      (replace-match "\\\\" t t))

    (goto-char start)
    (while (search-forward "\"" end t)
      (replace-match "\\\"" t t))

    ;;"In Emacs Lisp, the delimiters for lists and vectors (`()' and `[]')
    ;; are classified as parenthesis characters."
    (goto-char start)
    (while (re-search-forward "^\\s(" end t)
      (backward-char)
      (insert "\\")
      (forward-line))))

> (defun escape-and-font-lock-friendly-region (start end)
>   "Escapes any backslashes and double quotes within the region and inserts
> semantically neutral backslashes at the start of any line that includes
> a bracket so Emacs' font-lock doesn't get confused."
>   (interactive "r")
>   (save-excursion
>     (goto-char start)
>     (while (search-forward "\\" end t)
>       (replace-match "\\\\" t t))
>
>     (goto-char start)
>     (while (search-forward "\"" end t)
>       (replace-match "\\\"" t t))
>
>     ;;"In Emacs Lisp, the delimiters for lists and vectors (`()' and `[]')
>     ;; are classified as parenthesis characters."
>     (goto-char start)
>     (while (re-search-forward "^\\s(" end t)
>       (backward-char)
>       (insert "\\")
>       (forward-line))))
Just noticed a bug. Every time a backslash is inserted the end of the region grows by one, so the search bound has to grow with it. Hopefully this is correct (if not efficient):
(defun escape-and-font-lock-friendly-region (start end)
  "Escapes any backslashes and double quotes within the region and inserts
semantically neutral backslashes at the start of any line that begins with
an open bracket so Emacs' font-lock doesn't get confused."
  (interactive "r")
  (save-excursion
    ;; Each loop adds one character per match, so END is bumped by one
    ;; each time. Plain `setq' is used so that `incf' from the cl
    ;; package is not required.
    (goto-char start)
    (while (search-forward "\\" end t)
      (replace-match "\\\\" t t)
      (setq end (1+ end)))

    (goto-char start)
    (while (search-forward "\"" end t)
      (replace-match "\\\"" t t)
      (setq end (1+ end)))

    ;;"In Emacs Lisp, the delimiters for lists and vectors (`()' and `[]')
    ;; are classified as parenthesis characters."
    (goto-char start)
    (while (re-search-forward "^\\s(" end t)
      (backward-char)
      (insert "\\")
      (setq end (1+ end))
      (forward-line))))
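An alternative to adjusting the bound by hand is to hold the end of the region in a marker, which Emacs moves automatically as text is inserted before it. The following is a sketch of that variant, not the poster's code; the function name `my-escape-region-for-font-lock` is hypothetical.

```elisp
;; Sketch only: a marker-based variant; the function name is hypothetical.
(defun my-escape-region-for-font-lock (start end)
  "Escape backslashes and double quotes between START and END.
Also insert a backslash before any open bracket that starts a line.
END is tracked with a marker, so it follows the text as it grows."
  (interactive "r")
  (let ((end (copy-marker end)))
    (save-excursion
      (goto-char start)
      (while (search-forward "\\" end t)
        (replace-match "\\\\" t t))
      (goto-char start)
      (while (search-forward "\"" end t)
        (replace-match "\\\"" t t))
      (goto-char start)
      (while (re-search-forward "^\\s(" end t)
        (backward-char)
        (insert "\\")
        (forward-line)))
    (set-marker end nil)))   ; detach the marker once it is no longer needed
```

The search functions accept a marker as their bound, so the three loops stay as they were in the first version and there is no counter to forget.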
"Adam Warner" <use...@consulting.net.nz> writes:
> Hi Tim Bradshaw,
>
> > Emacs's `paren in first column is beginning of defun' trick might have
> > been a reasonable compromise on a vax, but it's just annoyingly stupid
> > now.
>
> I've spent a few hours coming to understand how the font-lock mode etc.
> works but didn't figure out how to improve its reliability. I was
> searching in vain trying to find how square brackets were being matched. I
> finally realised that the regular expression \s( also matches square
> brackets in lisp mode.
I think \s( matches any character with a "(" syntax class (an opener of some sort; its entry in the buffer syntax table has two slots, the second being the matching close character).
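This can be checked directly with `char-syntax` and `matching-paren`, both of which consult the current syntax table. The helper name `my-paren-info` is hypothetical, and the sketch uses the Emacs Lisp syntax table as an assumption; a mode with a different table may classify brackets differently.

```elisp
;; Sketch only: `my-paren-info' is a hypothetical helper name.
(defun my-paren-info (char)
  "Return (CLASS . MATCH) for CHAR under the Emacs Lisp syntax table.
CLASS is the syntax class designator and MATCH the paired character, if any."
  (with-syntax-table emacs-lisp-mode-syntax-table
    (cons (char-syntax char) (matching-paren char))))

(my-paren-info ?\()   ; class ?\( with match ?\)
(my-paren-info ?\[)   ; class ?\( with match ?\], so "\\s(" matches it too
```

Both characters report the same class, which is why a regexp such as "^\\s(" treats a square bracket at the start of a line exactly like a round one.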
//Ingvar
-- 
(defmacro fakelambda (args &body body) `(labels ((me ,args ,@body)) #'me))
(funcall (fakelambda (a b) (if (zerop (length a)) b (format nil "~a~a"
 (aref a 0) (me b (subseq a 1))))) "Js nte iphce" "utaohrls akr")

"Adam Warner" <use...@consulting.net.nz> writes:
> I've spent a few hours coming to understand how the font-lock mode etc.
> works but didn't figure out how to improve its reliability. I was
> searching in vain trying to find how square brackets were being matched. I
> finally realised that the regular expression \s( also matches square
> brackets in lisp mode.
>
> The .el sources are very tidy and wonderfully commented. I can't be as
> enthusiastic for some of the design decisions.
You can always override things so that you have complete control. The Delphi mode (delphi-mode.el, which I wrote), for example, does not use regular expressions at all for font-lock coloring. Instead one "tokenizes" and explicitly skips matching groups as needed.
The font-lock setup looks like:
(defconst delphi-font-lock-defaults
  '(nil ; We have our own fontify routine, so keywords don't apply.
    t   ; Syntactic fontification doesn't apply.
    nil ; Don't care about case since we don't use regexps to find tokens.
    nil ; Syntax alists don't apply.
    nil ; Syntax begin movement doesn't apply
    (font-lock-fontify-region-function . delphi-fontify-region)
    (font-lock-verbose . delphi-fontifying-progress-step))
  "Delphi mode font-lock defaults. Syntactic fontification is ignored.")
I found I didn't like Emacs' notion of syntax specification: too C-centric, too cryptic, too hacky, too fragile. So I stepped around it and did it the way I thought it should be done.
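For readers wondering what a hand-written fontify routine might look like, here is a deliberately small sketch. It is not the Delphi mode code: the names `my--face-overlap` and `my-fontify-strings-region` are hypothetical, it only colors double-quoted strings, and it ignores comments and character literals such as ?\" as a stated simplification. It also assumes an Emacs recent enough to provide `with-silent-modifications`.

```elisp
;; Sketch only: hypothetical names; handles double-quoted strings and nothing else.
(defun my--face-overlap (from to beg end)
  "Put the string face on the part of FROM..TO that lies within BEG..END."
  (let ((s (max from beg))
        (e (min to end)))
    (when (< s e)
      (put-text-property s e 'face 'font-lock-string-face))))

(defun my-fontify-strings-region (beg end &optional _loudly)
  "Fontify strings overlapping BEG..END by scanning from the buffer start.
No regexps and no guess about where a top-level form begins."
  (save-excursion
    (with-silent-modifications
      (remove-text-properties beg end '(face nil))
      (goto-char (point-min))
      (let ((string-start nil))
        (while (< (point) end)
          (let ((ch (char-after)))
            (cond ((and string-start (eq ch ?\\))
                   (forward-char 1))          ; step onto the escaped character
                  ((and string-start (eq ch ?\"))
                   (my--face-overlap string-start (1+ (point)) beg end)
                   (setq string-start nil))
                  ((eq ch ?\")
                   (setq string-start (point)))))
          (unless (eobp) (forward-char 1)))   ; step past the current character
        (when string-start                    ; a string is still open at END
          (my--face-overlap string-start end beg end))))))
```

A mode could point `font-lock-fontify-region-function` at such a function, as the defaults list above does for Delphi. Scanning from the start of the buffer on every call is the slow but simple choice here; a real mode would want to remember parse state between calls.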
-- 
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
bl...@telus.net                                The Rhythm has my soul.
On Sun, 12 Jan 2003 15:28:22 +0000, Erik Naggum wrote:
> * "Adam Warner" <use...@consulting.net.nz>
> | Thanks for the tip. BTW you are the first person to mention this setting in
> | any newsgroup according to Google Groups!
>
> Do you depend on others to read the documentation (and maybe the source code)
> and post news articles about it? Is the way to make people read the Emacs
> documentation to post it all regularly to a lot of newsgroups and on lots and
> lots of web sites so google can find it and people can search the Net instead
> of their own computer?
You sound like an asshole, Erik. Having a bad day? Gerd was just asking a question; he took the initiative to try to find a solution by searching the usenet articles. That is why they are archived, and sometimes it is easier to find exactly what you are looking for in Google than it is in the manual. Open source is about sharing code, sharing information and sharing experiences so that we can all benefit, and there is nothing wrong with participating in the dialogue and reading the archives. If we all kept our knowledge to ourselves we would stay stuck in the dark ages with you. And thanks to all who have helped me, either directly or through archived information. I am happy to share what I know if it helps others go up the learning curve a little faster so they can produce more.