Multibyte indentation patch

33 views
Skip to first unread message

Sung Pae

unread,
Jul 31, 2012, 7:32:41 PM7/31/12
to vimcl...@googlegroups.com
Hello,

Clojure indentation does not work properly with functions that contain
multibyte characters:

(배성인 1
(αβγ 1 ; Off by 1
(★★★ 1 ; Off by 2
2))) ; Off by 3

This is because the col() function returns byte offsets instead of
character offsets. The virtcol() function returns the apparent column
position (even accounting for double-wide characters), so it is more
appropriate for indent calculation.

I would have submitted this as a bitbucket issue, but it looks like the
project is undergoing a lot of change for the nrepl feature.

BTW Meikel, the last patch I ever sent to you was generated against my
personal fork, and not against the master branch. I'll take more care in
the future not to let that happen.

Sung Pae


diff --git a/vim/indent/clojure.vim b/vim/indent/clojure.vim
index 1bde012..9a3ed85 100644
--- a/vim/indent/clojure.vim
+++ b/vim/indent/clojure.vim
@@ -229,7 +229,7 @@ function! GetClojureIndent()
endif

normal! ge
- return col(".") + 1
+ return virtcol(".") + 1
endfunction

setlocal indentexpr=GetClojureIndent()


Meikel Brandmeyer

unread,
Jul 31, 2012, 7:46:55 PM7/31/12
to vimcl...@googlegroups.com
Hi,

thanks for the patch. There probably lurk more multibyte issues. Let me know when you find something. Korean is probably a good candidate. :)

I'm trying to get also some other fixes out. And I hope to get the nrepl changes done, soon.

Rather philosophical: multibyte character are not valid in Clojure symbols. Your code is out of Clojure's spec.

Kind regards
Meikel

signature.asc

Sung Pae

unread,
Jul 31, 2012, 9:42:44 PM7/31/12
to vimcl...@googlegroups.com
Meikel Brandmeyer <m...@kotka.de> writes:

> Rather philosophical: multibyte character are not valid in Clojure
> symbols. Your code is out of Clojure's spec.

Hmm. I assumed that Unicode was fair game for symbols since they are
explicitly allowed in Java and JavaScript identifiers. [1]

Looking into this, there is this statement from http://clojure.org/reader

Symbols begin with a non-numeric character and can contain
alphanumeric characters and *, +, !, -, _, and ?

Alphanumeric does seem to imply [A-Za-z0-9], but increasingly it means
POSIX [:alnum:] when operating within a Unicode locale.

Reading LispReader.java, symbolPat is defined as the regex:

"[:]?([\D&&[^/]].*/)?([\D&&[^/]][^/]*)"

which is extremely liberal. It does not enforce any limitations beyond
the namespacing feature!

In fact, the definition at clojure.org/reader doesn't even cover method
names like -> and +' which are found in clojure.core. There is however,
this bit of ambiguity:

(other characters will be allowed eventually, but not all macro
characters have been determined)

This is vaguely threatening to anybody who wants to use creative symbol
names, but I think the chances of a Unicode macro character are pretty
slim.

So yes, I think you are correct that Unicode symbols are officially
unsupported, but I take issue with the notion that Clojure has any
semblance of a spec. [2] :)

Cheers,
guns

[1]: Not that host syntax limitations impact Clojure in any way, but
just to match expectations.

http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.8
http://es5.github.com/x7.html#x7.6

[2]: Unless there is one, of course.

Meikel Brandmeyer

unread,
Aug 1, 2012, 3:00:16 AM8/1/12
to vimcl...@googlegroups.com
Hi,


Am Mittwoch, 1. August 2012 03:42:44 UTC+2 schrieb guns:
So yes, I think you are correct that Unicode symbols are officially
unsupported, but I take issue with the notion that Clojure has any
semblance of a spec. [2] :)

It is quite simple: clojure.org is the spec. When there is written that ' is not allowed in symbols then ' is not allowed in symbols. Clojure itself can do what it likes, though. That doesn't say anything about what you (as in "the user") is allowed to do.[1] Then there is Rich, who is the one and only authoritative source out there. He once explicitly said that alnum is ASCII, ie. [a-zA-z0-9]. (Unfortunately the search function of groups sucks so hard, that I don't find the message anymore. Quite surprising for a google product. I'm glad I found [1].) So people can state "but the implementation does" or "but there is no spec :P". All of that breaks down immediately when Rich speaks up. Clojure is not a democracy.

But as I said: philosophical. :)

Meikel

[1]: https://groups.google.com/forum/?fromgroups#!msg/clojure/Q7PYeSIPTk4/6EBBYQ8ht2YJ

Sung Pae

unread,
Aug 1, 2012, 4:25:27 AM8/1/12
to vimcl...@googlegroups.com
Meikel Brandmeyer <m...@kotka.de> writes:

> All of that breaks down immediately when Rich speaks up. Clojure is
> not a democracy.

I can live with this. I wouldn't use Unicode characters when working on
a team anyhow, since people don't seem to know how to input multibyte
characters in their editors comfortably (vim digraphs + readline macros)

Rich Hickey writes:

> Your grammar will have to be able to parse it, but that is not the
> same thing as me documenting it as ok for user code.

A directive for Clojure tools?

Sounds like the HTML motto [1]. Interestingly, the use of < > ' and
Unicode characters seem to be widespread, which presents a similar (but
incredibly less severe) problem for Clojure aware tools that browsers
have with crappy HTML.

As for myself, I think I will continue to live in sin and suffer the
consequences later, if any.

Thanks again for the hard work on VimClojure.

guns

[1]: "Be liberal in what you accept, and conservative in what you send."

Sung Pae

unread,
Aug 8, 2012, 8:27:28 PM8/8/12
to vimcl...@googlegroups.com
On 31 Jul 2012, at 6:46 PM, Meikel Brandmeyer wrote:

> There probably lurk more multibyte issues. Let me know when you find
> something. Korean is probably a good candidate.

There is another corner case:

(let [숫자 ["하나"
"둘" # Indent off by 2
"셋"]
ᄉ (reduce
#(+ %1 (count (.getBytes %2))) ; Off by 1
0 숫자)]
...)

The problem here is that searchpairpos() returns the column number as a
byte offset, instead of the virtual column count.

Unfortunately, vim does not offer an option to make searchpairpos() (or
any of the other functions that return columns) return virtcols, so it
must be explicitly converted. The overhead of virtcol() is negligible
(~ 0.04 ms), so at least performance is not an issue.

I am sorry this is coming after the 2.3.4 release. I held off on sending
this in while I searched vim for a magic make-everything-return-virtcols
toggle. It does not seem to exist.

Cheers,
Sung Pae

---
vim/indent/clojure.vim | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/vim/indent/clojure.vim b/vim/indent/clojure.vim
index cdf3e18..5fae206 100644
--- a/vim/indent/clojure.vim
+++ b/vim/indent/clojure.vim
@@ -32,9 +32,10 @@ function! s:MatchPairs(open, close, stopat)
let stopat = a:stopat
endif

- return searchpairpos(a:open, '', a:close, 'bWn',
+ let pos = searchpairpos(a:open, '', a:close, 'bWn',
\ 'vimclojure#util#SynIdName() !~ "clojureParen\\d"',
\ stopat)
+ return [pos[0], virtcol(pos)]
endfunction

function! ClojureCheckForStringWorker() dict
--
1.7.10.2.520.g6a4a482

Meikel Brandmeyer

unread,
Aug 9, 2012, 1:51:04 AM8/9/12
to vimcl...@googlegroups.com
Hi,


Am Donnerstag, 9. August 2012 02:27:28 UTC+2 schrieb guns:
The problem here is that searchpairpos() returns the column number as a
byte offset, instead of the virtual column count.

Unfortunately, vim does not offer an option to make searchpairpos() (or
any of the other functions that return columns) return virtcols, so it
must be explicitly converted. The overhead of virtcol() is negligible
(~ 0.04 ms), so at least performance is not an issue.

I am sorry this is coming after the 2.3.4 release. I held off on sending
this in while I searched vim for a magic make-everything-return-virtcols
toggle. It does not seem to exist.

This will go into the next release. Don't get me wrong, but this is for most users probably not an issue. So the priority is not that high. Sorry.

I wanted to get the changes to the result buffer handling out, since I basically killed Oskar's workflow and he had to wait very long for a fix.

But keep the patches coming. :D

Meikel

 

Sung Pae

unread,
Aug 9, 2012, 2:20:40 AM8/9/12
to vimcl...@googlegroups.com
Meikel Brandmeyer <m...@kotka.de> writes:

> This will go into the next release. Don't get me wrong, but this is
> for most users probably not an issue. So the priority is not that
> high. Sorry.

Ha! I am very used to being the only enthusiastic user of a feature.
Rxvt-unicode keeps an old crusty feature around just for me. :)

And like we discussed, this is an enhancement, not a bugfix.

> But keep the patches coming. :D

Thanks Meikel.

guns
Reply all
Reply to author
Forward
0 new messages