Unicode TLA+

221 views
Skip to first unread message

Andrew Helwer

unread,
Sep 9, 2021, 4:06:41 PM9/9/21
to tlaplus
I'd like to get peoples' thoughts on this. Currently, the TLA+ tools only accept ASCII TLA+. ASCII TLA+ is also what I spend my time looking at while I'm writing TLA+ specs. However, it would be neat to have the ASCII symbols translated into their Unicode math equivalents in real time as I am typing, similar to Mathematica (if you've ever used that). I hope to get a demo of this up and running for my talk at the TLA+ conference at the end of September. To that end, I've added support for unicode symbols to the TLA+ tree-sitter parser and am writing a simple utility to translate TLA+ files from ASCII to Unicode and back as a prelude to getting the real-time translator working (probably as a Neovim treesitter module). I should mention Unicode TLA+ is an old idea, dating back at least five years to Ron Pressler's work trying to get SANY to support Unicode, and maybe as early as Specifying Systems where Leslie speculates that some non-ASCII TLA+ version might come along in the future. I've fleshed out a TLA+ Unicode proposal (with discussion of design decisions and drawbacks) in this document, and you can see the list of proposed symbol translations (nicely formatted by github!) in the .csv in that directory.

Some questions to provoke discussion:
  • Do you believe you would personally find it useful/easier to see TLA+ specs in their Unicode form as you are writing them?
  • If you are a non-English user, would you find it useful to be able to use non-ASCII characters in identifiers (operator/module/function names, etc.) or strings?
  • Do you agree with the Unicode symbol choices or have any suggestions?
Andrew

Alex Weisberger

unread,
Sep 10, 2021, 9:49:19 PM9/10/21
to tla...@googlegroups.com
I would personally find that useful. I find the mathematical symbols to be more terse but also readable. And once you learn the symbols of set theory, you see that they are ubiquitous in all mathematical literature, and become fluent in them. The ASCII approximations are optimized for typing, not reading.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/80539612-bc6e-4d23-95ff-dc7e1c6af50fn%40googlegroups.com.

pratik

unread,
Sep 12, 2021, 6:11:26 AM9/12/21
to tlaplus


On Thursday, September 9, 2021 at 10:06:41 PM UTC+2 andrew...@gmail.com wrote:
I'd like to get peoples' thoughts on this. Currently, the TLA+ tools only accept ASCII TLA+. ASCII TLA+ is also what I spend my time looking at while I'm writing TLA+ specs. However, it would be neat to have the ASCII symbols translated into their Unicode math equivalents in real time as I am typing, similar to Mathematica (if you've ever used that). I hope to get a demo of this up and running for my talk at the TLA+ conference at the end of September. To that end, I've added support for unicode symbols to the TLA+ tree-sitter parser and am writing a simhttps://github.com/tonsky/FiraCodeple utility to translate TLA+ files from ASCII to Unicode and back as a prelude to getting the real-time translator working (probably as a Neovim treesitter module). I should mention Unicode TLA+ is an old idea, dating back at least five years to Ron Pressler's work trying to get SANY to support Unicode, and maybe as early as Specifying Systems where Leslie speculates that some non-ASCII TLA+ version might come along in the future. I've fleshed out a TLA+ Unicode proposal (with discussion of design decisions and drawbacks) in this https://github.com/tonsky/FiraCodedocument, and you can see the list of proposed symbol translations (nicely formatted by github!) in the .csv in that directory.

Some questions to provoke discussion:
  • Do you believe you would personally find it useful/easier to see TLA+ specs in their Unicode form as you are writing them?
Yes. It could be very useful indeed.
I currently use https://github.com/tonsky/FiraCode and that works pretty well.
Maybe FiraCode can be hacked to handle \in, \E, \A etc. It handles arrows, /\, \/, ==, <> etc.
Example,
 Screenshot from 2021-09-12 12-00-11.png
  • If you are a non-English user, would you find it useful to be able to use non-ASCII characters in identifiers (operator/module/function names, etc.) or strings?
Not really. I tried it in golang and things get ugly. See also Turing's paper https://londmathsoc.onlinelibrary.wiley.com/doi/pdf/10.1112/plms/s2-42.1.230.
It has symbols that I can't name so things get hard to understand.

Alexander Niederbühl

unread,
Sep 14, 2021, 2:49:23 PM9/14/21
to tlaplus
I think this would be very useful, I find the ASCII representation a bit verbose and distracting. I know of several people who feel the same, e.g. https://matklad.github.io/2020/11/01/notes-on-paxos.html as another data point. Currently I'm using a VSCode extension together with FiraCode as workaround.

A bit unrelated but since in TLA+ there are many ways to write the same thing like \cup, \union or /= , # and different ways to format the code in terms of line breaks etc. I wish there would be a tool to get a canonically formatted Unicode representation which ideally TLC would be able to check. Do you think the treesitter grammar could be utilized to build a code formatter similar to prettier in JavaScript or black in Python?
 

Andrew Helwer

unread,
Sep 14, 2021, 3:46:35 PM9/14/21
to tlaplus
I think it could indeed be used to create a formatter. Say you wanted to write a rule where:

op == /\ A
      /\ B

should be formatted as:

op ==
  /\ A
  /\ B

this would be quite easy; you would write a query for all operator definitions with conjunction lists as a top-level expression, so:

(operator_definition (def_eq) @eq definition: (conj_list) @list)

where the @eq capture gets the == node, and the @list capture gets the conjunction list node. Then you check whether the two nodes are on the same line (easy, info readily available in node object) and if so you insert a newline then dedent all lines included within the conjunction list node. This should preserve vertical alignment for any nested conjunction/disjunction lists.

Andrew

je...@emptysquare.net

unread,
Sep 21, 2021, 2:35:56 PM9/21/21
to tlaplus
I think the ASCII/math-symbols dichotomy is a profound barrier to entry for TLA+. Examples we see in PDFs and books can't be pasted into our own specs, and for a beginner, it's not easy to translate them manually. Since most of us use TLA+ infrequently, we forget the equivalences and have to relearn them. If you see a symbol and don't know its ASCII, you can't always look it up: all the TLA+ ASCII tables I've found are incomplete. I've compiled my own list of symbols and trained myself with flash cards. No other tool I've used as a programmer has this kind of initial hurdle. Anything we do to make the two formats interoperable will be a big improvement. If it's done well and integrated into the Toolbox and/or VS Code it could widen the top of the adoption funnel dramatically. I bet that experienced TLA+ users would be surprised how big an impact this could make for widespread adoption.

So I have a strong opinion about the first question. Regarding the other two I have no opinion at all.

Reuben Cornel

unread,
Sep 21, 2021, 11:09:13 PM9/21/21
to tla...@googlegroups.com
I have found this cheat sheet (https://lamport.azurewebsites.net/tla/summary-standalone.pdf) very helpful when writing specs. It has a page that lists all the symbols I've needed.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.

Christian Barthel

unread,
Sep 23, 2021, 1:22:12 AM9/23/21
to tla...@googlegroups.com
On Thu, Sep 09 2021, Andrew Helwer wrote:
[..]
> Some questions to provoke discussion:
>
> - Do you believe you would personally find it useful/easier
> to see TLA+ specs in their Unicode form as you are writing
> them?

Having an option is certainly not bad (in my opinion). I am
writing my TLA+ specs with GNU/Emacs and tried to convert the
ASCII symbols to Unicode:

<https://git.sdf.org/bch/tlamode/raw/branch/master/examples/emacs-tlaplus.png>

But most of the time, I still stick to the ASCII representation
(and to view the pretty printed version, I do ‘C-c C-t e’ and
xdvi(1) automatically updates the view on that file).

--
Christian Barthel

Andrew Helwer

unread,
Sep 23, 2021, 2:25:34 PM9/23/21
to tlaplus
It should be noted that unicode symbol quality varies pretty widely by font. You also will want a font that renders as many unicode symbols in monospace width as possible.

Andrew

Andrew Helwer

unread,
Oct 6, 2021, 3:10:41 PM10/6/21
to tlaplus
I've opened a RFC for TLA+ Unicode support here: https://github.com/tlaplus/rfcs/issues/5

Andrew
Reply all
Reply to author
Forward
0 new messages