Caution: rev 5334 contains major changes to Leo's language-description tables


Edward K. Ream

May 23, 2012, 10:15:04 AM
to leo-editor
Rev 5334 is a first draft of a fix of bug 879338:
Global tables in leoApp.py should describe all languages known to the
colorizer
https://bugs.launchpad.net/leo-editor/+bug/879338

I believe the new code is safe, and all unit tests pass, but this is
one bug for which the phrase "you have to break some eggs to make an
omelet" applies. Please report any problems immediately.

The essence of the bug fix is that Leo's language-description tables
should contain entries for all .py files in the leo/modes folder.
These files control the colorizer. If Leo's colorizer knows about a
language, then Leo should know as much as possible about the language.

In concept, this is a fairly straightforward process, but there were
*many* details to handle.

If you aren't a Leo developer, you might want to stop reading now...

===== Tables

Fixing this bug required non-trivial changes to the following tables::

g.app.language_delims_dict
# Keys are languages, values are tuples of delims.

g.app.language_extension_dict
# Keys are language names, values are extensions.

g.app.extension_dict
# Keys are extensions, values are language names.

I used scripts to generate new entries for these tables, but these
scripts could not possibly deal with all the complications. There
is a unit test that tests the consistency of these tables, and this
test failed a few times. It now passes.
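
A minimal sketch of the kind of consistency check such a unit test might perform. This is not Leo's actual test: the function name check_tables and the sample entries are invented for illustration, and the three dicts are small stand-ins for the tables on g.app.

```python
# Hypothetical miniature versions of the three tables (illustrative entries only).
language_delims_dict = {
    "python": "#",
    "c": "// /* */",
    "html": "<!-- -->",
}
language_extension_dict = {
    "python": "py",
    "c": "c",
    "html": "html",
}
extension_dict = {
    "py": "python",
    "c": "c",
    "h": "c",
    "html": "html",
    "htm": "html",
}

def check_tables(delims, lang_to_ext, ext_to_lang):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    # A language should appear in both language-keyed tables or in neither.
    for lang in sorted(set(delims) ^ set(lang_to_ext)):
        problems.append(f"{lang}: present in only one language-keyed table")
    # The default extension of a language must map back to that language.
    for lang, ext in sorted(lang_to_ext.items()):
        if ext_to_lang.get(ext) != lang:
            problems.append(f"{lang}: extension {ext!r} does not map back")
    # Every extension must name a language that has a default extension.
    for ext, lang in sorted(ext_to_lang.items()):
        if lang not in lang_to_ext:
            problems.append(f"{ext}: unknown language {lang!r}")
    return problems
```

Calling check_tables on the three sample dicts returns an empty list; adding an extension that points at a language missing from the other tables produces one reported problem.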

Leo uses these tables as follows:

1. To generate the comment delimiters in sentinels for each language.

Happily, getting the comment delimiters correct was probably the
easiest part, so Leo should continue to write sentinels properly for
previously-known languages. However, I had to take care to preserve
the REM, CWEB, forth and perlpod hacks, so that comment delims would
include the necessary spaces.

2. To associate file extensions with importers.

Knowing about new file extensions doesn't actually allow Leo to import
any new languages. For all languages without an official importer Leo
will simply copy the entire text of the file into a single node, as it
always has.

3. To colorize code.

Leo's colorizer mostly doesn't use these tables: to colorize language
x, the colorizer looks for the file leo/modes/x.py. Thus, these
changes probably do not affect the colorizer at all.
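
Uses 2 and 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not Leo's code: import_strategy, mode_file, languages_with_importers and the sample table entries are all hypothetical names and data.

```python
from pathlib import Path

# Hypothetical stand-ins: a tiny extension table and the set of languages
# that have a dedicated importer.
extension_dict = {"py": "python", "c": "c", "sas": "sas"}
languages_with_importers = {"python", "c"}

def import_strategy(file_name):
    """Return (language, strategy) for a file, based only on its extension."""
    ext = Path(file_name).suffix.lstrip(".").lower()
    language = extension_dict.get(ext)  # None for unknown extensions
    if language in languages_with_importers:
        return language, "split into nodes"
    # Known or not, a language without an importer becomes a single node.
    return language, "single node"

def mode_file(modes_dir, language):
    """Return the path of the colorizer mode file for language, or None."""
    path = Path(modes_dir) / f"{language}.py"
    return path if path.is_file() else None
```

Note that mode_file never consults the tables: it only checks whether x.py exists in the modes directory, which is why changes to the tables should leave colorizing mostly unaffected.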

===== Special cases

I did a lot of googling in order to determine the proper file
extensions to use for various languages. In the process, I learned
that *almost* all languages described in the leo/modes folder are
real, interesting and useful languages.

However, there are at least 5 categories of special cases that affect the
tables:

1. Languages that are really just colorizer modes:

These include embperl, pseudoplain and phpsection. We need leo/modes
files for these, but they aren't real languages and thus they should
not appear in the language-description tables.

2. Things that might be colorized but aren't real languages.

Afaik, the following are not real languages, and Leo would never have
to generate files in these languages::

cvs_commit
dsssl
relax_ng_compact: An xml schema.
rtf
svn_commit

In particular, the rtf colorizer is *not* a colorizer for the binary .rtf
file format; it is a colorizer for .rtf sources. It probably won't do
too much harm to retain the colorizer data for these languages, but I
wouldn't mind eliminating them either.

3. Unknown languages.

A few languages seem not really to exist:

freemarker
hex
jcl
moin
progress
props
sas

I'll consider retaining the mode files for these languages only if
somebody can explain what these languages are.

4. Languages without real comment delimiters.

Patch annotations are *not* real comment delimiters, so Leo could not
generate patch (.fix or .patch) files from an outline. Happily, there
is no need to do so.

5. Conflicting file extensions.

There are two separate kinds of problems:

A. Leo contains colorizers for several assembly languages. Typically,
assembly languages have .asm or .a file extensions. However, a
particular extension can only be associated with a single language
name. Thus, Leo has no way of knowing what language to associate
with .asm or .a files. So I just punted and didn't make any
association at all.

B. Both the rebol and r languages use the .r file extension. One of
Leo's users previously created an entry for rebol, so that's the
language that takes precedence.
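
Both kinds of conflict can be illustrated with a small sketch. The function build_extension_dict, its parameters, and the language names asm_a and asm_b are hypothetical; the sketch only mirrors the two policies described above (leave contested assembly extensions unassociated, and let an existing entry win).

```python
def build_extension_dict(lang_to_exts, existing=None, skip=frozenset()):
    """Invert language -> extensions into extension -> language.

    Entries in `existing` keep their owner, extensions in `skip` get no
    association at all, and `conflicts` records which languages lost a
    contested extension.
    """
    ext_to_lang = dict(existing or {})
    conflicts = {}
    for lang in sorted(lang_to_exts):
        for ext in lang_to_exts[lang]:
            if ext in skip:
                continue  # Case A: too ambiguous, make no association.
            owner = ext_to_lang.setdefault(ext, lang)
            if owner != lang:
                # Case B: an earlier entry already claimed this extension.
                conflicts.setdefault(ext, []).append(lang)
    return ext_to_lang, conflicts

table, conflicts = build_extension_dict(
    {"r": ["r"], "rebol": ["r"], "asm_a": ["asm"], "asm_b": ["asm"]},
    existing={"r": "rebol"},
    skip={"asm"},
)
```

With these inputs, .r stays mapped to rebol, the r language is recorded as the loser of that extension, and .asm has no entry.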

So that's it. If you know more about any of these special cases I'd
like to hear about it.

Edward

Terry Brown

May 23, 2012, 10:48:22 AM
to leo-e...@googlegroups.com
On Wed, 23 May 2012 07:15:04 -0700 (PDT)
"Edward K. Ream" <edre...@gmail.com> wrote:

> A few languages seem not really to exist:
>
> freemarker
> hex
> jcl
> moin

MoinMoin wiki markup - similar to rst - http://moinmo.in/

> progress
> props
> sas

SAS, the statistics software - http://www.sas.com/

Probably both worth keeping.

Cheers -Terry

tfer

May 23, 2012, 10:53:31 AM
to leo-e...@googlegroups.com
Does import consider a "mode line" to choose/override a file extension?  I think both Vim and Emacs follow such a convention.

Edward K. Ream

May 23, 2012, 12:44:00 PM
to leo-editor
On May 23, 9:53 am, tfer <tfethers...@aol.com> wrote:
> Does import consider a "mode line" to choose/override a file extension?  I
> think both Vim and Emacs follow such a convention.

Iirc, @language takes precedence over any file extension in @auto.
Otherwise (import-file), the file extension is the only data the
importer has.
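
The precedence described here could be sketched as below. The function choose_language and the sample table are hypothetical; this is not how Leo's importer is actually written, only an illustration of "directive first, extension otherwise".

```python
import os

# Illustrative entries only.
SAMPLE_EXTENSIONS = {"py": "python", "wiki": "moin"}

def choose_language(file_name, at_language=None, ext_to_lang=SAMPLE_EXTENSIONS):
    """Prefer an explicit @language value; otherwise fall back to the extension."""
    if at_language:
        return at_language
    ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    return ext_to_lang.get(ext)  # None when the extension is unknown
```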

EKR

Edward K. Ream

May 23, 2012, 12:52:32 PM
to leo-editor

On May 23, 9:48 am, Terry Brown <terry_n_br...@yahoo.com> wrote:

> MoinMoin wiki markup - similar to rst - http://moinmo.in/

Thanks. After quite a bit of searching, I found an emacs page that
indicates that the proper extension is .wiki.

Is there a file extension for such markup? I haven't been able to
find it.

> SAS, the statistics software - http://www.sas.com/

I'll associate .sas with the sas language.

Edward

Edward K. Ream

May 23, 2012, 1:08:32 PM
to leo-editor
On May 23, 11:52 am, "Edward K. Ream" <edream...@gmail.com> wrote:

> The proper extension [for moin] is .wiki.
...
> I'll associate .sas with the sas language.

Done on the trunk at rev 5336.

Googling reveals that the proper comment entry for sas is "* /* */".
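
One way to read an entry like this is as whitespace-separated fields: one field is a single-line delimiter, two fields are a block-comment pair, and three fields give both. That reading is an assumption made for this sketch, and split_delims is a hypothetical helper, not a Leo function.

```python
def split_delims(entry):
    """Split a delims entry into (single_line, block_start, block_end)."""
    parts = entry.split()
    if len(parts) == 1:
        return parts[0], None, None
    if len(parts) == 2:
        return None, parts[0], parts[1]
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"expected 1 to 3 delims, got {len(parts)}: {entry!r}")
```

Under that assumption, the sas entry yields "*" as the single-line delimiter and "/*" and "*/" as the block pair.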

Edward

Brian Theado

May 23, 2012, 10:29:57 PM
to leo-e...@googlegroups.com
On Wed, May 23, 2012 at 10:15 AM, Edward K. Ream <edre...@gmail.com> wrote:
>    jcl

likely this: http://en.wikipedia.org/wiki/Job_Control_Language

Edward K. Ream

May 24, 2012, 9:24:10 AM
to leo-e...@googlegroups.com
On Wed, May 23, 2012 at 9:29 PM, Brian Theado

> [jcl is] likely this: http://en.wikipedia.org/wiki/Job_Control_Language

Thanks for this. "//" would probably suffice as a comment delim, but
I don't think anyone is likely to use Leo to edit a jcl file. It's
ancient history.

EKR