Discuss: What characters should be valid in @path directives?

Edward K. Ream

Apr 24, 2025, 9:04:21 AM
to leo-editor

Issue #4339 raises a subtle question. What characters should be valid in Leo's @path directives?


The existing code defines the valid characters with this regex:


    r"^@path\s+([\w_:/\\]+)"


In other words, the valid characters are:

-- Unicode word characters.

-- Underscores.

-- The three common path separators: colon, slash, and backslash.


But many more characters are typically valid in directory names, including blanks!
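To make the limitation concrete, here is a minimal sketch that applies the regex quoted above to a few headlines. Only the pattern comes from this thread; the helper name `legacy_path` and the sample headlines are made up for illustration.

```python
import re
from typing import Optional

# The legacy pattern, copied verbatim from the post above.
LEGACY_PATH_RE = re.compile(r"^@path\s+([\w_:/\\]+)")

def legacy_path(line: str) -> Optional[str]:
    """Return the path captured by the legacy pattern, or None if it does not match."""
    m = LEGACY_PATH_RE.match(line)
    return m.group(1) if m else None

print(legacy_path("@path src/core"))        # src/core
print(legacy_path("@path My Documents/x"))  # My
print(legacy_path("@path my-project/src"))  # my
```

The capture stops at the first character outside the class, so a blank or a hyphen silently truncates the path instead of raising an error.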


The most general scheme would be to allow all printable characters in an @path directive except trailing blanks.


What do you all think?


Edward


jkn

Apr 24, 2025, 9:25:02 AM
to leo-editor
I have vaguely wondered "what does Leo allow in path directives" myself.

Isn't the list of 'what characters are allowed in directory names' OS-dependent??

Edward K. Ream

Apr 24, 2025, 9:26:06 AM
to leo-editor
On Thursday, April 24, 2025 at 8:04:21 AM UTC-5 Edward K. Ream wrote:

Issue #4339 raises a subtle question. What characters should be valid in Leo's @path directives?


The existing code defines the valid characters with this regex:


    r"^@path\s+([\w_:/\\]+)"


\w includes the underscore character, so there is some redundancy in this regex.
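A quick way to check the redundancy, as a sketch: the second pattern below is simply the quoted regex with the explicit underscore removed, and the helper name `same_capture` is hypothetical.

```python
import re

# "\w" on its own already matches an underscore.
assert re.fullmatch(r"\w", "_") is not None

WITH_UNDERSCORE = re.compile(r"^@path\s+([\w_:/\\]+)")    # as quoted
WITHOUT_UNDERSCORE = re.compile(r"^@path\s+([\w:/\\]+)")  # underscore dropped

def same_capture(line: str) -> bool:
    """True if both patterns capture the same text (or both fail to match)."""
    a = WITH_UNDERSCORE.match(line)
    b = WITHOUT_UNDERSCORE.match(line)
    return (a.group(1) if a else None) == (b.group(1) if b else None)

for sample in ["@path a_b/c", "@path _private", "@path C:\\my_dir", "@path x y"]:
    assert same_capture(sample)
```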

Edward

Edward K. Ream

Apr 24, 2025, 9:28:48 AM
to leo-e...@googlegroups.com
On Thu, Apr 24, 2025 at 8:25 AM jkn <jkn...@nicorp.f9.co.uk> wrote:

I have vaguely wondered "what does Leo allow in path directives" myself.

Isn't the list of 'what characters are allowed in directory names' OS-dependent??

Indeed yes. Some OSes even allow non-printing characters! Here is the Wikipedia discussion.

Edward

Thomas Passin

Apr 24, 2025, 10:46:13 AM
to leo-editor
Since we expect the @path directive to be able to point to an actual existing directory, I think it has to allow spaces. With spaces, I suppose that the paths would have to be quoted so that they are delimited. I do not think that wild card path characters need to be (or should be) supported; if included, then Leo would not know how to create a non-existing path, and they would never occur in already-existing paths. Actually, there are several characters that Windows doesn't allow in paths.  According to ChatGPT they are <>:"/\\|?*

I don't like the idea of allowing other non-printing characters even if the OS would allow them because a user would have no way of knowing what the right character is supposed to be. I don't suppose they occur very often in the wild for just this reason.

An alternative for non-printing characters would be to handle them as URLs do (e.g., a space becomes %20) and have Leo's code automatically convert them. Then the paths wouldn't need to be quoted.  But that would make for more code complexity when a user pastes a path from the clipboard into an @path directive. I think that pasting an existing path will be the most common use case.
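For reference, the URL-style alternative could look roughly like this. It is only a sketch using the standard library's `urllib.parse`; Leo does nothing of the kind, and the helper names `encode_path` and `decode_path` are hypothetical.

```python
from urllib.parse import quote, unquote

def encode_path(path: str) -> str:
    """Percent-encode a path, leaving '/' and unreserved characters alone."""
    return quote(path, safe="/")

def decode_path(encoded: str) -> str:
    """Reverse encode_path."""
    return unquote(encoded)

encoded = encode_path("My Documents/notes\tdraft")
print(encoded)  # My%20Documents/notes%09draft
assert decode_path(encoded) == "My Documents/notes\tdraft"
```

Note that with `safe="/"` a Windows drive colon and backslashes would be encoded too, which illustrates the extra complexity of pasting an existing path into a directive.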

Linux and macOS don't allow a null character (\0).  There are reserved file names in Windows (e.g., "CON", "PRN"), but since they are file names and not paths, they probably don't have to be matched or filtered for the purposes of @path directives.
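If someone did want to flag such characters, a check might look like the sketch below. It assumes the character list given above plus the null character, works on a single directory name rather than a whole path (since slash, backslash, and colon also act as separators), and uses the hypothetical helper name `forbidden_chars`.

```python
from typing import List

# Characters listed above as disallowed in Windows names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')

def forbidden_chars(component: str) -> List[str]:
    """Return the sorted forbidden characters found in one directory name."""
    return sorted({ch for ch in component if ch in WINDOWS_FORBIDDEN or ch == "\0"})

print(forbidden_chars("My Documents"))  # []
print(forbidden_chars("what?*"))        # ['*', '?']
```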

Edward K. Ream

Apr 24, 2025, 3:26:20 PM
to leo-e...@googlegroups.com
On Thu, Apr 24, 2025 at 9:46 AM Thomas Passin <tbp1...@gmail.com> wrote:
Since we expect the @path directive to be able to point to an actual existing directory, I think it has to allow spaces.

I agree.  Thanks for raising all these issues.

With spaces, I suppose that the paths would have to be quoted so that they are delimited.

Actually, not. Leo already calls g.stripPathCruft to remove various kinds of quotes. That has never been a problem.

I do not think that wild card path characters need to be (or should be) supported; if included, then Leo would not know how to create a non-existing path, and they would never occur in already-existing paths.

The revised PR allows all characters except for trailing whitespace. It's up to the user to create a valid path. If a character doesn't make sense in a directory name, the OS will say so, so I think there is little practical danger in the PR's changes.
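As an illustration of "all characters except trailing whitespace", here is a minimal sketch. The pattern and the helper name `permissive_path` are assumptions made for this example; they are not taken from the PR.

```python
import re
from typing import Optional

# Hypothetical pattern: everything after "@path", minus trailing whitespace.
# The path must start with a non-blank character.
PERMISSIVE_PATH_RE = re.compile(r"^@path\s+(\S.*?)\s*$")

def permissive_path(line: str) -> Optional[str]:
    """Return the path with trailing whitespace stripped, or None if there is no path."""
    m = PERMISSIVE_PATH_RE.match(line)
    return m.group(1) if m else None

print(permissive_path("@path My Documents/x   "))  # My Documents/x
print(permissive_path("@path my-project/src"))     # my-project/src
```

Whether the resulting string names a valid directory is left to the OS, as described above.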

Note that by default Leo never creates non-existent paths automatically.

Actually, there are several characters that Windows doesn't allow in paths.  According to ChatGPT they are <>:"/\\|?*

Tests show that allowing all characters is not a problem in this context.

I don't like the idea of allowing other non-printing characters even if the OS would allow them because a user would have no way of knowing what the right character is supposed to be. I don't suppose they occur very often in the wild for just this reason.

I discuss this question in the revised first comment of the PR. As you imply, in practice these edge cases won't occur. Furthermore, the PR can't reasonably be called a breaking change.

Edward

Thomas Passin

Apr 24, 2025, 6:34:44 PM
to leo-editor
On Thursday, April 24, 2025 at 3:26:20 PM UTC-4 Edward K. Ream wrote:
The revised PR allows all characters except for trailing whitespace. It's up to the user to create a valid path. If a character doesn't make sense in a directory name, the OS will say so, so I think there is little practical danger in the PR's changes.

What I don't understand is how to know if a given space is a trailing space.  It used to be, didn't it, that a headline could have text after the path of an @path directive.  I never used it like that but I have been under the impression that it did.  If so, text following a space that was intended to have been a trailing space would get included into the path.  The obvious cure is not to allow any text after the path.  Is this going to be a new restriction, or has it always been like that?

Edward K. Ream

Apr 25, 2025, 3:19:51 AM
to leo-e...@googlegroups.com
On Thu, Apr 24, 2025 at 5:34 PM Thomas Passin <tbp1...@gmail.com> wrote:

On Thursday, April 24, 2025 at 3:26:20 PM UTC-4 Edward K. Ream wrote:
The revised PR allows all characters except for trailing whitespace. It's up to the user to create a valid path. If a character doesn't make sense in a directory name, the OS will say so, so I think there is little practical danger in the PR's changes.

What I don't understand is how to know if a given space is a trailing space.  It used to be, didn't it, that a headline could have text after the path of an @path directive.  I never used it like that but I have been under the impression that it did. 

Thomas, you have asked exactly the right questions!

Imo, "truncating" an @path directive at a space (or any other character not matched by the legacy regex) was an unintentional Easter Egg. The directives reference makes no mention of this truncation. Rather, the documentation clearly implies that the path is everything following the @path.

If so, text following a space that was intended to have been a trailing space would get included into the path.  The obvious cure is not to allow any text after the path.  Is this going to be a new restriction, or has it always been like that?

This question is the crux of the PR's dilemma. The PR does what Leo's legacy code intended, but yes, you could say that the PR creates a new restriction.
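One way to gauge the practical impact is to list the headlines whose path would differ under the two rules. This is only a sketch: the legacy pattern is the one quoted earlier in the thread, while the permissive pattern and the helper name `changed_headlines` are hypothetical stand-ins for the PR's behavior.

```python
import re

LEGACY_RE = re.compile(r"^@path\s+([\w_:/\\]+)")     # quoted in this thread
PERMISSIVE_RE = re.compile(r"^@path\s+(\S.*?)\s*$")  # assumed stand-in for the PR

def changed_headlines(headlines):
    """Yield (headline, old_path, new_path) wherever the two rules disagree."""
    for h in headlines:
        old = LEGACY_RE.match(h)
        new = PERMISSIVE_RE.match(h)
        old_path = old.group(1) if old else None
        new_path = new.group(1) if new else None
        if old_path != new_path:
            yield h, old_path, new_path

samples = ["@path src/core", "@path docs my notes", "@path src  "]
for headline, old_path, new_path in changed_headlines(samples):
    print(f"{headline!r}: {old_path!r} -> {new_path!r}")
```

Only headlines that relied on the old truncation, such as one with text after the path, show up in the output.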

Summary

The question I have been asking myself is, does the PR create a breaking change? Yes, in theory it does. In practice, not so much. My tentative plan:

- Update the first comment of the PR in light of this discussion.
- Include the PR in the list of significant changes to Leo, both in the release notes and in the "What's new" section.

Edward

jkn

Apr 25, 2025, 1:00:55 PM
to leo-editor
I'm not 100% sure I understand your intention here.

You seem to be saying (if I am not mistaken) that you plan to allow trailing characters in an @path directive ... and that these will be reflected in the created directory name. Is that correct?

I am probably fairly agnostic about these sort of changes - since I would strongly try to avoid creating directories like this if at all possible.

'However' I do have a vague "It would be nice if" (IWBNI) you could have a trailing comment in an @path headline. This of course implies some way of delimiting the comment ... and this probably clashes with what you are trying to achieve with the directory characters.

Since I have only just thought of this IWBNI I don't see it as a strong objection. I'd just like whatever is chosen to be clearly documented (as I am sure it will be)

    J^n





Edward K. Ream

Apr 25, 2025, 6:29:34 PM
to leo-e...@googlegroups.com
On Fri, Apr 25, 2025 at 12:00 PM jkn <jkn...@nicorp.f9.co.uk> wrote:

> You seem to be saying (if I am not mistaken) that you plan to allow trailing characters in an @path directive ... and that these will be reflected in the created directory name. Is that correct?

No. I meant the reverse: the PR strips trailing blanks.

> I am probably fairly agnostic about these sort of changes - since I would strongly try to avoid creating directories like this if at all possible.

Leo never creates directories without your explicit permission.

> 'However' I do have a vague "It would be nice if" (IWBNI) you could have a trailing comment in an @path headline. This of course implies some way of delimiting the comment ... and this probably clashes with what you are trying to achieve with the directory characters.

This "feature" was never intended and the PR (just merged) eliminates it. The benefits of the former Easter Egg are minuscule compared with the problems.

Edward

jkn

Apr 26, 2025, 3:02:25 AM
to leo-editor
On Friday, April 25, 2025 at 11:29:34 PM UTC+1 Edward K. Ream wrote:
On Fri, Apr 25, 2025 at 12:00 PM jkn <jkn...@nicorp.f9.co.uk> wrote:

> You seem to be saying (if I am not mistaken) that you plan to allow trailing characters in an @path directive ... and that these will be reflected in the created directory name. Is that correct?

No. I meant the reverse: the PR strips trailing blanks.

> I am probably fairly agnostic about these sort of changes - since I would strongly try to avoid creating directories like this if at all possible.

Leo never creates directories without your explicit permission.

Sure - I meant (if it wasn't clear) that I would strongly resist creating/using directories with trailing spaces. Heck, I still resist spaces within file and directory names!

> 'However' I do have a vague "It would be nice if" (IWBNI) you could have a trailing comment in an @path headline. This of course implies some way of delimiting the comment ... and this probably clashes with what you are trying to achieve with the directory characters.

This "feature" was never intended and the PR (just merged) eliminates it. The benefits of the former Easter Egg are minuscule compared with the problems.


Got it - there is a minor inconsistency in that trailing spaces will not be reflected in the resulting directory name. I for one am perfectly happy with this.
 
Edward

Edward K. Ream

Apr 26, 2025, 5:11:56 PM
to leo-e...@googlegroups.com
On Sat, Apr 26, 2025 at 2:02 AM jkn <jkn...@nicorp.f9.co.uk> wrote:

...I would strongly resist creating/using directories with trailing spaces. Heck, I still resist spaces within file and directory names!

The PR doesn't allow directories with trailing spaces, and neither did the legacy code.

Edward