Re: String Literals

Bart

Sep 29, 2021, 4:31:42 PM
On 29/09/2021 20:46, Stefan Ram wrote:
> I have the following ideas for string literals in a new language
> (first the string, then the string literal is given):
>
> String literals start with an opening bracket and end with
> a closing bracket.
>
> abc
> [abc]
>
> Brackets within the string literal are allowed when properly
> nested.
>
> abc[def]ghi
> [abc[def]ghi]
>
> A single opening or closing bracket is written as "[`]" or
> "[]`", respectively. This rule has higher precedence than the
> preceding rule: whenever there is a "[`]" or "[]`" within
> a string literal, it means "[" and "]", with no exceptions.
>
> abc[def
> [abc[`]def]
>
> abc]def
> [abc[]`def]
>
> abc[`]def
> [abc[`]`[]`def]
>
> abc[]`def
> [abc[`][]``def]
>
> The notation for "[`]" and "[]`" within a string is awkward,
> but is anticipated to be required only rarely. Most texts will
> contain brackets that are properly nested, and this was made
> to be easy.
>
> So, are there any problems with this specification I have missed?
> Strings that are impossible to encode or string literals whose
> interpretation is ambiguous? Cases where frequent strings are
> cumbersome to encode? TIA!


I don't know if some strings are impossible to encode. But it looks
near-impossible to write or read any strings that contain square
brackets or single quotes.

How do you deal with the usual non-printable characters that need escape
sequences such as CR, LF, TAB, BELL, etc?

With the usual "..." delimiters, your examples reduce to:

abc
"abc"

abc"def"ghi
"abc""def""ghi" # or the more common:
"abc\"def\"ghi" # (I allow both)

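For comparison, here is a minimal Python sketch of a scanner for the conventional "..." form that accepts both spellings of an embedded quote ("" and \"). The escape table (\n, \t, \r, \a) is an assumption, included because it is the usual answer to the CR/LF/TAB/BELL question above; the name lex_quoted is made up for illustration.

```python
# Assumed escape table: the usual C-style letters for LF, TAB, CR and BELL.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "a": "\a"}


def lex_quoted(src: str, pos: int = 0) -> tuple[str, int]:
    """Scan one double-quoted literal starting at src[pos].

    Returns (value, index just past the closing quote). An embedded quote
    may be written either doubled or preceded by a backslash.
    """
    if src[pos] != '"':
        raise ValueError("expected an opening quote")
    out = []
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            if i + 1 >= len(src):
                break  # backslash at end of input: literal is unterminated
            nxt = src[i + 1]
            # Unknown escapes (including \" and \\) stand for the next char.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == '"':
            if i + 1 < len(src) and src[i + 1] == '"':
                out.append('"')  # doubled quote means one literal quote
                i += 2
            else:
                return "".join(out), i + 1  # this is the closing quote
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated string literal")
```

Both of the spellings above, "abc""def""ghi" and "abc\"def\"ghi", give the string abc"def"ghi with this scanner.
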
I'm not sure how you came up with these puzzling 3-character sequences:

[ [`]
] []`

If introducing ` as some sort of escape symbol, why not have it precede
the escaped character:

[ `[
] `]

Your examples, if still using [..] to delimit strings, and allowing
embedded ...[...]... without needing escapes, become:

abc[def]ghi
[abc[def]ghi]

abc[def
[abc`[def]

abc]def
[abc`]def]

abc[`]def
[abc[``]def]

abc[]`def
[abc[]``def]

Here it needs `` to represent one `.
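
A minimal Python sketch of the variant just described, assuming [..] delimiters, nested pairs kept as ordinary text, and ` making the following character literal. The name decode_prefix_escaped is invented for this post. Because the escape comes first, a single forward pass with a depth counter is enough.

```python
def decode_prefix_escaped(src: str) -> str:
    """Decode a [..] literal in which nested [..] pairs are ordinary text
    and a backquote makes the next character literal."""
    if not src.startswith("["):
        raise ValueError("literal must start with '['")
    out = []
    depth = 1  # the opening delimiter has already been seen
    i = 1
    while i < len(src):
        ch = src[i]
        if ch == "`":
            if i + 1 >= len(src):
                raise ValueError("backquote at end of input")
            out.append(src[i + 1])  # escaped character, never a delimiter
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                if i != len(src) - 1:
                    raise ValueError("text after the closing bracket")
                return "".join(out)
        out.append(ch)
        i += 1
    raise ValueError("unterminated literal")
```

Fed the five literals above, it returns abc[def]ghi, abc[def, abc]def, abc[`]def and abc[]`def.
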

James Harris

Oct 1, 2021, 9:37:53 AM
On 29/09/2021 21:31, Bart wrote:
> On 29/09/2021 20:46, Stefan Ram wrote:
>>    I have the following ideas for string literals in a new language
>>    (first the string, then the string literal is given):

Stefan, I'll respond to your idea here as Bart has already made some of
the points I would have made.

>>
>>    String literals start with an opening bracket and end with
>>    a closing bracket.
>>
>> abc
>> [abc]

I'd be interested to see where you get to with this as I experimented
with braces (rather than brackets) which have the same feature of the
closing delimiter (and, hence, the string terminator) being different
from the opening delimiter.

>>
>>    Brackets within the string literal are allowed when properly
>>    nested.
>>
>> abc[def]ghi
>> [abc[def]ghi]
>>
>>    A single opening or closing bracket is written as "[`]" or
>>    "[]`", respectively. This rule has higher precedence than the
>>    preceding rule: whenever there is a "[`]" or "[]`" within
>>    a string literal, it means "[" and "]", with no exceptions.
>>
>> abc[def
>> [abc[`]def]
>>
>> abc]def
>> [abc[]`def]
>>
>> abc[`]def
>> [abc[`]`[]`def]
>>
>> abc[]`def
>> [abc[`][]``def]
>>
>>    The notation for "[`]" and "[]`" within a string is awkward,

Yes, it's very awkward.


>>    but is anticipated to be required only rarely. Most texts will
>>    contain brackets that are properly nested, and this was made
>>    to be easy.
>>
>>    So, are there any problems with this specification I have missed?
>>    Strings that are impossible to encode or string literals whose
>>    interpretation is ambiguous? Cases where frequent strings are
>>    cumbersome to encode? TIA!
>
>
> I don't know if some strings are impossible to code. But it looks
> near-impossible to write or read any strings that contain square
> brackets or single quotes.
>
> How do you deal with the usual non-printable characters that need escape
> sequences such as CR, LF, TAB, BELL, etc?

That was my main question. AISI, if Stefan uses an escape sequence for
LF etc then a string's opening and closing delimiters could be escaped
in order to embed them.

>
> With the usual "..." delimiters, your examples reduce to:
>
>  abc
>  "abc"
>
>  abc"def"ghi
>  "abc""def""ghi"        # or the more common:
>  "abc\"def\"ghi"        # (I allow both)

I chose to match an opening \ with a closing / so that the string would be
one of these:

"abc\Q/def\Q/ghi"
"abc\q/def\q/ghi"
"abc\"/def\"/ghi"

Not sure which, yet, but because what comes after \ is not limited to
one character, other quote marks could be specified by name, e.g.

\q66/ opening slanted speech mark
\q99/ closing slanted speech mark
\q9/ normal slanted apostrophe
\q<</ opening speech mark as used in France etc.
etc

https://en.wikipedia.org/wiki/Guillemet
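
A rough Python sketch of how \name/ escapes might be expanded inside the body of a literal. The name table is hypothetical: the names come from the list above, the code points are my assumption of which characters are meant, and \q/, \Q/ and \"/ are all mapped to a plain double quote since that spelling is still undecided.

```python
# Hypothetical name table. Names are taken from the post; the code points
# are assumptions about which characters are intended.
NAMED = {
    "q": '"',
    "Q": '"',
    '"': '"',
    "q66": "\u201c",  # opening slanted speech mark
    "q99": "\u201d",  # closing slanted speech mark
    "q9": "\u2019",   # slanted apostrophe
    "q<<": "\u00ab",  # opening guillemet
}


def expand_named(body: str) -> str:
    """Replace each backslash...slash escape in a literal's body."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            close = body.find("/", i + 1)
            if close == -1:
                raise ValueError("escape opened with \\ but not closed with /")
            name = body[i + 1:close]
            if name not in NAMED:
                raise ValueError(f"unknown escape name: {name!r}")
            out.append(NAMED[name])
            i = close + 1
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

Under this rule a name can be any length but can never contain a slash, since the first / ends the escape.
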


>
> I'm not sure how you came up with these puzzling 3-character sequences:
>
>  [    [`]
>  ]    []`
>
> If introducing ` as some sort of escape symbol, why not have it precede
> the escaped character:
>
>  [    `[
>  ]    `]
>
> Your examples, if still using [..] to delimit strings, and allowing
> embedded ...[...]... without needing escapes, become:
>
>  abc[def]ghi
>  [abc[def]ghi]
>
>  abc[def
>  [abc`[def]
>
>  abc]def
>  [abc`]def]
>
>  abc[`]def
>  [abc[``]def]
>
>  abc[]`def
>  [abc[]``def]
>
> Here it needs `` to represent one `.

However and whenever I try to encode such strings, they end up
similarly difficult to read.

One option, perhaps, is to allow greater spacing. Considering the last one,

abc[]`def

if the punctuation characters need special treatment, how about spacing
them out? For example,

"abc" + LBRACKET + RBRACKET + BACKAPOSTROPHE + "def"

or

"abc\ [ /\ ] /\ ` /def"

or

"abc" + "\[/" + "\]/" + "\`/" + "def"

or

"abc\ [ ] ` /def"

That last one's arguably not too bad a way to embed three consecutive
special characters.
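
The last form can be read with a simple rule, assumed here: inside \ ... /, every whitespace-separated token is exactly one literal character. A Python sketch (expand_spaced is a made-up name):

```python
def expand_spaced(body: str) -> str:
    """Expand escapes such as \\ [ ] ` / in a literal's body.

    Assumed rule: between the backslash and the next slash, each
    whitespace-separated token is one literal character.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            close = body.find("/", i + 1)
            if close == -1:
                raise ValueError("escape opened with \\ but not closed with /")
            for token in body[i + 1:close].split():
                if len(token) != 1:
                    raise ValueError(f"expected one character, got {token!r}")
                out.append(token)
            i = close + 1
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

With this rule both "abc\ [ ] ` /def" and "abc\ [ /\ ] /\ ` /def" give abc[]`def. One limitation as sketched: neither a slash nor a space can be written inside such an escape, so those would still need some other spelling.
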


--
James Harris

David Brown

Oct 1, 2021, 9:43:31 AM
On 29/09/2021 21:46, Stefan Ram wrote:
> I have the following ideas for string literals in a new language
> (first the string, then the string literal is given):
>
> String literals start with an opening bracket and end with
> a closing bracket.
>

Others have answered here, but have missed the elephant in the room -
/why/? What possible advantages would this brackets mess have over
quotation marks that are used by almost every programming language (and
many human languages)?


Bart

Oct 1, 2021, 11:01:19 AM
On 01/10/2021 15:12, Stefan Ram wrote:
> James Harris <james.h...@gmail.com> writes:
>> On 29/09/2021 21:31, Bart wrote:
>>> On 29/09/2021 20:46, Stefan Ram wrote:
>>> How do you deal with the usual non-printable characters that need escape
>>> sequences such as CR, LF, TAB, BELL, etc?
>> That was my main question. AISI, if Stefan uses an escape sequence for
>> LF etc then a string's opening and closing delimiters could be escaped
>> in order to embed them.
>
> CR, LF, TAB, and BELL do not need escape sequences in my
> notation as they can be included either literally or via
> the embedding language if need be.
>
> [ a bracketed string
> can span several lines,
> and it may
> contain literal tab
> characters if need be.
> BELL signs are anticipated to be rarely needed.]

That won't work well in general because newline sequences depend on both
the OS and the editor, or even on the source of the text if it was
pasted from elsewhere.

Newlines may be CR, CRLF, LF, something else entirely, or may not even
exist. (In my editor, newlines do not exist while editing and displaying
text, which is a list of strings. They are discarded when reading from
disk, and added back again when writing to a file.)

It means that the string can contain unknown sequences, and strings
that are superficially the same in two source files may not compare
equal.

Literal tabs are another problem, as they are so often expanded. They
then turn into spaces, and a fixed number of spaces at that.

Yet another is that, without a delimiter before the editor's natural
end-of-line, there can be trailing spaces (and tabs) that are now invisible.

Two bonus problems: this makes it impossible to have those intermediate
lines ending with a comment, and you can't indent this text to bring it
(literally) into line with the surrounding code.
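
To make the comparison problem concrete, a small Python illustration: the same two-line literal saved with Windows (CRLF) and Unix (LF) line endings gives different strings unless the compiler normalizes them. The normalize_newlines policy shown is only one possible choice (an assumption, not something proposed in this thread), and it cannot undo tabs that an editor has already expanded into spaces.

```python
def normalize_newlines(raw: str) -> str:
    """Map CRLF and lone CR to LF (one possible policy, assumed here)."""
    return raw.replace("\r\n", "\n").replace("\r", "\n")


# The same bracketed literal as it might arrive from two different editors.
saved_on_windows = "[line one\r\nline two]"
saved_on_unix = "[line one\nline two]"

# Without normalization the two literals differ...
raw_equal = saved_on_windows == saved_on_unix
# ...with it they compare equal.
normalized_equal = normalize_newlines(saved_on_windows) == normalize_newlines(saved_on_unix)
# An expanded tab is a different string either way.
tab_vs_spaces_equal = normalize_newlines("a\tb") == normalize_newlines("a    b")

print(raw_equal, normalized_equal, tab_vs_spaces_equal)  # False True False
```
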


>> "abc" + LBRACKET + RBRACKET + BACKAPOSTROPHE + "def"
>
> If these strings are part of a language with string
> concatenation operators (which is intended indeed) this
> would be possible.

In this case why bother with trying to represent embedded [ and ] at all?

James Harris

Oct 1, 2021, 11:08:06 AM
On 01/10/2021 15:12, Stefan Ram wrote:
> James Harris <james.h...@gmail.com> writes:
>> On 29/09/2021 21:31, Bart wrote:
>>> On 29/09/2021 20:46, Stefan Ram wrote:
>>> How do you deal with the usual non-printable characters that need escape
>>> sequences such as CR, LF, TAB, BELL, etc?
>> That was my main question. AISI, if Stefan uses an escape sequence for
>> LF etc then a string's opening and closing delimiters could be escaped
>> in order to embed them.
>
> CR, LF, TAB, and BELL do not need escape sequences in my
> notation as they can be included either literally or via
> the embedding language if need be.
>
> [ a bracketed string
> can span several lines,
> and it may
> contain literal tab
> characters if need be.
> BELL signs are anticipated to be rarely needed.]

Those four may be covered but do you not need to handle any other
nonprinting characters such as backspace or del?

You may also want to have a plan for ending lines with something other
than the line endings which happen to be present in the particular
editor you are using (which is what the above text would naturally
include).

What if someone writing one of your strings wanted to include a trailing
space on one line but not another? In the above, trailing blanks would
not be evident in the source.

An escape arrangement would allow such issues to be addressed as well as
providing a way of embedding (or, de-signifying) string delimiters.

Something else to consider is where text has to be entered in lines but
the encoded text should omit the line breaks.

>
>> "abc" + LBRACKET + RBRACKET + BACKAPOSTROPHE + "def"
>
> If these strings are part of a language with string
> concatenation operators (which is intended indeed) this
> would be possible. I plan to realize concatenation of
> strings by mere concatenation of expressions, so
> "abc\adef" could be written [abc]*BELL[def], that is
> a sequence of a string literal, a name, and another
> string literal (names would have to be marked in this
> language, I used an asterisk in this post as an example
> for a marker for a reference by name).

That's interesting. I tried the same. I found it would work especially
well and usefully for a trailing newline. In your syntax:

[abc] ;Just the three letters abc
[abc]*n ;abc and newline
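
One way the juxtaposition idea could be scanned, as a Python sketch: a sequence of [..] literals and *name references is read left to right and the pieces are joined. The constant table (BELL, n, TAB) and the rule that a name is a run of letters, digits and underscores are assumptions, and the bracket escapes are left out to keep it short.

```python
import string

# Hypothetical named constants; "*" as the name marker follows the post's
# example, but the table itself is an assumption.
CONSTANTS = {"BELL": "\a", "n": "\n", "TAB": "\t"}
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def read_bracketed(src: str, i: int) -> tuple[str, int]:
    """Read the [..] literal starting at src[i], keeping nested pairs."""
    depth = 0
    start = i + 1
    while i < len(src):
        if src[i] == "[":
            depth += 1
        elif src[i] == "]":
            depth -= 1
            if depth == 0:
                return src[start:i], i + 1
        i += 1
    raise ValueError("unterminated literal")


def concat_sequence(src: str) -> str:
    """Join adjacent literals and *name references into one string."""
    parts = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1  # whitespace between items is ignored
        elif ch == "[":
            text, i = read_bracketed(src, i)
            parts.append(text)
        elif ch == "*":
            j = i + 1
            while j < len(src) and src[j] in NAME_CHARS:
                j += 1
            name = src[i + 1:j]
            if name not in CONSTANTS:
                raise ValueError(f"unknown name: {name!r}")
            parts.append(CONSTANTS[name])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at {i}")
    return "".join(parts)
```

Here [abc]*BELL[def] gives the same string as "abc\adef" in C, and [abc]*n gives abc followed by a newline.
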

>
> I decided to use []` for the closing bracket as part of the
> text, as I wrote. If I had decided to use `] for the closing
> bracket as part of the text, this would mean that a backtick
> cannot be the last character in a string. So, I could have
> used ]` instead, but using []` instead means that my strings
> always have properly nested brackets, which helps when using
> editors with functions to find matching brackets.
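
The balancing property can be checked with a short Python sketch. encode_bracket_literal (a hypothetical name) escapes every bracket, which is stricter than the proposal requires (nested pairs may stay as they are) but sidesteps text that happens to contain [`] or []`. Since each three-character escape holds one [ followed by one ], the output is always balanced.

```python
def encode_bracket_literal(text: str) -> str:
    """Write text as a [..] literal, escaping every bracket."""
    pieces = []
    for ch in text:
        if ch == "[":
            pieces.append("[`]")  # escaped opening bracket
        elif ch == "]":
            pieces.append("[]`")  # escaped closing bracket
        else:
            pieces.append(ch)
    return "[" + "".join(pieces) + "]"


def brackets_balanced(s: str) -> bool:
    """True if every ] closes an earlier [ and none are left open."""
    depth = 0
    for ch in s:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

For the four escape examples in the original post this reproduces the literals given there exactly; for abc[def]ghi it gives [abc[`]def[]`ghi] rather than the shorter [abc[def]ghi].
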

Understood, but AIUI your idea of having

[`

for a de-signified opening bracket would also make it hard to put such a
backtick at the /beginning/ of a string.

All told, escapes are not the worst idea in the world.


--
James Harris

Rod Pemberton

Oct 10, 2021, 3:36:44 AM
On 29 Sep 2021 19:46:07 GMT
r...@zedat.fu-berlin.de (Stefan Ram) wrote:

> I have the following ideas for string literals in a new language
> (first the string, then the string literal is given):
>
> String literals start with an opening bracket and end with
> a closing bracket.
>
> abc
> [abc]

Having different initial and terminal delimiters makes it slightly
easier to parse the string than using quotes, but this typically
requires escapes too.

My advice would be to pick delimiters that would not normally be needed
within typical typed text e.g., for ASCII, possibly a backquote `,
backslash \, caret ^, tilde ~, or quote ". I would avoid brackets [],
braces {}, parens (), angle brackets <>, as string delimiters due to their
usefulness in pairing items within the language. The other ASCII
symbols are used for punctuation, mathematics, or accounting.

> Brackets within the string literal are allowed when properly
> nested.
>
> abc[def]ghi
> [abc[def]ghi]
>

Why would you need to nest string delimiters? ...

In other words, why are you nesting a string within a string?
(IMO, that's the biggest elephant in the room ...)

So, I'm beginning to think that you may mean something different by the
term "string literal" than what I understand a "string literal" to be:
https://en.wikipedia.org/wiki/String_literal

Or, is the usage of nesting just a way to embed non-delimiter brackets
within the string without using escapes? ... If so, your choice of
brackets as delimiters is probably non-optimal. Pick something else.

> A single opening or closing bracket is written as "[`]" or
> "[]`", respectively. This rule has higher precedence than the
> preceding rule: whenever there is a "[`]" or "[]`" within
> a string literal, it means "[" and "]", with no exceptions.

The backquote ` is acting as an escape, but since it comes after the
character being escaped, your lexer would need look-back. AIUI, the
majority of lexers use look-ahead. What does yours do? Is this a
concern?
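
Whether look-back is really needed can be checked with a sketch. Under my reading of the spec, the scheme can be scanned with forward peeking only: whenever the scanner is at a [, it checks whether the next two characters are `] or ]`. The following minimal Python version (decode_bracket_literal is a made-up name) is one way to test that reading, not a statement about how any particular lexer works.

```python
def decode_bracket_literal(src: str) -> str:
    """Decode a [..] literal in the proposed scheme using forward peeking only."""
    if not src.startswith("["):
        raise ValueError("literal must start with '['")
    out = []
    depth = 1  # the opening delimiter has already been seen
    i = 1
    while i < len(src):
        # The escapes take precedence over nesting, so test them first.
        if src.startswith("[`]", i):
            out.append("[")  # escaped opening bracket
            i += 3
        elif src.startswith("[]`", i):
            out.append("]")  # escaped closing bracket
            i += 3
        elif src[i] == "[":
            depth += 1  # nested bracket, kept in the text
            out.append("[")
            i += 1
        elif src[i] == "]":
            depth -= 1
            if depth == 0:
                if i != len(src) - 1:
                    raise ValueError("text after the closing bracket")
                return "".join(out)
            out.append("]")
            i += 1
        else:
            out.append(src[i])
            i += 1
    raise ValueError("unterminated literal")
```

Run on the literals from the original post, it returns abc, abc[def]ghi, abc[def, abc]def, abc[`]def and abc[]`def respectively.
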

> abc[def
> [abc[`]def]
>
> abc]def
> [abc[]`def]
>
> abc[`]def
> [abc[`]`[]`def]
>
> abc[]`def
> [abc[`][]``def]
>
> The notation for "[`]" and "[]`" within a string is awkward,
> but is anticipated to be required only rarely. Most texts will
> contain brackets that are properly nested, and this was made
> to be easy.
>
> So, are there any problems with this specification I have missed?
> Strings that are impossible to encode or string literals whose
> interpretation is ambiguous? Cases where frequent strings are
> cumbersome to encode? TIA!
>

I'm really not sure why the nesting of strings is needed, assuming
(probably incorrectly) that's what is being done here, so I'd personally
eliminate the nesting, or change the delimiters, if not. That would
eliminate some or all of the need for escapes (like C) or string
concatenation (like BASIC). If you need an escape, use an escape, or
select different terminators to reduce/eliminate the need for escapes.

--
Things are only going to become worse for Joe Biden. His only chance
at salvation will come from the thing he hates the most: Donald Trump.
