A question about extending the Forth200x standard escape sequences ...

Bruce McFarling

Dec 13, 2009, 1:01:46 AM

I have a setting where I want to include spaces in a single
blank-delimited token that supports escape sequences. I also want
to extend this later to blank-delimited tags, and it is convenient
for both to share a common prefix character.

The proposed standard escape sequences are:

\a      BEL (alert, ASCII 7)
\b      BS (backspace, ASCII 8)
\e      ESC (escape, ASCII 27)
\f      FF (form feed, ASCII 12)
\l      LF (line feed, ASCII 10)
\m      CR/LF pair (ASCII 13, 10)
\n      newline (implementation dependent newline,
        e.g., CR/LF, LF, or LF/CR)
\q      double-quote (ASCII 34)
\r      CR (carriage return, ASCII 13)
\t      HT (horizontal tab, ASCII 9)
\v      VT (vertical tab, ASCII 11)
\z      NUL (no character, ASCII 0)
\"      double-quote (ASCII 34)
\x##    where # is a hexadecimal digit.
        The resulting character is the conversion of these two
        hexadecimal digits. An ambiguous condition exists if \x
        is not followed by two hexadecimal characters.
\\      backslash itself
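
To make the table concrete, here is a minimal sketch of a decoder for it, in Python rather than Forth. The names `decode_escapes`, `SIMPLE_ESCAPES` and `NEWLINE` are made up for illustration. Two choices are assumptions rather than anything the proposal says: `\n` is mapped to a bare LF, and the "ambiguous condition" cases (a malformed `\x`, an unknown escape letter, or a trailing backslash) are reported by raising `ValueError`.

```python
# Assumption: the implementation-dependent newline is a bare LF here.
NEWLINE = "\n"

# One entry per single-letter escape in the table above.
SIMPLE_ESCAPES = {
    "a": "\x07",      # BEL
    "b": "\x08",      # BS
    "e": "\x1b",      # ESC
    "f": "\x0c",      # FF
    "l": "\x0a",      # LF
    "m": "\x0d\x0a",  # CR/LF pair
    "n": NEWLINE,     # implementation-dependent newline
    "q": '"',         # double-quote
    "r": "\x0d",      # CR
    "t": "\x09",      # HT
    "v": "\x0b",      # VT
    "z": "\x00",      # NUL
    '"': '"',         # double-quote
    "\\": "\\",       # backslash itself
}

HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_escapes(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("backslash at end of text")
        code = text[i + 1]
        if code == "x":
            # \x must be followed by exactly two hexadecimal digits.
            digits = text[i + 2:i + 4]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError("\\x needs two hexadecimal digits")
            out.append(chr(int(digits, 16)))
            i += 4
        elif code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        else:
            # Assumption: unknown escapes are rejected rather than passed through.
            raise ValueError("unknown escape: \\" + code)
    return "".join(out)


print(repr(decode_escapes(r"tab\there\x21")))  # 'tab\there!'
```

An extension such as the ones penciled in below would then amount to one more entry in the lookup table, which is why clashes between letters matter.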

However, I have not caught all of the reporting on which escape
sequences are used in which implementations, so I wanted to throw this
out for response before settling on an extension that I'll want to
change later.

I have penciled in:

\s          BL, ASCII Space, \x20
\o          Escape-Nul, no character
\{text}     character class
\:text{     opening tag
\}          closing tag

... but of course I am not committed to any of them, especially if one
conflicts with a more elaborate escape I might wish to adopt later.

==============================
\s BL ASCII Space \x20

The first context is pattern matching. I have been doing a lot of sed
scripting to match the regular patterns of various anime leech
streaming sites, to harvest their streaming links and study the
effectiveness of the strategies of various legit anime streaming sites
in competition with the presumptive "Pirates". (Ironically enough, it
turns out that one of the main "Pirate Support Bases" for bootleg
anime streaming is Rupert Murdoch's 20th Century Fox Intellectual
Property's MySpaceCDN servers, operated, as the name would suggest, in
support of MySpace.)

I think I've fixed the main problems I had with my sliders concept by
turning a slider into a structure of four cells sitting in a dedicated
slider stack pad. I do not think I have the pattern matching toolkit
to match any arbitrary BGREP expression, but it seems like it will do
for the application.

However, it's convenient to be able to denote some sequences as
blank-delimited tokens, even with a space or two embedded. Hence the
desire for an escape for BL.

In the proposed standard, an escaped embedded space in a token is
\x20. I wish to use \s, as in "space".
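
As an illustration of the intent (in Python, not Forth), a minimal sketch: split a line on blanks, then expand `\s` inside each token so that a token can carry embedded spaces. The function names are hypothetical, and every other escape, including `\\` and `\x20`, is deliberately left untouched so that this step can run ahead of a full escape decoder.

```python
def expand_space_escape(token):
    # Replace \s with a space; copy every other backslash pair through
    # unchanged so that \\s stays an escaped backslash followed by "s".
    out = []
    i = 0
    while i < len(token):
        if token[i] == "\\" and i + 1 < len(token):
            if token[i + 1] == "s":
                out.append(" ")
            else:
                out.append(token[i:i + 2])
            i += 2
        else:
            out.append(token[i])
            i += 1
    return "".join(out)


def blank_delimited_tokens(line):
    # str.split() with no argument splits on runs of whitespace,
    # which is the blank-delimited tokenizing assumed here.
    return [expand_space_escape(tok) for tok in line.split()]


print(blank_delimited_tokens(r"one\stwo three"))  # ['one two', 'three']
```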

==============================
\o Null-escape - no ASCII character

In a later, related application, I wish to have blank-delimited tags.
I have sorted out the translation to literal white space as follows.
By default, the space preceding an opening tag is treated as a normal
break space and the space following it as a normal-break, no-width
space, with the reverse for the closing tag. A token consisting of
one or more escaped characters in either of those positions overrides
the default.

In addition to \s and \t, I need a "no, this is nothing" escape for
this ... a true-null Escape rather than an ASCII Nul escape, \z. This
also allows me to generically restrict escape filtering to tokens
beginning with the ``\'' character.

I believe I wish to use \o, as in "omit". If anything, this corresponds
to the UTF-16 Byte Order Mark, 0xFEFF in UTF-16 (\xFF\xFE in UTF-16LE
or \xFE\xFF in UTF-16BE, hence used at the beginning of a UTF-16
stream to distinguish endianness).
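
A minimal Python sketch of the override token, under stated assumptions: only `\s`, `\t` and `\o` are accepted in an override token, and a token counts as an override only if it consists entirely of such escapes. The name `literal_gap` is hypothetical, and the default break/no-width behaviour around tags is not modeled here.

```python
# Assumption: these are the only escapes allowed in an override token.
GAP_ESCAPES = {
    "s": " ",   # \s, literal space
    "t": "\t",  # \t, horizontal tab
    "o": "",    # \o, "omit": contributes no character at all
}


def literal_gap(token):
    # Return the literal white space an override token stands for,
    # or None if the token is ordinary text. Only tokens that begin
    # with a backslash are examined at all.
    if not token.startswith("\\") or len(token) % 2 != 0:
        return None
    parts = []
    for i in range(0, len(token), 2):
        if token[i] != "\\" or token[i + 1] not in GAP_ESCAPES:
            return None
        parts.append(GAP_ESCAPES[token[i + 1]])
    return "".join(parts)


print(repr(literal_gap(r"\o")))    # ''
print(repr(literal_gap(r"\s\t")))  # ' \t'
print(repr(literal_gap("word")))   # None
```

The empty string returned for `\o` is distinct from `None`: the former says "there is an override and it is nothing", the latter says "this token is not an override".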

==============================
\{text} character class
\:text{ opening tag
\} closing tag

Finally, I will eventually be wanting to have named classes in one
setting and tags in a related setting. The general pattern of ...

\ opening-delimiter text closing-delimiter
\ punctuation text opening-delimiter
\ closing-delimiter

... is one I've settled on, but there are a number of permutations,
some of which give me the shudders, and others which I rather like as
well:

\(text) \/text( ... \)
\[text] \*text[ ... \]
\<text> \-text< ... \>
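
For the brace variant penciled in at the top of this section, recognising the three forms token by token could look like the following Python sketch. It assumes each form occupies a whole blank-delimited token and that `text` is non-empty and contains no braces; the name `classify` and the returned labels are hypothetical.

```python
import re

# \{text}  : backslash, opening delimiter, text, closing delimiter
CLASS_RE = re.compile(r"\\\{([^{}]+)\}")
# \:text{  : backslash, punctuation, text, opening delimiter
OPEN_RE = re.compile(r"\\:([^{}]+)\{")
# \}       : backslash, closing delimiter
CLOSE_TOKEN = "\\}"


def classify(token):
    # Return a (kind, text) pair for one blank-delimited token.
    m = CLASS_RE.fullmatch(token)
    if m:
        return ("class", m.group(1))
    m = OPEN_RE.fullmatch(token)
    if m:
        return ("open", m.group(1))
    if token == CLOSE_TOKEN:
        return ("close", None)
    return ("text", token)


for tok in r"\:em{ some text \} \{digit}".split():
    print(classify(tok))
```

Swapping in one of the other delimiter pairs listed above would only change the three patterns, so the choice between them is a matter of readability and of clashes with existing escapes, not of parsing difficulty.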

So the question is mostly what use there is inside Forth of
punctuation characters to denote escape codes, but also within the
Procrustean Bed of this generic pattern, which combinations seem to
scan well in a sequence of space-delimited text.
