At least some tech people from other areas refer to strings stored
in the Turbo Pascal manner as "Pascal strings" (ones preceded by
their binary-encoded length, as opposed to zero-terminated C strings).
So I wonder if that low-level representation of strings has any
wider acceptance in the Pascal world. Some sources say that ANSI
Pascal specifies it also, but I failed to find any references to
it in the standards (as found here:
http://www.moorecad.com/standardpascal/home.html). That seems natural
to me because Pascal, unlike C, generally avoids exposing its guts
to the programmer. So what is the true status of those "Pascal
strings"? Are they used solely by Borland products, or do they act
as a well-known ABI for strings, or what?
Thanks in advance!
Yar
Most Borland-oriented compilers do (Borland's Delphi obviously, but also Free
Pascal, and in the past Virtual Pascal and afaik Speedpascal), as do compilers
with some Borland/TP/Delphi compatibility mode (TMT, TopSpeed).
In Delphi terminology it is called "shortstring", and one also sees the
phrase "UCSD string" for it.
However, its importance in the Borland-oriented Pascals (read: Delphi itself
and Free Pascal) is declining, simply because both have had other types for
over a decade (ansistring/widestring and, recently, unicodestring).
Despite that, afaik Delphi and compatibles (FPC, VP, ...) still store, or
until quite recently (Delphi Tiburon, last August) stored, string literals
in shortstring format.
But the term "Pascal string", like "Pascal calling convention", is often
used as a general term. While it has a strict definition (the UCSD/TP
string), it is also often used for length-prefixed strings in general (as
opposed to NUL-terminated or the older space-padded strings).
> http://www.moorecad.com/standardpascal/home.html) That seems natural
> to me because Pascal, unlike C, generally avoids exposing its guts
> to the programmer. So what is the true status of those "Pascal
> strings"? Are they used solely by Borland products, or do they act
> as a well-known ABI for strings, or what?
Solely by Borland and compatibles, and maybe some more ANSI-oriented
products with compatibility modes. However, taken together, that is probably
the bulk of the still-maintained Pascal compilers, especially when weighted
by usage.
The most notable exception is GNU Pascal, afaik; iirc, the last time I asked,
it still needed rewriting:
I believe STRING as a built-in Pascal datatype started with UCSD Pascal,
which used the above-mentioned form. It does have the shortcoming of being
limited to a certain length.
bill
--
Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves
bill...@cs.scranton.edu | and a sheep voting on what's for dinner.
University of Scranton |
Scranton, Pennsylvania | #include <std.disclaimer.h>
ISO 7185 (Standard Pascal) just uses fixed-length arrays as strings.
ISO 10206 (Extended Pascal) specifies a string schema. While Extended
Pascal says little about representation, it requires the string schema
to have _two_ special "fields": length and capacity. Capacity
specifies the maximal length that can be stored in the schema, while
length specifies the actual length (which may be smaller than capacity).
Extended Pascal says that capacity has type Integer and that length
takes values up to capacity, so the natural implementation (used in GNU
Pascal) uses two Integer-sized fields (64-bit on 64-bit machines in
GNU Pascal) followed by the character data.
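As a rough illustration, here is a C sketch of the natural layout described above: two word-sized fields (capacity, then length) followed by the character data. The names `EPString`, `ep_new` and `ep_assign` are hypothetical, `long` is assumed to stand in for a 64-bit Integer on LP64 platforms, and this is not GNU Pascal's actual runtime code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C mirror of an Extended Pascal string schema laid out the
   "natural" way: capacity, then length, then the characters.
   Assumption: long has the width of Integer (64-bit on LP64 systems). */
typedef struct {
    long capacity;   /* maximum number of characters, fixed at creation */
    long length;     /* current number of characters, 0 <= length <= capacity */
    char data[];     /* flexible array member holding the characters */
} EPString;

/* Allocate a string whose capacity is fixed for its lifetime,
   roughly like a variable declared as string(capacity). */
EPString *ep_new(long capacity) {
    EPString *s = malloc(sizeof(EPString) + (size_t)capacity);
    if (s != NULL) {
        s->capacity = capacity;
        s->length = 0;
    }
    return s;
}

/* Assign n characters; refuse (return 0) if they exceed the capacity,
   mirroring the rule that length may never exceed capacity. */
int ep_assign(EPString *s, const char *src, long n) {
    if (n < 0 || n > s->capacity)
        return 0;
    memcpy(s->data, src, (size_t)n);
    s->length = n;
    return 1;
}
```

Keeping capacity in the header lets every string carry its own limit, which is why such strings need no fixed 255-character ceiling.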
It is possible to conform to the Extended Pascal spec but store the
capacity separately from the string, so length followed by character
data is also possible (and is used by some workstation Pascal). Note
that such a representation is still likely to differ from the Borland
one: Borland Pascal limits length to 255 characters, so it can be
stored in a single byte and accessed as element number 0 of the
string. Several other implementations allow longer strings, so their
length field is clearly separate from the character data.
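A minimal C approximation of the Borland layout just described, assuming a 256-byte buffer whose byte 0 is the length (so element 0 really is the length, as with `S[0]` in Turbo Pascal). The type and function names are made up for illustration; truncating at 255 characters is a choice of this sketch that roughly mirrors what assigning an over-long value to a TP `string` does.

```c
#include <string.h>

/* C approximation of a Borland-style ShortString: byte 0 holds the
   length and bytes 1..255 hold the characters, so s[i] is character i. */
typedef unsigned char ShortString[256];

/* Copy a C string into a ShortString, truncating at 255 characters
   (the limit imposed by a one-byte length). Returns the stored length. */
size_t ss_from_c(ShortString s, const char *src) {
    size_t n = strlen(src);
    if (n > 255)
        n = 255;
    s[0] = (unsigned char)n;
    memcpy(s + 1, src, n);
    return n;
}

/* Copy a ShortString back out as a NUL-terminated C string.
   dst must have room for 256 bytes. */
void ss_to_c(const ShortString s, char *dst) {
    memcpy(dst, s + 1, s[0]);
    dst[s[0]] = '\0';
}
```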
I think that the name "Pascal strings" got much of its popularity
from 16-bit Windows and from Mac OS: in both cases the name was used
in the documentation, and many operating system functions required
strings in that format.
--
Waldek Hebisch
heb...@math.uni.wroc.pl
All,
Many thanks for your comprehensive replies! My conclusions from
them are as follows.
"Pascal strings" is a well-known term that needs no further comment
when speaking of the idea at large: we can store arbitrary variable-length
data by preceding it with its length as a fixed-width field, paying
the price of limited data length. (As opposed to C strings, with which
we enjoy unlimited data length but have to exclude one data unit value,
the terminator.) However, such strings are not standard in the Pascal
language, unlike null-terminated strings in C; they are just adopted by
a few popular implementations. Moreover, different implementations can
use length fields of different widths, so more details may be called for
if one needs to specify a precise ABI.
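To make the trade-off in that summary concrete, here is a small C sketch assuming a one-byte length prefix: the prefixed copy keeps an embedded zero byte intact, while `strlen` on the same bytes stops at it. `prefixed_store`, `prefixed_len` and `terminated_len` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Store n arbitrary bytes behind a one-byte length prefix (an assumed
   width). Any byte value is allowed, including 0, but n is capped at
   255: the price of a fixed-width length field. Returns 0 if too long. */
int prefixed_store(unsigned char *dst, const void *data, size_t n) {
    if (n > 255)
        return 0;
    dst[0] = (unsigned char)n;
    memcpy(dst + 1, data, n);
    return 1;
}

/* A length-prefixed reader trusts the prefix... */
size_t prefixed_len(const unsigned char *p) { return p[0]; }

/* ...while a C reader stops at the first zero byte, so an embedded
   zero silently truncates the data. */
size_t terminated_len(const char *s) { return strlen(s); }
```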
I hope I got it right. :-)
Yar
Just to add my pence from the other direction:
the mathematical representation of a character-string is a sequence of
nodes, each marked with a character from the set of implementation-defined
characters (ISO 7185, 6.1.7).
With Standard Pascal we need not even take into consideration the
abstraction of an empty string.
Actually, 6.1.7 describes how to embed the value of a character-string
token into the program text. Clearly the value itself contains far fewer
apostrophes ;)
Alex
That holds for K&R-era C strings. Modern C string routines are all bounded
by an upper-bound value (max chars to search, the "n" variants), for security
reasons. Since the size of that upper bound limits the modern C string type
too, it is equivalent, with respect to "unlimited data length", to the
prefixed bound, if the latter is dynamically allocated.
IOW both systems have a value with a system-dependent range (typically word
size) somewhere. However, this is the theoretical, general case. The practice
is different, which is what gave UCSD strings a bad name; see below.
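The bounded "n" style can be sketched in C as follows. `bounded_len` is a hand-written stand-in for POSIX `strnlen` (kept in plain ISO C here), and `bounded_copy` relies on the standard behaviour of `snprintf`: it writes at most `size` bytes, NUL-terminates when `size > 0`, and returns the length it would have needed. Both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Scan at most maxlen bytes for the terminator. The bound plays the same
   role as the maximum value a length prefix can hold. */
size_t bounded_len(const char *s, size_t maxlen) {
    size_t n = 0;
    while (n < maxlen && s[n] != '\0')
        n++;
    return n;
}

/* Copy into a fixed buffer with an explicit bound; snprintf truncates
   and always terminates. Returns 1 only if the whole source fit. */
int bounded_copy(char *dst, size_t size, const char *src) {
    int needed = snprintf(dst, size, "%s", src);
    return needed >= 0 && (size_t)needed < size;
}
```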
> However, such strings are not standard in the Pascal language, unlike
> null-terminated strings in C; they are just adopted by a few popular
> implementations.
True, but they are extremely popular dialects. Also, most compilers support
null-terminated strings, since that support is almost entirely library-based,
and thus it can hardly be called a "string type".
Also note that while the principle is unlimited, most implementations that
support the (eighties) UCSD string type default to a byte-sized length, thus
with a 255-char limit (empty..255). Moreover, TP and the like typically
allocated strings statically (since 255 chars could easily go on the stack),
leading to certain pitfalls that could cause a lot of copying when passing
strings around.
However, this was convention, and e.g. TP's own Turbo Vision framework used
a pointer to a UCSD string as its base string type.
It must be said too that the most popular dialect (Delphi) doesn't use the
UCSD/TP/shortstring type anymore. Delphi introduced a new string type in the
mid-nineties, called the ansistring (which is afaik also available in
Borland's C++ compiler).
Also, most of Free Pascal's libraries and systems have been migrated to this
type, except the compiler's guts (for performance, and because >255-char
tokens are rare) and some TP-related parts (compatibility units, Turbo
Vision and the text-mode IDE).
It's a reference-counted string type that has a double zero at the end, but
also stores the length. The length is authoritative; the terminating double
zero exists only to ease passing the string (via a cast, so zero copies and
zero instructions) to null-terminated systems. The fact that it is double
null terminated, iirc, had to do with early Windows MBCS support. Allocation
is automated thanks to refcounting and is fairly transparent. However, one
can mess with its internals very easily, which makes interfacing with
foreign systems easier.
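A simplified C model of such a string, under stated assumptions: the header layout (a `long` reference count followed by a `long` length) is invented for illustration and does not match any particular Delphi or Free Pascal version. What it shows is the general idea: a hidden header before the characters, an authoritative length, two trailing zero bytes, and a pointer that can be handed to NUL-terminated APIs unchanged. All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored *before* the characters; the string
   variable itself points at the first character. */
typedef struct {
    long refcount;
    long length;      /* authoritative length */
} StrHeader;

static StrHeader *hdr(char *s) { return (StrHeader *)s - 1; }

/* Create a string from n bytes; two zero bytes follow the data so the
   pointer can be passed to NUL-terminated APIs without copying. */
char *rs_new(const char *src, long n) {
    StrHeader *h = malloc(sizeof(StrHeader) + (size_t)n + 2);
    if (h == NULL)
        return NULL;
    h->refcount = 1;
    h->length = n;
    char *s = (char *)(h + 1);
    memcpy(s, src, (size_t)n);
    s[n] = s[n + 1] = '\0';
    return s;
}

long rs_length(char *s) { return hdr(s)->length; }

/* Sharing a string just bumps the count: no character data is copied. */
char *rs_addref(char *s) { hdr(s)->refcount++; return s; }

/* Drop one reference; free the whole block when the last one goes. */
void rs_release(char *s) {
    StrHeader *h = hdr(s);
    if (--h->refcount == 0)
        free(h);
}
```

Because the length lives in the header, a string with an embedded zero keeps its full length even though a C routine such as `strlen` would see it as shorter.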
> Moreover, different implementations can use length fields of different
> width, so more details may be called for if one needs to specify a precise
> ABI.
> I hope I got it right. :-)
Correct, but keep in mind that there is a big difference between the
definition for general use (length-prefixed arrays, more or less) and the
concrete UCSD and TP types, with their 255-char limit and their typically
static allocation.
Strings in my compiler (HP Pascal for OpenVMS) use a leading 16-bit word for
a length count. Besides STRING (per the standard), we have VARYING OF CHAR,
which predates the standard but has the same underlying implementation. The
16-bit length word has its history in the VAX instruction set, which is
biased toward strings of up to 64K bytes, and in the OpenVMS parameter
descriptors, which have the same length word.
John
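A C sketch of a varying string with a leading 16-bit length word, as John describes. The 80-character capacity is an arbitrary example, and `Varying80`/`vary_assign` are hypothetical names rather than HP Pascal's actual runtime; the point is that a 16-bit length field caps any such string at 65535 characters.

```c
#include <stdint.h>
#include <string.h>

/* Example capacity, comparable to VARYING [80] OF CHAR. */
#define VARY_CAP 80

typedef struct {
    uint16_t length;          /* current length, never above VARY_CAP */
    char body[VARY_CAP];      /* character storage */
} Varying80;

/* Assign from a C string; returns 0 instead of overflowing the body. */
int vary_assign(Varying80 *v, const char *src) {
    size_t n = strlen(src);
    if (n > VARY_CAP)
        return 0;
    v->length = (uint16_t)n;
    memcpy(v->body, src, n);
    return 1;
}
```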
Pascal does have strings, but they are specifically an array type:
   packed array [1..n] of char;
They must be packed and they must start with the array index 1, but they
can extend to any length.
Because of the type-matching rules of ISO 7185 Pascal, two strings are
only compatible if they are of the same length. Thus:
   packed array [1..10] of char;
and
   packed array [1..20] of char;
are both string types, but are not compatible with each other. This can
cause issues, for example with initialization:
   var s: packed array [1..20] of char;
   ...
   s := 'hi there            ';
If you simply assume that all strings are equal, say, all of length 100,
then all your strings are compatible, and you are left only with the issue
that initializations are "ungainly". There is an example of dealing with
such fixed-length strings here:
That is for the specific case of so-called "space padded" strings. A
space-padded string is a string with all of the extra space to the right
filled with spaces. This can handle most common string operations.
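A C sketch of the fixed-length, space-padded convention, assuming every string is 20 characters (like `packed array [1..20] of char`). `fix_assign` performs the padding that makes hand-written initialization ungainly; its truncation of longer input is a choice of this sketch, since Standard Pascal would simply reject a literal of the wrong length. The names are hypothetical.

```c
#include <string.h>

/* Every string of this "type" has exactly FIX_LEN characters; unused
   positions on the right hold spaces and no terminator is stored. */
#define FIX_LEN 20

/* Assign a shorter C string by padding it with spaces on the right.
   Anything longer than FIX_LEN is truncated (a choice of this sketch). */
void fix_assign(char dst[FIX_LEN], const char *src) {
    size_t n = strlen(src);
    if (n > FIX_LEN)
        n = FIX_LEN;
    memcpy(dst, src, n);
    memset(dst + n, ' ', FIX_LEN - n);
}

/* Logical length: index just past the last non-space character.
   Trailing spaces that were real data cannot be told apart from padding. */
size_t fix_length(const char src[FIX_LEN]) {
    size_t n = FIX_LEN;
    while (n > 0 && src[n - 1] == ' ')
        n--;
    return n;
}
```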
Several compilers, as I am sure you are aware, implemented extended string
types.
For the case of UCSD, the type for string was similar to ISO 7185 Pascal,
but the compiler reserved a length variable along with the string, and thus
the library provided by the implementation "knew" the length of the string.
This is not substantially different from ISO 7185 strings, since a
fixed-length string is still the basic underlying type. It is possible to
define the same type of string in ISO 7185 Pascal:
   record
      len: integer;
      str: packed array [1..100] of char;
   end;
However, the code to implement that is obviously cleaner as a built-in set
of procedures and functions. In particular, the compiler can hide the
initialization, make all such strings compatible with each other, make it
possible to return strings from a function, and do other things that make
strings behave much more like a fundamental type.
I think it serves well to consider what goal is being served by the
inclusion of strings before committing to compiler support of such a type.
There are many languages that consider strings to be a fundamental type
(Basic, Tcl, etc.), and those that do not (C, C++, etc.).
Typically, the C-style plan is to introduce the idea that the language can
manipulate character arrays of varying lengths and to implement strings via
libraries. The advantage of doing this is that there is no general
agreement on what type or form of strings is "best", and also, strings can
be considered a subclass of a more general problem, that of handling
variable-length arrays of all base types.
So, for real-world examples, we have Borland, which used the plan of having
strings as arrays with indexes from 0..n, where the 0th element contains the
length. This is often, but incorrectly, attributed to UCSD. UCSD did use the
format of having the first element be the length, but did not allow the
access a[0]. Instead, this was reserved for the compiler via initialization
and procedures/functions such as length().
The Borland plan was good and flexible, but limited to 255 characters.
Borland later supplemented this string type with strings of any length, and
later still with Unicode character strings.
The second example would be the ISO 10206 standard, which implemented both
inherent string types and handling for any length of array (of any type).
In fact, the standard appears to consider strings to be a special case of
so-called dynamic arrays.
The last example is IP Pascal, which implements handling of any length of
array, but leaves strings up to the user. So the type:
   packed array of char;
is indeed a string type, but is really only a template type which has no
meaning until it is allocated as part of either a call to new() or the
execution of a procedure.
The ability to handle any length of array lets the user form a
general-purpose library for string handling, but leaves the exact form up
to the user. For example, the standard string-handling library actually
supports two string formats, one using fixed-length right-padded strings
and another using dynamically allocated strings, and even allows them to be
used interchangeably. This was done so that existing ISO 7185 Pascal
programs could use the general-purpose string library, while full dynamic
strings remain available.
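A hypothetical C sketch of the idea (not IP Pascal's actual library): both formats are reduced to a common (pointer, length) view, so one set of routines can accept either a space-padded fixed array or a dynamically allocated string. The `StrView` type and the function names are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Common view shared by both string formats. */
typedef struct {
    const char *data;
    size_t len;
} StrView;

/* View of a space-padded fixed array: trailing spaces are not content. */
StrView view_padded(const char *arr, size_t size) {
    while (size > 0 && arr[size - 1] == ' ')
        size--;
    StrView v = {arr, size};
    return v;
}

/* View of a heap string whose length is known exactly. */
StrView view_dynamic(const char *heap, size_t len) {
    StrView v = {heap, len};
    return v;
}

/* One comparison routine works for any mix of the two formats. */
int view_equal(StrView a, StrView b) {
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}

/* Produce a new NUL-terminated dynamic string from any view (caller frees). */
char *view_dup(StrView v) {
    char *p = malloc(v.len + 1);
    if (p != NULL) {
        memcpy(p, v.data, v.len);
        p[v.len] = '\0';
    }
    return p;
}
```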
Finally, I think you stated a key requirement for strings in your message:
that it should not be necessary to rely on their exact format in the final
object code.
The Borland plan of having the length in byte zero surely seemed a neat fix
for the issue in the early days, but caused the type to become obsolete
later because of its inherently limited length. The "hands off" philosophy
of UCSD towards the exact format would have allowed UCSD to evolve
seamlessly beyond its 255-character limit, but UCSD didn't survive long
enough to see that need. I believe the same mistake is being committed yet
again with Unicode-based strings. There is no real reason for the user to
have to rely on the specific ASCII or Unicode status of a string, and even
within Unicode there are differences in encoding (16 bit or 32, with
various endian modes).
I believe this mistake resulted from copying C/C++ based methodologies over
to Pascal.
Pascal, ISO 7185 original Pascal, was designed with great pains taken not
to rely on the exact character format of the code. For example, the eoln()
and eof() functions abstracted the exact format of the characters.
Introducing two different character code sets in the same program leads
away from that kind of character-set independence, and thus in IP Pascal,
Unicode and ASCII character sets are different compile-time options for
code that uses the same type: char.
Cheers,
Scott Moore
Here's a prime example, right here in this post:
>Hi there,
>At least some tech people from other areas refer to strings stored
>in the Turbo Pascal manner as "Pascal strings" (ones preceded by
First few lines of FIRST post...
----------------------------------------------------------------
>On 2008-10-22, Yar Tikhiy <y...@comp.chem.msu.su> wrote:
>> At least some tech people from other areas refer to strings stored
>> in the Turbo Pascal manner as "Pascal strings" (ones preceded by
And next post...
----------------------------------------------------------------
>In article <gdn9ct$m4u$1...@news.demos.su>,
>Yar Tikhiy <y...@comp.chem.msu.su> writes:
>> Hi there,
>>
>> At least some tech people from other areas refer to strings stored
>> in the Turbo Pascal manner as "Pascal strings" (ones preceded by
etc.
----------------------------------------------------------------
>In comp.lang.pascal.ansi-iso Yar Tikhiy <y...@comp.chem.msu.su> wrote:
>> Hi there,
>>
>> At least some tech people from other areas refer to strings stored
>> in the Turbo Pascal manner as "Pascal strings" (ones preceded by
----------------------------------------------------------------
Hey Falconer and Marco, isn't this just a bit redundant? Or, should I
continue? EVERY post starts with this same thing, and I get tired of
scrolling through the same thing I've already read half a dozen times or
more.
This is the 21st century, and though I'm pretty old myself, I just don't
see the practicality of re-quoting the SAME message(s) time after time
after time after time after time... and then putting the one I REALLY
wanted to read AFTER all the ones I've read many times already. If
you're quoting, yeah sometimes it works a lot better, but a plain old
reply should be accessible as soon as you open up the NEXT post.
I could just as easily say "Please don't BOTTOM post", but would it
do any good? I hate it for you, but you don't have to read my posts if
you prefer not to. Just like I don't have to read yours. All I can say
now is "PLEASE stop asking me not to top-post". That's the way I
do it because it makes more sense to me, sorry. In other words, I'm
going to post the way I want to, live with it or killfile me.
Just do a proper job of snipping, and your posts will not get
over-long. They will also be readable.
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.