
Question on bounded / unbounded strings


Arie van Wingerden

Sep 13, 2016, 4:50:41 AM
Hi,

for a long time I've been interested in Ada but hadn't played with it.

Now I bit the bullet and I am trying to create a small (Windows) program
that:
1) reads a string from STDIN (yet to do)
2) fetches the contents of the PATH environment variable (done)
3) splits that content into individual paths (???)
4) tries to match the inputted string with part of each path (to do)
5) outputs only the matched paths (to do)

I found out the hard way that Ada is very strict about bounded versus
unbounded strings.
So I had to convert the output of 2) to an unbounded string, because I could
not know its length in advance.

Now I am a bit stuck splitting the string of paths;
I use GNAT.String_Split.Create, but it needs a (bounded) String as input.
However, I only have an unbounded string.

QUESTION: How can I convert an unbounded string to a standard (bounded)
string?
A standard string must be defined with a fixed length in
advance, which I do not know at the time ...

TIA,
Arie


Dmitry A. Kazakov

Sep 13, 2016, 5:05:33 AM
On 13/09/2016 10:46, Arie van Wingerden wrote:
> Hi,
>
> for a long time I've been interested in Ada but hadn't played with it.
>
> Now I bit the bullet and I am trying to create a small (Windows) program
> that:
> 1) reads a string from STDIN (yet to do)
> 2) fetches the contents of the PATH environment variable (done)
> 3) splits that content in individual paths (???)
> 4) tries to match the inputted string with part of each path (to do)
> 5) outputs only the matched paths (to do)
>
> I found out the hard way that Ada is very strict about bounded versus
> unbounded strings.
> So I had to convert the output of 2) to an unbounded string, because I
> could not know it's length in advance.
>
> Now I am a bit stuck splitting the string of paths;
> I use GNAT.String_Split.Create, but it needs a (bounded) String as input.
> However, I only have an unbounded string.
>
> QUESTION: How can I convert an unbounded string to a standard (bounded)
> string?

Don't use either. For the thing you described (and for almost all cases)
standard String is a better, easier and safer choice. Bounded strings
are useless altogether. Unbounded strings have very limited use.

P.S. Splitting/tokenization is just a wrong pattern for string
processing. You need not split strings at all. You just scan the string
from delimiter to delimiter marking beginning and end of the token and
then pass the substring further, e.g. building an ordered set of
[normalized] paths or directly matching them as they appear. It is
simpler, cleaner, safer and far more effective.
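
As a rough illustration of scanning from delimiter to delimiter (a sketch only: the sample string and the name Scan_Demo are made up for this example, and a real program would read PATH instead):

```ada
with Ada.Text_IO;

procedure Scan_Demo is
   --  Hypothetical sample data standing in for the PATH contents.
   Paths : constant String := "C:\Windows;C:\Tools;;D:\Ada\bin";
   First : Positive := Paths'First;  --  start of the current token
begin
   --  Run one position past the end so the last token is handled too.
   for I in Paths'First .. Paths'Last + 1 loop
      --  A token ends at a delimiter or just past the end of the string.
      if I > Paths'Last or else Paths (I) = ';' then
         if I > First then  --  skip empty tokens
            Ada.Text_IO.Put_Line (Paths (First .. I - 1));
         end if;
         First := I + 1;
      end if;
   end loop;
end Scan_Demo;
```

Each token is used as a slice of the original string; nothing is copied into a separate collection.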

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

gautier...@hotmail.com

Sep 13, 2016, 5:35:48 AM
Hello,

Many languages have types like String (fixed size - rigid but fast) and Unbounded_String (extensible - flexible but slower) and sometimes something in between like the bounded strings.
Ada started with (fixed) String, which is a bit difficult for beginners. However, contrary to other languages, "fixed" doesn't mean that the length must be hardcoded in the program. For instance an upper length N can be an "in" parameter which can be any value. Another goodie is that with initialization, the string bounds don't need to be specified. You can have:

s: String:= Get_Line(file);

Also a String as parameter of a subprogram can have any length.
So you'll be done with your program with a few subprograms and the Get_Line function.

(Generally, you can use the functional programming style to work around the rigidity of the type String. You can see an example here
https://sf.net/p/cbsg/code/HEAD/tree/corporate_bullshit.adb
which is done entirely this way...)

The Unbounded_String type is useful rather as a container, like storing database string items into records. The function To_String gets you a String from an Unbounded_String.
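
A minimal sketch of that conversion, assuming a made-up stored value (the names Convert_Demo, Stored and Plain are only for illustration):

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Convert_Demo is
   --  Hypothetical value standing in for the stored PATH contents.
   Stored : constant Unbounded_String :=
     To_Unbounded_String ("C:\Windows;C:\Tools");

   --  No length is written here: the bounds of Plain are taken
   --  from the initial value returned by To_String.
   Plain : constant String := To_String (Stored);
begin
   Ada.Text_IO.Put_Line (Plain);
   Ada.Text_IO.Put_Line ("Length:" & Integer'Image (Plain'Length));
end Convert_Demo;
```

The result of To_String can also be passed directly to any subprogram that takes a String parameter, without declaring an intermediate object.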
_________________________
Gautier's Ada programming
http://gautiersblog.blogspot.com/search/label/Ada
NB: follow the above link for a valid e-mail address

Alejandro R. Mosteo

Sep 13, 2016, 6:41:25 AM
On 13/09/16 10:46, Arie van Wingerden wrote:
> Hi,
>
> (...)
> A standard string must be defined with a fixed length
> in advance, which I do not know at the time ...

If you want to enjoy Ada, you will benefit from going in the direction
Gautier points to. Basically, getting a good grasp of indefinite types
and their implications. Perhaps the Wikibook can be of use too:

https://en.wikibooks.org/wiki/Ada_Programming/Type_System#Indefinite_subtype

Ada management of the stack for such types is one of its strong points
IMO. Basically, although you will be using types with unknown size (at
compile time), they have a known size at runtime and you don't need to
care about what that size is (there are attributes to know, like 'First,
'Last, 'Length).

That's what allows you to declare a function/procedure like:

function Tail (S : String; Len : Natural) return String;

Here, neither the input string S nor the result has a size known in
advance (nor do they have to be the same size), but you'll be able to use
them without resorting to unbounded strings, like this (which is very much
the same as what Gautier gave):

declare
   Last_Three : String := Tail ("Hello", 3);     -- Last_Three will be "llo"
   Last_One   : String := Tail (Last_Three, 1);  -- Last_One will be "o"
begin
   -- whatever
end;

Basically, you have to perform declaration and initialization at the
same time, so the indefinite type gets a bounded value. Otherwise, you
can aim at not storing intermediate results whenever possible, for which
the functional example in Gautier's post is spot on.
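
For a complete program to try, here is a sketch with one possible body for Tail. The body is an assumption made for this example; Ada.Strings.Fixed also provides a ready-made Tail function.

```ada
with Ada.Text_IO;

procedure Tail_Demo is

   --  One possible body for Tail (an assumption for this sketch).
   function Tail (S : String; Len : Natural) return String is
   begin
      if Len >= S'Length then
         return S;
      end if;
      return S (S'Last - Len + 1 .. S'Last);
   end Tail;

   --  Declaration and initialization happen together, so each object
   --  gets its bounds from the value it is initialized with.
   Last_Three : constant String := Tail ("Hello", 3);
   Last_One   : constant String := Tail (Last_Three, 1);
begin
   Ada.Text_IO.Put_Line (Last_Three);
   Ada.Text_IO.Put_Line (Last_One);
   --  The bounds come from the returned slice, so 'First need not be 1.
   Ada.Text_IO.Put_Line ("First index:" & Integer'Image (Last_Three'First));
end Tail_Demo;
```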

You can check too this string splitting library of mine for ideas:

https://github.com/mosteo/agpl/blob/master/src/agpl-strings-fields.adb

Jeffrey R. Carter

Sep 13, 2016, 1:41:16 PM
On 09/13/2016 01:46 AM, Arie van Wingerden wrote:
>
> for a long time I've been interested in Ada but hadn't played with it.

Some terminology: Ada defines 3 kinds of strings of Character:

* Fixed strings: type String

* Variable strings with a fixed maximum length: Bounded_String (not very useful;
you might prefer something equivalent but easier to use, such as PragmARC.B_Strings)

* Variable strings with no maximum length: Unbounded_String

You seem to use "bounded String" to refer to String, which is confusing for
those of us who use it to refer to Bounded_String.

> So I had to convert the output of 2) to an unbounded string, because I could not
> know it's length in advance.

You may have chosen to convert this to Unbounded_String, and that may have been
a good decision, but you didn't have to do this. You never really need
Unbounded_String, but sometimes it's convenient.

> QUESTION: How can I convert an unbounded string to a standard (bounded) string?

If you don't know how to do this, then you haven't spent enough time reviewing
the definition of Ada.Strings.Unbounded, and shouldn't be using it until you have.

> A standard string must be defined with a fixed length in
> advance, which I do not know at the time ...

A String must be declared with fixed bounds, but as others have pointed out, you
don't need to know those bounds in advance (or in some cases, at all). Between
Strings with bounds determined from initialization, slices, block statements,
and unconstrained subprogram parameters, you can do whatever you like with
String without having to use Unbounded_String.

(Note that a String is declared with bounds, not a length. Most importantly, the
lower bound doesn't have to be 1. Remembering this will save you a lot of grief.
The 1st Ada error I saw was something like

procedure P (S : in String) is
begin
if S (1) = ... -- Constraint_Error here
...
end P;

V : String (1 .. 1000);
Last : Natural;
Start : Positive;
begin
Text_IO.Get_Line (Item => V, Last => Last);

for I in 1 .. Last loop
if V (I) = ... then
Start := ...;

exit;
end if;
end loop;

P (S => V (Start .. Last) );

because Start /= 1.)
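
A small sketch of the same situation with the indexing written against S'First instead of 1 (the sample string and the name Bounds_Demo are made up for this example):

```ada
with Ada.Text_IO;

procedure Bounds_Demo is

   procedure P (S : in String) is
   begin
      --  S (1) would raise Constraint_Error whenever 1 is outside S'Range.
      if S'Length > 0 and then S (S'First) = 'w' then
         Ada.Text_IO.Put_Line
           ("starts with w, S'First =" & Integer'Image (S'First));
      end if;
   end P;

   V : constant String := "hello world";
begin
   --  The slice keeps the bounds it had inside V; it is not renumbered.
   P (S => V (7 .. V'Last));
end Bounds_Demo;
```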

--
Jeff Carter
"C++ is vast and dangerous, a sort of Mordor of
programming languages."
Jason R. Fruit
120

Björn Lundin

Sep 13, 2016, 1:59:16 PM
On 2016-09-13 10:46, Arie van Wingerden wrote:
> Now I am a bit stuck splitting the string of paths;
> I use GNAT.String_Split.Create, but it needs a (bounded) String as input.
> However, I only have an unbounded string.
>


with Ada.Text_IO;
with GNAT.String_Split;

procedure Foo (Input : in String) is
   use GNAT; use Ada.Text_IO;
   Subs : String_Split.Slice_Set;
   Seps : constant String := " ";
begin
   -- Split Input on spaces; Multiple treats a run of separators as one
   String_Split.Create (S          => Subs,
                        From       => Input,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);

   for J in 1 .. String_Split.Slice_Count (Subs) loop
      declare
         A_Slice : constant String := String_Split.Slice (Subs, J);
      begin
         Put_Line (A_Slice);
      end;
   end loop;
end Foo;


call like

Foo("A string to be split on spaces");

or if you have an unbounded string - use To_String on it

use Ada.Strings.Unbounded;
Foo(To_String(Some_Unbounded_String));


(not compiled examples)



--
--
Björn

Arie van Wingerden

Sep 14, 2016, 7:46:40 AM
Hi,
thank you *very* much. You are very helpful!

I created the following program, which only uses standard strings. It works
partly.

There is one thing I cannot get to work properly:
in the FOR loop the statement:
ATIO.Put(Path(Start .. Len));
outputs the first part of Path properly; but in the following cases it
won't.
What am I doing wrong?

(Don't mind the matching; this is NOT correct currently).

(Wrongly indented pieces will be gone in final version.)

The program code (FIP stands for Find In Path):
======================================


with Ada.Text_Io;
with Ada.Integer_Text_Io;
with Ada.Environment_Variables;
with Ada.Strings.Fixed;
with Ada.Strings.Maps.Constants;

procedure Fip is

package ATIO renames Ada.Text_IO;
package AITIO renames Ada.Integer_Text_Io;
package AEV renames Ada.Environment_Variables;
package ASF renames Ada.Strings.Fixed;
package ASMC renames Ada.Strings.Maps.Constants;

Path : string := ASF.Translate(AEV.Value("Path"),
ASMC.Lower_Case_Map);
Match : string := ASF.Translate(ATIO.Get_Line, ASMC.Lower_Case_Map);

procedure Match_Path (Match : in string; Path : in string) is
Start : positive := 1;
Len : natural := 0;
begin
for I in Path'Range loop
if Path(I) = ';' then
ATIO.Put_Line("----------------------------------------------------");
ATIO.Put(Path(Start .. Len)); AITIO.Put(Start); AITIO.Put(Len); ATIO.Put(" "); ATIO.Put_Line(Match);
-- This matching part is not OK yet
if Len > 0 and then Path(Start .. Len) = Match then
ATIO.Put_Line(Path(Start .. Len));
end if;
Start := I + 1;
Len := 0;
else
Len := Len + 1;
end if;
end loop;
end Match_Path;

begin
Match_Path(Match, Path);
ATIO.Flush;
end Fip;

Arie van Wingerden

Sep 14, 2016, 8:38:34 AM
I suddenly realize my error. My bad.
Len only works as an end position in the first case ...

However, any further comments are welcome!

Arie van Wingerden

Sep 14, 2016, 8:41:20 AM
I suddenly realize my error. Len only is valid as an end position in the
first case of course ...
My bad! Any further comments are however welcome.

Arie van Wingerden

Sep 14, 2016, 9:07:51 AM
I finished the program.
It appears to be working correctly (only in Windows, because of the path
separator).

Any comments to improve?

Code follows:
===========

with Ada.Text_Io;
with Ada.Environment_Variables;
with Ada.Strings.Fixed;
with Ada.Strings.Maps.Constants;

procedure Fip is

package ATIO renames Ada.Text_IO;
package AEV renames Ada.Environment_Variables;
package ASF renames Ada.Strings.Fixed;
package ASMC renames Ada.Strings.Maps.Constants;

Path : string := ASF.Translate(AEV.Value("Path"),
ASMC.Lower_Case_Map);
Match : string := ASF.Translate(ATIO.Get_Line, ASMC.Lower_Case_Map);

procedure FindMatch (Match : in string; Path : in string; StartPos : in
positive; Len : in natural) is
EndPos : positive;
begin
if Len > 0 then -- Ignore case of an unnecessary semi colon
EndPos := StartPos + Len - 1;
if ASF.Index(Source => Path(StartPos .. EndPos), Pattern =>
Match) > 0 then
ATIO.Put_Line(Path(StartPos .. EndPos));
end if;
end if;
end FindMatch;

procedure Match_Path (Match : in string; Path : in string) is
StartPos : positive := 1;
Len : natural := 0;
begin
for I in Path'Range loop
if Path(I) = ';' then
FindMatch(Match, Path, StartPos, Len);
StartPos := I + 1;

Jeffrey R. Carter

Sep 14, 2016, 3:39:58 PM
On 09/14/2016 05:57 AM, Arie van Wingerden wrote:
>
> Path : string := ASF.Translate(AEV.Value("Path"), ASMC.Lower_Case_Map);
> Match : string := ASF.Translate(ATIO.Get_Line, ASMC.Lower_Case_Map);

You might want to look at pkg Ada.Characters.Handling, particularly the
functions To_Lower.

Ada is case insensitive, and many of us will run your code through a formatter
to make it look like the code we're familiar with, converting identifiers to
Initial_Caps, and changing CamelCase into difficult to read things like
Findmatch. The recommended practice for Ada is to separate words with
underscores, as in Find_Match.

> procedure FindMatch (Match : in string; Path : in string; StartPos : in
> positive; Len : in natural) is
> EndPos : positive;
> begin
> if Len > 0 then -- Ignore case of an unnecessary semi colon

This test is unnecessary. If Len = 0, Endpos = Startpos - 1, and Startpos ..
Endpos is a null range. Any array sliced with a null range yields a zero-length
slice; in the case of String, the null string (""). Index will always return 0
if Source is null and Pattern is not.

> EndPos := StartPos + Len - 1;
> if ASF.Index(Source => Path(StartPos .. EndPos), Pattern => Match) >
> 0 then
> ATIO.Put_Line(Path(StartPos .. EndPos));
> end if;
> end if;
> end FindMatch;
>
> procedure Match_Path (Match : in string; Path : in string) is
> StartPos : positive := 1;
> Len : natural := 0;
> begin
> for I in Path'Range loop
> if Path(I) = ';' then

You can use index to find semi-colons in Path

Index (Path, ";")

> FindMatch(Match, Path, StartPos, Len);

Note that you're passing Path and Startpos to Findmatch, which uses Startpos as
an index into Path. The first time you do this, Startpos = 1. In other words,
you're assuming that 1 is a valid index for Path. While this may always be true
for this program, if you need to do something similar in another situation, you
might reuse this code, but pass a value for Path for which it is not true. It
would be better to initialize Startpos to Path'First.

Rather than calculate Endpos and slice Path in Findmatch, why not pass the slice
of Path

procedure Findmatch (Match : in String; Path : in String);

Findmatch (Match => Match, Path => Path (Startpos .. Startpos + Len - 1) );

? As noted above, passing a null string for Path will not be a problem.

If Path doesn't contain ';', your program does nothing. If Path doesn't end with
';', the part of Path from the last semi-colon to the end won't be checked. What
should it do if Match is null?

I suspect this could be implemented more simply and clearly (and correctly)
using Index for the ';' as well as for Match.
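
A rough sketch of what that could look like, under several assumptions: the two-argument Value with a default is the Ada 2012 form, an empty Match is taken to mean "print nothing", and the separator is still the Windows ';'. The name Fip_Sketch is made up for this example.

```ada
with Ada.Characters.Handling;
with Ada.Environment_Variables;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Fip_Sketch is
   use Ada.Characters.Handling;
   use Ada.Strings.Fixed;

   --  An unset PATH is treated as empty (Ada 2012 form of Value).
   Path  : constant String :=
     To_Lower (Ada.Environment_Variables.Value ("PATH", ""));
   Match : constant String := To_Lower (Ada.Text_IO.Get_Line);

   Start : Positive := Path'First;  --  first character of the current entry
   Stop  : Natural;                 --  position of the next separator
begin
   while Start <= Path'Last loop
      Stop := Index (Path, ";", Start);  --  0 when no separator is left
      if Stop = 0 then
         Stop := Path'Last + 1;  --  treat the end of Path as a separator
      end if;

      --  Path (Start .. Stop - 1) is a null string for an empty entry,
      --  and Index returns 0 for it. Index raises Pattern_Error for an
      --  empty pattern, hence the check on Match'Length.
      if Match'Length > 0
        and then Index (Path (Start .. Stop - 1), Match) > 0
      then
         Ada.Text_IO.Put_Line (Path (Start .. Stop - 1));
      end if;

      Start := Stop + 1;
   end loop;
end Fip_Sketch;
```

This version also checks the last entry when Path does not end with a semicolon, and handles a Path that contains no semicolon at all.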

--
Jeff Carter
"Since I strongly believe that overpopulation is by
far the greatest problem in the world, this [Soylent
Green] would be my only message movie."
Charleton Heston
123

Olivier Henley

Sep 16, 2016, 10:43:46 AM
On a side note:

If you just want to get the job done and do less low-level work, I suggest you look into GNAT.Spitbol.Patterns. I used it to extract video ids from a YouTube URL in some Ada server code and it saved me. The resulting code is small, clean, maintainable and robust. Could I have done better by rolling my own matching procedure? Maybe for speed, but not for anything else.

My two cents.

Here is a Gem covering it (toward the end of the article): http://www.adacore.com/adaanswers/gems/gem-26-2/

Arie van Wingerden

Sep 17, 2016, 12:50:03 PM
Thx for the extensive review! I'll look into it.

"Jeffrey R. Carter" wrote in message news:nrc923$nth$1...@dont-email.me...

Arie van Wingerden

Sep 17, 2016, 12:56:02 PM
Thx. Will have a look there ...

"Olivier Henley" wrote in message
news:fb5a67b7-0c16-49b9...@googlegroups.com...

John Smith

Sep 21, 2016, 10:10:12 PM
Why would you say this? I've found the ease with which you can append or manipulate unbounded strings to be very convenient. Furthermore, I don't need to worry about computing the bound of my string.

From my experience unbounded strings are very easy to work with.

But I genuinely am curious about your view on this and why.

Dmitry A. Kazakov

Sep 22, 2016, 3:25:18 AM
On 22/09/2016 04:10, John Smith wrote:

> Why would you say this?

Because it is the truth? (:-))

> I've found the ease with which you can append or manipulate unbounded
> strings to be very convenient.

1. There is no need to append to a string in no less than 90% of cases.

2. Unbounded_String manipulations are *very* inconvenient. Largely
because of their design flaw in Ada (no array interface), but nonetheless.

> Furthermore, I don't need to worry about
> computing the bound of my string.

Neither do I, I never ever used bounded strings. They are totally
useless, IMO.

> But I genuinely am curious about your view on this and why.

As I said in 90% cases the string is given, it is an input. All strings
derived from the input strings are substrings (or their derivatives).

The use cases like tokenizing are virtually non-existent, they are just
bad programming practices coming from scripting languages which have no
decent support for iterating over string characters and from languages
incapable of returning strings as the result of a function call.

As for the 10% when strings are output, e.g. string formatting, unbounded
strings are no more usable. The output length is usually fixed,
consisting of fields. The substrings are aligned in the fields. It
is far easier with plain strings and substrings.

The idea of using Unbounded_String as an accumulator, e.g. for some
messages log, is just awful. It won't work and at the end you will need
to have a special data structure (text buffer) for this (with plain
strings as building blocks).

My method of string processing is sequential scanning / putting fields into:

http://www.dmitry-kazakov.de/ada/strings_edit.htm

Plain strings are ideal for this.

J-P. Rosen

Sep 22, 2016, 5:01:23 AM
On 22/09/2016 at 09:24, Dmitry A. Kazakov wrote:
> Neither do I, I never ever used bounded strings. They are totally
> useless, IMO.
No, they are not.

A bounded string is perfectly appropriate to represent abstract data
that are /implemented/ by a string.

Typical example:
- you want to represent a first name, a family name, and an address.
- You want them to be different types (to make it impossible to mix
them), although they are all implemented as strings.
- You have an underlying data base that imposes a maximum length for
each of these, but the actual length can be anything up to the maximum

=> Bounded_String is the way to go.
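
A minimal sketch of that arrangement, assuming made-up maximum lengths in place of the real database limits (the package and object names are only for illustration):

```ada
with Ada.Strings.Bounded;
with Ada.Text_IO;

procedure Bounded_Demo is
   --  Hypothetical limits standing in for database column sizes.
   package First_Names is
     new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 20);
   package Family_Names is
     new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 40);

   First  : constant First_Names.Bounded_String :=
     First_Names.To_Bounded_String ("Ada");
   Family : constant Family_Names.Bounded_String :=
     Family_Names.To_Bounded_String ("Lovelace");
begin
   --  Each instantiation declares its own Bounded_String type, so
   --  Family_Names.To_String (First) would be rejected at compile time.
   Ada.Text_IO.Put_Line
     (First_Names.To_String (First) & " " & Family_Names.To_String (Family));
end Bounded_Demo;
```

By default To_Bounded_String raises Ada.Strings.Length_Error when the source is longer than Max; passing a Drop argument selects truncation instead.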

--
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr

Dmitry A. Kazakov

Sep 22, 2016, 5:54:52 AM
On 22/09/2016 11:01, J-P. Rosen wrote:
> Le 22/09/2016 à 09:24, Dmitry A. Kazakov a écrit :
>> Neither do I, I never ever used bounded strings. They are totally
>> useless, IMO.
> No, they are not.
>
> A bounded string is perfectly appropriate to represent abstract data
> that are /implemented/ by a string.
>
> Typical example:
> - you want to represent a first name, a family name, and an address.
> - You want them to be different types (to make it impossible to mix
> them), although they are all implemented as strings.

type First_Name is new String;
type Family_Name is new String;
type Postal_Address is new String;

> - You have an underlying data base that imposes a maximum length for
> each of these, but the actual length can be anything up to the maximum

type First_Name is new SQL_C_CHAR;

If you do DB you use DB data types.

BTW, such external constraints, if they exist, are pretty much volatile.
You don't want to make them static that would make the design very fragile.

> => Bounded_String is the way to go.

Never. It is very difficult to find cases where

1. There is a hard upper bound, so hard that it would be feasible to
mold it into the type.

2. There are no cases where the upper bound implies another type. There
is nothing in the first name's length that makes the first name different
from the second name. These are just unrelated things.

The major reason for that is that the constraint, whatever it may be, is most
likely an implementation detail, whereas a type difference is to reflect the
problem space.

G.B.

Sep 22, 2016, 6:58:05 AM
On 22.09.16 11:53, Dmitry A. Kazakov wrote:
> If you do DB you use DB data types.

Whenever the DB allows user defined data types (some do)
and all clients can handle them, then one may consider
using types that reflect the problem domain.

UDDTs, I read, are how things should have been when the DBs
were formalized a little. Today, we may need to learn about the
semantic effects of trailing blanks in VARCHAR of different
vendors' DBMSs, and that's worth money in some places.

So, user defined types may be beneficial in a typical
Ada environment when using a capable DBMS.

J-P. Rosen

Sep 22, 2016, 7:08:41 AM
On 22/09/2016 at 11:53, Dmitry A. Kazakov wrote:
>> => Bounded_String is the way to go.
>
> Never. It is very difficult to find cases where
>
> 1. There is a hard upper bound, so hard that it would be feasible to
> mold it into the type.
>
> 2. There is no cases where the upper bound implies another type. There
> is nothing in the first name's length that makes first name different
> from the second name. These are just unrelated things.
>
> The major reason for that is that constraint whatever it be is most
> likely an implementation detail, which type difference is to reflect the
> problem space.
>
?? I don't follow you here. I may not have been explicit enough, but of
course each type corresponds to a different instantiation of bounded
strings, which provide different types with unrelated lengths.

Dmitry A. Kazakov

Sep 22, 2016, 8:06:29 AM
On 22/09/2016 13:08, J-P. Rosen wrote:
> Le 22/09/2016 à 11:53, Dmitry A. Kazakov a écrit :
>>> => Bounded_String is the way to go.
>>
>> Never. It is very difficult to find cases where
>>
>> 1. There is a hard upper bound, so hard that it would be feasible to
>> mold it into the type.
>>
>> 2. There is no cases where the upper bound implies another type. There
>> is nothing in the first name's length that makes first name different
>> from the second name. These are just unrelated things.
>>
>> The major reason for that is that constraint whatever it be is most
>> likely an implementation detail, which type difference is to reflect the
>> problem space.
>>
> ?? I don't follow you here. I may not have been explicit enough, but of
> course each type corresponds to a different instantiation of bounded
> strings, wich provide different types with unrelated lengths.

I meant that any non-tagged type can be cloned in Ada, String included.
This is not specific to bounded strings. That a change of maximum length
requires another type is not a feature, it is rather a design flaw.
Clearly bounded strings of different maximum length should be subtypes
of each other, as strings are, unless cloned.

It is thinkable but useless to consider the maximum length a part of the
contract. In most cases it is an implementation detail that does not
deserve new type promotion.

Just my explanation of the empirical fact that practically nobody uses them.

Dmitry A. Kazakov

Sep 22, 2016, 8:06:42 AM
Sure they are, all DB interfacing types are user-defined Ada types.

Maciej Sobczak

Sep 22, 2016, 9:18:13 AM

> It is very difficult to find cases where
>
> 1. There is a hard upper bound, so hard that it would be feasible to
> mold it into the type.

It is not very difficult to realize that MD5, SHA1, etc. signatures are bounded strings.
In many domains it will be also easy to find that many identifiers and reference numbers are bounded, too.
Social security numbers tend to be bounded as well.
ZIP codes in any given country?

I don't know about you, but I seem to be surrounded by bounded strings.

Splitting strings is inefficient for you? It is very efficient for me to split the string and pass its individual parts to separate tasks for parallel processing. If you are doing the string splitting by traversing the string and storing the tokens aside, it is still splitting, even if you do not call it splitting.

Unbounded_String? Can be very useful as a sink for generated data in whatever format. I do not know up-front how big my XML will be before I store it in the database and I'm OK with the simplified view that the bound is my machine memory size.

Dmitry, you can have strong opinions on various subjects, but it is too easy to disagree with these ones.

--
Maciej Sobczak * http://www.inspirel.com

Dmitry A. Kazakov

Sep 22, 2016, 9:53:26 AM
On 22/09/2016 15:18, Maciej Sobczak wrote:
>
>> It is very difficult to find cases where
>>
>> 1. There is a hard upper bound, so hard that it would be feasible to
>> mold it into the type.
>
> It is not very difficult to realize that MD5, SHA1, etc. signatures
> are bounded strings.

It is quite difficult to do. In the application areas these things are
fixed-size subtypes of either String or Stream_Element_Array. They aren't
independent new types, for obvious reasons.

> In many domains it will be also easy to find that many identifiers
> and reference numbers are bounded, too.
> Social security numbers tend to be bounded as well.
> ZIP codes in any given country?

A ZIP code is a number in Germany. In other cases it would be a record type
with an enumeration to represent a province and some numbers, never a
bounded string.

> I don't know about you, but I seem to be surrounded by bounded strings.

Do you use Ada bounded strings?

> Splitting strings is inefficient for you? It is very efficient for
> me to split the string and pass its individual parts to separate tasks for
> parallel processing. If you are doing the string splitting by traversing
> the string and storing the tokens aside,

Why would you store them? In the example you gave, I would pass them
straight to a worker task as soon as I get one.

> it is still splitting, even if you do not call it splitting.

Not if substrings get processed, e.g. put into a hash table. The point
is that the operation of splitting as such is pointless because its
result has no value of its own, it is an intermediate which can and must
be dropped for the sake of simplicity and efficiency.

> Unbounded_String? Can be very useful as a sink for generated data in
> whatever format.

Nope. That thing is called stream.

> Dmitry, you can have strong opinions on various subjects, but it is
> too easy to disagree with these ones.

Good, I like when people disagree.

G.B.

Sep 22, 2016, 10:14:43 AM
On 22.09.16 14:05, Dmitry A. Kazakov wrote:
> On 22/09/2016 12:58, G.B. wrote:
>> On 22.09.16 11:53, Dmitry A. Kazakov wrote:
>>> If you do DB you use DB data types.
>>
>> (...)
>>
>> So, user defined types may be beneficial in a typical
>> Ada environment when using a capable DBMS.
>
> Sure they are, all DB interfacing types are user-defined Ada types.

Do Ada programmers normally store objects of user defined
types of their own in DBMS? Objects that the DBMS considers atomic,
to become one attribute value of some tuple (column value in some row).
Say, of a private type that exports "=" to the DBMS. These would
be types other than those binding-layer defined types like SQL_INT
etc, which, I take it, you meant by "DB data types".


Maciej Sobczak

Sep 22, 2016, 10:51:14 AM
> > I don't know about you, but I seem to be surrounded by bounded strings.
>
> Do you use Ada bounded strings?

I don't use Ada for such applications. :-)
(in a sense, bounded strings in Ada are useless for the majority of programmers...)

Yes, I have used char[N] in "other languages" on various occasions. These are bounded strings.

> > Splitting strings is inefficient for you? It is very efficient for
> > me to split the string and pass its individual parts to separate tasks for
> > parallel processing. If you are doing the string splitting by traversing
> > the string and storing the tokens aside,
>
> Why would you store them? In the example you gave, I would pass them
> straight to a worker task as soon as I get one.

This is a secondary issue. You are still splitting the string.
And no, I would rather store them in a job queue. There is no way to pass them "straight to a worker task" for the simple reason that the worker task is still busy processing previous jobs.

> Not if substrings get processed,

Yes if they are not processed.

Please do not invent artificial argument extensions only to "prove" your point.
Your argument was that bounded, unbounded and splitting are *always* useless. If you have to constrain it as above, then you are already proving yourself to be wrong.

> The point
> is that the operation of splitting as such is pointless because its
> result has no value of its own

No, the result of splitting has a real value of its own. Get the spreadsheet and split it into individual cells - they are still real values. They might even be still useful in another spreadsheet.

> it is an intermediate which can and must
> be dropped for the sake of simplicity and efficiency.

There is no reason to drop a data value that is needed for further processing.

> > Unbounded_String? Can be very useful as a sink for generated data in
> > whatever format.
>
> Nope. That thing is called stream.

Nope. Stream is a high-level I/O abstraction that is not needed here. Progressive accumulation of value is called "append" and Unbounded_String has the right interface for this.

> Good, I like when people disagree.

But sometimes there is no added value.

Dmitry A. Kazakov

Sep 22, 2016, 1:13:23 PM
On 2016-09-22 16:51, Maciej Sobczak wrote:
>>> I don't know about you, but I seem to be surrounded by bounded strings.
>>
>> Do you use Ada bounded strings?
>
> I don't use Ada for such applications. :-)
> (in a sense, bounded strings in Ada are useless for the majority of programmers...)
>
> Yes, I have used char[N] in "other languages" on various occasions.
> These are bounded strings.

How is char[N] a bounded string? It is compatible with char[M].

>>> Splitting strings is inefficient for you? It is very efficient for
>>> me to split the string and pass its individual parts to separate tasks for
>>> parallel processing. If you are doing the string splitting by traversing
>>> the string and storing the tokens aside,
>>
>> Why would you store them? In the example you gave, I would pass them
>> straight to a worker task as soon as I get one.
>
> This is a secondary issue. You are still splitting the string.

No. Splitting is an operation resulting in a set of strings, as a
compound object. Scanning does not produce such objects, though it could
if necessary. The point is that this object is almost never necessary.

> And no, I would rather store them in a job queue.

Job queue is not a set of strings.

>> Not if substrings get processed,
>
> Yes if they are not processed.

They are, as you said, the result is a job queue not a set of strings.

> Please do not invent artificial argument extensions only to "prove" your point.

Just defining terms. If you consider scanning, translating, etc same as
tokenizing because the application in the end does the same, there is
nothing to discuss. Then Ada is C and C is Ada.

>> The point
>> is that the operation of splitting as such is pointless because its
>> result has no value of its own
>
> No, the result of splitting has a real value of its own. Get the
> spreadsheet and split it into individual cells - they are still real
> values. They might even be still useful in another spreadsheet.

String is not spreadsheet and conversely.

>> it is an intermediate which can and must
>> be dropped for the sake of simplicity and efficiency.
>
> There is no reason to drop a data value that is needed for further processing.

Nobody dropped values required for processing. Values of tokens /= value of
a set of tokens. The values are processed without explicit construction of a
set of them.

>>> Unbounded_String? Can be very useful as a sink for generated data in
>>> whatever format.
>>
>> Nope. That thing is called stream.
>
> Nope. Stream is a high-level I/O abstraction that is not needed
> here.

You argue for a poor implementation (Unbounded_String) of this abstraction.

> Progressive accumulation of value is called "append" and
> Unbounded_String has the right interface for this.

Yes, "Append" could implement the stream's "Write". So what? If the output
is generated in a raw form, that is a stream. It is not frequently used, but
Unbounded_String would be a poor choice, as always. There is not much use
for Unbounded_String. One use is passing in-out parameters to a procedure.
Another is having arrays and records with string components.

Dmitry A. Kazakov

Sep 22, 2016, 1:18:20 PM
On 2016-09-22 16:14, G.B. wrote:
> On 22.09.16 14:05, Dmitry A. Kazakov wrote:
>> On 22/09/2016 12:58, G.B. wrote:
>>> On 22.09.16 11:53, Dmitry A. Kazakov wrote:
>>>> If you do DB you use DB data types.
>>>
>>> (...)
>>>
>>> So, user defined types may be beneficial in a typical
>>> Ada environment when using a capable DBMS.
>>
>> Sure they are, all DB interfacing types are user-defined Ada types.
>
> Do Ada programmers normally store objects of user defined
> types of their own in DBMS?

Sometimes, existing bindings do not encourage this.

> Objects that the DBMS considers atomic,
> to become one attribute value of some tuple (column value in some row).
> Say, of a private type that exports "=" to the DBMS. These would
> be types other than those binding-layer defined types like SQL_INT
> etc, which, I take it, you meant by "DB data types".

SQL_INT is not observable. The only language type here is SQL_C_INT. If
you have a higher-level abstraction you can have Age or House_Number.
Now bounded string, if any, would be equivalent to SQL_INT, a
non-existent entity.

Maciej Sobczak

Sep 23, 2016, 1:50:36 AM

> > Yes, I have used char[N] in "other languages" on various occasions.
> > These are bounded strings.
>
> How is char[N] bounded?

It has bounds.

> It is compatible with char[M].

Then it is unbounded, right?

An array of characters with known bounds is a bounded string. This abstract concept might have different incarnations depending on the target language.

> No. Splitting is an operation resulting in a set of strings

Yes. Alternatively, it might be a sequence.

> as a
> compound object.

This is an arbitrary and unsubstantiated constraint.

> Scanning does not produce such objects, though it could
> if necessary. The point is that this object is almost never necessary.

It might be necessary in a more complete string-processing library, where other operations expect such an object as input. In another library with a different API (based on iterators or generators or whatever) it might not be needed. Still, splitting as an abstract concept is the same.

> > And no, I would rather store them in a job queue.
>
> Job queue is not a set of strings.

You are mixing concepts (even the ones which you have arbitrarily introduced).
A set of strings is a result of splitting. A job queue is where I might want to put them. Whether the boundary between these two expects the set as a single object is a matter of interface design. Still, splitting the string and putting its tokens in a job queue is a design-level concept that holds independently of how you implement it.

> >> Not if substrings get processed,
> >
> > Yes if they are not processed.
>
> They are, as you said, the result is a job queue not a set of strings.

Again, mixing your own concepts.

> > Please do not invent artificial argument extensions only to "prove" your point.
>
> Just defining terms. If you consider scanning, translating, etc same as
> tokenizing because the application in the end does the same, there is
> nothing to discuss.

As I have already pointed out, sometimes there is no added value from disagreements.

> Then Ada is C and C is Ada.

Interesting implication, but without added value.

> > No, the result of splitting has a real value of its own. Get the
> > spreadsheet and split it into individual cells - they are still real
> > values. They might even be still useful in another spreadsheet.
>
> String is not spreadsheet and conversely.

I have only tried to visualize my argument.

> > Nope. Stream is a high-level I/O abstraction that is not needed
> > here.
>
> You argue for a poor implementation (Unbounded_String) of this abstraction.

The implementation is just right, as this is also what I need as the result, for example in order to do something with it (store in the database?). I don't need the stream here, it is not the intended result and would need to be abandoned after extracting the actual result.

> Yes, "Append" could implements stream's "Write". So what?

So it is useful, because it does not introduce other entities.


Sorry Dmitry, but I don't think that this discussion is interesting for anybody else here.

Simon Wright

Sep 23, 2016, 2:36:39 AM
Maciej Sobczak <see.my....@gmail.com> writes:

> An array of characters with known bounds is a bounded string.

It'd be less confusing in an Ada context to call this a *fixed* string.

Dmitry A. Kazakov

Sep 23, 2016, 3:49:49 AM
I think that is a *definite* string.

Some nomenclature:

Ada fixed string: bounds are fixed, but could be unknown.

Ada's bounded string: bounded bounds (unintended pun) + an independent
string type (design flaw, arbitrary constraint).

Ada's Unbounded_String: unknown bounds (bounds are limited by the
platform / implementation)

P.S. Neither bounded nor unbounded Ada string is "an array of
characters" (yet another design flaw).
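To make the nomenclature above concrete, here is a minimal sketch of the three kinds of string side by side. The package name Names, the maximum of 80 and the sample path are made up for illustration; an Ada 2005 or later compiler is assumed. It also shows To_String, which converts a bounded or unbounded value back into a plain String.

```ada
with Ada.Text_IO;
with Ada.Strings.Bounded;
with Ada.Strings.Unbounded;

procedure String_Kinds is
   --  Hypothetical instance: every object of this type shares the maximum 80.
   package Names is new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 80);

   --  Fixed string: the bounds come from the initial value and never change.
   Fixed_Str : constant String := "C:\Windows";

   --  Bounded string: the current length may vary from 0 up to Max.
   Bounded_Str : Names.Bounded_String :=
     Names.To_Bounded_String ("C:\Windows");

   --  Unbounded string: the length is limited only by the implementation.
   Unbounded_Str : Ada.Strings.Unbounded.Unbounded_String :=
     Ada.Strings.Unbounded.To_Unbounded_String ("C:\Windows");
begin
   Names.Append (Bounded_Str, "\System32");
   Ada.Strings.Unbounded.Append (Unbounded_Str, "\System32");

   --  A fixed string is an array and can be sliced directly. The other two
   --  are private types, so they are converted with To_String here.
   Ada.Text_IO.Put_Line (Fixed_Str (Fixed_Str'First .. Fixed_Str'First + 1));
   Ada.Text_IO.Put_Line (Names.To_String (Bounded_Str));
   Ada.Text_IO.Put_Line (Ada.Strings.Unbounded.To_String (Unbounded_Str));
end String_Kinds;
```

Note that two different instances of Generic_Bounded_Length yield two incompatible Bounded_String types, which is the "independent string type" point made above.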

John Smith

Sep 23, 2016, 7:58:23 PM
On Thursday, September 22, 2016 at 3:25:18 AM UTC-4, Dmitry A. Kazakov wrote:
> On 22/09/2016 04:10, John Smith wrote:
>
> > I've found the ease with which you can append or manipulate unbounded
> > strings to be very convenient.
>
> 1. There is no need to append to a string in no less than 90% of cases.

The percentage in this case depends on what application you are developing. Sometimes you will need to do this more often, sometimes less often.

>
> 2. Unbounded_String manipulations are *very* inconvenient. Largely
> because of their design flaw in Ada (no array interface), but nonetheless.

Yes, there is that. However, that's not a very real inconvenience. I can easily get the length of the string, get a sub-string, find a sub-string in the main string, etc.

If anything, a fixed string is less convenient since you need to walk on eggshells and cannot simply assign a new value to the existing string.

>
> > Furthermore, I don't need to worry about
> > computing the bound of my string.
>
> Neither do I, I never ever used bounded strings. They are totally
> useless, IMO.
>

When I said bound, I meant the length ;-)

> > But I genuinely am curious about your view on this and why.
>
> As I said in 90% cases the string is given, it is an input. All strings
> derived from the input strings are substrings (or their derivatives).
>

If you need to separate a large string into a bunch of smaller ones that do not have a pre-determined size, using a fixed string does not make any sense.

> The use cases like tokenizing are virtually non-existent, they are just
> bad programming practices coming from scripting languages which have no
> decent support of string characters iteration and from languages
> incapable to return strings as a result of function call.
>

Could you please give an example. I'm used to Python and it's trivial to have a function return a string.

> As for 10% when strings are output, e.g. string formatting, unbounded
> strings are no more usable. The output length is usually fixed
> consisting out of fields. The substrings are aligned in the fields. It
> is far easier with plain strings and substrings.
>

When you do need to build a string, it is far easier to have one unbounded string that is added on to and then written out. Having a fixed string means that I would need something along the lines of a recursive solution, since I can't extend the size of the string after it's been instantiated.
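As a rough sketch of the accumulate-then-write style described here (the HTML fragment and the procedure name Build_Report are invented for the example):

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Build_Report is
   Report : Unbounded_String;  --  starts out as the empty string
begin
   Append (Report, "<ul>");
   for I in 1 .. 3 loop
      --  Integer'Image puts a leading blank in front of non-negative values.
      Append (Report, "<li>item" & Integer'Image (I) & "</li>");
   end loop;
   Append (Report, "</ul>");

   --  The total length is only known once everything has been appended.
   Ada.Text_IO.Put_Line (To_String (Report));
   Ada.Text_IO.Put_Line ("Length:" & Natural'Image (Length (Report)));
end Build_Report;
```

Whether this is efficient enough depends on how the implementation grows the underlying buffer, which the language standard does not prescribe.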

> The idea of using Unbounded_String as an accumulator, e.g. for some
> messages log, is just awful. It won't work and at the end you will need
> to have a special data structure (text buffer) for this (with plain
> strings as building blocks).
>

Why won't it work? I've built HTML files and small reports using an unbounded string and it worked fine.

I don't agree with you on this topic.

Dmitry A. Kazakov

Sep 24, 2016, 3:52:54 AM
On 2016-09-24 01:58, John Smith wrote:
> On Thursday, September 22, 2016 at 3:25:18 AM UTC-4, Dmitry A. Kazakov wrote:
>> On 22/09/2016 04:10, John Smith wrote:
>>
>>> I've found the ease with which you can append or manipulate unbounded
>>> strings to be very convenient.
>>
>> 1. There is no need to append to a string in no less than 90% of cases.
>
> The percentage in this case depends on what application you are
> developing. Sometimes you will need to do this more often, sometimes
> less often.

Yes, this "sometimes" is 10% of all "sometimes", or less.

> If anything, a fixed string is less convenient since you need to
> walk on eggshells and cannot simply assign a new value to the existing string.

Which I never have to. If you assign strings reconsider the algorithm,
there is certainly something wrong with it. Strings may be rearranged,
never assigned.

> When I said bound, I meant the length ;-)

That is called fixed string in Ada, I never said they were unusable.

> If you need to separate a large string into a bunch smaller ones
> that do not have a pre-determined size, using a fixed string does not make
> any sense.

I *never* need that. It was discussed already.

> When you do need to build a string, it is far easier to have one
> unbounded string that is added on to and then written out.

No, it is easier with fixed strings.

> Having a
> fixed string means that I would need something along the lines of a
> recursive solution, since I can't extend the size of the string after
> it's been instantiated.

The output length for formatted output is always known. Normally the
parts of a composite string are statically known, that precludes
iteration or recursion. In other cases, like building a directory path
from some structure, I first calculate the length and then declare the
result string.
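A minimal sketch of the "calculate the length first, then declare the result" approach for a directory path might look like the following. The names Join, Part_List and String_Access and the sample components are hypothetical, and Ada 2012 is assumed for the "for ... of" loops.

```ada
with Ada.Text_IO;

procedure Build_Path is
   type String_Access is access constant String;
   type Part_List is array (Positive range <>) of String_Access;

   --  Hypothetical path components, used only for illustration.
   Usr   : aliased constant String := "usr";
   Local : aliased constant String := "local";
   Bin   : aliased constant String := "bin";

   function Join (Parts : Part_List; Separator : Character) return String is
      Total : Natural := 0;
   begin
      --  First pass: one separator plus the part itself, for every part.
      for P of Parts loop
         Total := Total + 1 + P'Length;
      end loop;

      --  Second pass: the result is declared with exactly the right length.
      declare
         Result : String (1 .. Total);
         Pos    : Positive := 1;
      begin
         for P of Parts loop
            Result (Pos) := Separator;
            Result (Pos + 1 .. Pos + P'Length) := P.all;
            Pos := Pos + 1 + P'Length;
         end loop;
         return Result;
      end;
   end Join;
begin
   Ada.Text_IO.Put_Line (Join ((Usr'Access, Local'Access, Bin'Access), '/'));
end Build_Path;
```

The cost is a second pass over the parts; in exchange there is no reallocation and no dynamic string type involved.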

>> The idea of using Unbounded_String as an accumulator, e.g. for some
>> messages log, is just awful. It won't work and at the end you will need
>> to have a special data structure (text buffer) for this (with plain
>> strings as building blocks).
>
> Why won't it work?

Because it is a very inefficient data structure for this purpose.
Unbounded_Strings aren't meant to be efficient for expansion
specifically. They give an overall balanced performance for *all* string
operations. Therefore the implementation will likely use a single buffer,
reallocated in chunks if you are lucky, or on every append if you are
not, and then copied as a whole. Data structures designed specifically
for accumulation never reallocate anything and certainly never copy the
contents from the beginning. Their other string operations are either
inefficient or absent, which is exactly the point. If you need an
accumulator, use a stream, a file, or a text buffer. You do not need
strings for that.

> I've built HTML files and small reports using an unbounded string and
> it worked fine.

The keyword is "small". Unbounded_Strings signify careless design and
careless algorithm choices.

John Smith

Sep 24, 2016, 12:26:00 PM
On Saturday, September 24, 2016 at 3:52:54 AM UTC-4, Dmitry A. Kazakov wrote:
> On 2016-09-24 01:58, John Smith wrote:
> > On Thursday, September 22, 2016 at 3:25:18 AM UTC-4, Dmitry A. Kazakov wrote:
> >> On 22/09/2016 04:10, John Smith wrote:
> >>
> >>> I've found the ease with which you can append or manipulate unbounded
> >>> strings to be very convenient.
> >>
> >> 1. There is no need to append to a string in no less than 90% of cases.
> >
> > The percentage in this case depends on what application you are
> > developing. Sometimes you will need to do this more often, sometimes
> > less often.
>
> Yes, this "sometimes" is 10% of all "sometimes", or less.

10 percent of all string operations? Again, I stand by what I said about this being dependent on the application that is being developed.

>
> > If anything, a fixed string is less convenient since you need to
> > walk on eggshells and cannot simply assign a new value to the existing string.
>
> Which I never have to. If you assign strings reconsider the algorithm,
> there is certainly something wrong with it. Strings may be rearranged,
> never assigned.
>

Again, this depends on the application being developed and how the information is flowing inside it.

>
> > If you need to separate a large string into a bunch smaller ones
> > that do not have a pre-determined size, using a fixed string does not make
> > any sense.
>
> I *never* need that. It was discussed already.

If a file is read in (the format is non-standard) and you now need to sift through the details of the file, you will need this.

>
> > When you do need to build a string, it is far easier to have one
> > unbounded string that is added on to and then written out.
>
> No, it is easier with fixed strings.

How?

I've tried going to the example on your website, but it seems that that is down.

>
> > Having a
> > fixed string means that I would need something along the lines of a
> > recursive solution, since I can't extend the size of the string after
> > it's been instantiated.
>
> The output length for formatted output is always known. Normally the
> parts of a composite string are statically known, that precludes
> iteration or recursion. In other cases, like building a directory path
> from some structure, I first calculate the length and then declare the
> result string.

No, it is almost always unknown in my experience :-). If I'm putting together a report that needs to be e-mailed out, I have no idea how long it will be. I would first need to do my thing, get the string returned from it, append it to an accumulator string and after all of that was done, I can now know its length.

Otherwise I would have to run my analysis first just to get the length of each resulting string, and keep doing this until I'm finished (at which point I would be able to figure out how big my accumulator is supposed to be). Then I would have to re-run my analysis and copy the results into my newly allocated accumulator string. That would make for some needlessly complex logic in my application (as opposed to just dumping everything into an unbounded string).

If there is ever a need to go through the entire string character by character -- something that I never have to do -- I can always use the element function.

>
> >> The idea of using Unbounded_String as an accumulator, e.g. for some
> >> messages log, is just awful. It won't work and at the end you will need
> >> to have a special data structure (text buffer) for this (with plain
> >> strings as building blocks).
> >
> > Why won't it work?
>
> Because it is a very inefficient data structure for this purpose.
> Unbounded_String aren't meant to be efficient for expansion
> specifically. They give an overall balanced performance for *all* string
> operations. Therefore the implementation will likely use a single buffer
> reallocated in some chunks, if you are lucky, or each time, when you are
> not, and then copied as a whole. Data structures designed specifically
> for accumulation never reallocate anything and certainly never copy the
> contents from the beginning. They don't have other string operations
> efficient or don't have them at all, which is exactly the point. If you
> need accumulator use stream, file, text buffer. You need not strings for
> that.
>

How would you use a stream? Again, I can't get to your site, so I don't know if you have an example there or not.

And I agree with you. It does make sense to flush out logging information to a file as soon as possible. I also agree that there is a performance hit when it comes to using Unbounded strings when compared to fixed ones (as to how much depends on various factors that I'm unwilling to jump into.)

However, if I ever need string functionality that I'm used to in Python or C++, Unbounded strings are the only reasonable solution. Yes, I can have fixed-size strings do the same, but then I will have to craft needlessly complex logic to accomplish the same thing. If someone else (or me) needs to spend more time than is necessary in order to understand what I'm doing in my source, then it is an indication -- usually -- that either my design or implementation is flawed (and as we know, perception is reality, especially to humans.)

Dmitry A. Kazakov

Sep 24, 2016, 1:45:14 PM
On 2016-09-24 18:25, John Smith wrote:
> On Saturday, September 24, 2016 at 3:52:54 AM UTC-4, Dmitry A. Kazakov wrote:
>> On 2016-09-24 01:58, John Smith wrote:
>>> On Thursday, September 22, 2016 at 3:25:18 AM UTC-4, Dmitry A. Kazakov wrote:
>>>> On 22/09/2016 04:10, John Smith wrote:
>>>>
>>>>> I've found the ease with which you can append or manipulate unbounded
>>>>> strings to be very convenient.
>>>>
>>>> 1. There is no need to append to a string in no less than 90% of cases.
>>>
>>> The percentage in this case depends on what application you are
>>> developing. Sometimes you will need to do this more often, sometimes
>>> less often.
>>
>> Yes, this "sometimes" is 10% of all "sometimes", or less.
>
> 10 percent of all string operations?

10% of *applications*.

The type String does not have operation "append". There is only
concatenation operation "&".

> Again, I stand by what I said
> about this being dependent on the application that is being developed.

Yes. In all applications I developed Unbounded_String was used in far
less than 10% of the cases when strings were involved.

>>> If anything, a fixed string is less convenient since you need to
>>> walk on eggshells and cannot simply assign a new value to the existing string.
>>
>> Which I never have to. If you assign strings reconsider the algorithm,
>> there is certainly something wrong with it. Strings may be rearranged,
>> never assigned.
>
> Again, this depends on the application being developed and how the
> information is flowing inside it.

Sure. I never assign strings as a whole except for initialization purpose.

>>> If you need to separate a large string into a bunch smaller ones
>>> that do not have a pre-determined size, using a fixed string does not make
>>> any sense.
>>
>> I *never* need that. It was discussed already.
>
> If a file is read in (the format is non-standard) and you now need
> to sift through the details of the file, you will need this.

Searching for a pattern / skipping irrelevant parts of the input do not
require that. No need to split anything.

>>> When you do need to build a string, it is far easier to have one
>>> unbounded string that is added on to and then written out.
>>
>> No, it is easier with fixed strings.
>
> How?
>
> I've tried going to the example on your website, but it seems that that is down.

Hmm, it is not down:

http://www.dmitry-kazakov.de/ada/strings_edit.htm
http://www.dmitry-kazakov.de/ada/components.htm#Parsers_etc

>>> Having a
>>> fixed string means that I would need something along the lines of a
>>> recursive solution, since I can't extend the size of the string after
>>> it's been instantiated.
>>
>> The output length for formatted output is always known. Normally the
>> parts of a composite string are statically known, that precludes
>> iteration or recursion. In other cases, like building a directory path
>> from some structure, I first calculate the length and then declare the
>> result string.
>
> No, it is almost always unknown in my experience :-). If I'm putting
> together a report that needs to be e-mailed out, I have no idea how long
> it will be. I would first need to do my thing, get the string returned
> from it, append it to an accumulator string and after all of that was
> done, I can now know its length.

But a mail body is not a string! If you send a MIME attachment the
natural way to do it is to use a stream as the source. String can be
used, but for short text mails only.

I believe AWS takes streams for attachment. My implementation of SMTP
certainly does:

http://www.dmitry-kazakov.de/ada/components.htm#GNAT.Sockets.SMTP.Client.Attach_Stream

> If I were to first run my analysis, get the length of the string
> (which is the result that will go out), then keep doing this until I'm
> finished (at which point I will be able to figure out how big my
> accumulator is supposed to be.) Now, I would have to re-run my analysis
> again and then copy in the results into the my newly allocated
> accumulator string. That would make for some needlessly complex logic in
> my application (as opposed to just dump everything to an unbounded string.)

I would use a stream on top of a segmented buffer, e.g.

http://www.dmitry-kazakov.de/ada/components.htm#Storage_Streams

Or a FIFO if the producer of the content and the SMTP client are run by
different tasks.

You don't need to build the whole attachment in memory in order to be
able to send it.

It is a pity that nobody cares about the proper organization of text
processing these days, even here in c.l.a.

If you, like me, have a 2GB+ text trace and text editors written by
people believing in "splitting strings", then you have to wait 20
minutes before the editor shows the file contents, the first 40 lines. If
it does not crash before that and take the OS with it. Guess why?

> However, if I ever need string functionality that I'm used to in
> Python or C++, Unbounded strings are the only reasonable solution.

You are used to dynamic strings. But there is no evidence they are really
simpler to use. In my code, which varies from network protocols to
compiler development tools and AI, you will find almost no
Unbounded_Strings. Would it be simpler with Unbounded_Strings? I doubt it.

John Smith

Sep 24, 2016, 2:37:17 PM
It seems that the wifi that I'm on is timing out the connection to your entire domain. Yay for parental controls...

Very annoying.

John Smith

Sep 24, 2016, 2:59:59 PM
On Saturday, September 24, 2016 at 1:45:14 PM UTC-4, Dmitry A. Kazakov wrote:
> >>> If you need to separate a large string into a bunch smaller ones
> >>> that do not have a pre-determined size, using a fixed string does not make
> >>> any sense.
> >>
> >> I *never* need that. It was discussed already.
> >
> > If a file is read in (the format is non-standard) and you now need
> > to sift through the details of the file, you will need this.
>
> Searching for a pattern / skipping irrelevant parts of the input do not
> require that. No need to split anything.
>

Then how would you accomplish this?

> >>> When you do need to build a string, it is far easier to have one
> >>> unbounded string that is added on to and then written out.
> >>
> >> No, it is easier with fixed strings.
> >
> > How?
> >
> > I've tried going to the example on your website, but it seems that that is down.
>
> Hmm, it is not down:
>
> http://www.dmitry-kazakov.de/ada/strings_edit.htm
> http://www.dmitry-kazakov.de/ada/components.htm#Parsers_etc
>

I'll have a look when I get home.

> >>> Having a
> >>> fixed string means that I would need something along the lines of a
> >>> recursive solution, since I can't extend the size of the string after
> >>> it's been instantiated.
> >>
> >> The output length for formatted output is always known. Normally the
> >> parts of a composite string are statically known, that precludes
> >> iteration or recursion. In other cases, like building a directory path
> >> from some structure, I first calculate the length and then declare the
> >> result string.
> >
> > No, it is almost always unknown in my experience :-). If I'm putting
> > together a report that needs to be e-mailed out, I have no idea how long
> > it will be. I would first need to do my thing, get the string returned
> > from it, append it to an accumulator string and after all of that was
> > done, I can now know its length.
>
> But a mail body is not a string! If you send a MIME attachment the
> natural way to do it is to use a stream as the source. String can be
> used, but for short text mails only.
>
> I believe AWS takes streams for attachment. My implementation of SMTP
> certainly does:
>
> http://www.dmitry-kazakov.de/ada/components.htm#GNAT.Sockets.SMTP.Client.Attach_Stream
>

Sure it is. This is especially true when it comes to sending out a string e-mail (and a bunch of HTML can be put on a single line.) But the e-mail example is just one that I could think of on a dime.

I'll have a look at what you did when I get home.

> > If I were to first run my analysis, get the length of the string
> > (which is the result that will go out), then keep doing this until I'm
> > finished (at which point I will be able to figure out how big my
> > accumulator is supposed to be.) Now, I would have to re-run my analysis
> > again and then copy in the results into the my newly allocated
> > accumulator string. That would make for some needlessly complex logic in
> > my application (as opposed to just dump everything to an unbounded string.)
>
> I would use a stream on top of a segmented buffer, e.g.
>
> http://www.dmitry-kazakov.de/ada/components.htm#Storage_Streams
>
> Or a FIFO if the producer of the content and the SMTP client are run by
> different tasks.
>
> You don't need to build all attachment in the memory in order to be able
> to send it.
>

I disagree. If I write out the attachment to disk and then e-mail it, that's even more overhead. Why do this? Create a neat package in memory, e-mail it out and then forget about it. Doing this operation in memory would be easier and faster. And I don't have to worry about temporary files or any other files (as well as deleting the files, testing this additional logic, etc.)

> > However, if I ever need string functionality that I'm used to in
> > Python or C++, Unbounded strings are the only reasonable solution.
>
> You used to use dynamic strings. But there is no evidence they really
> simpler to use. In my code which varies from network protocols to
> compiler developing tools and AI you will find almost no
> Unbounded_Strings. Would it be simpler with Unbounded_Strings? I doubt it.
>

When I get home, I'll have a look at what you've written.

Dmitry A. Kazakov

Sep 25, 2016, 4:50:13 AM
On 2016-09-24 20:59, John Smith wrote:
> On Saturday, September 24, 2016 at 1:45:14 PM UTC-4, Dmitry A. Kazakov wrote:

>> Searching for a pattern / skipping irrelevant parts of the input do not
>> require that. No need to split anything.
>
> Then how would you accomplish this?

In a loop over the string. There are many algorithms for pattern
matching and many pattern languages, e.g. wildcards, regular
expressions, SNOBOL/SPITBOL etc.
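For the simplest case, a loop over the string could look roughly like this sketch. It scans a PATH-like value for entries containing a given text and never builds a set of substrings. The sample value and the names Scan_Path and Wanted are assumptions for illustration; the Index overload with a From parameter requires Ada 2005 or later.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Scan_Path is
   --  Hypothetical input; a real program might obtain it from
   --  Ada.Environment_Variables.Value ("PATH").
   Path   : constant String := "C:\Windows;C:\Tools\bin;D:\Ada\bin";
   Wanted : constant String := "bin";

   First : Positive := Path'First;
   Last  : Natural;
begin
   while First <= Path'Last loop
      --  Index returns 0 when there is no further separator.
      Last := Ada.Strings.Fixed.Index (Path, ";", From => First);
      if Last = 0 then
         Last := Path'Last + 1;
      end if;

      --  Path (First .. Last - 1) is a slice of the input, not an element
      --  stored in a container; it is examined and then forgotten.
      if Ada.Strings.Fixed.Index (Path (First .. Last - 1), Wanted) > 0 then
         Ada.Text_IO.Put_Line (Path (First .. Last - 1));
      end if;

      First := Last + 1;
   end loop;
end Scan_Path;
```

With the sample value this prints the second and third entries, since both contain "bin".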

BTW splitting does not work anyway if the search is not anchored, e.g. you need
to find the first occurrence of a pattern.

>> I would use a stream on top of a segmented buffer, e.g.
>>
>> http://www.dmitry-kazakov.de/ada/components.htm#Storage_Streams
>>
>> Or a FIFO if the producer of the content and the SMTP client are run by
>> different tasks.
>>
>> You don't need to build all attachment in the memory in order to be able
>> to send it.
>
> I disagree. If I write out the attachment to disk and then e-mail
> it, that's even more overhead. Why do this? Create a neat package in memory,
> e-mail it out and then forget about it. Doing this operation in memory
> would be easier and faster. And I don't have to worry about temporary
> files or any other files (as well as deleting the files, testing this
> additional logic, etc.)

No, the idea is that the generator of the attachment sends parts of it as
soon as they are ready. It is a pipeline. The SMTP stack sits on a TCP/IP
stream that sends chunks away.

The stream used as a pipeline between the producer and the consumer need
not be able to hold the whole attachment. Its internal FIFO buffer can be
limited: the producer is blocked when the buffer is full, and the consumer
is blocked when the buffer is empty.
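The blocking behaviour described here can be sketched with a protected object whose entry barriers do the waiting. This is only an illustration of the idea, not the stream implementation referred to above: the capacity of 4, the NUL end marker and all names are assumptions, and Ada 2012 is assumed for the "for ... of" loop.

```ada
with Ada.Text_IO;

procedure Pipeline is
   Capacity : constant := 4;  --  deliberately small, for illustration
   type Slot_Index is mod Capacity;
   type Slots is array (Slot_Index) of Character;

   End_Mark : constant Character := Character'Val (0);  --  hypothetical marker
   Message  : constant String := "generated piece by piece";
   Next     : Character;

   protected FIFO is
      entry Put (Item : in Character);   --  blocks while the buffer is full
      entry Get (Item : out Character);  --  blocks while the buffer is empty
   private
      Data  : Slots;
      Head  : Slot_Index := 0;
      Tail  : Slot_Index := 0;
      Count : Natural := 0;
   end FIFO;

   protected body FIFO is
      entry Put (Item : in Character) when Count < Capacity is
      begin
         Data (Tail) := Item;
         Tail  := Tail + 1;   --  modular arithmetic wraps around
         Count := Count + 1;
      end Put;

      entry Get (Item : out Character) when Count > 0 is
      begin
         Item  := Data (Head);
         Head  := Head + 1;
         Count := Count - 1;
      end Get;
   end FIFO;

   task Producer;

   task body Producer is
   begin
      for Ch of Message loop
         FIFO.Put (Ch);
      end loop;
      FIFO.Put (End_Mark);
   end Producer;
begin
   --  The main procedure plays the consumer.
   loop
      FIFO.Get (Next);
      exit when Next = End_Mark;
      Ada.Text_IO.Put (Next);
   end loop;
   Ada.Text_IO.New_Line;
end Pipeline;
```

The message is longer than the buffer, so the producer is suspended several times, yet the whole text never has to exist in the buffer at once.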

But my point is that Unbounded_String is almost always a poor substitute
for more clever and efficient data structures / algorithms. The only
excuse to use the former is programmer's laziness.

brbar...@gmail.com

Sep 25, 2016, 7:35:34 PM
I use bounded strings in all of my applications. They're useful for avoiding
buffer overflows. Also, I use the Bounded_Strings.Append function for a variety
of purposes. I expect Dmitry wouldn't like putting Bounded_Strings into
Direct_IO files, where they're useful for fields in records that act like
variable length character strings in databases. My applications are probably
outside of the realm that Dmitry usually encounters.

Bruce B.

Dmitry A. Kazakov

Sep 26, 2016, 3:29:20 AM
On 26/09/2016 01:35, brbar...@gmail.com wrote:

> I expect Dmitry wouldn't like putting Bounded_Strings into
> Direct_IO files, where they're useful for fields in records that act like
> variable length character strings in databases.

Surely not. Bounded string is a private type (A.4.4).

You *never* use such for any external activity, that would make your
program non-portable and thus the DB too.

Therefore a text field of Direct_IO element record should always be a
subtype of String and never bounded or unbounded string.
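A sketch of what such an element record might look like, with the text kept in a constrained String subtype plus an explicit length. The record layout, the field width of 32 and the file name are assumptions made for the example; the representation of the record on disk is still implementation-defined unless further representation clauses are added.

```ada
with Ada.Direct_IO;
with Ada.Text_IO;

procedure Direct_IO_Record is
   subtype Name_Text is String (1 .. 32);

   type Person is record
      Name     : Name_Text := (others => ' ');
      Name_Len : Natural range 0 .. Name_Text'Length := 0;
      Age      : Natural := 0;
   end record;

   package Person_IO is new Ada.Direct_IO (Person);

   function Make (Name : String; Age : Natural) return Person is
      Result : Person;
   begin
      --  Raises Constraint_Error if Name is longer than the field.
      Result.Name (1 .. Name'Length) := Name;
      Result.Name_Len := Name'Length;
      Result.Age := Age;
      return Result;
   end Make;

   File : Person_IO.File_Type;
   Item : Person;
begin
   --  "people.dat" is a hypothetical file name.
   Person_IO.Create (File, Person_IO.Inout_File, "people.dat");
   Person_IO.Write (File, Make ("Ada", 36));
   Person_IO.Write (File, Make ("Charles", 79));

   --  Random access: read the second element directly.
   Person_IO.Read (File, Item, From => 2);
   Ada.Text_IO.Put_Line (Item.Name (1 .. Item.Name_Len));

   Person_IO.Delete (File);
end Direct_IO_Record;
```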

BTW, in many cases it is better to use Direct_IO rather as a block I/O
medium. You never know if all application level objects would fit into a
single block. My implementation of external B-trees keeps tree nodes in
Direct_IO blocks with indices pointing to the blobs spread over other
blocks arranged as a persistent memory pool. So there is no limitation
to how large an item could be and no space loss of underfilled blocks.

brbar...@gmail.com

Sep 26, 2016, 8:39:06 AM
I wouldn't copy a Direct_IO file from my Windows machine to my
Linux box and expect to use it without reconstructing the file. But then
I'm not working in a corporate environment with heterogeneous systems.
I've got enough problems with getting the math in the algorithms to work
properly, whether doing critical path calculations or Bayesian analysis
of user properties.

Thanks for the interest, even so.

Bruce B.

Randy Brukardt

Sep 28, 2016, 4:55:39 PM
"Maciej Sobczak" <see.my....@gmail.com> wrote in message
news:625e3d31-1a45-4308...@googlegroups.com...

>Sorry Dmitry, but I don't think that this discussion is interesting for
>anybody else here.

I find it interesting, to see you defend the indefensible. ;-)

Needless to say, I agree with Dmitry. For instance, Split (in terms of a
string type, which is what is being talked about) is an operation that takes
one string and produces several. (Getting substrings is a different
operation altogether.) You very rarely want an actual Split operation (which
is all Dmitry said initially); you're much more likely to use some sort of
substring of the original string.

Ada bounded strings are nearly useless; as Dmitry said, you rarely have a
hard bound and the fact that all of the objects of the type have to have the
same bound just compounds the problem. Just forget them and use either
Unbounded_Strings or String.

And the Ada.Strings packages are filled with cool subprograms that are
almost never used. Even if you remember one exists, you have to go look it
up to figure out the name and parameters and usage -- by that time, you
could have written the operation yourself with basic operations (admittedly,
with a slightly higher chance of introducing bugs).

I wrote my spam filter exclusively with unbounded strings in part to
demonstrate how much easier that would be. Needless to say, that failed
miserably. I did get the experience I was looking for, but it was mostly
bad. :-)

There is one use for unbounded strings that comes up in the spam filter that
probably couldn't have been usefully programmed any other way. One wants to
search the text of messages for problem phrases unencumbered by irrelevant
stuff (like encodings and line endings). I did that by putting the decoded
(and stripped of HTML markup) text into an unbounded string, and then searching
that. That is way cheaper than searching multiple lines and somehow allowing
blanks to be matched by line ends as well as separately dealing with
decoding and markup.
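
A rough sketch of that approach, not the actual spam filter code: the helper Add_Line and the sample phrases are hypothetical. Decoded lines are appended to one Unbounded_String with a single blank in place of each line end, so a phrase that straddles a line break is found by one call to Index.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Search_Sketch is
   Text : Unbounded_String;

   --  Hypothetical helper: join already-decoded lines with one blank
   --  so that line ends behave like ordinary spaces when searching.
   procedure Add_Line (Line : String) is
   begin
      if Length (Text) > 0 then
         Append (Text, ' ');
      end if;
      Append (Text, Line);
   end Add_Line;
begin
   Add_Line ("Claim your free");
   Add_Line ("prize today");

   --  The phrase is split across two input lines but is contiguous
   --  in the flattened text.
   if Index (Text, "free prize") > 0 then
      Ada.Text_IO.Put_Line ("problem phrase found");
   else
      Ada.Text_IO.Put_Line ("no match");
   end if;
end Search_Sketch;
```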

Still, this seems to be a rare case; I've never encountered anything else
like it. Most of the operations I encounter would be better off written (in
Ada, no one here should care about any other language. ;-) using plain
String. The extra work is usually not that much (and most of the cool
operations are available for plain String in Ada.Strings.Fixed, in case one
is exactly what is needed - I use Index and Find_Token a lot).
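
As an illustration of working with substrings of a plain String instead of a true Split, here is a minimal sketch using Ada.Strings.Fixed.Find_Token. The sample value of Path is made up and merely stands in for a Windows PATH; each token is handled as a slice of the original string, and no separate list of strings is built.

```ada
with Ada.Strings;        use Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Strings.Maps;
with Ada.Text_IO;

procedure Token_Sketch is
   --  Hypothetical input standing in for a Windows PATH value.
   Path : constant String := "C:\Windows;C:\Tools\bin;D:\Ada";
   Sep  : constant Ada.Strings.Maps.Character_Set :=
     Ada.Strings.Maps.To_Set (';');

   From  : Positive := Path'First;
   First : Positive;
   Last  : Natural;
begin
   while From <= Path'Last loop
      --  Find the next run of characters that are not separators.
      Ada.Strings.Fixed.Find_Token
        (Source => Path (From .. Path'Last),
         Set    => Sep,
         Test   => Outside,
         First  => First,
         Last   => Last);
      exit when Last = 0;  --  no further token

      --  First .. Last index directly into Path.
      Ada.Text_IO.Put_Line (Path (First .. Last));
      From := Last + 1;
   end loop;
end Token_Sketch;
```

Matching the user's input against each path would then be a call to Ada.Strings.Fixed.Index on the same slice.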

Randy.


Randy Brukardt

Sep 28, 2016, 5:09:39 PM
"John Smith" <yoursurr...@gmail.com> wrote in message
news:e60176b6-4c61-474a...@googlegroups.com...
On Saturday, September 24, 2016 at 3:52:54 AM UTC-4, Dmitry A. Kazakov
wrote:
...
> However, if I ever need string functionality that I'm used to in Python or
> C++,

Ada doesn't have it (sadly).

I used to tell people to use Unbounded_Strings, but that was before I tried
to build an application that way. (I discussed that further in a different
message in this thread). You have to revert to String in many places (which
is aggravating; Ada 2005 did eliminate a couple of them, but it doesn't help
much), one has to look up the names of the operations beyond the most basic,
and if you are a sensible use-averse person (:-)), the names you have to use
are nasty (Ada.Strings.Unbounded.To_Unbounded_String ("Foo") -- gack!!!).

To get that Python functionality in Ada, you have to be willing to stand on
your head, and once you are willing to do that, you might as well use String
for most everything other than storage of variable-length strings. (And
it's not that hard to do the latter for String as well, but that gets into
manual storage management that everyone gets wrong.) All of the operations
of Unbounded_String are also available in Ada.Strings.Fixed (unless, of
course, they're builtin operations like indexing and slicing), so there's no
need to reinvent most of those wheels.
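
One common way to soften the naming problem, shown here only as a sketch: the package rename SU and the two "+" operators are local conveniences chosen for this example, not anything the standard library defines. Storage uses Unbounded_String, while the actual work (here Index from Ada.Strings.Fixed) is done on a plain String obtained with To_String.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Naming_Sketch is
   package SU renames Ada.Strings.Unbounded;

   --  Short conversion operators (a local convention, assumed here).
   function "+" (S : String) return SU.Unbounded_String
     renames SU.To_Unbounded_String;
   function "+" (U : SU.Unbounded_String) return String
     renames SU.To_String;

   --  Variable-length storage without knowing the length in advance.
   Stored : SU.Unbounded_String := +"C:\Tools\bin";
begin
   SU.Append (Stored, "\gnat");

   declare
      --  A String constant takes its bounds from the initial value,
      --  so no fixed length has to be declared up front.
      Text : constant String := +Stored;
      Pos  : constant Natural := Ada.Strings.Fixed.Index (Text, "\bin");
   begin
      Ada.Text_IO.Put_Line (Text);
      Ada.Text_IO.Put_Line ("pattern found at" & Natural'Image (Pos));
   end;
end Naming_Sketch;
```

The declare block is also the usual answer to the question that started this thread: an Unbounded_String is converted with To_String into a constant whose length is fixed at the point of declaration.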

Randy.

Björn Lundin

Sep 30, 2016, 3:59:25 AM
On 2016-09-28 23:09, Randy Brukardt wrote:
> "John Smith" <yoursurr...@gmail.com> wrote in message
> news:e60176b6-4c61-474a...@googlegroups.com...
> On Saturday, September 24, 2016 at 3:52:54 AM UTC-4, Dmitry A. Kazakov
> wrote:
> ...
>> However, if I ever need string functionality that I'm used to in Python or
>> C++,
>
> Ada doesn't have it (sadly).
>
> I used to tell people to use Unbounded_Strings, but that was before I tried
> to build an application that way. (I discussed that further in a different
> message in this thread). You have to revert to String in many places (which
> is aggravating; Ada 2005 did eliminate a couple of them, but it doesn't help
> much), one has to look up the names of the operations beyond the most basic,
> and if you are a sensible use-averse person (:-)), the names you have to use
> are nasty (Ada.Strings.Unbounded.To_Unbounded_String ("Foo") -- gack!!!).
>

Recently I needed to process xml files, and do a bit of string
manipulation with the results. I agree that Unbounded_String is clumsy
(name-wise) and that Ada.Strings.Fixed has useful operations.
I ended up wrapping what I need in a new object - String_Object -
with naming I know will trigger people here :-)


with Ada.Strings;           use Ada.Strings;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

package Repository_Types is

   type String_Object is tagged private;

   procedure Set (Self : in out String_Object; What : String);
   procedure Reset (Self : in out String_Object);
   function Fix_String (Self : String_Object) return String;
   function UBString (Self : String_Object) return Unbounded_String;
   function Lower_Case (Self : String_Object) return String;
   function Upper_Case (Self : String_Object) return String;
   function Empty_String_Object return String_Object;
   procedure Append (Self : in out String_Object; What : String);
   function Camel_Case (Self : String_Object) return String;
   procedure Delete_Last_Char (Self : in out String_Object);

   function "<" (Left, Right : String_Object) return Boolean;
   function "=" (Left, Right : String_Object) return Boolean;
   function ">" (Left, Right : String_Object) return Boolean;

private

   type String_Object is tagged record
      Value,
      Camel_Case_Cache,
      Lower_Case_Cache,
      Upper_Case_Cache : Unbounded_String := Null_Unbounded_String;
   end record;

end Repository_Types;


To me, it turned out to be quite useful.
--
Björn