I'm trying to learn Ada, coming from a C background. The first thing I
planned to code is an S-expression parser, because it's quite easy (at
least in C, less than 1000 lines including comments and memory
handling (dynamic arrays reinvented)) and very useful considering
almost all my existing programs use S-expressions as a serialization
format.
To describe briefly S-expressions, I consider them to be the simplest
existing data organization beyond raw sequences of bits. They are
basically lists of elements, each element being either a list or an
atom, and atoms being raw sequences of bits.
While I'm still not deep enough into Ada to know how to represent the
lists, I guess there won't be major issues, I think I can handle it
myself (though pointers and hints would still be welcome).
My question here is about how to represent the atoms in Ada. In C it
was merely a void pointer and a size, but it seems more difficult in
Ada because of the strong typing. Because it's up to the application
to make sense (i.e. type) out of the raw sequences of bits in atoms,
the S-expression library has to handle them as a sort of untyped
memory chunk. Do you know of a way to handle that?
Please correct me if I'm wrong, but my guess would be that the
S-expression library would provide a Sexp_Atom type, which refers to
the untyped memory chunks, and the application would have procedures
to convert back and forth between Sexp_Atom and whatever types it
internally uses (i.e. serialization and deserialization procedures).
However the library would provide these procedures for the most common
types (e.g. strings and numeric types).
Though it looks like a fine Ada API (at least to my eyes), I have
absolutely no idea about how to implement the library. How to define
the application-opaque Sexp_Atom type? How to read Sexp_Atom objects
from a file? How to build them from Ada types? How to write them back
to disk?
Thanks for your help,
Natacha
> To describe briefly S-expressions, I consider them to be the simplest
> existing data organization beyond raw sequences of bits. They are
> basically lists of elements, each element being either a list or an
> atom, and atoms being raw sequences of bits.
So, it is a tree?
> While I'm still not deep enough into Ada to know how to represent the
> lists, I guess there won't be major issues, I think I can handle it
> myself (though pointers and hints would still be welcome).
>
> My question here is about how to represent the atoms in Ada. In C it
> was merely a void pointer and a size, but it seems more difficult in
> Ada because of the strong typing. Because it's up to the application
> to make sense (i.e. type) out of the raw sequences of bits in atoms,
How can it make sense if the type is unknown? If the type is known, why
not state it?
> the S-expression library has to handle them as a sort of untyped
> memory chunk. Do you know of a way to handle that?
There are many ways to do it.
1. Static polymorphism, generics in Ada. The type of the leaves is the
formal parameter of the package.
2. Dynamic polymorphism.
2.a. The type of a leaf is class-wide; each leaf is derived from some
abstract base type. This requires a referential approach, i.e. pointers.
2.b. The type of a leaf is a variant type. This is more limiting, but can
be by-value.
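To make option 1 concrete, here is a minimal sketch (not from the thread; all names are illustrative) of a generic package whose leaf type is a formal parameter supplied by the application:

```ada
generic
   --  The application supplies whatever leaf type it likes,
   --  plus the operations the library needs on it.
   type Leaf_Type is private;
   with function Image (Leaf : Leaf_Type) return String;
package Generic_Sexp is

   type Node;
   type Node_Access is access Node;

   type Node_Kind is (Atom_Node, List_Node);

   --  Each node is either an atom carrying a leaf, or a sublist.
   type Node (Kind : Node_Kind := Atom_Node) is record
      Next : Node_Access;           --  next sibling in the list
      case Kind is
         when Atom_Node =>
            Leaf : Leaf_Type;
         when List_Node =>
            Child : Node_Access;    --  first element of the sublist
      end case;
   end record;

   function Render (N : Node) return String;
   --  Would render the node and its siblings as S-expression text,
   --  using the formal Image function for atoms.

end Generic_Sexp;
```

The price of this approach is that every instantiation fixes one leaf type for the whole tree, which is why the dynamic options below may fit mixed atoms better.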
> Please correct me if I'm wrong, but my guess would be that the
> S-expression library would provide a Sexp_Atom type, which refers to
> the untyped memory chunks, and the application would have procedures
> to convert back and forth between Sexp_Atom and whatever types it
> internally uses (i.e. serialization and deserialization procedures).
I would not do that.
> However the library would provide these procedures for the most common
> types (e.g. strings and numeric types).
>
> Though it looks like a fine Ada API (at least to my eyes), I have
> absolutely no idea about how to implement the library. How to define
> the application-opaque Sexp_Atom type?
It is not clear why you need such a type. What are its properties, and
what is it for, given that the application needs to convert it anyway?
As for raw memory addresses, it is possible to reinterpret them to any
desired type in Ada, as you would in C, provided you know what you are
doing. For this you can use the so-called address-to-access conversion
(see RM 13.7.2) or the placement attribute 'Address (see RM 13.3(11)).
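As an illustration of the 'Address overlay mentioned above (a sketch, not production code: the printed value depends on the machine's endianness, and a real library must also mind alignment):

```ada
with Ada.Text_IO;
with Interfaces;
with System.Storage_Elements;

procedure Overlay_Demo is
   use System.Storage_Elements;

   --  Four raw storage elements, e.g. as read from a file.
   Raw : constant Storage_Array (1 .. 4) :=
     (16#01#, 16#00#, 16#00#, 16#00#);

   --  Reinterpret the raw bytes as a 32-bit unsigned integer by
   --  placing an object at the same address (RM 13.3(11)).
   Value : Interfaces.Unsigned_32;
   for Value'Address use Raw'Address;
   pragma Import (Ada, Value);  --  suppress default initialization
begin
   --  Prints 1 on a little-endian machine, 16777216 on a big-endian one.
   Ada.Text_IO.Put_Line (Interfaces.Unsigned_32'Image (Value));
end Overlay_Demo;
```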
> How to read Sexp_Atom objects from a file?
See stream I/O attributes of types (RM 13.13.2). Otherwise, standard design
patterns apply too. In any case you have to know the object's type in order
to write/read it.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
In Ada, you normally model blobs with
System.Storage_Elements.Storage_Array; since arrays are first-class
citizens (as opposed to C's void pointers), you do not need to carry the
length of such an array separately. Thus, a naive approach might be:
type Sexp_Atom is access System.Storage_Elements.Storage_Array;

type Sexp;
type Sexp_Access is access Sexp;
type Sexp is record
   Car : Sexp_Atom;
   Cdr : Sexp_Access;
end record;
However, since the purpose of S-expressions is to be read and written
as text, a blob may not be the most appropriate representation; you
might be better off with simply:
type Sexp;
type Sexp_Access is access Sexp;
type Sexp is record
   Car     : Ada.Strings.Unbounded.Unbounded_String;
   Cdr     : Sexp_Access;
   Is_List : Boolean;
end record;
To write a sexp to disk and read it back, you would leverage Ada
streams as Dmitry pointed out.
You could then provide a generic package that serializes an arbitrary
type T back and forth to the Unbounded_String.
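Such a generic conversion package might look roughly like this (a sketch under invented names; the formal To_Text/From_Text functions are my assumptions, not an established API):

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

generic
   type T is private;
   --  The application says how T looks as text; the package only
   --  shuttles that text in and out of the atom representation.
   with function To_Text (Item : T) return String;
   with function From_Text (Image : String) return T;
package Sexp_Conversions is
   function To_Atom (Item : T) return Unbounded_String;
   function From_Atom (Atom : Unbounded_String) return T;
end Sexp_Conversions;

package body Sexp_Conversions is

   function To_Atom (Item : T) return Unbounded_String is
   begin
      return To_Unbounded_String (To_Text (Item));
   end To_Atom;

   function From_Atom (Atom : Unbounded_String) return T is
   begin
      return From_Text (To_String (Atom));
   end From_Atom;

end Sexp_Conversions;
```

For scalar types the attribute functions can serve as actuals, e.g. `package Int_Conv is new Sexp_Conversions (Integer, Integer'Image, Integer'Value);` (note that 'Image yields a leading space for non-negative values).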
--
Ludovic Brenta.
Yes, it can be thought of as a binary tree whose leaves are labeled
(the atom being the label) and with different semantics for the left
and right children. I'm unsure this correctly represents empty lists,
but otherwise that's it.
> > My question here is about how to represent the atoms in Ada. In C it
> > was merely a void pointer and a size, but it seems more difficult in
> > Ada because of the strong typing. Because it's up to the application
> > to make sense (i.e. type) out of the raw sequences of bits in atoms,
>
> How can it make sense if the type is unknown? If the type is known, why
> not state it?
Actually the type is deduced from the context, e.g.
(tcp-connect (host foo.example) (port 80))
We've got here five atoms and three lists, and the application reading
this as a configuration file would know that the atom after "host" is
a string to be resolved and the atom after "port" is a number
serialized in a decimal representation. However this serialization of
a number is an application choice (which I usually make so that
S-expressions are text files), but it could have been serialized as a
network-ordered 2-byte integer.
This means that as far as the S-expression library is concerned, the
byte sequence read from the file is not typed. Its typing actually has
to be delayed until the application has enough context to interpret
it. Hence the need for a typeless chunk of data.
> > Though it looks like a fine Ada API (at least to my eyes), I have
> > absolutely no idea about how to implement the library. How to define
> > the application-opaque Sexp_Atom type?
>
> It is not clear why you need such a type. What are its properties, and
> what is it for, given that the application needs to convert it anyway?
Well, I imagined making such a new type to distinguish 'raw data
coming from an S-expression to be interpreted' from other raw data
types.
I thought this was the kind of strong type safety that prevents a
(de)serialization procedure from trying to interpret just any chunk of
memory.
> As for raw memory addresses, it is possible to reinterpret them to any
> desired type in Ada, as you would in C, provided you know what you are
> doing. For this you can use the so-called address-to-access conversion
> (see RM 13.7.2) or the placement attribute 'Address (see RM 13.3(11)).
Ok, thanks for these references, I guess I'll find there how many
guarantees there are on such raw memory inspection/interpretation.
I've already encountered a few situations in C where the code looks
very dangerous but where I know perfectly well what I'm doing (I also
sort-of expected it would be easier to convince other people that I
really know what's going on and what I'm doing in such cases when the
code is in Ada rather than in C).
You might very well be able to use something like:
package Byte_Lists is new Ada.Containers.Vectors
  (Index_Type   => Positive,
   Element_Type => System.Storage_Elements.Storage_Element);

type Content_ID is (Atom, List);

type S_Expression;
type S_Expression_Ptr is access all S_Expression;

type S_Expression_Element (Content : Content_ID := Atom) is record
   case Content is
      when Atom =>
         Byte : Byte_Lists.Vector;
      when List =>
         Ptr : S_Expression_Ptr;
   end case;
end record;

package S_Expression_Lists is new Ada.Containers.Doubly_Linked_Lists
  (Element_Type => S_Expression_Element);

type S_Expression is new S_Expression_Lists.List with null record;
If you can use unbounded strings as Brenta suggested, instead of an unbounded
array of bytes (Storage_Element), then this would be even simpler.
--
Jeff Carter
"People called Romanes, they go the house?"
Monty Python's Life of Brian
79
> On Aug 1, 2:53 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> How can it make sense if the type is unknown? If the type is known,
>> why not state it?
>
> Actually the type is deduced from the context, e.g.
> (tcp-connect (host foo.example) (port 80))
Hmm, why do you consider brackets as separate elements? I mean, a more
natural form would be "procedural":
tcp-connect (host (foo.example), port (80))
or
tcp-connect (foo.example, 80)
> This means that as far as the S-expression library is concerned, the byte
> sequence read from the file is not typed. Its typing actually has to be
> delayed until the application has enough context to interpret it. Hence
> the need of a typeless chunk of data.
The above is mere syntax, it is not clear why internal/external
representation must be as you described. (Actually, the structures as above
are widely used in compiler construction e.g. syntax tree, Reverse Polish
notation etc.)
There is no such thing as untyped data. The information about the type must
be somewhere. In your example it could be:
type Connect is new Abstract_Node with record
   Host : IP_Address;
   Port : Port_Type;
end record;
So I would derive leaves from some abstract base type and link them into an
acyclic graph. It is relatively simple to do using either standard Ada
containers or other libraries containing trees and/or graphs.
>>> Though it looks like a fine Ada API (at least to my eyes), I have
>>> absolutely no idea about how to implement the library. How to define
>>> the application-opaque Sexp_Atom type?
>>
>> It is not clear why you need such a type. What are its properties, and
>> what is it for, given that the application needs to convert it anyway?
>
> Well, I imagined making such a new type to distinguish 'raw data coming
> from a S-expression to be interpreted' from other raw data types.
But you know the type if you are going to interpret raw data. It must be
somewhere, hard-coded or dynamically stored/restored.
BTW, Ada supports I/O of types (i.e. type tags) implicitly in the form
of the already mentioned stream attributes, but also explicitly by
external type tag representations (RM 3.9(7/2)) and generic
dispatching constructors (RM 3.9(18/1)).
> I thought this was the strong type safety to prevent (de)serialization
> procedure from trying to interpret just any chunk of memory.
No, actually it eases serialization because you can define
Serialize/Unserialize operations on the type.
Thanks a lot for the example, it really looks like what I'm used to
(at least in C).
I have to admit I don't really grasp the difference between the vector
you use and the Storage_Array, but it seems I have to research it by
myself before asking here.
> If you can use unbounded strings as Brenta suggested, instead of an unbounded
> array of bytes (Storage_Element), then this would be even simpler.
Actually I'm a bit reluctant to use Ada strings for atoms, because I'm
afraid it might somehow interpret the data (e.g. locale or encoding
related stuff, or trouble with NUL inherited from C). I occasionally
include binary data in S-expressions, and while I keep it on the disk
as a text file (using hexadecimal or base-64 encoding), it is the
S-expression library's responsibility to load it into memory as the
original binary data.
On the other hand, most of my atoms are indeed strings, and the
Character definition from A.1.35 looks exactly like a perfect mapping
to bytes. So if Ada strings have no issue with embedded NUL or
non-graphic characters, and if binary data can be losslessly recovered
once stored into an Ada string, it could be the best type for atoms.
It will probably be a while before I reach the level of reading the
Reference Manual cover-to-cover; does anyone know whether those "if"s
are guaranteed by the standard?
The vector manages its own memory; you can grow and shrink it at will.
With a Storage_Array you must do that manually, by allocating,
reallocating and freeing as needed.
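A small illustration of the difference (a hypothetical demo; the pragma Assert only fires when assertions are enabled):

```ada
with Ada.Containers.Vectors;
with System.Storage_Elements; use System.Storage_Elements;

procedure Grow_Demo is
   package Byte_Vectors is new Ada.Containers.Vectors
     (Index_Type   => Positive,
      Element_Type => Storage_Element);

   V : Byte_Vectors.Vector;
begin
   --  The vector grows on demand; no manual allocation, reallocation
   --  or deallocation is needed, unlike an access-to-Storage_Array.
   for I in 1 .. 100 loop
      V.Append (Storage_Element (I mod 256));
   end loop;
   pragma Assert (Integer (V.Length) = 100);
end Grow_Demo;
```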
>> If you can use unbounded strings as Brenta suggested, instead of an
>> unbounded array of bytes (Storage_Element), then this would be even
>> simpler.
>
> Actually I'm a bit reluctant to use Ada strings for atoms, because I'm
> afraid it might somehow interpret the data (e.g. locale or encoding
> related stuff, or trouble with NUL inherited from C).
No, they won't. Ada does not need NUL since it has proper arrays.
> I occasionally include binary data in S-expressions, and while I keep
> it on the disk as a text file (using hexadecimal or base-64 encoding),
> it is the S-expression library's responsibility to load it into memory
> as the original binary data.
I would still store the S-expressions in memory as Unbounded_Strings,
read straight from the file (i.e. in the hexadecimal or base-64
encoding). Then I would provide conversion subprograms for each data
type as needed on top of that.
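For instance, a conversion pair for a port number stored as a decimal text atom might look like this (a sketch; Port_Type and the subprogram names are invented for illustration):

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

package Port_Conversions is
   type Port_Type is range 0 .. 65_535;

   --  Convert between the textual atom as stored in memory and the
   --  typed value the application works with.
   function To_Atom (Port : Port_Type) return Unbounded_String;
   function From_Atom (Atom : Unbounded_String) return Port_Type;
end Port_Conversions;

package body Port_Conversions is

   function To_Atom (Port : Port_Type) return Unbounded_String is
      --  'Image yields a leading space for non-negative values;
      --  trim it for a canonical representation.
      Image : constant String := Port_Type'Image (Port);
   begin
      return To_Unbounded_String (Image (Image'First + 1 .. Image'Last));
   end To_Atom;

   function From_Atom (Atom : Unbounded_String) return Port_Type is
   begin
      --  'Value raises Constraint_Error on malformed or out-of-range
      --  input, which gives some type safety for free.
      return Port_Type'Value (To_String (Atom));
   end From_Atom;

end Port_Conversions;
```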
> On the other hand, most of my atoms are indeed strings, and Character
> definition from A.1.35 looks exactly like a perfect mapping to bytes.
> So if Ada strings have no issue with embedded NUL or non-graphics
> character, and if binary data can be losslessly recovered once stored
> into an Ada string, it could be the best type for atoms. It will
> probably be a while before I reach the level of reading the Reference
> Manual cover-to-cover, does anyone knows whether those "if"s are
> guaranteed by the standard?
While Ada strings indeed have no problems with embedded NULs or
non-ASCII characters (the character set is Latin-1, not ASCII), it is
unwise to use character strings to store things that are not characters.
Like I said, you should see the character representation as the
low-level (storage) representation, then provide conversion subprograms
for the few non-String data types that you need.
--
Ludovic Brenta.
> I would still store the S-expressions in memory as Unbounded_Strings,
> read straight from the file (i.e. in the hexadecimal or base-64
> encoding).
String is preferable to Unbounded_String in all cases where the string
content does not change once the string is created. This is 90% of all
cases, including this one.
Storage_Array is a simple array; each object of the type has a fixed
size. So to have a dynamic size, you'd have to do dynamic allocation
and memory management yourself. Vectors is an unbounded array
abstraction, with all allocation and memory management hidden from the
client, and tested by many users.
> On the other hand, most of my atoms are indeed strings, and Character
> definition from A.1.35 looks exactly like a perfect mapping to bytes.
> So if Ada strings have no issue with embedded NUL or non-graphics
> character, and if binary data can be losslessly recovered once stored
> into an Ada string, it could be the best type for atoms. It will
> probably be a while before I reach the level of reading the Reference
> Manual cover-to-cover, does anyone knows whether those "if"s are
> guaranteed by the standard?
String is just an array of Character; there's nothing special about
it. No characters have special significance. Unbounded_String is just
an unbounded variant of String, as the vector would be an unbounded
variant of Storage_Array.
However, if you store non-Character data, then it would be more appropriate to
use a vector of Storage_Element. Possibly you might want to recognize your
extensive use of strings by having 3 variants, one for String, one for List, and
one for anything else.
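A sketch of such a three-variant element, combining the earlier vector-of-bytes idea with an Unbounded_String variant (the names are mine, not an established design):

```ada
with Ada.Strings.Unbounded;
with Ada.Containers.Vectors;
with Ada.Containers.Doubly_Linked_Lists;
with System.Storage_Elements;

package Three_Way_Sexp is

   package Byte_Vectors is new Ada.Containers.Vectors
     (Index_Type   => Positive,
      Element_Type => System.Storage_Elements.Storage_Element,
      "="          => System.Storage_Elements."=");

   type S_Expression;
   type S_Expression_Ptr is access S_Expression;

   type Content_ID is (Text_Atom, Binary_Atom, List);

   --  Three variants: textual atoms, raw binary atoms, and sublists.
   type S_Expression_Element (Content : Content_ID := Text_Atom) is record
      case Content is
         when Text_Atom =>
            Text : Ada.Strings.Unbounded.Unbounded_String;
         when Binary_Atom =>
            Bytes : Byte_Vectors.Vector;
         when List =>
            Ptr : S_Expression_Ptr;
      end case;
   end record;

   package Element_Lists is new Ada.Containers.Doubly_Linked_Lists
     (Element_Type => S_Expression_Element);

   type S_Expression is new Element_Lists.List with null record;

end Three_Way_Sexp;
```

The discriminant default keeps the element type definite, so it can live by value inside the list container.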
Because that's the definition of S-expressions :-)
I'm referring to this almost-RFC: http://people.csail.mit.edu/rivest/Sexp.txt
> I mean, more natural would be "procedural":
>
> tcp-connect (host (foo.example), port (80))
>
> or
>
> tcp-connect (foo.example, 80)
Those are also perfectly valid ways of serializing the same
information, but they are simply not S-expressions.
I use S-expressions as a sort of universal text-based container, just
like most people use XML these days, except S-expressions can easily
embed binary data and they are far simpler (as I said, ~1000 lines of
commented 80-column C code, with a few reinvented wheels embedded).
> > This means that as far as the S-expression library is concerned, the byte
> > sequence read from the file is not typed. Its typing actually has to be
> > delayed until the application has enough context to interpret it. Hence
> > the need of a typeless chunk of data.
>
> The above is mere syntax, it is not clear why internal/external
> representation must be as you described. (Actually, the structures as above
> are widely used in compiler construction e.g. syntax tree, Reverse Polish
> notation etc.)
>
> There is no such thing as untyped data. The information about the type must
> be somewhere. In your example it could be:
>
> type Connect is new Abstract_Node with record
> Host : IP_Address;
> Port : Port_Type;
> end record;
It is indeed somewhere, but beyond the reach of an S-expression
library: it belongs to the application using the library. The record
you show here can't be part of a generic S-expression handling
library.
In my example, as far as the S-expression library is concerned, it
looks like:
"(<some_stuff_1> (<some_stuff_2> <some_stuff_3>) (<some_stuff_4>
<some_stuff_5>))"
The library is not supposed to care about what those some_stuff_ are.
Actually, the library is supposed to get binary data from the
application along with the tree structure described above, store it
into a sequence of bytes (on a disk or over a network or whatever),
and to retrieve from the byte sequence the original tree structure
along with the binary data provided.
But now that I think about it, I'm wondering whether I'm stuck in my C
way of thinking and trying to apply it to Ada. Am I missing an Ada way
of storing structured data in a text-based way?
> > I thought this was the strong type safety to prevent (de)serialization
> > procedure from trying to interpret just any chunk of memory.
>
> No, actually it eases serialization because you can define
> Serialize/Unserialize operations on the type.
I honestly don't understand what you mean here.
Am I misusing the word "serialization" to describe the process of
converting an object from an internal representation (e.g. an integer)
to a byte sequence (e.g. the ASCII string of its decimal
representation, or the raw byte sequence in memory)? Both would be
acceptable but different serializations, corresponding to different
trade-offs (portability and readability vs space and time efficiency).
Thanks for making me think,
Natacha
> My question here is about how to represent the atoms in Ada. In C it
> was merely a void pointer and a size, but it seems more difficult in
> Ada because of the strong typing. Because it's up to the application
> to make sense (i.e. type) out of the raw sequences of bits in atoms,
> the S-expression library has to handle them as a sort of untyped
> memory chunk. Do you know of a way to handle that?
Maybe there is a reasonably simple way of a different
kind---one that emphasizes the type system and leaves
distinguishing types to the compiler and the plain old
run-time system.
First, to represent an atom, I'd choose Ada's limited types,
since they imply no copying and no predefined equality
operators: Each object of such a type will be unique
by definition of "limited". Then I'd derive types for symbols,
numbers, ..., from this type. And use construction functions
to make symbols, numbers, ...
Second, to perform I/O of objects of such a type (Atom
or derived from Atom; also List) via textual representation
of S-expressions, Ada's Generic_Dispatching_Constructor seems
ideal: textual representations of S-expressions will include
a tag that guides the I/O routines. This is completely portable,
and transparent too, unlike 'Input and 'Output attributes or
similar.
A.k.a. "Object factory functions", there is more in the Rationale
http://www.adaic.com/standards/05rat/html/Rat-2-6.html
Some ad hoc definitions and a test case for the Lisp stuff,
just to illustrate the idea involving limited types
(I couldn't resist):
with Ada.Finalization;  use Ada;

package S_Expressions is

   pragma Preelaborate (S_Expressions);

   --
   -- Provide definitions typical of Lisp (ad hoc)
   --

   type EQ_Type is limited interface;
   -- Force EQ of Common Lisp.

   function EQ (Left, Right : EQ_Type) return Boolean is abstract;
   -- Objects of the same kind may be compared in specific ways,
   -- e.g. by numeric value.

   function "=" (Left, Right : EQ_Type'Class) return Boolean;
   -- General comparison operation for each object in the system,
   -- dispatches to EQ if Left'Tag = Right'Tag. False otherwise.

   subtype Defined is Finalization.Limited_Controlled;
   -- Every object in the system is unique and has no "=",
   -- since all objects are limited. Also adds hooks for
   -- garbage collection.

   --
   -- Root of everything not a list
   --

   type Atom is abstract new Defined and EQ_Type with private;
   -- Symbols, Numbers, Character strings, ...

   function EQ (Left, Right : Atom) return Boolean is abstract;

   -- Kinds of atoms, with construction functions:

   type Symbol (<>) is new Atom with private;
   function Make_Symbol (Name : String) return Symbol;

   type Numeric is range 0 .. 666;  -- or something better suited
   type Number (<>) is new Atom with private;
   function Make_Number (Initial : Numeric) return Number;

   --
   -- Lists, construction and access to elements:
   --

   type List is new Defined and EQ_Type with private;

   Nil : constant Symbol;

   function Cons (Car, Cdr : Defined'Class) return List;

   type Ptr is access constant Defined'Class;
   -- read-only access to elements of lists

   function Car (L : List) return Ptr;
   function Cdr (L : List) return Ptr;

private

   --
   -- Equality functions of all types that need to implement EQ_Type
   --
   function EQ (Left, Right : Symbol) return Boolean;
   function EQ (Left, Right : Number) return Boolean;
   function EQ (Left, Right : List) return Boolean;
   -- ...

   type Atom is abstract new Defined and EQ_Type with null record;

   type Symbol_Chars is access constant String;
   type Symbol (Name : Symbol_Chars) is new Atom with null record;

   Nil_Chars : aliased constant String := "";
   Nil : constant Symbol := Symbol'(Atom with Name => Nil_Chars'Access);

   type Number (Value : Numeric) is new Atom with null record;

   type List is new Defined and EQ_Type with record
      Info : Ptr;
      Next : Ptr;
   end record;

end S_Expressions;
with S_Expressions;  use S_Expressions;

procedure S_Test is
   A  : Symbol := Make_Symbol ("Foo");
   B  : Symbol := Make_Symbol ("Foo");
   M  : Number := Make_Number (42);
   N  : Number := Make_Number (42);
   S  : List := Cons (Make_Number (1),
                      Cons (Make_Number (2),
                            Cons (Make_Number (3), Nil)));
   Sx : List := Cons (Make_Number (1),
                      Cons (Make_Number (2),
                            Cons (Make_Number (3), Nil)));
begin
   if A /= B then  -- same name!!
      raise Program_Error;
   end if;
   if M /= N then  -- same value!!
      raise Program_Error;
   end if;
   if A = N then  -- different kinds of atoms!!
      raise Program_Error;
   end if;
   if S = Sx then  -- different lists (even though same values)!!
      raise Program_Error;
   end if;
   if S /= S then  -- same object!!
      raise Program_Error;
   end if;
   declare
      Cadr : constant Ptr := Car (List (Cdr (S).all));
   begin
      if Number (Cadr.all) /= Make_Number (2) then  -- same value!!
         raise Program_Error;
      end if;
   end;
end S_Test;
with Ada.Tags;
with System;

package body S_Expressions is

   -- Comparison

   function EQ (Left, Right : Symbol) return Boolean is
   begin
      return Left.Name.all = Right.Name.all;
   end EQ;

   function EQ (Left, Right : Number) return Boolean is
   begin
      return Left.Value = Right.Value;
   end EQ;

   function EQ (Left, Right : List) return Boolean is
      use type System.Address;
   begin
      return Left'Address = Right'Address;
   end EQ;

   function "=" (Left, Right : EQ_Type'Class) return Boolean is
      use type Ada.Tags.Tag;
   begin
      if Left'Tag /= Right'Tag then
         return False;
      else
         return EQ (Left, Right);
      end if;
   end "=";

   -- Making atoms

   function Make_Number (Initial : Numeric) return Number is
   begin
      return Number'(Defined with Value => Initial);
   end Make_Number;

   function Make_Symbol (Name : String) return Symbol is
   begin
      return Symbol'(Defined with Name => new String'(Name));
   end Make_Symbol;

   -- Lists

   function Cons (Car, Cdr : Defined'Class) return List is
   begin
      return List'(Defined with
                   Info => Car'Unchecked_Access,
                   Next => Cdr'Unchecked_Access);
   end Cons;

   function Car (L : List) return Ptr is
   begin
      return L.Info;
   end Car;

   function Cdr (L : List) return Ptr is
   begin
      return L.Next;
   end Cdr;

end S_Expressions;
-- Georg
In creating the LISP interpreter, most use,
for going from the outside environment to the internal one:
Word or Numeric Text (Text_IO) --> Tokenizer --> Internal Text tree
--> Internal List tree
If the I/O is to become a VS to your program, use the Direct_IO
package to create an indexed binary file, where the words and string
phrases would be stored in one file while the internal access pointers
would be stored in the second.
Internal Text tree <--> Direct_IO ( Text_Sexp )
Internal List tree <--> Direct_IO ( List_Sexp )
In this design, for short-term (online) memory storage the access
pointers are still valid. For long-term storage you would need to
remap the List tree to the Text tree using indexes.
So, for long-term storage it is better to use Text_IO, and rebuild the
outside-environment version of the structures:
Internal Word tree --> unTokenizer --> Word or Numeric Text (Text_IO)
Internal List tree -->
> On Aug 1, 8:49 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Sun, 1 Aug 2010 10:35:17 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 1, 2:53 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>>> wrote:
>>>> How can it make sense if the type is unknown? If the type is known,
>>>> why not state it?
>>
>>> Actually the type is deduced from the context, e.g.
>>> (tcp-connect (host foo.example) (port 80))
>>
>> Hmm, why do you consider brackets as separate elements?
>
> Because that's the definition of S-expressions :-)
OK, why S-expressions, then? (:-))
> I'm referring to this almost-RFC: http://people.csail.mit.edu/rivest/Sexp.txt
I see, yet another poor data format.
> I use S-expression as a sort of universal text-based container, just
> like most people use XML these days,
These puzzle me too. I see no need for text-based containers for
binary data. Binary data aren't supposed to be read by people; people
read texts. And conversely the machine shall not read texts, it has no
idea of good literature...
>>> This means that as far as the S-expression library is concerned, the byte
>>> sequence read from the file is not typed. Its typing actually has to be
>>> delayed until the application has enough context to interpret it. Hence
>>> the need of a typeless chunk of data.
>>
>> The above is mere syntax, it is not clear why internal/external
>> representation must be as you described. (Actually, the structures as above
>> are widely used in compiler construction e.g. syntax tree, Reverse Polish
>> notation etc.)
>>
>> There is no such thing as untyped data. The information about the type must
>> be somewhere. In your example it could be:
>>
>> type Connect is new Abstract_Node with record
>> Host : IP_Address;
>> Port : Port_Type;
>> end record;
>
> It is indeed somewhere, but beyond the reach of an S-expression
> library: it belongs to the application using the library.
Yes
> The record
> you show here can't be part of a generic S-expression handling
> library.
It need not be. Abstract_Node should declare Serialize and Unserialize
operations, which Connect implements. The library makes dispatching
calls. Of course you can declare something like:
   type List_Of_Anything is new Abstract_Node with ...
   -- Container of class-wide objects
and implement its Serialize and Unserialize by walking the items and
calling their Serialize and Unserialize. BTW, this is how the
attributes 'Input and 'Output work with arrays.
> In my example, as far as the S-expression library is concerned, it
> looks like:
> "(<some_stuff_1> (<some_stuff_2> <some_stuff_3>) (<some_stuff_4>
> <some_stuff_5>))"
That does not matter. Serialize/Unserialize will use this format if
they have to.
> The library is not supposed to care about what those some_stuff_ are.
> Actually, the library is supposed to get binary data from the
> application along with the tree structure described above, store it
> into a sequence of bytes (on a disk or over a network or whatever),
> and to retrieve from the byte sequence the original tree structure
> along with the binary data provided.
Serialize can yield a binary chunk, but if you have to write it
anyway, why not write text? You insisted on having text, so why mess
with binary stuff?
> But now that I think about it, I'm wondering whether I'm stuck in my C
> way of thinking and trying to apply it to Ada. Am I missing an Ada way
> of storing structured data in a text-based way?
I think so. Though this is not Ada-specific; these are commonly used
OOP design patterns.
> This means that as far as the S-expression library is concerned, the
> byte sequence read from the file is not typed. Its typing actually has
> to be delayed until the application has enough context to interpret
> it. Hence the need of a typeless chunk of data.
[...]
> Well, I imagined making such a new type to distinguish 'raw data
> coming from a S-expression to be interpreted' from other raw data
> types. I thought this was the strong type safety to prevent
> (de)serialization procedure from trying to interpret just any chunk of
> memory.
This sounds really like an alternative implementation of Ada stream
attributes - ARM 13.13,
http://www.adaic.com/standards/05rm/html/RM-13-13.html .
The idea of a stream is that the data is written (perhaps to a file,
perhaps over a network, perhaps to memory) in such a way that it can
be retrieved later by the same or another program.
type R is record
   I : Integer;
   F : Float;
end record;

procedure Read
  (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
   Item   : out R);
procedure Write
  (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
   Item   : R);

for R'Read use Read;
for R'Write use Write;
then implement Read and Write as you wish.
But as you will see the medium - the Stream above - is just a bunch of
bytes in a file or a network packet. If you send a Foo and I read it
hoping it'll be a Bar, we're going to have a problem.
I guess, given the above, we could start the Sexp with "(R ...", then
the recipient would raise Constraint_Error if the type expected didn't
match the data being read.
This is like the mechanism standard Streams (in GNAT, at any rate) use
for transferring tagged and classwide types; the introductory segment
names the provided type and compiler magic creates the proper result.
[Team, I seem to remember an '05 feature that would support this? or was
that my imagination?]
>> I use S-expression as a sort of universal text-based container, just
>> like most people use XML these days,
>
> These puzzle me too. I see no need for text-based containers for binary
> data. Binary data aren't supposed to be read by people, they read texts.
> And conversely the machine shall not read texts, it has no idea of good
> literature...
The whole idea of machine readable Ada source text is silly, right?
Right.
1. Ada sources are read by the compiler.
2. Sources need not be text-file based. Compare with a word processor: an
application whose whole purpose is to produce texts, yet whose files
aren't text-based... (:-))
> [Team, I seem to remember an '05 feature that would support this? or was
> that my imagination?]
Yes, I think you are referring to Ada.Tags.Generic_Dispatching_Constructor
For a usage example, see:
http://www.adacore.com/2007/11/26/ada-gem-19/
Pascal.
--
--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net - http://v2p.fr.eu.org
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver keys.gnupg.net --recv-key F949BD3B
> Simon,
>
>> [Team, I seem to remember an '05 feature that would support this? or was
>> that my imagination?]
>
> Yes, I think you are referring to Ada.Tags.Generic_Dispatching_Constructor
>
> For an usage example see:
>
> http://www.adacore.com/2007/11/26/ada-gem-19/
>
> Pascal.
That was it, thanks.
Am I right that this really comes into its own with classwide types?
First, because this is the existing format in most of my programs, so
for interoperability I can choose only between using this format or
converting from and to it (or rewrite everything). In both cases there
is a strong code-reuse argument in favor of writing a S-expression
library instead of writing a bunch of ad-hoc similar code chunks.
Second, because I like this format and I find it good (see below).
> > I'm referring to this almost-RFC:http://people.csail.mit.edu/rivest/Sexp.txt
> I see, yet another poor data format.
Could you please explain why it is so poor?
I consider it good because it is flexible, expressive and simple. I
already mentioned quite a few times why it looks simple: my parser
written from scratch in ~1000 lines. It looks expressive, because most
data structures I can think of, and all data structures I have
actually used, can be easily represented: an array can be written down
as a list, a hash table as a list of two-element lists (key then
value), and so on. And I see flexibility coming from the fact that any
sequence of bytes can be encoded in an atom.
What am I missing?
> > I use S-expression as a sort of universal text-based container, just
> > like most people use XML these days,
>
> These puzzle me too. I see no need for text-based containers for binary
> data. Binary data aren't supposed to be read by people, they read texts.
> And conversely the machine shall not read texts, it has no idea of good
> literature...
Actually this is an interesting remark; it made me realize I'm mixing
two very different uses of a data format (though I still think S-
expressions are adequate for both of them):
I think a text-based format is very useful when the file has to be
dealt with by both humans and programs. The typical example would be
configuration files: read and written by humans, and used by the
program. And that's where I believe XML is really poor, because it's
too heavy for human use. I occasionally feel the need to embed
binary data in some configuration files, e.g. cryptographic keys or
binary initialization sequences to send as-is over whatever
communication channel. On these occasions I do use the text-based
binary encoding allowed by S-expressions, base-64 or hexadecimal, so
that the configuration file is still a text file. The huge advantage
of text files here is that there is already a lot of tooling to deal
with them, while using a binary format would require writing specific
tools for humans to deal with it, which is IMO a waste of time compared
to the text-based approach.
The other application is actual serialization, i.e. converting
internal types into a byte sequence in order to be stored on disk or
transmitted over a network or whatever. In this situation, humans
don't need to interact with the data (except for debugging purposes,
but it's an argument so light it's only justified when everything else
is otherwise equal).
In my previous posts I have talked a lot about serialization, while my
actual use of S-expression is more often the first one. And
historically I first used S-expressions for configuration files,
because of their expressiveness over all other text formats I know,
while still being extremely simple. I then used this format for
serialization mostly for code reuse's sake: I already had code for S-
expressions, so it was very cheap to use it for serialization
purposes. Compared to using another serialization format, it leads to
less code being more used, hence less opportunities to write bugs and
more opportunities to find and fix bugs. So it seems like a very
rational choice.
> > The library is not supposed to care about what those some_stuff_ are.
> > Actually, the library is suppose to get binary data from the
> > application along with the tree structure described above, store it
> > into a sequence of bytes (on a disk or over a network or whatever),
> > and to retrieve from the byte sequence the original tree structure
> > along with the binary data provided.
>
> Serialize can yield a binary chunk, but if you have to write it anyway, why
> not write text? You insisted on having text, so why mess with binary
> stuff?
I hope the above already answered this: I mostly use S-expressions for
text purposes, yet occasionally I feel the need of embedding binary
data. Of course there are a lot of ways to textify binary data, like
hexadecimal or base-64, but considering S-expressions already handle
textification, I don't see the point of having the application deal
with it too.
> > But now that I think about it, I'm wondering whether I'm stuck in my C
> > way of thinking and trying to apply it to Ada. Am I missing an Ada way
> > of storing structured data in a text-based way?
>
> I think yes. Though it is not Ada-specific, rather commonly used OOP design
> patterns.
I heard people claiming that the first language shapes the mind of
coders (and they continue saying a whole generation of programmers has
been mind-crippled by BASIC). My first language happened to be 386
assembly, that might explain things. Anyway, I genuinely tried OOP
with C++ (which I dropped because it's way too complex for me (and I'm
tempted to say way too complex for the average coder, it should be
reserved to the few geniuses actually able to fully master it)), but I
never felt the need of anything beyond what can be done with a C
struct containing function pointers.
Now back to the topic, thanks to your post and some others in this
thread (for which I'm also thankful), I came to realize my mistake may
be wanting to parse S-expressions and atom contents separately. The
problem is, I just can't manage to imagine how to go in a single step
from the byte sequence containing a S-expression describing multiple
objects to the internal memory representation and vice-versa.
Thanks for your help and your patience,
Natacha
> On Aug 1, 11:13 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Sun, 1 Aug 2010 13:06:10 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 1, 8:49 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>>> wrote:
>>>> Hmm, why do you consider brackets as separate elements?
>>> Because that's the definition of S-expressions :-)
>> OK, why S-expressions, then? (:-))
>
> First, because this is the existing format in most of my programs, so
> for interoperability I can choose only between using this format or
> converting from and to it (or rewrite everything). In both cases there
> is a strong code-reuse argument in favor of writing a S-expression
> library instead of writing a bunch of ad-hoc similar code chunks.
Legacy stuff also. That is a valid argument.
>>> I'm referring to this almost-RFC:http://people.csail.mit.edu/rivest/Sexp.txt
>> I see, yet another poor data format.
>
> Could you please explain why it is so poor?
>
> I consider it good because it is flexible, expressive and simple. I
> already mentioned quite a few times why it looks simple: my parser
> written from scratch in ~1000 lines. It looks expressive, because most
> data structures I can think of, and all data structures I have
> actually used, can be easily represented: an array can be written down
> as a list, a hash table as a list of two-element lists (key then
> value), and so on. And I see flexibility coming from the fact that any
> sequence of bytes can be encoded in an atom.
>
> What am I missing?
The requirements.
One cannot judge a format without knowing what its purpose is. Most
formats like S-expressions are purposeless, in the sense that there is
no *rational* purpose behind them. As you wrote above, it is either legacy
(we have to overcome some limitations of some other poorly designed
component of the system) or personal preference (some people like angle
brackets, others curly ones).
>>> I use S-expression as a sort of universal text-based container, just
>>> like most people use XML these days,
>>
>> These puzzle me too. I see no need for text-based containers for binary
>> data. Binary data aren't supposed to be read by people, they read texts.
>> And conversely the machine shall not read texts, it has no idea of good
>> literature...
>
> Actually this an interesting remark, it made me realize I'm mixing two
> very different use of a data format (though I still think S-
> expressions are adequate for both of them):
>
> I think text-based format is very useful when the file has to be dealt
> with by both humans and programs. The typical example would be
> configuration files: read and written by humans, and used by the
> program.
There should be no configuration files at all. The idea that a
configuration can be edited using a text editor is corrupt.
> And that's where I believe XML is really poor, because it's
> too heavy for human use. I occasionally feel the need of embedding
> binary data in some configuration files, e.g. cryptographic keys or
> binary initialization sequences to send as-is over whatever
> communication channel. On these occasions I do use the text-based
> binary encoding allowed by S-expressions, base-64 or hexadecimal, so
> that the configuration file is still a text file. The huge advantage
> of text files here is that there is already a lot of tools to deal
> with it, while using a binary format would require writing specific
> tools for humans to deal with it, which is IMO a waste of time compared
> to the text-based approach.
All these tools exist exclusively to handle the poor formats of these files.
They add absolutely nothing to the actual purpose of configuration, namely
handling the *semantics* of a given configuration parameter. None of them
answers simple questions like: how do I make the 3rd button on the left
4 cm wide? Fewer still verify the parameter values.
The king is naked.
> The other application is actual serialization,
That should not be a text.
>>> But now that I think about it, I'm wondering whether I'm stuck in my C
>>> way of thinking and trying to apply it to Ada. Am I missing an Ada way
>>> of storing structured data in a text-based way?
>>
>> I think yes. Though it is not Ada-specific, rather commonly used OOP design
>> patterns.
>
> I heard people claiming that the first language shapes the mind of
> coders (and they continue saying a whole generation of programmers has
> been mind-crippled by BASIC). My first language happened to be 386
> assembly, that might explain things.
I see where mixing abstraction layers comes from...
> Anyway, I genuinely tried OOP
> with C++ (which I dropped because it's way too complex for me (and I'm
> tempted to say way too complex for the average coder, it should be
> reserved to the few geniuses actually able to fully master it)), but I
> never felt the need of anything beyond what can be done with a C
> struct containing function pointers.
Everything is Turing-complete you know... (:-))
> The
> problem is, I just can't manage to imagine how to go in a single step
> from the byte sequence containing a S-expression describing multiple
> objects to the internal memory representation and vice-versa.
You need not; that is the power of the OOP you dislike so much. Consider:
each object knows how to construct itself from a stream of octets. It is
trivial for simple objects like numbers. E.g. you read while the octets
are '0'..'9' and generate the result by interpreting them as a decimal
representation. Or you take four octets and treat them as a big-endian
binary representation, etc.
For a container type, you call the constructors for each container member
in order. If the container is unbounded, i.e. has variable length, you read
its bounds first, or you use some terminator in the stream to mark the
container's end. For containers of dynamically typed elements you must learn
the component type before you construct it.
In theory this is called a recursive descent parser, the simplest
thing ever.
Why can't there be a general-purpose format, just like there are general-
purpose programming languages?
Can we at least agree on the fact that a sequence of bytes is a
general-purpose format, widely used for storing and transmitting data?
(this question is just a matter of vocabulary)
Now byte sequences are a very crude format, because they don't have
any semantics besides what the application specifically puts into them.
So let's add as little semantics as possible, to keep as much generality
as possible. We end up with a bunch of byte sequences, whose semantics
are still left to the application, linked together by some kind of
semantic link. When the chosen links are "brother of" and "sublist of"
you get exactly S-expressions. The almost-RFC I linked is only this
definition along with a standardization of how to serialize the links
and the byte sequences.
This is undoubtedly still a crude format. You might argue it's useless
to add so little semantics on top of byte sequences, and that
serialization should be engineered only when you know what you are about
to serialize, i.e. make a much larger leap between byte sequences and
meaningful objects. I might even agree from a philosophical point of
view.
However from a purely practical point of view, and using the fact that
in my background languages (C and 386 asm) byte sequences and strings
are so similar, these crude semantics are all I need (or at least, all
I've ever needed so far). Now if we agree that simplicity is a
desirable quality (because it leads to less bugs, more productivity,
etc), I still fail to see the issues of such a format.
When I mentioned earlier flexibility as a strong point for S-
expressions, I meant that just like byte sequences, they can
accommodate whatever purpose you might want to put on top of them.
Now regarding personal preferences about braces, I have to admit I'm
always shocked at the way so many people dismiss S-expressions on
first sight because of the LISP-looking parentheses. I'm very glad I
can have a higher-level conversation about them.
> > The other application is actual serialization,
>
> That should not be a text.
I tend to agree with that as a generality, however I believe some
particular cases might benefit from a text-based serializations, in
order to harness the power of existing text-based tools.
> >>> But now that I think about it, I'm wondering whether I'm stuck in my C
> >>> way of thinking and trying to apply it to Ada. Am I missing an Ada way
> >>> of storing structured data in a text-based way?
>
> >> I think yes. Though it is not Ada-specific, rather commonly used OOP design
> >> patterns.
>
> > I heard people claiming that the first language shapes the mind of
> > coders (and they continue saying a whole generation of programmers has
> > been mind-crippled by BASIC). My first language happened to be 386
> > assembly, that might explain things.
>
> I see where mixing abstraction layers comes from...
Could you please point me where I am mixing what? I genuinely want to
learn, but I just don't understand what you're referring to.
> > Anyway, I genuinely tried OOP
> > with C++ (which I dropped because it's way too complex for me (and I'm
> > tempted to say way too complex for the average coder, it should be
> > reserved to the few geniuses actually able to fully master it)), but I
> > never felt the need of anything beyond what can be done with a C
> > struct containing function pointers.
>
> Everything is Turing-complete you know... (:-))
I should know, I was accessing from assembly some (DirectX) C++
objects' vtable array before I knew anything about OOP.
My point is, most of my (currently non-OOP) code can be expressed as
well in an OOP style. When I defined a C structure along with a bunch
of functions that perform operations on it, I'm conceptually defining
a class and its methods, only expressed in a non-OOP language. I
sometimes put function pointers in the structure, to have a form of
dynamic dispatch or virtual methods. I occasionally even leave the
structure declared but undefined publicly, to hide internals (do you
call that encapsulation?), along with functions that could well be called
accessors and mutators. In my opinion that doesn't count as OOP
because it doesn't use OOP-specific features like inheritance.
> > The
> > problem is, I just can't manage to imagine how to go in a single step
> > from the byte sequence containing a S-expression describing multiple
> > objects to the internal memory representation and vice-versa.
>
> You need not, that is the power of OOP you dislike so much.
I don't dislike it at all. I just seldom think that way. I've met people
who saw objects everywhere, while I tend to see bunches of bits
everywhere. As part of the demoscene said when I was learning
programming, "100% asm, a way of life".
> Consider each
> object knows how to construct itself from a stream of octets. It is trivial
> to simple objects like number. E.g. you read until the octets are '0'..'9'
> and generate the result interpreting it as a decimal representation. Or you
> take four octets and treat them as big-endian binary representation etc.
> For a container type, you call the constructors for each container member
> in order. If the container is unbounded, e.g. has variable length, you read
> its bounds first or you use some terminator in the stream to mark the
> container end. For containers of dynamically typed elements you must learn
> the component type before you construct it.
That's exactly how I use S-expressions, except that instead of
starting from a stream of octets, I start from an array of octets
(whose length is known), but if I understood correctly that doesn't
change your point. And the reason why I started this thread is only to
know how to buffer into memory the arrays of octets, because I need
(in my usual use of S-expressions) to resolve the links between atoms
before I can know the type of atoms. So I need a way to delay the
typing, and in the meantime handle data as a generic byte sequence
whose only known information is its size and its place in the S-
expression tree. What exactly is so bad with that approach?
I hope I don't bother you too much with my noobity and my will to
understand,
Natacha
> On Aug 7, 10:39 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> One cannot judge a format without knowing what its purpose is. Most of
>> the formats like S-expressions are purposeless, in the sense that there is
>> no *rational* purpose behind them. As you wrote above, it is either legacy
>> (we have to overcome some limitations of some other poorly designed
>> components of the system) or personal preferences (some people like angle
>> brackets others do curly ones).
>
> Why can't there be a general-purpose format, just like there are general-
> purpose programming languages?
An interesting question. I would say no, there cannot be such formats. Any
presentation format is of course a language. The only difference from true
programming languages is in complexity, and maybe in a tendency to be
declarative rather than imperative. There are border cases like
PostScript, which IMO illustrate the point: the more general-purpose it
has to be, the less of a "format" it becomes.
> Can we at least agree on the fact that a sequence of bytes is a
> general-purpose format, widely used for storing and transmitting data?
> (this question is just a matter of vocabulary)
I don't think so. Namely, I don't think that "general" is a synonym for
"complete". It is rather about the abstraction level, under the
condition of completeness.
> So let's add as few semantics as possible, to keep as much generality
> as possible. We end up with a bunch of byte sequences, whose semantics
> are still left to the application, linked together by some kind of
> semantic link. When the chosen links are "brother of" and "sublist of"
> you get exactly S-expressions.
Yes, the language of S-expressions is about hierarchical structures of
elements lacking any semantics.
I see no purpose in such descriptions. But this is a very old, bearded
issue. The same question arises when it is discussed why RDBMSes are so
boring. For the same reason: a naked structure, be it relational,
hierarchical, whichever, is useless without the semantics. The semantics,
when dealt with, catches such simple relationships as "sibling"
with no effort. DB people believe that one could bridge the gap and
somehow arrive at the semantics from the structure's side. Translated into
your S-expressions: as if by putting down a proper pattern of opening and
closing brackets one could describe everything...
> However from a purely practical point of view, and using the fact that
> in my background languages (C and 386 asm) bytes sequences and strings
> are so similar, these crude semantics are all I need (or at least, all
> I've ever needed so far).
The lower you descend the abstraction levels, the fewer differences you
see. Everything is a bunch of transistors...
> Now if we agree that simplicity is a
> desirable quality (because it leads to less bugs, more productivity,
> etc), I still fail to see the issues of such a format.
Programs in 386 assembler are considerably more complex than programs in
Ada. Simplicity of nature by no means implies simplicity of use.
> Now regarding personal preferences about braces, I have to admit I'm
> always shocked at the way so many people dismiss S-expressions on
> first sight because of the LISP-looking parentheses.
Do you mean LISP does not deserve its fame? (:-))
>>>>> But now that I think about it, I'm wondering whether I'm stuck in my C
>>>>> way of thinking and trying to apply it to Ada. Am I missing an Ada way
>>>>> of storing structured data in a text-based way?
>>
>>>> I think yes. Though it is not Ada-specific, rather commonly used OOP design
>>>> patterns.
>>
>>> I heard people claiming that the first language shapes the mind of
>>> coders (and they continue saying a whole generation of programmers has
>>> been mind-crippled by BASIC). My first language happened to be 386
>>> assembly, that might explain things.
>>
>> I see where mixing abstraction layers comes from...
>
> Could you please point me where I am mixing what?
Encoding, representation, states, behavior, values, objects: everything is
a sequence of bytes, so?
> My point is, most of my (currently non-OOP) code can be expressed as
> well in an OOP style. When I defined a C structure along with a bunch
> of functions that perform operations on it, I'm conceptually defining
> a class and its methods, only expressed in a non-OOP language. I
> sometimes put function pointers in the structure, to have a form of
> dynamic dispatch or virtual methods. I occasionally even leave the
> structure declared but undefined publicly, to hide internals (do call
> that encapsulation?), along with functions that could well be called
> accessors and mutators. In my opinion that doesn't count as OOP
> because it doesn't use OOP-specific features like inheritance.
I disagree, because in my view this is all that OO is about. OO is not
about the tools (an OOPL); it is about the way of programming.
>>> The
>>> problem is, I just can't manage to imagine how to go in a single step
>>> from the byte sequence containing a S-expression describing multiple
>>> objects to the internal memory representation and vice-versa.
>>
>> You need not, that is the power of OOP you dislike so much.
>
> I don't dislike at all. I just seldom think that way. I've met people
> who saw objects everywhere, while I tend to see bunch of bits
> everywhere.
Yes, I know them too. I don't believe that everything is an object. But I
do believe in abstract type systems: every object in a well-designed
program must have a type, and that type shall describe its behavior as
precisely as possible.
> And the reason why I started this thread is only to
> know how to buffer into memory the arrays of octets, because I need
> (in my usual use of S-expressions) to resolve the links between atoms
> before I can know the type of atoms. So I need a way to delay the
> typing, and in the meantime handle data as a generic byte sequence
> whose only known information is its size and its place in the S-
> expression tree. What exactly is so bad with that approach?
Nothing, so long as it stays at the implementation level. However I don't
see why links need to be resolved first. In comparable cases (I do a lot
of messy protocol/communication stuff) I usually restore the objects
first and then resolve the links.
Capers Jones, the function-point person, collected statistics on function points
in various languages. One statistic was the average number of LOC needed to
implement a function point. Function points may or may not be a good metric, but
the values fell into three groups, which he labeled low-level languages
(assembler and C), mid-level (FORTRAN, Pascal), and high-level (Ada).
Thus, there is a big difference in abstraction between C and Ada. C is about
translating the problem into the capabilities of the solution language; Ada is
(or should be) about modeling the problem in the software. C is about coding:
mapping everything onto a small set of predefined representations. Ada is about
SW engineering: creating useful abstractions that represent important aspects of
the problem.
Effectively using Ada requires a different way of thinking from using C or
assembler. You'll more quickly gain that mindset by asking how to approach
specific problems in Ada, rather than how to do something you did in C.
I think there may be an analogy between languages and data representation
formats: everything can be implemented in machine code, but that doesn't mean
it's a good language for development. Similarly, everything may be represented
as a sequence of bytes, but that doesn't mean it's an appropriate representation
for an application.
An S-expression library might be a useful thing at a low-level, but few
applications should call it directly. Instead, there should probably be at least
one layer of abstraction on top of the low-level storage representation, so that
the application only deals with appropriate application-level representations of
the data.
Here, your problem seems to be how to have a human-readable storage format for
your applications. While there is merit to discussing the pros and cons of the
many existing formats to use, the Ada approach is to use an application-specific
abstraction, hiding the implementation detail of which such format is eventually
chosen.
--
Jeff Carter
"I blow my nose on you."
Monty Python & the Holy Grail
Funnily, you're the first one shaking my resolve to learn Ada.
Let's get everything straight: I'm an amateur. I'm coding for fun. Well,
I'll soon be coding for a living too, but then I won't have a say in the
language chosen. I like coding in C, and I don't care how efficient it
is. There is no waste of time in a leisure activity. I'd rather have
fun with C than do the same thing 10x faster without fun.
The only thing bothering me in C is that I often end up using
dangerous constructs. For example, *(struct foo **)((char *)whatever +
bar.offset). While I'm perfectly fine with that, because I'm confident
in what I'm doing, I can understand that it looks sloppy from the
outside. My main motivation to learn Ada is to make explicit the concerns
for robustness and correctness that might not be obvious from my C code.
I was hoping to do in Ada whatever I used to do in C: network
programming, DS homebrew, etc.
Am I misguided? Should I stop now?
> Effectively using Ada requires a different way of thinking from using C or
> assembler. You'll more quickly gain that mindset by asking how to approach
> specific problems in Ada, rather than how to do something you did in C.
Ok, so let's have a look at the grand picture. My main objective right
now is to code a webserver in Ada. Yes, that's reinventing the wheel,
but it does wonders for learning.
Here is how I intended to do it, admittedly exactly like I would do it
in C, could you please tell me how far I am from the Ada approach?
I start by dividing the work into "modules" or "components", each
containing a structure or a few related structures, along with
whatever functions or procedures to deal with them. I thought this
would map perfectly into Ada packages.
Configuration files are S-expressions, in the form of (key value)
pairs, gathered in sections like (section-name (key value) (key value))
… Webpage templates are also S-expressions, in the form "raw-
html" (function arg1 arg2 …) "raw-html" (function) "raw-html". The
interesting thing being that arg1 arg2 etc are S-expressions and thus
can be subtemplates too.
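Concretely, the two shapes described might look like this (keys and function names purely illustrative):

```
configuration: (key value) pairs gathered in sections
  (server
    (port 8080)
    (document-root "/srv/www"))

template: raw HTML interleaved with (function arg1 arg2 ...) calls,
where any argument may itself be a subtemplate
  "<html><body>" (title (page-name)) "</body></html>"
```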
As I'm more comfortable using components already coded and tested, I
would code them from the lowest to the highest level:
- first a component dealing with S-expression I/O, hence this topic.
- then a component for configuration, which use the S-expression
library and is used by other components either for program-wide
configuration variables or for instance specific configuration
- then a network component, gluing the rest of my program with
whatever socket library I will use (AdaSockets or GNAT.stuff or C
interfacing or whatever, don't know yet)
- then a HTTP parsing component, taking data from the network
component and configuration
- then a general page component, dispatching requests to the relevant
page objects
- then a raw file component, a specific page responding to HTTP
request with data taken directly from a file
- then a template component, interpreting the function calls from S-
expression templates
- then a templated page component, another specialization of a page
object, dealing with HTTP response and containing instance-specific
data used by the template component.
And that should be about it, I might encounter the need for other
components, maybe for network I/O multiplexing or for logging or for
caching templates etc.
So, how bad is it?
> An S-expression library might be a useful thing at a low-level, but few
> applications should call it directly. Instead, there should probably be at least
> one layer of abstraction on top of the low-level storage representation, so that
> the application only deals with appropriate application-level representations of
> the data.
I have to admit that in the above I don't really know what belongs to
a library and what belongs to the application. But indeed, an S-
expression package is a low-level thing, I'm well aware of that; I
just can't begin with high-level stuff if I don't have strong and
tested low-level stuff to build upon.
However the point of coding so many separate components is to be able
to change the internals of one without having to touch everything
else. Should I find someday a format so much better than S-
expressions, I would only have one component to change. Should I want
different formats for configuration and templates, that's a component
to add, and maybe little modifications to configuration and/or
template modules. And so on…
> Here, your problem seems to be how to have a human-readable storage format for
> your applications. While there is merit to discussing the pros and cons of the
> many existing formats to use, the Ada approach is to use an application-specific
> abstraction, hiding the implementation detail of which such format is eventually
> chosen.
Is that so different from what I explained above?
One can do anything in Ada that one can do in C. Better, even, since
creating buffer-overflow and signed-integer-overflow vulnerabilities
takes effort in Ada, while they're the default in C. (Virtually every
"important security update" I see for Linux is a buffer-overflow or
signed-integer-overflow vulnerability. I doubt people are creating
these on purpose. My conclusion is that it is impossible in practice
to use C without creating these errors.)
> Ok, so let's have a look at the grand picture. My main objective right
> now is to code a webserver in Ada. Yes, that's reinventing the wheel,
> but it does wonders for learning.
>
> Here is how I intended to do it, admittedly exactly like I would do it
> in C, could you please tell me how far I am from the Ada approach?
It's hard to comment meaningfully, since you mostly describe your intended
implementation, not your requirements.
I'd use Ada Web Server (AWS), and perhaps you should try that, too. Using a
significant, existing, high-level Ada application framework like that might help
introduce you to how some experienced Ada people thought this kind of thing
should be approached.
A "web server" can be a variety of things, from a simple page server that serves
static files to a highly-dynamic system generating everything on the fly. It
appears that you intend something that serves static files and expanded page
templates.
Initially, I'd observe that the system talks to the network and to the permanent
storage that stores the configuration information, static pages, and so on. So
my initial decomposition would identify interface modules for communicating with
these. (This is an "edges-in" approach.)
At a higher level, there is something that responds to incoming requests to
serve the appropriate responses. There's something this uses to obtain the
configuration from the permanent storage. This could make use of something that
can serve a static page and something that can serve an expanded page template.
There's also clearly a place for something that expands a page template.
I'm doing this off the top of my head, so I won't be surprised if I've missed
something or otherwise screwed up.
This identifies the major high-level modules in the system. I could now define
the package specifications for them and have the compiler check that they are
syntactically correct and semantically consistent. Then I could pick one and
design its implementation. It's likely at some point tasking would be involved,
allowing the processing of multiple requests at once, so this would all have to
be done keeping concurrency in mind.
At some point I'd get to a low enough level to start thinking about
representations, which seems to be where you begin your thinking about the problem.
> As I'm more comfortable using components already coded and tested, I
> would code them from the lowest to the highest level:
In Ada, one can create package specifications, then create other units that make
use of those specifications before they are implemented. This is an important
concept in Ada called the separation of specification and body. Sometimes it is
useful to create stub bodies for such packages, which can then be used to test
the units that make use of these packages. Thus it is often possible to
implement and test higher-level modules before lower-level modules that they use
have been implemented. This may not be especially useful on a single-person
project, but can be quite valuable in projects with more than one developer.
This often seems to be a foreign concept to those used to C.
While your approach seems quite different to mine, many aspects of the final
result seem to be similar. This probably bodes well for you being able to use
Ada effectively.
Nothing wrong with starting at the bottom! Especially when you already
know that the component you're looking at is likely to be useful and to
fit into *your* way of thinking about things. Your plan already has
higher-level abstractions, so that if you get to the next layer up and
want to change your lowest layer (if only for experimentation's sake)
you will be able to do so.
Lots of people round here are responsible for component libraries at
various levels of abstraction which they've developed for their own
purposes and then pushed out to the community in the hope they'll help
someone else.
The only caution I'd add is that, at some point, when you're reading an
S-expression from an external file and you expect the next 4 bytes to
contain an integer in binary little-endian format, you're going to have
to trust _something_ to have got it right; if you wrote "*(struct foo
**)((char *)whatever + bar.offset)" in C you will have to write the
equivalent in Ada. Unless you were going to invent a sort of checked
S-expression? (I don't get that impression!)
--S
> The only caution I'd add is that, at some point, when you're reading an
> S-expression from an external file and you expect the next 4 bytes to
> contain an integer in binary little-endian format,
S-expressions are bad, but not that bad. They have binary data encoded as
hexadecimal strings.
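To illustrate that point, the hex encoding of atom bytes is trivial to do; here is a minimal sketch in C (the original poster's background language; the function name `hex_encode` is mine, not from any library in the thread):

```c
#include <stddef.h>

/* Encode a raw byte buffer as a lowercase hexadecimal string.
   `out` must have room for 2 * len + 1 characters. */
void hex_encode(const unsigned char *in, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0f]; /* low nibble  */
    }
    out[2 * len] = '\0';
}
```

Encoding { 0xde, 0xad, 0xbe, 0xef } this way gives "deadbeef", which an S-expression writer could then emit between delimiters instead of raw binary.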
> you're going to have
> to trust _something_ to have got it right; if you wrote "*(struct foo
> **)((char *)whatever + bar.offset)" in C you will have to write the
> equivalent in Ada.
Luckily, Ada makes it difficult to write C equivalents. In Ada a
"non-equivalent" could be:
with Ada.Streams;    use Ada.Streams;
with Interfaces;     use Interfaces;
with Ada.Exceptions; use Ada.Exceptions;

function Get (Stream : access Root_Stream_Type'Class) return Unsigned_16 is
   Result : Unsigned_16 := 0;
begin
   if '#' /= Character'Input (Stream) then
      Raise_Exception (Syntax_Error'Identity, "Opening '#' is expected");
   end if;
   for Octet in 0 .. 3 loop
      Result :=
        Result + Character'Pos (Character'Input (Stream)) * 2**Octet;
   end loop;
   if '#' /= Character'Input (Stream) then
      Raise_Exception (Syntax_Error'Identity, "Closing '#' is expected");
   end if;
   return Result;
end Get;
> Unless you were going to invent a sort of checked
> S-expression? (I don't get that impression!)
I don't think that would be an invention. A program, in C or Ada, that
does not check the format of its input is broken, in my view.
> On Sun, 08 Aug 2010 11:26:11 +0100, Simon Wright wrote:
>
>> The only caution I'd add is that, at some point, when you're reading an
>> S-expression from an external file and you expect the next 4 bytes to
>> contain an integer in binary little-endian format,
>
> S-expressions are bad, but not that bad. They have binary data encoded as
> hexadecimal strings.
>
>> you're going to have
>> to trust _something_ to have got it right; if you wrote "*(struct foo
>> **)((char *)whatever + bar.offset)" in C you will have to write the
>> equivalent in Ada.
>
> Luckily, Ada makes it difficult to write C equivalents. In Ada a
> "non-equivalent" could be:
>
> with Ada.Streams; use Ada.Streams;
> with Interfaces; use Interfaces;
> with Ada.Exceptions; use Ada.Exceptions;
>
> function Get (Stream : access Root_Stream_Type'Class) return Unsigned_16 is
Unsigned_32
> Result : Unsigned_16 := 0;
Unsigned_32
> begin
> if '#' /= Character'Input (Stream) then
> Raise_Exception (Syntax_Error'Identity, "Opening '#' is expected");
> end if;
> for Octet in 0..3 loop
> Result :=
> Result + Character'Pos (Character'Input (Stream)) * 2**Octet;
2**(Octet*8) of course
Well, then I'm afraid I can't discuss any more, because I fail to
understand your definition here.
I was using "general-purpose" as the opposite of "specific-purpose".
If we make the parallel with compression schemes, FLAC is surely as
complete as bzip2, yet the first one has a specific purpose
(compressing sound) while the other is general-purpose. So back to
data formats, I made the distinction based on the amount of
preliminary assumptions about the data to be contained. In that sense
the raw byte sequence is the most general format, in that there is no
assumption about the contained data (except that its number of bits is
a multiple of the number of bits per byte).
> > So let's add as few semantics as possible, to keep as much generality
> > as possible. We end up with a bunch of byte sequences, whose semantics
> > are still left to the application, linked together by some kind of
> > semantic link. When the chosen links are "brother of" and "sublist of"
> > you get exactly S-expressions.
>
> Yes, the language of S-expressions is about hierarchical structures of
> elements lacking any semantics.
>
> I see no purpose in such descriptions.
Indeed, I don't see any either, and that's the point: there is room to
add your application-specific purpose on top of this format.
> > However from a purely practical point of view, and using the fact that
> > in my background languages (C and 386 asm) byte sequences and strings
> > are so similar, these crude semantics are all I need (or at least, all
> > I've ever needed so far).
>
> The lower you descend the abstraction levels, the fewer differences
> you see. Everything is a bunch of transistors...
In the Get function from your last post, you don't seem to make much
difference between a binary byte and a Character. It would seem
Ada Strings are also very similar to byte sequences/arrays.
> > Now if we agree that simplicity is a
> > desirable quality (because it leads to less bugs, more productivity,
> > etc), I still fail to see the issues of such a format.
>
> Programs in 386 Assembler are sufficiently more complex than programs in
> Ada. Simplicity of nature by no means implies simplicity of use.
Guess why I haven't written a single line in assembly during the last
8 years ;-)
> > Now regarding personal preferences about braces, I have to admit I'm
> > always shocked at the way so many people dismiss S-expressions on
> > first sight because of the LISP-looking parentheses.
>
> Do you mean LISP does not deserve its fame? (:-))
I honestly don't know enough about LISP and its fame to mean
anything like that. I just meant that judging a format only from its
relatively heavy use of parentheses is about as silly as judging the
skills of a person only from the amount of melanin in their skin.
> > My point is, most of my (currently non-OOP) code can be expressed as
> > well in an OOP style. When I defined a C structure along with a bunch
> > of functions that perform operations on it, I'm conceptually defining
> > a class and its methods, only expressed in a non-OOP language. I
> > sometimes put function pointers in the structure, to have a form of
> > dynamic dispatch or virtual methods. I occasionally even leave the
> > structure declared but undefined publicly, to hide internals (do call
> > that encapsulation?), along with functions that could well be called
> > accessors and mutators. In my opinion that doesn't count as OOP
> > because it doesn't use OOP-specific features like inheritance.
>
> I disagree because in my view this is all what OO is about. OO is not about
> the tools (OOPL), it is about the way of programming.
Then I guess you could say I'm twisting C into OO programming, though
I readily violate OOP principles when it significantly improves code
readability or simplicity (which I guess happens much more often in C
than in Ada).
> > And the reason why I started this thread is only to
> > know how to buffer into memory the arrays of octets, because I need
> > (in my usual use of S-expressions) to resolve the links between atoms
> > before I can know the type of atoms. So I need a way to delay the
> > typing, and in the meantime handle data as a generic byte sequence
> > whose only known information is its size and its place in the S-
> > expression tree. What exactly is so bad with that approach?
>
> Nothing wrong when at the implementation level. However I don't see why
> links need to be resolved first. In comparable cases - I do much messy
> protocol/communication stuff - I usually first restore objects and then
> resolve links.
That's because some atom types are only known after having examined
other atoms. If you remember my example (tcp-connect (host foo.example)
(port 80)), here is how it would be interpreted: from the context or
initial state, we expect a list beginning with an atom which is a
string describing what to do with whatever follows. "tcp-connect" is
therefore interpreted as a string; from the string value we know the
following is a list of settings, each of them being a list whose first
element is an atom which is a string describing the particular setting.
"host" is therefore a string, and its value tells us the following
atoms are also strings, describing host names to connect to, in
decreasing priority order. There "foo.example" is a string to be
resolved into a network address. "port" is also a string, and from its
value we know it's followed by an atom holding the decimal
representation of a port number, which in Ada would probably be a type
of its own (probably a modular type like mod 2**16).
Of course, all those "we know"s are actually knowledge derived from
the configuration file specification.
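In C terms, the delayed typing described above could be sketched roughly like this (struct atom, atom_is and atom_to_port are hypothetical names invented for the example, not from any actual library):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* An untyped atom, as in the C design described above: raw bytes
   plus a size, with no type attached yet. */
struct atom {
    const unsigned char *data;
    size_t               size;
};

/* Check whether an atom matches a known keyword such as "port". */
int atom_is(const struct atom *a, const char *kw)
{
    size_t n = strlen(kw);
    return a->size == n && memcmp(a->data, kw, n) == 0;
}

/* Interpret an atom as a decimal port number, something we only know
   to attempt after the preceding "port" atom was matched.
   Returns 0 on success, -1 if the atom is not a valid port. */
int atom_to_port(const struct atom *a, unsigned *port)
{
    char buf[6];                     /* up to 5 digits plus NUL */
    char *end;
    unsigned long v;

    if (a->size == 0 || a->size >= sizeof buf)
        return -1;
    memcpy(buf, a->data, a->size);
    buf[a->size] = '\0';
    v = strtoul(buf, &end, 10);
    if (*end != '\0' || v > 65535)
        return -1;
    *port = (unsigned)v;
    return 0;
}
```

So on seeing (port 80), the reader first matches the head atom with atom_is, and only then knows to call atom_to_port on the following atom; until then the bytes stay untyped.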
In this particular example, atoms are treated in the order in which
they appear in the byte stream, so there is already enough context to
know the type of an atom before reading it. This is not always the
case: for example it might be necessary to build an associative array
from a list of lists before being able to know the type of non-head
atoms, or the S-expression might have to be kept uninterpreted (and
thus untyped) until some other run-time actions are performed (this
is quite common in the template system, where the template and the
data can change independently, and both changes induce an S-expression
re-interpretation).
Is it clearer now?
Natacha
> On Aug 7, 4:23 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Sat, 7 Aug 2010 05:56:50 -0700 (PDT), Natacha Kerensikova wrote:
>>> Can we at least agree on the fact that a sequence of bytes is a
>>> general-purpose format, widely used for storing and transmitting data?
>>> (this question is just a matter of vocabulary)
>>
>> I don't think so. Namely, I don't think that "general" is a synonym
>> for "completeness". It is rather about the abstraction level, under
>> the condition of completeness.
>
> Well, then I'm afraid I can't discuss any more, because I fail to
> understand your definition here.
>
> I was using "general-purpose" as the opposite of "specific-purpose".
> If we make the parallel with compression schemes, FLAC sure is as
> complete as bzip2, yet the first one has a specific purpose
> (compressing sound) while the other is general-purpose. So back to
> data formats, I made the distinction based on the amount of preliminary
> assumptions about the data to be contained. In that sense the raw byte
> sequence is the most general format, in that there is no assumption
> about the contained data (except that its number of bits is a multiple
> of the number of bits per byte).
And how are you going to make any assumptions at the level of raw bytes?
For a sequence of bytes to become sound you need to move many abstraction
layers - and OSI layers - up.
>>> However from a purely practical point of view, and using the fact that
>>> in my background languages (C and 386 asm) byte sequences and strings
>>> are so similar, these crude semantics are all I need (or at least, all
>>> I've ever needed so far).
>>
>> The lower you descend the abstraction levels, the fewer differences
>> you see. Everything is a bunch of transistors...
>
> In the Get function from your last post, you don't seem to make much
> difference between a binary byte and a Character.
No I do. But you have defined it as a text file. A streamed text file is a
sequence of Character items.
> It would seem
> Ada Strings are also very similar to byte sequences/arrays.
I remember a machine where char was 32-bit long.
Byte, octet, character are three different things (and code point is a
fourth).
> I just meant that judging a format only from its
> relatively heavy use of parentheses is about as silly as judging the
> skills of a person only from the amount of melanin in their skin.
The amount of melanin is unrelated to the virtues we value in human
beings. An excessive need for indistinguishable brackets would
definitely reduce readability.
>>> And the reason why I started this thread is only to
>>> know how to buffer into memory the arrays of octets, because I need
>>> (in my usual use of S-expressions) to resolve the links between atoms
>>> before I can know the type of atoms. So I need a way to delay the
>>> typing, and in the meantime handle data as a generic byte sequence
>>> whose only known information is its size and its place in the S-
>>> expression tree. What exactly is so bad with that approach?
>>
>> Nothing wrong when at the implementation level. However I don't see why
>> links need to be resolved first. In comparable cases - I do much messy
>> protocol/communication stuff - I usually first restore objects and then
>> resolve links.
>
> That's because some atom types are only known after having examined
> other atoms. If you remember my example (tcp-connect (host foo.example)
> (port 80)), here is how it would be interpreted: from the context or
> initial state, we expect a list beginning with an atom which is a
> string describing what to do with whatever follows. "tcp-connect" is
> therefore interpreted as a string; from the string value we know the
> following is a list of settings,
Once you matched "tcp-connect", you know all the types of the following
components.
> This is not always the
> case, for example it might be necessary to build an associative array
> from a list of lists before being able to know the type of non-head
> atoms,
What for? Even if such cases might be invented, I see no reason to do
that. It is difficult to parse and difficult to read. So why mess with it?
Well actually the requirements were presented in the beginning: I've
got an S-expression configuration file, a directory of static files, a
directory of S-expression templates and a directory of S-expression
data to populate the templates. And I want to respond to HTTP requests
with either a static file or an expanded template. Now I might be
misunderstanding the word "requirements", but what I actually consider
as requirements is the above without any "S-expression" occurrence,
the rest being implementation choices. "S-expression" might be part of
the requirements in order to reuse existing data, but then again
another format can be chosen provided a converter from S-expressions
is also coded.
I then proceeded to propose an implementation fitting these
requirements, which is how I would do stuff, asking how far it is
from a typical Ada-style implementation.
> I'd use Ada Web Server (AWS), and perhaps you should try that, too. Using a
> significant, existing, high-level Ada application framework like that might help
> introduce you to how some experienced Ada people thought this kind of thing
> should be approached.
Thanks for the pointer, however I'm already afraid learning only Ada
is too much for me (I've tried with C++ and it's a language much too
complex to fit in my head (especially the STL), and I hope Ada is
simple enough), learning both Ada and AWS at the same time will
probably be way too much. However I haven't decided yet whether I will
actually begin with this project, I might try to code something
simpler before that (which will also need S-expressions, hence my
beginning with that library). In that case, I might be confident
enough in my Ada before starting the web project, and then go for AWS.
> A "web server" can be a variety of things, from a simple page server that serves
> static files to a highly-dynamic system generating everything on the fly. It
> appears that you intend something that serves static files and expanded page
> templates.
>
> Initially, I'd observe that the system talks to the network and to the permanent
> storage that stores the configuration information, static pages, and so on. So
> my initial decomposition would identify interface modules for communicating with
> these. (This is an "edges-in" approach.)
>
> At a higher level, there is something that responds to incoming requests to
> serve the appropriate responses. There's something this uses to obtain the
> configuration from the permanent storage. This could make use of something that
> can serve a static page and something that can serve an expanded page template.
> There's also clearly a place for something that expands a page template.
>
> I'm doing this off the top of my head, so I won't be surprised if I've missed
> something or otherwise screwed up.
Actually, that's very similar to what I thought too. It's just that
I've already thought so much about this project that I'm further down
the road, and I tried to fit everything into the one dimension of a
text.
That's actually a pretty strange thing I've not yet encountered in
other coders I know: I first spend a lot of time thinking before
writing the first line of code (or in this particular case, before
even managing to install the compiler). I also spend much less time
debugging, though I can't tell whether the two are related or whether
one approach is more efficient than the other.
> This identifies the major high-level modules in the system. I could now define
> the package specifications for them and have the compiler check that they are
> syntactically correct and semantically consistent. Then I could pick one and
> design its implementation.
I have done this a few times; however, C compilation checks being
weaker than Ada's (from what I understood), knowing that something
compiles in C isn't that useful. Hence my stress on tests, which
require an implementation of every dependency. Hence my tendency to
start with lower-level code.
Moreover, I have quite a few times (though less often recently, it
might be a beginner thing) realized during implementation that the
specification was poorly chosen. The fewer higher-level components are
already written, the cheaper interface redesign is.
> It's likely at some point tasking would be involved,
> allowing the processing of multiple requests at once, so this would all have to
> be done keeping concurrency in mind.
I'm unsure about this. In C I usually use a socket multiplexing call
(poll() or select()) along with memory buffers, which allows serving
multiple requests simultaneously in a single thread. It scales
differently than a thread-based approach, but I'm nowhere near the
amount of traffic where it matters, so going for the simplest might be
the best.
Moreover, as the multiplexing might end up being one package (or being
integrated to the networking package, I don't know enough yet), there
is only one place to change should I want to switch between pure-
threaded (like apache), pure-multiplexed (like lighttpd), or a mixed
implementation (like nginx). Resources are largely independent and
read-only, so making everything thread-safe shouldn't be that
difficult anyway.
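For readers unfamiliar with this style, the core of the poll()-based approach fits in a few lines of C; a minimal sketch (wait_readable is a made-up helper name, and a real server would of course watch many descriptors per call):

```c
#include <poll.h>

/* Return 1 if `fd` becomes readable within `timeout_ms` milliseconds,
   0 otherwise.  A single-threaded multiplexed server keeps an array of
   pollfd entries, one per client socket, and after each poll() call
   services whichever descriptors are reported ready. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    return poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN) != 0;
}
```

The same pollfd array scales to many descriptors: poll(fds, nfds, timeout) reports all ready entries in a single call, which is what lets one thread serve many connections at once.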
> At some point I'd get to a low enough level to start thinking about
> representations, which seems to be where you begin your thinking about the problem.
Actually I had already thought a lot beforehand; I just didn't feel
the need for Ada-specific advice until thinking about the actual
low-level implementation.
> > As I'm more comfortable using components already coded and tested, I
> > would code them from the lowest to the highest level:
>
> In Ada, one can create package specifications, then create other units that make
> use of those specifications before they are implemented. This is an important
> concept in Ada called the separation of specification and body. Sometimes it is
> useful to create stub bodies for such packages, which can then be used to test
> the units that make use of these packages. Thus it is often possible to
> implement and test higher-level modules before lower-level modules that they use
> have been implemented. This may not be especially useful on a single-person
> project, but can be quite valuable in projects with more than one developer.
> This often seems to be a foreign concept to those used to C.
As I said, I have the feeling this is very close to what I'm already
doing in C, except that you don't get very far with stubs in C,
because the C compiler doesn't prevent many errors beyond typos.
> While your approach seems quite different to mine, many aspects of the final
> result seem to be similar. This probably bodes well for you being able to use
> Ada effectively.
Unless I'm very much mistaken and/or very naive, I've got the
feeling our approaches are not that different, and differ mostly in
the order of implementation, which indeed doesn't change that much in
the final result.
I'm glad my future in Ada looks good. I'm still afraid of its
complexity, and of how intrusive the standard library is (e.g. the DS
is very limited in memory, so as much useless (and maybe
not-so-useless) stuff as possible should be trimmed away).
Thanks for your insights,
Natacha
I'm not, hence the byte sequence being general-purpose by my
definition. (However one could split hairs by saying that at the level
of raw bytes you're making assumptions about the number and endianness
of bits in each byte.)
S-expressions on the other hand are slightly less general-purpose in
that they contain the assumption that data is organized on the leaves
of a binary tree.
The more specialized the format, the more assumptions on the contained
data. Right?
> > In the Get procedure from your last post, you don't seem to make that
> > much difference between a binary byte and a Character.
>
> No I do. But you have defined it as a text file. A streamed text file is a
> sequence of Character items.
Actually, I didn't. I only defined it as a bunch of byte sequences
organized in a certain way.
The fact I usually choose a text representation of S-expressions is
purely a personal choice (motivated by the power of existing text
utilities like grep and sed), but I've never written an S-expression
library assuming it will deal with text. The canonical form of an
S-expression, where atoms are embedded verbatim, is as binary as one
can get.
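For reference, writing an atom in that canonical form (decimal length, a colon, then the bytes verbatim) is tiny; a C sketch, with canonical_atom being my own name for it:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write one atom in canonical S-expression form: the decimal byte
   count, a colon, then the bytes verbatim.  No escaping is needed,
   which is what makes the encoding binary-safe.  Returns the number
   of bytes written, or 0 if `out` is too small. */
size_t canonical_atom(const unsigned char *data, size_t len,
                      char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%zu:", len);
    if (n < 0 || (size_t)n + len >= out_size)
        return 0;
    memcpy(out + n, data, len);
    out[(size_t)n + len] = '\0';   /* terminator for convenience only */
    return (size_t)n + len;
}
```

Encoding the three bytes of "foo" produces "3:foo"; a list is then just its atoms written between '(' and ')'.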
> > I would seem
> > Ada Strings are also very similar to byte sequences/arrays.
>
> I remember a machine where char was 32-bit long.
I've often wanted to get access to one of those PDP with 9-bit bytes,
just to further check my C programs.
> Byte, octet, character are three different things (and code point is a
> fourth).
I know these differences very well, except octet vs. character,
especially considering Ada's definition of a Character. Or is it only
that Character is an enumeration while an octet is a modular integer?
This leads to a question I've had in mind since quite early in the
thread: should I really use an array of Storage_Element, while the
S-expression standard considers only sequences of octets?
> > I just meant that judging a format only from its
> > relatively heavy use of parentheses is about as silly as judging the
> > skills of a person only from the amount of melanin in their skin.
>
> The amount of melanin is unrelated to the virtues we count in human beings.
> An excessive need in indistinguishable brackets would definitely reduce
> readability.
Of course, this is the same issue as curly brackets in C. My opinion
is that those brackets are not meant to be read by humans, only by
the compiler. Indentation is supposed to provide the same information
to humans while being ignored by the compiler. I apply the same rule
to S-expressions. Don't you think one should at least have a serious
look at a file before freaking out and calling it unreadable?
> > That's because some atom types are only known after having examined
> > other atoms. If you remember my example (tcp-connect (host foo.example)
> > (port 80)), here is how it would be interpreted: from the context or
> > initial state, we expect a list beginning with an atom which is a
> > string describing what to do with whatever follows. "tcp-connect" is
> > therefore interpreted as a string; from the string value we know the
> > following is a list of settings,
>
> Once you matched "tcp-connect", you know all the types of the following
> components.
Unfortunately, you know "80" is a 16-bit integer only after having
matched "port".
> > This is not always the
> > case, for example it might be necessary to build an associative array
> > from a list of list before being able to know the type of non-head
> > atoms,
>
> What for? Even if such cases might be invented, I see no reason to do that.
> It is difficult to parse, it is difficult to read. So why to mess with?
For example, you might have a sub-S-expression describing a seldom
used object that is expensive to build, wouldn't you want to be sure
you actually need it before building it?
Thanks for the discussion,
Natacha
Thanks a lot for the support \o/
> Lots of people round here are responsible for component libraries at
> various levels of abstraction which they've developed for their own
> purposes and then pushed out to the community in the hope they'll help
> someone else.
I indeed planned to share such a library (assuming I actually write
and finish it), should a generous soul agree to review it. However I
have long lost the hope of seeing my S-expression stuff used; I guess
I can't win against lisp-trauma.
> The only caution I'd add is that, at some point, when you're reading an
> S-expression from an external file and you expect the next 4 bytes to
> contain an integer in binary little-endian format, you're going to have
> to trust _something_ to have got it right; if you wrote "*(struct foo
> **)((char *)whatever + bar.offset)" in C you will have to write the
> equivalent in Ada. Unless you were going to invent a sort of checked
> S-expression? (I don't get that impression!)
Actually that ugly C expression is not part of my S-expression code.
It's part of a generic self-balancing binary tree interface, supposed
to allow any algorithm as a back end. Because algorithms store
different data, I can't make assumptions about the position of
children or balancing data inside the node structure. Therefore I let
the back-end provide the offsets into the node structure where the
generic tree code can find the stuff it needs. Here "whatever" is a
void pointer, pointing to the beginning of the node structure; "bar"
is a structure provided by the back-end; the char * cast is needed to
perform byte-based pointer arithmetic, and the struct foo ** cast then
recovers the real type of the required element.
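A stripped-down C illustration of that offset technique, using offsetof (the type and function names here are invented for the example, not the poster's actual code, and it assumes the usual platform behaviour that all object pointers share one representation):

```c
#include <stddef.h>

/* One back-end's node layout; the generic tree code never sees this
   declaration, only offsets into it. */
struct avl_node {
    int              balance;
    struct avl_node *left;
    struct avl_node *right;
    int              key;
};

/* What the back-end registers with the generic tree code: where,
   inside its opaque node, the children pointers live. */
struct tree_ops {
    size_t left_offset;
    size_t right_offset;
};

/* Generic code reaching a child through an opaque pointer plus an
   offset: the same pattern as
   *(struct foo **)((char *)whatever + bar.offset) in the quote. */
void *tree_child(void *node, size_t offset)
{
    return *(void **)((char *)node + offset);
}
```

A back-end would fill tree_ops with offsetof(struct avl_node, left) and offsetof(struct avl_node, right); in Ada, generics make this indirection unnecessary, which is the point being made.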
While I know perfectly what I'm doing with this, I guess it's not
obvious for the majority of C coders. My hope with Ada is that I
wouldn't ever need to write such dubious expressions. In that
particular case, I'm working around C's lack of generics, so it
shouldn't be a problem to express this in Ada. And should I ever need
to write dubious expressions, I hope the Ada context would give me the
benefit of the doubt long enough for people to read the documentation or
the comments and understand there was no other choice; while in C people
wouldn't look further and would just label my code and me as ugly and
dangerous.
Regarding the integer encoded as 4 little-endian bytes, I believe it's
pretty safe because S-expression atoms are of known length, so if the
length differs from 4 bytes I know there is a problem; otherwise I
need other constraints on the integer to know whether it is valid. In
any case, it doesn't disrupt the reading or interpretation of the
S-expression beyond that particular integer value.
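The length check described above can be sketched like this (the name
Decode_LE32 and its shape are my own invention, not part of Natacha's
library): reject any atom that is not exactly 4 octets, then fold the
octets least-significant-first.

```ada
with Ada.Streams;  use Ada.Streams;
with Interfaces;   use Interfaces;

--  Hypothetical helper: interpret a 4-octet atom as a 32-bit
--  little-endian unsigned integer, raising on a wrong-sized atom.
function Decode_LE32 (Atom : Stream_Element_Array) return Unsigned_32 is
   Result : Unsigned_32 := 0;
begin
   if Atom'Length /= 4 then
      raise Constraint_Error with "atom is not a 32-bit integer";
   end if;
   --  Little-endian: the last octet is the most significant, so
   --  accumulate from the end toward the start.
   for I in reverse Atom'Range loop
      Result := Result * 256 + Unsigned_32 (Atom (I));
   end loop;
   return Result;
end Decode_LE32;
```

Any further range constraints on the decoded value would then be checked
by converting to the application's own integer type.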
Thanks again for your support,
Natacha
>
> Is it clearer now?
GO! Natacha! GO! -- Natacha ROCKS! -- GO! Natacha! GO!
;D
I've _now_ decided to go out and learn _all_ about S-Expressions. ;)
--
Duke
> On Aug 8, 3:01 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
> The more specialized the format, the more assumptions on the contained
> data. Right?
Yes, if the specialization addresses the encoded entities. No, if it
addresses the medium.
>>> In the Get procedure from your last post, you don't seem to make that
>>> much difference between a binary byte and a Character.
>>
>> No I do. But you have defined it as a text file. A streamed text file is a
>> sequence of Character items.
>
> Actually, I didn't. I only defined it as a bunch of byte sequences
> organized in a certain way.
I see. This confuses things even more. Why should I represent anything as a
byte sequence? It already is one, and in 90% of cases I just don't care how
the compiler does it. Why convert byte sequences into other sequences and
then into a text file? It just does not make sense to me. Any conversion
should be accompanied by moving the thing from one medium to another.
Otherwise it is wasteful.
>> Byte, octet, character are three different things (and code point is a
>> fourth).
>
> I know very well these differences, except octet vs character,
> especially considering Ada's definition of a Character. Or is it only
> that Character is an enumeration while octet is a modular integer?
The difference is that Character represents code points while an octet
is an atomic array of 8 bits.
> This leads to a question I had in mind since quite early in the
> thread, should I really use an array of Storage_Element, while S-
> expression standard considers only sequences of octets?
That depends on what you are going to do. Storage_Element is a
machine-dependent addressable memory unit. Octet is a machine-independent
presentation-layer unit, a thing of 256 independent states. Yes,
incidentally, Character has 256 code points.
> Don't you think one should at least have a serious
> look at a file before freaking out and calling it unreadable?
There are well-known things which do not require reconsidering. Curly or
round brackets aren't bad because of their form. They are bad because of
excessive overloading: the closing brackets of a loop, aggregate, block etc.
are indistinguishable in C. Further, in C you have brackets where none are
needed and lack them where they should be. The same applies to LISP and
S-expressions.
>>> That's because some atom types are only known after having examined
>>> other atoms. If you remember my example (tcp-connect (host foo.example)
>>> (port 80)), here is how it would be interpreted: from the context or
>>> initial state, we expect a list beginning with an atom which is a
>>> string describing what to do with whatever is after. "tcp-connect" is
>>> therefore interpreted as a string, from the string value we know the
>>> following is a list of settings,
>>
>> Once you matched "tcp-connect", you know all the types of the following
>> components.
>
> Unfortunately, you know "80" is a 16-bit integer only after having
> matched "port".
Nope, we certainly know that each TCP connection needs a port. There is
nothing to resolve since the notation is not reverse. Parse it top down, it
is simple, it is safe, it allows excellent diagnostics, it works.
>>> This is not always the
>>> case, for example it might be necessary to build an associative array
>>> from a list of list before being able to know the type of non-head
>>> atoms,
>>
>> What for? Even if such cases might be invented, I see no reason to do that.
>> It is difficult to parse, it is difficult to read. So why mess with it?
>
> For example, you might have a sub-S-expression describing a seldom
> used object that is expensive to build, wouldn't you want to be sure
> you actually need it before building it?
See above: if you parse top down, you know whether you need that object
before you begin. Then, having a bracketed structure, it is trivial to skip the
object's description without construction. Just count brackets.
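The "just count brackets" skip can be sketched in a few lines (the
procedure name and interface are my own; atom quoting and escapes are
ignored for brevity, and Pos is assumed to start on an opening
parenthesis):

```ada
--  Advance Pos past one complete parenthesized sub-expression
--  without constructing anything.  Assumes S (Pos) = '('.
procedure Skip_Expression (S : String; Pos : in out Positive) is
   Depth : Natural := 0;
begin
   loop
      case S (Pos) is
         when '('    => Depth := Depth + 1;
         when ')'    => Depth := Depth - 1;
         when others => null;
      end case;
      Pos := Pos + 1;
      exit when Depth = 0;  --  the matching close bracket was consumed
   end loop;
end Skip_Expression;
```

A real implementation would also have to honor the atom encodings, since
a raw atom may itself contain parenthesis octets.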
> Thanks for the pointer, however I'm already afraid learning only Ada
> is too much for me (I've tried with C++ and it's a language much too
> complex to fit in my head (especially the STL), and I hope Ada is
> simple enough), ...
Ada is more complicated than C, less complicated than C++.
There are also cases where Ada _seems_ less complicated.
Overloading, for example. All you need to know is:
- What it is.
- It's allowed for subprograms and literals. (That's slightly
wrong, but you don't need to care.)
- The compiler resolves overloading by looking at the types
of things.
- If you get an ambiguity, you can resolve it in one of three ways,
in decreasing order of preference:
- Change the names of some things.
- Specify the type of some expression using a qualified
expression.
- Use Package_Name.Procedure_Name notation.
The actual overload resolution rules in Ada are quite complicated.
Dozens of pages of text, I'd guess. But you don't need to
understand them -- just wait until the compiler complains,
and fix it.
In C++, on the other hand, you need to understand all kinds of
subtle interactions with implicit conversions, for example.
So Ada's overloading rules might be as complicated as the C++
rules, but Ada _seems_ simpler, because you can safely ignore
all the details. Unless you're writing an Ada compiler, of
course.
>...learning both Ada and AWS at the same time will
> probably be way too much.
Well, I'm not sure. To learn a programming language, you need to
write programs, and also read programs. Jeff's point was that
reading AWS will help you learn Ada, and that's probably true.
By the way, I don't see anything fundamentally wrong with
your s-expression plans, with s-expressions containing
raw arrays-of-octets -- especially since you say you
already have a lot of stuff using s-exprs. You can layer on
top of that less-general but more-typed abstractions, probably
using generics.
- Bob
> (probably Integer mod 2**16 or something like that).
There's no such thing as "Integer mod 2**16". "mod ..." is
how you declare an unsigned integer type (called "modular type" in
Ada), and "range ..." is how you declare a signed integer type.
Integer is a predefined signed integer type. Its range is
not portable, so if you care about that, it should be avoided.
It's a good idea to declare your own integer types -- different
types for different purposes. For example, if you have an array
of indexes into another array, using different index types for
the two arrays will often make the code readable -- it will be
obvious that (say) X is intended for indexing into the
outer array.
Modular types are evil, and should usually be avoided.
You might want modular types when interfacing with C code,
but don't use them just because you happen to have
no negative numbers. For example, I would usually
(not always!) prefer this:
type Octet is range 0..2**8-1; -- signed
over this:
type Octet is mod 2**8; -- unsigned
The index type of an unconstrained array should (almost) never
be modular.
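A small sketch of the contrast (example names are mine, not Bob's): the
signed declaration detects overflow at run time, while the modular one
silently wraps.

```ada
procedure Octet_Demo is
   type Octet_Signed  is range 0 .. 2**8 - 1;   -- signed form
   type Octet_Modular is mod 2**8;              -- modular form

   S : Octet_Signed  := Octet_Signed'Last;      -- 255
   M : Octet_Modular := Octet_Modular'Last;     -- 255
begin
   M := M + 1;   -- wraps silently: M is now 0
   S := S + 1;   -- raises Constraint_Error: the overflow is detected
exception
   when Constraint_Error =>
      null;      -- only the signed type reports the problem
end Octet_Demo;
```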
- Bob
> Modular types are evil, and should usually be avoided.
They aren't; modular arithmetic is. I wish I could define modular
types without +, -, *, /, and, or, xor, and have a more decent notation for
S'Succ and S'Pred.
> The index type of an unconstrained array should (almost) never
> be modular.
That is again not their fault. If a subtype of a modular type were modular,
e.g.
type T is mod 256;
subtype S is T range 2..10;
X : S := 10;
begin
X := X + 1; -- The result is 2!
then they could be used as indices.
> On Sun, 08 Aug 2010 11:34:23 -0400, Robert A Duff wrote:
>
>> Modular types are evil, and should usually be avoided.
>
> They aren't, modular arithmetic is.
Hmm.
Implicit modular arithmetic is evil. I have no problem with
expressions like "X mod N".
>...I wished to be able to define a modular
> types without +,-,*,/, and, or, xor, and have a more decent notation for
> S'Succ and S'Pred.
>
>> The index type of an unconstrained array should (almost) never
>> be modular.
>
> That is again not their fault. If a subtype of a modular type were modular,
> e,g,
>
> type T is mod 256;
> subtype S is T range 2..10;
>
> X : S := 10;
> begin
> X := X + 1; -- The result is 2!
2 factorial? ;-)
For scalars, the primitive operations mostly don't know the
subtype bounds -- they are operations of the type. So this
wouldn't fit into Ada well.
> ...then they could be used as indices.
The problem I was alluding to is that if you have
an unconstrained array type, sooner or later you might
have an empty one. So you want bounds 0..-1, but -1
wraps around.
Same thing with a "for" loop that goes from T'First up to
something.
If 2..10 wrapped around, then you'd want range 2..1, which
has the same problem.
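The empty-array problem can be made concrete with a short sketch (my own
example, under the assumption of Ada's actual wrapping semantics): the
null range we would want, Index'First .. Index'First - 1, is destroyed by
the wraparound in the subtraction.

```ada
procedure Empty_Range_Demo is
   type Index  is mod 256;
   type Buffer is array (Index range <>) of Character;

   --  For an empty Buffer we would want the null range
   --  Index'First .. Index'First - 1, but the modular subtraction
   --  wraps: Index'First - 1 is 255, so the range denotes the whole
   --  index space instead of nothing.
   Wrapped_Last : constant Index := Index'First - 1;
begin
   pragma Assert (Wrapped_Last = 255);
   --  With a signed index type, 0 .. -1 would simply be a null range.
end Empty_Range_Demo;
```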
Did I mention that modular types are my least favorite
feature of Ada? ;-)
- Bob
As someone who is responsible for a library, I'm not sure we disagree. Certainly
when one gets to the point of deciding on a representation, one should implement
one's choice in a reusable manner if possible, or reuse an existing library;
often the existence of a library will be an important factor in choosing a
representation.
Where we may disagree is that I don't think one should begin a project such as
this by choosing a representation.
--
Jeff Carter
"I wave my private parts at your aunties."
Monty Python & the Holy Grail
13
Most of what you presented seems more like implementation decisions
(S-expressions for the configuration information, S-expressions for the page
templates) than requirements. Even discussion of files and directories might be
premature, depending on the higher-level requirements. A web server that can be
used in an embedded system as the external interface, as well as a stand-alone
system, would need a more abstract view of permanent storage, for example.
Of course, sometimes implementation decisions are made by others and become part
of one's requirements, but I wouldn't expect that in a personal project. And the
existence of a library can be an important factor when making such
decisions.
I wasn't saying that implementing an S-expression library, or deciding to use it
at a low level in your project, was a bad thing. I was viewing this project as a
separate thing from your S-expression library.
> Thanks for the pointer, however I'm already afraid learning only Ada
> is too much for me (I've tried with C++ and it's a language much too
> complex to fit in my head (especially the STL), and I hope Ada is
> simple enough), learning both Ada and AWS at the same time will
> probably be way too much. However I haven't decided yet whether I will
> actually begin with this project, I might try to code something
> simpler before that (which will also need S-expressions, hence my
> beginning with that library). In that case, I might be confident
> enough in my Ada before starting the web project, and then go for AWS.
I think something like AWS would help you learn Ada. You need to look at
existing code as well as write your own. Starting with the library has good and
bad points. It's a fairly small, contained project, which is good. On the other
hand, the first things you write in Ada are probably not going to be great. In a
few months you'll probably look at the library and want to make significant
changes to it.
Ada is simpler than C++. It's also fairly well designed, so most features are
orthogonal, which simplifies learning the language.
> Actually, that's very similar to what I thought too. It's just that
> I've already thought so much about this project that I'm further down
> the road, and I tried to fit everything in the one-dimension of a
> text.
I'm glad to hear that. Perhaps I simply misinterpreted your presentation.
> That's actually a pretty strange thing I've never encountered yet in
> other coders I know, I first spend a lot of time thinking before
> writing the first line of code (or that particular case, before even
> managing to install the compiler). I also spend much less time
> debugging, though I can't tell whether they are related or whether one
> of them is more efficient than the other.
Excellent. The whole point of Ada is thinking before coding, so you'll probably
find that Ada supports your way of doing things. It's a common experience that
if you did think first, by the time you get your code to compile, it works
correctly. Ada people don't do a lot of debugging; I can't remember the last
time I used a debugger. When we do have problems, they tend to be logic errors
rather than typos.
> Unless I'm badly misunderstanding and/or very naive, I've got the
> feeling our approaches are not that different, and differ mostly in
> the order of implementation, which indeed doesn't change that much in
> the final result.
You may well be right, in which case you'll likely find Ada to be a good fit for
your way of approaching things.
--
Jeff Carter
"I wave my private parts at your aunties."
Monty Python & the Holy Grail
13
> For scalars, the primitive operations mostly don't know the
> subtype bounds -- they are operations of the type. So this
> wouldn't fit into Ada well.
But array subtypes silently modify their operations or, if you want,
parametrize them using the dope. A modular subtype would have a dope as
well.
>> ...then they could be used as indices.
>
> The problem I was alluding to is that if you have
> an unconstrained array type, sooner or later you might
> have an empty one. So you want bounds 0..-1, but -1
> wraps around.
Empty range does not have bounds. What are the bounds of
type Empty is mod 0; -- Illegal in Ada, but it should be legal
> Same thing with a "for" loop that goes from T'First up to
> something.
If ranges (contiguous sets of indices) were proper types you would be able
to construct them in many different ways. E.g.
function ":" (From : Index; Length : Universal_Integer)
return Index'Range
L := 0
...
for I in T'First :L loop
instead of what you can find in every Ada program:
for I in T'First..T'First + L - 1 loop
or for enumerations:
for I in T'First..T'Val (T'Pos (T'First) + L - 1) loop
which does not work anyway...
> If 2..10 wrapped around, then you'd want range 2..1, which
> has the same problem.
2..1 actually is 2,3,4,5,6,7,8,9,1
Modular numbers are not ordered.
> On Sun, 08 Aug 2010 16:03:22 -0400, Robert A Duff wrote:
>
>> For scalars, the primitive operations mostly don't know the
>> subtype bounds -- they are operations of the type. So this
>> wouldn't fit into Ada well.
>
> But array subtypes silently modify their operations or, if you want,
> parametrize them using the dope.
Yes, and discriminated subtypes are similar to arrays in that way.
The dividing line is "scalar" vs. "composite". (I'm not sure that's
a good idea, but that's Ada.) You are suggesting to make modular
types (a kind of scalar) work more like composites, which doesn't
fit well, even if it's a good idea in the abstract.
I actually have no opinion whether it's a good idea, because I
don't like modular types in the first place. ;-)
I didn't mention "access" above, which are "elementary" but not
"scalar". Constrained access subtypes are a nightmare!
>... A modular subtype would have a dope as
> well.
They already do, but the arithmetic ops don't consult that dope.
>>> ...then they could be used as indices.
>>
>> The problem I was alluding to is that if you have
>> an unconstrained array type, sooner or later you might
>> have an empty one. So you want bounds 0..-1, but -1
>> wraps around.
>
> Empty range does not have bounds.
Except that in Ada, they do. An empty String is (1..0).
It could also be 10..9 or even -100..-10_000_000, but
that's not a good idea.
>...What are bounds of
>
> type Empty is mod 0; -- Illegal in Ada, but must be legal
>
>> Same thing with a "for" loop that goes from T'First up to
>> something.
>
> If ranges (contiguous sets of indices) were proper types you would be able
> to construct them in many different ways. E.g.
>
> function ":" (From : Index; Length : Universal_Integer)
> return Index'Range
>
> L := 0
> ...
> for I in T'First :L loop
>
> instead of what you can find in every Ada program:
>
> for I in T'First..T'First + L - 1 loop
>
> or for enumerations:
>
> for I in T'First..T'Val (T'Pos (T'First) + L - 1) loop
>
> which does not work anyway...
Good points.
>> If 2..10 wrapped around, then you'd want range 2..1, which
>> has the same problem.
>
> 2..1 actually is 2,3,4,5,6,7,8,9,1
>
> Modular numbers are not ordered.
But they are -- they have "<", and "for" and so on.
Perhaps they _should_ be unordered, but I won't agree or disagree,
since I think in an ideal world they should be banished.
By the way, one defense of modular types I've heard is that
they are used in mathematics. True. But mathematicians do
not use _implicit_ mod. They say things like "X = Y (mod N)",
which is pronounced "X is congruent to Y (modulo N)".
Congruent, not equal.
- Bob
> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>
> I actually have no opinion whether it's a good idea, because I
> don't like modular types in the first place. ;-)
The implementation or the idea? Would you agree that objects with some
properties of modular integers have a place in Ada programs which do not
interface to C?
> I didn't mention "access" above, which are "elementary" but not
> "scalar". Constrained access subtypes are a nightmare!
Yes, because the language should decide whether the constraint influences
the behavior (all operations, potentially), or is a kludge that prohibits
some values when assigned or passed. I understand the motivation for
Ada 83 choosing the second, but that does not make it right. Assignment is
an operation like any other.
>> Empty range does not have bounds.
>
> Except that in Ada, they do. An empty String is (1..0).
> It could also be 10..9 or even -100..-10_000_000, but
> that's not a good idea.
Yes, because it is wrong. Doing something wrong will always hit back.
>> Modular numbers are not ordered.
>
> But they are -- they have "<", and "for" and so on.
"<" is wrong when non-transitive. I wish Ada clarified the difference between
enumeration and total order.
> Perhaps they _should_ be unordered, but I won't agree or disagree,
> since I think in an ideal world they should be banished.
I think they could be fixed.
> By the way, one defense of modular types I've heard is that
> they are used in mathematics. True.
> But mathematicians do
> not use _implicit_ mod. They say things like "X = Y (mod N)",
> which is pronounced "X is congruent to Y (modulo N)".
> Congruent, not equal.
The mathematical notation (mod N) is untyped. It applies to any natural
numbers and, what is worse, you have to add it at each point of the program
where you use the type.
Representing something as a byte sequence is serialization (at least,
according to my (perhaps wrong) definition of serialization). Actually
there are no byte sequences converted into other sequences and then
into a text file. The only conversion is from the in-memory representation
(which happens to be also a byte sequence, but maybe not contiguous or
context-dependent or whatever; that's beside the point) into a
serialized byte sequence.
S-expressions are not a format on top of or below that; they are a format
*beside* that, at the same level. Objects are serialized into byte
sequences forming S-expression atoms, and relations between objects/
atoms are serialized by the S-expression format. This is how one gets
the canonical representation of an S-expression.
Now depending on the situation one might want additional constraints on
the representation, for example human-readability or being text-based,
and the S-expression standard defines non-canonical representations
for such situations.
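For reference, the canonical representation in the S-expression standard
Natacha appears to be following (Rivest's S-expressions draft) writes each
atom as <decimal length>:<raw octets>, with lists in parentheses. A small
sketch, with procedure names of my own choosing:

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Emit the (host foo.example)/(port 80) example from this thread in
--  the canonical length-prefixed encoding.
procedure Canonical_Demo is
   procedure Put_Atom (Data : String) is
      Len : constant String := Integer'Image (Data'Length);
   begin
      --  'Image yields a leading space for non-negative values; skip it.
      Put (Len (Len'First + 1 .. Len'Last) & ':' & Data);
   end Put_Atom;
begin
   Put ("(");
   Put_Atom ("tcp-connect");
   Put ("(");  Put_Atom ("host");  Put_Atom ("foo.example");  Put (")");
   Put ("(");  Put_Atom ("port");  Put_Atom ("80");           Put (")");
   Put (")");
   New_Line;
   --  Prints: (11:tcp-connect(4:host11:foo.example)(4:port2:80))
end Canonical_Demo;
```

The length prefix is what makes atoms pure octet sequences: no quoting or
escaping is ever needed, whatever bytes the application serializes.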
> > I know very well these differences, except octet vs character,
> > especially considering Ada's definition of a Character. Or is it only
> > that Character is an enumeration while octet is a modular integer?
>
> The difference is that Character represents code points and octet does
> atomic arrays of 8 bits.
Considering that Ada's Character also spans 8 bits (a 256-element
enumeration), both are equivalent, right? The only difference is the
intent and the meaning of the values, right? (Unlike byte vs octet, where
quantitative differences might exist on some platforms.)
> > This leads to a question I had in mind since quite early in the
> > thread, should I really use an array of Storage_Element, while S-
> > expression standard considers only sequences of octets?
>
> That depends on what you are going to do. Storage_Element is a
> machine-dependent addressable memory unit. Octet is a machine-independent
> presentation-layer unit, a thing of 256 independent states. Yes,
> incidentally, Character has 256 code points.
Actually I've started to wonder whether Stream_Element might be even more
appropriate: considering that an S-expression atom is the serialization of
an object, and I guess objects which know how to serialize themselves
do so using the stream subsystem, maybe I could more easily
leverage existing serialization code if I use Stream_Element_Array
for atoms. But then I don't know whether it's possible to have objects
hand over a Stream_Element_Array representing themselves, and I don't
know either how to deal with cases where Stream_Element is not an
octet.
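One way the Stream_Element idea could look (a sketch only; the package
and function names are my invention, not an API anyone in the thread
proposed): hold atoms as Stream_Element_Array, and guard the
"Stream_Element is an octet" assumption with an assertion.

```ada
with Ada.Streams; use Ada.Streams;

--  Hypothetical atom storage built on the predefined stream types.
package Sexp_Atoms is
   --  Guard the assumption that Stream_Element is an 8-bit octet
   --  on this platform (it is modular, so 'Modulus is defined).
   pragma Assert (Stream_Element'Modulus = 256);

   subtype Atom is Stream_Element_Array;

   function To_Atom (S : String) return Atom;
end Sexp_Atoms;

package body Sexp_Atoms is
   function To_Atom (S : String) return Atom is
      Result : Atom (1 .. Stream_Element_Offset (S'Length));
   begin
      for I in S'Range loop
         Result (Stream_Element_Offset (I - S'First + 1)) :=
           Stream_Element (Character'Pos (S (I)));
      end loop;
      return Result;
   end To_Atom;
end Sexp_Atoms;
```

Conversions for other types would follow the same pattern, or reuse the
stream attributes ('Write, 'Read) against a memory-backed stream.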
> >> Once you matched "tcp-connect", you know all the types of the following
> >> components.
>
> > Unfortunately, you know "80" is a 16-bit integer only after having
> > matched "port".
>
> Nope, we certainly know that each TCP connection needs a port. There is
> nothing to resolve since the notation is not reverse. Parse it top down, it
> is simple, it is safe, it allows excellent diagnostics, it works.
Consider:
(tcp-connect (host foo.example) (port 80))
and:
(tcp-connect (port 80) (host foo.example))
Both of these are semantically equivalent, but to know which of the tail
atoms is a 16-bit integer and which is the string, you have to first
match the "port" and "host" head atoms.
Or am I misunderstanding your point?
> >>> This is not always the
> >>> case, for example it might be necessary to build an associative array
> >>> from a list of list before being able to know the type of non-head
> >>> atoms,
>
> >> What for? Even if such cases might be invented, I see no reason to do that.
> >> It is difficult to parse, it is difficult to read. So why mess with it?
>
> > For example, you might have a sub-S-expression describing a seldom
> > used object that is expensive to build, wouldn't you want to be sure
> > you actually need it before building it?
>
> See above: if you parse top down, you know whether you need that object
> before you begin. Then, having a bracketed structure, it is trivial to skip the
> object's description without construction. Just count brackets.
Well, in that example I was considering that something outside the
S-expression selects which object to use. For example a database
containing thousands of templates or whatever, with a user selection
picking only one of them.
Thanks for your patience with me,
Natacha
> On Aug 8, 5:15 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Sun, 8 Aug 2010 06:49:09 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 8, 3:01 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>>>> No I do. But you have defined it as a text file. A streamed text file is a
>>>> sequence of Character items.
>>
>>> Actually, I didn't. I only defined it as a bunch of byte sequences
>>> organized in a certain way.
>>
>> I see. This confuses things even more. Why should I represent anything as a
>> byte sequence? It already is one, and in 90% of cases I just don't care how
>> the compiler does it. Why convert byte sequences into other sequences and
>> then into a text file? It just does not make sense to me. Any conversion
>> should be accompanied by moving the thing from one medium to another.
>> Otherwise it is wasteful.
>
> Representing something as a byte sequence is serialization (at least,
> according to my (perhaps wrong) definition of serialization).
"Byte" here is a RAM unit or a disk file item? I meant the former. All
objects are already sequences of bytes in the RAM.
> S-expressions are not a format on top or below that, it's a format
> *besides* that, at the same level. Objects are serialized into byte
> sequences forming S-expression atoms, and relations between objects/
> atoms are serialized by the S-expression format. This is how one gets
> the canonical representation of an S-expression.
I thought you wanted to represent *objects* ... as S-expressions?
Where these representations are supposed to live? In the RAM? Why are you
then talking about text files, configurations and humans reading something?
I cannot remember last time I read memory dump...
> Now depending on the situation one might want additional constrains on
> the representation, for example human-readability or being text-based,
> and the S-expression standard defines non-canonical representations
> for such situations.
>
>>> I know very well these differences, except octet vs character,
>>> especially considering Ada's definition of a Character. Or is it only
>>> that Character is an enumeration while octet is a modular integer?
>>
>> The difference is that Character represents code points while an octet
>> is an atomic array of 8 bits.
>
> Considering Ada's Character also spans over 8 bits (256-element
> enumeration), both are equivalent, right?
Equivalent defined as? In terms of types they are not, because the types
are different. In terms of the relation "=" they are not either, because
"=" is not defined on the tuple Character x Unsigned_8 (or whatever).
> The only difference is the
> intent and the meaning of values, right?
Huh, there is *nothing* beyond the meaning (semantics).
>>> This leads to a question I had in mind since quite early in the
>>> thread, should I really use an array of Storage_Element, while S-
>>> expression standard considers only sequences of octets?
>>
>> That depends on what you are going to do. Storage_Element is a
>> machine-dependent addressable memory unit. Octet is a machine-independent
>> presentation-layer unit, a thing of 256 independent states. Yes,
>> incidentally, Character has 256 code points.
>
> Actually I've started to wonder whether Stream_Element might be even more
> appropriate: considering that an S-expression atom is the serialization of
> an object, and I guess objects which know how to serialize themselves
> do so using the Stream subsystem, so maybe I could more easily
> leverage existing serialization codes if I use Stream_Element_Array
> for atoms.
Note that Stream_Element is machine-dependent as well.
> But then I don't know whether it's possible to have object
> hand over a Stream_Element_Array representing themselves,
This does not make sense to me; it is mixing abstractions: a
Stream_Element_Array is a representation of an object in a stream. A
Storage_Array might be its representation in memory. These are two
different objects. You cannot consider them the same, even if they shared
physically the same memory (but they do not). The whole purpose of
serialization to a raw stream is the conversion of a Storage_Array to a
Stream_Element_Array. Deserialization is the backward conversion.
> and I don't
> know either how to deal with cases where Stream_Element is not an
> octet.
By not using Stream_Element_Array, obviously. You should use the encoding
you want. That is all.
If the encoding is for a text file, you have to read Characters; you don't
care how they land in a Stream_Element_Array, it is not your business, it
is an implementation detail of the text stream. If the encoding is about
octets, you have to read octets. You have to choose.
>>>> Once you matched "tcp-connect", you know all the types of the following
>>>> components.
>>
>>> Unfortunately, you know "80" is a 16-bit integer only after having
>>> matched "port".
>>
>> Nope, we certainly know that each TCP connection needs a port. There is
>> nothing to resolve since the notation is not reverse. Parse it top down, it
>> is simple, it is safe, it allows excellent diagnostics, it works.
>
> Consider:
> (tcp-connect (host foo.example) (port 80))
> and:
> (tcp-connect (port 80) (host foo.example))
>
> Both of these are semantically equivalent, but to know which of the tail
> atoms is a 16-bit integer and which is the string, you have to first
> match the "port" and "host" head atoms.
Sure
> Or am I misunderstanding your point?
The point is that you never meet 80 before knowing that this is a "port",
and you never meet "port" before knowing it belongs to "tcp-connect". You
always know all the types in advance. It is strictly top-down.
> Thanks for your patience with me,
You are welcome. I think from the responses of the people here you can see
that the difference between Ada and C is much deeper than begin/end instead
of curly brackets. Ada does not let you through without a clear concept of
what you are going to do. Surely with some proficiency one could write
classical C programs in Ada, messing everything up. You could even create a
buffer overflow in Ada. But it is difficult for a beginner...
> The implementation or the idea? Would you agree that objects with some
> properties of modular integers have place in Ada programs which do not
> interface C?
No. I do not like implicit "mod". But that's the essence of
modular types.
If I ran the circus, then this:
M : constant := 2**64;
type My_Unsigned is range 0..M-1;
pragma Assert(My_Unsigned'Size = 64);
X : My_Unsigned := ...;
Y : My_Unsigned := (X + 1) mod M;
would be legal, and would not raise any exceptions, even when
X = M-1. (And I'd want "(X + 1) mod M" to be implemented
as a single "add" instruction on a typical 64-bit machine.)
The above is illegal in every Ada compiler, because My_Unsigned
is a _signed_ integer type, and nobody supports that range.
Part of the reason modular types were invented is because
signed integers don't work in cases like the above.
>> Perhaps they _should_ be unordered, but I won't agree or disagree,
>> since I think in an ideal world they should be banished.
>
> I think they could be fixed.
How? And having fixed them, when would you use them?
That is, when would you prefer them over signed integers?
>> By the way, one defense of modular types I've heard is that
>> they are used in mathematics. True.
>
>> But mathematicians do
>> not use _implicit_ mod. They say things like "X = Y (mod N)",
>> which is pronounced "X is congruent to Y (modulo N)".
>> Congruent, not equal.
>
> The mathematical notation (mod N) is untyped. It applies to any natural
> numbers and what is worse you have to add it at each point of the program
> you use the type.
Writing "mod" whenever I want an expression that takes the modulus is
a GOOD thing. "+" should always do an add, and nothing else.
If I want to negate the result of "+", I should write "-".
If I want to take the modulus of the result of "+", I should
write "mod".
Look at how unsigned types are used in C. size_t is a good
example. It's used to count up the sizes of things.
If I have 1000 objects of size 10_000_000, the total
size is 1000*10_000_000 = 10_000_000_000. If that
calculation wraps around on a 32-bit machine, the
answer is just plain wrong. I'd rather get Constraint_Error.
If I interface to C's size_t, and do similar calculations on
the Ada side, wraparound is equally wrong.
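A sketch of that contrast, with invented types (a modular type wraps silently where a range-checked signed type raises Constraint_Error):

```ada
--  Sketch: modular wraparound vs. range-checked overflow (illustrative only).
with Ada.Text_IO; use Ada.Text_IO;

procedure Wrap_Demo is
   type Unsigned_16 is mod 2**16;             --  wraps silently
   type Checked_16  is range 0 .. 2**16 - 1;  --  overflow raises
   U : Unsigned_16 := Unsigned_16'Last;
   C : Checked_16  := Checked_16'Last;
begin
   U := U + 1;  --  wraps around to 0, no error
   Put_Line ("modular:" & Unsigned_16'Image (U));
   begin
      C := C + 1;  --  range check on assignment raises Constraint_Error
   exception
      when Constraint_Error => Put_Line ("signed: Constraint_Error");
   end;
end Wrap_Demo;
```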
- Bob
> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>
>> The implementation or the idea? Would you agree that objects with some
>> properties of modular integers have place in Ada programs which do not
>> interface C?
>
> No. I do not like implicit "mod". But that's the essence of
> modular types.
>
> If I ran the circus, then this:
>
> M : constant := 2**64;
> type My_Unsigned is range 0..M-1;
> pragma Assert(My_Unsigned'Size = 64);
>
> X : My_Unsigned := ...;
> Y : My_Unsigned := (X + 1) mod M;
>
> would be legal, and would not raise any exceptions, even when
> X = M-1. (And I'd want "(X + 1) mod M" to be implemented
> as a single "add" instruction on a typical 64-bit machine.)
What does "+" above return? Is "mod M" a required part of the notation or
not?
>>> Perhaps they _should_ be unordered, but I won't agree or disagree,
>>> since I think in an ideal world they should be banished.
>>
>> I think they could be fixed.
>
> How? And having fixed them, when would you use them?
> That is, when would you prefer them over signed integers?
Ring buffer indexing, flags (by-value sets actually), cryptography,
interfacing, communication.
I think that all these usages require different kinds of modular types with
different sets of operations. So I would prefer a language extension which
would allow me to construct such types rather than built-in ones, which
satisfy no one. Just one example from a huge list: why is "in" not an
operation?
if 0 /= (Mode and Alarm) then -- Isn't it awful?
why am I not allowed to have:
if Alarm in Mode then
>>> By the way, one defense of modular types I've heard is that
>>> they are used in mathematics. True.
>>
>>> But mathematicians do
>>> not use _implicit_ mod. They say things like "X = Y (mod N)",
>>> which is pronounced "X is congruent to Y (modulo N)".
>>> Congruent, not equal.
>>
>> The mathematical notation (mod N) is untyped. It applies to any natural
>> numbers and, what is worse, you have to add it at each point of the
>> program where you use the type.
>
> Writing "mod" whenever I want an expression that takes the modulus is
> a GOOD thing. "+" should always do an add, and nothing else.
Huh, modular "+" *does* add in the ring. Your "+" does not!
> If I want to negate the result of "+", I should write "-".
The additive inverse in a ring is not the one of the integers. They are
different types. It is good and helpful that Ada promotes this difference.
> If I want to take the modulus of the result of "+", I should
> write "mod".
That would be a type conversion. Bad thing. But I see no problem with that.
Give me universal integers or equivalent "2 x width" type and you will get
what you want in return:
function "+" (Left, Right : Modular) return Universal_Integer;
-- overloads standard +, if that exists
-- function "mod" (Left, Right : Universal_Integer) return Modular;
-- just to remember, it already exists
That's it.
> Look at how unsigned types are used in C. size_t is a good
> example. It's used to count up the sizes of things.
> If I have 1000 objects of size 10_000_000, the total
> size is 1000*10_000_000 = 10_000_000_000. If that
> calculation wraps around on a 32-bit machine, the
> answer is just plain wrong. I'd rather get Constraint_Error.
>
> If I interface to C's size_t, and do similar calculations on
> the Ada side, wraparound is equally wrong.
I agree. But it rather means that C is wrong in defining size_t as modular. On
the Ada side it must be Natural_32. Why does Ada not support that? Why is it
impossible to declare a 32-bit range 0..2**32-1 without wrapping (= with
overflow check)?
> I think that all these usages require different types of modular types with
> different sets of operations. So I would prefer a language extension which
> would allow me construction of such types rather than built-in ones, which
> satisfy no one. Just one example from a huge list, why "in" is not an
> operation?
>
> if 0 /= (Mode and Alarm) then -- Isn't it awful?
>
> why am I not allowed to have:
>
> if Alarm in Mode then
Is anything wrong with packed arrays of Booleans for status thingies?
>> Writing "mod" whenever I want an expression that takes the modulus is
>> a GOOD thing. "+" should always do an add, and nothing else.
>
> Huh, modular "+" *does* add in the ring. Your "+" does not!
How many programmers, among those without a degree in math or
in physics, know what a ring is?
Hence, no "+" can do the "right thing", since "+" is overloaded.
Not just in Ada, but also in popular understanding.
A definition of the "+" operator, whose meaning(s) is (are)
stipulated to be well known, seems impossible:
everyone considers its meaning(s) as clearly established as
the proper ingredients of lentil soup. Because the frames of
reference used in the definition are an obvious pick, aren't they!?
I don't remember when I last read a final report on either
the statistical meaning of "+" or on lentil soup ingredients research
(other than our bi-weekly CVE :-)
But I do remember finding collections of good recipes.
Each recipe there is named with a unique identifier, and it lists
operands (ingredients) and operations (preparation, cooking).
The essential bit: use words or phrases that say what you
mean. (By implication, express what you want others to understand.)
Obviously, "+" does not express a recipe clearly, or not without
lengthy exegesis...
Imagine a programming language that is just like Ada with
SPARK restrictions on naming. Then, throw out overloading of
predefined names, too, of "+", for example.
No more misunderstandings, then.
But will this language be usable with beginners?
> and I don't know either how to deal with cases where Stream_Element is
> not an octet.
By ignoring them, I think!
> On 09.08.10 16:38, Dmitry A. Kazakov wrote:
>
>> I think that all these usages require different types of modular types with
>> different sets of operations. So I would prefer a language extension which
>> would allow me construction of such types rather than built-in ones, which
>> satisfy no one. Just one example from a huge list, why "in" is not an
>> operation?
>>
>> if 0 /= (Mode and Alarm) then -- Isn't it awful?
>>
>> why am I not allowed to have:
>>
>> if Alarm in Mode then
>
> Is anything wrong with packed arrays of Booleans for status thingies?
1. The membership/subset tests look even worse
2. No ranges
3. No loops
4. No case statement
5. Array aggregates as literals aren't fun
6. No discriminants of
7. No array index of
Note that array must be a view of a modular type with the modulus 2**N, but
the type itself must remain scalar.
>>> Writing "mod" whenever I want an expression that takes the modulus is
>>> a GOOD thing. "+" should always do an add, and nothing else.
>>
>> Huh, modular "+" *does* add in the ring. Your "+" does not!
>
> How many programmers, among those without a degree in math or
> in physics, know what a ring is?
Why should they use the type then?
> Hence, no "+" can do the "right thing", since "+" is overloaded.
Most modular types should have no "+" at all. There should be a readable
notation for X+1 instead of the infernal T'Succ(X).
> A definition of the "+" operator, whose meaning(s) is (are)
> stipulated to be well known, seems impossible:
http://en.wikipedia.org/wiki/Abelian_group ?
> Imagine a programming language that is just like Ada with
> SPARK restrictions on naming. Then, throw out overloading of
> predefined names, too, of "+", for example.
I don't think this would be possible, but basically it is what inheritance is
all about. The concept of checking would be LSP-conformity.
> No more misunderstandings, then.
>
> But will this language be usable with beginners?
Come on, children start counting before knowing anything about groups. Most
of them die before they ever do. The language is a tool, not a library.
Are there any such cases in the real world? There are machines
in the embedded world where Storage_Element is not 8 bits.
But any machine that's hooked up to any sort of network
is going to be talking Octets, I think. Does anybody know of
counterexamples?
Perhaps Stream_Element should have been defined as exactly
8 bits. Perhaps it should have been called "Octet".
- Bob
OK. But I guess we could write inlined functions?
> 2. No ranges
> 3. No loops
> 4. No case statement
You have the index subtype at the point where you have
the array type. Why not ranges, loops, and case statements?
> 5. Array aggregates as literals aren't fun
Fun?
> 6. No discriminants of
> 7. No array index of
These refer to arrays in general, not Boolean arrays as sets and,
I guess, the item numbered 7 follows a general rule:
Where the array type is visible, its index type is
visible. (Or else, you have been too lazy and used
an anonymous index type; still, you have 'first etc.)
Therefore, the index type can be named without
indirection, that is, via some array attribute function.
(Or else, index values can be made from 'first etc.)
Array access will be type checked either way.
> Note that array must be a view of a modular type
An array of Booleans for status fields must be a view of a
modular type? I don't understand that.
> with the modulus 2**N, but
> the type itself must remain scalar.
Uh, which type must remain scalar?
>> A definition of the "+" operator, whose meaning(s) is (are)
>> stipulated to be well known, seems impossible:
>
> http://en.wikipedia.org/wiki/Abelian_group ?
That's normative ontology, not empirically justified language
design ;-)
I remember having seen introductory texts covering algebraic
structures that use a symbol other than "+" to denote the additive
operation. I think the authors have a good reason.
One reason might be that they know their audience.
This is an example of empirically justified use of operator symbols.
When good average programmers write programs, are they
structuring their data by conscious use of mathematical
structures like groups? I don't think so.
It is these programmers that need to know the meaning
of "+". This seems vital. Example: C's "definition" of the data
type int and its "+" operation demonstrates what happens
if you are optimistic about mastery of C's "+" basics.
I think Bob's example involving "not" of a mod 3 variable
shows a related issue. (And will be included in a test of advanced
Ada knowledge? ;-P) Because *even* *though* there is a precise
mathematical definition (LRM) of what "not" should return,
referring to some overly complex set of rules explaining
the operation is hardly convincing? Consistency of the language
(with itself) is only an excuse, then.
Georg
> 1. The membership/subset tests look even worse
We're talking about bit maps now, right?
Membership is "if Flags(X)...". Subset is
"if Is_Subset(These_Flags, Those_Flags)...".
The body of Is_Subset says, "(X and Y) = X".
Not so horrible.
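A self-contained sketch of that, with invented flag names:

```ada
--  Sketch of the Is_Subset test on a packed array of Booleans
--  (the Flag enumeration is invented for illustration).
procedure Subset_Demo is
   type Flag is (Alarm, Warning, Info);
   type Flag_Set is array (Flag) of Boolean;
   pragma Pack (Flag_Set);

   function Is_Subset (X, Y : Flag_Set) return Boolean is
   begin
      return (X and Y) = X;  --  intersecting with Y changes nothing iff X is in Y
   end Is_Subset;

   These : constant Flag_Set := (Alarm => True, others => False);
   Those : constant Flag_Set := (Alarm | Warning => True, others => False);
begin
   pragma Assert (Is_Subset (These, Those));
   pragma Assert (not Is_Subset (Those, These));
end Subset_Demo;
```

"and" on one-dimensional Boolean arrays is predefined in Ada, so the body really is that one line.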
> 2. No ranges
> 3. No loops
I don't know what you want ranges and loops for.
If you really want these, you can Unchecked_Convert to
integer (signed range 0..2**something). But only if
something is small.
> 4. No case statement
Yeah, that's a shame. I'd like to have case statements
on bit maps, but also on other composite types.
> 5. Array aggregates as literals aren't fun
(This | That => True, The_Other => False)
I agree that's ugly, but it beats 2#101#.
> 6. No discriminants of
I'm not sure why you'd want discriminants of bit maps,
but I agree that discriminants should be more general.
> 7. No array index of
Unchecked_Conversion again.
> Note that array must be a view of a modular type with the modulus 2**N, but
> the type itself must remain scalar.
All Ada compilers support:
type Index is range 0..100;
type Bit_Map is array (Index) of Boolean;
And this:
type Bit_Map is array (Character) of Boolean;
But no Ada compilers support this:
type Bit_Map is mod 2**100;
> Most of modular types should have no + at all. There should be a readable
> notation for X+1 instead of infernal T'Succ(X)
What notation would you like?
How about:
function Succ (...) return ... renames T'Succ;
?
- Bob
> Is anything wrong with packed arrays of Booleans for status thingies?
No.
In fact, arrays of Booleans (packed or not) are better than modular
types in that they can have an arbitrary number of bits.
If you have 50 flags, and use a modular type, that's not
portable -- it works in GNAT, but not all Ada compilers.
And then if you change your mind and want 70 flags,
you have to rewrite all the code.
And if packed, they should be equally efficient.
- Bob
> On Mon, 09 Aug 2010 09:48:02 -0400, Robert A Duff wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>>
>>> The implementation or the idea? Would you agree that objects with some
>>> properties of modular integers have place in Ada programs which do not
>>> interface C?
>>
>> No. I do not like implicit "mod". But that's the essence of
>> modular types.
>>
>> If I ran the circus, then this:
>>
>> M : constant := 2**64;
>> type My_Unsigned is range 0..M-1;
>> pragma Assert(My_Unsigned'Size = 64);
>>
>> X : My_Unsigned := ...;
>> Y : My_Unsigned := (X + 1) mod M;
>>
>> would be legal, and would not raise any exceptions, even when
>> X = M-1. (And I'd want "(X + 1) mod M" to be implemented
>> as a single "add" instruction on a typical 64-bit machine.)
>
> What does "+" above return?
A value of type My_Unsigned. The values of this type are
the infinite set of all integers -- that's already the
case in Ada, except that the RM allows (not requires)
certain calculations to overflow and raise C_E.
>...Is "mod M" a required part of the notation or
> not?
No, not required -- my idea is that if you want to
perform some operation ("mod" or "+" or "*" or anything),
you write that explicitly in the code. No change there
from the way signed integers already work.
>>>> Perhaps they _should_ be unordered, but I won't agree or disagree,
>>>> since I think in an ideal world they should be banished.
>>>
>>> I think they could be fixed.
>>
>> How? And having fixed them, when would you use them?
>> That is, when would you prefer them over signed integers?
>
> Ring buffer indexing,
I prefer an explicit "mod" there.
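For instance, ring-buffer advance with the "mod" written out (a sketch with invented names):

```ada
--  Sketch: explicit "mod" for ring-buffer indexing (names invented).
procedure Ring_Demo is
   Capacity : constant := 8;
   type Slot is range 0 .. Capacity - 1;
   Head : Slot := Slot'Last;
begin
   --  The intermediate Head + 1 is computed in the base type,
   --  so no overflow; the mod wraps it back into range explicitly.
   Head := (Head + 1) mod Capacity;
   pragma Assert (Head = 0);
end Ring_Demo;
```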
>...flags (by-value sets actually),
I prefer array-of-Boolean over modular. Even better would
be a proper user-defined abstraction, with notations like "in".
>...cryptography,
Explicit "mod".
> interfacing, communication.
For these, you don't want modular semantics -- you just want
a data type whose representation matches what you're
interfacing/communicating with, such as "type Octet is
range 0..2**8-1;"
> I think that all these usages require different types of modular types with
> different sets of operations. So I would prefer a language extension which
> would allow me construction of such types rather than built-in ones, which
> satisfy no one.
I certainly agree with that. But it sounds like you are
agreeing with my point -- that modular types should not
be in the language.
> I agree. But it rather means that C is wrong in defining size_t as modular. On
> the Ada side it must be Natural_32. Why does Ada not support that? Why is it
> impossible to declare a 32-bit range 0..2**32-1 without wrapping (= with
> overflow check)?
Good question.
- Bob
>> 7. No array index of
>
> These refer to arrays in general, not Boolean arrays as sets and,
> I guess, the item numbered 7 follows a general rule:
> Where the array type is visible, its index type is
> visible. (Or else, you have been too lazy and used
> an anonymous index type; still, you have 'first etc.)
> Therefore, the index type can be named without
> indirection, that is, via some array attribute function.
> (Or else, index values can be made from 'first etc.)
> Array access will be type checked either way.
I think Dmitry was complaining in 7 that you can't do this:
type Bit_Map is array(...) of Boolean;
type T is array (Bit_Map) of Something; -- Illegal index type.
whereas if Bit_Map is modular then you CAN do that.
You can, but only if the number of bits is small.
- Bob
I remember I wasn't that happy with C++ overloading, especially
operator overloading. I wrote a 3D vector library only to find out
operator overloading was more obscuring than improving readability,
because too much was happening behind the scenes. So I came back to a
math-assembly-looking sequence of function calls, uglier but easier to
follow and maintain.
> >...learning both Ada and AWS at the same time will
> > probably be way too much.
>
> Well, I'm not sure. To learn a programming language, you need to
> write programs, and also read programs. Jeff's point was that
> reading AWS will help you learn Ada, and that's probably true.
For some strange reason it didn't even occur to me that the advice
included reading AWS source and not only using it as a library. Still,
AWS looks like overkill for my web needs.
> By the way, I don't see anything fundamentally wrong with
> your s-expression plans, with s-expressions containing
> raw arrays-of-octets
Thanks for the support, that's what I'm about to do, except I'm still
unsure about what array type to use: range octet, modular octet,
Storage_Element, Stream_Element, something else…
> -- especially since you say you
> already have a lot of stuff using s-exprs. You can layer on
> top of that less-general but more-typed abstractions, probably
> using generics.
Actually I wasn't aware until very recently that the code I usually
put on top of my S-expression parser is indeed called a parser too. So
yeah, basically my S-expression library will be the common core of all
my specialized configuration/template/etc file parsers.
Natacha
> Thanks for the support, that's what I'm about to do, except I'm still
> unsure about what array type to use: range octet, modular octet,
> Storage_Element, Stream_Element, something else
The truth (after all this discussion) is that it doesn't
matter a whole lot.
- Bob
Technically, these are implementation decisions made by an earlier me:
the web server project is intended to be a replacement for an existing
web server, so either I use the same input files, or I write as a part
of the project a converter from the existing format into the new
server data. In both cases, I do have these S-expressions based files,
and I do want to serve HTTP responses based (directly or indirectly)
on them.
> I wasn't saying that implementing an S-expression library, or deciding to use it
> at a low level in your project, was a bad thing. I was viewing this project as a
> separate thing from your S-expression library.
I'm actually viewing it that way too: whatever project I do next,
it will almost surely rely on some existing S-expression files, so it
makes sense to start with general code dealing with S-expressions
before making the final choice about what to do next.
> > Unless I'm very misunderstanding and/or very naive, I've got the
> > feeling our approaches are not that different, and differ mostly in
> > the order of implementation, which indeed doesn't change that much in
> > the final result.
>
> You may well be right, in which case you'll likely find Ada to be a good fit for
> your way of approaching things.
That's what I think after reading most posts from this thread, the
exceptions being mostly Dmitry's posts, which often make me feel
Ada's world is a strange place (to my mind) where I don't belong. I
guess I'll have to dive deeper into Ada before knowing who's right.
Thanks for your advice,
Natacha
Actually past a certain level, they don't write the mod explicitly anymore.
I don't remember hearing "modulo" much in Galois field classes. And
neither when dealing with stuff like an n-bit integer representing a
member of Z/2Z [X] quotiented by an n+1 degree polynomial (Reed-Solomon
stuff). And then you're overloading "+" with something very xor-like…
Hoping I'm not disrupting anything,
Natacha
>>> 7. No array index of
> I think Dmitry was complaining in 7 that you can't do this:
>
> type Bit_Map is array(...) of Boolean;
> type T is array (Bit_Map) of Something; -- Illegal index type.
>
> whereas if Bit_Map is modular then you CAN do that.
Always takes a while until my slow head grasps Dmitry's
non-Ada semantics.
case {T0 x T1 x T3} is
when ... => ...
Static dispatch tables on composite types' values
switching to non-subprogram sequences of instructions
wrapped in a case statement that's likely to have
impressive combinatorial properties...
Is there no better way than this?
Experimenting further, what will be a useful type of
case_statement_alternative's discrete choice list, then?
One that can be explored programmatically?
Georg
In C for such things I usually use a struct bitfield. I guess the Ada
equivalent would be a record of Booleans, with a pragma or something
to have them packed together. Would there be anything wrong with that?
It seems to me more readable than the above IFs, though from what I
understood from the rest of the thread it would make CASE on multiple
bits difficult, and it won't allow slicing (which I'm not sure is a bad
thing, readability-wise). Is there anything else?
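For reference, the record-of-Booleans version might be sketched like this (field names invented, pinned to bits with a representation clause rather than a pragma):

```ada
--  Sketch: the C bitfield equivalent as an Ada record with a
--  representation clause (all names invented for illustration).
package Bit_Fields is
   type Modes is record
      Alarm   : Boolean;
      Audible : Boolean;
      Logged  : Boolean;
   end record;

   for Modes use record
      Alarm   at 0 range 0 .. 0;  --  each flag occupies exactly one bit
      Audible at 0 range 1 .. 1;
      Logged  at 0 range 2 .. 2;
   end record;
   for Modes'Size use 8;
end Bit_Fields;
```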
Natacha
That's something I'm quite uncomfortable with. However I see the point
of the other related posts here. So my question is, if I assume
Stream_Element is an octet, is there a way to make compilation fail
with an explicit error message when the assumption doesn't hold? C's
preprocessor allows this through a #if on CHAR_BITS and a #error to
stop compilation with a custom message. Any Ada equivalent?
Thanks for your help,
Natacha
I pursued that idea a little further and actually wrote an embryonic
S-Expression library. I'm not entirely satisfied with it because it
uses copy semantics instead of reference semantics and so is probably
inefficient. But it does demonstrate how to read and write
S-Expressions on a stream (e.g. a file). Also, it shows how to
hex-encode and hex-decode blobs, which I've modelled as Storage_Arrays.
You can browse the sources here:
http://green.ada-france.org:8081/branch/changes/org.ludovic-brenta.s_expressions
Enjoy. (the license is the GPLv3 or later).
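For the curious, hex-encoding a Storage_Array can be sketched roughly like this (illustrative only, not Ludovic's actual code, and it assumes Storage_Element holds octet values):

```ada
--  Sketch: hex-encode a blob modelled as a Storage_Array.
--  Assumes Storage_Element values fit in 0 .. 255.
with System.Storage_Elements; use System.Storage_Elements;

function To_Hex (Blob : Storage_Array) return String is
   Map    : constant String := "0123456789abcdef";
   Result : String (1 .. 2 * Blob'Length);
   J      : Positive := 1;
begin
   for I in Blob'Range loop
      Result (J)     := Map (1 + Natural (Blob (I) / 16));   --  high nibble
      Result (J + 1) := Map (1 + Natural (Blob (I) mod 16)); --  low nibble
      J := J + 2;
   end loop;
   return Result;
end To_Hex;
```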
--
Ludovic Brenta.
It seems to me there are 2 orthogonal issues: signed or unsigned representation,
and overflow checking on or off. In this case, we'd like an unsigned
representation (because we don't need negative values but do need as large a
range as possible) with overflow checking on (we almost always want overflow
checking on).
If integer types had been approached this way, perhaps "wrap around" would only
happen on the base type of types without overflow checking.
--
Jeff Carter
"Run away! Run away!"
Monty Python and the Holy Grail
Typically, I would address it using:
type Mode_ID is (Alarm, ...);
type Mode_Set is array (Mode_ID) of Boolean;
for Mode_Set'Component_Size use 1;
Mode : Mode_Set;
if Mode (Alarm) then ...
Of course, I'd usually leave off the 'Component_Size definition; memory is
usually cheap and plentiful these days. Only if I had to match an existing
bit-packed representation, or Mode_ID was very large [array (Integer) of
Boolean, anyone?] would I worry about the representation.
> That's something I'm quite uncomfortable with. However I see the point
> of the other related posts here. So my question is, if I assume
> Stream_Element is an octet, is there a way to make compilation fail
> with an explicit error message when the assumption doesn't hold? C's
> preprocessor allows this through a #if on CHAR_BITS and a #error to
> stop compilation with a custom message. Any Ada equivalent?
Put:
pragma Assert(Stream_Element'Modulus = 2**8);
pragma Assert(Stream_Element'Size = 8);
in any library package that depends on this fact.
This will fail at run time if it's wrong. But it will fail on
EVERY execution of the program, so it's pretty-much just as
good as a compile-time error. Also, if you're lucky,
the compiler will warn you at compile time (and you can
use warnings-as-errors mode if you like).
You might also want to look up the GNAT-specific Compile_Time_Error
and Compile_Time_Warning pragmas.
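A sketch of the GNAT-specific compile-time variant (the package name is invented):

```ada
--  Sketch: GNAT's Compile_Time_Error pragma makes the octet
--  assumption fail at compile time instead of at run time.
with Ada.Streams; use Ada.Streams;

package Octet_Check is
   pragma Compile_Time_Error
     (Stream_Element'Size /= 8,
      "this library assumes Stream_Element is an octet");
end Octet_Check;
```

Note this pragma is GNAT-specific, so it trades portability of the check for earlier diagnosis.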
- Bob
> Always takes a while until my slow head grasps Dmitry's
> non-Ada semantics.
;-)
Dmitry has many "interesting" ideas about what Ada ought to be like,
some (many?) of which I agree with. But it's important to keep
straight the difference between "Ada as it is" and "Ada as Dmitry
(or Bob, or ...) would like it to be". ;-)
> case {T0 x T1 x T3} is
> when ... => ...
>
> Static dispatch tables on composite types' values
> switching to non-subprogram sequences of instructions
> wrapped in a case statement that's likely to have
> impressive combinatorial properties...
I can think of several reasonable ways to implement such a
feature. It would, of course, require a rather massive
change to what "static expression" means.
You can already do "case" on a 64-bit integer,
so the compiler already has to deal with cases where
a simple jump table is infeasible.
> Is there no better way than this?
>
> Experimenting further, what will be a useful type of
> case_statement_alternative's discrete choice list, then?
> One that can be explored programmatically?
Any non-limited type, I guess.
- Bob
> On Aug 8, 11:08 pm, Robert A Duff <bobd...@shell01.TheWorld.com>
> wrote:
>> "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> writes:
>> By the way, one defense of modular types I've heard is that
>> they are used in mathematics. True. But mathematicians do
>> not use _implicit_ mod. They say things like "X = Y (mod N)",
>> which is pronounced "X is congruent to Y (modulo N)".
>> Congruent, not equal.
>
> Actually past a certain level, they don't explicit mod anymore.
Interesting. I never reached that level of math!
>... I
> don't remember hearing many "modulo" in Galois field classes. And
> neither when dealing with stuff like an n-bit integer representing a
> member of Z/2Z [X] quotiented by a n+1 degree polynomial (Reed-Solomon
> stuff). And then you're overloading "+" with something very xor-like
- Bob
> On 08/09/2010 11:32 AM, Natacha Kerensikova wrote:
>>
>> In C for such things I usually use a struct bitfield. I guess the Ada
>> equivalent would be a record of Booleans, with a pragma or something
>> to have them packed together. Would there be anything wrong with that?
>> It seems to me more readable than the above IFs, though from what I
>> understood from the rest of the thread it would make CASE on multiple
>> bit difficult, and it won't allow slicing (which I'm not sure is a bad
>> thing, readability-wise). Is there anything else?
>
> Typically, I would address it using:
>
> type Mode_ID is (Alarm, ...);
>
> type Mode_Set is array (Mode_ID) of Boolean;
> for Mode_Set'Component_Size use 1;
>
> Mode : Mode_Set;
>
> if Mode (Alarm) then ...
>
> Of course, I'd usually leave off the 'Component_Size definition; memory
> is usually cheap and plentiful these days. Only if I had to match an
> existing bit-packed representation, or Mode_ID was very large [array
> (Integer) of Boolean, anyone?] would I worry about the representation.
Well, if you're doing "and"s and "or"s (i.e. intersections and unions),
then the packed representation is likely more efficient. Memory is
cheap in terms of money, but in terms of efficiency, using more
memory can make your program run slower (e.g. worse cache behavior).
I'd use "pragma Pack" instead of the rep clause in this case.
It's guaranteed to do the same thing, but the pragma has more
of a flavor of "for efficiency reasons". I'd use the Component_Size
clause if I were interfacing to some existing representation,
i.e., "for correctness reasons".
- Bob
> On Aug 9, 5:14 pm, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
> wrote:
>> On 09.08.10 16:38, Dmitry A. Kazakov wrote:
>>
>>> Just one example from a huge list, why "in" is not an operation?
>>
>>> if 0 /= (Mode and Alarm) then -- Isn't it awful?
>>
>>> why am I not allowed to have:
>>
>>> if Alarm in Mode then
>>
>> Is anything wrong with packed arrays of Booleans for status thingies?
I define a function "/" that takes the types of Mode and Alarm and returns
Boolean, so I can say things like:
if Alarm/Mode then ...
Or (real code)
if INS.jump_target/starts_a_code_block then ...
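A self-contained sketch of that trick, with invented types:

```ada
--  Sketch: a user-defined "/" as a membership test on a flag set
--  (Flag and Flag_Set are invented for illustration).
procedure Slash_Demo is
   type Flag is (Alarm, Warning);
   type Flag_Set is array (Flag) of Boolean;

   function "/" (F : Flag; S : Flag_Set) return Boolean is
   begin
      return S (F);  --  "F / S" reads as "F is in S"
   end "/";

   Mode : constant Flag_Set := (Alarm => True, Warning => False);
begin
   if Alarm / Mode then
      null;  --  handle the alarm
   end if;
end Slash_Demo;
```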
--
Bill Findlay
<surname><forename> chez blueyonder.co.uk
> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>
>> 1. The membership/subset tests look even worse
>
> We're talking about bit maps now, right?
Yes
> Membership is "if Flags(X)...". Subset is
> "if Is_Subset(These_Flags, Those_Flags)...".
> The body of Is_Subset says, "(X and Y) = X".
>
> Not so horrible.
But a lot of manual work. The type should just do what you expect it to do.
>> 2. No ranges
>> 3. No loops
>
> I don't know what you want ranges and loops for.
A subset of flags. E.g. Critical_Errors, Warnings, Any_Error.
> If you really want these, you can Unchecked_Convert to
> integer (signed range 0..2**something). But only if
> something is small.
>
>> 4. No case statement
>
> Yeah, that's a shame. I'd like to have case statements
> on bit maps, but also on other composite types.
Yes, it is a type property that the domain is a set of identifiable values,
in fact, some mapping to an enumeration.
The language should provide means to implement this interface with any
type. E.g. why can I not specify this interface for String, to be able to
write:
case Get_Token is
when "type" => ...
when "package" => ...
when others => raise Syntax_Error;
end case;
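In Ada as it is, that dispatch has to be written with if/elsif comparisons, sketched here (Get_Token is a stand-in):

```ada
--  Sketch: what current Ada requires instead of a case on String.
--  Get_Token is a hypothetical stand-in for a real lexer.
procedure Dispatch is
   function Get_Token return String is
   begin
      return "type";
   end Get_Token;

   Token : constant String := Get_Token;
begin
   if Token = "type" then
      null;  --  handle type declarations
   elsif Token = "package" then
      null;  --  handle package declarations
   else
      raise Constraint_Error;  --  stand-in for a Syntax_Error exception
   end if;
end Dispatch;
```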
>> 6. No discriminants of
>
> I'm not sure why you'd want discriminants of bit maps,
> but I agree that discriminants should be more general.
type Message (Urgency : Alarm_Type) is ...
>> Note that array must be a view of a modular type with the modulus 2**N, but
>> the type itself must remain scalar.
>
> All Ada compilers support:
>
> type Index is range 0..100;
> type Bit_Map is array (Index) of Boolean;
>
> And this:
>
> type Bit_Map is array (Character) of Boolean;
>
> But no Ada compilers support this:
>
> type Bit_Map is mod 2**100;
Why not? It is not that difficult to implement mod 2**100 or range
-2**1_000_000..2**1_000_000+1. I think the standard should require it on
any hardware.
>> Most modular types should have no "+" at all. There should be a readable
>> notation for X+1 instead of the infernal T'Succ(X).
>
> What notation would you like?
>
> How about:
>
> function Succ (...) return ... renames T'Succ;
It should be something more like an operator. The problem is that generics
should understand it without it being passed around as an extra parameter.
Interesting, though it seems your code won't like strings containing a
double-quote.
However, the principal conclusion I draw from reading this is that I'm
still far from having the Ada level required to write anything like
that (since I can't even understand everything while reading it).
Still, from what I understood, I'm not fond of the idea of having to
write type-to-atom conversion procedures when they already exist.
It occurred to me that most types already know how to serialize
themselves, through the Stream subsystem. Strings, Unbounded_Strings,
Integers, Floats, etc., can all be read from and written into a stream.
So the code to convert them into a file-suitable Stream_Element_Array
exists. Wouldn't it be a pity not to reuse it?
Pursuing this idea leads to a library quite different from what I had
in mind when I started this thread. Let's call it Sexp_Stream, which
would be derived from Root_Stream_Type. Then Read and Write procedures
are called (from what I understand) by type'Read and type'Write after
said type has turned itself into a Stream_Element_Array. That would
cover atom I/O, and some procedures must be added to cover the list I/O
part.
I think I can imagine how the writing part would happen:
after having initialized a Sexp_Stream object with various
information, including an underlying stream where the S-expression
would be actually written, the application would use the Write
procedure to append an object as an atom to the current node, and the
newly appended atom would become the current node. The list procedures
would be called something like List_Open, to open a new list and
append following nodes into it, and List_Close, to close the current
list and come back to the previous list.
Using my tcp-connect example, it would look like this (please excuse
Ada syntax error and focus on the idea):
Sexp_Stream.Open(SxStream, underlying_stream, ...);
Sexp_Stream.Open_List(SxStream);
String'Write(SxStream, "tcp-connect");
Sexp_Stream.Open_List(SxStream);
String'Write(SxStream, "host");
String'Write(SxStream, Hostname);
Sexp_Stream.Close_List(SxStream);
Sexp_Stream.Open_List(SxStream);
String'Write(SxStream, "port");
Integer'Write(SxStream, PortNumber);
Sexp_Stream.Close_List(SxStream);
Sexp_Stream.Close_List(SxStream);
Now I have to admit I don't know how to specify the reading part of
such a Sexp_Stream. I guess I would need the notion of a current node,
with a function telling the application whether the current node is a
list or an atom, type'Read converting the current atom into said type
(and raising an exception when the current atom is a list), and some
procedures to get to the next node, to the first node following the
current list (i.e. up one level), and when the current node is a list
to go to the first node of the list (i.e. one level deeper).
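Guessing at a shape for that reading interface (names invented for
illustration, only one possible design):

```ada
type Node_Kind is (Atom_Node, List_Node);

--  Tell the application what the current node is.
function Current_Kind (Stream : Sexp_Stream) return Node_Kind;

--  Cursor movement over the S-expression structure.
procedure Next  (Stream : in out Sexp_Stream);  --  next sibling
procedure Enter (Stream : in out Sexp_Stream);  --  first node of current list
procedure Leave (Stream : in out Sexp_Stream);  --  first node after current list

--  Atom contents would then be read with the usual attributes,
--  e.g. String'Input (Stream'Access), raising an exception when
--  the current node is a list.
```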
However such a library wouldn't cover all my S-expression needs,
because I sometimes need to keep S-expressions into memory, so there
would be another package for in-memory S-expressions, which would have
to interact nicely with Sexp_Stream.
So, how does it sound? Is it totally crazy? Is it totally not-Ada? Is
there something right in there?
For Jeffry Carter (and anybody interested in helping me understand the
Ada way): here is what it looks like when I haven't thought much about
it. Notice that all this is completely from the point of view of the
application using the package, with very few implementation choices.
If I knew Ada, wouldn't these explanations be pretty much the contents
of the specification file, with about everything from the body being
still to invent? How Ada-ish is that thought process?
Thanks in advance for your reviews of my ideas,
Natacha
> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>
>> On Mon, 09 Aug 2010 09:48:02 -0400, Robert A Duff wrote:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>>>
>>>> The implementation or the idea? Would you agree that objects with some
>>>> properties of modular integers have place in Ada programs which do not
>>>> interface C?
>>>
>>> No. I do not like implicit "mod". But that's the essence of
>>> modular types.
>>>
>>> If I ran the circus, then this:
>>>
>>> M : constant := 2**64;
>>> type My_Unsigned is range 0..M-1;
>>> pragma Assert(My_Unsigned'Size = 64);
>>>
>>> X : My_Unsigned := ...;
>>> Y : My_Unsigned := (X + 1) mod M;
>>>
>>> would be legal, and would not raise any exceptions, even when
>>> X = M-1. (And I'd want "(X + 1) mod M" to be implemented
>>> as a single "add" instruction on a typical 64-bit machine.)
>>
>> What does "+" above return?
>
> A value of type My_Unsigned. The values of this type are
> the infinite set of all integers -- that's already the
> case in Ada, except that the RM allows (not requires)
> certain calculations to overflow and raise C_E.
OK, but it is not modular, it is natural. And it is just a different case
where modular types should not be used in the first place, like size_t.
Equivalent Ada types like Storage_Offset are not modular.
>>...Is "mod M" a required part of the notation or
>> not?
>
> No, not required -- my idea is that if you want to
> perform some operation ("mod" or "+" or "*" or anything),
> you write that explicitly in the code. No change there
> from the way signed integers already work.
Yes, but not for modular types! (:-))
>>>>> Perhaps they _should_ be unordered, but I won't agree or disagree,
>>>>> since I think in an ideal world they should be banished.
>>>>
>>>> I think they could be fixed.
>>>
>>> How? And having fixed them, when would you use them?
>>> That is, when would you prefer them over signed integers?
>>
>> Ring buffer indexing,
>
> I prefer an explicit "mod" there.
This will be a source of errors, like forgetting mod or specifying the
wrong modulus. The idea of modular types is to prevent such errors.
>>...flags (by-value sets actually),
>
> I prefer array-of-Boolean over modular. Even better would
> be a proper user-defined abstraction, with notations like "in".
Agree
>>...cryptography,
>
> Explicit "mod".
Disagree.
>> interfacing, communication.
>
> For these, you don't want modular semantics -- you just want
> a data type whose representation matches what you're
> interfacing/communicating with, such as "type Octet is
> range 0..2**8-1;"
The problem is that these things require both array-of-Boolean view and
arithmetic. I agree that when arithmetic is used, then it has to be wide.
E.g. when interpreting sequences of octets as little/big endian numbers, we
never use modular arithmetic. But integer arithmetic is incompatible with
the array/set view. I.e. once you have added something, it is an integer of
another type:
function "+" (Left, Right : Octet) return Universal_Integer;
>> I think that all these usages require different types of modular types with
>> different sets of operations. So I would prefer a language extension which
>> would allow me construction of such types rather than built-in ones, which
>> satisfy no one.
>
> I certainly agree with that. But it sounds like you are
> agreeing with my point -- that modular types should not
> be in the language.
Yes, but what I want will never become Ada, so I prefer that minimum I can
get. (:-))
> When good average programmers write programs, are they
> structuring their data by conscious use of mathematical
> structures like groups? I don't think so.
They do it unconsciously. It does not matter whether we do it willingly or
not; there is only one right way.
> It is these programmers that need to know the meaning
> of "+". This seems vital. Example: C's "definition" of the data
> type int and its "+" operation demonstrates what happens
> if you are optimistic about mastery of C's "+" basics.
I don't see your point. It is broken in C because it contradicts the
mathematical meaning of "+". So what do you propose? To keep it broken?
Because people understand broken stuff better? They don't.
> I think Bob's example involving "not" of a mod 3 variable
> shows a related issue. (And will be included in a test of advanced
> Ada knowledge? ;-P) Because *even* *though* there is a precise
> mathematical definition (LRM) of what "not" should return,
> referring to some overly complex set of rules explaining
> the operation is hardly convincing? Consistency of the language
> (with itself) is only an excuse, then.
Nope, "not" is just broken for mod 3. It is a lattice operation, it has
nothing to do with rings.
The problem with modular types is that one tried to build a submarine which
could fly. It sails like an aircraft and flies like a ship.
> OK, but it is not modular, it is natural. And it is just a different case
> where modular types should not be used in the first place, like size_t.
Right, the whole point of this part of the discussion is: I was
trying to show how Ada's signed integer types could be
improved to cover cases where Ada's modular types are commonly
used.
> This will be a source of errors, like forgetting mod or specifying the
> wrong modulus. The idea of modular types is to prevent such errors.
I see that point, which was also made by others. I at least
half-agree with it.
> Yes, but what I want will never become Ada, so I prefer that minimum I can
> get. (:-))
;-)
- Bob
>>> 2. No ranges
>>> 3. No loops
>>
>> I don't know what you want ranges and loops for.
>
> A subset of flags. E.g. Critical_Errors, Warnings, Any_Error.
I don't understand this part. A subrange of a modular type
used as a bit map is not a coherent subset of the flags.
> Yes, it is a type property that the domain is a set of identifiable values,
> in fact, some mapping to an enumeration.
>
> The language should provide means to implement this interface with any
> type. E.g. why I cannot specify this interface for String to be able to
> write:
>
> case Get_Token is
> when "type" => ...
> when "package" => ...
> when others => raise Syntax_Error;
> end case;
It would be nice.
>> But no Ada compilers support this:
>>
>> type Bit_Map is mod 2**100;
>
> Why not? It is not that difficult to implement mod 2**100 or range
> -2**1_000_000..2**1_000_000+1. I think the standard should require it on
> any hardware.
I agree. But you and I are in a small minority here!
The frustrating thing to me is that every Ada compiler must support
arbitrary-range integer arithmetic at compile time, but can't make
that available in a portable way at run time.
- Bob
>> Yeah, that's a shame. I'd like to have case statements
>> on bit maps, but also on other composite types.
>
> Yes, it is a type property that the domain is a set of identifiable values,
> in fact, some mapping to an enumeration.
>
> The language should provide means to implement this interface with any
> type. E.g. why I cannot specify this interface for String to be able to
> write:
>
> case Get_Token is
> when "type" => ...
> when "package" => ...
> when others => raise Syntax_Error;
> end case;
I think that you want pattern matching
(http://en.wikipedia.org/wiki/Standard_ML#Algebraic_datatypes_and_pattern_matching)
With the OCaml syntax, you can do
match expression with
| (Foo bar, [Int 4; u]) -> <some code>
| _ -> <catchall code>
and the engine will attempt unification (
http://en.wikipedia.org/wiki/Unification_%28computing%29) between
expression and (Foo bar, [Int 4; u]), and if it succeeds, bar and u will
be variables (immutable in caml) of the right value and right type so as
to "fill in the blanks". (_ is a special-purpose variable whose content
is discarded, because in the catch-all code we don't need a new name for
expression -- here _ will match the whole expression of course).
It is a pretty powerful thing, and one of the reasons I keep using
functional languages (I don't know if Lisp does it, though). Structured
pattern matching begins to kick ass when you use big typed trees, like
for instance ASTs.
I don't know enough Ada to even imagine how you would implement such a
thing in Ada, if possible, or whether it already exists (but I gather not,
from what you say).
(Sorry to intrude in an Ada newsgroup, I was just passing by)
_FrnchFrgg_
(There is one right way per person. Some agree on the right way.
Even those who believe(!) in the one right way don't know(!)
for sure ... ;-)
> ["+"] is broken in C because it contradicts to the
> mathematical meaning of.
The mathematical meaning of "+" is ambiguous or context dependent.
It depends on the definitions of "+" etc.
Programmers usually pick one meaning of "+", which then turns out
to be the traditional mathematical preconception. There is no way
to fix preconceptions other than through shock, pain, and learning.
The thing to be unlearned has a name: "+".
> So what do you propose? To keep it broken? Because
> people better understand broken stuff? They don't.
Use operator symbols that clearly separate:
- traditional mathematical addition
- computer addition
If you don't keep the concepts apart visually, programmers see "+",
see something seemingly familiar, and will think that "+" means "+".
But "+" doesn't mean "+", and this causes all kinds of problems.
Expensive problems.
Even more so when "+" is considered magic and is thus admired.
C does use "+" in all sorts of places, overloading "+" for
arithmetic and pointers. You can't do anything substantial in
C without "+", explicitly or implicitly.
int f() {}
int (*g)() = f + 4;
int main()
{
return g();
}
I'm sure someone familiar with C will find it easy to explain
why this is a harmless program in most implementations of C...
I believe APL's intent shows a way to rectify the imprecise
operator names issue.
Georg
Yes. And arbitrary-range rational arithmetic. Every compiler that supports the
container library has a hash table and some sort of searchable structure
(balanced tree, skip list) implementation. None of these are portably available
to developers. This is a mistake, IMO.
> I don't know enough ada to even imagine how you would implement such a
> thing in Ada, if possible, or whether it already exists (but I gather no
> from what you say).
For the basic types like discrete types, value matching
is fully supported and required.
type Color is (Red, Green, Blue);
C : Color;
case C is
when Red => Add_Green;
when Green => Add_Blue;
end case;
7. case C is
|
>>> missing case value: "BLUE"
Some of the structure matching can, I think, be handled
by choosing meaningful type hierarchies (Tree, Leaf and
the like) and use O-O overridings (for Tree and Leaf,
respectively).
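As a sketch of that last idea (nothing here comes from a real library;
all names are made up): instead of matching on the shape of a node,
each variant in the hierarchy overrides a primitive operation, and a
dispatching call picks the right one.

```ada
package Trees is

   type Node is abstract tagged null record;
   type Node_Access is access all Node'Class;

   --  Dispatching on the tag replaces the structural match.
   procedure Evaluate (N : Node) is abstract;

   type Leaf is new Node with record
      Value : Integer;
   end record;
   overriding procedure Evaluate (N : Leaf);

   type Tree is new Node with record
      Left, Right : Node_Access;
   end record;
   overriding procedure Evaluate (N : Tree);

end Trees;
```

A call Evaluate (X.all), with X : Node_Access, then selects at run time
what the pattern match would select syntactically, minus the convenient
binding of sub-components.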
Georg
Right; that's a minor detail and easy to correct by e.g. preceding
embedded double-quotes with a backslash.
> However, the principal conclusion I draw from reading this is that I'm
> still far from having the Ada level required to write anything like
> that (since I can't even understand everything while reading it).
I think I could simplify the library with some additional thought. What
you see is but a first iteration :)
That's exactly what my library is for: it lets you build S-Expressions
in memory and serialize them to a stream (adding any parentheses or
quoting necessary along the way) and, conversely, read from a stream and
build in-memory S-Expressions. At this level, the value of any atom is
an unbounded_string; the semantics are left to the application.
The To_Atom and From_Atom functions are there to help you build a
high-level layer on top of the S-Expression layer. The "test.adb"
program contains such a higher layer: it defines a (simple) record type
containing an integer and a boolean (enumerated type) and encodes it
into an S-Expression containing strings, and back.
I'll add a generic to encode arbitrary hashed_maps into S-Expressions of
the form ((key value) (key value) ...) later on, if you're interested.
> So, how does it sound? Is it totally crazy? Is it totally not-Ada? Is
> there something right in there?
I don't think "Ada" or "not Ada" should be the question; the proper
question should be "is it reusable and maintainable?"
> For Jeffry Carter (and anybody interested in helping me understand the
> Ada way): here is how it looks like when I haven't thought much about
> it. Notice that all this is completely from the point of view of the
> application using the package, with very few implementation choices.
> If I knew Ada, wouldn't these explanations be pretty much the contents
> of the specification file, with about everything from the body being
> still to invent? How Ada-ish is that thought process?
>
>
> Thanks in advance for your reviews of my ideas,
> Natacha
--
Ludovic Brenta.
Not sure if it still exists in the real world, but the compiler we did for
the Unisys mainframes used Stream_Element'Size = 9. (The Unisys mainframes
are 36-bit machines.) Stream_Element'Size = 8 would have made the code for
handling arrays awful.
Similarly, Character'Size = 9 on that machine. While the upper bit was not
used in the representation, this is surely not an Octet.
> But any machine that's hooked up to any sort of network
> is going to be talking Octets, I think. Does anybody know of
> counterexamples?
The network interfaces on the Unisys machines required binary or text mode
settings: in text mode, the extra bit was just dropped; in binary mode all
of the bits were sent (not sure precisely how that was done, it was in the
OS).
> Perhaps Stream_Element should have been defined as exactly
> 8 bits. Perhaps it should have been called "Octet".
That would have made a compiler for the Unisys machines impossible; it would
have made streaming impossible. There is no sane way to put 36-bit values
into Octets - the only way that would have worked would have been to use
16-bits for every 9-bit byte.
Whether this is a significant consideration today (in 2010) is debatable,
but it surely was a real consideration back in 1993-4. So Ada 95 could not
have made this choice.
Randy.
In general, describing it from the point of view of "the application using the
package" (often referred to as "the client") is the purpose of the
specification. The body is for the implementation, usually best left until the
specification is finished.
On the other hand, your proposed operations seem unnecessarily low level for the
client. You have an internal representation, which is probably hidden from the
client, and an external representation, and operations to write an object of the
internal representation to that external representation and read the external
representation to create an object of the internal representation. Thus, your
client should probably invoke a single operation to write an object, and another
single operation to read.
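In specification terms, that suggestion might come down to something
like the following sketch (hypothetical names; the private part is
deliberately left open):

```ada
with Ada.Streams;

package Sexp is

   type Expression is private;  --  internal representation, hidden

   --  One operation each way between the internal and the
   --  external (on-disk) representation.
   procedure Write
     (Target : access Ada.Streams.Root_Stream_Type'Class;
      Item   : Expression);

   function Read
     (Source : access Ada.Streams.Root_Stream_Type'Class)
     return Expression;

private

   type Expression is record
      null;  --  everything here still to invent
   end record;

end Sexp;
```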
What was Storage_Unit? 36?
There was a proposal to support X'Succ (effectively as a procedure), X being
an object name. That eliminates most of the noise, but isn't perfect as it
can't be easily used in a larger expression. I was in favor of that (along
with X'Image, which is something that GNAT has supported for years).
Randy.
What have you done with Dmitry?? You can't be the *real* Dmitry! :-)
Array-of-bytes views and arithmetic views are of clearly different types,
with different sets of operations. These shouldn't be mixed, logically or
any other way. If you need to go between these views, you need some sort of
type conversion (or Unchecked_Conversion or Stream'Read or...). Thus, it is
*never* necessary to do operations on both views at once, and it is
irrelevant what the "math" operations for octets is. If Ada gets anything
wrong about this, it is that it has math operations at all for
stream_elements.
The Dmitry I know certainly understands this, so I wonder what you've done
with him... :-)
Randy.
So I was doing it the right way, right? (I still have almost no clue
about how to write the body, while the specification looks relatively
clear to me, even though I can't yet turn it into real Ada).
> On the other hand, your proposed operations seem unnecessarily low level for the
> client. You have an internal representation, which is probably hidden from the
> client, and an external representation, and operations to write an object of the
> internal representation to that external representation and read the external
> representation to create an object of the internal representation. Thus, your
> client should probably invoke a single operation to write an object, and another
> single operation to read.
I have to admit I can't imagine how to do it at any higher level than
this. Or are you suggesting replacing my sequence of procedure calls
by something like:
NewList(AtomFromString("tcp-connect",
        NewList(AtomFromString("host",
                AtomFromString("foo.example", Nil)),
        NewList(AtomFromString("port",
                AtomFromPort(PortNb, Nil)),
        Nil))),
Nil);
Or maybe we don't mean the same thing by "object"? Here I have five
objects to put into a S-expression as atoms, and they are indeed
fairly low-level objects. A higher-level object would be a record type
called e.g. TCP_Info, whose reading and writing primitives handle the
addition/matching of "host" and "port" and maybe "tcp-connect". So
there would be a single operation to read or write such a high-level
object by code using it. However the mid-level TCP_Whatever package
providing the high-level object and its operation would in turn use
the low-level Sexp_Stream (or anything else I might want to experiment
with, that's the whole point of modularity).
Would you mind giving me a few hints about how to specify my library in a
better way?
> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:1y1c8zzqmcer5.p...@40tude.net...
> ...
>>> For these, you don't want modular semantics -- you just want
>>> a data type whose representation matches what you're
>>> interfacing/communicating with, such as "type Octet is
>>> range 0..2**8-1;"
>>
>> The problem is that these things require both array-of-Boolean view and
>> arithmetic. I agree that when arithmetic is used, then it has to be wide.
>> E.g. when interpreting sequences of octets as little/big endian numbers,
>> we
>> never use modular arithmetic. But integer arithmetic is incompatible with
>> array/set view.
>
> What have you done with Dmitry?? You can't be the *real* Dmitry! :-)
Brainwashed me? (:-))
> Array-of-bytes views and arithmetic views are of clearly different types,
> with different sets of operations. These shouldn't be mixed, logically or
> any other way. If you need to go between these views, you need some sort of
> type conversion (or Unchecked_Conversion or Stream'Read or...). Thus, it is
> *never* necessary to do operations on both views at once, and it is
> irrelevant what the "math" operations for octets is. If Ada gets anything
> wrong about this, it is that it has math operations at all for
> stream_elements.
Right, but there is no contradiction because there is more than one type
involved. What I meant is:
type Octet is ...;
-- Array interface to access bits of the octet (not Ada)
type Bit_Index is range 1..8;
function "()" (Unit : Octet; Bit : Bit_Index) return Boolean;
procedure "()" (Unit : in out Octet; Bit : Bit_Index; Value : Boolean);
-- Arithmetic interface, immediately leads out of octets (not Ada)
function "+" (Left, Right : Octet) return Universal_Integer;
function "-" (Left, Right : Octet) return Universal_Integer;
...
So when you write:
Little_Endian_Value := Octet_1 + Octet_2 * 256;
The result is not an octet, as it would be with a modular type. Presently
it is just broken.
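In Ada as it stands, that widening has to be spelled with explicit
conversions to a wider type; a minimal sketch (type names invented):

```ada
type Octet   is range 0 .. 2**8 - 1;
type Word_16 is range 0 .. 2**16 - 1;

--  Interpreting two octets as a little-endian 16-bit number:
--  explicit widening, no modular wrap-around anywhere.
function Little_Endian (Low, High : Octet) return Word_16 is
begin
   return Word_16 (Low) + Word_16 (High) * 256;
end Little_Endian;
```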
> Dmitry A. Kazakov wrote:
>> there is only one right way.
>
> (There is one right way per person. Some agree on the right way.
> Even those who believe(!) in the one right way don't know(!)
> for sure ... ;-)
It is the purpose of science to find the right way. It has puzzled so
many: why is there only one mathematics, a product of pure imagination? But
it is out of comp.lang.ada scope.
>> ["+"] is broken in C because it contradicts to the
>> mathematical meaning of.
>
> The mathematical meaning of "+" is ambiguous or context dependent.
It is exact in each context. C's authors didn't mean matrix addition when
they designed +.
> It depends on the definitions of "+" etc.
Meaning depends on definitions? There must be something wrong in this
sentence...
> > So what do you propose? To keep it broken? Because
>> people better understand broken stuff? They don't.
>
> Use operator symbols that clearly separate:
>
> - traditional mathematical addition
> - computer addition
I don't know what computer addition is. Computer does not count. Computer
*models* counting. Function "+" is a model of what?
> int f() {}
> int (*g)() = f + 4;
> int main()
> {
> return g();
> }
I have no problem with this. They don't model integers here. They do
software faults. As such + perfectly serves the purpose. (:-))
> Le 09/08/2010 21:59, Dmitry A. Kazakov a écrit :
>
>>> Yeah, that's a shame. I'd like to have case statements
>>> on bit maps, but also on other composite types.
>>
>> Yes, it is a type property that the domain is a set of identifiable values,
>> in fact, some mapping to an enumeration.
>>
>> The language should provide means to implement this interface with any
>> type. E.g. why I cannot specify this interface for String to be able to
>> write:
>>
>> case Get_Token is
>> when "type" => ...
>> when "package" => ...
>> when others => raise Syntax_Error;
>> end case;
>
> I think that you want pattern matching
> (http://en.wikipedia.org/wiki/Standard_ML#Algebraic_datatypes_and_pattern_matching)
No, I don't believe in type inference; in fact I strongly disbelieve in it.
What I want is a manifest strong type system properly done. The above
does not require matching; all types involved are known in advance. It
requires some enumeration of strings (like a hash function), which the
compiler could use in order to build the case statement.
> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>
>>>> 2. No ranges
>>>> 3. No loops
>>>
>>> I don't know what you want ranges and loops for.
>>
>> A subset of flags. E.g. Critical_Errors, Warnings, Any_Error.
>
> I don't understand this part. A subrange of a modular type
> used as a bit map is not a coherent subset of the flags.
Yes. There are two issues here:
1. The modular type used for flags has ranges with the meaning of a subset,
like Character subsets: 'A'..'Z'. The modular type used for ring arithmetic
should have ranges with the meaning of a smaller ring. They are two
absolutely different things. They cannot be united in just one type.
2. There should be elements of the set. Individual flags must have a
separate type. A range should be constructed using the element's type and
not sets. When sets are used, they must be singletons, e.g. 1, 2, 4... The
range 3..7 is just a mess because 3 and 7 aren't singletons.
>> case {T0 x T1 x T3} is
>> when ... => ...
>> Experimenting further, what will be a useful type of
>> case_statement_alternative's discrete choice list, then?
>> One that can be explored programmatically?
>
> Any non-limited type, I guess.
I was fancying effects that programmers might want to inject into
the switching machinery of the case statement? In the sense of
what I think they are calling aspect oriented programming.
type C is ... record ... end record;
type D is new C and Aspects with null record; -- for case
-- or use aspect notation for D?
x : D := (C with null record);
case x is
when D'(comp_1, ..., comp_n) => ...
Generalizing, could we have aspects of types permitting the
specification of effects to be triggered, for example when
a primitive op is called? (Much like SNOBOL-4's TRACE feature
works.)
Georg
I have to admit I tend to confuse these two kinds of bytes. Considering
my C-bitching background, I call "byte" C's char, i.e. the smallest
memory unit addressable independently (from the language point of
view, I'm well aware modern i386 have memory bus much wider than 8
bits, and transfer whole chunks at one time, but the CPU is following
the "as-if" rule so it looks like 8-bits are addressed independently,
hence 8-bit chars on these platforms). My confusion is that I often
take for granted that disk files can be addressed the same way, while
I know perfectly well it's not always the case (e.g. the DS has a 16-bit
GBA bus and 8-bit operations have to load the whole 16-bit chunk, just
like operating on 4-bit nibbles on i386).
It must be because I've never thought of disk I/O as unbuffered, and
the buffer does follow RAM rules of addressing.
> > S-expressions are not a format on top or below that, it's a format
> > *besides* that, at the same level. Objects are serialized into byte
> > sequences forming S-expression atoms, and relations between objects/
> > atoms are serialized by the S-expression format. This is how one get
> > the canonical representation of a S-expression.
>
> I thought you wanted to represent *objects* ... as S-sequences?
It depends what you call an object. Here again, my vocabulary might have
been tainted by the C Standard. Take for example a record: I would call
each component an object, as well as the whole record itself. I guess
I would call "object" an area of memory meaning something for the
compiler (e.g. a byte in some multibyte internal representation of
anything is not an object, while the byte sequence in RAM used by the
compiler to know what an access variable accesses is an object).
A S-expression atom must be built from something, so from this
definition it follows that whatever is turned into a S-expression atom
is an object. (Well, one could make an atom from multiple objects, but I
don't think it's a good idea.) So there are objects meant to end up in
a single S-expression atom.
There are indeed objects that I want to represent as more elaborate S-
expressions. For example I wouldn't store a record into a single atom,
I'd rather define a S-expression specification to represent it and
store only simple components into atoms. But that's purely a personal
choice, motivated by the preference for human-readable disk files.
> Where these representations are supposed to live? In the RAM? Why are you
> then talking about text files, configurations and humans reading something?
> I cannot remember last time I read memory dump...
Ultimately these representations are supposed to be stored on disk or
transferred over a network or things like that, hence the need to be
built from serialized representations of objects.
However there is processing I might want to perform that requires them
to be buffered in memory. So they temporarily live in RAM until they are
stored or sent.
Text files, configurations and humans reading something are details of
some applications of S-expressions, which might or might not be
relevant in a discussion about the deep nature of what a S-expression
is. But they have to be taken into consideration when a S-expression
library is written, lest the library turn out unsuitable for these
applications.
> >> The difference is that Character represents code points and octet does
> >> atomic arrays of 8 bits.
>
> > Considering Ada's Character also spans over 8 bits (256-element
> > enumeration), both are equivalent, right?
>
> Equivalent defined as? In terms of types they are not, because the types
> are different. In terms of the relation "=" they are not either, because
> "=" is not defined on the tuple Character x Unsigned_8 (or whatever).
Sorry, "equivalent" in the mathematical sense that there is a bijection
between the set of Characters and the set of Octets, which allows either
of them to be used to represent the other. Agreed, this is a very weak
equivalence; it just means there are exactly as many octet values as
Character values.
On the other hand, Storage_Element and Character are not bijection-
equivalent because there is no guarantee they will always have the
same number of values, even though they often do.
> > Actually I've started to wonder whether Stream_Element might even more
> > appropriated: considering a S-expression atom is the serialization of
> > an object, and I guess objects which know how to serialize themselves
> > do so using the Stream subsystem, so maybe I could more easily
> > leverage existing serialization codes if I use Stream_Element_Array
> > for atoms.
>
> Note that Stream_Element is machine-dependent as well.
I'm sadly aware of that. I need an octet-sequence to follow the S-
expression standard, and there is here an implementation trade-off:
assuming objects already know how to serialize themselves into a
Stream_Element_Array, I can either code a converter from
Stream_Element_Array to octet-sequence, or reinvent the wheel and code
a converter for each type directly into an octet-sequence. For some
strange reason I prefer by far the first possibility.
> > But then I don't know whether it's possible to have object
> > hand over a Stream_Element_Array representing themselves,
>
> This does not make sense to me, it is mixing abstractions:
> Stream_Element_Array is a representation of an object in a stream.
> Storage_Array might be a representation of in the memory. These are two
> different objects. You cannot consider them same, even if they shared
> physically same memory (but they do not). The whole purpose of
> serialization to a raw stream is conversion of a Storage_Array to
> Stream_Element_Array. Deserialization is a backward conversion.
I don't consider them the same, otherwise I wouldn't be pondering
about which one to use.
If it helps, you can think of S-expressions as a standardized way of
serializing some relations between objects. However the objects still
have to be serialized, and that's outside of the scope of S-
expressions. From what I understood, the existing serializations of
objects use Stream_Element_Array as a low-level type. So the natural
choice for serializing the relations seems to be taking the
Stream_Element_Array from each object, and hand over to the lower-
level I/O a unified Stream_Element_Array.
Does it make sense or am I completely missing something?
> The point is that you never meet 80 before knowing that this is a "port",
> you never meet "port" before knowing it is of "tcp-connect". You always
> know all types in advance. It is strictly top-down.
Right, in that simple example it is the case. It is even quite often the
case, hence my thinking about a Sexp_Stream in another post, which
would allow S-expression I/O without having more than a single node at
the same time in memory.
But there are still situations where S-expressions have to be stored in
memory. For example the templates, where S-expressions represent a
kind of limited programming language that is re-interpreted for each
template extension. I either store the S-expression in memory or re-
read the file from disk (which doesn't work efficiently when there is
more than one template in a file).
> > Thanks for your patience with me,
>
> You are welcome. I think from the responses of the people here you see that
> the difference between Ada and C is much deeper than begin/end instead of
> curly brackets.
Of course, if there was no other difference I would have stayed in C
(I still prefer brackets rather than words).
> Ada does not let you through without a clear concept of
> what are you going to do. Surely with some proficiency one could write
> classical C programs in Ada, messing everything up. You could even create
> buffer overflow in Ada. But it is difficult for a beginner...
I'm not here to C-ify Ada. C is undoubtedly the best language to do C-
styled stuff.
I felt a mismatch between the nature of C and my way of programming. I
got interested in Ada (rather than in any language in the plethora of
existing languages out there) because it seemed to have what I lacked
in C while still having what I like in C. The former being roughly the
emphasis on robustness, readability and quality; the latter being
roughly the low-level (as in "close to the hardware", at least close
enough not to rule out programming for embedded platforms and system
programming) and the performance. The previous sentence lacks a lot
of irrational elements that I can't put into words and that can be
summarized as "personal taste".
Now I can't explain why your posts often make me feel Ada is
completely out of my tastes in programming languages, while other
people's posts often make me feel Ada is probably an improvement over
C in terms of personal preferences. However I have the feeling I got
much more personal improvement from your posts (improvement still
meaningful even if I eventually drop Ada to come back to C), and I'm
grateful for that.
>> Note that Stream_Element is machine-dependent as well.
>
> I'm sadly aware of that. I need an octet-sequence to follow the S-
> expression standard, and there is an implementation trade-off here:
> assuming objects already know how to serialize themselves into a
> Stream_Element_Array, I can either code a converter from
> Stream_Element_Array to octet-sequence, or reinvent the wheel and code
> a converter for each type directly into an octet-sequence. For some
> strange reason I prefer by far the first possibility.
In any case, these small articles on data representation might
be worth reading:
http://www2.adacore.com/2008/03/03/gem-27/
http://www2.adacore.com/2008/03/17/gem-28/
Georg
> On Aug 9, 12:56 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Mon, 9 Aug 2010 02:55:03 -0700 (PDT), Natacha Kerensikova wrote:
>>> On Aug 8, 5:15 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>>> wrote:
>>> S-expressions are not a format on top or below that, it's a format
>>> *besides* that, at the same level. Objects are serialized into byte
>>> sequences forming S-expression atoms, and relations between objects/
> atoms are serialized by the S-expression format. This is how one gets
>>> the canonical representation of a S-expression.
>>
>> I thought you wanted to represent *objects* ... as S-sequences?
>
> It depends on what you call an object. Here again, my vocabulary might
> have been tainted by the C Standard. Take for example a record, I would call
> each component an object, as well as the whole record itself.
That's OK. Using this definition S-sequence in the memory is an object.
Which was the question: what was wrong with the first object so that you
wanted another instead?
My first take was S-sequence used as an object presentation outside the
memory, because you used the word "format". Now it looks as a solution
before the problem. You seem going to convert objects to S-sequences *in*
the memory and then dump the result them into files. Is it so? What was the
problem then? Because it cannot work without a conversion between
S-sequence in the memory (object) and S-sequence in the file
(representation). Why do you need S-sequence in the memory, while dumping
objects directly into files as S-sequences (if you insist on having them)
is simpler, cleaner, thinner, faster.
>>>> The difference is that Character represents code points and octet
>>>> represents atomic arrays of 8 bits.
>>
>>> Considering Ada's Character also spans over 8 bits (256-element
>>> enumeration), both are equivalent, right?
>>
>> Equivalent defined as? In terms of types they are not, because the types
>> are different. In terms of the relation "=" they are not either, because
>> "=" is not defined on the tuple Character x Unsigned_8 (or whatever).
>
> Sorry, "equivalent" in the mathematical sense that there is a bijection
> between the set of Characters and the set of Octets, which allows using
> either of them to represent the other. Agreed, this is a very weak
> equivalence, it just means there are exactly as many octet values as
> Character values.
No, it is OK. Character can be used to represent octets. Ada provides
means for that, e.g.:

   type Octet is private;           -- Interface
private
   type Octet is new Character;     -- Implementation
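For completeness, a compilable version of that fragment might look like this (the package name and the conversion functions are my own additions, not part of the original suggestion):

```ada
package Octets is
   type Octet is private;                    -- Interface

   --  Explicit conversions between the opaque octet type and the
   --  Character representation used to implement it.
   function To_Octet (C : Character) return Octet;
   function To_Character (O : Octet) return Character;
private
   type Octet is new Character;              -- Implementation

   function To_Octet (C : Character) return Octet is (Octet (C));
   function To_Character (O : Octet) return Character is (Character (O));
end Octets;
```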
> On the other hand, Storage_Element and Character are not bijection-
> equivalent because there is no guarantee they will always have the
> same number of values, even though they often do.
Yes.
>>> Actually I've started to wonder whether Stream_Element might be even
>>> more appropriate: considering a S-expression atom is the serialization of
>>> an object, and I guess objects which know how to serialize themselves
>>> do so using the Stream subsystem, so maybe I could more easily
>>> leverage existing serialization codes if I use Stream_Element_Array
>>> for atoms.
>>
>> Note that Stream_Element is machine-dependent as well.
>
> I'm sadly aware of that. I need an octet-sequence to follow the S-
> expression standard, and there is an implementation trade-off here:
> assuming objects already know how to serialize themselves into a
> Stream_Element_Array, I can either code a converter from
> Stream_Element_Array to octet-sequence, or reinvent the wheel and code
> a converter for each type directly into an octet-sequence. For some
> strange reason I prefer by far the first possibility.
That depends on your goal. Streams are machine-dependent. Streams of octets
are not. If you want to exchange objects in the form of S-sequences across
the network you have to drop standard stream implementations of the objects
and replace them with your own, based on the stream octets. In this case
you will not use Stream_Element_Array directly. You will read and write
octets, by Octet'Read and Octet'Write. Provided that octet streams work,
which is about 99.9% of cases, I guess. When they are not capable of
handling octets properly, you will have to implement I/O manually. If you
wrap Octet'Read into a function, you will be able to exchange the I/O
layer without affecting the upper ones. If we look at all this mechanics
we will see the good old OSI model.
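A sketch of what replacing the standard stream attributes might look like (all names here are invented; the point is the attribute definition clauses, which redirect Octet'Read and Octet'Write to portable subprograms mapping each octet to exactly one Stream_Element):

```ada
with Ada.Streams; use Ada.Streams;

package Portable_Octets is
   type Octet is mod 2 ** 8;

   procedure Read
     (Stream : not null access Root_Stream_Type'Class;
      Item   : out Octet);
   procedure Write
     (Stream : not null access Root_Stream_Type'Class;
      Item   : Octet);

   --  From now on Octet'Read and Octet'Write use these subprograms
   --  instead of the machine-dependent default representation.
   for Octet'Read use Read;
   for Octet'Write use Write;
end Portable_Octets;

package body Portable_Octets is
   procedure Read
     (Stream : not null access Root_Stream_Type'Class;
      Item   : out Octet)
   is
      Buffer : Stream_Element_Array (1 .. 1);
      Last   : Stream_Element_Offset;
   begin
      Read (Stream.all, Buffer, Last);   --  Fetch one element.
      Item := Octet (Buffer (1));
   end Read;

   procedure Write
     (Stream : not null access Root_Stream_Type'Class;
      Item   : Octet)
   is
   begin
      Write (Stream.all, Stream_Element_Array'(1 => Stream_Element (Item)));
   end Write;
end Portable_Octets;
```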
> If it helps, you can think of S-expressions as a standardized way of
> serializing some relations between objects. However the objects still
> have to be serialized, and that's outside of the scope of S-
> expressions. From what I understood, the existing serializations of
> objects use Stream_Element_Array as a low-level type. So the natural
> choice for serializing the relations seems to be taking the
> Stream_Element_Array from each object, and hand over to the lower-
> level I/O a unified Stream_Element_Array.
>
> Does it make sense or am I completely missing something?
In another post Jeffrey Carter described this as low-level. Why not tell
the object: store yourself and all relations you need, I just don't care
which and how?
In fact I did something like that, persistent objects with dependencies
between them and collection of unreferenced objects. But I did it as
Jeffrey suggested. There is Put (Store, Object); the rest is hidden.
BUT, S-expressions do not support references, they are strictly by-value, so
you don't need that stuff anyway.
>> The point is that you never meet 80 before knowing that this is a "port",
>> you never meet "port" before knowing it is of "tcp-connect". You always
>> know all types in advance. It is strictly top-down.
>
> Right, in that simple example it is the case. It is even quite often the
> case, hence my thinking about a Sexp_Stream in another post, which
> would allow S-expression I/O without having more than a single node at
> the same time in memory.
>
> But there are still situations where S-expressions have to be stored in
> memory.
There are no such cases!
> For examples the templates, where S-expressions represent a
> kind of limited programming language that is re-interpreted for each
> template extension.
I am not sure what you mean here, but in general the template is not the
object, its instance is. You do not need S-expressions here either. You can
store/restore templates as S-sequences. A template in the memory would be
an object with some operations like Instantiate_With_Parameters etc. The
result of instantiation will again be an object and not an S-sequence.
BTW, for an interpreter I would certainly prefer the Reverse Polish
Notation to a tree. (I think this too boils down to a solution before the
problem.)
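For illustration, an RPN evaluator needs only an explicit value stack instead of a tree walk. A toy sketch (all names invented, fixed-size stack, no error handling):

```ada
procedure Eval_RPN_Demo is
   Stack : array (1 .. 16) of Integer;
   Top   : Natural := 0;

   procedure Push (V : Integer) is
   begin
      Top := Top + 1;
      Stack (Top) := V;
   end Push;

   function Pop return Integer is
   begin
      Top := Top - 1;
      return Stack (Top + 1);
   end Pop;
begin
   --  Evaluate "3 4 + 2 *": operands are pushed, each operator pops
   --  its arguments and pushes the result.
   Push (3);
   Push (4);
   Push (Pop + Pop);
   Push (2);
   Push (Pop * Pop);
end Eval_RPN_Demo;
```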
> the latter being
> roughly the low-level (as in "close to the hardware", at least close
> enough not to rule out programming for embedded platforms and system
> programming) and the performance.
(BTW, Ada is closer to the hardware than C is. You can even describe
interrupt handlers in Ada. Try it in C.)
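A minimal sketch of the idea: an interrupt handler in Ada is a protected procedure attached with pragma Attach_Handler. The interrupt names in Ada.Interrupts.Names are implementation-defined; Device_Interrupt below is a placeholder:

```ada
with Ada.Interrupts.Names;
with System;

package Device_Driver is
   protected Handler is
      pragma Interrupt_Priority (System.Interrupt_Priority'Last);
      procedure Handle;
      --  Device_Interrupt is a placeholder for an implementation-
      --  defined name declared in Ada.Interrupts.Names.
      pragma Attach_Handler (Handle, Ada.Interrupts.Names.Device_Interrupt);
   end Handler;
end Device_Driver;

package body Device_Driver is
   protected body Handler is
      procedure Handle is
      begin
         null;  --  Acknowledge the device, record data, etc.
      end Handle;
   end Handler;
end Device_Driver;
```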
> Now I can't explain why your posts often make me feel Ada is
> completely out of my tastes in programming languages,
In the way of programming you mean? I wanted to convey that a
"C-programmer" will have to change some habits when switching to Ada. Ada
enforces a certain attitude to programming. Some people would say to software
engineering. It is not obvious because you can do in Ada anything you can
in C. But a stubborn hardcore "C-programmer" might become very frustrated
very soon. A competent C developer will only enjoy Ada.
Unification and pattern matching are independent of type inference. Sure,
most of the time you find both in the same languages, but IIRC in the
course of my master's degree, I have encountered languages with one but
not the other.