FPL Marshalling survey, third revision


Joachim Durchholz

25 Nov 2003, 14:40:14
Here's the third revision of the Marshalling Survey.

There's a mostly-new, much improved classification scheme in place now.
All of those nitty-gritty special cases have found a quite natural place
in it.
I have transferred all available information into that new scheme. I did
my best, but I'm pretty sure I misinterpreted some of it, or simply got
it wrong - the inevitable consequences of working with data provided by
other people, especially if you're doing it late at night and with
too little sleep under your belt :-)
In other words: my thanks go to those who have offered information in
the past, and I bid you to re-read the classifications and check the
entries of the language(s) you are affiliated with. My thankfulness will
haunt you forever...

Other than that, there was some clean-up in the wording, which is why
I'm posting it in its entirety again. Sorry for the repetition, but I'm
pretty sure people would ask for it if I had omitted it.

-----------------------------------------------------------------
The Survey started as an attempt to classify various well-known
functional programming languages according to their marshalling mechanisms.

As sometimes happens with such endeavours, the auxiliary information
turned out to be just as interesting as that gathered for the primary
purpose. In
this case, it unearthed a wealth of variations on the general theme;
compiling a useful list of properties for categorizing the marshalling
mechanisms was more of a challenge than obtaining the actual information!

So I hope that the result of the Survey will help several groups of people:
* language designers trying to assess the design space wrt. marshalling,
* language implementors looking for criteria that might help them judge
their language wrt. marshalling, and
* programmers looking for criteria to judge how far a programming
language will help them along with their next project.
If it has helped one of them, this survey is worth the effort.

I wish to thank all the helpful people on the comp.lang.functional
newsgroup who took the time to review the criteria and who contributed
information on the various languages. These were: Ulf Wiger, Peter
Sewell, Oleg Trott, Markus Mottl, Marcin 'Qrczak' Kowalczyk, Fergus
Henderson, D. Darius, Claus Reinke, and Andreas Rossberg. Without their
contributions, the reliability and quality of the information presented
here would have seriously suffered.


Basics
------

Let's start with a definition so that the subject matter is clear.

"Marshalling" (aka "serialization") is the process of converting data
into an untyped format; technical details may vary, but usually that
will be a sequence of bytes. Typical destinations for marshalled data
include disk files and running processes.
"Unmarshalling" (aka "deserialization") is the process of converting
marshalled data back into internal format, suitable for immediate
processing. The unmarshalling process might be a different thread in the
same application, a different process on the same machine, a process on
a different machine, or a process that reads the data from disk.
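
For concreteness, here is a minimal round-trip sketch using OCaml's
standard Marshal module (one of the mechanisms covered below):

  (* Marshal a value into bytes, then convert the bytes back into
     an internal value ready for immediate processing. *)
  type point = { x : float; y : float }

  let () =
    let original = { x = 1.0; y = 2.0 } in
    let bytes = Marshal.to_string original [] in
    (* The ascribed type is not checked - see [Type safety] below. *)
    let copy : point = Marshal.from_string bytes 0 in
    Printf.printf "%f %f\n" copy.x copy.y

The unmarshalling side could just as well be a different process that
reads the bytes from a socket or from a disk file.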

Marshalling to disk offers slightly different challenges than
marshalling to another process.
When marshalling to another process, typical issues involve the
reliability of the transmission channel, efficiency concerns due to
handshaking protocols, and similar network-centric issues.
When marshalling to disk, there's the question of building an
appropriate "closure": how much auxiliary data must be written to disk
so that it will be complete and interpretable when unmarshalled?
Different languages and libraries have used different definitions of
completeness here, and this Survey aims to highlight them.
Despite those differences, I felt that there are enough similarities to
warrant covering both areas in a single Survey.


Status of this Survey
---------------------

This Survey is in a half-done state, and I suspect that it will never
be finished. Languages evolve, new languages enter the arena, and data
in the Survey itself will have to be corrected.
So this is going to remain an ongoing effort.

Contributions, clarifications, proposals for better categories, and
flames are all most welcome, though not all will be included.
Actually, requests for category consolidation will be preferred over
requests for additional categories. Having too many categories tends to
muddle more than clarify, partly due to information overload, partly due
to overlaps. I aim to keep the total content of the Survey to something
that can be read and evaluated in less than an hour. More importantly,
filling in the answers for a particular language should take less than
an hour; this helps with gathering reliable information.
There have been requests for including imperative languages. I have
decided against this, partly because this would require a slightly
different set of properties (see below), partly to keep the workload to
a manageable level. If anybody wants to step in, I'll gladly cooperate.

Contributions will be accepted both on my private email address
(currently joachim....@web.de) and on news:comp.lang.functional.
Just be sure to include the terms "Marshalling" and "Survey" in the
Subject: header of the message.


Definitions of Properties
-------------------------

This section defines what properties of a marshalling mechanism were
deemed "interesting". The guiding line has been: assuming somebody is
about to start a project, what would he need to know to evaluate the
language's support for marshalling?

Here are the properties, together with the answers for all anticipated
situations.
Not all anticipated situations will make sense, or even be present in
any real language. I'm just trying to cover all possibilities.
For each property, the anticipated answers are listed, together with
the background from which I'd expect them to be given. Respondents were
encouraged to
use their own answers, to complement the answer with background
information, or to list URLs that contain further explanations.

[Default mechanism] No need to write a separate wrapper for each data
type that needs to be marshalled.
*Yes: All data types are marshallable by default.
*Per-type: Data types are not marshallable by default, but can be made
so with minimal effort on a per-type basis (e.g. by adding a keyword to
the type definition, or a single line of code during initialization).
*No: Making a data type marshallable requires writing (un-)marshalling
code for each component of that data type.

[Customizability] Ability to write a marshalling wrapper for any
user-defined data type.
This ability is useful in various situations, e.g. to avoid marshalling
across weak pointers, to marshall mutable data, to prevent the
marshalling of data that would be irrelevant or even dangerous when
unmarshalled, to avoid marshalling data where recomputing is faster than
marshalling/transmission/unmarshalling, to normalize internal
representations before marshalling to avoid redundant transmissions
and/or improve compression characteristics of the byte stream, etc.
*Decomposed: The marshalling process can be influenced by setting flags
at the appropriate places. (E.g. references marked as "marshalling-weak"
break transitivity, there's a hook for representation normalization to
improve compressibility and/or avoid retransmissions, etc.).
*Monolithic: There's a hook for writing a marshalling wrapper for each
user-defined type.
*No: Marshalling is fully handled by the run-time system, there's no way
to influence the process.
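
A minimal sketch of the "Monolithic" style, in OCaml; the wrapper and
the set type are made up for illustration, they are not any surveyed
language's actual hook:

  (* A hand-written marshalling wrapper that normalizes the internal
     representation first, so that equal sets always produce identical
     byte streams - one of the motivations listed above. *)
  type set = { elems : int list }

  let marshal_set (s : set) : string =
    Marshal.to_string { elems = List.sort compare s.elems } []

  let unmarshal_set (bytes : string) : set =
    (Marshal.from_string bytes 0 : set)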

[Value transitivity] Accessing immutable data through an unmarshalled
object isn't syntactically different from accessing immutable data via a
locally-constructed object.
*Eager: Unmarshalled data is immediately transitively accessible.
*Lazy: Unmarshalled data may be incomplete, but missing data is
transparently requested from the sources when it's accessed. This is
semantically equivalent to "Eager", except that failure modes and
efficiency may differ.
*Wrappers: Programmers must write wrappers to gain value transitivity.
*No: Such accesses require special syntax (e.g. a call to an
unmarshall() routine) at the point of access.

[Type transitivity] Whether it is possible to unmarshall with a type
that's initially unknown to the unmarshalling process.
Possible answers should be similar to those for [Value transitivity].
If the language is dynamically typed (or, equivalently, the set of types
is predefined and cannot be extended), the answer should be "N/A".
If the answer is "Eager", "Lazy", or "Wrappers", this will usually mean
that a structural representation of the type is encoded in the
marshalled data.

[Type safety] Whether unmarshalling data with a different type than it
was marshalled with is possible. (Types that happen to have the same
name do not count as "the same" unless they also have the same structure.)
*Conversion: Type mismatches will result in (possibly user-customizable)
conversion. (A versioning mechanism would be a case of conversion: data
marshalled with an old type will be unmarshalled with a different, new
type. Versioning essentially is just a way of associating old and new
types, at least from a marshalling perspective.)
*Failure: Type mismatches will result in controllable failures when
unmarshalling.
*Crash: Type mismatches will result in unpredictable system behaviour.
*N/A: This is not an issue, either because the type will be created in
the target system if necessary, or because the language is dynamically
typed.
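
To illustrate the "Crash" answer: OCaml's Marshal, for instance, trusts
whatever type the unmarshalling site ascribes:

  let () =
    let bytes = Marshal.to_string [1; 2; 3] [] in
    (* Typechecks, but the bytes encode an int list, not a float list -
       evaluating the result yields garbage or a segfault. *)
    let wrong : float list = Marshal.from_string bytes 0 in
    List.iter print_float wrong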

[Variables] The mechanism can marshall variables.
*N/A: The language has no variables.
*No: There is no mechanism to do that.
*Clone: The unmarshalled variable will be separate, and separately settable.
*Proxy: The unmarshalled variable is a proxy for the original one. (When
unmarshalling from disk, the unmarshalled variable will have to be a
proxy that writes to the file or something similar to implement this
semantics.)
*Relocate: The variable will relocate to the receiver. The sender will
not be able to access the variable anymore.
*Relocate with proxy: Like "Proxy", but if the sender accesses the
variable, it will be through a proxy. (This is indistinguishable from
"Proxy" except for failure modes and performance.)

[Variable transitivity] As [Value transitivity], but for referenced
variables instead of immutable values.
*N/A: The language has no variables.

(The "Variables" and "Variable transitivity" categories are experimental
at this time. The usefulness of the answers for these will have to be
evaluated.)

[Preserves sharing] Shared data will unmarshall as shared data. (This
should also make sure that cycles unmarshall as cycles.)
This doesn't directly affect marshalling semantics. However,
unmarshalling without sharing preservation can result in an exponential
data blowup; for example, a simple CONS list of N nodes, each referring
to its successor in both HEAD and TAIL fields, will explode to 2^N
nodes; transmitting an innocent 80-node list would require about 10^24
cons cells - far more than any machine could ever hold.
Sharing preservation influences the marshalling of variables.
*Yes: Shared data structures will remain shared.
*No: Shared data will be replicated.
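
A sketch of the blowup, in OCaml and independent of any particular
marshaller:

  type chain = Leaf | Node of chain * chain

  (* Both fields of every node share one successor, so a chain of
     size n contains only n + 1 distinct nodes. *)
  let rec build n =
    if n = 0 then Leaf
    else let next = build (n - 1) in Node (next, next)

  (* A copy that ignores sharing duplicates the shared subtrees;
     copy (build 80) would allocate 2^80 nodes. *)
  let rec copy = function
    | Leaf -> Leaf
    | Node (h, t) -> Node (copy h, copy t)

A marshaller that doesn't preserve sharing behaves like "copy".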

[Resources] The mechanism can marshall resources, i.e. things that are
bound to a process or machine. By nature, any access to the resource
will have to refer back to the process resp. machine where the
marshalled data structure originated from.
Marshalling seems to be reasonable while the marshalling process resp.
machine continues to operate. However, even then, access via the
marshalled proxy will have to take possible resource unavailability into
account (something that might happen without marshalling as well, but
marshalling generally increases the likelihood of unavailability).
This applies only if the resource can be accessed through the
unmarshalled resource as if it were local. The ability to marshall a URL
in string form is already covered under [Default mechanism] :-).
*Yes: Resources can be marshalled.
*No: Resources cannot be marshalled.

[Base functions] How functions with a direct representation in source
code are marshalled. Multiple answers are possible if the language uses
different strategies for functions from different source (e.g.
dynamically loaded code vs. functions resulting from run-time compilation).
*No: Base functions cannot be encoded.
*Code: The code of the function itself is transmitted. This is
relatively straightforward for VM-based language implementations;
implementations that compile to machine code might carry a VM-based
representation of all code, or they might have restrictions wrt.
architecture interoperability (which is a separate category further down
in the Survey).
*Library reference (safe): The unmarshalling process is supposed to have
the function in a library (or in the run-time system); references to
functions within that library are relocated (patched up) during the
unmarshalling process, or some resolution mechanism based on names or ID
numbers is in place.
Unmarshalling and/or accessing functions from a different or unavailable
library will fail in a controllable manner.
*Library reference (unsafe): As "Library reference (safe)", but
accessing or
unmarshalling a function from a different or unavailable library may
fail in an uncontrollable (and sometimes spectacular) manner.
*Address (safe): The unmarshalling process is supposed to be exactly the
same binary as the process that marshalled the function. In other words,
marshalling just writes out binary machine addresses of the functions.
Unmarshalling and/or accessing functions from a different binary will
fail in a controllable manner.
*Address (unsafe): As "Address (safe)", but accessing or unmarshalling a
function from a different binary may fail in an uncontrollable (and
sometimes spectacular) manner.
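
A minimal sketch of the "Library reference" idea, in OCaml; the
registry and the names are made up for illustration, not taken from any
surveyed implementation:

  (* Functions are registered under stable names on both sides;
     only the name crosses the wire. *)
  let registry : (string, int -> int) Hashtbl.t = Hashtbl.create 16
  let register name f = Hashtbl.replace registry name f

  let unmarshal_fn name =
    match Hashtbl.find_opt registry name with
    | Some f -> f                              (* resolved: the "safe" case *)
    | None -> failwith ("unknown: " ^ name)    (* controlled failure *)

  let () =
    register "succ" (fun x -> x + 1);
    print_int (unmarshal_fn "succ" 41)         (* prints 42 *)

An "unsafe" variant would patch raw addresses instead of doing the
lookup.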

[Composed functions] Marshalling of functions that were constructed at
run-time and that don't have a direct equivalent in source code (in
other words: it's impossible to construct even an implicit name during
compilation).
In case anybody wonders: run-time function construction can be done
using currying, expression substitution, partial evaluation, function
composition, closure construction, continuation construction, or via
similar means. At their heart, all these constructions are based on the
same operation: taking an expression with open variables and replacing
all occurrences of one of them with another expression.
*Graph: These objects are represented as trees (directed acyclic graphs
in the presence of sharing, general directed graphs in the presence of
cycles).
*Code: The objects are transformed to some executable code (machine or
VM code). (I think there is no interesting observable difference between
"Graph" and "Code", I'm not sure why I felt that this distinction should
be made.)
*No: No support for marshalling composed functions.
*N/A: The language doesn't have composed functions, so there are none
available for marshalling.

[Implementation interoperability] Ability to transmit data between
different implementations of the language.
*Yes: Different language implementations can exchange data.
*N/A: There is just one implementation of the language.
*No: Different language implementations cannot exchange data.
*Partially: Only some implementations can exchange data (groups of
interoperating implementations should be listed).

[Platform interoperability] Ability to transmit data between different
OSes or architectures.
Note: Floating-point issues are nearly impossible to make truly
machine-independent without sacrificing a large degree of efficiency, so
I'd expect lots of "Yes (except floating-point data)" answers ;-)
*Yes: Programs can exchange data even if they are running on machines
that have different operating systems, endianness conventions, default
word sizes, floating-point formats, or other differences.
*No: Programs cannot exchange data if there are architectural differences.
*N/A: All implementations of the language run on the same OS and machine
type. (This should apply mostly to languages that live on Windows
exclusively, since Unix runs on machines with varying word sizes,
endiannesses, floating-point formats etc.)

[Foreign language import] List of languages whose marshalled data the
language in question can unmarshall from.

[Foreign language export] List of languages that can unmarshall data
marshalled by the language in question.


Template Entry
--------------

Name of language
[Default mechanism] Yes? Per-type? No?
[Customizability] Decomposed? Monolithic? No?
[Value transitivity] Eager? Lazy? Wrappers? No?
[Type transitivity] Eager? Lazy? Wrappers? N/A? No?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes? No?
[Resources] Yes? No?
[Base functions] No? Code? Library reference (safe/unsafe)?
Address (safe/unsafe)?
[Composed functions] Graph? Code? No? N/A?
[Implementation interoperability] Yes? N/A? No? Partially?
[Platform interoperability] Yes? No? N/A?
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

Results
-------

Clean (marshalling mechanism might be experimental)
[Default mechanism] Per-type.
[Customizability] Decomposed? Monolithic? No?
[Value transitivity] Eager? Lazy?
[Type transitivity] Eager? Lazy? Wrappers? N/A? No?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes.
[Resources] Yes? No?
[Base functions] No? Code? Library reference (safe/unsafe)?
Address (safe/unsafe)?
[Composed functions] Graph? Code?
[Implementation interoperability] Yes? N/A? No? Partially?
[Platform interoperability] No.
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

Common Lisp
[Default mechanism] Yes.
[Customizability] Yes. (This is Lisp *g*)
[Value transitivity] Eager? Lazy? (1)
[Type transitivity] Eager? Lazy? Wrappers? N/A? No?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes? No? (2)
[Resources] Yes? No?
[Base functions] No? Code? Library reference (safe/unsafe)?
Address (safe/unsafe)?
[Composed functions] Graph? Code?
[Implementation interoperability] No.
[Platform interoperability] No? N/A?
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?
(1) Not transitive for all kinds of data types, e.g. classes need to
have a default constructor defined. I've been told that "normal cases"
will work.
(2)
http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/ext/save_obj/save_obj.cl
advertises that it "handles circular lists", but it's unclear whether
that includes things that aren't lists (such as classes, functions,
symbols, ...), nor whether that's done via special circle handling code
or via sharing detection (one would suppose that the latter approach is
easier, but Lisp could well be an exception).

Erlang
[Default mechanism] Yes.
[Customizability] No.
[Value transitivity] Eager.
[Type transitivity] Eager? Lazy? Wrappers? N/A?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] No.
[Resources] No.
[Base functions] Library reference (safe).
[Composed functions] Graph? Code? No?
[Implementation interoperability] Yes.
[Platform interoperability] Yes (how about floats?)
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

Haskell
[Default mechanism] Per-type.
[Customizability] Decomposed? Monolithic?
[Value transitivity] Eager.
[Type transitivity] No.
[Type safety] Failure.
[Variables] N/A.
[Variable transitivity] N/A.
[Preserves sharing] No.
[Resources] No.
[Base functions] No.
[Composed functions] No.
[Implementation interoperability] Yes.
[Platform interoperability] Yes. (Except floats?)
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

Mercury
[Default mechanism] Yes? Per-type?
[Customizability] No. (Could be done in library.)
[Value transitivity] Eager? Lazy?
[Type transitivity] Eager? Lazy? Wrappers? N/A? No?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] No.
[Resources] Yes? No?
[Base functions] No.
[Composed functions] No.
[Implementation interoperability] Yes.
[Platform interoperability] Yes.
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

OCaml
[Default mechanism] Yes.
[Customizability] Monolithic (only for types implemented in C).
[Value transitivity] Eager? Lazy?
[Type transitivity] Eager? Lazy? Wrappers? N/A? No?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes.
[Resources] Yes? No?
[Base functions] Address (unsafe).
[Composed functions] Graph? Code?
[Implementation interoperability] N/A.
[Platform interoperability] Yes (what about floats?)
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

Oz
[Default mechanism] Yes.
[Customizability] Monolithic (via C++ FFI, difficult).
[Value transitivity] Eager? Lazy?
[Type transitivity] Eager? Lazy?
[Type safety] N/A.
[Variables] Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes.
[Resources] Yes? No?
[Base functions] Code? Library reference (safe)?
[Composed functions] Graph? Code?
[Implementation interoperability] N/A?
[Platform interoperability] Yes (what about floats?)
[Foreign language import] Which languages, if any?
[Foreign language export] Oz (other languages?)

Scheme
[Default mechanism] Yes? Per-type? No?
[Customizability] Decomposed? Monolithic? No?
[Value transitivity] Eager? Lazy? Wrappers? No?
[Type transitivity] Eager? Lazy? Wrappers? N/A? No?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes? No?
[Resources] Yes? No?
[Base functions] No? Code? Library reference (safe/unsafe)?
Address (safe/unsafe)?
[Composed functions] Graph? Code? No? N/A?
[Implementation interoperability] Yes? N/A? No? Partially?
[Platform interoperability] Yes? No? N/A?
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

SML
Marshalling is implementation-dependent.
For this reason, SML implementations are considered
separate languages within this Survey.

SML (Alice)
[Default mechanism] Yes? Per-type?
[Customizability] Monolithic (currently via C++ FFI (difficult),
library support planned).
[Value transitivity] Eager? Lazy?
[Type transitivity] Eager? Lazy? Wrappers?
[Type safety] Conversion? Failure? Crash?
[Variables] No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes.
[Resources] Yes? No?
[Base functions] Code? Library reference (safe)?
[Composed functions] Graph? Code?
[Implementation interoperability] N/A?
[Platform interoperability] Yes (what about floats?)
[Foreign language import] Oz. (Other languages?)
[Foreign language export] Which languages, if any?

SML (SML/NJ)
[Default mechanism] Yes? Per-type?
[Customizability] Decomposed? Monolithic? No?
[Value transitivity] Eager? Lazy?
[Type transitivity] Eager? Lazy? Wrappers? N/A? No?
[Type safety] Conversion? Failure? Crash? N/A?
[Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
[Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
[Preserves sharing] Yes.
[Resources] Yes? No?
[Base functions] No.
[Composed functions] No.
[Implementation interoperability] N/A.
[Platform interoperability] Yes? No?
[Foreign language import] Which languages, if any?
[Foreign language export] Which languages, if any?

SML (MLton and Moscow ML)
No marshalling support.

Fergus Henderson

25 Nov 2003, 21:56:24
Joachim Durchholz <joachim....@web.de> writes:

>[Customizability] Ability to write a marshalling wrapper for any
>user-defined data type.
>This ability is useful in various situations, e.g. to avoid marshalling
>across weak pointers, to marshall mutable data, to prevent the
>marshalling data that would be irrelevant or even dangerous when
>unmarshalled, to avoid marshalling data where recomputing is faster than
>marshalling/transmission/unmarshalling, to normalize internal
>representations before marshalling to avoid redundant transmissions
>and/or improve compression characteristics of the byte stream, etc.
>*Decomposed: The marshalling process can be influenced by setting flags
>at the appropriate places. (E.g. references marked as "marshalling-weak"
>break transitivity, there's a hook for representation normalization to
>improve compressability and/or avoid retransmissions, etc.).
>*Monolithic: There's a hook for writing a marshalling wrapper for each
>user-defined type.
>*No: Marshalling is fully handled by the run-time system, there's no way
>to influence the process.

For Mercury, it doesn't really fit into any of those categories.
In Mercury, the standard library support for marshalling/unmarshalling
does not provide any customizability. However, this library is written
in Mercury, using procedures provided by the run-time system for decomposing
or constructing arbitrary data types. It is possible to use those same
procedures to implement your own marshalling/unmarshalling library,
which you can then customize to your heart's content.

>Mercury
> [Default mechanism] Yes? Per-type?

Yes.

> [Customizability] No. (Could be done in library.)

I suppose that is a reasonable way of summarizing what I wrote above.

> [Value transitivity] Eager? Lazy?

Eager.

> [Type transitivity] Eager? Lazy? Wrappers? N/A? No?

No.

> [Type safety] Conversion? Failure? Crash? N/A?

Failure.

> [Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?

No.

> [Variable transitivity] N/A? Eager? Lazy? Wrappers? No?

I think "N/A" -- but this is due to the "No" answer to the previous
question, not due to a lack of support for mutable variables.

> [Resources] Yes? No?

No.

> [Foreign language import]

Prolog.

> [Foreign language export]

Prolog.

--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.

Joachim Durchholz

26 Nov 2003, 04:05:41
Fergus Henderson wrote:

> Joachim Durchholz <joachim....@web.de> writes:
>
>>[Customizability] Ability to write a marshalling wrapper for any
>>user-defined data type.
>

> In Mercury, the standard library support for marshalling/unmarshalling
> does not provide any customizability. However, this library is written
> in Mercury, using procedures provided by the run-time system for decomposing
> or constructing arbitrary data types. It is possible to use those same
> procedures to implement your own marshalling/unmarshalling library,
> which you can then customize to your heart's content.

Maybe the Mercury entry should be split into two:
* Mercury (language definition)
* Mercury (standard libraries)

Another perspective from which one could write the answer is this:
The primary purpose of the survey is giving an overview of the
marshalling mechanisms as they present themselves to a senior
application architect. Would writing a customizable marshalling library
be an option, given enough resources and an urgent need?

>>Mercury
>> [Default mechanism] Yes? Per-type?

> [...]

Answers integrated into Survey. Thanks.

>> [Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
>
> I think "N/A" -- but this is due to the "No" answer to the previous
> question, not due to a lack of support for mutable variables.

Ah, right, I overlooked that case.
The definition has been changed to read:

[Variable transitivity] As [Value transitivity], but for referenced
variables instead of immutable values.

*N/A: The language has no marshallable variables.
^^^^^^^^^^^^
(added)

Hope this clears things up (and makes the entry for Mercury an
unqualified "N/A").

>> [Foreign language import]
>
> Prolog.
>
>> [Foreign language export]
>
> Prolog.

Does this interoperate with any Prolog implementation?


Here's the revised entry for Mercury, summed up for convenience.
Note that I assume that a user-written marshalling library cannot
provide additional services beyond customizability; if that's wrong,
feel free to add "(1)" annotations as appropriate :-)

Mercury
[Default mechanism] Yes.
[Customizability] No. (1)


[Value transitivity] Eager.
[Type transitivity] No.
[Type safety] Failure.

[Variables] No.


[Variable transitivity] N/A.
[Preserves sharing] No.
[Resources] No.
[Base functions] No.
[Composed functions] No.
[Implementation interoperability] Yes.
[Platform interoperability] Yes.

[Foreign language import] Prolog.
[Foreign language export] Prolog.

(1) This describes marshalling as provided by the standard library.
Mercury offers a low-level marshalling mechanism which could be used to
write one's own marshalling mechanism that would be customizable to
one's heart's contents (in fact the standard library uses it).


Regards,
Jo

Ronny Wichers Schreur

26 Nov 2003, 04:59:48
Joachim Durchholz writes (to comp.lang.functional):

> I bid you to re-read the classifications and check the
> entries of the language(s) you are affiliated with.

> Clean (marshalling mechanism might be experimental)
Marshalling is part of the dynamics implementation, which
is only available on Windows.

> [Default mechanism] Per-type.
Should be Yes.

> [Customizability] Decomposed? Monolithic? No?
No.

> [Value transitivity] Eager? Lazy?
Lazy.

> [Type transitivity] Eager? Lazy? Wrappers? N/A? No?

Eager.

> [Type safety] Conversion? Failure? Crash? N/A?

Failure. Note that a type match involves unification, it doesn't
mean type equality.

> [Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?

N/A.

> [Variable transitivity] N/A? Eager? Lazy? Wrappers? No?

N/A.

> [Preserves sharing] Yes.
Yes.

> [Resources] Yes? No?
No.

> [Base functions] No? Code? Library reference (safe/unsafe)?
> Address (safe/unsafe)?

Library reference (unsafe). The libraries are maintained by the
system, which means it's only unsafe if the user changes or deletes
them directly.

> [Composed functions] Graph? Code?
Graph.

> [Implementation interoperability] Yes? N/A? No? Partially?

N/A.

> [Platform interoperability] No.
No.

> [Foreign language import] Which languages, if any?

None.

> [Foreign language export] Which languages, if any?

None.

Fergus Henderson

26 Nov 2003, 06:25:48
Joachim Durchholz <joachim....@web.de> writes:

>Fergus Henderson wrote:
>
>> Joachim Durchholz <joachim....@web.de> writes:
>>
>>>[Customizability] Ability to write a marshalling wrapper for any
>>>user-defined data type.
>>
>> In Mercury, the standard library support for marshalling/unmarshalling
>> does not provide any customizability. However, this library is written
>> in Mercury, using procedures provided by the run-time system for decomposing
>> or constructing arbitrary data types. It is possible to use those same
>> procedures to implement your own marshalling/unmarshalling library,
>> which you can then customize to your heart's content.
>
>Maybe the Mercury entry should be split into two:
>* Mercury (language definition)
>* Mercury (standard libraries)
>
>Another perspective from which one could write the answer is this:
>The primary purpose of the survey is giving an overview of the
>marshalling mechanisms as they present themselves to a senior
>application architect. Would writing a customizable marshalling library
>be an option, given enough resources and an urgent need?

A customizable marshaller can be written in less than 20 lines of code:

:- import_module std_util, term_io, list, bool.

:- pred marshal(pred(univ,bool,io,io)::pred(in,out,di,uo) is det,
        T::in, io::di, io::uo) is det.
marshal(Customizer, Term) -->
    marshal_subterm(Customizer, Term),
    print(".\n").

:- pred marshal_subterm(pred(univ,bool,io,io)::pred(in,out,di,uo)
        is det, T::in, io::di, io::uo) is det.
marshal_subterm(Customizer, Term) -->
    Customizer(univ(Term), Handled),
    (if { Handled = yes } then [] else
        { deconstruct(Term, Functor, _, Args) },
        (if { Args = [] } then
            write(Term)
        else
            quote_atom(Functor),
            print('('),
            write_list(Args, ", ", marshal_subterm(Customizer)),
            print(')'))).

This interoperates with the standard unmarshaller. Writing a customizable
unmarshaller is more work -- about 200-300 lines of code -- but is still quite
feasible.

>>> [Foreign language import]
>> Prolog.
>>> [Foreign language export]
>> Prolog.
>
> Does this interoperate with any Prolog implementation?

Yes.

Joachim Durchholz

26 Nov 2003, 06:38:23
Ronny Wichers Schreur wrote:

>> [Value transitivity] Eager? Lazy?
>
> Lazy.

Both for marshalling and unmarshalling? (I have seen that unmarshalling
is lazy, but I didn't read carefully enough to determine whether this
also holds when marshalling.)

>> [Type safety] Conversion? Failure? Crash? N/A?
>
> Failure. Note that a type match involves unification, it doesn't
> mean type equality.

Do you mean types are identified via structural equivalence, not by name?

>> [Base functions] No? Code? Library reference (safe/unsafe)?
>> Address (safe/unsafe)?
>
> Library reference (unsafe). The libraries are maintained by the
> system, which means it's only unsafe if the user changen or deletes
> them directly.

I have reworded this under the assumption that deleting files directly
is not normal usage. (See below to check whether that makes sense.)


Here's the revised entry for checking; corrections are appreciated.

------------------------------
Clean (1)


[Default mechanism] Yes.
[Customizability] No.

[Value transitivity] Lazy.
[Type transitivity] Eager.
[Type safety] Failure. (2)


[Variables] N/A.
[Variable transitivity] N/A.

[Preserves sharing] Yes.
[Resources] No.
[Base functions] Library reference (safe). (3)
[Composed functions] Graph.
[Implementation interoperability] N/A.
[Platform interoperability] No.
[Foreign language import] None.
[Foreign language export] None.
(1) Clean's marshalling system is restricted to Windows.
(2) Type equivalence is determined using structural equality.
(3) Safety can be outmaneuvered by deleting files without telling the
system.
------------------------------

Regards,
Jo

Ronny Wichers Schreur

26 Nov 2003, 07:55:09
Joachim Durchholz writes (to comp.lang.functional):

> Both for marshalling and unmarshalling? (I have seen that
> unmarshalling is lazy, but I didn't read carefully enough to
> determine whether this also holds when marshalling.)

On the level of marshalling there's no laziness. The whole graph is
marshalled in one go. However, the graph may contain closures and
the marshalling itself doesn't force any evaluation.

> Do you mean types are identified via structural equivalence, not
> by name?

Not exactly. Perhaps an example will make things clear. You can
marshall the identity function with type A.a: a -> a (A. is the
universal quantifier) and unmarshall it with type Int -> Int.

Two type definitions are equivalent if they have the same structure
(modulo the order of the constructors) and use the same type and
constructor names. So the following types are equivalent:

:: List a = Nil | Cons a (List a)
:: List b = Cons b (List b) | Nil

We plan to remove the freedom to reorder the constructors, which
would leave us with equality modulo alpha conversion.


Ronny Wichers Schreur

Andreas Rossberg

26 Nov 2003, 08:26:17
Joachim Durchholz <joachim....@web.de> wrote:
> "Marshalling" (aka "serialization") is the process of converting data
> into an untyped format;

I'd rather say that marshalling (also known as pickling, btw) is the process
of turning (parts of) the heap's data graph into a linearized form. This may
well be as typed or untyped as the program heap itself.

> [Customizability] Ability to write a marshalling wrapper for any
> user-defined data type.
> This ability is useful in various situations, e.g. to avoid marshalling
> across weak pointers, to marshall mutable data, to prevent the
> marshalling data that would be irrelevant or even dangerous when
> unmarshalled, to avoid marshalling data where recomputing is faster than
> marshalling/transmission/unmarshalling, to normalize internal
> representations before marshalling to avoid redundant transmissions
> and/or improve compression characteristics of the byte stream, etc.
> *Decomposed: The marshalling process can be influenced by setting flags
> at the appropriate places. (E.g. references marked as "marshalling-weak"
> break transitivity, there's a hook for representation normalization to
> improve compressability and/or avoid retransmissions, etc.).
> *Monolithic: There's a hook for writing a marshalling wrapper for each
> user-defined type.

Note that "decomposed" and "monolithic" (not sure I'm happy with this
terminology) are not mutually exclusive.

> [Value transitivity] Accessing immutable data through an unmarshalled
> object isn't syntactically different from accessing immutable data via a
> locally-constructed object.
> *Eager: Unmarshalled are immediately transitively accessible.
> *Lazy: Unmarshalled data may be incomplete, but missing data is
> transparently requested from the sources when it's accessed. This is
> semantically equivalent to "Eager", except that failure modes and
> efficiency may differ.

And it prevents persistence!

> [Resources] The mechanism can marshall resources, i.e. things that are
> bound to a process or marchine. By nature, any access to the resource
> will have to refer back to the process resp. machine where the
> marshalled data structure originated from.
> Marshalling seems to be reasonable while the marshalling process resp.
> machine continues to operate. However, even then, access via the
> marshalled proxy will have to take possible resource unavailability into
> account (something that might happen without marshalling as well, but
> marshalling generally increases the likelihood of unavailability).
> This applies only if the resource can be accessed through the
> unmarshalled resource as if it were local. The ability to marshall a URL
> in string form is already covered under [Default mechanism] :-).
> *Yes: Resources can be marshalled.
> *No: Resources cannot be marshalled.

This is *way* too simplistic. If resources are not marshallable, are they
detected and rejected, or are they just dumped binary, probably causing
havoc upon unmarshalling? If there is support for resources, can resources
only be reused in the same process or can they be rebound if read back into
a different process? In the latter case, is there a way to prevent rebinding
of resources upon unmarshalling (which is very important for security)? Are
resources detected (and rejected) upon unmarshalling that have no meaning in
the current process?

Actually, a language that allows marshalling of resources "by accident"
without proper treatment is *much* worse than one that safely rejects any
attempt to marshall resources. I think you should at least have the
categories "undetected" (unsafe), "rejected" (safe), "rebound"
(transferable) instead of yes/no to cover the most important issues.

> [Composed functions] Marshalling of functions that were constructed at
> run-time and that don't have a direct equivalent in source code (in
> other words: it's impossible to construct even an implicit name during
> compilation).

Sorry, but I still insist that this is nonsense. There are no such
functions. Take function composition, for example:

compose f g x = f (g x)

You are saying there is no source code for (compose sin sqrt). But sure
there is! Desugar the definition of compose:

compose = \f.\g.\x.f (g x)

and you see that the source code in question is \x.f (g x), but you need a
closure to represent the actual function, of course. An outer lambda is no
different from an outer let! And this goes for any kind of function
"constructed" by composition, partial application (currying), or whatsoever.

So the only sane distinction you can make is between support for "top-level"
functions only (those that don't contain references to their environment),
and support for full closures. But I really don't see why this is worth an
extra category. After all, a functional language incapable of dealing with
closures is not worth the name. And with respect to marshalling, the closure
is the easy part.
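
The same point in OCaml, for illustration:

  let compose f g x = f (g x)
  (* desugared: let compose = fun f -> fun g -> fun x -> f (g x) *)

  (* h is a closure: its code is \x.f (g x), its environment binds
     f = sin and g = sqrt - the source code clearly exists. *)
  let h = compose sin sqrt
  let () = print_float (h 4.0)   (* sin (sqrt 4.0) = sin 2.0 *)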

> *Graph: These objects are represented as trees (directed acyclic graphs
> in the presence of sharing, general directed graphs in the presence of
> cycles).

To be honest, I have no idea what you mean here.

> [Platform interoperability] Ability to transmit data between different
> OSes or architectures.
> Note: Floating-point issues are nearly impossible to make truly
> machine-independent without sacrificing a large degree efficiency, do
> I'd expect lots of "Yes (except floating-point data)" answers ;-)

Why do you think so? Choose one format, IEEE or whatever, and be done with
it. We are talking about an external format here, it should have nothing to
do with the internal FP format used on particular platforms - marshalling
has to perform conversions anyway.
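
A sketch of that approach in OCaml, assuming IEEE 754 doubles
(big-endian) as the external format:

  (* Convert to/from the fixed external format, whatever the native
     floating-point representation happens to be. *)
  let marshal_float (f : float) : string =
    let bits = Int64.bits_of_float f in
    String.init 8 (fun i ->
      Char.chr (Int64.to_int (Int64.logand 255L
        (Int64.shift_right_logical bits ((7 - i) * 8)))))

  let unmarshal_float (s : string) : float =
    let bits = ref 0L in
    String.iter (fun c ->
      bits := Int64.logor (Int64.shift_left !bits 8)
                (Int64.of_int (Char.code c))) s;
    Int64.float_of_bits !bits

On an IEEE machine this costs almost nothing; elsewhere it is a real
conversion.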

> Haskell
> [Default mechanism] Per-type.
> [Customizability] Decomposed? Monolithic?
> [Value transitivity] Eager.
> [Type transitivity] No.
> [Type safety] Failure.
> [Variables] N/A.
> [Variable transitivity] N/A.
> [Preserves sharing] No.
> [Resources] No.
> [Base functions] No.
> [Composed functions] No.
> [Implementation interoperability] Yes.
> [Platform interoperability] Yes. (Except floats?)
> [Foreign language import] Which languages, if any?
> [Foreign language export] Which languages, if any?

Haskell *has* variables, they just have to live inside an appropriate monad.
In principle, it may even be possible to marshall them under some
restrictions. The same goes for Clean, I guess.

> OCaml
> [Default mechanism] Yes.
> [Customizability] Monolithic (only for types implemented in C).
> [Value transitivity] Eager? Lazy?

Eager.

> [Type transitivity] Eager? Lazy? Wrappers? N/A? No?

No.

> [Type safety] Conversion? Failure? Crash? N/A?

Crash.

> [Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
> [Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
> [Preserves sharing] Yes.
> [Resources] Yes? No?
> [Base functions] Address (unsafe).
> [Composed functions] Graph? Code?
> [Implementation interoperability] N/A.
> [Platform interoperability] Yes (what about floats?)
> [Foreign language import] Which languages, if any?
> [Foreign language export] Which languages, if any?


> Oz
> [Default mechanism] Yes.
> [Customizability] Monolithic (via C++ FFI, difficult).
> [Value transitivity] Eager? Lazy?

Eager.

> [Type transitivity] Eager? Lazy?

N/A.

> [Type safety] N/A.
> [Variables] Clone? Proxy? Relocate? Relocate with proxy?

Proxy, Clone.

> [Variable transitivity] Eager? Lazy? Wrappers? No?

Eager.

> [Preserves sharing] Yes.
> [Resources] Yes? No?

No (rejected).

> [Base functions] Code? Library reference (safe)?

Code.

> [Composed functions] Graph? Code?

Code.

> [Implementation interoperability] N/A?

N/A.

> [Platform interoperability] Yes (what about floats?)

Full yes.

> [Foreign language import] Which languages, if any?

Oz, Alice.

> [Foreign language export] Oz (other languages?)

Oz, Alice.

> SML (Alice)
> [Default mechanism] Yes? Per-type?

Yes.

> [Customizability] Monolithic (currently via C++ FFI (difficult),
> library support planned).
> [Value transitivity] Eager? Lazy?

Eager.

> [Type transitivity] Eager? Lazy? Wrappers?

Eager.

> [Type safety] Conversion? Failure? Crash?

Failure.

> [Variables] No? Clone? Proxy? Relocate? Relocate with proxy?

Clone.

> [Variable transitivity] N/A? Eager? Lazy? Wrappers? No?

Eager.

> [Preserves sharing] Yes.
> [Resources] Yes? No?

No (rejected).

> [Base functions] Code? Library reference (safe)?

Code.

> [Composed functions] Graph? Code?

Code.

> [Implementation interoperability] N/A?

N/A

> [Platform interoperability] Yes (what about floats?)

Full yes.

> [Foreign language import] Oz. (Other languages?)

Alice, Oz.

> [Foreign language export] Which languages, if any?

Alice, Oz.


> SML (SML/NJ)
> [Default mechanism] Yes? Per-type?

Yes.

> [Customizability] Decomposed? Monolithic? No?

No.

> [Value transitivity] Eager? Lazy?

Eager.

> [Type transitivity] Eager? Lazy? Wrappers? N/A? No?

Eager.

> [Type safety] Conversion? Failure? Crash? N/A?

Crash.

> [Variables] N/A? No? Clone? Proxy? Relocate? Relocate with proxy?
> [Variable transitivity] N/A? Eager? Lazy? Wrappers? No?
> [Preserves sharing] Yes.
> [Resources] Yes? No?

No (undetected).

Marcin 'Qrczak' Kowalczyk

26 Nov 2003, 08:57:08
On Wed, 26 Nov 2003 14:26:17 +0100, Andreas Rossberg wrote:

> So the only sane distinction you can make is between support for "top-level"
> functions only (those that don't contain references to their environment),
> and support for full closures. But I really don't see why this is worth an
> extra category. After all, a functional language incapable of dealing with
> closures is not worth the name. And with respect to marshalling, the closure
> is the easy part.

I disagree. It's easy to imagine the case when toplevel named functions
are marshalled by name, and anonymous (or local) functions can't be
marshalled at all.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Fergus Henderson

26 Nov 2003, 09:47:10
"Andreas Rossberg" <ross...@ps.uni-sb.de> writes:

>Joachim Durchholz <joachim....@web.de> wrote:
>> [Composed functions] Marshalling of functions that were constructed at
>> run-time and that don't have a direct equivalent in source code (in
>> other words: it's impossible to construct even an implicit name during
>> compilation).
>
>Sorry, but I still insist that this is nonsense. There are no such
>functions. [...]

>the only sane distinction you can make is between support for "top-level"
>functions only (those that don't contain references to their environment),
>and support for full closures. But I really don't see why this is worth an
>extra category. After all, a functional language incapable of dealing with
>closures is not worth the name. And with respect to marshalling, the closure
>is the easy part.

Not necessarily. Closures are a form of existential typing: knowing
what that the type of a closure is does not tell you what the types of
the components of that closure are.

For example, suppose you have a closure of type Int->Int whose value is
represented as the pair {f,x}, meaning (\y -> f x y).
This closure has been formed by partially applying a function
f of type foo->Int->Int, for some foo, and its representation consists
of a pair: a reference to the function f, and a value x of type foo.

It may be easy to marshall and unmarshall f, but if you don't have the
appropriate RTTI information around at run-time, it may not be possible
to determine the type of the partially applied argument x. This implies
that it may not be possible to marshal or unmarshal x, because it may not
be possible to marshal or unmarshal values without knowing their types.

Some language implementations may prefer not to keep such RTTI information
around at run-time, since there is a (small) cost to doing so, and the cost
might be incurred even in cases where the information isn't actually used.
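
Sketched in OCaml with an existential type:

  (* A closure of type int -> int formed by partial application;
     the type of the captured value x is hidden. *)
  type int_fn = Closure : ('a -> int -> int) * 'a -> int_fn

  let apply c y =
    match c with
    | Closure (f, x) -> f x y

  let c = Closure ((fun s y -> String.length s + y), "hello")
  let () = print_int (apply c 2)   (* prints 7 *)

To marshal c one must marshal the captured "hello", but its type is
existentially quantified - without RTTI there is no way to know how to
encode it.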

Andreas Rossberg

26 Nov 2003, 12:19:39
Fergus Henderson <f...@cs.mu.oz.au> wrote:
>
> >the only sane distinction you can make is between support for "top-level"
> >functions only (those that don't contain references to their environment),
> >and support for full closures. But I really don't see why this is worth an
> >extra category. After all, a functional language incapable of dealing with
> >closures is not worth the name. And with respect to marshalling, the
> >closure is the easy part.
>
> Not necessarily. Closures are a form of existential typing: knowing
> what that the type of a closure is does not tell you what the types of
> the components of that closure are.

I know, but wouldn't you agree that coping with that problem is easier than
the code externalisation issue? In particular, since full-fledged
marshalling mechanisms have to deal with it in other places already.

- Andreas

Andreas Rossberg

26 Nov 2003, 12:29:07
"Marcin 'Qrczak' Kowalczyk" <qrc...@knm.org.pl> wrote:
>
> > So the only sane distinction you can make is between support for
> > "top-level" functions only (those that don't contain references to
> > their environment), and support for full closures. But I really don't
> > see why this is worth an extra category. After all, a functional
> > language incapable of dealing with closures is not worth the name.
> > And with respect to marshalling, the closure is the easy part.
>
> I disagree. It's easy to imagine the case when toplevel named functions
> are marshalled by name, and anonymous (or local) functions can't be
> marshalled at all.

It may be easy to imagine, but I'd consider it a rather poor and unnatural
strategy for a functional language, so I doubt you will find something like
it in practice (note that Joachim explicitly restricted the scope of his
survey to functional languages).

In any case, I maintain that it is a different distinction than what Joachim
describes, and is rather strange to have as a category on its own.

- Andreas

Joachim Durchholz

26 Nov 2003, 13:33:08
Fergus Henderson wrote:
> A customizable marshaller can be written in less than 20 lines of code:
>
> [snipped]

Impressive :-)

> This interoperates with the standard unmarshaller. Writing a customizable
> unmarshaller is more work -- about 200-300 lines of code -- but is still quite
> feasible.

I have changed the remark to:

(1) This describes marshalling as provided by the standard library.

Writing a customizable marshaller/unmarshaller on top of that would take
an estimated 350-450 lines of code.

I hope that's appropriate.

Regards,
Jo

Joachim Durchholz

26 Nov 2003, 13:42:17
Ronny Wichers Schreur wrote:

> Joachim Durchholz writes (to comp.lang.functional):
>
>> Both for marshalling and unmarshalling? (I have seen that
>> unmarshalling is lazy, but I didn't read carefully enough to
>> determine whether this also holds when marshalling.)
>
> On the level of marshalling there's no laziness. The whole graph is
> marshalled in one go. However, the graph may contain closures and the
> marshalling itself doesn't force any evaluation.

OK, that makes it "value eager" (the definition is about primitive
types, not about unevaluated expressions - that's the domain of the
"composed functions" category).

>> Do you mean types are identified via structural equivalence, not by
>> name?
>
> Not exactly. Perhaps an example will make things clear. You can
> marshall the identity function with type A.a: a -> a (A. is the
> universal quantor) and unmarshall it with type Int -> Int.

OK - then you can specialize when unmarshalling.
Though I'd think that isn't anything especially noteworthy - you can
always get the same effect by unmarshalling with the most general type,
then wrapping it in a helper function that has the restricted type.

> Two type definitions are equivalent if they have the same structure
> (modulo the order of the constructors) and use the same type and
> constructor names. So the following types are equivalent:
>
> :: List a = Nil | Cons a (List a)
> :: List b = Cons b (List b) | Nil

Would the latter be equivalent to
:: Foo a = Cons a (List a) | Nil
, or must the type name match as well?

> We plan to remove the freedom to reorder the constructors,

Hmm... what reasons are behind that decision?
(This question primarily motivated by personal curiosity.)

Regards,
Jo

Joachim Durchholz

26 Nov 2003, 14:37:06
Andreas Rossberg wrote:
> "Marcin 'Qrczak' Kowalczyk" <qrc...@knm.org.pl> wrote:
>>I disagree. It's easy to imagine the case when toplevel named functions
>>are marshalled by name, and anonymous (or local) functions can't be
>>marshalled at all.
>
> It may be easy to imagine, but I'd consider it a rather poor and unnatural
> strategy for a functional language,

I agree. However, there are many borderline cases: languages that
started out as non-functional ones and acquired functional idioms over
time (Erlang). For these languages, it's quite reasonable to have one
and not the other.

> so I doubt you will find something like
> it in practice (note that Joachim explicitly restricted the scope of his
> survey to functional languages).

Well, Erlang has been "in practice" for quite a while, and it didn't
have closures in the first place for a long time. (Things have changed,
but there are still efficiency issues with them. After all, Erlang's
mission statement is "distributed systems", not "functional
programming", so it's perfectly reasonable to be, well,
largely-functional yet not have closures.)

> In any case, I maintain that it is a different distinction than what Joachim
> describes,

No, Erlang was exactly the reason why I introduced this distinction in
the first place.

> and is rather strange to have as a category on its own.

The strangeness lives in the programming languages themselves, and the
Survey is forced to follow up with it.
I agree it's strange, as the world of programming languages itself is.
Actually I consider any "Crash" answer just as strange as not having
closures - one of the things that make functional languages attractive
is improved reliability, yet unmarshalling may crash the entire
program... and this doesn't affect the obscure niche languages, it
plagues respected and otherwise very respectable languages.
In other words, strangeness is very much in the eyes of the beholder,
and the best that a Survey can do is to highlight the issues and hope
that prospective users will nag the implementors about the dysfunctional
aspects of the language.

Regards,
Jo

Joachim Durchholz

26 Nov 2003, 14:46:02
Andreas Rossberg wrote:

> Joachim Durchholz <joachim....@web.de> wrote:
>
>> "Marshalling" (aka "serialization") is the process of converting
>> data into an untyped format;
>
> I'd rather say that marshalling (also known as pickling, btw)

Thanks for reminding me of that terminology.
Included.

> is the process of turning (parts of) the heap's data graph into a
> linearized form. This may well as typed or untyped as the program
> heap itself may be.

A fully precise definition would be "untyped (or with a type that's
unrelated to the pickled objects)", but I felt that this level of
precision would be over-verbose.
YMMV :-)

>> [Customizability] Ability to write a marshalling wrapper for any
>> user-defined data type. This ability is useful in various
>> situations, e.g. to avoid marshalling across weak pointers, to
>> marshall mutable data, to prevent the marshalling data that would
>> be irrelevant or even dangerous when unmarshalled, to avoid
>> marshalling data where recomputing is faster than
>> marshalling/transmission/unmarshalling, to normalize internal
>> representations before marshalling to avoid redundant transmissions
>> and/or improve compression characteristics of the byte stream,
>> etc.
>> *Decomposed: The marshalling process can be influenced by setting
>> flags at the appropriate places. (E.g. references marked as
>> "marshalling-weak" break transitivity, there's a hook for
>> representation normalization to improve compressability and/or
>> avoid retransmissions, etc.).
>> *Monolithic: There's a hook for writing a marshalling wrapper for
>> each user-defined type.
>
> Note that "decomposed" and "monolithic" (not sure I'm happy with this
> terminology)

Neither am I.
One of the things that delayed publication of this revision was trying
to find a better term... "decomposed" started life as "declarative", but
it captured the wrong aspect of the distinction.
A better terminology would be highly appreciated. However, as you write
yourself,

> [they] are not mutually exclusive.

so I fear that this category isn't final anyway.
Unfortunately, the best way I see to refine these categories is to
review the various languages (which, since I'm no expert at all these
languages, means relying on the patience of language experts to answer
variations of the same questions again and again - I hope you all will
bear with me...)

>> [Value transitivity] Accessing immutable data through an
>> unmarshalled object isn't syntactically different from accessing
>> immutable data via a locally-constructed object. *Eager:
>> Unmarshalled data are immediately transitively accessible. *Lazy:
>> Unmarshalled data may be incomplete, but missing data is
>> transparently requested from the sources when it's accessed. This
>> is semantically equivalent to "Eager", except that failure modes
>> and efficiency may differ.
>
> And it prevents persistence!

How that?
(Actually I don't know whether you refer to value transitivity itself or
to its "Lazy" variant, so I'm doubly confused.)

>> [Resources] The mechanism can marshall resources, i.e. things that
>> are bound to a process or machine. By nature, any access to the
>> resource will have to refer back to the process resp. machine where
>> the marshalled data structure originated from. Marshalling seems to
>> be reasonable while the marshalling process resp. machine continues
>> to operate. However, even then, access via the marshalled proxy
>> will have to take possible resource unavailability into account
>> (something that might happen without marshalling as well, but
>> marshalling generally increases the likelihood of unavailability).
>> This applies only if the resource can be accessed through the
>> unmarshalled resource as if it were local. The ability to marshall
>> a URL in string form is already covered under [Default mechanism]
>> :-).
>> *Yes: Resources can be marshalled.
>> *No: Resources cannot be marshalled.
>
>
> This is *way* too simplistic. If resources are not marshallable, are
> they detected and rejected, or are they just dumped as raw binary, probably
> causing havoc upon unmarshalling?

It might be a good idea to add a safe/unsafe flag.

> If there is support for resources, can resources only be reused in
> the same process or can they be rebound if read back into a different
> process?

Good point.

> In the latter case, is there a way to prevent rebinding of resources
> upon unmarshalling (which is very important for security)?

Unmarshalling opens security issues anyway: marshalled data is prone to
eavesdropping, and marshalling some data might inadvertently marshall
auxiliary data structures that were never intended for transmission. (An
example comes to mind: there was a case of a Word document that
contained an embedded Excel sheet with prices; the receiver opened that
sheet out of curiosity and found that it contained all the pricing
calculations, which gave him a /lot/ of leeway for the next round of
price negotiations. Or the case of Word documents which contained
invisible derisive comments on customers, which were reactivated by said
customers and made a lot of bad press for the issuing company...)

In other words, I think that security issues with resources are the same
as security issues that arise with any form of transitivity: you might
inadvertently transmit information that you didn't intend to transmit.
Which makes security orthogonal to other aspects.

I'd like to add information on security to the Survey, but I don't know
enough about the issue to validate information (or even to give good
categories), so I have been leaving it out.

> Are resources detected (and rejected) upon unmarshalling that have no
> meaning in the current process?

"Resource unmarshalling" was carefully worded to mean the resource on
the original machine/process. In other words, unmarshalling a resource
will at best return an object that routes all requests back to the
original machine/process.

> Actually, a language that allows marshalling of resources "by
> accident" without proper treatment is *much* worse than one that
> safely rejects any attempt to marshall resources. I think you should
> at least have the categories "undetected" (unsafe), "rejected"
> (safe), "rebound" (transferable) instead of yes/no to cover the most
> important issues.

I believe that my remarks make the above paragraph moot, but
I'm willing to improve my understanding if you disagree :-)

>> [Base functions] How functions with a direct representation in

>> source code are marshalled. [...]


>>
>> [Composed functions] Marshalling of functions that were constructed
>> at run-time and that don't have a direct equivalent in source code
>> (in other words: it's impossible to construct even an implicit name
>> during compilation).
>
> Sorry, but I still insist that this is nonsense. There are no such
> functions. Take function composition, for example:
>
> compose f g x = f (g x)
>
> You are saying there is no source code for (compose sin sqrt). But
> sure there is! Desugar the definition of compose:
>
> compose = \f.\g.\x.f (g x)
>
> and you see that the source code in question is \x.f (g x), but you
> need a closure to represent the actual function, of course. An outer
> lambda is no different from an outer let! And this goes for any kind
> of function "constructed" by composition, partial application
> (currying), or whatsoever.

There is still no /direct/ source code for it.
It doesn't matter that source code can be constructed, even
automatically - sure it can, for any closure.

However, this criterion is what I call a "secondary definition". I
didn't find a good way to express what I really wanted to (namely
whether a language can marshall/unmarshall closures and similar beasts),
so I took the "direct representation in source code" bit.
It's flaky, and I know it. I'll appreciate any better definition that
will be understood by all FPL people (including those with a Lisp or
Erlang mindset - it's always the borderline cases where things get
complicated, but it's also the borderline cases that are interesting
since these tend to inject their ideas in the mainstream after a while).
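
To make the borderline concrete, here is the disputed example in Haskell
syntax - just a sketch, but it shows why I say there is no /direct/
source code for h:

  compose :: (b -> c) -> (a -> b) -> a -> c
  compose f g x = f (g x)

  -- h has no definition of its own in the source; at run time it only
  -- exists as a closure over the code of \x -> f (g x), with f = sin
  -- and g = sqrt.
  h :: Double -> Double
  h = compose sin sqrt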

>> *Graph: These objects are represented as trees (directed acyclic
>> graphs in the presence of sharing, general directed graphs in the
>> presence of cycles).
>
> To be honest, I have no idea what you mean here.

The implementation could represent a closure as an AST-plus-bindings, or
it could compile it to byte code (or whatever) and transmit /that/.
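
A sketch of the two representations, with placeholder types rather than
any real runtime's layout:

  data Expr = Var String | Lam String Expr | App Expr Expr

  -- "Graph": the closure carries an AST plus the bindings of its free
  -- variables. "Code": it carries an address into compiled code instead.
  data Closure = GraphClosure Expr [(String, Closure)]
               | CodeClosure  Int  [(String, Closure)]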

Actually, I think it was a mistake to distinguish between "Code" and
"Graph". Different implementations of the language might use different
implementations, and that wouldn't warrant multiple entries in the Survey.
Or, in other words, the distinction in itself isn't very interesting to
an application designer, only its consequences (mainly interoperability
issues with a modicum of efficiency concerns) - and interoperability is
already covered elsewhere, efficiency isn't within the scope of the Survey.

>> [Platform interoperability] Ability to transmit data between
>> different OSes or architectures. Note: Floating-point issues are
>> nearly impossible to make truly machine-independent without
>> sacrificing a large degree of efficiency, so I'd expect lots of "Yes
>> (except floating-point data)" answers ;-)
>
> Why do you think so? Choose one format, IEEE or whatever, and be done
> with it. We are talking about an external format here, it should have
> nothing to do with the internal FP format used on particular
> platforms - marshalling has to perform conversions anyway.

Some conversions are lossy, others are lossless.
Converting between IEEE and, say, some IBM mainframe formats is actually
lossy, in /both/ directions: IBM hex floats carry up to 56 fraction bits
against IEEE's 53, while IEEE's exponent range far exceeds IBM's, so
each direction can lose information.

>> Haskell [Default mechanism] Per-type. [Customizability] Decomposed?
>> Monolithic? [Value transitivity] Eager. [Type transitivity] No.
>> [Type safety] Failure. [Variables] N/A. [Variable transitivity]
>> N/A. [Preserves sharing] No. [Resources] No. [Base functions] No.
>> [Composed functions] No. [Implementation interoperability] Yes.
>> [Platform interoperability] Yes. (Except floats?) [Foreign language
>> import] Which languages, if any? [Foreign language export] Which
>> languages, if any?
>
> Haskell *has* variables, they just have to live inside an appropriate
> monad.

No, they are not variables. They are just names that get rebound on each
iteration.
Assuming that the names inside a state monad are variables is a nice
abstraction, but this abstracts away some essential properties that make
a difference particularly if you start copying the thing around.

Actually, the only place where Haskell really manipulates variables is
within the IO monad, which is essentially a way of constructing "action
lists". The actions aren't executed within the program, they are
interpreted by the run-time system. Or one could say that IO monad
values are functions that map input stimuli to action lists (and the
actions, when executed, might create new input stimuli, say the setting
of a variable) - but I'm too hazy on the details to say such things with
any degree of certainty.

> In principle, it may even be possible to marshall them under some
> restrictions.

The monads themselves are trivial to marshall.
Whether they mean anything interesting is a quite different question; it
depends on how the monad was coded, I guess.
But I don't know enough about Haskell to say for sure.

> The same goes for Clean, I guess.

>> OCaml
> [...]
>> Oz
> [...]
>> SML (Alice)
> [...]

Thanks, this has been added.

>> SML (SML/NJ) [Resources] Yes? No?
>
> No (undetected).

I'll make this a "No (unsafe)", meaning "some representation is
marshalled and unmarshalled, but it will usually not make any sense
after unmarshalling".

Does this appropriately describe the situation in SML/NJ?

Regards,
Jo

Fergus Henderson

da leggere,
26 nov 2003, 21:53:2526/11/03
a
"Andreas Rossberg" <ross...@ps.uni-sb.de> writes:

>Fergus Henderson <f...@cs.mu.oz.au> wrote:
>>
>> Closures are a form of existential typing: knowing
>> what the type of a closure is does not tell you what the types of
>> the components of that closure are.
>
>I know, but wouldn't you agree that coping with that problem is easier than
>the code externalisation issue?

Easier than a _good_ solution to the code externalisation issue, yes,
but not easier than just writing out the code address, for example.

Feuer

da leggere,
26 nov 2003, 23:50:4926/11/03
a

Joachim Durchholz wrote:

> > Haskell *has* variables, they just have to live inside an appropriate
> > monad.
>
> No, they are not variables. They are just names that get rebound on each
> iteration.

Which in Haskell are called variables.

> Actually, the only place where Haskell really manipulates variables is
> within the IO monad, which is essentially a way of constructing "action
> lists". The actions aren't executed within the program, they are
> interpreted by the run-time system. Or one could say that IO monad
> values are functions that map input stimuli to action lists (and the
> actions, when executed, might create new input stimuli, say the setting
> of a variable) - but I'm too hazy on the details to say such things with
> any degree of certainty.

Yeah. Including manipulating reference cells, which are what you call
variables. This is not Standard Haskell, but GHC supports it and
I think others do too. (And you should note that most Haskell users
use GHC.)
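
For the record, a minimal example of such a reference cell, using GHC's
Data.IORef:

  import Data.IORef

  main :: IO ()
  main = do
    r <- newIORef (0 :: Int)   -- allocate a mutable cell
    writeIORef r 42            -- update it in place
    v <- readIORef r
    print v                    -- prints 42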

> The monads themselves are trivial to marshall.
> Whether they mean anything interesting is a quite different question; it
> depends on how the monad was coded, I guess.
> But I don't know enough about Haskell to say for sure.

You don't marshall the monad. You marshall the ref cell. If you
want to do something like that, which nobody has AFAIK.

DAvid

Andreas Rossberg

da leggere,
27 nov 2003, 08:02:1927/11/03
a
Joachim Durchholz wrote:
>
> A fully precise definition would be "untyped (or with a type that's
> unrelated to the pickled objects)",

Well, maybe we talk about different meanings of "typed" here. In a
"dynamically typed" language the heap is "typed" in the sense of dynamic
typing. The same goes for pickles, consequently.

I would simply leave out any reference to typing in the definition. The main
point is, it's linearization of the heap's graph structure.

>> Note that "decomposed" and "monolithic" (not sure I'm happy with this
>> terminology)
>
> Neither me.

How about "modal" and "extensible"?

>>> [Value transitivity] Accessing immutable data through an
>>> unmarshalled object isn't syntactically different from accessing
>>> immutable data via a locally-constructed object. *Eager:
>>> Unmarshalled data are immediately transitively accessible. *Lazy:
>>> Unmarshalled data may be incomplete, but missing data is
>>> transparently requested from the sources when it's accessed. This
>>> is semantically equivalent to "Eager", except that failure modes
>>> and efficiency may differ.
>>
>> And it prevents persistence!
>
> How that?
> (Actually I don't know whether you refer to value transitivity itself or
> to its "Lazy" variant, so I'm doubly confused.)

I referred to laziness. If data is dumped into a file or database, and the
original process terminates, then another process can no longer acquire
those "lazy" parts. Consequently, persistence enforces eager marshalling
(or comes with failure modes that render it essentially useless).

>> In the latter case, is there a way to prevent rebinding of resources
>> upon unmarshalling (which is very important for security)?
>
> Unmarshalling opens security issues anyway: marshalled data is prone to
> eavesdropping, and marshalling some data might inadvertently marshall
> auxiliary data structures that were never intended for transmission.

Right, but that is a completely different aspect of security, on a different
level. What I described is important for sandboxing.

> "Resource unmarshalling" was carefully worded to mean the resource on
> the original machine/process. In other words, unmarshalling a resource
> will at best return an object that routes all requests back to the
> original machine/process.
>
>> Actually, a language that allows marshalling of resources "by
>> accident" without proper treatment is *much* worse than one that
>> safely rejects any attempt to marshall resources. I think you should
>> at least have the categories "undetected" (unsafe), "rejected"
>> (safe), "rebound" (transferable) instead of yes/no to cover the most
>> important issues.
>
> I believe that my remarks make the above paragraph moot, but
> I'm willing to improve my understanding if you disagree :-)

Well, I can live with your suggestion to at least cover the safety aspect (I
think a lot of implementations simply ignore the issue of resources and
hence allow marshalling them by accident, with undefined, unsafe semantics,
which is very bad).

>> You are saying there is no source code for (compose sin sqrt). But
>> sure there is! Desugar the definition of compose:
>>
>> compose = \f.\g.\x.f (g x)
>>
>> and you see that the source code in question is \x.f (g x), but you
>> need a closure to represent the actual function, of course. An outer
>> lambda is no different from an outer let! And this goes for any kind
>> of function "constructed" by composition, partial application
>> (currying), or whatsoever.
>
> There is still no /direct/ source code for it.

I don't follow. How more "direct" can it get?

> It doesn't matter that source code can be constructed, even
> automatically - sure there can, for any closure.

But it isn't "constructed", it is simply there. Any function is either
primitive, or stems from some lambda expression in your source code (plus
the closure environment). There is no other way. It's as simple as that. At
runtime, there is nothing besides closure construction!

> It's flaky, and I know it. I'll appreciate any better definition that
> will be understood by all FPL people

I'd just make the "Functions" entry distinguish between full support for
closures, or "autonomous" functions only (those with no free variables).

>>> *Graph: These objects are represented as trees (directed acyclic
>>> graphs in the presence of sharing, general directed graphs in the
>>> presence of cycles).
>>
>> To be honest, I have no idea what you mean here.
>
> The implementation could represent a closure as an AST-plus-bindings, or
> it could compile it to byte code (or whatever) and transmit /that/.

No, no, where would that AST come from? Marshalling just dumps the data
structures from the heap. A function closure is an ordinary data structure
on that heap and certainly contains no AST, only a code pointer (well, code
itself might actually be designed as an AST, but you certainly didn't mean
that).

Sorry, but somehow I feel that there is a fundamental misunderstanding on
your side about how 1st-class functions are represented and constructed at
runtime.

>> Haskell *has* variables, they just have to live inside an appropriate
>> monad.
>
> No, they are not variables.

You mention the IO monad. IMO it provides true variables. One might imagine
a marshalling operation in the IO monad that can cope with IORefs. Of
> course, you'd have to prevent its pickles from escaping that monad, but I don't
see why that should be impossible. If you have some form of monadic
concurrency it may well be useful.

>>> SML (SML/NJ) [Resources] Yes? No?
>>
>> No (undetected).
>
> I'll make this a "No (unsafe)", meaning "some representation is
> marshalled and unmarshalled, but it will usually not make any sense
> after unmarshalling".
>
> Does this appropriately describe the situation in SML/NJ?

Yes, I think so (somebody please correct me if I'm wrong).

Cheers,

- Andreas

--
Andreas Rossberg, ross...@ps.uni-sb.de

"Computer games don't affect kids; I mean if Pac Man affected us
as kids, we would all be running around in darkened rooms, munching
magic pills, and listening to repetitive electronic music."
- Kristian Wilson, Nintendo Inc.

Vincenzo aka Nick Name

da leggere,
27 nov 2003, 11:12:5627/11/03
a
Joachim Durchholz wrote:

> Assuming that the names inside a state monad are variables is a nice
> abstraction

I guess the OP was talking about variables in the IO monad, which are
true references to mutable cells, aka variables.

V.

Joachim Durchholz

da leggere,
28 nov 2003, 09:42:1028/11/03
a
Feuer wrote:
>
> Joachim Durchholz wrote:
>
>>>Haskell *has* variables, they just have to live inside an appropriate
>>>monad.
>>
>>No, they are not variables. They are just names that get rebound on each
>>iteration.
>
> Which in Haskell are called variables.
>
>>Actually, the only place where Haskell really manipulates variables is
>>within the IO monad, which is essentially a way of constructing "action
>>lists". The actions aren't executed within the program, they are
>>interpreted by the run-time system. Or one could say that IO monad
>>values are functions that map input stimuli to action lists (and the
>>actions, when executed, might create new input stimuli, say the setting
>>of a variable) - but I'm too hazy on the details to say such things with
>>any degree of certainty.
>
> Yeah. Including manipulating reference cells, which are what you call
> variables.

Does Haskell indeed have reference cells? It's being advertised as
"pure", so there is simply no place where a reference cell could exist.
(Except within the IO monad. Or some other "magical" monad that, like
IO, derives its semantics from "outside the language", i.e. the language
just defines a description of things to do and throws that list over the
fence to the run-time system).

> This is not Standard Haskell, but GHC supports it and
> I think others do too. (And you should note that most Haskell users
> use GHC.)

I know - count me into that crowd (though I could hardly call myself a
Haskell user).

>>The monads themselves are trivial to marshall.
>>Whether they mean anything interesting is a quite different question; it
>>depends on how the monad was coded, I guess.
>>But I don't know enough about Haskell to say for sure.
>
> You don't marshall the monad. You marshall the ref cell. If you
> want to do something like that, which nobody has AFAIK.

There are uses for marshalling a ref cell.
To maintain the semantics of the ref cell, unmarshalling a ref cell must
result in a proxy that routes all requests back to the original ref
cell. If unmarshalling from a file, any updates to the ref cell would
have to go back to that file, for example. (Mutable data is inefficient,
as we all know *g*)
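
A minimal sketch of such a proxy - hypothetical names, with a Chan
standing in for whatever transport leads back to the original ref cell:

  import Control.Concurrent.Chan

  -- Requests that the proxy routes back to the origin.
  data Request a = ReadReq (Chan a)   -- carries a reply channel
                 | WriteReq a

  -- An unmarshalled ref cell: just a handle on the request channel.
  newtype RemoteRef a = RemoteRef (Chan (Request a))

  readRemote :: RemoteRef a -> IO a
  readRemote (RemoteRef reqs) = do
    reply <- newChan
    writeChan reqs (ReadReq reply)
    readChan reply

  writeRemote :: RemoteRef a -> a -> IO ()
  writeRemote (RemoteRef reqs) x = writeChan reqs (WriteReq x)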

Regards,
Jo

Joachim Durchholz

da leggere,
28 nov 2003, 09:42:1728/11/03
a
Andreas Rossberg wrote:
> Joachim Durchholz wrote:
>
>>A fully precise definition would be "untyped (or with a type that's
>>unrelated to the pickled objects)",
>
> Well, maybe we talk about different meanings of "typed" here. In a
> "dynamically typed" language the heap is "typed" in the sense of dynamic
> typing. The same goes for pickles, consequently.

Ah, now I see what you mean.
However, I think that considering the pickled representation as typed
would be misleading. On the contrary, the essence of pickling is to make
the data itself untyped so that it can be sent over the network, stored
in a file, or put into any other channel that doesn't know about the
type system of the language.
Of course, the challenge is the reconstruction of a typed data structure
from the untyped stream. Some languages encode the type in the pickled
representations, others rely on conventions (and a large part of the
Survey is devoted to exploring where exactly the borderline between
explicit encoding and implicit assumptions is drawn). Still, just
encoding type information doesn't make the data itself typed - in that
case, the program sources would be typed as well since type information
is present in them. Or, for a more constructive criterion: the pickle
would preserve type iff one could apply exactly the same operations on
it as on the original, unpickled data.

Incidentally, the above definition is more precise than I thought: "with
a type that's unrelated to the type of the pickled objects" includes all
those cases where data is pickled into something other than a uniform
byte string (for example, a partially evaluated expression that will
deliver more pickle snippets as requested).
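
A toy sketch of the "explicit encoding vs. convention" point -
hypothetical functions, not any real library's API:

  -- Explicit encoding: the pickle carries a type tag ...
  pickleInt :: Int -> String
  pickleInt n = "Int:" ++ show n

  -- ... which the unpickler checks; the pickle itself stays untyped.
  unpickleInt :: String -> Maybe Int
  unpickleInt s = case break (== ':') s of
    ("Int", ':' : digits) -> Just (read digits)
    _                     -> Nothing

A convention-based scheme would transmit just (show n) and trust the
receiver to expect an Int.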

> I would simply leave out any reference to typing in the definition. The main
> point is, it's linearization of the heap's graph structure.

No. The main point is making the objects suitable for transmission to
another process, using a medium that doesn't know about the language's
type system.
Which might be the best approach to defining pickling. I'll have to
think about that.

>>>Note that "decomposed" and "monolithic" (not sure I'm happy with this
>>>terminology)
>>
>>Neither me.
>
> How about "modal" and "extensible"?

Actually, the decomposed approach need not be extensible - I can imagine
decomposed frameworks which, once an aspect is fixed, don't allow
further modification of that aspect (and with good reason).
And "modal" is probably not immediately understood.

>>>>[Value transitivity] Accessing immutable data through an
>>>>unmarshalled object isn't syntactically different from accessing
>>>>immutable data via a locally-constructed object. *Eager:
>>>>Unmarshalled data are immediately transitively accessible. *Lazy:
>>>>Unmarshalled data may be incomplete, but missing data is
>>>>transparently requested from the sources when it's accessed. This
>>>>is semantically equivalent to "Eager", except that failure modes
>>>>and efficiency may differ.
>>>
>>>And it prevents persistence!
>>
>>How that?
>>(Actually I don't know whether you refer to value transitivity itself or
>>to its "Lazy" variant, so I'm doubly confused.)
>
> I referred to laziness. If data is dumped into a file or database, and the
> original process terminates, then another process can no longer acquire
> those "lazy" parts.

Agreed. Lazy marshalling will not work for the write-to-file case.
It's one of those test questions: if people answer "lazy value
transitivity", I'll ask what will happen when marshalling to disk.

Actually there could be valid answers to that.
For example, if a process has marshalled data lazily, the run-time
system might have a clean-up routine that dumps enough of the process to
disk that it can be resurrected for delivering the missing data.
Or there might be no idea of process termination. Processes never die
explicitly, they are garbage collected when the last reference to them
goes away.

> Consequently, persistence enforces eager marshalling
> (or comes with failure modes that render it essentially useless).

In general, I agree - for my (and, obviously, your) definition of "useless".
However, "usefulness" is a matter of perspective and goals. You and me
would consider lazy value transitivity too restricted to think seriously
about it - but others might find good uses for it. I'm trying to keep
the perspective as broad as possible, and this means I have to include
all possibilities that I can think of. (As the introduction says: "Not
all anticipated situations will make sense, or even be present in any
real language. I'm just trying to cover all possibilities." And with
"cover all possibilities", I mean "make the perspective as broad as
possible, even if that means looking at nonsensical ideas". Real
languages do contain nonsensical mechanisms after all...)

>>>In the latter case, is there a way to prevent rebinding of resources
>>>upon unmarshalling (which is very important for security)?
>>
>>Unmarshalling opens security issues anyway: marshalled data is prone to
>>eavesdropping, and marshalling some data might inadvertently marshall
>>auxiliary data structures that were never intended for transmission.
>
> Right, but that is a completely different aspect of security, on a different
> level. What I described is important for sandboxing.

Could you give some examples that roughly cover the issues involved?
This may already be covered by other categories, but my ideas about the
issues are too fuzzy to do a reliable check.

>>"Resource unmarshalling" was carefully worded to mean the resource on
>>the original machine/process. In other words, unmarshalling a resource
>>will at best return an object that routes all requests back to the
>>original machine/process.
>>
>>>Actually, a language that allows marshalling of resources "by
>>>accident" without proper treatment is *much* worse than one that
>>>safely rejects any attempt to marshall resources. I think you should
>>>at least have the categories "undetected" (unsafe), "rejected"
>>>(safe), "rebound" (transferable) instead of yes/no to cover the most
>>>important issues.
>>
>>I believe that my remarks make the above paragraph moot, but
>>I'm willing to improve my understanding if you disagree :-)
>
> Well, I can live with your suggestion to at least cover the safety aspect (I
> think a lot of implementations simply ignore the issue of resources and
> hence allow marshalling them by accident, with undefined, unsafe semantics,
> which is very bad).

Agreed.
Marshalling, say, a C file descriptor doesn't make much sense.

>>>You are saying there is no source code for (compose sin sqrt). But
>>>sure there is! Desugar the definition of compose:
>>>
>>>compose = \f.\g.\x.f (g x)
>>>
>>>and you see that the source code in question is \x.f (g x), but you
>>>need a closure to represent the actual function, of course. An outer
>>>lambda is no different from an outer let! And this goes for any kind
>>>of function "constructed" by composition, partial application
>>>(currying), or whatsoever.
>>
>>There is still no /direct/ source code for it.
>
> I don't follow. How more "direct" can it get?

"Direct" in the sense of "present in the source code of the program
being executed".

>>It doesn't matter that source code can be constructed, even
>>automatically - sure it can, for any closure.
>
> But it isn't "constructed", it is simply there. Any function is either
> primitive, or stems from some lambda expression in your source code (plus
> the closure environment). There is no other way. It's as simple as that. At
> runtime, there is nothing besides closure construction!

Yes. I'm talking about the source code that was fed to the language
processor.

>>It's flaky, and I know it. I'll appreciate any better definition that
>>will be understood by all FPL people
>
> I'd just make the "Functions" entry distinguish between full support for
> closures, or "autonomous" functions only (those with no free variables).

Does that help?
An autonomous function could have been constructed at run-time just like
a closure.

>>>>*Graph: These objects are represented as trees (directed acyclic
>>>>graphs in the presence of sharing, general directed graphs in the
>>>>presence of cycles).
>>>
>>>To be honest, I have no idea what you mean here.
>>
>>The implementation could represent a closure as an AST-plus-bindings, or
>>it could compile it to byte code (or whatever) and transmit /that/.
>
> No, no, where would that AST come from? Marshalling just dumps the data
> structures from the heap. A function closure is an ordinary data structure
> on that heap and certainly contains no AST, only a code pointer (well, code
> itself might actually be designed as an AST, but you certainly didn't mean
> that).

Yes, I meant exactly that. There are many VMs around that represent
functions as ASTs (more precisely, DAGs when reduced to ground terms).

> Sorry, but somehow I feel that there is a fundamental misunderstanding on
> your side about how 1st-class functions are represented and constructed at
> runtime.

Or maybe you haven't seen how some Haskell interpreters are built :-)
Actually, I was thinking about the spineless tagless G-machine presented
in some papers. (I don't have any references handy, I'll look them up if
Google doesn't find them for you.)
There's a /lot/ of variety. Including many ideas that seem quite whacko
on first sight. The DAG design seems to be mainly designed for programs
that do lots of closure construction (e.g. if you have code that's
making heavy use of higher-order functions that partially apply
parameters). In other words, for Haskell. I don't know whether such a
construction makes sense in any other context.

>>>Haskell *has* variables, they just have to live inside an appropriate
>>>monad.
>>
>>No, they are not variables.
>
> You mention the IO monad. IMO it provides true variables. One might imagine
> a marshalling operation in the IO monad that can cope with IORefs. Of
> course, you'd have to prevent its pickles from escaping that monad, but I don't
> see why that should be impossible. If you have some form of monadic
> concurrency it may well be useful.

Agreed.
The IO monad is a part of the language after all.

Regards,
Jo


Vincenzo aka Nick Name

da leggere,
28 nov 2003, 11:03:1628/11/03
a
Joachim Durchholz wrote:

> Does Haskell indeed have reference cells? It's being advertised as
> "pure", so there is simply no place where a reference cell could
> exist. (Except within the IO monad. Or some other "magical" monad
> that, like IO, derives its semantics from "outside the language", i.e.
> the language just defines a description of things to do and throws
> that list over the fence to the run-time system).

Besides, the IO monad is overkill for arrays: the IO monad features
nondeterminism, so if you want to stay safe you can't allow a term of
type (IO a -> a), while array computations are perfectly deterministic,
but still require proper sequencing and violate referential
transparency; indeed there are better-suited monads - ST, with its
STArray, or something, I don't remember right now - which have true
(and fast) mutable arrays and a freeze operation of type (M a -> a)
which is safe and does not violate referential transparency.
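
For instance the ST monad, whose escape function runST has type
(forall s. ST s a) -> a. A minimal sketch (with an STRef instead of an
array, for brevity):

  import Control.Monad.ST
  import Data.STRef

  -- A deterministic, properly sequenced computation on a mutable cell,
  -- safely escaped back into a pure value by runST.
  counter :: Int
  counter = runST (do
    r <- newSTRef (0 :: Int)
    modifySTRef r (+ 1)
    readSTRef r)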

V.

Andreas Rossberg

da leggere,
28 nov 2003, 11:31:4028/11/03
a
Joachim Durchholz wrote:
>
> On the contrary, the essence of pickling is to make
> the data itself untyped so that it can be sent over the network, stored
> in a file, or put into any other channel that doesn't know about the
> type system of the language.

I disagree. The "heap" - and your system's memory management - doesn't know
about the language's type system either. The data representation on the
heap usually is precisely as typed or untyped as the content of a pickle.
Only the runtime system accessing it has a typeful interpretation of it -
again in both cases.

So the essence of pickling is definitely *not* related to typing. As I said,
it is turning the internal graph structure into a compact external linear
format that can be stored or transmitted easily. The fact that pickle
formats are often untyped (in a certain sense) is rather by accident, not
by requirement.

>>>>Note that "decomposed" and "monolithic" (not sure I'm happy with this
>>>>terminology)
>>>
>>>Neither me.
>>
>> How about "modal" and "extensible"?
>
> Actually, the decomposed approach need not be extensible - I can imagine
> decomposed frameworks which, once an aspect is fixed, don't allow
> further modification of that aspect (and with good reason).

Now I'm confused. Yes, they are orthogonal - that's what I pointed out in my
previous posting. Or did you mean "monolithic" instead of "decomposed" in
the paragraph above?

> And "modal" is probably not immediately understood.

Well, more so than "decomposed", I'd say. ;-) Maybe "multi-modal" would
make it clearer?


>> Right, but that is a completely different aspect of security, on a
>> different level. What I described is important for sandboxing.
>
> Could you give some examples that roughly cover the issues involved?
> This may already be covered by other categories, but my ideas about the
> issues are too fuzzy to do a reliable check.

If a process has no way of controlling rebinding of system resources, then
it cannot safely call any function loaded from a pickle - it could always
access whatever it likes on the target machine. Much like the infamous
macros in Word documents... ;-) Higher-order pickles would be inherently
dangerous.

Note that unpickling always is about receiving something from untrusted
sources. So you need some way to isolate or restrict it. The easiest way is
disallowing resources in pickles. If you allow them, you must provide some
additional policies, or be unsafe.

> Marshalling, say, a C file descriptor doesn't make much sense.

Not only that, it may accidentally be valid on the target system and produce
havoc if accessed (this would be an instance of uncontrolled rebinding).

>>>>You are saying there is no source code for (compose sin sqrt). But
>>>>sure there is! Desugar the definition of compose:
>>>>
>>>>compose = \f.\g.\x.f (g x)
>>>>
>>>>and you see that the source code in question is \x.f (g x), but you
>>>>need a closure to represent the actual function, of course. An outer
>>>>lambda is no different from an outer let! And this goes for any kind
>>>>of function "constructed" by composition, partial application
>>>>(currying), or whatsoever.
>>>
>>>There is still no /direct/ source code for it.
>>
>> I don't follow. How more "direct" can it get?
>
> "Direct" in the sense of "present in the source code of the program
> being executed".

Er, but that was what I was saying: it *is* present in the source code of
the program. Always. The library source code containing compose is part of
the program's source code, of course.

Or in what sense do you consider \x.f (g x) absent in the definition of
compose above?

>> Any function is either
>> primitive, or stems from some lambda expression in your source code (plus
>> the closure environment). There is no other way. It's as simple as that.
>> At runtime, there is nothing besides closure construction!
>
> Yes. I'm talking about the source code that was fed to the language
> processor.

Me too...

>> I'd just make the "Functions" entry distinguish between full support for
>> closures, or "autonomous" functions only (those with no free variables).
>
> Does that help?
> An autonomous function could have been constructed at run-time just like
> a closure.

But at runtime you only construct closures! A closure contains some code
pointer (whatever form "code" takes), but that code *always* is the translation
of some lambda expression in some source code. Code isn't constructed at
runtime.

Since an autonomous function has an empty closure environment it is
basically just a code pointer - but still having a 1-to-1 correspondence to
some source code lambda.

> Actually, I was thinking about the spineless tagless G-machine
> presented in some papers.

Yes, I reckoned, but I don't see how an implementation based on the
G-Machine or something similar would make a fundamental difference with
respect to what I said. The main difference is that closures not only can
represent lambdas, but arbitrary source code expressions.

Joachim Durchholz

da leggere,
28 nov 2003, 20:42:0428/11/03
a
Andreas Rossberg wrote:
> Joachim Durchholz wrote:
>
>> On the contrary, the essence of pickling is to make the data itself
>> untyped so that it can be sent over the network, stored in a file,
>> or put into any other channel that doesn't know about the type
>> system of the language.
>
> I disagree.

I don't see where you disagree with my arguments (those that I gave
above the paragraph you're quoting here).
That's quite unfortunate, since this denies me the chance to revisit my
arguments and see where they may be incomplete or inconsistent.

I'll try to argue based on your arguments, but it's got the feel of an
uphill battle.

> The "heap" - and your system's memory management - doesn't know about
> the language's type system either. The data representation on the
> heap usually is precisely as typed or untyped as the content of a
> pickle. Only the runtime system accessing it has a typeful
> interpretation of it - again in both cases.

I agree with the observations, but I think you're overlooking an
important difference:
For heap data, there is no good reason to access the untyped representation.
For pickled data, there are situations where accessing the pickles'
representations is important. For example, there might be a
cryptographic layer, written in the language itself, through which the
pickle is passed. There may be routing code that sends the pickle off to
different machines to achieve workload balancing; the routing rules may
depend on user settings, so this isn't very low-level anymore.

> So the essence of pickling is definitely *not* related to typing.

Since there /is/ an important difference here, this is a non sequitur.

> As I said, it is turning the internal graph structure into a compact
> external linear format that can be stored or transmitted easily. The
> fact that pickle formats are often untyped (in a certain sense) is
> rather by accident, not by requirement.

I strongly disagree here. If being "external" is essential, being
untyped is an immediate consequence, since external media will be unable
to express the language's types.

There's an even stronger argument why pickling is related to type
conversions: if the channel for which the data is pickled were able to
handle the language's types directly, there would be no need for
pickling in the first place, the channel would be able to take the data
just as it is.
(It's partly a question of definitions. If such a hypothetical typed
channel can accept typed data, one could still define this as "trivial
pickling". But it's, well, er, trivial.)

>>>>> Note that "decomposed" and "monolithic" (not sure I'm happy
>>>>> with this terminology)
>>>>
>>>> Neither me.
>>>
>>> How about "modal" and "extensible"?
>>
>> Actually, the decomposed approach need not be extensible - I can
>> imagine decomposed frameworks which, once an aspect is fixed, don't
>> allow further modification of that aspect (and with a reason).
>
> Now I'm confused. Yes, they are orthogonal - that's what I pointed
> out in my previous posting. Or did you mean "monolithic" instead of
> "decomposed" in the paragraph above?

I'm not sure which term you intended to replace with which, so I'm not
sure myself.

>> And "modal" is probably not immediately understood.
>
> Well, more so than "decomposed", I'd say. ;-) Maybe "multi-modal" is
> making it clearer?

Not at all.

The basic difference is really "decomposed" vs. "monolithic". The
monolithic model is essentially a hook that will be called whenever an
object is to be marshalled. The decomposed model is one that provides
different hooks for different purposes, i.e. the marshalling process is
decomposed into different phases if you will: object tracing (hook: weak
pointers), late sharing detection (hook: normalizing function; can be
used to create a dictionary of objects-already-in-pickle, so that
identical values aren't sent twice even if they start life as identical
copies), conversion of primitive types to byte sequences (I can't
imagine any useful hooks for that, but I'm pretty sure some language
designer will *g*). Actually, the phase hooks may even be decomposed
further on a per-type basis.
(I'm using an extremely broad definition of "hook" in the above:
anything that can influence a given process, whether it's a function or
just a flag. Well, in a functional language, functions are data, and
data can be seen as functions, so this isn't too far-fetched *g*)
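
A sketch of the two styles, with placeholder types rather than any
existing library's API:

  type Bytes = String      -- stand-in for the pickled byte stream
  type Prim  = Int         -- placeholder for primitive values
  newtype Ref a = Ref a    -- placeholder for a traced reference

  -- Monolithic: one hook per user-defined type.
  class Marshal a where
    marshal :: a -> Bytes

  -- Decomposed: separate hooks for the separate phases.
  data Hooks a = Hooks
    { follow    :: Ref a -> Bool   -- object tracing: honour "marshalling-weak" refs
    , normalize :: a -> a          -- normalization / late sharing detection
    , encode    :: Prim -> Bytes   -- primitive-to-bytes conversion
    }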

>>> Right, but that is a completely different aspect of security, on
>>> a different level. What I described is important for sandboxing.
>>
>> Could you give some examples that roughly cover the issues
>> involved? This may already be covered by other categories, but my
>> ideas about the issues are too fuzzy to do a reliable check.
>
> If a process has no way of controlling rebinding of system resources,
> than it cannot safely call any function loaded from a pickle - it
> could always access whatever it likes on the target machine. Much
> like the infamous macros in Word documents... ;-) Higher-order
> pickles would be inherently dangerous.
>
> Note that unpickling always is about receiving something from
> untrusted sources. So you need some way to isolate or restrict it.
> The easiest way is disallowing resources in pickles. If you allow
> them, you must provide some additional policies, or be unsafe.
>
>> Marshalling, say, a C file descriptor doesn't make much sense.
>
> Not only that, it may accidently be valid on the target system and
> produce havoc if accessed (this would be an instance of uncontrolled
> rebinding).

OK, now I understand what you mean.

However, this is not what the question was about. Rereading the
descriptive part of the category, it's clear (at least to me) that the
question is about whether the language is smart enough to install some
refer-back mechanism when it unmarshalls a resource, not about whether
there are any safety issues in this area.
If anything is wrong with that category, it's not the answers, it's the
question - and I agree that the question should be made broader in scope.

This gives me the following, revised category:

-------------------
[Resources] A "resource" is a piece of data that must be interpreted
within the context of the marshalling process or machine. The canonical
example of this is the file handle.
*Proxy: Unmarshalling a marshalled resource will install a proxy that
will route all requests back to the machine/process where the resource
is located. (Obviously, the originating process must still be alive.)
*Failure: One of the steps "marshall the resource", "unmarshall the
resource", or "use the unmarshalled object" will fail.
*Undefined: The resource will be unmarshalled, but the interpretation of
the unmarshalled value is undefined. (In the case of file handles, this
would be reinterpreting the file handle in the context of the target
process, where it might mean an entirely different file.)
-------------------

A nice side effect is that the associated text has become considerably
shorter :-)
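
(Rendered as a datatype, just to fix the terminology - a sketch:)

  -- Possible fates of a marshalled resource, mirroring the category.
  data ResourceFate r
    = Proxy r          -- requests route back to the originating process
    | Failure String   -- marshalling, unmarshalling, or use fails
    | Undefined r      -- a value comes back, but its meaning is undefined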

>>>>> You are saying there is no source code for (compose sin
>>>>> sqrt). But sure there is! Desugar the definition of compose:
>>>>>
>>>>> compose = \f.\g.\x.f (g x)
>>>>>
>>>>> and you see that the source code in question is \x.f (g x),
>>>>> but you need a closure to represent the actual function, of
>>>>> course. An outer lambda is no different from an outer let!
>>>>> And this goes for any kind of function "constructed" by
>>>>> composition, partial application (currying), or whatsoever.
>>>>
>>>> There is still no /direct/ source code for it.
>>>
>>> I don't follow. How more "direct" can it get?
>>
>> "Direct" in the sense of "present in the source code of the program
>> being executed".
>
> Er, but that was what I was saying: it *is* present in the source
> code of the program. Always. The library source code containing
> compose is part of the program's source code, of course.
>
> Or in what sense do you consider \x.f (g x) absent in the definition
> of compose above?

It's absent in the sense that there is no function of this kind in the
source. What you have is \f.\g.\x.f (g x) (i.e. the body of "compose").
\x.f (g x) is present, not as a self-contained entity but only as a
component of something else.

Re-quoting:


> lambda is no different from an outer let! And this goes for any kind
> of function "constructed" by composition, partial application
> (currying), or whatsoever.

If I have

compose = \f.\g.\x.f (g x)

and the (admittedly silly) expression
compose sin (compose ln tan)
then the result
\x.sin (ln (tan x))
most definitely is /not/ part of the original source! The source can be
transformed into it, but there are undecidable cases (say,
compose A B
within a recursive function with parameters A and B). So there are most
definitely cases of functions that aren't present in the source code
even if "source code presence" is defined as "algorithmically derivable
from source code".

Personally, I'd like to stick with trivial transformations of source
code. I.e. a function is "present" if it is the body of some function
definition in the source.

>>> Any function is either primitive, or stems from some lambda
>>> expression in your source code (plus the closure environment).
>>> There is no other way. It's as simple as that. At runtime, there
>>> is nothing besides closure construction!
>>
>> Yes. I'm talking about the source code that was fed to the language
>> processor.
>
> Me too...

You're including some transformations, such as extracting a subexpression.
These may seem trivial, but one person's trivial transformations are
another person's compiler manipulation, so it's quite difficult to draw
the line.

>>> I'd just make the "Functions" entry distinguish between full
>>> support for closures, or "autonomous" functions only (those with
>>> no free variables).
>>
>> Does that help? An autonomous function could have been constructed
>> at run-time just like a closure.
>
> But at runtime you only construct closures!

Well, sort of. I have built some interpreters, and while you're
conceptually doing closure evaluation, they can remain pretty implicit.

> A closure contains some
> code pointer (whatever form "code" takes), but that code *always* is the
> translation of some lambda expression in some source code.

Agreed.

> Code isn't
> constructed at runtime.

Disagreed. You can "stick together" functions, which gives you new
functions. The set of constructed functions can even be undecidable
(just use recursion).
Monads are a popular way of doing exactly that.

> Since an autonomous function has an empty closure environment it is
> basically just a code pointer - but still having a 1-to-1
> correspondence to some source code lambda.

Agreed.
That's exactly why I don't think that the distinction between autonomous
and non-autonomous functions is very useful: the autonomous functions
are just a special case.

More importantly, they are not the special case I'm after.
Essentially, I want to differentiate between languages that are
restricted to transmitting "just functions" and those that can do things
like
complicated A B = marshall (A 5 B)
where A is an arbitrary lambda.

If you'd want to do things like "complicated" in a language that cannot
transmit lambdas, you'd have to keep all function names in the lambda as
open parameters and transmit them in separate parameters. Yucky, but old
versions of Erlang required exactly this. (In practice, one would simply
define a function that happened to have the right lambda as its body,
and use its name for transmission, so it's not as bad as it looks on
first sight.)
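
A sketch of that workaround, in Haskell syntax for uniformity - the name
table is hypothetical; in Erlang it would be a module's exported
functions:

  -- Instead of marshalling the lambda itself, transmit the names of the
  -- functions it mentions and rebuild it on the receiving side.
  lookupFn :: String -> (Double -> Double)
  lookupFn "sin" = sin
  lookupFn "ln"  = log
  lookupFn _     = id

  rebuild :: (String, String) -> (Double -> Double)
  rebuild (fName, gName) = lookupFn fName . lookupFn gName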

Hope this clarifies things a bit.

Regards,
JO

Feuer

da leggere,
28 nov 2003, 22:49:4328/11/03
a

Joachim Durchholz wrote:

> Does Haskell indeed have reference cells?

Yes.

> It's being advertized as
> "pure", so there is simply no place where a reference cell could exist.
> (Except within the IO monad.

Which is where they are.

As Vincenzo mentions, there are other situations where
such things make sense (specially encapsulated imperative
computations), and if you read enough about GHC maybe
you'll understand them. I don't. (haven't tried lately)

David

Feuer

da leggere,
28 nov 2003, 22:55:2328/11/03
a
Andreas Rossberg wrote:

> > Marshalling, say, a C file descriptor doesn't make much sense.
>
> Not only that, it may accidentally be valid on the target system and produce
> havoc if accessed (this would be an instance of uncontrolled rebinding).

Actually... There are systems where file descriptors can be passed
around. It just has to be done according to the right protocol.

David
File descriptor != file descriptor number

Andreas Rossberg

da leggere,
29 nov 2003, 14:11:5929/11/03
a
Joachim Durchholz <joachim....@web.de> wrote:
> >
> >> On the contrary, the essence of pickling is to make the data itself
> >> untyped so that it can be sent over the network, stored in a file,
> >> or put into any other channel that doesn't know about the type
> >> system of the language.
> >
> > I disagree.
>
> I don't see where you disagree with my arguments

I disagreed with your statement about untypedness being the essence of
pickling.

> (those that I gave
> above the paragraph you're quoting here).

Hm, the only bit I snipped preceding what I quoted was:

> Ah, now I see what you mean.
> However, I think that considering the pickled representation as typed
> would be misleading.

So I'm not sure what arguments you are referring to above.

You gave some arguments later, explaining why you don't consider a pickle
typed (wrt to the language's type system). I did not quote or refute them
because they were not relevant to my reasoning. A pickle may well be untyped
as you say, but the heap would be so in the same way (OTOH, in a dynamically
typed language both may be considered typed). My reasoning for typing issues
being irrelevant was based on that observation, namely that there is no
*change* in "typefulness" upon pickling, not that a pickle may be typed.

> I agree with the observations, but I think you're overlooking an
> important difference:
> For heap data, there is no good reason to access the untyped
> representation.

Well, any access to the heap is access to the "untyped" representation.
(Some access not even being performed by the runtime system itself, e.g. the
virtual memory management of the OS swapping pages - if you want to go that
far...)

> For pickled data, there are situations where accessing the pickles'
> representations is important. For example, there might be a
> cryptographic layer, written in the language itself, through which the
> pickle is passed. There may be routing code that sends the pickle off to
> different machines to achieve workload balancing; the routing rules may
> depend on user settings, so this isn't very low-level anymore.

But these would rely on some sort of reflective mechanism that is
imaginable not only for pickles. For example, a heap inspection tool
needs similar
information. So it's an orthogonal issue and thus not relevant IMO.

> > As I said, it is turning the internal graph structure into a compact
> > external linear format that can be stored or transmitted easily. The
> > fact that pickle formats are often untyped (in a certain sense) is
> > rather by accident, not by requirement.
>
> I strongly disagree here. If being "external" is essential, being
> untyped is an immediate consequence, since external media will be unable
> to express the language's types.

In the very same sense the language's runtime system is unable to "express"
those types! Types are a meta-concept that guides the interpretation of
what is happening and what is represented by an implementation -
internally as well as externally to a running process.

> There's an even stronger argument why pickling is related to type
> conversions: if the channel for which the data is pickled were able to
> handle the language's types directly, there would be no need for
> pickling in the first place, the channel would be able to take the data
> just as it is.
> (It's partly a question of definitions. If such a hypothetical typed
> channel can accept typed data, one could still define this as "trivial
> pickling". But it's, well, er, trivial.)

I'd say, above all it's a question of the abstraction level at which you
look at it.

But take Alice for example. It uses pickling to transfer arguments and
results of RPCs. All of this is typeful, that is, both sides know how to
interpret the pickled stream and don't need to perform any checks. This is
pretty much what you describe above. But contrary to what you assume, the underlying
use of pickling is not transparent because it changes the semantics: all the
restrictions and semantic twists of pickling apply, e.g. resources are
rejected, user extensions are invoked. Typing-wise, nothing is happening,
though.

> > Well, more so than "decomposed", I'd say. ;-) Maybe "multi-modal" would
> > make it clearer?
>
> Not at all.
>
> The basic difference is really "decomposed" vs. "monolithic". The
> monolithic model is essentially a hook that will be called whenever an
> object is to be marshalled. The decomposed model is one that provides
> different hooks for different purposes,

OK, now I see what you mean. In that case, modality is yet another possible
dimension of customizability.

> [Resources] A "resource" is a piece of data that must be interpreted
> within the context of the marshalling process or machine. The canonical
> example of this is the file handle.
> *Proxy: Unmarshalling a marshalled resource will install a proxy that
> will route all requests back to the machine/process where the resource
> is located. (Obviously, the originating process must still be alive.)
> *Failure: One of the steps "marshall the resource", "unmarshall the
> resource", or "use the unmarshalled object" will fail.
> *Undefined: The resource will be unmarshalled, but the interpretation of
> the unmarshalled value is undefined. (In the case of file handles, this
> would be reinterpreting the file handle in the context of the target
> process, where it might mean an entirely different file.)

Yes, I think that is better.

> >>>>> You are saying there is no source code for (compose sin
> >>>>> sqrt). But sure there is! Desugar the definition of compose:
> >>>>>
> >>>>> compose = \f.\g.\x.f (g x)
> >>>>>
> >>>>> and you see that the source code in question is \x.f (g x),
> >>>>> but you need a closure to represent the actual function, of
> >>>>> course. An outer lambda is no different from an outer let!
> >>>>> And this goes for any kind of function "constructed" by
> >>>>> composition, partial application (currying), or whatsoever.
> >>>>
> >>>> There is still no /direct/ source code for it.
> >>>
> >>> I don't follow. How more "direct" can it get?
> >>
> >> "Direct" in the sense of "present in the source code of the program
> >> being executed".
> >
> > Er, but that was what I was saying: it *is* present in the source
> > code of the program. Always. The library source code containing
> > compose is part of the program's source code, of course.
> >
> > Or in what sense do you consider \x.f (g x) absent in the definition
> > of compose above?
>
> It's absent in the sense that there is no function of this kind in the
> source. What you have is \f.\g.\x.f (g x) (i.e. the body of "compose").
> \x.f (g x) is present, not as a self-contained entity but only as a
> component of something else.

Er, yes, it's a subexpression. So? Every expression is a subexpression of
something.

> Re-quoting:
> > lambda is no different from an outer let! And this goes for any kind
> > of function "constructed" by composition, partial application
> > (currying), or whatsoever.
>
> If I have
> compose = \f.\g.\x.f (g x)
> and the (admittedly silly) expression
> compose sin (compose ln tan)
> then the result
> sin ln tan
> most definitely is /not/ part of the original source!

No, but the closure representing it contains a pointer to the code of \x.f
(g x). That's what closures are all about: they supplement some source code
expression (usually represented in compiled form) with the bindings of its
free variables, together representing a function.

In absolutely the same sense as for your example there is or isn't a
representation for the function g in

f x = let g y = x + y in g

or in

f x = x + 1
g x = f (x + x)

although they are even named! You see, by your line of reasoning, *only*
autonomous functions would have a "source representation". And that's what I
said would be the only useful distinction to make.

> Personally, I'd like to stick with trivial transformations of source
> code. I.e. a function is "present" if it is the body of a function.

Which is true for the composition example.

> You're including some transformations, such as extracting a subexpression.

As I said, everything is a subexpression. So what's the point? The
subexpression in question is a lambda. That's all you can demand in a
language with 1st class functions.

Let me repeat my example:

compose = \f.\g.\x.f (g x)

Somehow you seem to imply a difference to

compose = \f.\g.let h x = f (g x) in h

But there is no difference! It is just using syntactic sugar, just another
way to write exactly the same function and it will have exactly the same
sort of representation.
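
In Haskell notation, for instance, all of the following spell the same
function, and I would expect any compiler to give them the same kind of
representation (a sketch; the signatures are added only for clarity):

    compose1 :: (b -> c) -> (a -> b) -> a -> c
    compose1 = \f -> \g -> \x -> f (g x)

    compose2 :: (b -> c) -> (a -> b) -> a -> c
    compose2 f g = let h x = f (g x) in h

    -- the "equation" form, sugar for the same nested lambdas
    compose3 :: (b -> c) -> (a -> b) -> a -> c
    compose3 f g x = f (g x)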

> > Code isn't
> > constructed at runtime.
>
> Disagreed. You can "stick together" functions, which gives you new
> functions. The set of constructed functions can even be undecidable
> (just use recursion).
> Monads are a popular way of doing exactly that.

No, this is still nothing but closure construction. Monads are an
abstraction for plugging functions, but it's all closures.

Recursion may indeed be primitive, the effect usually being that closures
are recursive.
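
For instance, the bind of a state monad does nothing but wrap the two
plugged computations in a fresh closure (a minimal sketch in Haskell
notation, not any particular library's definition):

    newtype State s a = State { runState :: s -> (a, s) }

    -- "plugging" m and k just builds a new closure over both
    bindState :: State s a -> (a -> State s b) -> State s b
    bindState m k = State (\s -> let (a, s') = runState m s
                                 in runState (k a) s')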

> Essentially, I want to differentiate between languages that are
> restricted to transmitting "just functions" and those that can do things
> like
> complicated A B
> marshall (A 5 B)
> where A is an arbitrary lambda.

So A 5 B is a closure.

> If you'd want to do things like "complicated" in a language that cannot
> transmit lambdas, you'd have to keep all function names in the lambda as
> open parameters and transmit them in separate parameters. Yucky, but old
> versions of Erlang required exactly this.

OK, I see. You loosen the restriction to autonomous functions slightly, by
allowing "semi-autonomous" functions. Those would be functions that are
autonomous up to references to other semi-autonomous (top-level named)
functions. And you must even transfer the remaining minimal "closure
environment" manually.

However, this is basically in line with what I was saying, except that you
have a third, intermediate case. You have autonomous < semi-autonomous <
closures. It's not another dimension.
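
A sketch of that manual transfer, in Haskell notation with made-up names:
the only things on the wire are the name of a top-level function and its
first-order arguments, and the receiver re-plugs the "environment" by hand:

    -- hypothetical wire format: function name plus captured arguments
    data Marshalled = ByName String [Int]

    -- receiver side: a fixed table of the top-level functions it knows
    lookupFun :: String -> Maybe (Int -> Int -> Int)
    lookupFun "add" = Just (+)
    lookupFun "mul" = Just (*)
    lookupFun _     = Nothing

    -- re-attaching the single captured argument manually
    unmarshal :: Marshalled -> Maybe (Int -> Int)
    unmarshal (ByName name [x]) = fmap (\f -> f x) (lookupFun name)
    unmarshal _                 = Nothing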

- Andreas

Joachim Durchholz

29 Nov 2003, 21:41:56
Andreas Rossberg wrote:
>>There's an even stronger argument why pickling is related to type
>>conversions: if the channel for which the data is pickled were able to
>>handle the language's types directly, there would be no need for
>>pickling in the first place, the channel would be able to take the data
>>just as it is.
>>(It's partly a question of definitions. If such a hypothetical typed
>>channel can accept typed data, one could still define this as "trivial
>>pickling". But it's, well, er, trivial.)
>
> I'd say, above all it's a question of the abstraction level at which you
> look at it.
>
> But take Alice for example. It uses pickling to transfer arguments and
> results of RPCs. All of this is typeful, that is, both sides know how to
> interpret the pickled stream and don't need to perform any checks.

Hey, the RPC call must at least check that the data it's receiving is
being sent from an Alice process. (Not sure how far this is relevant to
this exchange.)

> This is
> pretty much what you describe above. But contrary to what you assume, the
> underlying use of pickling is not transparent because it changes the semantics: all the
> restrictions and semantic twists of pickling apply, e.g. resources are
> rejected, user extensions are invoked. Typing-wise, nothing is happening,
> though.

And how is the data transferred? It becomes a byte stream, I'd think.
Which is untyped.
Q.E.D.

Well, obviously, you mean something else, but I fail to see what you're
getting at.

>>>Well, more so than "decomposed", I'd say. ;-) Maybe "multi-modal" is
>>>making it clearer?
>>
>>Not at all.
>>
>>The basic difference is really "decomposed" vs. "monolithic". The
>>monolithic model is essentially a hook that will be called whenever an
>>object is to be marshalled. The decomposed model is one that provides
>>different hooks for different purposes,
>
> OK, now I see what you mean. In that case, modality is yet another possible
> dimension of customizability.

Exactly.
I'd still be interested in a better terminology, or a better description
of the issues involved. Clarification can only make things better.

>>[Resources]
>>[...]


>
> Yes, I think that is better.

Fine, then it's in.
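
For illustration, here is how the three alternatives might surface in an
implementation (a hypothetical Haskell sketch, not any existing
language's interface):

    -- what an unmarshaller might do when it hits a pickled file handle
    data ResourcePolicy
      = InstallProxy  -- stub that routes requests back to the origin
      | Fail          -- reject at marshal, unmarshal, or first use
      | Reinterpret   -- keep the raw bits; local meaning is undefined

    newtype RawHandle = RawHandle Int

    unmarshalHandle :: ResourcePolicy -> Int -> Either String RawHandle
    unmarshalHandle InstallProxy h = Right (RawHandle h)  -- requests go home
    unmarshalHandle Fail         _ = Left "resource cannot cross the boundary"
    unmarshalHandle Reinterpret  h = Right (RawHandle h)  -- same bits, new context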

>>>>>>>You are saying there is no source code for (compose sin
>>>>>>>sqrt). But sure there is! Desugar the definition of compose:
>>>>>>>
>>>>>>>compose = \f.\g.\x.f (g x)
>>>>>>>
>>>>>>>and you see that the source code in question is \x.f (g x),
>>>>>>>but you need a closure to represent the actual function, of
>>>>>>>course. An outer lambda is no different from an outer let!
>>>>>>>And this goes for any kind of function "constructed" by
>>>>>>>composition, partial application (currying), or whatsoever.
>>>>>>
>>>>>>There is still no /direct/ source code for it.
>>>>>
>>>>>I don't follow. How more "direct" can it get?
>>>>
>>>>"Direct" in the sense of "present in the source code of the program
>>>> being executed".
>>>
>>>Er, but that was what I was saying: it *is* present in the source
>>>code of the program. Always. The library source code containing
>>>compose is part of the program's source code, of course.
>>>
>>>Or in what sense do you consider \x.f (g x) absent in the definition
>>>of compose above?
>>
>>It's absent in the sense that there is no function of this kind in the
>>source. What you have is \f.\g.\x.f (g x) (i.e. the body of "compose").
>>\x.f (g x) is present, but not as a self-contained entity, but just as a
>>component of something else.
>
> Er, yes, it's a subexpression. So? Every expression is a subexpression of
> something.

Both [Base functions] and [Composed functions] entries are carefully
worded to talk about marshalling /functions/. Not about marshalling
/subexpressions/.
It's possible to map functions to subexpressions and vice versa, but
they are not the same - at least not in all languages.

>>> lambda is no different from an outer let! And this goes for any
>>> kind of function "constructed" by composition, partial
>>> application (currying), or whatsoever.
>>
>> If I have compose = \f.\g.\x.f (g x) and the (admittedly silly)
>> expression compose sin (compose ln tan) then the result sin ln tan
>> most definitely is /not/ part of the original source!
>
> No, but the closure representing it contains a pointer to the code of
> \x.f (g x).

So what? When a data structure containing sin ln tan must be
transmitted, it's a composed function without a direct representation in
source code.

> That's what closures are all about: they supplement some
> source code expression (usually represented in compiled form) with
> the bindings of its free variables, together representing a function.
>
> In absolutely the same sense as for your example there is or isn't a
> representation for the function g in
>
> f x = let g y = x + y in g
>
> or in
>
> f x = x + 1
> g x = f (x + x)
>
> although they are even named! You see, by your line of reasoning,
> *only* autonomous functions would have a "source representation".

To a large degree, whether the distinction makes sense is a question of
how much of a distinction the language makes.
Languages based on term reduction typically don't make much of a
distinction; depending on the motto of the day, named subexpressions
could be seen as subexpressions, closures, or functions with implicit
parameters.
Other languages sharply distinguish between functions and expressions.

For the former class of languages, I agree the difference doesn't make
much sense. For the latter class, it does.

> And that's what I said would be the only useful distinction to make.

I think we're talking at cross purposes here.

You've been talking about how marshalling /should be/ designed.

While that's important, the Survey is about something entirely
different: namely, about how marshalling /is/ designed.
In other words: for the Survey, the useful distinctions are not those
between various useful designs, they are those between major actual
implementations.
In yet other words: I'm categorizing things to /describe/ them, not to
/design/ them.

(Actually, I disagree in that I don't consider even autonomous functions
very different from closures - they are essentially closures with an
empty environment, not even worth mentioning as a special degenerate
case. However, language designers tend to disagree with that stance, and
it's the language designers' views that I have to report, not my own.)

>>Personally, I'd like to stick with trivial transformations of source
>>code. I.e. a function is "present" if it is the body of a function.
>
> Which is true for the composition example.
>
>>You're including some transformations, such as extracting a subexpression.
>
> As I said, everything is a subexpression.

I'm slowly getting that /this/ is the exact point where you and many
(well, some) language designers disagree.

Hope I'm getting things cleared up, albeit slowly...

Regards,
Jo

Andreas Rossberg

1 Dec 2003, 09:11:35
Joachim Durchholz wrote:
>>
>> But take Alice for example. It uses pickling to transfer arguments and
>> results of RPCs. All of this is typeful, that is, both sides know how to
>> interpret the pickled stream and don't need to perform any checks.
>
> Hey, the RPC call must at least check that the data it's receiving is
> being sent from an Alice process. (Not sure how far this is relevant to
> this exchange.)

That is taken care of when the connection is first established.

>> This is
>> pretty much what you describe above. But contrary to what you assume, the
>> underlying use of pickling is not transparent because it changes the
>> semantics: all the restrictions and semantic twists of pickling apply,
>> e.g. resources are rejected, user extensions are invoked. Typing-wise,
>> nothing is happening, though.
>
> And how is the data transferred? It becomes a byte stream, I'd think.
> Which is untyped.

Only at a lower level of abstraction. And that would be precisely the same
level of abstraction at which the heap is an untyped byte array. See?


> Both [Base functions] and [Composed functions] entries are carefully
> worded to talk about marshalling /functions/. Not about marshalling
> /subexpressions/.

Syntactically, functions are usually a subclass of expressions in functional
languages. That's essentially what makes them 1st class.

> Other languages sharply distinguish between functions and expressions.

Can you give an example of a functional language that does? (And no, in
Erlang functions can be expressions. Erlang merely makes top-level
functions members of the respective module and names them that way.)

>>> If I have compose = \f.\g.\x.f (g x) and the (admittedly silly)
>>> expression compose sin (compose ln tan) then the result sin ln tan
>>> most definitely is /not/ part of the original source!
>>
>> No, but the closure representing it contains a pointer to the code of
>> \x.f (g x).
>
> So what? When a data structure containing sin ln tan must be
> transmitted, it's a composed function without a direct representation in
> source code.

As I said, it has as direct or indirect a source representation as any
function with free variables has. This includes practically all non-trivial
functions. They all use the same representation: closures. If my examples
didn't make that clear by now then I don't know what more to say. :(

>> As I said, everything is a subexpression.
>
> I'm slowly getting that /this/ is the exact point where you and many
> (well, some) language designers disagree.

This might well be true. Two points though: 1st, a language that does not
have functions 1st-class as expression forms with lexical closure hardly
qualifies as functional (after all, *this* is the essence of FP); and 2nd,
all of this would be rather irrelevant to marshalling: when a language does
not support full closures, then of course marshalling won't either.

The example you always seem to be alluding to is Erlang. But Erlang supports
closures. OTOH, Erlang restricts marshalling to something like the
semi-autonomous form of functions I talked about in my last posting.
Namely, they must be defined at the module top-level. It does not support
marshalling of closures.

While this is a pity, IMO it certainly is no reason for making an extra
entry in the survey. Just have an appropriate attribute for Functions.

But anyway, even if you want it, my main criticism was wrt your wording
about "direct source representation" and functions that are "constructed at
runtime". It sounds - and your arguments confirmed that - as if you wanted
to make a distinction between closures constructed by different expression
forms, which is simply non-sensical for any semantics and any
implementation strategy I can think of (including Erlang's, which cannot
marshall non-"composed" function closures either). By all reasonable means,
having a "direct source representation" is thus synonymous to
(semi-)autonomous. So it would be more correct and much clearer if you
simply talked about closures, instead of nebulous "composed functions".

Regards,

Joachim Durchholz

1 Dec 2003, 13:33:48
Andreas Rossberg wrote:
> Joachim Durchholz wrote:
>>> This is pretty much what you describe above. But contrary to what
>>> you assume, the underlying use of pickling is not transparent because
>>> it changes the semantics: all the restrictions and semantic
>>> twists of pickling apply, e.g. resources are rejected, user
>>> extensions are invoked. Typing-wise, nothing is happening,
>>> though.
>>
>> And how is the data transferred? It becomes a byte stream, I'd
>> think. Which is untyped.
>
> Only at a lower level of abstraction. And that would be precisely the
> same level of abstraction at which the heap is an untyped byte array.
> See?

Yes, but the language can inspect the pickled data for a useful
purpose, which it cannot do for the heap data.

>> Both [Base functions] and [Composed functions] entries are
>> carefully worded to talk about marshalling /functions/. Not about
>> marshalling /subexpressions/.
>
> Syntactically, functions are usually a subclass of expressions in
> functional languages. That's essentially what makes them 1st class.

*sigh* this may be true in the languages that are part of your background.
I definitely know languages that define things otherwise.

>> Other languages sharply distinguish between functions and
>> expressions.
>
> Can you give an example of a functional language that does? (And no,
> in Erlang functions can be expressions. Erlang merely makes top-level
> functions members of the respective module and names them that way.)

There /is/ a difference, and a marked one. Things that accept a function
don't accept subexpressions (actually, I don't remember anything in
Erlang that will accept a subexpression in the first place).
Earlier versions of Erlang didn't even have closures, so the common
abstraction on which both functions and subexpressions are based
didn't even exist.

>>>> If I have compose = \f.\g.\x.f (g x) and the (admittedly silly)
>>>> expression compose sin (compose ln tan) then the result sin ln
>>>> tan most definitely is /not/ part of the original source!
>>>
>>> No, but the closure representing it contains a pointer to the
>>> code of \x.f (g x).
>>
>> So what? When a data structure containing sin ln tan must be
>> transmitted, it's a composed function without a direct
>> representation in source code.
>
> As I said, it has as direct or indirect a source representation as
> any function with free variables has. This includes practically all
> non-trivial functions. They all use the same representation:
> closures. If my examples didn't make that clear by now then I don't
> know what more to say. :(

Yes, but the closure representing sin ln tan is constructed at runtime.

>>> As I said, everything is a subexpression.
>>
>> I'm slowly getting that /this/ is the exact point where you and
>> many (well, some) language designers disagree.
>
> This might well be true. Two points though: 1st, a language that does
> not have functions 1st-class as expression forms with lexical closure
> hardly qualifies as functional (after all, *this* is the essence of
> FP);

I'm not pretending that I can give an unequivocal definition of "FP".
I'm pretty amazed that you think you can.
And, no, Erlang was widely considered a functional language even before
it had closures. I'd term that "weakly functional", in the sense of
"has most important properties of functional languages but not all".

I agree that any language that's designed for functional idioms should
have closures. Or 1st-class functions if you will.
However, I disagree that languages that don't match my personal
definition of "designed for functionalness" should be termed
"non-functional" - that would be quite silly. And, by the way, quite
arrogant - as if only the Pure and Chaste were allowed into the Heaven
of Functionalness. Things are far more pragmatic.
(Heck, I was just told that SML isn't a functional language because it
has side effects. That was the worst kind of language bigotry that I
ever saw, though I don't think that the author is a language bigot in
general. It's just a way of excluding concepts deemed "unworthy" (in
whatever sense) by definition - but exclusion by definition just helps
define camps, which is exactly the opposite of what I'm trying to do.)

> and 2nd, all of this would be rather irrelevant to marshalling: when
> a language does not support full closure, then of course marshalling
> won't either.

Agreed.
This was indeed the case for Erlang (until it acquired "funs", i.e.
lambda terms).

> The example you always seem to be alluding to is Erlang. But Erlang
> supports closures.

It didn't for a long time.

> OTOH, Erlang restricts marshalling to something like the
> semi-autonomous form of functions I talked about in my last posting.
> Namely, they must be defined at the module top-level. It does not
> support marshalling of closures.

I'm not sure about that right now. It's still one of the open questions.
Technically, I see no reasons why it shouldn't transfer closures, but
maybe they haven't gotten around to actually doing it.

> While this is a pity, IMO it certainly is no reason for making an
> extra entry in the survey. Just have an appropriate attribute for
> Functions.

Let's agree to disagree on this one.
The issue will be decided if the Survey shows a 100% correlation for the
answers. No reason to argue here; Reality has the last word on this one.

> But anyway, even if you want it, my main criticism was wrt your
> wording about "direct source representation" and functions that are
> "constructed at runtime". It sounds - and your arguments confirmed
> that - as if you wanted to make a distinction between closures
> constructed by different expression forms, which is simply
> non-sensical for any semantics and any implementation strategy I can
> think of (including Erlang's, which cannot marshall non-"composed"
> function closures either).

I wanted to differentiate between functions and things "that aren't
present in C". It's not easy to find a good definition of such terms;
literature either uses handwaving ("closures and similar things") or
over-precise, article-specific terminology that nobody outside the
theoretical camp would understand.
Finding a precise, generally understood terminology isn't easy.

> By all reasonable means, having a "direct source representation" is
> thus synonymous to (semi-)autonomous. So it would be more correct and
> much clearer if you simply talked about closures, instead of nebulous
> "composed functions".

I think that definition of "semi-autonomous" is part of that too-precise
terminology.
Actually, you lost me with the definition of an "autonomous function" -
I didn't see how autonomy relates to the distinction I'm trying to make
in the first place, much less semi-autonomy!


In general, I'm under the impression that you try to apply a designer's
approach to a classification problem.
If you say things like "1st, a language that does not have functions
1st-class as expression forms with lexical closure hardly qualifies as
functional (after all, *this* is the essence of FP)", this is a language
designer's point of view (and one that I happen to share). Yet this
opinion is irrelevant for a Survey: I'm classifying observed phenomena,
not trying to design or improve anything.
(The results of the Survey may help with improvements; I'd very much
like to hear about it if they do. However, if observations are influenced
by recommendations, the observations tend to become useless. The history
of science has taught us as much, and I intend to stick with that lesson.)

Regards,
Jo

Daniel C. Wang

1 Dec 2003, 18:01:28
Joachim Durchholz <joachim....@web.de> writes:

> Andreas Rossberg wrote:
> > Joachim Durchholz wrote:

{stuff deleted}


> >>> As I said, everything is a subexpression.
> >> I'm slowly getting that /this/ is the exact point where you and
> >> many (well, some) language designers disagree.
> > This might well be true. Two points though: 1st, a language that does
> > not have functions 1st-class as expression forms with lexical closure
> > hardly qualifies as functional (after all, *this* is the essence of
> > FP);
>
> I'm not pretending that I can give an unequivocal definition of "FP".
> I'm pretty amazed that you think you can.
> And, no, Erlang was widely considered a functional language even before
> it had closures. I'd term that "weakly functional", in the sense of
> "has most important properties of functional languages but not all".

I think it's fair to categorize languages based on the underlying
computational model of execution: if you wrote a high-level formal
semantics for a given language, what would be the most convenient model
of computation to use?

Imperative languages/C/Fortran/Algol
A von Neumann model of computation (i.e. memory with a stack and a stored program)

OO languages Smalltalk/Python/Java
An object calculus with either message sending or method dispatch

Functional programming languages SML/Scheme/Haskell
A lambda calculus based model

Declarative/Logic Prolog/Erlang/Mercury
A computational interpretation of logical clauses to model execution.

Erlang's roots are in the declarative/logical view of computation. Given
the strong connection between logic and computation, there are strong
connections between the declarative and functional views of things.

However, I hope the Erlangers out there don't get upset if I categorize
Erlang as a declarative concurrent language, and not call it a functional
language per se.

Erlang reminds me of Prolog on steroids more than a FPL.

Joachim Durchholz

2 Dec 2003, 09:03:56
Daniel C. Wang wrote:
>
> I think, it's fair to categorize languages, based on the underlying
> computational model of execution.

I agree with that.
It's just that everybody seems to end up with different sets of
languages. As I wrote, there are people who dispute that even SML is
functional...
Well, I'd actually agree with your classification, but I'd also agree if
somebody classified Erlang with functional languages - it did acquire
closures after all.
You cheated a bit by not categorizing Lisp, BTW ;-)

For a survey, I think including "fringe languages" is better than trying
to focus on "purely functional" languages. Given the many wars on what
is and isn't a functional language, I doubt there is a set of
non-arbitrary criteria that will be useful for deciding which language
should be included and which shouldn't.

My personal definition of a "functional language", for purposes of this
Survey, is "programming in a functional style in this language is
practical". The term "practical" is my vague loophole here :-)
In practice, I simply look whether the language community itself
considers the language functional. Since it's these communities that
will read the Survey results with interest, this is exactly the
criterion to go by anyway :-)

> OO languages Smalltalk/Python/Java An object calculus with either
> message sending or method dispatch

Just a quick aside note: message send and method dispatch are one and
the same.
("Message send" in Java is always synchronous: the sender will not
continue until the receiver has sent the answer. This is equivalent to
the usual stack-based discipline of dynamically-dispatched subroutine
calls.)
(Many Smalltalkers will violently disagree. There's a lot of ideology
involved here; there are important differences between Smalltalk and
other languages, but they have to do with Smalltalk's dynamic typing,
not with message sends. "Sending a message" is a different, and
sometimes useful, way to look at dynamic dispatch.)

> Declarative/Logic Prolog/Erlang/Mercury A computational
> interpretation of logical clauses to model execution.
>
> Erlang's roots are in the declarative/logical view of computation.
> Given the strong connection between logic and computation, there are
> strong connections between the declarative and functional views of
> things.

Erlangers typically would say that it started out as a logic language,
but the logic roots don't show except for some remnants of syntactic
sugar. (From what I know about the language, I agree: Erlang software
revolves around evaluating functions, not about finding a solution to a
set of constraints.)

> However, I hope the Erlangers out there don't get upset if I
> categorize Erlang as a declarative concurrent language, and not call
> it a functional language per se.
>
> Erlang reminds me of Prolog on steroids more than a FPL.

In what respects?
Actually I didn't find anything in it that reminded me of Prolog, but
I'm always willing to look at things from a new perspective.

Regards,
Jo

Andreas Rossberg

2 Dec 2003, 09:27:46
Joachim Durchholz wrote:
>>>
>>> And how is the data transferred? It becomes a byte stream, I'd
>>> think. Which is untyped.
>>
>> Only at a lower level of abstraction. And that would be precisely the
>> same level of abstraction at which the heap is an untyped byte array.
>> See?
>
> Yes, but the language can inspect the pickled data for a useful
> purpose, which it cannot do for the heap data.

There is no technical reason why a language couldn't do the same for the
heap, so this is not a technical argument. And I can see actual
applications, for example a heap inspector, or a garbage collector written
in the host language (which isn't any stranger than implementing pickle
compression in the host language, when you think about it).

>> Syntactically, functions are usually a subclass of expressions in
>> functional languages. That's essentially what makes them 1st class.
>
> *sigh* this may be true in the languages that are part of your background.
> I definitely know languages that define things otherwise.

Sure, me too, but do you consider them functional? ;-)

>>> Other languages sharply distinguish between functions and
>>> expressions.
>>
>> Can you give an example of a functional language that does? (And no,
>> in Erlang functions can be expressions. Erlang merely makes top-level
>> functions members of the respective module and names them that way.)
>
> There /is/ a difference, and a marked one. Things that accept a function
> don't accept subexpressions (actually, I don't remember anything in
> Erlang that will accept a subexpression in the first place).

I don't get this. I said functions are subexpressions, not the other way
round. So how does the above matter?

> Earlier versions of Erlang didn't even have closures, so the common
> abstraction on which both functions and subexpressions are based
> didn't even exist.

Yes, but things changed. And as Daniel pointed out, Erlang's origins don't
lie in the "functional world", so it is not surprising that it was
different once.

>> As I said, it has as direct or indirect a source representation as
>> any function with free variables has. This includes practically all
>> non-trivial functions. They all use the same representation:
>> closures. If my examples didn't make that clear by now then I don't
>> know what more to say. :(
>
> Yes, but the closure representing sin ln tan is constructed at runtime.

Again, like all closures. That does not make that example any different.

> I'm not pretending that I can give an unequivocal definition of "FP".
> I'm pretty amazed that you think you can.

I didn't, I merely said that 1st-class functions are at the core of it. I
would really be surprised if anybody disagreed with that today.

> And, no, Erlang was widely considered a functional language even before
> it had closures.

Maybe, maybe not, I'm not sure. But surely I don't want to start a fight
over it.

>> OTOH, Erlang restricts marshalling to something like the
>> semi-autonomous form of functions I talked about in my last posting.
>> Namely, they must be defined at the module top-level. It does not
>> support marshalling of closures.
>
> I'm not sure about that right now. It's one of the open questions (yet).
> Technically, I see no reasons why it shouldn't transfer closures, but
> maybe they haven't gotten around to actually doing it.

Technically, there are no strong reasons, I suppose. And if they did it,
they could immediately transfer everything you call "composed functions" as
well, because it is *exactly* the same mechanism. You see, that's my whole
point: there is no such difference like you are trying to make (between
closures and closures).

>> While this is a pity, IMO it certainly is no reason for making an
>> extra entry in the survey. Just have an appropriate attribute for
>> Functions.
>
> Let's agree to disagree on this one.
> The issue will be decided if the Survey shows a 100% correlation for the
> answers. No reason to argue here, Reality has the last word on this one.

OK with me.

> I wanted to differentiate between functions and things "that aren't
> present in C". It's not easy to find a good definition of such terms;
> literature either uses handwaving ("closures and similar things") or
> over-precise, article-specific terminology that nobody outside the
> theoretical camp would understand.
> Finding a precise, generally understood terminology isn't easy.

I have one: closures. ;-) No "similar things" exist, everything is just
closures. But I'm really repeating myself.

> Actually, you lost me with the definition of an "autonomous function" -
> I didn't see how autonomy relates to the distinction I'm trying to make
> in the first place, much less semi-autonomy!

Well, I made that term up on the fly to describe those functions that can be
represented *without* closures. I'm not aware of any established term, and
it sounded intuitive. Just call them "basic functions" if you prefer.

> In general, I'm under the impression that you try to apply a designer's
> approach to a classification problem.

I don't know. If you think so, I have no problem with that. Because:

> I'm classifying observed phenomena,
> not trying to design or improve anything.

Yes, but *how* do you classify them, if not according to the (relatively
well-understood) design space? Since your survey describes the features of
marshalling mechanisms it essentially describes their design choices.

ke...@ii.uib.no

2 Dec 2003, 11:07:16
Joachim Durchholz <joachim....@web.de> writes:

> Daniel C. Wang wrote:

>> Erlang reminds me of Prolog on steroids more than a FPL.

> In what respects?
> Actually I didn't find anything in it that reminded me of Prolog, but

Surely the syntax? :-)

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants

Daniel C. Wang

2 Dec 2003, 11:32:03

Joachim Durchholz <joachim....@web.de> writes:

> Daniel C. Wang wrote:
> >
> > I think, it's fair to categorize languages, based on the underlying
> > computational model of execution.
>
> I agree with that.
> It's just that everybody seems to end up with different sets of
> languages. As I wrote, there are people who debate that even SML is
> functional...
> Well, I'd actually agree with your classification, but I'd also agree if
> somebody classified Erlang with functional languages - it did acquire
> closures after all.
> You cheated a bit by not categorizing Lisp, BTW ;-)

Lambda-Prolog and Mercury have closures, yet I'd still consider them more
logical than functional.

{stuff deleted}

> Erlangers typically would say that it started out as a logic language,
> but the logic roots don't show except for some remnants of syntactic
> sugar. (From what I know about the language, I agree: Erlang software
> revolves around evaluating functions, not about finding a solution to a
> set of constraints.)

For me what makes a logical language "logical" is that you can read the
program as a set of constraints, and you can interpret the program's
execution as an attempt to find a solution to those constraints.

> > Erlang reminds me of Prolog on steroids more than a FPL.
>
> In what respects?
> Actually I didn't find anything in it that reminded me of Prolog, but
> I'm always willing to look at things from a new perspective.

Mainly the syntax.

Joachim Durchholz

2 Dec 2003, 16:46:48
Andreas Rossberg wrote:

> Joachim Durchholz wrote:
>
>>>>And how is the data transferred? It becomes a byte stream, I'd
>>>>think. Which is untyped.
>>>
>>>Only at a lower level of abstraction. And that would be precisely the
>>>same level of abstraction at which the heap is an untyped byte array.
>>>See?
>>
>>Yes, but the language can inspect the pickled data for a useful
>>purpose, which it cannot do for the heap data.
>
> There is no technical reason why a language couldn't do the same for the
> heap, so this is not a technical argument.

I don't see in what respect it matters whether an argument is technical
or not, unless there's a definition of "technical".

> And I can see actual
> applications, for example a heap inspector, or a garbage collector written
> in the host language (which isn't any stranger than implementing pickle
> compression in the host language, when you think about it).

Heap inspectors are debugging tools, which I consider irrelevant for
classification purposes: debugging is one of the occasions where
abstraction barriers are routinely broken.

A garbage collector also works below the level of abstraction that
applications operate at; actually I'm at a loss imagining GC
interoperating with the application that it's supposed to work with! (If
a VM is implemented in the language it's supposed to run, this doesn't
mean that the abstraction barriers become irrelevant.)

>>>Syntactically, functions are usually a subclass of expressions in
>>>functional languages. That's essentially what makes them 1st class.
>>
>>*sigh* this may be true in the languages that are part of your background.
>>I definitely know languages that define things otherwise.
>
> Sure, me too, but do you consider them functional? ;-)

Again:

A survey should be a passive observer if it is to be useful reading.

A survey should not set standards. Nor should it recommend anything.

>>>>Other languages sharply distinguish between functions and
>>>>expressions.
>>>
>>>Can you give an example of a functional language that does? (And no,
>>>in Erlang functions can be expressions. Erlang merely makes top-level
>>>functions members of the respective module and names them that way.)
>>
>>There /is/ a difference, and a marked one. Things that accept a function
>>don't accept subexpressions (actually, I don't remember anything in
>>Erlang that will accept a subexpression in the first place).
>
> I don't get this. I said functions are subexpressions, not the other way
> round. So how does the above matter?

Functions are not expressions, nor subexpressions.
At least not by the definition that most languages go by.
You can map subexpressions to functions, and you can extract a
subexpression from a function. This still doesn't make the concepts
identical.

And I've been wondering how this all relates to the Survey. This
discussion is turning in circles; I think there are some deep
misunderstandings at work, on both sides, but I flatly don't know where
exactly they lie.

My position is actually simple:
1) Functional languages tend to have two kinds of functional objects:
those that are defined as functions (using an equation or a "function
declaration"), and those that aren't (constructed at run-time).
2) There were languages that had different marshalling capabilities for
these two types of functional objects. Which makes this differentiation
relevant for the Survey.

As far as I understand your observations, you say it's nonsensical to
differentiate as described in (1); while I agree that this may be the
case, some languages have done so, other languages may do so in the
future, and if I don't want to rewrite the categories just because
another language with that distinction comes along I still have to carry
those categories. The bottom line of this being: your observation is
correct but entirely irrelevant.
In general, surveying a field is an observation, it's about recording
the useful along with the silly. Surveying is entirely unrelated to
giving recommendations.

>>Earlier versions of Erlang didn't even have closures, so the common
>>abstraction on which both functions and subexpressions are based
>>didn't even exist.
>
> Yes, but things changed.

Already addressed.

> And as Daniel pointed out, Erlang's origins don't
> lie in the "functional world", so it is not surprising that it was
> different once.

Other languages will acquire functional attributes, and will have to be
considered for the Survey even if they are in a transitional state.

Daniel's observation seems to be mainly about syntactic sugar, which I
don't consider relevant enough to classify a language. YMMV.

>>>As I said, it has as direct or indirect a source representation as
>>>any function with free variables has. This includes practically all
>>>non-trivial functions. They all use the same representation:
>>>closures. If my examples didn't make that clear by now then I don't
>>>know what more to say. :(
>>
>>Yes, but the closure representing sin ln tan is constructed at runtime.
>
> Again, like all closures. That does not make that example any different.

The compose sin (compose ln tan) closure is specified directly in source
code.
The reduced version, sin ln tan, is the result of a computation on the
original closure. It's /not/ directly present in the source code, it's a
value that's (at least conceptually) computed during evaluation of the
program.
The distinction of the Survey is between closures that are constructed
from the source vs. closures constructed by evaluation. (Which,
essentially, means that you say "closure" where I say "function", with
no essential technical difference - which leaves me wondering if there's
any real difference at all.)
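
Spelled out in Haskell notation (just a sketch of the distinction I
mean; Haskell writes ln as log):

    compose :: (b -> c) -> (a -> b) -> a -> c
    compose f g = \x -> f (g x)

    -- this closure is spelled out directly in the source:
    direct :: Double -> Double
    direct = compose sin (compose log tan)

    -- the same value, but only produced by running the program:
    computed :: Double -> Double
    computed = foldr1 compose [sin, log, tan]
      -- imagine the list arriving at runtime, e.g. chosen by user input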

>>I'm not pretending that I can give an unequivocal definition of "FP".
>>I'm pretty amazed that you think you can.
>
> I didn't, I merely said that 1st-class functions are at the core of it. I
> would really be surprised if anybody disagreed with that today.

*shrug* since I was told that SML is not a functional language, I'm
prepared to hear /anything/ about such issues.

>>And, no, Erlang was widely considered a functional language even before
>>it had closures.
>
> Maybe, maybe not, I'm not sure. But surely I don't want to start a fight
> over it.

You're already doing that.

>>>OTOH, Erlang restricts marshalling to something like the
>>>semi-autonomous form of functions I talked about in my last posting.
>>> Namely, they must be defined at the module top-level. It does not
>>>support marshalling of closures.
>>
>>I'm not sure about that right now. It's still one of the open questions.
>>Technically, I see no reasons why it shouldn't transfer closures, but
>>maybe they haven't gotten around to actually doing it.
>
> Technically, there are no strong reasons, I suppose.

Let me just quote your exchange with Marcin:


>> I disagree. It's easy to imagine the case when toplevel named
>> functions are marshalled by name, and anonymous (or local)
>> functions can't be marshalled at all.
>
> It may be easy to imagine, but I'd consider it a rather poor and
> unnatural strategy for a functional language, so I doubt you will
> find something like it in practice (note that Joachim explicitly
> restricted the scope of his survey to functional languages).

Both of your assumptions (that something like that cannot be found, and
that this is not what I had in mind) don't hold.

> And if they did it,
> they could immediately transfer everything you call "composed functions" as
> well, because it is *exactly* the same mechanism. You see, that's my whole
> point: there is no such difference like you are trying to make (between
> closures and closures).

Again, let me quote one of your exchanges, this time with Fergus:


>>>> the only sane distinction you can make is between support for
>>>> "top-level" functions only (those that don't contain references
>>>> to their environment), and support for full closures. But I
>>>> really don't see why this is worth an extra category. After all,
>>>> a functional language incapable of dealing with closures is not
>>>> worth the name. And with respect to marshalling, the closure is
>>>> the easy part.
>>>>

>>> Not necessarily. Closures are a form of existential typing: knowing
>>> what the type of a closure is does not tell you what the
>>> types of the components of that closure are.
>>>

>>> For example, suppose you have a closure of type Int->Int whose
>>> value is represented as the pair {f,x}, meaning (\y -> f x y). This
>>> closure has been formed by partially applying a function f of type
>>> foo->Int->Int, for some foo, and its representation consists of a
>>> pair: a reference to the function f, and a value x of type foo.
>>>
>>> It may be easy to marshall and unmarshall f, but if you don't have
>>> the appropriate RTTI information around at run-time, it may not be
>>> possible to determine the type of the partially applied argument x.
>>> This implies that it may not be possible to marshal or unmarshal
>>> x, because it may not be possible to marshal or unmarshal values
>>> without knowing their types.
>>>
>>> Some language implementations may prefer not to keep such RTTI
>>> information around at run-time, since there is a (small) cost to
>>> doing so, and the cost might be incurred even in cases where the
>>> information isn't actually used.


>>
>> I know, but wouldn't you agree that coping with that problem is
>> easier than the code externalisation issue? In particular, since
>> full-fledged marshalling mechanisms have to deal with it in other
>> places already.


>
> Easier than a _good_ solution to the code externalisation issue, yes,
> but not easier than just writing out the code address, for example.

Which pretty much sums it up: given the trade-offs that you personally
have, a full solution isn't "more difficult" than a partial one, but
other people have other trade-offs. (Just transmitting code pointers is
one example of what a different trade-off might look like; there are
others.)
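
Fergus's point, spelled out in Haskell notation for concreteness (a
sketch using GHC's ExistentialQuantification extension; the names are
made up):

    {-# LANGUAGE ExistentialQuantification #-}

    -- a partial application {f, x}: the type of the captured x is
    -- hidden, the closure's own type only says Int -> Int
    data PApp = forall foo. PApp (foo -> Int -> Int) foo

    apply :: PApp -> Int -> Int
    apply (PApp f x) y = f x y

    -- marshalling a PApp would mean marshalling x without knowing its
    -- type statically - hence the need for RTTI at run-time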

>>I wanted to differentiate between functions and things "that aren't
>>present in C". It's not easy to find a good definition of such terms;
>>literature either uses handwaving ("closures and similar things") or
>>over-precise, article-specific terminology that nobody outside the
>>theoretical camp would understand.
>>Finding a precise, generally understood terminology isn't easy.
>
> I have one: closures. ;-) No "similar things" exist, everything is just
> closures. But I'm really repeating myself.

Well, OK, it's essentially all closures, of course. It's just that
different people know them by different names (mostly because there are
different standard ways of constructing them).

>>I'm classifying observed phenomena,
>>not trying to design or improve anything.
>
> Yes, but *how* do you classify them, if not according to the (relatively
> well-understood) design space?

Of course they should be classified according to design space.
It's just that people are introducing more differences than you're
willing to accept. That's OK, it's just that I have to stick with the
general public, not with the ideas of a single person, be it you, me, or
Simon Peyton-Jones.

Regards,
Jo

Fergus Henderson

3 Dec 2003, 01:43:49
"Daniel C. Wang" <danwan...@cs.princeton.edu> writes:

>For me what makes a logical language "logical" is that you can read the
>program as a set of constraints, and you can interpret the program's
>execution as an attempt to find a solution to those constraints.

But that definition applies just as well to functional languages,
doesn't it? Function definitions are equations, and program execution
is finding a solution to those equations.

Ulf Wiger

3 Dec 2003, 07:20:22
"Daniel C. Wang" <danwan...@cs.princeton.edu> writes:

> Joachim Durchholz <joachim....@web.de> writes:
>
> > Erlangers typically would say that it started out as a logic language,
> > but the logic roots don't show except for some remnants of syntactic
> > sugar. (From what I know about the language, I agree: Erlang software
> > revolves around evaluating functions, not about finding a solution to a
> > set of constraints.)
>
> For me what makes a logical language "logical" is that you can read the
> program as a set of constraints, and you can interpret the program's
> execution as an attempt to find a solution to those constraints.

To quote from Bjarne Däcker's excellent thesis,
"Concurrent Functional Programming for Telecommunications:
A Case Study of Technology Introduction"
(Sorry, I can't seem to find a link to the document.)

"Joe Armstrong led the next round of experiments and started
with Prolog, because of its terse and clear style, and added
concurrency. However, the language began to change in the
direction of a functional style [1]. For example, one of
the key characteristics of Prolog is backtracking and this
could not be used because it is not possible to backtrack
over hardware (a tone signal once sent cannot be taken back
[2])."


[1] Joe Armstrong, Robert Virding, Mike Williams,
"Use of Prolog for Developing a New Programming
Language", _The Practical Application of Prolog_.
London, April 1-3, 1992.

[2] Joe Armstrong, Nabiel Elshiewy and Robert Virding.
"The Phoning Philosophers' Problem of Logic
Programming for Telecommunications Applications".
_Third IEEE Symposium on Logic Programming_.
Salt Lake City, Sept 23-26, 1986.

From some early document describing the POTS programming
experiments, I found the following passage in the
conclusions.

"Functional programming is the purest form of programming.
It is easy to build and combine functions thus preserving
correctness. It is also easy to reason about the correctness
of such functions by using simple algebraic laws. However,
the question remains if it is sufficient to deal with all
types of applications. Moreover, the parallel communicating
functions which were used still behave and look very much
like imperative systems."

"Logic programming and the rule based system gave the most
unusual new approach to the problem with elegant solutions
to some aspects of the application. However, we know very
little of how to use these techniques for really large
systems. Systems of thousands of logic clauses might be
as unmanageable as huge systems of 'spaghetti' programming."

I think the paper is from 1986, but I don't have a reference.
The functional programming language used was an experimental
one, FPL, from the Chalmers Group. The logic programming
language was LPL from the Swedish Royal Institute of
Technology (Seif Haridi), and the rule based system was
OPS4 from Carnegie Mellon University.

/Uffe
--
Ulf Wiger, Senior Specialist,
/ / / Architecture & Design of Carrier-Class Software
/ / / Strategic Product & System Management
/ / / Ericsson AB, Connectivity and Control Nodes

Andreas Rossberg

3 Dec 2003, 08:01:22
Joachim,

I don't want to sound harsh or arrogant - although that already has happened
long ago, I suppose :(. But IMNSHO you really have a fundamental
misconception of how the closure mechanism is utilized in implementations
of higher-order languages. And you keep on drawing wrong conclusions from
that.

Trying to explain that stuff has already taken up far too much of my time,
and we have been going in circles for some time now, so this will probably
be my last attempt to clarify things.

Joachim Durchholz wrote:
>
> >>Yes, but the language can inspect the pickled data for a useful
> >>purpose, which it cannot do for the heap data.
> >
> > There is no technical reason why a language couldn't do the same for the
> > heap, so this is not a technical argument.
>
> I don't see in what respect it matters whether an argument is technical
> or not, unless there's a definition of "technical".

If you want, replace "technical argument" with "argument relevant in a
technical context".

> > Sure, me too, but do you consider them functional? ;-)
>
> Again:
>
> A survey should be a passive observer if it is to be useful reading.

You explicitly excluded non-functional languages from your survey, so you
obviously make a distinction yourself.

> My position is actually simple:
> 1) Functional languages tend to have two kinds of functional objects:
> those that are defined as functions (using an equation or a "function
> declaration"), and those that aren't (constructed at run-time).
> 2) There were languages that had different marshalling capabilities for
> these two types of functional objects. Which makes this differentiation
> relevant for the Survey.
>
> As far as I understand your observations, you say it's nonsensical to
> differentiate as described in (1);

Right.

> while I agree that this may be the
> case, some languages have done so,

I say it's nonsensical because no language (with local functions) I have
ever encountered or I can imagine has done so.

You are simply differentiating the wrong thing. I agree that there is a
difference - and have said so. But in short, it is not declaration vs.
expression as you propose, it is basically global (top-level, basic,
autonomous) vs. local (nested, with free variables, requiring closures).

Local functions may be declared and named, just like global functions, but
still require closures - which are constructed at runtime, in exactly the
same basic way the closure for a lambda is constructed.

> The compose sin (compose ln tan) closure is specified directly in source
> code.
> The reduced version, sin ln tan, is the result of a computation on the
> original closure.

A source expression is not a closure. A closure is the runtime value
representing the result of evaluating a (functional) expression. The only
way to represent the resulting sin ln tan is by such a closure (in fact, 2
nested closures). A closure is always a value; it cannot be "reduced" any
further.

> It's /not/ directly present in the source code, it's a
> value that's (at least conceptually) computed during evaluation of the
> program.
> The distinction of the Survey is between closures that are constructed
> from the source vs. closures constructed by evaluation.

And that's what I make out as the fundamental misconception. All closures
are constructed at runtime. And all are constructed "from the source"! It
does not matter how the respective function is written down in source code.
I never have seen a functional language that makes a distinction between
closures constructed by different syntactic phrases, nor can I imagine any
implementation strategy that would profit from that.

> (Which,
> essentially, means that you say "closure" where I say "function", with
> no essential technical difference - which leaves me wondering if there's
> any real difference at all.)

A closure is the *representation* of a functional value (or of a thunk in
lazy languages). "Function" is only a semantic concept, functions don't
exist as such in implementations.

At the syntax level, "function" is basically a synonym for lambda expression, as
far as functional languages are concerned.

> Let me just quote your exchange with Marcin:
>>> I disagree. It's easy to imagine the case when toplevel named
>>> functions are marshalled by name, and anonymous (or local)
>>> functions can't be marshalled at all.
>>
>> It may be easy to imagine, but I'd consider it a rather poor and
>> unnatural strategy for a functional language, so I doubt you will
>> find something like it in practice (note that Joachim explicitly
>> restricted the scope of his survey to functional languages).
>
> Both of your assumptions (that something like that cannot be found, and
> that this is not what I had in mind) don't hold.

OK, agreed with the 1st part: Erlang arguably is a language that makes a
distinction between closures and global functions wrt marshalling. But I
maintain my primary statement, namely that this is *not* the distinction
you describe. And I don't think that Marcin disagreed with that part.

> Again, let me quote one of your exchanges, this time with Fergus:
>>>>> the only sane distinction you can make is between support for
>>>>> "top-level" functions only (those that don't contain references
>>>>> to their environment), and support for full closures. But I
>>>>> really don't see why this is worth an extra category. After all,
>>>>> a functional language incapable of dealing with closures is not
>>>>> worth the name. And with respect to marshalling, the closure is
>>>>> the easy part.
>>>>>
>>>> Not necessarily. Closures are a form of existential typing: knowing
>>>> what the type of a closure is does not tell you what the
>>>> types of the components of that closure are.

Again, this is *not* about the distinction you describe, but about the
global vs. local distinction. And again, Fergus didn't disagree with my
central point.

So from what I can see, both of these quotes do not make your point.

> Well, OK, it's essentially all closures, of course. It's just that
> different people know them by different names (mostly because there are
> different standard ways of constructing them).

There aren't.

>>>I'm classifying observed phenomena,
>>>not trying to design or improve anything.
>>
>> Yes, but *how* do you classify them, if not according to the (relatively
>> well-understood) design space?
>
> Of course they should be classified according to design space.

AFAICS, this contradicts your previous critique of me trying exactly that.

Joachim Durchholz

3 Dec 2003, 15:56:11
Andreas Rossberg wrote:
> Joachim,
>
> I don't want to sound harsh or arrogant - although that already has happened
> long ago, I suppose :(.

Likewise. It's not always easy to keep a level mood.

> But IMNSHO you really have a fundamental
> misconception of how the closure mechanism is utilized in implementations
> of higher-order languages. And you keep on drawing wrong conclusions from
> that.

I disagree with both the observation and your conclusion.

My impression was that we were looking at the same issues with
fundamentally different perspectives, and never got the terminology and
the base assumptions spelled out and cleaned up.

However, I'm happy to agree to disagree.

> Trying to explain that stuff has already taken up far too much of my time,
> and we have been going in circles for some time now, so this will probably
> be my last attempt to clarify things.

Likewise.

Regards,
Jo
