I have been shown how useful it can be to store functions inside data
structures, and while looking at data serialization for the purpose of
persistence I wondered "since functions are just values in Haskell, why
can't we persist them, too?".
To me the idea has interesting implications. For example, an arbitrary
program could simply be represented by a serialization of `IO ()`. In fact,
you could load any program into memory from a file and use
Control.Concurrent.forkIO to run it, and later kill it, giving you the
beginnings of an operating environment. If such a serialization
is architecture independent then to my understanding you have the beginnings
of a virtual machine. You could break your program into pieces and store
them in a database and load them when needed, or even pull updates to each
piece individually from down the web etc, enabling interesting methods of
software distribution. It would make very cool stuff possible.
I have had a look at hs-plugins, but it is unclear how to derive a simple
pair of functions `(a -> b) -> ByteString` and `ByteString -> Either
ParseError (a -> b)`, for example, from the functionality it provides, if it
is possible at all. I guess such a thing requires thorough digging into the
depths of GHC, (or maybe even LLVM if
an architecture independent representation is sought, but I don't know
enough to say.). Perhaps this is more a question for those interested and
knowledgable in Haskell compilation (and, to some extent, decompilation).
If not Haskell, are there any languages which provide a simple serialization
and deserialization of functions?
As far as I know, GHC has no support for this. There are issues with
the idea that will come out pretty fast, such as:
(1) Those cannot be pure functions, because it differentiate
denotationally equal functions. So it would have to be at least (a ->
b) -> IO ByteString.
(2) What if you tried to serialize a filehandle or an FFI Ptr?
But my answers are "Ok" and "Then you get a runtime error",
respectively. It is by no means impossible, IIRC Clean does it. I
think it's pretty dumb that we don't have support for this yet.
Purely functional languages have a unique disposition to be very good
at this. But oh well, there aren't enough tuits for everything.
A more elaborate answer to (2) is "you get a runtime error when you
try to *use* the thing that was impossible to serialize". This makes
sure that you don't get an error if you are trying to serialize \x ->
const x a_filehandle_or_something, which is just the identity
function.
Luke
_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
This is more or less the idea in pseudocode:
type FuncExpr= String
data F a = F FuncExpr a
apply (F _ f) x= f x
instance Show (F a) where show (F str _)= str
instance Read (F a) where read (F str f)= eval f >>= F str
2010/11/11 Jesse Schalken <jesses...@gmail.com>
sorry
2010/11/11 Alberto G. Corona <agoc...@gmail.com>
Napier88 was a persistent language that also had higher-order
functions. I've no experience other than reading about it but as its
persistence was "orthogonal persistence" I'd expect HOFs to be
persistent. The implementation of Napier88 produced a substantial
runtime / persistent store that was used for other languages - I think
one was Persistent Haskell, certainly one was Staple which was a
higher order language.
Tycoon2 was a similar persistent language - it was heavily influenced
by ML so potentially it had HOFs.
PolyML has a persistent store, though this may have been just for the
top-level to freeze bindings I've no idea whether it supported
serializing HOFs.
Clean supports serialized HOFs as does Oz, see the paper below. Kali
Scheme supported migration of running code between networked computers
- as it was a Scheme I'd expect it to be higher order (the migration
would mandate serialization).
http://www-systems.cs.st-andrews.ac.uk/wiki/Napier88
http://www.polyml.org/FAQ.html
http://www.st.cs.ru.nl/papers/2003/verm2003-LazyDynamicIO.pdf
> On Thu, Nov 11, 2010 at 12:53 AM, Jesse Schalken
> <jesses...@gmail.com> wrote:
>> I have had a look at hs-plugins, but it is unclear how to derive a simple
>> pair of functions `(a -> b) -> ByteString` and `ByteString -> Either
>> ParseError (a -> b)`, for example, from the functionality it provides, if it
>> is possible at all. I guess such a thing requires thorough digging into the
>> depths of GHC, (or maybe even LLVM if
>> an architecture independent representation is sought, but I don't know
>> enough to say.). Perhaps this is more a question for those interested and
>> knowledgable in Haskell compilation (and, to some extent, decompilation).
>> If not Haskell, are there any languages which provide a simple serialization
>> and deserialization of functions?
>
> As far as I know, GHC has no support for this. There are issues with
> the idea that will come out pretty fast, such as:
>
> (1) Those cannot be pure functions, because it differentiate
> denotationally equal functions. So it would have to be at least (a ->
> b) -> IO ByteString.
I don't think I agree, I didn't see a rule f == g => serialise f == serialise g anywhere.
Bob_______________________________________________
The rule a == b => f a == f b is called referential transparency (for
denotational equality, not Eq) and is (perhaps the most important)
part of what is meant by "purely functional".
--Max
That equal things can be substituted for one another is one of the fundamental
ideas of equality (the other is that anything is equal to itself). In fact, in
second order logic, equality can be *defined* as (roughly):
(x = y) means (forall P. P x -> P y)
That is, x is equal to y if all predicates satisfied by x are also satisfied
by y. Using this, one can derive other typical laws for equality. Transitivity
is pretty easy, but surprisingly, even symmetry can be gotten:
If P z is z = x, using substitution we get x = x -> y = x,
and x = x is trivially true.
And of course, we also get congruence:
Given a function f, let P z be f x = f z,
substitution yields f x = f x -> f x = f y,
where f x = f x is again trivial.
The equality that people typically expect to hold for Haskell expressions is
that two such expressions are equal if they denote the same thing, as Max
said. Expressions with function type denote mathematical functions, and so if
we have something like:
serialize :: (Integer -> Integer) -> String
it must be a mathematical function. Further, its arguments will denote
functions, to, and equality on mathematical functions can be given point-wise:
f = g iff forall x. f x = g x
Now, here are two expressions with type (Integer -> Integer) that denote equal
functions:
\x -> x + x
\x -> 2 * x
So, for all this to work out, serialize must produce the same String for both
of those. But in general, it isn't possible to decide if two functions are
point-wise equal, let alone select a canonical representative for each
equivalence class of expressions that denote a particular function. So there's
no feasible way to implement serialize without breaking Haskell's semantics.
How making serialize :: (Integer -> Integer) -> IO String solves this? Well,
that's another story.
-- Dan
What I'm wondering is if it would actually break things if serialize would not produce the same String for these functions. The reasoning above is used regularly to shoot down some really useful functionality. So what would go wrong if we chose to take the practical path, and leave aside the theoretical issues?
Also, the above two functions might not be exactly denotationally equal if the type is (Float -> Float), or the speed or memory use might be different. Could it not be that requiring them to be equal could just as well break things?
greetings,
Sjoerd Visscher
Yeah, my sense -- but correct my if I'm reading the original post
incorrectly -- is that the whole thing with function equality is a
distraction and not really relevant here.
It is true that (a) per Luke Palmer, if we could serialize equal
functions to equal representations then we could we could decide
whether two pure functions were equal, which (if not done in the IO
monad) would(?) break purity; and (b) per Dan Doel, if we wanted to
implement our serialization in a way so that equal functions get equal
representations, we couldn't do it, because it's an impossible
problem.
But these sort of cancel each other out, because (a) it's an
impossible problem, and (b) we don't want to do it.
A function which does "x+x" would simply be serialized as doing x+x,
and a function which does "x*2" would be serialized as doing x*2, and
when deserialized the resulting functions would continue to do those
things, and it would be completely agnostic and indifferent as to
whether or not they are in fact equal.
Obviously there are questions here with regards to the functions which
the to-be-serialized function makes use of -- should they be
serialized along with it? Required to be present when it is
deserialized? Is it OK for the function to do something different when
it is loaded compared to when it was stored if its environment is
different, or not OK?
>
> Also, the above two functions might not be exactly denotationally equal if the type is (Float -> Float), or the speed or memory use might be different. Could it not be that requiring them to be equal could just as well break things?
>
> greetings,
> Sjoerd Visscher
>
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskel...@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
--
Work is punishment for failing to procrastinate effectively.
> Obviously there are questions here with regards to the functions which
> the to-be-serialized function makes use of -- should they be
> serialized along with it? Required to be present when it is
> deserialized? Is it OK for the function to do something different when
> it is loaded compared to when it was stored if its environment is
> different, or not OK?
I would have say Yes, No, No. At the moment, when you serialise data
structure A which references data structure B which references data
structure C, using Data.Binary for example, the whole lot (A, B, and C) gets
serialised, so that the resulting deserialization of A is
denotationally equivalent to the original, regardless of the environment. I
don't see why this shouldn't be the case for functions also.
So a serialized function should include all its direct and indirect callees.
This might result in potentially simple functions ending up enormous when
serialized, simply because the call graph, including all it's libraries and
their libraries etc, is that size, but such would be pure function
serialization.
This raises the question of what is left. The assembled machine code? For
the architecture of the serializer or of the deserializer? Or LLVM IR
for architecture independence? C--? Core? I don't know, but it would be
awesome for the serialized representation to be both low-level and
architecture independent, then having it JIT compiled when it is
deserialized. To me, this means a virtual machine, which I guess is what you
need when you want fast mobile code, but I'm just musing here as I know
little about programming language implementation.
I'm not an expert here either, I'll just note that LLVM is only
platform independent to a degree. Or rather, I believe the situation
is that it *is* architecture independent, but it doesn't abstract
anything else besides the architecture -- so if you have any other
differences, it's not going to work. LLVM IR doesn't do #ifdefs.
..and FFI imports are something you definitely can't serialize, so
that's one case where you either have it say "call the linked-in
function with the name 'whatever'" and hope it's the same one, or just
disallow it entirely.
>
> Regards,
> Malcolm
Yes. It would break the (usual) mathematical underpinnings of Haskell.
> The reasoning above is used regularly to shoot down some really useful
> functionality. So what would go wrong if we chose to take the practical
> path, and leave aside the theoretical issues?
You would lose many uses of equational reasoning in your programs. Have you
every substituted 'x * 2' for the expression 'x + x' in one of your programs,
or vice versa? You can no longer do that, because someone may be serializing
the function you're writing, checking how it's implemented, and relying it.
You'd lose the whole notion of 'the category of haskell types and functions'
goodbye, too. Does f . id = f? Not if the former serializes as "f . id". We
already have a weak case of this, since (\x -> undefined x) can be
distinguished from undefined using seq, but that can be hand-waved away by not
worrying about bottoms so much. That isn't going to work for serialize. There
does exist practical Haskell stuff from category theory inspired people.
As a digression... When you get into things like dependent type theory,
equality is actually incorporated into the language in a much more direct way.
And there are various sorts of equality you can add to your type theory. Two
common choices are:
intensional equality: two values are provably equal if they evaluate to the
same normal form
extensional equality: this incorporates non-computational rules, like the
point-wise equality of functions.
Now, in a type theory where equality is intensional, I cannot prove:
(forall x. f x = g x) -> f = g
However, both these equalities (and others in between and on either side) are
*compatible*, in that I cannot disprove the above theorem in an intensional
theory.
What seq and serialize do is break from extensional equality, and allow us to
disprove the above (perhaps not for seq within a hypothetical theory, since
the invalidating case involves non-termination, but certainly for serialize).
And that's a big deal, because extensional equality is handy for the above
reasoning about programs.
> Also, the above two functions might not be exactly denotationally equal if
> the type is (Float -> Float), or the speed or memory use might be
> different. Could it not be that requiring them to be equal could just as
> well break things?
I would not be at all surprised if they are unequal for Float -> Float,
considering how ugly floating point types are. That doesn't justify breaking
equality for every other type.
Speed and memory use are typically disregarded for equality of (expressions
denoting) functions. Merge sort and bubble sort are two algorithms for
computing the same function, even though they vary wildly in these two
characteristics. If you want a calculus for precisely reasoning about resource
usage, then you would not necessarily be able to consider these equal.
But, Haskell is simply not such a calculus, and I don't think, "what if it
were," is a good argument for actively breaking extensional equality in such a
dramatic way (since the breakage would have nothing to do with precise
reasoning about resource usage).
G�bor Lehel wrote:
> It is true that (a) per Luke Palmer, if we could serialize equal
> functions to equal representations then we could we could decide
> whether two pure functions were equal, which (if not done in the IO
> monad) would(?) break purity; and (b) per Dan Doel, if we wanted to
> implement our serialization in a way so that equal functions get equal
> representations, we couldn't do it, because it's an impossible
> problem.
>
> But these sort of cancel each other out, because (a) it's an
> impossible problem, and (b) we don't want to do it.
They do not cancel each other out. Rather, it goes like this:
a) Serializing functions gives you a decision procedure for function equality.
b) It is impossible to decide extensional equality of functions.
Together, these mean that any serialization procedure you define is *wrong*,
in that it violates the semantics of pure Haskell expressions (the resulting
decision procedure will answer 'False' for some pair of extensionally equal
expressions f and g).
The way IO can get around this is that when you have:
serialize (\x -> x + x :: Integer) :: IO String
serialize (\x -> 2 * x :: Integer) :: IO String
you can say that the two IO actions are equal in the following sense:
1) There is an equivalence class of expressions denoting the function in
question. This equivalence class contains (\x -> x + x) and
(\x -> 2 * x)
2) The IO action produced by serialize selects a string representation
for one of these expressions at random*.
*) You probably won't ever observe "\x -> x + x" being the output of
(serialize (\x -> 2 * x)), but that's just a coincidence.
This explanation is not available if serialize has type
(Integer -> Integer) -> String, however. Those have to be mathematical
functions (of a sort) that select a unique representation for each equivalence
class, or else we have to get a completely different (and much less nice)
denotational semantics for Haskell.
----
Perhaps I missed it, but: has anyone explained why:
(a -> b) -> IO String
is an unacceptable type? Why does it need to be just String? Folks often ask,
"what benefit is purity?" Well, what is the benefit of making this function
impure? GHC has a lot of normally impure functionality incorporated in a way
that is mathematically sound (though not ideal). What are the things that have
been shot down? Or does IO, ST, etc. qualify as getting shot down?
-- Dan
This was my first thought as well! However, reading to/from a file
would of course be in IO, at which point you'd be free to read the
file back in through normal means to get at the representation. So in
that respect, this is equivalent to (a -> b) -> IO String.
Outside of IO, it would pretty much have to be limited to serializing
and deserializing. You'd be able to create opaque tokens representing
functions, pass them around, and/or extract the function in order to
apply it. Conveniently, it turns out that Haskell already has support
for this, you can implement it as follows:
> module Serialize.Pure (OpaqueFunction, serialize, deserialize) where
>
> newtype OpaqueFunction a b = Opaque { deserialize :: a -> b }
>
> serialize = Opaque
Toss in some existential types as desired, if you want to hide the
function's actual type.
I suppose one could object that this isn't actually serializing
anything at all; to which I would respond that, in pure code, how do
you expect to tell the difference?
- C.
> On Thursday 11 November 2010 6:22:06 am Sjoerd Visscher wrote:
>
>> The reasoning above is used regularly to shoot down some really useful
>> functionality. So what would go wrong if we chose to take the practical
>> path, and leave aside the theoretical issues?
>
> You would lose many uses of equational reasoning in your programs. Have you
> every substituted 'x * 2' for the expression 'x + x' in one of your programs,
> or vice versa? You can no longer do that, because someone may be serializing
> the function you're writing, checking how it's implemented, and relying it.
Yes, but it would not break any existing code. It would only break code that knowingly did the wrong thing.
> We already have a weak case of this, since (\x -> undefined x) can be
> distinguished from undefined using seq, but that can be hand-waved away by not
> worrying about bottoms so much. That isn't going to work for serialize.
Why not?
--
Sjoerd Visscher
http://w3future.com
I don't know to what extent it would apply in this hypothetical situation,
but ghc (and probably other compilers) rely upon Haskell's semantics in
performing various code transformations. If you break the semantics some
transformations become invalid, resulting in incorrect code.
I've experienced this with code that violated ref. transparency. The
program behavior changed depending on the compiler's optimization settings.
I'm not keen to go back to that.
I think the only way this would work would be if you consider functions to
be equal only to themselves, i.e. "x+x" and "2*x" are not equal. That's not
a trade-off I would be willing to make.
John
> I don't know to what extent it would apply in this hypothetical situation, but ghc (and probably other compilers) rely upon Haskell's semantics in performing various code transformations. If you break the semantics some transformations become invalid, resulting in incorrect code.
>
> I've experienced this with code that violated ref. transparency. The program behavior changed depending on the compiler's optimization settings. I'm not keen to go back to that.
Then don't do that. Being able to serialize functions is just as dangerous as having unsafePerformIO. If you don't use it, you don't have problems.
On Thu, Nov 11, 2010 at 11:54 AM, Sjoerd Visscher <sjo...@w3future.com> wrote:
>> You would lose many uses of equational reasoning in your programs. Have you
>> every substituted 'x * 2' for the expression 'x + x' in one of your programs,
>> or vice versa? You can no longer do that, because someone may be serializing
>> the function you're writing, checking how it's implemented, and relying it.
>
> Yes, but it would not break any existing code. It would only break code that knowingly did the wrong thing.
Or code that unknowingly depends transitively on code that does the
wrong thing. In that regard it would be much like unsafePerformIO, and
about as trustworthy. Better off just having any such "serialize" be
safely in IO, and let people who want to live dangerously just use
unsafePerformIO to get around it.
>> We already have a weak case of this, since (\x -> undefined x) can be
>> distinguished from undefined using seq, but that can be hand-waved away by not
>> worrying about bottoms so much. That isn't going to work for serialize.
>
> Why not?
I'd venture that perhaps because seq only behaves differently when one
possible outcome is _|_. An unsafe serialize could distinguish between
two non-bottom values, which means the sketchy behavior could be free
to wreak havoc in code that's not crashing.
For instance, assuming serialize can be applied to functions of any
type, it would probably be trivial to write a function (isExpr :: a ->
Bool) that reports whether an arbitrary term is a primitive value or
the result of some expression, which then lets you write a function
with type (forall a. a -> a) that is NOT equivalent to id, which could
then be passed freely into any other piece of code you like. That
sounds like buckets of fun, doesn't it?
- C.
But it's not the type of the serialized value that's at issue, it's
the type of the serializable values. Anything that lets you convert an
arbitrary closure into something with internals open to inspection
will likely have dire consequences for parametricity and referential
transparency. Remember, the problem isn't what you do with the
serialized form itself, it's what you can learn via it about the
original value it was serialized from. To retain sanity, either "types
that can be serialized" must be marked explicitly (perhaps in the
context, similar to having a Data.Typeable constraint) to indicate
potential non-parametric shenanigans, or the result of serializing and
inspecting a value must be quarantined off, such as with IO. Or some
other mechanism, but those seem like the obvious choices.
Having a full serialization function without some restriction along
those lines would be like renaming unsafePerformIO to runIO, moving it
to Control.Monad.IO, and telling people "hey, just don't misuse this
and everything will be okay".
You mean, like Data.Binary?
Cheers,
Greg
> But I don't see that you need introspection at user level for
> persistence, a dynamic type will do, thus the internals aren't open to
> inspection. Whatever introspection is necessary can be handled by the
> runtime system as in Clean and Persistent Haskell. You could look at
> the internals of a pickle with a binary editor but that's perhaps
> cheating.
>From a quick glance at the paper, the Haskell version is referentially
transparent in the standard, trivial sense: the persistence operations
all return IO actions. This is of course perfectly fine. What started
this thread, however, was the idea of a serialization function
producing something like a pure ByteString, and why that, as opposed
to (IO ByteString), would be extremely problematic.
What it boils down to is just that any pure "serialization" function
would necessarily do nothing useful. Serializing closures from IO
actions, on the other hand, I think is a great idea, though probably
difficult to implement!
In general, it doesn't even have to be based on a mathematical identity. As
has been stated, this would in general simply break referential transparency.
Are these two functions equal:
f x = k (h x) (h x)
g x = let y = h x in k y y
Presumably, no, if serialize exists (and they may have different performance
characteristics).
You cannot factor out or inline subexpressions or without the difference being
observable (presumably) by serialize.
-- Dan
Well, if we added a function that randomly scrambled GHC's heap memory, it
wouldn't break any existing code, because none would use it. :)
> > We already have a weak case of this, since (\x -> undefined x) can be
> > distinguished from undefined using seq, but that can be hand-waved away
> > by not worrying about bottoms so much. That isn't going to work for
> > serialize.
>
> Why not?
Because, there is an argument to be made in the seq case that no one really
cares about the differences it introduces. I don't usually care how my code
works on bottoms, except inasmuch as it determines various performance
characteristics of the code*, or whether it works on infinite values. I try,
generally speaking, to write total functions, and run them on well-defined
inputs. So when I reason about the programs, it is this aspect that I care
most about, not about what happens when I feed in undefineds.
There are even folks that have worked on making this perspective rigorous.
See, Fast and Loose Reasoning is Morally Correct.
serialize is not at all the same in this regard. There is no class of
functions that is immune to its inspection powers, presumably, because that's
its whole point. But that also means that there is no class of functions for
which we are justified in reasoning equationally using the standard
extensional equality. The only way that would be justified is saying,
"serialize doesn't exist."
> Then don't do that. Being able to serialize functions is just as dangerous
> as having unsafePerformIO. If you don't use it, you don't have problems.
And unsafePerformIO's very name suggests that you're breaking things when you
use it. It comes with lots of caveats, and the Haskell community will
generally heap scorn on you if you use it for trivialities (or even non-
trivialities). I don't understand why it would be desirable for a serialize
function to be branded, "don't ever use this unless you're an expert who knows
what he's doing."
unsafePerformIO is, for many uses, a concession to low-level compiler
extensibility. You can implement a fair amount of stuff in a library using
unsafePerformIO that would otherwise require some kind of compiler support.
What is the analogous motivation for functions to be turned into pure strings
containing their code?
Of course, if you want, "everything is unsafe, be careful," there are many,
many languages out there that already do that. For instance, ML**, Lisp, C,
Java, Python, .... But this is Haskell, and we try to do a little better than
that.
-- Dan
[*] Which is irrelevant to the reasoning in question. In fact, if you follow
Bird's methodology, you first write a naive, obviously correct program, and
then you transform it into a more efficient version via transformations you've
shown to preserve semantics. This approach doesn't work if there are no
correctness-preserving transformations, though.
[**] In fact, Alice ML (I think that's the right one) already has very fancy
serialization stuff. You can store and load and send entire modules over an
internet connection, I believe.
Admittedly, the class of reasoning I usually use in my Haskell
programs, and the one that you talked about using earlier this
message, is essentially "seq doesn't exist". However, I prefer to use
this class of reasoning because I would prefer if seq actually didn't
exist (er, I think the implication goes the other way). Not so for
serialize: I would like a serialize function, but I don't want the
semantic burden it brings. If only there were a way to...
oh yeah.
serialize :: (a -> b) -> IO String
I still don't really get what we're arguing about.
Luke
seq can still exist, I think. And I still want it (well, I could leave it for
functions, really, I think). What doesn't exist, loosely speaking, is bottom,
which means:
forall x y. x `seq` y = y
And so seq = flip const. That makes things like:
foo ... = ... (x `seq` y) ...
appear useless, unless we remember that denotational semantics aren't the end-
all and be-all, in which case we can recognize that seq is used as an
operational hint to the compiler, same as par and pseq. It just happens to be
the case that in Haskell's ordinary semantics, merely giving the denotational
semantics of seq is sufficient to induce the right operational behavior,
provided the compiler isn't bone headed (and further, is lenient enough to
allow sufficiently smart compilers to disregard our naive 'evaluate x before
y' reading of seq if it's more efficient to do so).
> Not so for
> serialize: I would like a serialize function, but I don't want the
> semantic burden it brings. If only there were a way to...
>
> oh yeah.
>
> serialize :: (a -> b) -> IO String
>
> I still don't really get what we're arguing about.
I don't know.
-- Dan
> This was my first thought as well! However, reading to/from a file
> would of course be in IO, at which point you'd be free to read the
> file back in through normal means to get at the representation. So in
> that respect, this is equivalent to (a -> b) -> IO String.
IMO, it's morally different, you're now operating on a file, and you
shouldn't rely on the contents being predictable. You can make the
sin-bin argument that IO can do anything, but I think there's a moral
distinction between
serialize :: a -> IO ByteString
x <- serialize f
and
serialize :: a -> Opaque
store :: Opaque -> FilePath -> IO ()
do x <- serialize f
store x n
B.readFile n
You could probably already snarf chunks of the heap and dump them to file.
> I suppose one could object that this isn't actually serializing
> anything at all; to which I would respond that, in pure code, how do
> you expect to tell the difference?
Nice one :-)
I guess the real question is what are the useful, pure operations on an
opaque type that can contain arbitrary functions.
-k
--
If I haven't seen further, it is by standing in the footprints of giants
> On Thursday 11 November 2010 12:34:21 pm John Lato wrote:
> > I think the only way this would work would be if you consider functions
> to
> > be equal only to themselves, i.e. "x+x" and "2*x" are not equal. That's
> > not a trade-off I would be willing to make.
>
> In general, it doesn't even have to be based on a mathematical identity. As
> has been stated, this would in general simply break referential
> transparency.
> Are these two functions equal:
>
> f x = k (h x) (h x)
> g x = let y = h x in k y y
>
> Presumably, no, if serialize exists (and they may have different
> performance
> characteristics).
>
> You cannot factor out or inline subexpressions or without the difference
> being
> observable (presumably) by serialize.
Yes, exactly. Thanks for this example, because it illustrates better how
far-reaching this would be. And it's true not just when these
transformations are manually performed, but also when they're performed by
the compiler. Haskell without referential transparency simply wouldn't be
Haskell any more. And any code that used a pure serialize may or may not
work, depending on compiler magic.
Of course, this is presuming that serialize is pure. I suppose it might be
possible for a serialize with type "a -> IO ByteString" to just dump
stack+heap+whatever. You could use TH+Hint, LLVM, or likewise to get a
similar effect now.
John
Any distinction here is mostly at the level of API design and informal
semantics; I'm inclined to agree with your preference, but as far as
impacts on the formal semantics of pure code go, these are essentially
equivalent.
> You could probably already snarf chunks of the heap and dump them to file.
Yep, and this is pretty much the reason, taken to its logical
conclusion, why almost all bets are off about what IO computations can
potentially do.
>> I suppose one could object that this isn't actually serializing
>> anything at all; to which I would respond that, in pure code, how do
>> you expect to tell the difference?
>
> Nice one :-)
>
> I guess the real question is what are the useful, pure operations on an
> opaque type that can contain arbitrary functions.
I would be very surprised if there were any that couldn't just as well
be done on the function directly. Even extracting type information
could be problematic if done incorrectly! Consider a function
(inspectType :: Opaque -> Serialization.Type), where the Show instance
for Type produces an approximation of the type signature the function
was declared with. Reasonable? Nope, we just broke everything. Imagine
a function taking an argument of type ((a, a) -> a). If by serializing
the argument it can recover the declared type signature it could
distinguish between fst and ((\(x, _) -> x) :: forall a. (a, a) -> a),
which again opens the door to non-parametric behavior. On the other
hand, (inspectType :: Opaque -> Data.Typeable.TypeRep) would probably
be safe, because it supports only monomorphic types.
- C.
> intensional equality: two values are provably equal if they evaluate to the
> same normal form
>
> extensional equality: this incorporates non-computational rules, like the
> point-wise equality of functions.
>
> Now, in a type theory where equality is intensional, I cannot prove:
>
> (forall x. f x = g x) -> f = g
>
> However, both these equalities (and others in between and on either side) are
> *compatible*, in that I cannot disprove the above theorem in an intensional
> theory.
>
> What seq and serialize do is break from extensional equality, and allow us to
> disprove the above (perhaps not for seq within a hypothetical theory, since
> the invalidating case involves non-termination, but certainly for serialize).
> And that's a big deal, because extensional equality is handy for the above
> reasoning about programs.
As you are well aware in Coq, and in Agda we don't have an extensionality
axiom; however this is not a problem because we simply use setoid equality
to capture extenional reasoning and prove that in every specific case
where we want to use extensioanl reasoning it is sound to do so.
Now suppose I add the following consitent axiom to Coq:
Axiom Church-Turing :
forall f:Nat -> Nat, exists e:Nat, forall n:Nat, {e}(n) = f(n)
This well-studied axiom is effectively what serialize is realizing[1].
Now, have I broken my old Coq proofs by adding this axiom? No, of course
not, because it is a consistent axioms and my proofs didn't use it. My
proofs were alreay explicity proving that extentional substitution was
sound in those cases I was using it.
The same will be true for reasoning in Haskell. Before serialization we
knew that extensional substitution was sound, but after adding
serialization we are now obligated to prove in the individual cases that
extensional subsitution is sound and/or add extentionally preconditions to
our proofs. So no big deal; people have already been doing this in Coq
and Agda for years.
[1]Actaully the realizer for serialize is *weaker* that this axioms. The
realizer for serialize would be (Nat -> Nat) -> IO Nat instead of (Nat ->
Nat) -> Nat, so should have less impact that the Church-Turing axiom.
--
Russell O'Connor <http://r6.ca/>
``All talk about `theft,''' the general counsel of the American Graphophone
Company wrote, ``is the merest claptrap, for there exists no property in
ideas musical, literary or artistic, except as defined by statute.''
> [1]Actaully the realizer for serialize is *weaker* that this
> axioms. The realizer for serialize would be (Nat -> Nat) -> IO Nat
> instead of (Nat -> Nat) -> Nat, so should have less impact that the
> Church-Turing axiom.
I don't see where IO comes in if you're dealing with pure functions.
Serializing pure structures is really really easy and can be done
entirely purely. The tricky part is finding a good encoding. As I
said in my original message on this topic, you are embedding a
compiler and interpreter into your runtime. That can be as easy as
what is below, or as tricky as embedding GHC in your runtime, or
anywhere in between.
But you simply cannot serialize IO actions without the GHC runtime.
The GHC runtime sequences IO actions in ways that depend on the
unserializable environment. (Imposed) Sequencing + Concrete
representation = Serialization.
class Serialize a where
serialize :: a -> ByteString
unSerialize :: ByteString -> Maybe a -- Parsers can fail
instance (Serialize a) => Serialize [a] where ...
instance (Serialize a, Serialize b) => Serialize (a, b) where ...
-- We can conclude that a and b must be enumerable from the
requirement that
-- f is recursively enumerable:
instance (Serialize a, Enum a, Serialize b, Enum b) => Serialize (a ->
b) where
serialize f = serialize $ ( zip [minBound..maxBound]
(fmap f [minBound..maxBound]) )
-- A map instance might be better: we trade some serialization time
for more
-- deserialization time.
instance (Serialize a, Serialize b) => Serialize (Map a b) where ...
instance (Enum a, Enum b, Serialize a, Serialize b) => Serialize (a ->
b) where
serialize f = serialize . fromList $ ( zip [minBound..maxBound]
(fmap f [minBound..maxBound]) )
deserialize map = \x -> lookup x (bytestring_decode_map map)
where bytestring_decode_map = ...
There's been a lot of talk about "if serialisation existed, you could do
X, which is bad". Well you know what? unsafePerformIO exists.
unsafeCoerce exists. And using FFI, you can do utterly evil things. And...?
Just today I was thinking about how useful it would be if you could send
a block of code from one PC to another to execute it remotely. The fact
that you can't do this is basically why there's no distributed Haskell
yet, despite what an obviously good idea that would be. It would be
really cool if we could do this.
And yes, it should of course be an IO operation. Because, let's face it,
any result it produces is inherantly going to be pretty random. (Much
like my favourite GHC function name, reallyUnsafePtrEquity#...)
> Just today I was thinking about how useful it would be if you could
> send a block of code from one PC to another to execute it remotely.
> The fact that you can't do this is basically why there's no
> distributed Haskell yet, despite what an obviously good idea that
> would be. It would be really cool if we could do this.
>
What kind of code are you thinking about? If you have a daemon
waiting on a foreign machine, you can send it bytecode to
deserialize. The issue is one of /encodings/. My "naive" approach
(enumerating the functions domain, applying the function, and
serializing a list of pairs of results) doesn't work so well because
it is slow and big. But it has all the properties we want, without
having to go into the compiler and make enormous changes. We need to
find an efficient representation of a function in that can be
serialized. Haskell code is one option, but the compiler would need
to pass it through to the run-time, so that the deserializer could run
a compiler on it. Passing some kind of intermediate level code
through would work as well. But we still face the same problem. We
need to compile the intermediate code into executable Haskell.
There is also the architecture independent approach I have described
previously (with a "Serialize" type class, enumerating the functions
domain and applying the function to each element, etc). This is
architecture independent in virtue of being implemented as plain old
run-time Haskell. It turns Haskell expressions into ByteStrings and
back. In short, the deserializer is an embedded compiler for "Haskell
bytecode". It interprets bytecode in terms architecture specific
code. To be honest, I'm not even sure how architecture independent
Data.Binary and the like really are. I'm guessing some care in the
serialization library could fix any issue, though.
> And yes, it should of course be an IO operation. Because, let's face
> it, any result it produces is inherantly going to be pretty random.
> (Much like my favourite GHC function name, reallyUnsafePtrEquity#...)
Serialization and transport are orthogonal. Serialization is pure.
Transport is not.
On Friday 12 November 2010 1:40:10 pm roco...@theorem.ca wrote:
> As you are well aware in Coq, and in Agda we don't have an extensionality
> axiom; however this is not a problem because we simply use setoid equality
> to capture extenional reasoning and prove that in every specific case
> where we want to use extensional reasoning it is sound to do so.
If we are talking about writing programs in a closed world, and proving things
about said world, this is fine. But this is not always what we are doing in
Haskell. For instance, if I am writing a library, am I justified in changing:
f (g x) (g x)
to:
let y = g x in f y y
without bumping the "semantics breaking change" version number? In a closed
world, where I have proved that all the relevant code I have written admits
extensionality, the answer is, "yes." But in an open world, where people are
free to use serialize to inspect my implementations in pure code, the answer
is, "no."
> Now suppose I add the following consistent axiom to Coq:
>
> Axiom Church-Turing :
> forall f:Nat -> Nat, exists e:Nat, forall n:Nat, {e}(n) = f(n)
So, your argument is, if I'm not mistaken, that intensional type theory can be
consistently extended with this axiom. I believe this. In fact, I'd even be
willing to accept that purely as an axiom, it might be consistent to add it to
extensional type theory. However, this is not the whole story.
One question I have to ask is: how does this compute? In Agda, and I suspect
Coq, axioms simply do not compute (and this is the reason I'd be willing to
believe Church-Turing is consistent with an extensional theory). However,
serialize *will* compute, and I expect it to compute like this:
forall e:Nat. serialize {e} => e
And I can believe that even this is consistent with intensional type theory.
In an intensional theory, I can imagine (Nat -> Nat) being interpreted not as
a type of functions, but as a type of algorithms. There may be many algorithms
that compute the same function, and Nat -> Nat contains them all.
serialize/Church-Turing returns the source code of each one.
When we assert extensionality, though, Nat -> Nat can no longer be inhabited
by mere algorithms, though. In an extensional theory, Nat -> Nat is a quotient
of the set of algorithms. And then we have only a few options:
1) serialize/Church-Turing violates the quotient
2) serialize/Church-Turing computes extensional equality of algorithms, and
chooses a single 'source code' representation for each equivalence class
3) It is impossible for two different pieces of source code to compute the
same function (so serialize {e} => e is valid because there is no e' /= e
such that forall x. {e} x = {e'} x)
*) ...
2 is, I think, impossible, and 3 is simply false, so the choice is 1, meaning
that ITT + a Church-Turing that actually computes cannot be consistently
extended with extensionality. And it is this that I care about, and what I was
referring to in the mail you replied to:
- Intensional type theory can be consistently extended to extensional type
theory.
- serialize is anti-extensional (similar to the way impredicative Set
and injective type constructors are anti-classical).
And it is this consistent extension that I care about. And, going back to my
library example, the reason to care about this is abstraction. I want clients
of my library to program to the abstraction of my code as functions, not as
algorithms/source code, because that allows me leeway to tweak the algorithms
as I see fit, so long as they compute the same function.
And, for that matter, the compiler can do this kind of mucking around with
algorithms. I think it's a big deal if, to enable optimization of a function,
we have to prove that all code in our program treats it extensionally (I don't
believe the compiler can do it, currently, barring serialize simply not being
used; and say goodbye to separate compilation).
> [1]Actaully the realizer for serialize is *weaker* that this axioms. The
> realizer for serialize would be (Nat -> Nat) -> IO Nat instead of (Nat ->
> Nat) -> Nat, so should have less impact that the Church-Turing axiom.
I have no problem with serialize :: (Nat -> Nat) -> IO Nat. The problem is
with (Nat -> Nat) -> Nat. The former can respect the quotient in question via
IO hand waving. Perhaps this distinction is frivolous, but it seems wrong to
make the language in general anti-extensional when it could instead be put
inside the sin bin. My quarrel is more with, "let's dump out the sin bin and
just be careful."
-- Dan
[*] I have seen a paper about a type theory with quotients that has a
function:
choose : T / R -> T
such that:
choose (squish x) => x
but it was meticulously designed to allow that in ways that I don't really
remember, so I'm going to pretend it doesn't exist at the moment.