Making the REPL less hopeless


Alexis King

Jun 6, 2018, 4:50:11 PM
to Racket Users
It is well-known that the top level is hopeless.[1] At this point, it
seems unsalvageably ad-hoc. I’ve made peace with this, since I can
largely avoid touching the top level — Racket provides a nice,
predictable, decidedly non-hopeless module system. Unfortunately, I am
sometimes forced to face the top level’s existence, and therefore its
unruliness, every time I must modify my code to support the REPL.

Making Hackett support the Racket REPL has grown increasingly difficult.
The expansion model differs in subtle ways, but they are significant
enough to regularly cause problems, especially as Hackett grows ever
more complex. Generally, the contortions required have been localized
and tractable, but a recent change I’ve attempted — which works fine in
a module — falls flat on its face in the top level in a way I have not
managed to work around.

I started wondering: why does the REPL need to be synonymous with the
top level? Sure, there are things about the REPL that benefit from
weaker guarantees than those of a module, but as far as I can tell,
these are restricted to redefinition/shadowing and delayed binding of
variables to permit mutual recursion. Seeing as this delayed binding
doesn’t work in Hackett, anyway, since Hackett is typed, the lawlessness
of the top level seems like an unnecessarily expensive price to pay for
the little I get in return.

I had a conversation with Spencer Florence earlier today on this topic,
and he described a hypothetical alternative semantics for the REPL that
I find compelling. Given a sequence of REPL interactions:

e_0 e_1 e_2 ... e_n

These could be interpreted the same way as a series of lexically nested
submodules:

(module repl_0 racket
  e_0
  (module* repl_1 #f
    e_1
    (module* repl_2 #f
      e_2
      ...
      (module* repl_n #f
        e_n) ... )))
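For instance, under this interpretation a hypothetical session consisting
of (define x 1), (define x 2), and x would correspond to something like
the following (assuming a definition in a module* submodule can shadow a
binding inherited from the enclosing module, the same way a module-level
definition can shadow a module language’s import):

(module repl_0 racket
  (define x 1)
  (module* repl_1 #f
    (define x 2)          ; shadows the x from repl_0, per point 2 below
    (module* repl_2 #f
      x)))                ; evaluates to 2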

This has some pleasant properties. Here are the ones that immediately
come to mind:

1. Expansion is completely predictable and consistent with expansion
in a module (assuming #%module-begin doesn’t do anything strange
with module* submodules), and it provides a meaning for REPL
interactions in terms of the much more well-defined semantics of
Racket’s module system.

2. Multiple definitions of the same identifier within the same REPL
interaction (that is, combined with a begin form) are an error,
just as they would be in a module. For multiple definitions of
the same identifier in separate interactions, the later
definition shadows the earlier one.

3. Free variables in an expression are a syntax error, the same way
they are within a module.

4. Mutually recursive definitions can be defined in the same REPL
interaction by defining both at once using begin. This is
potentially less convenient than the behavior of the top level,
but I think it is a small price to pay for predictable expansion
and error reporting.
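
For example, point 4 corresponds to a hypothetical session along these
lines, where both halves of a mutually recursive pair are entered as a
single interaction:

    > (begin
        (define (my-even? n) (if (zero? n) #t (my-odd? (sub1 n))))
        (define (my-odd? n)  (if (zero? n) #f (my-even? (sub1 n)))))
    > (my-even? 10)
    #t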

There are a few downsides. The obvious one is the change in mutually
recursive definition behavior, and another is that set! on
previously-declared identifiers no longer works. However, this could be
solved by implicitly rewriting definitions to set!-transformers, so it’s
possible to restore this behavior without losing a semantics defined
exclusively in terms of modules.
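
To make the set! idea concrete, here is a minimal sketch (purely
hypothetical names, not part of any existing implementation) of the kind
of rewriting meant above: the value lives in a box, and the defined name
becomes a set!-transformer, so a later interaction’s (set! id v) expands
to a set-box! on a binding it can legitimately require rather than an
assignment to an imported variable:

#lang racket
(require (for-syntax racket/base))

;; Hypothetical: define `id` so that modules generated by later
;; interactions can still assign to it.
(define-syntax (define/settable stx)
  (syntax-case stx ()
    [(_ id expr)
     ;; give the box a predictable (unhygienic) name so that
     ;; (provide (all-defined-out)) exports it alongside id
     (with-syntax ([id-box (datum->syntax #'id
                             (string->symbol
                              (format "~a-box" (syntax-e #'id))))])
       #'(begin
           (define id-box (box expr))
           (define-syntax id
             (make-set!-transformer
              (lambda (use)
                (syntax-case use (set!)
                  [(set! _ v)        #'(set-box! id-box v)]
                  [(_ arg (... ...)) #'((unbox id-box) arg (... ...))]
                  [_                 #'(unbox id-box)]))))))]))

With (provide (all-defined-out)) in each generated module, both id and its
box are exported, so later interactions can read, apply, and set! the
identifier without ever mutating an imported variable directly.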

Unfortunately, as far as I can tell, it isn’t possible to implement this
behavior in terms of the concepts Racket already provides, since one
cannot add a submodule to a module once it has already been declared.
However, since module* submodules are morally just modules declared at
the end of the current module, Spencer also pointed out that it’s
possible to perform a mostly-equivalent translation using ordinary
modules instead of nested submodules:

(module repl_0 racket
  (provide (all-defined-out))
  e_0)
(module repl_1 racket
  (require 'repl_0)
  (provide (all-defined-out))
  e_1)
(module repl_2 racket
  (require 'repl_0 'repl_1)
  (provide (all-defined-out))
  e_2)
...
(module repl_n racket
  (require 'repl_0 'repl_1 'repl_2 ... 'repl_n-1)
  (provide (all-defined-out))
  e_n)

However, this translation is imperfect: it does not preserve bindings
imported using require. Still, at least it is possible to implement.
Here is a short program that implements a naïve version of the above
translation:

#lang racket

(require syntax/datum
         syntax/stx
         syntax/strip-context)

(define modules-so-far '())

(let ([old-eval (current-eval)])
  (current-eval
   (lambda (stx)
     (with-datum ([new-mod (gensym)]
                  [[mod-path ...] modules-so-far])
       (old-eval
        (quasidatum
         (module new-mod racket
           (require 'mod-path ...)
           (provide (all-defined-out))
           (undatum (strip-context (stx-cdr stx))))))
       (dynamic-require (datum 'new-mod) #f)
       (set! modules-so-far
             (cons (datum new-mod) modules-so-far))))))
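
For instance, with this handler installed, the intent is that a
hypothetical session like the following works, since each generated
module requires all of its predecessors:

    > (define x 1)
    > (define (f y) (+ x y))
    > (f 2)
    3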

I imagine such a naïve approach has additional downsides in speed and
memory overhead, but the core technique does seem sound.

To be clear, I do NOT propose replacing the Racket REPL with one that
uses this new model, but I would like to have the ability to define a
REPL for my own language that takes this approach (or a similar one).
So I ask a few questions:

1. Would people find such an approach to making the REPL less
hopeless valuable?

2. Are there any glaring flaws in the above description that would
make it either difficult or undesirable to implement?

3. If not, what is the right way to implement this? Could it be done
in a way that is efficient, but shares machinery with existing
module expansion?

In any case, I may try a fancier version of the naïve approach for
Hackett, to experiment with the idea. I would be happy to be able to
have a REPL for my language without having to be so careful to please
the top level.

Alexis

[1]: https://gist.github.com/samth/3083053

Matthias Felleisen

Jun 6, 2018, 5:04:22 PM
to Alexis King, Racket Users

[[ only scanned the message far enough to see whether you discovered something new ]]

History knows two kinds of REPLs, often confusingly called “interpreters” by lay people:

— Lisp’s
— ML’s

My impression is that you are re-discovering a flavor of ML’s REPL,
which reduces expressiveness and increases sane-ness.

I suspect Haskell’s is much closer to ML’s anyways, so going in this
direction is fine.

What this means is that perhaps Racket’s language support system
should include mechanisms for implementing both ends of the spectrum
and something in the middle too.

Alexis King

Jun 6, 2018, 5:25:44 PM
to Matthias Felleisen, Racket Users
> On Jun 6, 2018, at 16:05, Matthias Felleisen <matt...@felleisen.org>
> wrote:
>
> I suspect Haskell’s is much closer to ML’s anyways, so going in this
> direction is fine.

Yes, since I didn’t explicitly note it in my original email, I’ll note
here that the conversation Spencer and I had included a discussion of
the behavior of the GHC REPL. We agreed it would be acceptable and
even desirable to have that set of compromises in the case of Hackett.

That said, I think there are compelling additional benefits to defining
the semantics of the REPL in terms of modules rather than trying to just
make a “more ML-ish” REPL. Most obviously, if we can avoid introducing
any behavior that is truly REPL-specific, then even if we use a slightly
different implementation path for the REPL in practice, we can use the
behavior of the module system as a trusted source of truth when the two
disagree.

Matthias Felleisen

Jun 6, 2018, 5:35:18 PM
to Alexis King, Racket Users
Sure.

But you are ignoring the community of Lispers who are used to
some specific REPL behavior and considered it _good_.

Pragmatically:

1. Novices want the transparent REPL.
2. Experienced coders want it for short-running code.
(for a suitable value of “short”)
3. Experienced developers want Lisp’s REPL for long-running code.
4. Nobody else should want any REPL. Or ML’s is a sad approximation.


— Matthias, requested the transparent REPL from Robby


Alexis King

Jun 6, 2018, 5:55:52 PM
to Matthias Felleisen, Racket Users
> On Jun 6, 2018, at 16:35, Matthias Felleisen <matt...@felleisen.org> wrote:
>
> Sure.
>
> But you are ignoring the community of Lispers who are used to
> some specific REPL behavior and considered it _good_.
>
> Pragmatically:
>
> 1. Novices want the transparent REPL.
> 2. Experienced coders want it for short-running code.
> (for a suitable value of “short”)
> 3. Experienced developers want Lisp’s REPL for long-running code.
> 4. Nobody else should want any REPL. Or ML’s is a sad approximation.

I did explicitly state in the original message that I was not by any
means proposing replacing Racket’s REPL. The existing behavior seems to
work fine for #lang racket, and I could imagine it being nicer for
students. It just doesn’t work that well for Hackett, unless I’m
overlooking some technique that could solve my problem, and I think it’s
valuable to offer an interactive environment that is at the very least
predictable enough to not be regularly called “hopeless”.

Christopher Lemmer Webber

Jun 8, 2018, 12:10:33 PM
to Matthias Felleisen, Alexis King, Racket Users
Matthias Felleisen writes:

> [[ only scanned the message far enough to see whether you discovered something new ]]
>
> History knows two kinds of REPLs, often confusingly called “interpreters” by lay people:
>
> — Lisp’s
> — ML’s
>
> My impression is that you are re-discovering a flavor of ML’s REPL,
> which reduces expressiveness and increases sane-ness.
>
> I suspect Haskell’s is much closer to ML’s anyways, so going in this
> direction is fine.
>
> What this means is that perhaps Racket’s language support system
> should include mechanisms for implementing both ends of the spectrum
> and something in the middle too.

Implementing both ends of the spectrum seems good... as I raised not
long ago, I am in the opposite camp from Alexis right now:

https://groups.google.com/forum/#!topic/racket-users/CxQ5o_OUQaw

I would like a "more hopeless" REPL, if hopelessness is defined as
mutability of the toplevel (maybe "more squishy" is better phrasing...
"squishy" vs "sturdy" seems to be the two ends of this spectrum).

I am not alone. Recently, two of the best lisp developers I know
personally (both of whom work on games) have told me the sole reason
they don't use Racket for some of their projects is entirely because
live hacking is not possible in Racket, which can matter a lot in
something like game or server development where you want to make a lot
of changes without restarting the process. For that same reason, I
avoided Racket for *years*. (Seeing all the nice
things Racket has while running the workshops we did recently was what
won me over.)

I can definitely see the reasons that Alexis would want a sturdier REPL
for Hackett, and that seems perfectly reasonable. I just hope we don't
rule out the possibility of a squishier REPL as well in making changes
to support it.

Christopher Lemmer Webber

Jun 8, 2018, 12:11:32 PM
to Alexis King, Matthias Felleisen, Racket Users
Alexis King writes:

>> Pragmatically:
>>
>> 1. Novices want the transparent REPL.
>> 2. Experienced coders want it for short-running code.
>> (for a suitable value of “short”)
>> 3. Experienced developers want Lisp’s REPL for long-running code.
>> 4. Nobody else should want any REPL. Or ML’s is a sad approximation.
>
> I did explicitly state in the original message that I was not by any
> means proposing replacing Racket’s REPL. The existing behavior seems to
> work fine for #lang racket, and I could imagine it being nicer for
> students. It just doesn’t work that well for Hackett, unless I’m
> overlooking some technique that could solve my problem, and I think it’s
> valuable to offer an interactive environment that is at the very least
> predictable enough to not be regularly called “hopeless”.

I suppose I should have read the rest of this thread before replying :)

Raoul Duke

Jun 8, 2018, 12:21:30 PM
to Christopher Lemmer Webber, Matthias Felleisen, Alexis King, Racket Users
why does liveness have to be the enemy of safety?

Alexis King

Jun 8, 2018, 12:48:02 PM
to Christopher Lemmer Webber, Raoul Duke, Racket Users
> On Jun 8, 2018, at 11:10, Christopher Lemmer Webber
> <cwe...@dustycloud.org> wrote:
>
> I would like a "more hopeless" REPL, if hopelessness is defined as
> mutability of the toplevel (maybe "more squishy" is better phrasing...
> "squishy" vs "sturdy" seems to be the two ends of this spectrum).

From where I see things, I don’t think your assumption that hopelessness
refers to mutability is accurate, but my view is also heavily distorted
— so much of what I work on these days is in the macro system that
sometimes I seem to forget that all the work we do at compile-time is
ostensibly to support runtime, rather than runtime being a happy
side-effect of the more important things that happen at compile time. ;)

Still, with that caveat, I will make the claim that the “hopelessness”
of the top level has less to do with mutability and more to do with
unpredictability in the face of expressive macros (see Flatt’s
“Composable and Compilable Macros”, which first presented Racket’s
module system and includes a discussion of the motivation). Racket
modules bring phases, explicit namespace management (in the sense of
requires and provides, not Racket’s first class namespaces, which are
themselves top level evaluation environments!), and predictable partial
expansion of module bodies to support mutually recursive definitions.
Code in modules is entirely statically bound — the compiler can always
determine precisely which binding a use refers to at compile-time, and
unbound identifiers are a compile-time error. This is useful for the
macroexpander and compiler alike, since stronger guarantees also mean
it’s possible to reason about both intra- and inter-module
optimizations.

You’re right in that some of the guarantees afforded by the module
system, such as restricted mutability, make live programming harder, but
I don’t think that the answer is to throw all those guarantees out the
window — rather, I think we want to be able to selectively weaken them
in a careful way. In that sense, I don’t think you want a “more
hopeless” REPL, you want a “more live” *language*, which does not
immediately seem to me in conflict with the desire to have predictable
scoping, expansion, and encapsulation guarantees.

> On Jun 8, 2018, at 11:21, Raoul Duke <rao...@gmail.com> wrote:
>
> why does liveness have to be the enemy of safety?

I don’t think it necessarily does in theory, but existing
implementations (which essentially allow ad-hoc redefinition of
arbitrary values at runtime) are intrinsically abstraction-breaking.
It’s easy to get a running system into a nonsensical state that could
never have happened if the system was started from scratch. Racket
programmers enjoy the luxury of strong guarantees enforced by the
runtime, and I think there is a sentiment that we would like to find
ways of permitting liveness without compromising on our principles. (But
that seems to be an open problem.)

Neil Van Dyke

Jun 8, 2018, 1:10:56 PM
to Christopher Lemmer Webber, Racket Users
Christopher Lemmer Webber wrote on 06/08/2018 12:10 PM:
> I am not alone. Recently, two of the best lisp developers I know
> personally (both of whom work on games) have told me the sole reason
> they don't use Racket for some of their projects is entirely because
> live hacking is not possible in Racket, which can matter a lot in
> something like game or server development where you want to make a lot
> of changes without restarting the process.

For many modern server purposes, at least, Racket's less-dynamic-than-CL
inclinations aren't really a problem. HTTP-based systems are usually
stateless outside very brief transactions, or keep the state out of the
front-end server process (e.g., in a database process/store, or in a
separate server process).  And most Racket processes can be made to start
quickly nowadays.  Also, in production of Web servers, there's likely to
be at least a simple load balancing server/process among multiple worker
processes, so you just start birthing and burying workers. (And also do
things like experimentally start a percentage of workers with new
changes in production, and back out that batch automatically, if it
starts failing.)

I'm interested in how Racket might be used in modern online games, but
someone else will have to comment on how they would like to use Racket
at run time in those.  (I myself only volunteer with such games, helping
youths in GTA Online to understand that operating a motor vehicle while
under the influence of marijuana will end badly for them. :)  If you
can't currently use Racket in tight game real-time code, anyway, maybe
the less-real-time-sensitive uses are still amenable to dynamic changes?

Maybe people want to talk about other ways that they've implemented "hot
patching" of running production systems in Racket, done very interactive
changes to a Racket program under development without restarting the
process, etc.?  `dynamic-require`?  `eval`?  Setting variables to new
closures, and signalling execution in the old closures to exit and start
the new ones?  Swapping Places?  Using debugging captures of state, for
later replay and testing, as an alternative to hot patching?
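
As a tiny illustration of the "setting variables to new closures" idea,
with made-up names and no claim that this is what anyone does in
production: the indirection can be as small as a box that the
long-running loop always calls through.

#lang racket

;; The long-running code only ever calls through the box, so a REPL
;; attached to the process can swap in a new closure without a restart.
(define current-handler
  (box (lambda (request) (format "v1: ~a" request))))

(define (serve request)
  ((unbox current-handler) request))

;; Later, from the attached REPL:
;;   (set-box! current-handler
;;             (lambda (request) (format "v2: ~a" request)))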

Christopher Lemmer Webber

Jun 8, 2018, 5:15:49 PM
to Alexis King, Raoul Duke, Racket Users
Thanks for the thoughtful reply, Alexis :)

Alexis King writes:

> Still, with that caveat, I will make the claim that the “hopelessness”
> of the top level has less to do with mutability and more to do with
> unpredictability in the face of expressive macros (see Flatt’s
> “Composable and Compilable Macros”, which first presented Racket’s
> module system and includes a discussion of the motivation). Racket
> modules bring phases, explicit namespace management (in the sense of
> requires and provides, not Racket’s first class namespaces, which are
> themselves top level evaluation environments!), and predictable partial
> expansion of module bodies to support mutually recursive definitions.
> Code in modules is entirely statically bound — the compiler can always
> determine precisely which binding an use refers to at compile-time, and
> unbound identifiers are a compile-time error. This is useful for the
> macroexpander and compiler alike, since stronger guarantees also mean
> it’s possible to reason about both intra- and inter-module
> optimizations.
>
> You’re right in that some of the guarantees afforded by the module
> system, such as restricted mutability, make live programming harder, but
> I don’t think that the answer is to throw all those guarantees out the
> window — rather, I think we want to be able to selectively weaken them
> in a careful way. In that sense, I don’t think you want a “more
> hopeless” REPL, you want a “more live” *language*, which does not
> immediately seem to me in conflict with the desire to have predictable
> scoping, expansion, and encapsulation guarantees.

I think that's true. I would like to figure out how to have a "#lang
squishy" type thing. I guess I should spend more time learning about
making languages to see how hard it is for me to pull it off currently.

Matthias Felleisen

Jun 8, 2018, 6:42:17 PM
to Christopher Lemmer Webber, Alexis King, Racket Users

> On Jun 8, 2018, at 12:10 PM, Christopher Lemmer Webber <cwe...@dustycloud.org> wrote:
>
> I am not alone. Recently, two of the best lisp developers I know
> personally (both of whom work on games) have told me the sole reason
> they don't use Racket for some of their projects is entirely because
> live hacking is not possible in Racket, which can matter a lot in
> something like game or server development where you want to make a lot
> of changes without restarting the process. For that same reason, I
> avoided Racket for *years* for this same reason. (Seeing all the nice
> things Racket has while running the workshops we did recently was what
> won me over.)


Yes I know this camp very well and I am sorry I didn’t want to
support them for the longest time. We might be in a position now
to help them, though given our limited resources, it’s not a truly
high priority. ~~ Then again, they might be able to hack the
language on their own — Matthias

Christopher Lemmer Webber

Jun 9, 2018, 10:03:09 AM
to Matthias Felleisen, Alexis King, Racket Users
I think knowing that there's interest in supporting this direction is a
bit helpful itself. I have considered, when I can find the time, trying
to build a "#lang squishy" or the like, though I felt previously
somewhat discouraged because it seemed to me like Racket's culture
mostly felt it wasn't a worthwhile idea. It's clear to me now that this
isn't the case and that there would be support for work in such
direction, and that's helpful to know.

Matthias Felleisen

Jun 10, 2018, 3:23:57 PM
to Christopher Lemmer Webber, Racket Users

> On Jun 9, 2018, at 10:03 AM, Christopher Lemmer Webber <cwe...@dustycloud.org> wrote:
>
> I think knowing that there's interest in supporting this direction is a
> bit helpful itself. I have considered, when I can find the time, trying
> to build a "#lang squishy" or the like, though I felt previously
> somewhat discouraged because it seemed to me like Racket's culture
> mostly felt it wasn't a worthwhile idea. It's clear to me now that this
> isn't the case and that there would be support for work in such
> direction, and that's helpful to know.


I wanted to add something to clarify the “discouraged” part.

I grew up on a LISP and PROLOG REPL with “saved heaps” and
I very much enjoyed this. Until I began to realize the problems with this
“programming systems” (as opposed to “programming languages”)
approach to sw dev. As many of us know, reactions always go as far
in the opposite directions as actions .. and so did I with REPLs.

—> TRANSPARENT REPL

Around 1990 — after writing the little books and teaching total
novices for the first time — I realized that the REPL was an
opaque beast that tripped almost all novices (except for the
survivors who remember nothing but glorious days) and many
experienced programmers.

So I proposed a dissertation topic to a student (Rene Rodriguez)
to create a REPL where the state of the programming system
was completely transparent. Robby added this to DrRacket and
I have been eternally grateful since because I can see how students
can use it, how I can use it with students, and what things do in the
REPL is transparent with respect to the Definitions Window.

—> DEBUGGING REPL

On occasion I missed the old REPLs, especially Chez’s cafes, which
allow debugging and can be nested. It’s so long ago that I almost forgot
how to use them.

Eli reminded me quite a few times that for long-running programs we
really want it back. And I have come to think that he’s right.

Alexis’s post reminded me of this and also reminded me that there have
always been many different REPLs (Lisp, Prolog, Hope, ML, Haskell, Racket)
and that if Racket is an LOP language we should probably provide the
means to implement any of them.

Go for it — Matthias

Greg Hendershott

Jun 10, 2018, 5:43:58 PM
to Matthias Felleisen, Christopher Lemmer Webber, Racket Users
> Alexis’s post reminded me of this

This also reminds me of another discussion triggered by Alexis end of April:

https://groups.google.com/forum/#!topic/racket-users/vX_0pERAS9g

---

Thinking out loud:

A running system has immutable vs. mutable parts. Where do you want
the boundary?

Some people want the mutable part to be as big as possible -- say
everything except the CPU microcode. Always be hackable.

Others want it to be much smaller -- just the data a program mutates
explicitly. Or (even smaller) implicitly, e.g. pure-functional. Always
be reproducible.

Is that a fair way to think about it?

Is Tony's racket-reloadable --
https://github.com/tonyg/racket-reloadable -- a way to explore the
space between?

Is there some rough analog of delimited continuations? "Delimited
hopelessness?" ;)

Seriously if the use-case is exploratory debugging, with an escape
back to a literal non-debug-REPL prompt and an "undo" of the
exploration mutations ... that's one thing. You learn something, you
update your source in e.g. git, you generate a new system image from
source, and it replaces the old system image.

Whereas another use-case is live-coding, and either you flush the
accumulated mutations when the performance ends, --or--, you're OK
with image snapshots instead of version control as your source of
truth.

HiPhish

Jun 11, 2018, 4:32:21 AM
to Racket Users
Sorry for intruding on this topic; I have been lurking in this thread hoping to read something interesting, but I don't quite understand what you people mean. Could someone please explain the difference between a "hopeless" REPL and a "hopeful" one in a few sentences or provide a link for further reading? Thanks.

Jens Axel Søgaard

Jun 11, 2018, 5:47:11 AM
to HiPhish, Racket Users
2018-06-11 10:32 GMT+02:00 HiPhish <hip...@openmailbox.org>:
Sorry for intruding on this topic; I have been lurking in this thread hoping to read something interesting, but I don't quite understand what you people mean. Could someone please explain the difference between a "hopeless" REPL and a "hopeful" one in a few sentences or provide a link for further reading? Thanks.


FWIW - here is a collection of past discussions:


/Jens Axel


Konrad Hinsen

Jun 11, 2018, 7:49:56 AM
to Greg Hendershott, Racket Users
Greg,

> A running system has immutable vs. mutable parts. Where do you want
> the boundary?
>
> Some people want the mutable part to be as big as possible -- say
> everything except the CPU microcode. Always be hackable.
>
> Others want it to be much smaller -- just the data a program mutates
> explicitly. Or (even smaller) implicitly, e.g. pure-functional. Always
> be reproducible.
>
> Is that a fair way to think about it?

I'd say yes, but then I am a bit biased because I have taken this point
of view as well in ongoing debates in computational science, where
pretty much the same issue is framed as an opposition between
interactivity and reproducibility.

But for me the question is not so much "where do you want the boundary".
Where I want it depends on what phase of which kind of project I am
working on. So I'd want the boundary to be user-definable whenever a new
process is started. Moreover, I'd want the boundary to cover both code
and data.

I'd expect to start a new project as mostly mutable, but with all
external dependencies set to immutable. In the end, everything should
ideally be immutable and thus reproducible, but there would be justified
exceptions for performance reasons in long-running processes.

> Is there some rough analog of delimited continuations? "Delimited
> hopelessness?" ;)

"Delimited mutability" sounds nice to me ;-)

> Whereas another use-case is live-coding, and either you flush the
> accumulated mutations when the performance ends, --or--, you're OK
> with image snapshots instead of version control as your source of
> truth.

Third option: you keep a log of all interactions for replaying them
later.

Konrad.

Matthias Felleisen

Jun 11, 2018, 8:50:59 AM
to Konrad Hinsen, Greg Hendershott, Racket Users

> On Jun 11, 2018, at 7:49 AM, Konrad Hinsen <google...@khinsen.fastmail.net> wrote:
>
>> Is there some rough analog of delimited continuations? "Delimited
>> hopelessness?" ;)
>
> "Delimited mutability" sounds nice to me ;-)


This exists but I am not sure it’s what we want for a REPL.

We could introduce first-class stores and snap-back operations.
This may even be done on a pay-as-you-go basis. But it will
call for some research and non-trivial syntax hacking. Alexis’s
suggestion of set! rewriters may be the starting point.
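
One purely hypothetical way to read that last remark: if the set!
rewriting routes all REPL-visible state through boxes, then a "store" is
just the collection of those boxes, a snapshot reads them all, and
snap-back writes the saved values back.

#lang racket

;; Hypothetical: the set! rewriting registers every box it creates.
(define store '())                      ; list of boxes
(define (register! b) (set! store (cons b store)) b)

(define (snapshot)                      ; a first-class copy of the store
  (map (lambda (b) (cons b (unbox b))) store))

(define (snap-back! snap)               ; restore a previous snapshot
  (for-each (lambda (p) (set-box! (car p) (cdr p))) snap))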

— Matthias

Christopher Lemmer Webber

Jun 11, 2018, 9:42:35 AM
to Greg Hendershott, Matthias Felleisen, Racket Users
Greg Hendershott writes:

> Is Tony's racket-reloadable --
> https://github.com/tonyg/racket-reloadable -- a way to explore the
> space between?

It seems like a cool thing and probably something I will use in web
development. One notable difference between this and the classic lisp
environment type thing, eg when I am developing in Guile or Common Lisp,
I don't necessarily have to structurally plan my program at all for
which parts I might be playing with live. I might not know until I find
I need to start experimenting. So racket-reloadable makes a lot more
sense for web servers, where you're really setting up something closer
to how python and ruby web servers restart their processes as they run,
and you know where that boundary probably is (in the controllers and
view templates, most likely). It's probably not as great as the classic
lisp environment when you don't know what parts you're going to be
working on in advance!

> Is there some rough analog of delimited continuations? "Delimited
> hopelessness?" ;)

Could be... I'd certainly find that interesting if it exists. Arguably
even with a game, you might find that for instance NPC behavior is
primarily what you'll want to be changing. So yeah, I can see an
argument for it.

> Seriously if the use-case is exploratory debugging, with an escape
> back to a literal non-debug-REPL prompt and an "undo" of the
> exploration mutations ... that's one thing. You learn something, you
> update your source in e.g. git, you generate a new system image from
> source, and it replaces the old system image.

This is a really great breakdown of the differences. I suppose I'm in
this camp: I may have an IRC bot or a game server and I'm making
changes in response to friends' inputs, and evaluating the code changes
I've made as I go, but the source code is definitive. (And yes, I do
run into the kinds of mistakes that this dev style encourages... oops, I
was using a variable I never defined, for instance. Usually you catch
that when you shut it down and start it up again, but you may have made
several commits you didn't realize were incomplete in-between.)
I'm not sure about the "undo" of the exploration mutations bit though...
eg if you watch the "Mudsync" video I did of the multiplayer game, the
changes I was making to the world were meant to be persistent:

https://ia801900.us.archive.org/27/items/feb_2017-live_network_coding_8sync/live_network_coding_8sync.webm

It's not just live debugging... it's live evolution of the program, but
the idea is that the source files are still canonical. If I'm adding a
new NPC for the players to interact with on the server, I don't want
that behavior to disappear after I figure out how it works. I want it
to keep running as the server runs, and hopefully it's captured all
correctly in the files so that when I restart the process eventually
everything is working as it "evolved" in response to what I learned
while playing.

> Whereas another use-case is live-coding, and either you flush the
> accumulated mutations when the performance ends, --or--, you're OK
> with image snapshots instead of version control as your source of
> truth.

Yes, for this other camp I've always been nervous about the "save the
world" type environment. Reproducibility is too important to me, and
I've never been satisfied with "runtime is all there is."

- Chris

PS: I've felt somewhat embarrassed that I derailed Alexis' thread by
(rudely?) asking for exactly the opposite thing of what she was asking
for, but I feel this has been a productive conversation, especially in
showing that multiple REPL styles may be worthwhile. Hopefully that makes up for it
a bit.

John Clements

Jun 26, 2018, 3:32:25 PM
to Christopher Lemmer Webber, Greg Hendershott, Matthias Felleisen, Racket Users
I’d like to pick up on a thread that I think is important and was missed, here: data science.

Context: iPython/Jupyter is super-popular with data scientists, and I hear more about it than most folks because it’s pretty much headquartered here at Cal Poly (though not actually part of the CS department). For those not familiar: Jupyter is a browser-based interface to a local or remote compute engine. It uses a classic REPL; in fact, it uses the old macintosh shell idiom which cranks the statefulness of the REPL up to 11 by allowing you to highlight earlier chunks of code in the REPL and re-evaluate them, with results displayed inline. (Side note: does this idea predate the mac shell? I’d be curious to hear about this).

Anyhow, this has some obvious advantages, and some almost equally obvious problems. The principal problem, of course, is that the shell depends enormously on a hidden state. To take a super-obvious example, you can have

import pandas as pd

and

pd.read_csv('/tmp/my.csv')

in an iPython notebook, and if you evaluate the second one before the first one, you just get an error message.

The advantage of this system is—if I successfully re-read the earlier messages in this thread carefully enough—not only one that's been mentioned already, namely the time savings associated with not re-running the code every time, BUT ALSO the literate-code-like advantage of having—essentially—a definitions window where the results of the intermediate expressions are displayed inline, in a collapsible way.

There is a strong smell of reactive code here: if the programmer goes back and edits an early part of the program, we’d like (with perhaps a warning) to re-evaluate only those expressions that depend on the edited one.

This kind of interface could potentially save data scientists serious time and frustration, by guaranteeing that the contents of their “notebooks” are always internally consistent.
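
A minimal sketch of that reactive behavior (hypothetical names, nothing
Jupyter-specific): each cell records which cells it reads, and an edit
re-runs only the edited cell and its downstream dependents, in notebook
order:

#lang racket

;; A cell's function receives the current values of the cells it reads.
(struct cell (id deps [fn #:mutable] [value #:mutable]) #:transparent)

(define (downstream? c target)
  ;; does cell c (transitively) read from target?
  (or (memq target (cell-deps c))
      (ormap (lambda (d) (downstream? d target)) (cell-deps c))))

(define (run! c)
  (set-cell-value! c (apply (cell-fn c) (map cell-value (cell-deps c)))))

(define (edit! target new-fn all-cells)
  ;; all-cells is in notebook order, so inputs precede their readers;
  ;; only the edited cell and the cells downstream of it are re-run.
  (set-cell-fn! target new-fn)
  (for ([c (in-list all-cells)]
        #:when (or (eq? c target) (downstream? c target)))
    (run! c)))

;; (define c0 (cell 'data '() (lambda () 5) (void)))
;; (define c1 (cell 'doubled (list c0) (lambda (x) (* 2 x)) (void)))
;; (define c2 (cell 'report (list c1) (lambda (y) (format "result: ~a" y)) (void)))
;; (define cells (list c0 c1 c2))
;; (for-each run! cells)
;; (edit! c1 (lambda (x) (* 10 x)) cells)  ; re-runs c1 and c2; c0 is untouched
;; (cell-value c2)                         ; => "result: 50"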

Has someone already done this work?

Best,

John

Konrad Hinsen

Jun 27, 2018, 11:11:32 AM
to John Clements, Racket Users
"'John Clements' via Racket Users" <racket...@googlegroups.com>
writes:

> department). For those not familiar: Jupyter is a browser-based
> interface to a local or remote compute engine. It uses a classic REPL;
> in fact, it uses the old macintosh shell idiom which cranks the
> statefulness of the REPL up to 11 by allowing you to highlight earlier
> chunks of code in the REPL and re-evaluate them, with results
> displayed inline. (Side note: does this idea predate the mac shell?
> I’d be curious to hear about this).

I don't know about the Mac shell - was that the original Mac, before the
switch to Mac OS X?

The main inspiration for Jupyter was the Mathematica notebook,
introduced in the 1980s. The first "IPython notebook", as it was called
initially, was essentially an Open Source clone of the Mathematica
notebook using Python as the programming language.

> Anyhow, this has some obvious advantages, and some almost equally
> obvious problems. The principal problem, of course, is that the shell
> depends enormously on a hidden state.

That is indeed the main problem and it has led to lots of discussions
and alternative proposals.

> The advantage of this system is—if I successfully re-read the earlier
> messages in this thread carefully enough—not one that’s been mentioned
> already, which is the time savings associated with not re-running the
> code every time, BUT ALSO the literate-code-like advantage of
> having—essentially—a definitions window where the results of the
> intermediate expressions are displayed inline, in a collapsible way.

For Jupyter, that's true. In fact, the earlier IPython shell (a highly
polished user interface for the Python REPL) already had everything to
get time savings. The move to the notebook was motivated by the
combination of code and results with an explicative narrative.

However, it's the time savings aspect that is responsible for the hidden
state approach. Your idea of reactive code:

> There is a strong smell of reactive code here: if the programmer goes
> back and edits an early part of the program, we’d like (with perhaps a
> warning) to re-evaluate only those expressions that depend on the
> edited one.

has been tried for computational notebooks (nextjournal
(https://nextjournal.com/) comes to mind, but I have seen a few others),
but not found much acceptance because recomputing all results
potentially affected by a change in the code is way too expensive with
the imperative languages that dominate data science.

Designing a functional data science language that allows a sufficiently
fine-grained dependency analysis to minimize recomputations sounds like
an interesting research topic. I am not sure though that practitioners
would care much, given the huge investments into existing ecosystems
for Python and R.

Konrad.

John Clements

Jun 27, 2018, 12:54:38 PM
to Konrad Hinsen, Racket Users
Many thanks for the pointers! I’ll take a look at nextjournal.

John

Hendrik Boom

Jun 27, 2018, 6:44:46 PM
to Konrad Hinsen, John Clements, Racket Users
Yes, it would be. And if successful it would have implications for a
lot of other problems, such as incremental compilation while editing.

> I am not sure though that practitioners
> would care much, given the huge investments into existing ecosystems
> for Python and R.

If, as we hope, it is significantly faster than the prevailing
technology, practitioners will be interested. Though the interest might
manifest itself as attempts to accomplish the same effects despite the
obstacles placed by imperativism.

-- hendrik

Matthias Felleisen

Jun 28, 2018, 11:10:28 AM
to Konrad Hinsen, John Clements, Racket Users

> On Jun 27, 2018, at 11:11 AM, Konrad Hinsen <google...@khinsen.fastmail.net> wrote:
>
> However, it's the time savings aspect that is responsible for the hidden
> state approach. Your idea of reactive code:
>
>> There is a strong smell of reactive code here: if the programmer goes
>> back and edits an early part of the program, we’d like (with perhaps a
>> warning) to re-evaluate only those expressions that depend on the
>> edited one.
>
> has been tried for computational notebooks (nextjournal
> (https://nextjournal.com/) comes to mind, but I have seen a few others),
> but not found much acceptance because recomputing all results
> potentially affected by a change in the code is way too expensive with
> the imperative languages that dominate data science.


@ John

This is how DrRacket’s “transparent REPL” originally came about.

I worked with a PhD student called Rene Rodriguez (Corky Cartwright
supervised him at the time, he switched to Mary Hall later; not sure he
ever finished) on an Emacs-based REPL that performed a dependency
analysis on the code and only re-loaded those parts that, according
to the analysis, needed to be re-loaded. The system combined Emacs and
Chez, neither of which was suitable (too opaque).

In the end, the problem really was that the dependency analysis ran
into too many set!s and set-car!s (and I hadn’t stripped the freshman
curriculum of imperative programming yet).

As Kent said back then to me, “just re-load the whole program. Chez
is fast enough.” I had been doing this with Scheme 84 at the time but
I had sensed lagginess. In the end I agreed with Kent and decided to
reload the whole thing.

That was in 1990, just after Cartwright and I had developed the
“semantic program dependence graph” and thought it was useful. (Many
of you now know the semPDG as SSA but that’s a different story of
co-inventing stuff.)

— Matthias



Eric Griffis

Jun 28, 2018, 2:19:55 PM
to hen...@topoi.pooq.com, google...@khinsen.fastmail.net, clem...@brinckerhoff.org, racket...@googlegroups.com
On Wed, Jun 27, 2018 at 3:44 PM Hendrik Boom <hen...@topoi.pooq.com> wrote:
> On Wed, Jun 27, 2018 at 05:11:26PM +0200, Konrad Hinsen wrote:
> > "'John Clements' via Racket Users" <racket...@googlegroups.com>
> > writes:
> >
> > > The advantage of this system is—if I successfully re-read the earlier
> > > messages in this thread carefully enough—not one that’s been mentioned
> > > already, which is the time savings associated with not re-running the
> > > code every time, BUT ALSO the literate-code-like advantage of
> > > having—essentially—a definitions window where the results of the
> > > intermediate expressions are displayed inline, in a collapsible way.

This is starting to sound like Hydrogen (based on Light Table and
Jupyter):

https://atom.io/packages/hydrogen

http://lighttable.com

> > I am not sure though that practitioners
> > would care much, given the huge investments into existing ecosystems
> > for Python and R.

Most data scientists are not engineers. They are more attached to
techniques than technologies and depend largely on developer advocacy
for better tooling. For example, e-science gateways is an engineering
field dedicated to hiding platform detail from data scientists.

> If, as we hope, it is significantly faster than the prevailing
> technology, practitioners will be interested. Though the interest might
> manifest itself as attempts to accomplish the same effects despite the
> obstacles placed by imperativism.

I've designed programming languages for data scientists and co-wrote a
paper on formal languages for data intensive workflows. They can be a
fickle bunch, and the engineers working with them are incredibly
patient.

At a SuperComputing roundtable for academic data science, the
moderator asked why important results took so long for the community
to adopt. The discussion was hilariously fruitless.

"It's 2013. Why don't we have distributed task queuing for Python
yet?"

"Hasn't Celery been around since like 2007?"

"I can't find that on my workbench, so it might as well not exist."

"Then let's discuss that."

"But data science is hard enough as it is! I'll just keep using
Excel for that until the engineers figure it out."

"umm..."

On the other hand, the knowledge gap between what data scientists know
and what they do is fertile ground for high impact language-oriented
programming.

Eric

Konrad Hinsen

Jun 28, 2018, 11:31:18 PM
to Eric Griffis, Racket Users
On 28/06/2018 20:19, Eric Griffis wrote:

> This is starting to sound like Hydrogen (based on Light Table and
> Jupyter):
>
> https://atom.io/packages/hydrogen

Does Hydrogen attempt to do any dependency analysis? My understanding is
that it simply sends code snippets to a Jupyter kernel, meaning that it
inherits all the problems stemming from hidden state.

> Most data scientists are not engineers. They are more attached to
> techniques than technologies and depend largely on developer advocacy
> for better tooling.

That's true for scientists in general, in my experience at least
(physics, chemistry, biology).

> On the other hand, the knowledge gap between what data scientists know
> and what they do is fertile ground for high impact language-oriented
> programming.

I suspect so as well, also for more traditional science.

Konrad.

James Geddes

Jun 29, 2018, 5:59:21 AM
to Eric Griffis, Racket Users, hen...@topoi.pooq.com, google...@khinsen.fastmail.net, clem...@brinckerhoff.org

I work with a small team of data scientists and I would love to see wider adoption of Racket (this appears to be a common plaint). Try as I might -- and certainly in part through ignorance -- I cannot like Python. So I am extremely interested in this discussion.

> On 28 Jun 2018, at 19:19, Eric Griffis <ded...@gmail.com> wrote:
>
> This is starting to sound like Hydrogen (based on Light Table and
> Jupyter):
>
> https://atom.io/packages/hydrogen
>
> http://lighttable.com

I'm lucky enough to be collaborating on a project that is precisely trying to build a notebook-like interface that /does/ maintain a dependency graph of the computations. It aims to solve, in part, the problem of state in Jupyter-like systems. We have a paper coming out shortly in TaPP (there's a copy at http://tomasp.net/academic/papers/wrattler/ if you are interested).

> At a SuperComputing roundtable for academic data science, the
> moderator asked why important results took so long for the community
> to adopt. The discussion was hilariously fruitless.

Jon Skeet, of Stack Overflow fame, gave a talk in which he lamented some of the problems caused, among other things, by not thinking carefully about correctness but instead adopting languages or features that mostly work most of the time. I did ask why we did not simply choose languages whose designers /had/ thought carefully about many (not all) of the problems he discussed. I didn't come away feeling hopeful. (If you find the talk online, it was a somewhat ill-posed question and I obviously oversimplified the history.)


James

-------------------------------
James Geddes
Principal Data Scientist
T +44 (0)20 3862 3326
M +44 (0)7973 223 571
jge...@turing.ac.uk

-------------------------------
The Alan Turing Institute
British Library
96 Euston Road
NW1 2DB London
turing.ac.uk

Jens Axel Søgaard

Jun 29, 2018, 6:27:21 AM
to Matthias Felleisen, Christopher Lemmer Webber, Racket Users
2018-06-10 21:23 GMT+02:00 Matthias Felleisen <matt...@felleisen.org>:

Eli reminded me quite a few times that for long-running programs we
really want it back. And I have come to think that he’s right.

Alexis’s post reminded me of this and also reminded me that there have
always been many different REPLs (Lisp, Prolog, Hope, ML, Haskell, Racket)
and thar if Racket is an LOP language we should probably provide the
means to implement any of them.

Go for it — Matthias

Some low hanging fruit for DrRacket: make it easier to send s-expressions from the 
definitions window to interaction window. 

This example in the DrRacket documentation is a good start:

 
/Jens Axel

Konrad Hinsen

Jun 29, 2018, 8:11:14 AM
to James Geddes, Racket Users
James Geddes <jge...@turing.ac.uk> writes:

> I'm lucky enough to be collaborating on a project that is precisely
> trying to build a notebook-like interface that /does/ maintain a
> dependency graph of the computations. It aims to solve, in part, the
> problem of state in Jupyter-like systems. We have a paper coming out
> shortly in TaPP (there's a copy at
> http://tomasp.net/academic/papers/wrattler/ if you are interested).

That looks interesting, although the paper leaves many practically
relevant details unclear. Do you plan to publish the code?

One limitation I see is the use of a single data structure (the JSON
data frames stored in a database) for exchanging data between
cells. That's great when the data structure fits the problem but becomes
a show-stopper otherwise.

In fact this reminds me of my own ActivePapers system
(http://dx.doi.org/10.12688/f1000research.5773.3) which is similar
to Wrattler under the hood (it doesn't have a notebook interface yet),
but uses HDF5 datasets as the data structure for all state. This is
nearly perfect for my own needs (molecular simulation), but not at all
suitable for some other application types, nor for languages without an
HDF5 interface (such as Racket, unfortunately).

> carefully about correctness but instead adopting languages or features
> that mostly work most of the time. I did ask why we did not simply
> choose languages whose designers /had/ thought carefully about many
> (not all) of the problems he discussed. I didn't come away feeling

In my experiences, scientists choose tools, including languages, by
popularity and by the effort required to get a first project done.
I have met only very few people who would actually test-drive several
options before making a choice, let alone reading some other field's
literature.

Konrad

Raoul Duke

Jun 29, 2018, 10:14:15 AM
to Racket-Users List
see also Erlang's hot reload.

George Neuner

Jun 29, 2018, 4:42:57 PM
to racket users

On 6/29/2018 5:59 AM, James Geddes wrote:
> I cannot like Python. So I am extremely interested in this discussion.

+1

I cannot like any language that has significant indentation.  Python
doesn't care how many spaces you indent at each level - only that every
level is different and every line at the same level matches. But even
with the structure aware editor, there still is too much manual fussing
for my taste.  It distracts from whatever you actually are trying to do.

George

Greg Trzeciak

Jun 29, 2018, 5:46:37 PM
to Racket Users
Just in case you are not aware the low level Racket interface to HDF5 already exists:
On Friday, June 29, 2018 at 2:11:14 PM UTC+2, Konrad Hinsen wrote:

In fact this reminds me of my own ActivePapers system
(http://dx.doi.org/10.12688/f1000research.5773.3) which is similar
to Wrattler under the hood (it doesn't have a notebook interface yet),
but uses HDF5 datasets as the data structure for all state. This is
nearly perfect for my own needs (molecular simulation), but not at all
suitable for some other application types, nor for languages without an
HDF5 interface (such as Racket, unfortunately).


Konrad

Eric Griffis

Jun 29, 2018, 6:28:19 PM
to google...@khinsen.fastmail.net, Racket Users
On Thu, Jun 28, 2018 at 8:31 PM Konrad Hinsen
<google...@khinsen.fastmail.net> wrote:
>
> On 28/06/2018 20:19, Eric Griffis wrote:
>
> > This is starting to sound like Hydrogen
>
> Does Hydrogen attempt to do any dependency analysis?

I've never used Hydrogen and should have worded my comment more clearly as
fishing. Sorry about that.

Eric

John Clements

Jun 30, 2018, 1:21:19 PM
to Eric Griffis, Konrad Hinsen, Racket Users, Shubham Kahal
Okay, there’s clearly a lot of people with scientific programming experience on this thread. A student and I are looking at dataframe representations for data science in Racket, and IIUC, pandas is built on top of NumPy’s homogeneous arrays. Does this seem like the right way to go in Racket, as well?

Also, in re: the hdf5 bindings, it looks like there’s a build failure that doesn’t appear to be simply a missing dylib. Also, no docs...

Thanks!

John

James Geddes

Jun 30, 2018, 1:37:31 PM
to John Clements, Eric Griffis, Konrad Hinsen, Racket Users, Shubham Kahal
Data frames seem to be exactly right as a sort of “standard unit” for stats or machine learning work.

Presumably they are “the same thing” as relations (aka tables) in the relational database world. (Where the scare quotes are there because I don’t quite know what equivalence means.)

R implements data frames as a (heterogeneous) list of homogeneous vectors. (I don’t know enough to comment on Python.) It allows row-wise access, which is convenient, but the column-wise storage model makes common operations (like averaging) vectorisable.

It would be fantastic to have a Racket version of this.
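
For what it's worth, here is a minimal sketch (hypothetical names,
nothing like a real library) of that column-oriented shape: a list of
column names plus one homogeneous vector per column, so rows are
assembled on demand while column-wise operations such as averaging stay
plain vector traversals:

#lang racket

(struct data-frame (names columns) #:transparent)  ; one vector per name

(define (df-column df name)
  (for/first ([n   (in-list (data-frame-names df))]
              [col (in-list (data-frame-columns df))]
              #:when (equal? n name))
    col))

(define (df-row df i)
  (for/list ([col (in-list (data-frame-columns df))])
    (vector-ref col i)))

(define (df-mean df name)
  (define col (df-column df name))
  (/ (for/sum ([x (in-vector col)]) x)
     (vector-length col)))

;; (define df (data-frame '(height weight)
;;                        (list (vector 1.70 1.82 1.65)
;;                              (vector 68.0 80.5 59.2))))
;; (df-row df 1)        ; => '(1.82 80.5)
;; (df-mean df 'weight) ; => 69.233...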

James

Konrad Hinsen

Jun 30, 2018, 3:16:38 PM
to John Clements, Eric Griffis, Racket Users, Shubham Kahal
"John Clements" <clem...@brinckerhoff.org> writes:

> Okay, there’s clearly a lot of people with scientific programming
> experience on this thread. A student and I are looking at dataframe
> representations for data science in Racket, and IIUC, pandas is built
> on top of NumPy’s homogeneous arrays. Does this seem like the right
> way to go in Racket, as well?

This looks like two questions to me:

1. Are data frames the right data structure to do data science in
Racket?

2. Is it a good idea to build data frames on top of arrays, like
Pandas does?

As for 1., data frames are very versatile and rightly popular for that
reason, both in R and Python/pandas. However, they don't cover
everyone's needs, and in particular they are not a good match for
high-dimensional regular data, which are nicely represented by arrays.
A good data science environment should offer both.

This is one of the reasons why pandas builds on NumPy arrays, making
data conversion between the two simple and efficient. Another reason is
probably that NumPy is much older and more mature, so it offered a nice
starting point. Unfortunately that also means that pandas has inherited
some of the problematic aspects of NumPy arrays.

Since in Racket, arrays are a layer on top of vectors, it's perhaps
simpler / easier / more efficient to use vectors as the common low-level
layer for both arrays and data frames. All the more since FFI interfaces
to C libraries tend to operate at the vector level as well.

> Also, in re: the hdf5 bindings, it looks like there’s a build failure
> that doesn’t appear to be simply a missing dylib. Also, no docs...

It looks more like an experiment to me. Does anyone know the author?
If he's interested in improving this package, I'd be happy to help out.

Konrad.

John Clements

Jun 30, 2018, 3:33:45 PM
to Konrad Hinsen, Eric Griffis, Racket Users, Shubham Kahal


> On Jun 30, 2018, at 12:16, Konrad Hinsen <google...@khinsen.fastmail.net> wrote:
>
> "John Clements" <clem...@brinckerhoff.org> writes:
>
>> Okay, there’s clearly a lot of people with scientific programming
>> experience on this thread. A student and I are looking at dataframe
>> representations for data science in Racket, and IIUC, pandas is built
>> on top of NumPy’s homogeneous arrays. Does this seem like the right
>> way to go in Racket, as well?
>
> This looks like two questions to me:
>
> 1. Are data frames the right data structure to do data science in
> Racket?
>
> 2. Is it a good idea to build data frames on top of arrays, like
> Pandas does?
>
> As for 1., data frames are very versatile and rightly popular for that
> reason, both in R and Python/pandas. However, they don't cover
> everyone's needs, and in particular they are not a good match for
> high-dimensional regular data, which are nicely represented by arrays.
> A good data science environment should offer both.
>
> This is one of the reasons why pandas builds on NumPy arrays, making
> data conversion between the two simple and efficient. Another reason is
> probably that NumPy is much older and mature, so it offered a nice
> starting point. Unfortunately that also means that pandas has inherited
> some of the problematic aspects of NumPy arrays.

Do you have some insight into what aspects are problematic? I did a quick read of your ActivePapers paper (https://f1000research.com/articles/3-289/v3), and it doesn’t look to me like it goes into sufficient detail to answer this.

>
> Since in Racket, arrays are a layer on top of vectors, it's perhaps
> simpler / easier / more efficient to use vectors as the common low-level
> layer for both arrays and data frames. All the more since FFI interfaces
> to C libraries tend to operate at the vector level as well.

Yes, my assumption was that a “homogeneous array” would be represented in Racket as a vector or perhaps even an ffi-style c-formatted array.
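
As a hedged illustration of what "an ffi-style c-formatted array" could
mean here (the names below are illustrative only): a homogeneous
f64vector from ffi/vector stores unboxed doubles and can be handed to C
code without copying.

#lang racket
(require ffi/vector)

;; A homogeneous column of doubles, stored unboxed and C-compatible.
(define heights (f64vector 1.70 1.82 1.65))

(f64vector-ref heights 1)       ; => 1.82
(f64vector->cpointer heights)   ; raw pointer suitable for an FFI call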

John Clements



Grzegorz Trzeciak

Jun 30, 2018, 5:16:46 PM
to google...@khinsen.fastmail.net, peters...@gmail.com, AlexHa...@gmail.com, ray.r...@gmail.com, clem...@brinckerhoff.org, ded...@gmail.com, racket...@googlegroups.com, shubha...@gmail.com
[I am including in the conversation all people mentioned in my post]

1. Racket HDF5 - I don't know the author (Peter Samarin) but hopefully he will shed some light on the status of hdf5 library.
But from what I see the low-level bindings are quite complete and are almost 1-to-1 with the low-level C API. In regards to documentation - since it is so low level, the hdf5 docs should be sufficient for understanding the API:

It would be worthwile to think how the Racket high level library would look like - one thing for certain - it should work well with data frames implementation.

2. Racket data frames - I've seen 2 implementations, one by Ray Racine:
unfortunately it is lacking any documentation so that decreases its usefulness.

Another lighter implementation of Racket data frames is part of ActivityLog2 (by Alex Harsanyi):

G.


peter....@gmail.com

Jul 1, 2018, 9:11:41 AM
to Racket Users
Hi all,

As of now, the low-level c-style API should work. I have tested it on Ubuntu 14 and 16 and it seems to work fine with the libraries available from the package manager.

The high-level API needs something for automatic type detection/reconstruction and partial access/writing of data. But that requires either careful thought or perhaps simply porting h5py to Racket, and I don't have the time for either at the moment. Contributions are, of course, welcome!

Here is my attempt at automatically reading all data from datasets: https://github.com/oetr/racket-hdf5/blob/master/examples/getting-data-format.rkt. It works OK except for complex compounds containing strings. The output data structure is math/array for the most part, though to obtain it from unsafe cblocks it is always the (vector->array (cblock->vector ...)) composition.
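
In case it helps, a rough sketch of that composition looks like the
following; the pointer, element type, and dimensions here are
placeholders, not the actual racket-hdf5 API:

    #lang racket
    (require ffi/unsafe math/array)

    ;; Copy an unsafe C block into a Racket vector, then wrap it as a
    ;; math/array of the given dimensions.
    (define (cblock->2d-array cblock rows cols)
      ;; cblock->vector copies rows*cols doubles out of the raw C block...
      (define flat (cblock->vector cblock _double (* rows cols)))
      ;; ...and vector->array reshapes the flat vector into a 2-D array.
      (vector->array (vector rows cols) flat))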

Peter

Konrad Hinsen

unread,
Jul 2, 2018, 4:54:11 AM7/2/18
to John Clements, Racket Users
"John Clements" <clem...@brinckerhoff.org> writes:

> Do you have some insight into what aspects are problematic? I did a
> quick read of your ActivePapers paper
> (https://f1000research.com/articles/3-289/v3), and it doesn’t look to
> me like it goes into sufficient detail to answer this.

Indeed, the paper doesn't deal with such Python-specific technical
details at all.

In this case, one problem is that Pandas inherits the subtle copy/view
semantics of NumPy arrays, which NumPy introduced to permit efficient
handling of large datasets. But unlike NumPy, Pandas doesn't make
these semantics explicit as part of the API. That makes modifying data
frames a frequent source of bad surprises.
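
To illustrate the distinction in Racket terms (only an analogy, not how
NumPy or math/array actually work): a "view" is another reference to
the same storage, while a copy is independent storage.

    #lang racket
    (define data (vector 1 2 3 4))

    (define view data)                ; shares storage with data
    (define copy (vector-copy data))  ; fresh storage

    (vector-set! view 0 99)
    data  ; => '#(99 2 3 4) -- the mutation shows through the "view"
    copy  ; => '#(1 2 3 4)  -- the copy is unaffected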

>> Since in Racket, arrays are a layer on top of vectors, it's perhaps
>> simpler / easier / more efficient to use vectors as the common low-level
>> layer for both arrays and data frames. All the more since FFI interfaces
>> to C libraries tend to operate at the vector level as well.
>
> Yes, my assumption was that a “homogeneous array” would be represented
> in Racket as a vector or perhaps even an ffi-style c-formatted array.

Having looked at the array implementation in the past, I can confirm
that assumption. I even expected this to be documented, but I didn't
find any explicit statement about the internal representation of
arrays. There are conversion functions
(https://docs.racket-lang.org/math/array_convert.html?q=array%20vector#%28def._%28%28lib._math%2Farray..rkt%29._array-~3evector%29%29),
but the documentation doesn't even promise that converting arrays
to/from vectors is any faster than converting to/from lists.
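
One can at least measure it; a minimal sketch (nothing rigorous, and
untyped math/array carries its own overhead) would be:

    #lang racket
    (require math/array)

    ;; Build a largish array, then time the two conversions.
    (define arr
      (build-array (vector 1000 1000)
                   (lambda (js) (exact->inexact (vector-ref js 0)))))

    (time (void (array->vector arr)))
    (time (void (array->list arr)))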

Konrad.

Konrad Hinsen

unread,
Jul 3, 2018, 9:15:26 AM7/3/18
to John Clements, Racket Users
Konrad Hinsen <google...@khinsen.fastmail.net> writes:

> However, it's the time savings aspect that is responsible for the
> hidden state approach. Your idea of reactive code has been tried for
> computational notebooks (nextjournal (https://nextjournal.com/) comes
> to mind, but I have seen a few others), but not found much acceptance

A blog post I saw yesterday reminded me of another "functional"
notebook that has been under development for a while: Stencila.

https://elifesciences.org/labs/c496b8bb/stencila-an-office-suite-for-reproducible-research

Konrad.

Konrad Hinsen

unread,
Jul 3, 2018, 9:30:21 AM7/3/18
to peter....@gmail.com, Racket Users
peter....@gmail.com writes:

> As of now, the low-level c-style API should work. I have tested it on
> Ubuntu 14 and 16 and it seems to work fine with the libraries available
> from the package manager.

That sounds good. I'll give it a try under macOS when I am back from
vacation in August.

> The high-level API needs something for automatic type
> detection/reconstruction and partial access/writing of data. But that
> requires either careful thought or perhaps simply porting h5py to
> Racket, and I don't have the time for either at the moment.
> Contributions are, of course, welcome!

Another potential source of inspiration is the Julia HDF5 interface:

https://github.com/JuliaIO/HDF5.jl

Julia has a somewhat hidden Lisp heritage, so maybe this is easier to
adapt than h5py.

Konrad.

James Geddes

unread,
Jul 4, 2018, 3:19:26 AM7/4/18
to Konrad Hinsen, Racket Users


> On 29 Jun 2018, at 13:11, Konrad Hinsen <konrad...@fastmail.net> wrote:
>
> James Geddes <jge...@turing.ac.uk> writes:
>
>> I'm lucky enough to be collaborating on a project that is precisely
>> trying to build a notebook-like interface that /does/ maintain a
>> dependency graph of the computations. It aims to solve, in part, the
>> problem of state in Jupyter-like systems. We have a paper coming out
>> shortly in TaPP (there's a copy at
>> http://tomasp.net/academic/papers/wrattler/ if you are interested).
>
> That looks interesting, although the paper leaves many practically
> relevant details unclear. Do you plan to publish the code?

Yes, definitely. (At the moment it is very prototype-y -- in other words, it works for the developer, most of the time.)


> One limitation I see is the use of a single data structure (the JSON
> data frames stored in a database) for exchanging data between
> cells. That's great when the data structure fits the problem but becomes
> a show-stopper otherwise.

Completely agree. Data frames look like the obvious place to start, but I don't know what the right answer is. I'm worried this is a wicked problem; it is definitely an interesting one.


> In fact this reminds me of my own ActivePapers system
> (http://dx.doi.org/10.12688/f1000research.5773.3) which is similar
> to Wrattler under the hood (it doesn't have a notebook interface yet),
> but uses HDF5 datasets as the data structure for all state. This is
> nearly perfect for my own needs (molecular simulation), but not at all
> suitable for some other application types, nor for languages without an
> HDF5 interface (such as Racket, unfortunately).
>

Thank you for the link, I will read that paper. HDF5 had been suggested to me before, but it appeared to be something like "a hierarchical filesystem for data frames" (and I did not find a clear description of the format and structure). Given this discussion I will take another look.

Konrad Hinsen

unread,
Jul 4, 2018, 4:59:41 AM7/4/18
to James Geddes, Racket Users
James Geddes <jge...@turing.ac.uk> writes:

>> One limitation I see is the use of a single data structure (the JSON
>> data frames stored in a database) for exchanging data between
>> cells. That's great when the data structure fits the problem but
>> becomes a show-stopper otherwise.
>
> Completely agree. Data frames look like the obvious place to start,
> but I don't know what the right answer is. I'm worried this is a wicked
> problem; it is definitely an interesting one.

Indeed. My current point of view (which may change tomorrow) is that the
ideal data exchange format should be a tree structure (e.g.
s-expressions, XML, JSON, ...), because you can map pretty much
everything onto it. Next, you want optimized representations for
specific kinds of trees. Multidimensional homogeneous arrays are one of
them. Data frames are another. The idea is to have a hierarchy of
data abstractions with something universal at the base. If all you have
is data frames, many problems simply cannot be handled, not even
inefficiently.
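
As a toy illustration of that hierarchy (the df/col encoding below is
invented for the example, not a proposal):

    #lang racket
    ;; Everything can be written down as a tree (an s-expression here);
    ;; a columnar data-frame representation is just an optimization
    ;; recognized on top of it.
    (define tree
      '(df (col "name" ("ada" "grace"))
           (col "age"  (36 45))))

    ;; Recognize the data-frame-shaped tree and build a columnar
    ;; representation (a hash of column vectors) from it.
    (define (tree->columns t)
      (for/hash ([c (in-list (cdr t))])
        (values (cadr c) (list->vector (caddr c)))))

    (tree->columns tree)  ; => a hash mapping "name" and "age" to column vectors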

> Thank you for the link, I will read that paper. HDF5 had been
> suggested to me before, but it appeared to be something like "a
> hierarchical filesystem for data frames" (and I did not find a clear
> description of the format and structure). Given this discussion I will
> take another look.

The HDF5 file format is documented but so complex that the only
practical way to work with it is through the HDF5 library. That is
probably the biggest problem with HDF5. But for efficiently handling
large datasets with structure and metadata, it's hard to beat.

Konrad

Norman Gray

unread,
Jul 4, 2018, 5:27:05 AM7/4/18
to Konrad Hinsen, James Geddes, Racket Users

On 4 Jul 2018, at 9:59, Konrad Hinsen wrote:

>> Thank you for the link, I will read that paper. HDF5 had been
>> suggested to me before, but it appeared to be something like "a
>> hierarchical filesystem for data frames" (and I did not find a clear
>> description of the format and structure). Given this discussion I
>> will
>> take another look.
>
> The HDF5 file format is documented but so complex that the only
> practical way to work with it is through the HDF5 library. That is
> probably the biggest problem with HDF5. But for efficiently handling
> large datasets with structure and metadata, it's hard to beat.

Indeed, my understanding is that the only authoritative definition of
the HDF5 format is the HDF5 library. This was true when I last looked
at this a few years ago -- it's possible that it's changed, but even if
it has, the complexity point still stands. HDF5 is very impressive,
though.

I'll point to two papers I was involved in, which might be of some
interest to this thread, to the extent that they're about structured
data storage. [1] is a discussion of the long-term limitations of the
FITS format, and includes a comparison with HDF5 amongst others. [2] is
in part a history of a particular structured data format (now rather
niche), which discusses some of its motivations, its claimed successes,
and what it carefully didn't do.

Best wishes,

Norman


[1] http://dx.doi.org/10.1016/j.ascom.2015.01.009
[2] http://dx.doi.org/10.1016/j.ascom.2014.11.001

--
Norman Gray : https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK