Is it possible to make submodules with private names?

rocketnia

May 22, 2020, 11:11:37 PM

to Racket Users

Hi all,

I've been thinking about making libraries that would generate submodules when they're used. However, submodules exist in a flat namespace, so I'm a bit afraid of conflicts if I choose the same name as some other library does. I also don't really want users to have to supply their own local choices of names (`rename-in` style), since I'm thinking of these submodules as an implementation detail.

To be more specific about my higher-level goals, I'm thinking of experimenting with a system of modules that have *optional compile-time arguments*, which make them somewhat like ML functors. If a user requires the module the usual way, they get the default arguments, but they can use a special require spec and a system of extended module path indexes to supply arguments. For instance, an extended module path could represent "apply module X to the arguments 1 and 2, and then access the resulting module's Y submodule." Since the default way to require a module just gets its no-argument version, I'm thinking of hiding away the argument-processing logic in a submodule of its own.

When someone supplies these arguments to a module, what's really going to happen is that they're defining a local submodule and requiring it on the spot. After all, the compilation of that module with those arguments has to happen sometime, and it couldn't have happened already, so it must be compiled alongside the current module. A submodule represents this situation well.

A subtler design challenge with this idea is that a library with compile-time arguments probably needs to stop using "generative" definitions of structure types, so that its types can remain stable across various choices of module arguments. So I'd probably supply a type definition mechanism that associated the defined type with a stable module path, similar to the way `serializable-struct` creates a submodule called `deserialize-info`.

As you can see, if I proceed the way I'm imagining, my library is going to be generating submodules for several reasons. These submodules would exist mostly as a means to an end, so I'm not immediately inclined to expose them to users the way `serializable-struct` does. I probably could stabilize them if I put in some extra thought, but my first choice, especially early in development, would be to keep these details private. At least, private to anyone who isn't using reflective tools like `module-compiled-submodules` or `current-module-name-resolver`.

In the past, I've guarded against accidental namespace conflicts by using gensyms as my variable names. That approach seems viable here too.

It's a little tricky to do. The name of a submodule being defined or required must be known at compile time, but due to Racket's separate compilation guarantee, different clients using my library at compile time will be using different instantiations of it. If my library just calls (gensym), those clients will all end up using different gensyms, and it won't work. So every instantiation of my library needs to obtain the same gensym, and I do that by generating a gensym one phase up and embedding it in a quotation, like #`(... '#,(gensym) ...). While the Racket compiler can't marshal every kind of 3D syntax into the compiled code, gensyms are one thing it actually can marshal. The gensym's unique identity seems to be generated again at the time it's unmarshaled, which is exactly what I want. Since a (non-reflective) program will unmarshal my library only once, the gensym will be unique to my library but shared across all my library's instantiations.
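
As a small illustration of why a plain (gensym) call can't work here, the sketch below uses two separate calls to stand in for two instantiations of the library. The names `name-seen-by-client-a` and `name-seen-by-client-b` are made up for this example and don't appear in the code at the end of this email.

```racket
#lang racket/base

;; Two separate calls stand in for two instantiations of the same library,
;; each of which would evaluate (gensym) on its own.
(define name-seen-by-client-a (gensym 'badlang-submodule-name))
(define name-seen-by-client-b (gensym 'badlang-submodule-name))

;; Each gensym is a fresh uninterned symbol, so the two never match.
(define same-gensym?
  (eq? name-seen-by-client-a name-seen-by-client-b))

;; An interned symbol with a given spelling is always the same value.
(define same-interned?
  (eq? 'private/generated-by-badlang/submodule
       (string->symbol "private/generated-by-badlang/submodule")))

(printf "gensyms match: ~a, interned symbols match: ~a\n"
        same-gensym? same-interned?)
```

This is the gap the quotation trick is meant to close: the gensym is created once, and every instantiation is supposed to see that one value.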

(See the end of this email for example code.)

Using that technique, everything works fine... at least on the command line. Unfortunately, in DrRacket, the submodule simply isn't found when I try to require it:

require: unknown module
module name: #<resolved-module-path:(submod "/path/to/badlibrary.rkt" badlang-submodule-name12021)>

As far as I can tell, this error I'm getting is really specific to DrRacket. I tried compiling my code with more instrumentation at the command line using "racket -e '(compile-context-preservation-enabled #t)' -l errortrace -t client.rkt" but even that works successfully.

Am I simply running into a bug in DrRacket, or is this gensym technique obscure enough that I shouldn't rely on it? Is there a more stable technique that would give me a similar guarantee that my names aren't collision-prone? My goal with using a gensym was to avoid accidental incompatibilities with other code, so of course an immediate incompatibility with DrRacket is a sign I might want to take a different approach.

Here are the three files of code I prepared to try this out (badlang.rkt, badlibrary.rkt, and client.rkt):

#lang racket
; badlang.rkt

(require (for-meta 2 syntax/parse))
(require (for-syntax racket))
(require (for-syntax syntax/parse))

(provide (all-defined-out))

; As long as we define `badlang-submodule-name` as an interned symbol
; like this, it works everywhere.
#;
(define-for-syntax badlang-submodule-name
  'private/generated-by-badlang/submodule)

; As long as we define `badlang-submodule-name` as a quoted uninterned
; symbol like this, it works at the command line but not in DrRacket.
(begin-for-syntax
  (define-syntax (define-quoted-gensym stx)
    (syntax-parse stx
      [(_ var:id)
       #`(define var '#,(gensym (syntax-e #'var)))]))
  (define-quoted-gensym badlang-submodule-name))

(define-syntax (define-badlang-submodule-here stx)
  #`(module #,badlang-submodule-name racket))

(define-syntax (require-badlang-submodule-from stx)
  (syntax-parse stx
    [(_ parent-module)
     #`(require (submod parent-module #,badlang-submodule-name))]))

#lang racket
; badlibrary.rkt
(require "badlang.rkt")
(define-badlang-submodule-here)

#lang racket
; client.rkt
(require "badlang.rkt")
(require-badlang-submodule-from "badlibrary.rkt")

-Nia

Simon Schlee

May 23, 2020, 5:04:00 PM

to Racket Users

I think Pollen uses Racket's compiled directories to store some of its own cached/generated code; maybe a similar technique could be helpful for you.

Do you want to create submodules for arbitrary modules or only for modules using a certain library or language?

When it's the latter, I think it's a community coordination issue.

I can understand wanting to encapsulate everything but it seems that module names are expected to be public interned symbols in the current implementation:
https://docs.racket-lang.org/reference/Module_Names_and_Loading.html?q=module#%28def._%28%28quote._~23~25kernel%29._make-resolved-module-path%29%29

"A resolved module path is interned. That is, if two resolved module path values encapsulate paths that are equal?, then the resolved module path values are eq?."

I think gensyms are not allowed, because they are not interned.

I also would find it interesting to have something functor like, in the sense of being able to create parameterized module instances.

My guess is that constructs like that are difficult to optimize and the separation between runtime and compile time can become extremely blurry.

To the point that certain dynamic constructs would cause big chunks of code to become ready for compiling at run-time only and at that time an interpreter might be faster.

These are just my intuitions, I have only limited experience with modules and functors with my own toy language experiments.

Maybe if you need very dynamic/custom behavior, instead of generating module files:

You could create a program which creates and evaluates certain in memory generated module-forms at runtime, attaches them to a namespace and then runs the other code with that namespace.
Allowing that code or your library to require and interact with the dynamically generated module.
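
A minimal sketch of that idea, assuming a fresh namespace from `make-base-namespace` is acceptable. The module name `generated` and its `greeting` export are hypothetical, chosen only for this example.

```racket
#lang racket/base

;; A fresh namespace that will hold the dynamically declared module.
(define ns (make-base-namespace))

(define result
  (parameterize ([current-namespace ns])
    ;; Declare a module from an in-memory form; no file is written.
    (eval '(module generated racket/base
             (provide greeting)
             (define greeting "hello from a generated module")))
    ;; Other code running with this namespace can now require it by name.
    (dynamic-require ''generated 'greeting)))

(displayln result)
```

Here the module is compiled when the launcher runs, so the compilation cost is paid on every run rather than once ahead of time.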

So far my namespace usage was limited to attaching certain modules, but I think this is possible.

Think of it as a custom program launcher that acts like a "scripted"/automated interactive repl session.

Simon

George Neuner

May 23, 2020, 6:40:37 PM

to racket users

On 5/22/2020 11:11 PM, rocketnia wrote:
> I've been thinking about making libraries that would generate
> submodules when they're used. However, submodules exist in a flat
> namespace, I'm a bit afraid of conflicts if I choose the same name as
> some other library does, and I don't really want users to have to
> supply their own local choices of names (`rename-in` style) since I'm
> thinking of these submodules as an implementation detail.

Why not name your modules using UUIDs or similar generated random
strings? You can't ever be certain that a generated name won't
conflict, but you can make the probability very, very small.
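
A rough sketch of generating such a name, using 16 random bytes in hex rather
than a formally specified UUID. The helper `make-random-submodule-name` and
the "private/generated-" prefix are invented for this example.

```racket
#lang racket/base
(require racket/random   ; crypto-random-bytes
         file/sha1)      ; bytes->hex-string

;; Build an interned symbol from a fixed prefix plus 128 random bits.
(define (make-random-submodule-name)
  (string->symbol
   (string-append "private/generated-"
                  (bytes->hex-string (crypto-random-bytes 16)))))

(define example-name (make-random-submodule-name))
(displayln example-name)
```

The name would need to be generated once and pasted into the library source
as a literal. Generating it afresh at each compilation would presumably give
each client a different name.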

George

Philip McGrath

May 23, 2020, 7:16:43 PM

to Simon Schlee, Racket Users

On Sat, May 23, 2020 at 5:04 PM Simon Schlee <schle...@gmail.com> wrote:

> I also would find it interesting to have something functor like, in the sense of being able to create parameterized module instances.
> My guess is that constructs like that are difficult to optimize and the separation between runtime and compile time can become extremely blurry.
> To the point that certain dynamic constructs would cause big chunks of code to become ready for compiling at run-time only and at that time an interpreter might be faster.

In fact, Racket has such a construct: units. They are first-class values, with run-time operations to link and invoke them, and they allow for cyclic dependencies and multiple instantiations of a given unit.
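
For readers who haven't used units, here is a small sketch with `racket/unit` in which one unit is linked against two different "argument" units. The signatures `config^` and `table^` and all the unit names are hypothetical, made up for this illustration.

```racket
#lang racket/base
(require racket/unit)

;; The "argument" a client supplies, and the interface it gets back.
(define-signature config^ (size))
(define-signature table^ (make-table))

;; Two alternative argument units.
(define-unit small-config@ (import) (export config^) (define size 2))
(define-unit large-config@ (import) (export config^) (define size 5))

;; A unit parameterized over config^.
(define-unit table@
  (import config^)
  (export table^)
  (define (make-table) (make-vector size 0)))

;; Link and invoke table@ twice, once per choice of argument.
(define small-table
  (let ()
    (define-values/invoke-unit/infer
      (export table^)
      (link small-config@ table@))
    (make-table)))

(define large-table
  (let ()
    (define-values/invoke-unit/infer
      (export table^)
      (link large-config@ table@))
    (make-table)))

(printf "~a ~a\n" (vector-length small-table) (vector-length large-table))
```

Linking and invocation happen at run time here, so `size` is an ordinary run-time value rather than something `table@` is compiled against.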

One of the major design goals of units was support for separate compilation (units predate `module` in Racket), which may rule them out for Nia's goal of modules with "optional compile-time arguments" per se.

I guess I'm interested in what sorts of things these optional arguments might be used for: if dealing with the arguments could instead be done at link- or invoke-time, you might be able to use units to implement this module system. Regardless, I recommend anyone thinking about implementing a unit-like system look closely at units first: even when I ultimately ended up implementing my own unit-like system to meet a specific need (this was part of my RacketCon talk), my experience with units was very valuable in doing so.

-Philip

rocketnia

May 24, 2020, 12:04:44 AM

to Racket Users

Thanks everyone for the perspectives and techniques you've offered so far.

I've found a flaw in my gensym technique, even at the command line. If I run "raco make badlang.rkt", "raco make badlibrary.rkt", and "raco make client.rkt", the last command has an error. That's because the gensym is marshaled into both badlang_rkt.zo and badlibrary_rkt.zo. When "raco make client.rkt" unmarshals both those files, it gets two different gensyms.
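
As a loose analogy only (this uses `write` and `read` on a bare symbol, not the actual .zo marshaling), the sketch below shows an uninterned symbol losing its identity on a round trip through a serialized form. The names `original` and `reloaded` are just for this example.

```racket
#lang racket/base

;; An uninterned symbol, like the one embedded in the compiled library.
(define original (gensym 'badlang-submodule-name))

;; Serialize it to text and read it back, as a stand-in for writing a
;; compiled file and loading it again later.
(define printed (format "~s" original))
(define reloaded (read (open-input-string printed)))

;; The spelling survives the round trip, but the identity does not.
(define same-spelling?
  (equal? (symbol->string original) (symbol->string reloaded)))
(define same-identity? (eq? original reloaded))

(printf "same spelling: ~a, same identity: ~a\n"
        same-spelling? same-identity?)
```

In this analogy the reloaded symbol comes back interned. The .zo case differs in that each file unmarshals to its own fresh uninterned symbol, but either way nothing ties the loaded value back to the original.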

I suppose I would've liked badlibrary_rkt.zo to marshal the gensym in a way that "remembered" which gensym in badlang_rkt.zo it came from. Perhaps that's not how gensym marshaling works. :)

So I don't think DrRacket's behavior shows any signs of a problem. In fact, it alerted me to the problem sooner than the command line did.

On Saturday, May 23, 2020 at 2:04:00 PM UTC-7, Simon Schlee wrote:

> I think Pollen uses Racket's compiled directories to store some of its own cached/generated code; maybe a similar technique could be helpful for you.
>
> Do you want to create submodules for arbitrary modules or only for modules using a certain library or language?
> When it's the latter, I think it's a community coordination issue.

The latter. My compile-time-arguments system wouldn't need to interact with modules that don't receive or supply arguments. The modules that do receive or supply these arguments would refer to operations from my library, so I can generate the code I need during those modules' usual compilation process. So I don't foresee any need to create files in the compiled/ directory or (as you mention later on) to generate code at run time.

I suppose I did consider using `eval` or `dynamic-require` so that I could use `parameterize` to supply my compile-time arguments. But I do consider it important that the module is actually compiled with those arguments, and that the compilation result can be reused rather than being generated each time. A module that supplies a lot of compile-time arguments to other modules might take a while to compile (since it has to compile customized variations of its dependencies as well), but at least that cost isn't paid at run time.

On Saturday, May 23, 2020 at 2:04:00 PM UTC-7, Simon Schlee wrote:

> I can understand wanting to encapsulate everything but it seems that module names are expected to be public interned symbols in the current implementation:
> https://docs.racket-lang.org/reference/Module_Names_and_Loading.html?q=module#%28def._%28%28quote._~23~25kernel%29._make-resolved-module-path%29%29
> "A resolved module path is interned. That is, if two resolved module path values encapsulate paths that are equal?, then the resolved module path values are eq?."
> I think gensyms are not allowed, because they are not interned.

The resolved module path as a whole is interned, but that doesn't mean every symbol within it is interned.

As the part you quoted says, the resolved paths are interned in the specific sense that if the paths to encapsulate are `equal?`, then the resolved paths are `eq?`.

If I pass in paths that contain two distinct uninterned symbols, those paths aren't `equal?`, so the documentation isn't implying that the resolved paths would be `eq?`. Currently, they're not `eq?`:

(define foo
  (make-resolved-module-path
   (list 'mycollection/mylib (string->uninterned-symbol "x"))))
(define bar
  (make-resolved-module-path
   (list 'mycollection/mylib (string->uninterned-symbol "x"))))
(eq? foo bar)
; returns #f

I suppose I'm not really sure what good it is to use an uninterned symbol as a submodule name. It seems that after the module is compiled, one of the only ways another part of the system can do anything with that submodule is by digging into the code using `module-compiled-submodules`. Perhaps a module can refer to its *own* gensym-named submodules at least.

On Saturday, May 23, 2020 at 3:40:37 PM UTC-7, gneuner2 wrote:

> Why not name your modules using UUIDs or similar generated random
> strings? You can't ever be certain that a generated name won't
> conflict, but you can make the probability very, very small.

That could be good.

For Racket purposes, I'm not quite so worried about namespace collisions that I'd ask people to "raco pkg install <uuid>" to get my packages or (require <uuid>) to get my collections. Likewise, I probably wouldn't use a UUID for this submodule name. I just thought that if the Racket module system already offered a way to guarantee noncolliding submodule names, I oughta use it.

Instead, I'll probably settle for something like `private/generated-by/mycollection/mymodule`. This way, another library usually won't conflict with it unless it has its own `mycollection/mymodule` module that conflicts with mine at installation time. The rest of the name makes it pretty clear that it's not a stable interface for the public, and it gives the user a small clue to understand how this invader ended up in their module.
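
A minimal sketch of what that convention could look like from inside a single file. The `answer` binding is a placeholder, and in practice the `module` form would be generated by a macro from my library rather than written by hand.

```racket
#lang racket/base

;; A submodule whose interned name spells out where it came from.
(module private/generated-by/mycollection/mymodule racket/base
  (provide answer)
  (define answer 42))

;; The enclosing module can require it with an ordinary `submod` path.
(require (submod "." private/generated-by/mycollection/mymodule))

(displayln answer)
```

Since the name is an ordinary interned symbol, it's the same in every compiled file that mentions it.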

I've actually thought about how units could play into this! If this system had seamless support for optional run-time arguments alongside the compile-time ones, then the result of supplying those run time arguments would need to be something like a unit. And maybe the run-time entity those arguments are supplied *to* would need to be something like a unit as well. Actual Racket units could be involved, or maybe there'd merely be a "unit-like system" like you say.

But I don't expect to build that feature anytime soon. I've only thought it through well enough to get a rough idea of how it would fit into the design. In the meantime, normal Racket run time abstractions exist, including functions and units, and people can use those. :)

In the short term, I don't want to put very much work into this compile-time arguments system; there are other projects I feel I should be applying that attention to. I'm aiming to build just the minimal set of functionality I need, while giving it enough room to grow into something better later on. The reason I mentioned it here was to help demonstrate a concrete reason I was interested in private submodules (and this technique that turned out to be flawed).