To clarify for the broader crowd, to me this quote about (module) instances just describes an aspect of proper modularity, which naturally consists of two dual properties:
1. A module can only access what it (is given as) imports.
2. A client can only access what a module exports.
Of course, whether this implies an “ocap” system also depends on what exactly “what” quantifies over. Formally this usually refers to free names or variables or addresses, which of course could still be side-stepped by a language providing ambient capabilities in an unnamed fashion, e.g. as primitive instructions.
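To make the two properties concrete, here is a minimal sketch in Python rather than Wasm. The names `instantiate`, `counter_module` and the `log` import are invented for this illustration. Python itself has plenty of ambient authority and introspection, so this only models the shape of the discipline, not a real security boundary.

```python
from types import MappingProxyType

def instantiate(module_body, imports):
    """Run a module body with exactly the given imports; return its exports.

    Hypothetical helper: the body receives nothing but `imports` (property 1),
    and the client receives nothing but the returned exports (property 2).
    """
    exports = module_body(MappingProxyType(dict(imports)))
    return MappingProxyType(dict(exports))

def counter_module(imports):
    log = imports["log"]      # the only outside capability the module was given
    state = {"count": 0}      # private state, never exported

    def increment():
        state["count"] += 1
        log(f"count={state['count']}")
        return state["count"]

    return {"increment": increment}

messages = []
counter = instantiate(counter_module, {"log": messages.append})
counter["increment"]()
counter["increment"]()
```

The client can reach `increment` and nothing else, and the module can affect the outside world only through the `log` function it was handed.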
Ben Titzer (CC’ed) initially prototyped Wasm that way, and as far as he told me, he consciously decided not to include any such primitives, even though he didn’t frame it as ocap. Given the diversity of environments that Wasm is supposed to be embeddable in, that is almost a necessity: there practically is no interesting (non-computational) resource that can be assumed to exist in all possible embeddings, and hence you cannot build any into the language even if you wanted. In particular, Wasm shall neither depend on the web nor JavaScript nor any specific form of OS.
(A minor quibble I have with the quote is formulating the “what” as “effects”, because under the usual semantic interpretation of the term, a Wasm computation can still have observable effects, such as traps, non-determinism, or non-termination. But these (hopefully) are benign effects wrt security considerations.)
/Andreas
I suspect that most people in the room just
From: Andreas Rossberg <ross...@mpi-sws.org>
Date: Sat, Nov 4, 2017 at 8:23 AM
Subject: Re: WASM and ocaps
I suppose I don’t fully understand why private global state would break ocap, assuming that you can control references to classes/modules like all other references.
On Sat, Nov 4, 2017 at 1:17 AM, Andreas Rossberg <ross...@mpi-sws.org> wrote:
> To clarify for the broader crowd, to me this quote about (module) instances just describes an aspect of proper modularity, which naturally consists of two dual properties:
> 1. A module can only access what it (is given as) imports.
> 2. A client can only access what a module exports.
At the granularity of wasm modules, wasm is still not an ocap system for the very reason that multiple wasm modules within a wasm instance share a linear address space and can step on each other's data freely. This violates your criterion #1.
Also, criterion #1 is insufficient anyway as it admits the module system itself as a global communications channel. Worse, if a module X, i.e., the thing imported, can have mutable state, then two other modules Y and Z can communicate with each other merely by both importing module X.
On your criterion #2, exports to whom? What can import what a given module exports?
Hi Mark,
to summarise, you want to be able to pass table elements or memory contents between Wasm functions without them sharing a “global” table or memory. That makes sense, and I agree that the inability to do so (short of the GC proposal) is a hole.
However, as is, your concrete proposal probably isn't the best fit for WebAssembly, because it involves quite a bit of ad-hoc magic (implicit type coercions, implicit identity checks, implicit allocation callbacks, even a form of dependent typing) that doesn’t quite match the low-level nature of Wasm. Moreover, it abuses module-internal(!) table/memory ids with cross-module semantics in a way that is brittle at best — you should think of a module’s index spaces as a purely local naming mechanism whose order should be unobservable externally.
That said, I see at least two possible options for simplifying it:
1. Multiple tables / memories (plus instructions for copying/clearing elements). Two functions could pass back and forth data by sharing an auxiliary table/memory for passing arguments/results. A caller would copy its args there, the callee would use them or copy them as it sees fit. Vice versa for results. Essentially, exporting a function that takes or returns resources would always be paired with exporting an associated scratch table/memory.
Advantage: Simple and general mechanism; we want to add this anyway.
Disadvantage: Requires copying twice if the receiver needs to keep the values (no obvious way to optimise that in an engine)
2. The ability to form a read-only “slice” of a table or memory. Such a slice would be a first-class value type (a kind of bounds-checked pointer) that would allow accessing the contained elements but doesn’t provide a reference to the rest of a table/memory. A caller would form a slice, the callee can use or copy the contents as it sees fit. (This is a sort of generalisation of your proposal, but minus any magic.)
Advantage: More efficient than option (1) in various cases; if done right might be a basis for unraveling the bindings proposal.
Disadvantage: More complexity; requires new types, instructions for creating and accessing slices; slice values can keep tables/memories alive, might require reference counting.
As an aside, I note that in general you probably need the ability to pass lists of elements, not just individual ones, so both these options support that.
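As a rough illustration of option (2), the following Python sketch models a slice as a bounds-checked, read-only window onto a table. The class name `ReadOnlySlice` and its `get` method are assumptions made for this example; no such type or instruction exists in Wasm as discussed here.

```python
class ReadOnlySlice:
    """Bounds-checked, read-only window onto part of a table (hypothetical)."""

    def __init__(self, table, start, length):
        if start < 0 or length < 0 or start + length > len(table):
            raise IndexError("slice out of bounds")
        self._table = table
        self._start = start
        self._length = length

    def __len__(self):
        return self._length

    def get(self, index):
        # An out-of-range access models a trap; there is deliberately no setter.
        if not 0 <= index < self._length:
            raise IndexError("slice index out of bounds")
        return self._table[self._start + index]

# The caller exposes only entries 1..2 of its table to the callee.
caller_table = ["secret", "arg0", "arg1", "other secret"]
args = ReadOnlySlice(caller_table, 1, 2)

def callee(view):
    # The callee can use or copy the contents, but cannot reach the rest.
    return [view.get(i) for i in range(len(view))]

received = callee(args)
```

Note that the slice object holds a reference to the underlying table, which is the lifetime concern listed as a disadvantage above.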
/Andreas
> > to summarise, you want to be able to pass table elements or memory contents between Wasm functions without them sharing a “global” table or memory. That makes sense, and I agree that the inability to do so (short of the GC proposal) is a hole.
>
> > However, as is, your concrete proposal probably isn't the best fit for WebAssembly, because it involves quite a bit of ad-hoc magic (implicit type coercions, implicit identity checks, implicit allocation callbacks, even a form of dependent typing) that doesn’t quite match the low-level nature of Wasm. Moreover, it abuses module-internal(!) table/memory ids with cross-module semantics in a way that is brittle at best — you should think of a module’s index spaces as a purely local naming mechanism whose order should be unobservable externally.
>
> These all sound like criticisms I would also like to see solved, thanks. In particular, I fully agree it is important to maintain a principled stance on encapsulation and unobservability. The "dependent typing" in your list surprised me. How does my proposed mechanism touch on that?
A type like mem(memid) contains a reference to a value-level entity, whose identity is dynamic (e.g., it might be imported). Technically, that’s a dependent type, and it’s not entirely clear to me how to handle it during type checking without the index space conflation I mentioned.
> Nevertheless, from your response, it achieved its important goal: to explain what's missing, and demonstrate that the mechanism needed to address it should both subsume the core host binding mechanism, and be at least as simple as the host binding proposal.
>
> However, from your suggestions below I am not sure that I've explained adequately what's missing, or rather, what goals the new mechanism should achieve. Let's take the function-calling form of the canonical Granovetter-diagram example:
>
> Say there are three functions, alice, bob, and carol, that appear in three module instances, A, B, and C, that have no sharing relationships beyond those implied by the following scenario. In the initial conditions alice, and therefore A, has access to (is able to call) bob, as exported from B. alice, and therefore A, also has access to carol as exported from C. In these initial conditions, B has no access to carol and C has no access to bob.
>
> alice says:
>
> bob(carol)
>
> transferring control to bob in B, giving bob, and therefore B, the ability to call carol as exported from C.
This would be the simplest example of a higher-order function. Let’s call a function that takes or returns “resources” a resourceful function. Resources include host objects, but also functions or tables themselves. In your example, bob is a 2nd-order resourceful function, because it takes another resourceful function as argument.
Using option (1) in the general case means performing a simple form of “lowering” of resourceful functions to Wasm’s more primitive functions plus tables. This is completely compositional and mechanical. It is best understood by simply looking at the types of these functions.
As mentioned, a 1st-order resourceful function type like
(obj, obj, obj, i32) -> i32
(a function taking three opaque host objects as arguments plus an int and returning an int) would be lowered to a pair
((i32 -> i32), table(obj))
of the simplified function (taking only the int) and a table over opaque host objects (of size 3 in this case). So concretely, instead of exporting just one function a module would export a function and a table. In general, though, the result of lowering is a tuple, because tables are typed, so that you'll need a separate table for each different type of argument.
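A minimal Python sketch of what this lowered pair could look like at run time, with a list standing in for a Wasm table. The helper names `make_lowered_f` and `call_f`, and the body of `f`, are invented for illustration; only the shape (function plus size-3 table) comes from the text above.

```python
def make_lowered_f():
    """Return the lowered pair (function, table) for
    f : (obj, obj, obj, i32) -> i32."""
    param_buf = [None, None, None]          # stands in for table(obj), size 3

    def f(n):
        # Copy the resource arguments out and clear the buffer first,
        # before doing anything else with them.
        a, b, c = param_buf
        param_buf[:] = [None, None, None]
        # Placeholder body: count the non-null objects and add the int.
        return len([x for x in (a, b, c) if x is not None]) + n

    return f, param_buf

# The module would export both components of the pair.
f, f_param_buf = make_lowered_f()

def call_f(a, b, c, n):
    """Caller-side convention: fill the scratch table, then call."""
    f_param_buf[:] = [a, b, c]
    return f(n)

result = call_f(object(), object(), object(), 39)
```

The caller never passes the objects as real arguments; they travel through the exported table, and only the `i32` is an ordinary parameter.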
That generalises to the higher-order case straightforwardly. For the sake of example and differentiation, let’s assume the resourceful types of your three functions are:
alice : (obj, i64) -> i64
bob : (obj -> i32) -> f32
carol : obj -> i32
Note how bob takes a function of carol’s type as argument, which corresponds to the call you want to perform. The types of alice and carol are again just 1st-order resourceful function types that lower as before:
alice : (i64 -> i64, table(obj))
carol : (() -> i32, table(obj))
The type of bob is resourceful as well, because a function is itself a resource. But it’s 2nd-order. In a first step its parameter type lowers recursively to yield a pair of arguments:
bob : (() -> i32, table(obj)) -> f32
Both these arguments are still resource types, so the overall type further lowers to the tuple
bob : (() -> f32, table(() -> i32), table(table(obj)))
We can already put functions into tables, so all we need here is the ability to also put tables themselves into other tables (they are resources, too). In fact, we could already do that indirectly with obj, but you'll want a type transparent to Wasm in this case.
As said, this lowering composes at arbitrary order, but I doubt anything beyond 2nd order will show up much in practice.
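The type-level lowering can be sketched as a small recursive function. This is an illustrative model in Python, not part of any Wasm tooling: the `Func` and `Table` classes and the `lower` function are invented here, and the sketch assumes results are always plain value types (resource results are not handled).

```python
from dataclasses import dataclass

VALUE_TYPES = {"i32", "i64", "f32", "f64"}

@dataclass(frozen=True)
class Func:
    params: tuple       # parameter types
    result: str         # assumed to be a value type in this sketch

@dataclass(frozen=True)
class Table:
    elem: object        # element type of the table

def lower(ty):
    """Lower a type to a tuple of component types.

    Value parameters stay in the function signature; every component of a
    lowered resource parameter becomes a table of that component, with
    tables of equal type merged into one.
    """
    if isinstance(ty, Func):
        value_params, tables = [], []
        for p in ty.params:
            if p in VALUE_TYPES:
                value_params.append(p)
                continue
            for component in lower(p):
                table = Table(component)
                if table not in tables:
                    tables.append(table)
        return (Func(tuple(value_params), ty.result), *tables)
    return (ty,)        # obj and other opaque resources lower to themselves

alice = Func(("obj", "i64"), "i64")
carol = Func(("obj",), "i32")
bob = Func((carol,), "f32")
lowered_bob = lower(bob)
```

Here `lowered_bob` has three components: the function `() -> f32`, a table of `() -> i32` functions, and a table of tables of `obj`, matching the tuple derived by hand above.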
> To represent the initial conditions of my scenario in terms of your option #1, I take it that there would be a table AB shared between A and B, used by alice to pass parameters like carol to bob, and a table AC between A and C, supporting alice's ability to call carol. In the initial conditions, there would be no table BC between B and C, since in the initial conditions neither B nor C have any need to call anything in the other.
Right.
> For alice to call bob passing bob access to carol, alice would store something into the table AB at index ci, that provides the same ability to call carol that alice has. alice would then pass ci in that argument position to bob. bob would then (somehow) know to look up carol at index ci in table AB.
>
> However, if carol is also a function like bob that takes such parameters, then B and C would now need to share a table BC, for arguments that bob might pass to carol. Within option #1, how could this come about?
To summarise the above, roughly speaking, by using the table you call BC to also pass the table AC to bob, not just the function. However, because of types, you actually have to split that BC into two separate tables, as shown above.
Does that make sense? Really, this is just a fairly conventional form of lowering, similar e.g. to how a C compiler might convert functions that take/return structs by value to functions that instead pass pointers to temporary local buffers.
/Andreas
> On Nov 21, 2017, at 00:53 , Mark Miller <eri...@gmail.com> wrote:
> On Fri, Nov 17, 2017 at 11:33 AM, Andreas Rossberg <ross...@mpi-sws.org> wrote:
> >As mentioned, a 1st-order resourceful function type like
> >
> > (obj, obj, obj, i32) -> i32
> >
> >(a function taking three opaque host objects as arguments plus an int and returning an int) would be lowered to a pair
> >
> > ((i32 -> i32), table(obj))
> >
> >of the simplified function (taking only the int) and a table over opaque host objects (of size 3 in this case). So concretely, instead of exporting just one function a module would export a function and a table. In general, though, the result of lowering is a tuple, because tables are typed, so that you'll need a separate table for each different type of argument.
>
> So when a caller invokes a callee of this type, the caller would copy the three obj arguments into this table, and the callee would immediately copy out before doing any other function calls, to avoid reentrancy hazards? In this sense, the table is serving as parameter passing registers for calling this one function?
Yes. Of course, stateful protocols are nasty, so you have to be careful if you want to achieve reentrancy and not leak information. I should have expected your follow-up question. :)
> Currently, if modules D and E both import a function f from module F, D and E can then interact to the extent that f's behavior allows. They can both call f but do not have any direct shared mutable state.
>
> Assume the above is the type of f. With this lowering, both D and E would import this tuple, giving them shared access to this one table. After D uses this table to call f, can E read the table to obtain whatever D passed to f?
Well, not if you make it part of the parameter passing protocol that f clears the buffer before returning or calling out to a third party. That is necessary if you want to support or guard against reentrancy.
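A small Python sketch of the difference the clearing convention makes for D and E. The names `make_f` and `d_calls_then_e_peeks` are hypothetical, and a one-element list stands in for F's exported parameter table.

```python
def make_f(clear_after_read):
    """Build F's lowered export: the function f and its shared table."""
    buf = [None]                    # parameter table imported by both D and E

    def f():
        arg = buf[0]
        if clear_after_read:
            buf[0] = None           # clearing is part of the calling convention
        return arg is not None

    return f, buf

def d_calls_then_e_peeks(f, buf):
    buf[0] = "D's secret object"    # D passes an argument through the table
    f()
    return buf[0]                   # what E can read afterwards

leaky = d_calls_then_e_peeks(*make_f(clear_after_read=False))
safe = d_calls_then_e_peeks(*make_f(clear_after_read=True))
```

Without the clearing step E recovers D's argument; with it, E finds only a null entry.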
> >That generalises to the higher-order case straightforwardly. For the sake of example and differentiation, let’s assume the resourceful types of your three functions are:
> >
> > alice : (obj, i64) -> i64
> > bob : (obj -> i32) -> f32
> > carol : obj -> i32
>
> Good. This corresponds exactly to the high level types I have in mind for bob and carol. The type of alice is irrelevant, right? You just provided a type because alice must have some type?
Yes, that’s right.
> To restate our same confusion, with these signatures, do we agree that A (and therefore alice) should never get access to the obj that bob passes to carol?
Yes.
> >Note how bob takes a function of carol’s type as argument, which corresponds to the call you want to perform. The types of alice and carol are again just 1st-order resourceful function types that lower as before:
> >
> > alice : (i64 -> i64, table(obj))
> > carol : (() -> i32, table(obj))
> >
> >The type of bob is resourceful as well, because a function is itself a resource. But it’s 2nd-order. In a first step its parameter type lowers recursively to yield a pair of arguments:
> >
> > bob : (() -> i32, table(obj)) -> f32
> >
> >Both these arguments are still resource types, so the overall type further lowers to the tuple
> >
> > bob : (() -> f32, table(() -> i32), table(table(obj)))
>
> When alice imports bob, is alice obtaining a table of tables of obj? Or is alice passing such a table to bob on invoking him. I think the former.
A imports from B the function and the two buffer tables. When it invokes the function it first fills the tables. In the case of the second table, it fills it with a reference to a third table.
> For all the tables above, what arity do they have when? What do they actually contain when? Bradley and I were imagining that, for this scenario, they are at most singleton tables. Is this right?
Yes, the fact that a table can have more than one entry is not essential for this. You could use a separate singleton table for each parameter, since their number is statically known. But as the earlier example shows, combining multiple parameters of the same type into a single table is an obvious optimisation.
(Also, I could imagine examples where you need to pass lists of things.)
> Can you step us through the lowered implementation of the following dynamic sequence of actions, restating as a corresponding dynamic sequence of actions? Assume nothing is being passed other than what is stated.
>
> embedder: instantiate module B, providing obj1 as B's "obj" import. B exports bob as "bob".
It exports the function “bob” plus the associated parameter-passing tables, say as “bob-param1-buf” and “bob-param2-buf”.
> embedder: instantiate module C providing no imports. C exports carol as "carol".
It exports the function “carol” and the associated table, say “carol-param-buf”.
> embedder: instantiate module A providing bob as "bob" and carol as "carol". A exports alice.
A imports, and is provided with, all three entities that B exports. It exports the function “alice” and the associated table, say “alice-param-buf”.
> embedder: call alice with arguments (obj2, 77).
It sets alice-param-buf[0] to obj2 and calls alice(77).
Alice copies obj2 somewhere else and unsets alice-param-buf[0].
> alice: call bob with arguments (carol).
Alice sets bob-param1-buf[0] to the function carol, and bob-param2-buf[0] to (a reference to) table carol-param-buf, and then calls bob().
Bob copies both arguments somewhere else and clears the tables.
> bob: call carol with arguments (obj1).
Bob sets carol-param-buf[0] (as previously received and saved) to obj1 and calls carol().
Carol unsets the table entry.
(I’m not sure if this was meant to be the same obj1 that was used as an import descriptor above? If so, I’m not sure why the two should be related.)
> embedder: call alice again with (obj2, 77). Can alice obtain obj1?
Not if carol-param-buf[0] was properly unset (nulled out) by carol after receiving it, as shown in the previous step.
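The whole sequence can be simulated in a few lines of Python, with one-element lists standing in for the singleton tables. This is only a model of the protocol under the assumptions stated in the steps above; the helper `take` and the bookkeeping lists `carol_seen` and `alice_view` are invented for the illustration.

```python
def take(table, index=0):
    """Copy an entry out of a parameter table and null the slot."""
    value, table[index] = table[index], None
    return value

# Module C: exports carol and carol-param-buf.
carol_param_buf = [None]
carol_seen = []
def carol():
    carol_seen.append(take(carol_param_buf))
    return 1

# Module B: imports obj1, exports bob and its two parameter tables.
obj1 = object()
bob_param1_buf = [None]             # table(() -> i32)
bob_param2_buf = [None]             # table(table(obj))
def bob():
    callee = take(bob_param1_buf)
    callee_buf = take(bob_param2_buf)
    callee_buf[0] = obj1            # pass obj1 through carol's table
    callee()
    return 1.0

# Module A: imports what B and C export, exports alice and alice-param-buf.
alice_param_buf = [None]
alice_view = []                     # everything alice reads from carol's table
def alice(n):
    take(alice_param_buf)
    alice_view.append(carol_param_buf[0])   # peek before calling bob
    bob_param1_buf[0] = carol
    bob_param2_buf[0] = carol_param_buf
    bob()
    alice_view.append(carol_param_buf[0])   # peek after bob returns
    return n

# Embedder: call alice twice with (obj2, 77).
obj2 = object()
for _ in range(2):
    alice_param_buf[0] = obj2
    alice(77)
```

In this single-threaded model carol receives obj1 on both calls, while every read alice makes of carol's table yields a null entry.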
The setting and clearing of tables in this protocol is kind of brittle, as are all stateful approaches. If that is not reliable enough, there would be another option based on a fairly simple language extension: introduce read-only and write-only views of tables/memories as supertypes and allow a module to export as a supertype. Then all the table types in my examples would be write-only tables and only the originating module could read them. So D and E in your earlier example could not use F's table as a communication channel between them, since only F can read it. Similarly, alice could not read obj1 out of carol's table, even if it wasn't cleared by carol.
(This restriction is easy to achieve with types, but unfortunately it is less clear how to elegantly reflect such type restrictions in an untyped embedder API like the JS one.)
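A rough Python model of the write-only view idea. The classes `Table` and `WriteOnlyView` are hypothetical; in Wasm the restriction would come from the type system, whereas Python cannot truly hide the underlying table, so this only shows the intended interface.

```python
class Table:
    """Full read/write table, held only by the module that owns it."""

    def __init__(self, size):
        self._slots = [None] * size

    def get(self, i):
        return self._slots[i]

    def set(self, i, value):
        self._slots[i] = value

class WriteOnlyView:
    """View exposing `set` but not `get`, modelling export as a supertype."""

    def __init__(self, table):
        self._set = table.set

    def set(self, i, value):
        self._set(i, value)

# Module C keeps the full table and exports only the write-only view.
carol_table = Table(1)
carol_param_buf = WriteOnlyView(carol_table)

carol_param_buf.set(0, "obj1")                       # any importer may write
can_importer_read = hasattr(carol_param_buf, "get")  # no read operation offered
owner_reads = carol_table.get(0)                     # only C can read
```

With this shape, an uncleared entry is no longer readable by other importers, although they can still write to it.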
/Andreas
> On Nov 21, 2017, at 21:19 , Mark Miller <eri...@gmail.com> wrote:
> I agree this guards against reentrancy. What about concurrency? IIUC, right now wasm threads only interact on mems as shared array buffers, where each wasm thread is in its own worker. Thus, right now, each module instance is in the one thread it was born into. Right now, there is no mechanism to share tables across threads, and thus each table is in the thread it was born into. Given these constraints, we don't need to worry about concurrent access to tables.
>
> But I don't understand our plans for concurrency in wasm. In some of these plans, is there a danger that, while bob is transferring obj1 to carol, alice might have spawned a thread that is able to read obj1 out of the table before carol clears that table entry? If so, and if there's no reasonable way to fix it, then we can (further) disqualify option #1.
>
> Given the transient manner in which these table entries are used, they seem most analogous to cpu registers used for parameter passing. All the copying and clearing is analogous to the reuse of registers across function calls. In other ways, of course, they are not analogous: they are statically allocated per exported function rather than per core, and there's no mechanism to give each thread its own registers for concurrent calls to the same function. Hence this new worry.
I think the way concurrency currently works (at least in the web embedding), you need to instantiate the modules separately in each thread (i.e., worker), so you already create separate per-thread copies of all the tables. But I’m not 100% sure about the respective future plans either.
> Just being fully careful here to ensure I fully understand. A also imports both objects that C exports, the carol function and carol-param-buf, right?
Ah, right.
> > Setting and clearing tables in this protocol is kind of brittle, as are all stateful approaches. If that is not reliable enough, there would be another option based on a fairly simple language extension: introduce read-only and write-only views of tables/memories as supertypes and allow a module to export as a supertype. Then all the table types in my examples would be write-only tables and only the originating module could read them. So D and E in your earlier example could not use F's table as a communication channel between them, since only F can read it. Similarly, alice could not read obj1 out of carol's table, even if it wasn't cleared by carol.
>
> I see how this should work, and how it even prevents the specific concurrency danger above. But it has a dual concurrency danger: if alice can write into the buffers while bob's call to carol is in progress, she might be able to cause carol to receive obj2 rather than obj1 during this call from bob.
Yes, but I think any approach based on tables, and thus state, would have a potential problem with concurrency if you didn’t ensure that they are per-thread. I believe that is the case even for your original proposal.
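The hazard and the per-thread remedy can be sketched as follows. The first part writes out one bad interleaving by hand on a single shared buffer; the second part uses `threading.local` as a stand-in for "each thread has its own copy of the table". All names here are hypothetical, and real Wasm threads would not look like this.

```python
import threading

# Part 1: one table shared by all threads, with a fixed bad interleaving.
shared_buf = [None]
shared_buf[0] = "obj1"            # bob stores the argument for carol
shared_buf[0] = "obj2"            # alice overwrites it before carol reads
carol_got_shared = shared_buf[0]  # carol receives alice's object
shared_buf[0] = None              # carol clears the entry

# Part 2: the same interleaving, but with per-thread tables.
local = threading.local()
bob_wrote = threading.Event()
alice_wrote = threading.Event()
result = {}


def bob_thread():
    local.buf = ["obj1"]             # bob writes into this thread's copy
    bob_wrote.set()
    alice_wrote.wait()               # alice's write happens in between
    result["carol"] = local.buf[0]   # carol reads within bob's thread


def alice_thread():
    bob_wrote.wait()
    local.buf = ["obj2"]             # only touches alice's own copy
    alice_wrote.set()


threads = [threading.Thread(target=bob_thread),
           threading.Thread(target=alice_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()

carol_got_per_thread = result["carol"]
```

With the shared buffer carol ends up with obj2; with per-thread copies alice's write never reaches the table that bob and carol are using.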
/Andreas
On 11/22/2017 06:00 AM, Mark Miller wrote:
In my original proposal, there is the mysterious allocation of a new index into a table. As long as this is atomic, i.e. that two concurrent allocations of a next index into the same table each get a unique index, then I don't see how my proposal suffers from concurrency hazards. Am I missing something? I ask not because I'm going back to advocating my original proposal, but because I want to understand the nature of the hazards that any proposal might have to think about.
Couldn't another thread come along and mutate the _source_ table before the callee is done copying from it?
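A small sketch of the distinction, assuming the callee's table is a list guarded by a lock: atomic allocation keeps destination indices unique, but it does nothing to protect the slot in the caller's table that the argument is copied from. `CList` and `alloc` are hypothetical names, not taken from the original proposal.

```python
import threading


class CList:
    """Hypothetical capability list with atomic index allocation."""

    def __init__(self):
        self._lock = threading.Lock()
        self.slots = []

    def alloc(self, value):
        # Append and index computation happen under one lock,
        # so two concurrent allocations never get the same index.
        with self._lock:
            self.slots.append(value)
            return len(self.slots) - 1


callee = CList()
indices = []
indices_lock = threading.Lock()


def worker(n):
    for i in range(100):
        idx = callee.alloc((n, i))
        with indices_lock:
            indices.append(idx)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

unique_indices = len(set(indices)) == len(indices) == 400

# The source-side hazard, written out as a fixed interleaving:
caller = ["obj1"]                          # argument sits in caller slot 0
caller[0] = "obj2"                         # another caller thread mutates it
received_index = callee.alloc(caller[0])   # the copy happens afterwards
received = callee.slots[received_index]
```

The destination side is safe, yet the callee still receives obj2 because the source slot changed before the copy.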
/Andreas
[...] between B and C that A should not have been able to influence. This is the kind of influence-over-another that I don't think my original proposal is vulnerable to.
Cheers,
--MarkM
On 11/22/2017 05:41 PM, Mark Miller wrote:
On Wed, Nov 22, 2017 at 7:13 AM, Andreas Rossberg <ross...@mpi-sws.org> wrote:
On 11/22/2017 06:00 AM, Mark Miller wrote:
In my original proposal, there is the mysterious allocation of a
new index into a table. As long as this is atomic, i.e. that two
concurrent allocations of a next index into the same table each
get a unique index, then I don't see how my proposal suffers
from concurrency hazards. Am I missing something? I ask not
because I'm going back to advocating my original proposal, but
because I want to understand the nature of the hazards that any
proposal might have to think about.
Couldn't another thread come along and mutate the _source_ table
before the callee is done copying from it?
[...]
But this all may be beside the point you're actually raising. Is your concern that another thread in the caller, while the call is in progress, might overwrite that table entry in the caller's clist before the call mechanism itself copies it?
Yes, sorry for wording it sloppily.
This is only a problem of the caller messing themselves up, which is a hazard that can lead to vulnerabilities but is not itself a vulnerability.
Are you assuming that all callers follow the same strict conventions regarding their tables? In a hypothetical multi-threaded setting, other threads may have valid reasons to mutate a table. Clearly, that would demand synchronisation, but it's not clear how source table mutations can be synchronised with calls under your proposal. The caller could take a lock on the table, but it has no way of releasing it in a timely manner. AFAICS synchronisation would only work if it was part of the built-in parameter-passing mechanism, i.e., it would need to provide hooks for locking somehow -- which sounds scary.
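The timing problem can be sketched with an event log, assuming the callee copies its argument first and then does arbitrary further work. Both functions and the "hook" are hypothetical; nothing like this exists in Wasm today.

```python
import threading

log = []
table = ["obj1"]
table_lock = threading.Lock()


def callee(source, index):
    arg = source[index]
    log.append("callee copied")
    log.append("callee long-running work")
    return arg


def call_with_caller_lock():
    # The caller can only release after the whole call has returned.
    with table_lock:
        log.append("lock acquired")
        callee(table, 0)
    log.append("lock released")


def call_with_builtin_hook(source, index, lock):
    # Hypothetical: the call mechanism itself releases the lock
    # right after the argument has been copied.
    lock.acquire()
    log.append("lock acquired")
    arg = source[index]
    log.append("callee copied")
    lock.release()
    log.append("lock released")
    log.append("callee long-running work")
    return arg


call_with_caller_lock()
caller_side = list(log)

log.clear()
call_with_builtin_hook(table, 0, table_lock)
hook_side = list(log)
```

In `caller_side` the release comes after the callee's long-running work, so other threads are locked out of the table for the whole call; in `hook_side` it comes right after the copy, but only because the locking has moved into the call mechanism.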
/Andreas