Draft smart contract host functions CAP (CAP-0051)

Jay Geng

May 20, 2022, 5:57:45 PM
to Stellar Developers
Hello everyone:

I just published a draft protocol for smart contract host functions:

The host functions build on the core concepts introduced in CAP-0046, and this CAP expands the host object repertoire introduced there, so you may find yourself referring back to it often.

Feedback is welcome, especially around the choice of the host function set. As mentioned in the CAP, it is not meant to be an official final set, but rather a starting point for discussion around the issue.

If anything is unclear, please don't hesitate to ask for clarification. 

Cheers!
Jay

Leigh McCulloch

May 23, 2022, 5:59:41 PM
to Jay Geng, Stellar Developers
Hi Jay,

The surface area of these host fns looks good for a first pass, assuming that we'll be adding host fns that interact with the ledger in a separate CAP.

SCO_HASH

What's the use case for making hash a first-class type? I think we should omit this type and simply use SCO_BINARY. I think if there is any specialization of binary types we should probably maintain that in the SDK and not in the host type system.

an arbitrary precision big integer number

Will BigInts be truly arbitrary precision, or just some very high precision? Is CAP-51 the place we will state the upper bound on the precision, or will that come later?

SCO_PUBLIC_KEY

What's the use case for making public key a first-class type? Will account IDs appear as public keys or is that likely to be another type in the future?

If we make SCO_PUBLIC_KEY a first class type we probably also need functions for creating a public key from a binary type. For example, if my contract receives an arbitrary binary blob and I decode it and extract a public key, I should be able to use that public key everywhere public keys are used in the host fns. Likewise there should probably be a way to go from a public key to a binary.

Alternatively, we skip the type for public keys and work with them as SCO_BINARY, as long as we very specifically define what format they're encoded in. If we're supporting only ed25519 keys to begin with, that should be trivial since the public key is usually expressed consistently as 32 bytes. If we support other key types like ecdsa p256/secp256k1, we'll need to discuss formats and can pick one that is reasonably common, like the compressed format. However, I suspect the public key type is useful.

+struct SCBigInt
+{
+    bool positive;
+    opaque magnitude<>;

SCBigInt lends itself to having multiple values for zero. In languages that use default values heavily, the default value for zero will be negative. It would also be quite reasonable for someone to think they could set positive to true for zero. I think we should address this in the same way that the Go stdlib does, by changing 'positive' to 'negative' so that if the sign field is not set the value is default positive. (Other ecosystems like Java have preferred an enum that has a specific sign value for zero, but I don't think that adds anything over the better-behaved default.)

func $binary_

The binary functions look like they cover a similar area as the vec functions, although they're a subset which is inconvenient. Could we expand the function set so that they match the vec type, and are just a specialized version of a vec?

Or maybe we should add U8 as a SCVal so that we can represent this inside the existing vec host fns?

As I understand it, there are two advantages of making SCO_BINARY its own type: 1) the XDR is more compact since each byte won't be aligned to 4 bytes, and 2) it might make it easier to add some binary-specific functions that optimize for passing u8s. For example, it looks like the binary functions pass a single byte at a time; however, we could pass 8 bytes at a time inside a 64-bit integer, replacing 8 host fn calls with 1 at the expense of some shifts and ors.
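
For illustration, the guest-side packing could look like this in Rust (a sketch; binary_push8 is a hypothetical host fn name, not something the CAP defines):

// Pack 8 bytes into one u64 so that a single hypothetical host call like
// binary_push8(obj, packed) could stand in for 8 single-byte calls.
// The loop below is the "shifts and ors" cost mentioned above.
fn pack8(bytes: [u8; 8]) -> u64 {
    let mut packed: u64 = 0;
    for i in 0..8 {
        packed |= (bytes[i] as u64) << (8 * i); // byte i -> bits 8i..8i+7
    }
    packed // equivalent to u64::from_le_bytes(bytes)
}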

We could keep the SCO_BINARY type, but reuse the existing vec functions, then in the future add binary specific functions that can pass multiple values at once.

func $vec_

Could you add a function (or functions) for doing slicing?

func $map_

I think we will need a way to iterate a map, either by getting the keys first, which seems inefficient, or by getting an iterator value, i.e. SCO_ITERATOR. I'm not sure the best way to contain the key and value; there are probably a few ways: a vec using the vec functions, or we could have a map-specific iterator with next_key and next_value fns that are only valid to call in succession.

Type case SCO_BIGINT: construct a host object by first invoking the from_bytes_be

The Conversion section discusses how to build SCO_BIGINT by referencing a Rust crate; we should probably specify this independently of any implementation, though, and include test cases to validate in SDKs that implement it. All the existing SDKs are likely going to need to be expanded with this logic.

Error handling — favor traps over errors

I think this is the right call for type errors, since the SDK should provide type safety to the developer and therefore it is an SDK or host bug if a panic occurs because of a type error.

For other types of errors I think this moves the problems described, and doesn't eliminate them.

Developers writing contracts in languages such as Rust or Go, which have a heavy bias towards developers handling errors themselves, will now be writing panic-heavy code that could panic at any moment. It will force developers to become intimately aware of when host fns could panic, and the developer toolchain will not help them understand where panics can occur or help them code around those edge cases.

I expect we will make the SDK do checks before calling a hostfn and return an error, but the result of this will be the same number of branches but extra host fn calls. For example, when getting a value from a map the SDK will call has() before calling get(), so that the SDK can return an Option<T>.
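
Sketched in guest Rust (the host fn names here are placeholders, not the CAP's):

// Hypothetical host imports, for illustration only.
extern "C" {
    fn map_has(map: u64, key: u64) -> u64; // returns a boolean host value
    fn map_get(map: u64, key: u64) -> u64; // traps if the key is absent
}

// Same branch count as an error return, but two host calls instead of one.
fn get_opt(map: u64, key: u64) -> Option<u64> {
    unsafe {
        if map_has(map, key) != 0 {
            Some(map_get(map, key))
        } else {
            None
        }
    }
}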

commit one bit in the host value to denote a None value (analogous to Rust’s std::option)

I think this idea would be a very reasonable approach to take. We could commit the low bit in RawVal to signal that a value is present. The only types it would have an impact on are BitSet, which would be reduced from 60 bits to 59, and the positive-i64/u63 type, which would be reduced to u62. u62 would still be a substantial optimization and so I don't think we'd lose out here much. There could be an impact to Symbols, I'm not sure. The SDK would convert the RawVal that has value None (low bit = 0), or Some(T) (low bit = 1), to Option<T>.
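
Assuming RawVal is a 64-bit word and the low bit is the presence flag as described, the SDK-side conversion would be roughly:

// Sketch only: low bit = 1 means Some, low bit = 0 means None.
fn raw_to_option(raw: u64) -> Option<u64> {
    if raw & 1 == 1 {
        Some(raw >> 1) // the remaining 63 bits carry the tagged payload
    } else {
        None
    }
}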

Cheers,
Leigh

Graydon Hoare

Jun 2, 2022, 1:59:43 PM
to Stellar Developers
I can speak to a few of these questions.

On Monday, May 23, 2022 at 2:59:41 PM UTC-7 Leigh McCulloch wrote:
Hi Jay,

The surface area of these host fns looks good for a first pass, assuming that we'll be adding host fns that interact with the ledger in a separate CAP.

Yes, the contract-data-access functions are in CAP-53.
 

SCO_HASH

What's the use case for making hash a first-class type? I think we should omit this type and simply use SCO_BINARY. I think if there is any specialization of binary types we should probably maintain that in the SDK and not in the host type system.

HASH is a fixed-size 32-byte binary which is fairly ubiquitous in the network / ecosystem. A general SCO_BINARY doesn't have a fixed size, so every codepath has a panic-due-to-wrong-size if we go that route. It's not impossible but I think it might make sense to keep HASH special.

Also hashes are fairly strictly the output from hash functions and inputs to hash-checking functions, and are typically opaque / indivisible, and have security meaning. So I don't know that there's a lot of value in making it overlap with general binaries, and I can imagine errors being caught by differentiating the two.

More generally, every type _could_ be a "general binary" -- integers, bignums, even maps and vectors -- since every value is "just bytes" at a low level; but if the operations that produce-and-consume them are all specialized, and users have strong expectations about the bytes that comprise them and what those bytes mean, it seems to me to make sense to specialize the object type code it's marked with. It's not like object type codes are scarce.
 
an arbitrary precision big integer number

Will BigInts be truly arbitrary precision, or just some very high precision? Is CAP-51 the place we will state the upper bound on the precision, or will that come later?

Every type will have some size limit implied by the resource-metering aspects of this (and other) CAPs. Besides those costs I do not think there's any other size limit imposed by the software.
 
SCO_PUBLIC_KEY

What's the use case for making public key a first-class type? Will account IDs appear as public keys or is that likely to be another type in the future?

I think this is similar to the hash question: if the functions that produce and consume PKs have very specific expectations of the bytes, and the bytes are otherwise opaque (nobody's reading or writing single bytes in the middle of a pubkey), it seems to me to make sense to differentiate it from SCO_BINARY. But I can also understand the counterargument that "if the host representation is just an un-transformed byte buffer, maybe just use BINARY".
 
SCBigInt lends itself to having multiple values for zero. In languages that use default values heavily, the default value for zero will be negative. It would also be quite reasonable for someone to think they could set positive to true for zero. I think we should address this in the same way that the Go stdlib does, by changing 'positive' to 'negative' so that if the sign field is not set the value is default positive. (Other ecosystems like Java have preferred an enum that has a specific sign value for zero, but I don't think that adds anything over the better-behaved default.)

Fair point, this should probably change. We can also make it a little less error prone by having a 3-state enum ZERO, POSITIVE, and NEGATIVE with a magnitude body only in the POSITIVE and NEGATIVE states, and define-as-invalid a magnitude of all-zero bytes.
 
func $binary_

The binary functions look like they cover a similar area as the vec functions, although they're a subset which is inconvenient. Could we expand the function set so that they match the vec type, and are just a specialized version of a vec?

They have different representations and different operation-complexity classes.

We also don't currently have any concept of a specialized Vec with a homogeneous element type -- Vec supports heterogeneous elements. I think it's a lot less work to just replicate a few host functions than try to develop machinery for container-type specialization (multiple representations, conversion-or-failure on interactions that violate specialization, etc.)

Or maybe we should add U8 as a SCVal so that we can represent this inside the existing vec host fns?

I think we could allow a put/get of a single byte (or e.g. a single u32) to/from a binary. But it'd also make sense to just provide a memcpy interface between BINARY and slices of guest linear memory: pass a pair of u32 integers representing (pos, len), and the host copies memory to or from that slice. A stack-allocated [u8;1024] in a rust function will live in linear memory and not require shipping a malloc or anything.

We don't want to _encourage_ use of linear memory / byte-by-byte operations in guest code -- as it's likely to come with lots of guest code and undesirably-slow guest loops -- but linear memory is _there_ and I think if the user really wants to think in terms of byte arrays and byte operations, we can't stop them, and shouldn't make the interface to accomplishing that more obtuse than necessary.
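
From the guest side a sketch might look like this (the host fn name and exact signature are hypothetical):

// Hypothetical host import: copy `len` bytes of a BINARY object into guest
// linear memory starting at `pos`.
extern "C" {
    fn binary_copy_to_guest_mem(obj: u64, pos: u32, len: u32);
}

fn read_first_kilobyte(obj: u64) -> [u8; 1024] {
    // The stack-allocated buffer lives in wasm linear memory; no malloc.
    let mut buf = [0u8; 1024];
    unsafe {
        binary_copy_to_guest_mem(obj, buf.as_mut_ptr() as usize as u32, buf.len() as u32);
    }
    buf
}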

I think we will need a way to iterate a map, either by getting the keys first, which seems inefficient, or by getting an iterator value, i.e. SCO_ITERATOR. I'm not sure the best way to contain the key and value; there are probably a few ways: a vec using the vec functions, or we could have a map-specific iterator with next_key and next_value fns that are only valid to call in succession.

I think the best way to deal with this is a lower-bound / upper-bound pair of functions you can pass a key into and get a key back from. An iterator doesn't have a meaningful XDR representation -- it'd break the bijection we're maintaining between host values and XDR values.
 
The Conversion section discusses how to build SCO_BIGINT by referencing a Rust crate, we should probably specify this independent of any implementation though, and include test cases to validate in SDKs that implement it. All the existing SDKs are likely going to need to be expanded with this logic. 

Agreed.
 
Error handling — favor traps over errors

I think this is the right call for type errors, since the SDK should provide type safety to the developer and therefore it is an SDK or host bug if a panic occurs because of a type error.

For other types of errors I think this moves the problems described, and doesn't eliminate them.

Developers writing contracts in languages such as Rust or Go, which have a heavy bias towards developers handling errors themselves, will now be writing panic-heavy code that could panic at any moment. It will force developers to become intimately aware of when host fns could panic, and the developer toolchain will not help them understand where panics can occur or help them code around those edge cases.

Counterpoint: a lot of Rust code is full of unwrap() anyways -- the user only wants to handle the happy path and is fine with a crash -- and a lot of Go code fails to propagate or handle errors correctly, which is a major source of bugs. 
  
I expect we will make the SDK do checks before calling a hostfn and return an error, but the result of this will be the same number of branches but extra host fn calls. For example, when getting a value from a map the SDK will call has() before calling get(), so that the SDK can return an Option<T>.

For such cases I would like the SDK to provide a try_foo() that returns Result/Option and a foo() that just calls the happy path and lets it trap. My bet is that users will rarely choose the try_foo. Smart contracts are transactional and trap-and-rollback is safe and easy to rely on. What will the user likely do in the bad path? Very often: trap anyways.
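
Roughly, with placeholder names (a sketch of the SDK surface, not the CAP's actual fns):

extern "C" {
    fn map_has(map: u64, key: u64) -> u64;
    fn map_get(map: u64, key: u64) -> u64; // traps on a missing key
}

// Checked flavor: pays an extra host call to surface an Option.
fn try_get(map: u64, key: u64) -> Option<u64> {
    unsafe { (map_has(map, key) != 0).then(|| map_get(map, key)) }
}

// Happy-path flavor: no SDK-side branch; a missing key traps and the
// whole transaction rolls back.
fn get(map: u64, key: u64) -> u64 {
    unsafe { map_get(map, key) }
}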
 
commit one bit in the host value to denote a None value (analogous to Rust’s std::option)

I think this idea would be a very reasonable approach to take. We could commit the low bit in RawVal to signal that a value is present. The only types it would have an impact on are BitSet, which would be reduced from 60 bits to 59, and the positive-i64/u63 type, which would be reduced to u62. u62 would still be a substantial optimization and so I don't think we'd lose out here much. There could be an impact to Symbols, I'm not sure. The SDK would convert the RawVal that has value None (low bit = 0), or Some(T) (low bit = 1), to Option<T>.

This isn't necessary. We already have the status type carved out of ScVal / HostVal. No representation choice here avoids the problem of collapsing together success-that-returns-an-error and failure-that-generates-an-error, and you already know the expected type of the Ok-case to provide for T in the static type's Option<T>.

-Graydon
 

Leigh McCulloch

Jun 2, 2022, 3:14:10 PM
to Graydon Hoare, Stellar Developers
Hi Graydon,
 
SCO_HASH

What's the use case for making hash a first-class type? I think we should omit this type and simply use SCO_BINARY. I think if there is any specialization of binary types we should probably maintain that in the SDK and not in the host type system.

HASH is a fixed-size 32-byte binary which is fairly ubiquitous in the network / ecosystem. A general SCO_BINARY doesn't have a fixed size, so every codepath has a panic-due-to-wrong-size if we go that route. It's not impossible but I think it might make sense to keep HASH special.

Also hashes are fairly strictly the output from hash functions and inputs to hash-checking functions, and are typically opaque / indivisible, and have security meaning. So I don't know that there's a lot of value in making it overlap with general binaries, and I can imagine errors being caught by differentiating the two.

More generally, every type _could_ be a "general binary" -- integers, bignums, even maps and vectors -- since every value is "just bytes" at a low level; but if the operations that produce-and-consume them are all specialized, and users have strong expectations about the bytes that comprise them and what those bytes mean, it seems to me to make sense to specialize the object type code it's marked with. It's not like object type codes are scarce. 
 
SCO_PUBLIC_KEY

What's the use case for making public key a first-class type? Will account IDs appear as public keys or is that likely to be another type in the future?

I think this is similar to the hash question: if the functions that produce and consume PKs have very specific expectations of the bytes, and the bytes are otherwise opaque (nobody's reading or writing single bytes in the middle of a pubkey), it seems to me to make sense to differentiate it from SCO_BINARY. But I can also understand the counterargument that "if the host representation is just an un-transformed byte buffer, maybe just use BINARY".

Got it, this sounds fine then. Could we add host fns for converting from SCO_BINARY to SCO_HASH/PUBLIC_KEY and back? And could we add host fns for SCO_ACCOUNT_ID to SCO_PUBLIC_KEY and back? I think we need to be able to interoperate with these types via binary for people to flexibly use their APIs.

an arbitrary precision big integer number

Will BigInts be truly arbitrary precision, or just some very high precision? Is CAP-51 the place we will state the upper bound on the precision, or will that come later?

Every type will have some size limit implied by the resource-metering aspects of this (and other) CAPs. Besides those costs I do not think there's any other size limit imposed by the software.

I get that resource-metering will impose limits, but the VM is not the only stakeholder that will parse txns, and we need to communicate to other stakeholders what a reasonable tx is. We typically do that with the XDR definition. Similar to how we imposed an upper limit on WASM code size in the XDR, I think we should specify some upper limit on all the unbounded variable-length arrays, especially for any type that can be used inside a transaction.  
 
SCBigInt lends itself to having multiple values for zero. In languages that use default values heavily, the default value for zero will be negative. It would also be quite reasonable for someone to think they could set positive to true for zero. I think we should address this in the same way that the Go stdlib does, by changing 'positive' to 'negative' so that if the sign field is not set the value is default positive. (Other ecosystems like Java have preferred an enum that has a specific sign value for zero, but I don't think that adds anything over the better-behaved default.)

Fair point, this should probably change. We can also make it a little less error prone by having a 3-state enum ZERO, POSITIVE, and NEGATIVE with a magnitude body only in the POSITIVE and NEGATIVE states, and define-as-invalid a magnitude of all-zero bytes.

The enum idea sounds good to me.

I think we will need a way to iterate a map, either by getting the keys first, which seems inefficient, or by getting an iterator value, i.e. SCO_ITERATOR. I'm not sure the best way to contain the key and value; there are probably a few ways: a vec using the vec functions, or we could have a map-specific iterator with next_key and next_value fns that are only valid to call in succession.

I think the best way to deal with this is a lower-bound / upper-bound pair of functions you can pass a key into and get a key back from. An iterator doesn't have a meaningful XDR representation -- it'd break the bijection we're maintaining between host values and XDR values.

Is the idea that you pass in a key, and you get the next key in the map? That sounds fine, and that sounds like an iterator to me and would do the job.

Thanks,
Leigh

Graydon Hoare

Jun 2, 2022, 7:18:25 PM
to Stellar Developers
On Thursday, June 2, 2022 at 12:14:10 PM UTC-7 Leigh McCulloch wrote:

Got it, this sounds fine then. Could we add host fns for converting from SCO_BINARY to SCO_HASH/PUBLIC_KEY and back? And could we add host fns for SCO_ACCOUNT_ID to SCO_PUBLIC_KEY and back? I think we need to be able to interoperate with these types via binary for people to flexibly use their APIs.

For sure.
 
I get that resource-metering will impose limits, but the VM is not the only stakeholder that will parse txns, and we need to communicate to other stakeholders what a reasonable tx is. We typically do that with the XDR definition. Similar to how we imposed an upper limit on WASM code size in the XDR, I think we should specify some upper limit on all the unbounded variable-length arrays, especially for any type that can be used inside a transaction. 

Well, I think we actually just _removed_ the WASM code size from the XDR, moving it to a CONFIG ledger entry so the network can negotiate it up or down without a protocol change.

(And this is doubly true with the move to storing code under a plain BINARY in a CONTRACT_DATA ledger entry -- the size of a binary is presumably not related to the size of a wasm)

I get your point that 3rd parties need to be able to evaluate limits, but at present I think the current plan is to tell people to consult the current CONFIG ledger entries, whose values are all TBD.

Is the idea that you pass in a key, and you get the next key in the map? That sounds fine, and that sounds like an iterator to me and would do the job.

Next or previous, yeah, one usually provides a function to go each way. And probably another that returns true or false if you're at the min or max key in a map (otherwise you don't know when to stop -- this is another of these "you could use a sentinel/status value but then you wouldn't be able to use the sentinel as an actual key" situations).
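
In guest terms the loop could look like this (all fn names hypothetical; the real set is TBD):

// Hypothetical ordered-map host fns, for illustration only.
extern "C" {
    fn map_min_key(map: u64) -> u64;              // assumes a non-empty map
    fn map_is_max_key(map: u64, key: u64) -> u64; // 1 if no key follows
    fn map_next_key(map: u64, key: u64) -> u64;   // strictly greater key
    fn map_get(map: u64, key: u64) -> u64;
}

// Walk the map in key order without ever needing a sentinel key.
fn for_each(map: u64, mut f: impl FnMut(u64, u64)) {
    unsafe {
        let mut key = map_min_key(map);
        loop {
            f(key, map_get(map, key));
            if map_is_max_key(map, key) != 0 {
                break;
            }
            key = map_next_key(map, key);
        }
    }
}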

Returning to that earlier question of explicit error-status code / sentinel values (I know I keep saying no to them ubiquitously, but I do recognize they're useful sometimes), I've been thinking about the ambiguity issue and thinking there might be a way around it for functions we want to use status codes with.

If we carved off a single bit of the existing status type -- call it the "hot" bit -- and required that any status SCVal converted to a host value, as well as any status host value passed to a host function, has to have the hot bit cleared, then we could say that the hot bit is reserved for a status that _is_ representative of a currently-returning host-function failure. If anyone wants to reify a status into a longer-lived value (eg. put it in a container, pass it into a host function for use elsewhere) they have to clear the hot bit, marking it as a "cold" status.

Still lets users store statuses as first class values if they want, but requires their use is unambiguous (or less ambiguous). If you call a function and you get back a hot status, it means the function failed. If you get a cold status, it means the function succeeded and is returning a status someone gave it from somewhere else, somehow (eg. put in a data structure, returned from an inner call, etc.)

WDYT? It is still possible to have _some_ ambiguity (cold status relayed from 1 callee deep vs. N callees deep, say) but not in the case where it's important, trying to interpret the return from a call that actually failed.
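
In code the convention would be something like this (which bit is the hot bit, and the is_status tag check, are assumptions for the sketch):

const HOT_BIT: u64 = 1;

fn interpret_return(val: u64, is_status: impl Fn(u64) -> bool) -> Result<u64, u64> {
    if is_status(val) && (val & HOT_BIT) != 0 {
        Err(val) // hot status: this very call failed
    } else {
        Ok(val) // ordinary value, or a cold status relayed as a plain value
    }
}

// Before storing or forwarding a status, the guest must "cool" it.
fn cool(status: u64) -> u64 {
    status & !HOT_BIT
}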

-Graydon 


Leigh McCulloch

Jun 2, 2022, 7:34:44 PM
to Graydon Hoare, Stellar Developers
Hi Graydon,

I get that resource-metering will impose limits, but the VM is not the only stakeholder that will parse txns, and we need to communicate to other stakeholders what a reasonable tx is. We typically do that with the XDR definition. Similar to how we imposed an upper limit on WASM code size in the XDR, I think we should specify some upper limit on all the unbounded variable-length arrays, especially for any type that can be used inside a transaction. 

Well, I think we actually just _removed_ the WASM code size from the XDR, moving it to a CONFIG ledger entry so the network can negotiate it up or down without a protocol change.

(And this is doubly true with the move to storing code under a plain BINARY in a CONTRACT_DATA ledger entry -- the size of a binary is presumably not related to the size of a wasm)

I get your point that 3rd parties need to be able to evaluate limits, but at present I think the current plan is to tell people to consult the current CONFIG ledger entries, whose values are all TBD.

I thought we agreed otherwise in last week's protocol meeting, but I might be mistaken. I don't think it is practical for applications ingesting XDR to consult the value of CONFIG ledger entries that were set at each ledger. I understand there will be config ledger entries that define the currently configured boundaries, but we should define some upper bound that is more reasonable than XDR's default 4GB maximum. SCO_BINARY should have an upper bound too, otherwise there is no way for an ingester or parser of network transactions to be able to anticipate what valid data should be according to the XDR definition. 
 
Returning to that earlier question of explicit error-status code / sentinel values (I know I keep saying no to them ubiquitously, but I do recognize they're useful sometimes), I've been thinking about the ambiguity issue and thinking there might be a way around it for functions we want to use status codes with.

If we carved off a single bit of the existing status type -- call it the "hot" bit -- and required that any status SCVal converted to a host value, as well as any status host value passed to a host function, has to have the hot bit cleared, then we could say that the hot bit is reserved for a status that _is_ representative of a currently-returning host-function failure. If anyone wants to reify a status into a longer-lived value (eg. put it in a container, pass it into a host function for use elsewhere) they have to clear the hot bit, marking it as a "cold" status.

Still lets users store statuses as first class values if they want, but requires their use is unambiguous (or less ambiguous). If you call a function and you get back a hot status, it means the function failed. If you get a cold status, it means the function succeeded and is returning a status someone gave it from somewhere else, somehow (eg. put in a data structure, returned from an inner call, etc.)

WDYT? It is still possible to have _some_ ambiguity (cold status relayed from 1 callee deep vs. N callees deep, say) but not in the case where it's important, trying to interpret the return from a call that actually failed.

I also think it is fine for us to say that status values are never allowed to be stored, such that the type is only used for this use case. I don't imagine a need to store status values anywhere, and if someone tries to, we should just trap. It would be like trying to store a Result<> or an Error.

Using a bit sounds fine too, but it is essentially doing the same thing with some additional complexity? Wdyt?

Leigh

Graydon Hoare

Jun 2, 2022, 8:17:04 PM
to Leigh McCulloch, Stellar Developers
On Thu, Jun 2, 2022 at 4:34 PM Leigh McCulloch <le...@stellar.org> wrote:

> I thought we agreed otherwise in last week's protocol meeting, but I might be mistaken. I don't think it is practical for applications ingesting XDR to consult the value of CONFIG ledger entries that were set at each ledger. I understand there will be config ledger entries that define the currently configured boundaries, but we should define some upper bound that is more reasonable than XDR's default 4GB maximum. SCO_BINARY should have an upper bound too, otherwise there is no way for an ingester or parser of network transactions to be able to anticipate what valid data should be according to the XDR definition.

Oh, possibly! I don't recall, I just know (a) there are CONFIG LEs
planned and (b) there aren't currently XDR-imposed limits on BINARY. I
don't mind there being a smaller one. As a wise person once said, 64kb
ought to be enough for anyone!

> I also think it is fine for us to say that status values are never allowed to be stored, such that the type is only used for this use case. I don't imagine a need to store status values anywhere, and if someone tries to, we should just trap. It would be like trying to store a Result<> or an Error.
>
> Using a bit sounds fine too, but it is essentially doing the same thing with some additional complexity? Wdyt?

Sure, making them non-storable (and non-passable-to-functions) is also
fine by me, and is fewer branches / less complexity. The code and
category parts of a status value can be extracted as u32s manually if
anyone wants to store/transmit them indirectly somehow.

-Graydon

Siddharth Suresh

Jun 2, 2022, 8:58:44 PM
to Graydon Hoare, Leigh McCulloch, Stellar Developers
> I thought we agreed otherwise in last week's protocol meeting, but I might be mistaken. I don't think it is practical for applications ingesting XDR to consult the value of CONFIG ledger entries that were set at each ledger. I understand there will be config ledger entries that define the currently configured boundaries, but we should define some upper bound that is more reasonable than XDR's default 4GB maximum. SCO_BINARY should have an upper bound too, otherwise there is no way for an ingester or parser of network transactions to be able to anticipate what valid data should be according to the XDR definition.

Oh, possibly! I don't recall, I just know (a) there are CONFIG LEs
planned and (b) there aren't currently XDR-imposed limits on BINARY. I
don't mind there being a smaller one. As a wise person once said, 64kb
ought to be enough for anyone!

To clear some of this up: now that contract code is stored in CONTRACT_DATA, the upper bound on code size depends on a limit set by `CreateContractTransaction` and the host functions to create and update contracts. `CreateContractTransaction` in CAP-0047 uses an upper bound of 256000 (still up for debate), but the mechanism described here can be used for a configurable maximum - https://github.com/stellar/stellar-protocol/blob/master/core/cap-0047.md#wasm-contract-code-size-is-configurable. For example, if the limit is set to 16000 in a CONFIG LE, then the transaction fails if the specified contract is larger than that.

The CAP doesn't specify any limits for the host functions, so you could write a 4GB contract because SCO_BINARY does not specify a limit. I agree with Leigh that we should have a reasonable XDR limit for SCO_BINARY. InvokeContractTransaction also specifies a list of them (the list is also currently unbounded, which I think should change as well). 

Jonathan Jove

Jun 3, 2022, 11:50:05 AM
to Siddharth Suresh, Graydon Hoare, Leigh McCulloch, Stellar Developers
I don't think this proposal has a generic call function in it, only call functions with known numbers of parameters. This is a very useful feature for writing functions that call user-specified functions. Such a function would have a signature like call(contract, symbol, vector of parameters).
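
For concreteness, next to the fixed-arity forms it might look like this (names and raw types illustrative only, not the CAP's actual declarations):

extern "C" {
    fn call0(contract: u64, func: u64) -> u64;
    fn call1(contract: u64, func: u64, a1: u64) -> u64;
    // Generic form: `args` is a handle to a host Vec holding the arguments.
    fn call(contract: u64, func: u64, args: u64) -> u64;
}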

Jay Geng

Jun 3, 2022, 11:51:50 AM
to Siddharth Suresh, Graydon Hoare, Leigh McCulloch, Stellar Developers
Thanks for all your responses. To summarize, here are the things that were agreed on, which I will start updating the CAP with:
  1. Keep the Hash and public key as SCO types, as they are fairly ubiquitous and specially-purposed. But provide the following functions for interoperability:
    1. Convert between SCO_BINARY and SCO_HASH
    2. Convert between SCO_BINARY and SCO_PUBLIC_KEY
    3. Convert between SCO_ACCOUNT_ID and SCO_PUBLIC_KEY
  2. Add vec functions for slicing
  3. ScBigInt signedness: add an enum {POSITIVE, ZERO, NEGATIVE} to denote its sign (instead of just a positive flag).
  4. Map: provide host functions (similar to lower-bound, upper-bound) to iterate through a map, as well as a way to know when to stop.
I will update the CAP with these changes. 

There are a few points mentioned that I would like to get more clarification on. 
  • Why are Hash and PublicKey first-class objects?
 @Leigh McCulloch You also mentioned (in a discord thread) that the guarantees (e.g. length) that these objects provide are only one-sided: only on the host side, not on the contract side. I don't think I fully understand this aspect. Would you mind elaborating on it a bit more (if this is still a concern)?
Also, you raised a good point about the possibility of including other Hash and PublicKey types (like ecdsa p256/secp256k1) - should we change the current XDR representation to make it extensible?
  • Conversion between SCO_BIGINT and host object should be implementation-independent
@Leigh McCulloch  I agree these semantics should be generic instead of tied to a Rust implementation. Why does the SDK need to implement conversion between SCO types and the host types (I thought that is done only on the host side)? Maybe this is something obvious that I'm not getting.
  • Go stdlib’s signedness issue
@Leigh McCulloch  I'm not familiar with this issue, but I think this is a great catch! Do you happen to have a link to the Go stdlib or other resource where this is discussed in depth, so that I can fully appreciate the nuances? I will go with Graydon’s suggestion of using an enum, but would like to understand the issue fully.
  • Memcpy interface between binary and guest linear memory
@Graydon Hoare This sounds like a great idea which discourages users from performing inefficient per-byte operations (e.g. looping). I will have to read up on how WASM linear memory works. 

Regarding the issue of XDR size limits: I'm a little skeptical we should place a hard cap on the variable-length arrays at the XDR level.
For binary, the length corresponds to the number of bytes and we can have a reasonably good estimate of the contract size limit, so this might be plausible. However, ScVec is a heterogeneous structure which can be deeply nested, which makes a length cap kind of pointless. For BigInt, circling back to an earlier point by @Leigh McCulloch, there is no precision limit at the implementation level (it is just an array of digits), so the actual precision limit will be bounded heavily by the operation. I'm just not sure what benefit the array length limits provide other than specifying an absolutely-impossible-to-exceed bound, where the actual limit is set by the application/gas limitation.

Best,
Jay

Jay Geng

Jun 3, 2022, 12:06:49 PM
to Siddharth Suresh, Jonathan Jove, Graydon Hoare, Leigh McCulloch, Stellar Developers
"I don't think this proposal has a generic call function in it, only call functions with known numbers of parameters. This is a very useful feature for writing functions that call user-specified functions. Such a function would have a signature like call(contract, symbol, vector of parameters)."
@Jonathan Jove Do you mean extending these contract-invoking functions (https://github.com/stellar/stellar-protocol/blob/master/core/cap-0051.md#invoking-another-function) to include a variadic version?

In that case we'll have to pass in a single host value that corresponds to a host vec object, but need a way of indicating this vec should be treated as a vec of arguments instead of a single argument of a Vec object. I think that might be doable, what do you think @Graydon Hoare ?


Leigh McCulloch

Jun 3, 2022, 1:51:17 PM
to Jay Geng, Siddharth Suresh, Jonathan Jove, Graydon Hoare, Stellar Developers
Hi Jay,
  • Why are Hash and PublicKey first-class objects?
 @Leigh McCulloch You also mentioned (in a discord thread) that the guarantees (e.g. length) that these objects provide are only one-sided: only on the host side, not on the contract side. I don't think I fully understand this aspect. Would you mind elaborating on it a bit more (if this is still a concern)?

There will be cases where a contract has a binary and needs to construct the hash, or vice-versa. The contract code will not necessarily use a fixed-size array. This isn't particularly important now that we are going to have conversion functions.
 
Also, you raised a good point about the possibility of including other Hash and PublicKey types (like ecdsa p256/secp256k1) - should we change the current XDR representation to make it extensible?

Probably. I would lean towards making these unions from the get-go, with typed enums.
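
For example, a public-key union with a typed enum might render in Rust roughly as follows (a sketch; only the ed25519 arm is grounded in the discussion above):

// ed25519 public keys are 32 bytes; future arms (e.g. secp256k1) could be
// added as new enum values without reshaping existing data.
enum ScPublicKey {
    Ed25519([u8; 32]),
}
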
  • Conversion between SCO_BIGINT and host object should be implementation-independent
@Leigh McCulloch  I agree these semantics should be generic instead of tied to a Rust implementation. Why does the SDK need to implement conversion between SCO types and the host types (I thought that is done only on the host side)? Maybe this is something obvious that I'm not getting.

Generally we should aim to specify behavior, and not rely on a crate or stdlib functions to specify the behavior for us, which is really another way of us saying that the behavior is undefined and consistent only by happenstance.

When I said SDKs need to be able to implement these things, I'm referring to the fact that Stellar SDKs like the Go SDK, JavaScript SDK, etc will need to construct the SCO_BIGINT type, and so the CAP should specify exactly how to do that independent of any specific technology. If you want to specify that in Rust code, Go code, or pseudocode, that's fine as long as it does so using the most basic primitives that are easily transferable to other implementations.
  • Go stdlib’s signedness issue
@Leigh McCulloch  I'm not familiar with this issue, but I think this is a great catch! Do you happen to have a link to the Go stdlib or other resource where this is discussed in depth, so that I can fully appreciate the nuances? I will go with Graydon’s suggestion of using an enum, but would like to understand the issue fully.

There is no Go stdlib issue; I was just referencing how some stdlibs have represented arbitrary precision integers to provide better default zero values. The Go stdlib's implementation is at:

However, I think this is irrelevant now; the enum @Graydon Hoare suggested fits an XDR union better. I wouldn't make it an enum followed by a magnitude; instead, I'd make it a union in the following form. A union will be more space efficient since the magnitude can be omitted for zero. It will also render better in code, especially in Rust, producing a safer construct.

enum SCBigIntType {
    SC_BIGINT_TYPE_ZERO = 0,
    SC_BIGINT_TYPE_POS = 1,
    SC_BIGINT_TYPE_NEG = 2
};

union SCBigInt switch (SCBigIntType type) {
    case SC_BIGINT_TYPE_ZERO:
        void;
    case SC_BIGINT_TYPE_POS:
    case SC_BIGINT_TYPE_NEG:
        opaque magnitude<>;
};
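
In Rust that renders roughly as (sketch):

// Zero carries no magnitude; an all-zero magnitude in Pos/Neg stays invalid.
enum ScBigInt {
    Zero,
    Pos(Vec<u8>), // big-endian magnitude
    Neg(Vec<u8>),
}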

Regarding the issue of XDR size limits: I'm a little skeptical we should place a hard cap on the variable-length arrays at the XDR level.
For binary, the length corresponds to the number of bytes and we can have a reasonably good estimate of the contract size limit, so this might be plausible. However, ScVec is a heterogeneous structure which can be deeply nested, which makes a length cap kind of pointless. For BigInt, circling back to an earlier point by @Leigh McCulloch, there is no precision limit at the implementation level (it is just an array of digits), so the actual precision limit will be bounded heavily by the operation. I'm just not sure what benefit the array length limits provide other than specifying an absolutely-impossible-to-exceed bound, where the actual limit is set by the application/gas limitation.

XDR by default specifies a size limit of 4G items in any array or opaque, but that is not a reasonable limit, as no ingester of XDR in the ecosystem is likely to successfully load 4G of bytes or other structs into memory. The XDR should define reasonable limits. XDR libs that parse XDR will use the limits as guards to protect the applications that use them from exploitive data. Of course these applications can also impose general buffer limits to do similar, but if we don't define reasonable limits in the XDR definition we aren't communicating with the ecosystem about what the reasonable limits to impose are. I realize stellar-core will impose its own configurable limits, but there are ingesters/parsers of XDR other than stellar-core, and not all XDR passes through stellar-core first. Limits must exist; I think we should specify them rather than require others to. It improves interoperability and general clarity of our data.

Cheers,
Leigh

Graydon Hoare

Jun 3, 2022, 2:13:04 PM
to Jay Geng, Siddharth Suresh, Jonathan Jove, Leigh McCulloch, Stellar Developers
On Fri, Jun 3, 2022 at 9:06 AM Jay Geng <j...@stellar.org> wrote:
> In that case we'll have to pass in a single host value that corresponds to a host vec object, but need a way of indicating this vec should be treated as a vec of arguments instead of a single argument of a Vec object. I think that might be doable, what do you think @Graydon Hoare ?

I think this should work -- I actually think the call-level interface
to wasmi might already work this way, and if not it's probably
achievable with a little reflection on the wasm blob or such. I'm
happy to take a look.

-Graydon

Jay Geng

Jun 8, 2022, 10:38:21 AM
to Graydon Hoare, Siddharth Suresh, Jonathan Jove, Leigh McCulloch, Stellar Developers
Hi all,

I've published a new version of the CAP-0051 draft that addresses most of the comments thus far. Please have a look and feel free to provide more comments and suggestions.

Best,
Jay