Question for Rust guru

Ben Goertzel

Dec 28, 2021, 1:38:21 PM
to opencog
We are hitting some weird issues with our experimental use of Rust in a Hyperon prototype ... Any Rust gurus on here know an answer?



Luke Peterson

Dec 28, 2021, 9:11:40 PM
to ope...@googlegroups.com
It looks like Alice and Quinedot have already provided a nuts-and-bolts explanation, and an ugly but possibly workable solution.

Backing up a bit, I infer you’re trying to implement some kind of type lattice where a concrete type is defined by its properties, and you can add new properties to types and transmute them.  I’m making a leap.

If I'm right, I tried to do this a while back in my Rust implementation of AdamV’s Information Programming (https://adamv.be/Information-programming) and I concluded that the Rust trait system and dyn dispatch mechanism was a bad fit - at least without quite a bit of additional plumbing and glue.  Things it needs to do in order to perform the compile-time monomorphization prohibit the kind of fast-and-loose polymorphism that you need.

If you’re happy to use unstable language features, you might look at rattish (https://crates.io/crates/rattish).  I wouldn’t recommend using it as is, but looking through the implementation might give you ideas.  Personally, I didn’t want to pay the runtime cost of a hash for every join, so I ended up making a proc macro that created a big table of implementations.

Also interesting to look at for ideas is pergola (https://crates.io/crates/pergola), created by the inventor of the Rust language.

-Luke

On Dec 28, 2021, at 1:38 PM, Ben Goertzel <bengo...@gmail.com> wrote:

> We are hitting some weird issues with our experimental use of Rust in a Hyperon prototype ... Any Rust gurus on here know an answer?

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CACYTDBezTgUd6i_tpw7CzzDvjS8VC4VOfMwVdfDrMCE8EsFihg%40mail.gmail.com.

Linas Vepstas

Dec 29, 2021, 6:52:28 PM
to opencog
On Tue, Dec 28, 2021 at 8:11 PM Luke Peterson <luketp...@gmail.com> wrote:

> I concluded that the Rust trait system and dyn dispatch mechanism was a bad fit - at least without quite a bit of additional plumbing and glue. Things it needs to do in order to perform the compile-time monomorphization prohibit the kind of fast-and-loose polymorphism that you need.

Speaking of polymorphism, I've discovered that OCaml has a similar set
of issues. My zeroth impression was that it's a modern typed
functional language with high performance. Should be a good fit for
the AtomSpace, right?

Nope. After fiddling with it for a few weeks, I "discovered" the OCaml
type system was static, compile-time typed, and there are no
provisions for run-time type-casting/polymorphism. That means, for
example, if I define a type "Atom" at the beginning, and then, minutes
or hours later add something like ProteinNode, there's no way to pass
this to something expecting an Atom; conversely, given an Atom,
there's no way to discover it's actually a ProteinNode. (no
polymorphic lists, either)

Of course, this could be made to work with a lot of extra "plumbing
and glue", but adding that plumbing defeats the whole point of having
the OCaml type system, which was (*) type safety and (*) compile-time
optimizations. More about that here:
https://github.com/opencog/atomspace/tree/master/opencog/ocaml

-----------------
Reading about Rust, I see "traits" as an important language feature.
Reading about traits says that both Haskell and OCaml can support them
("easily"??). I don't know OCaml well enough to figure this out, or if
this would be a good match for the atomspace type system. I'm utterly
lost as to vsbogd's issues, or what Alice & Quinedot wrote.

Which leads to some other off-hand comments: the reason the atomspace
keeps getting smaller and simpler is so that .. well, complexity and
over-engineering is the enemy. One should always try to keep things as
simple as possible. This helps avoid hard-to-debug problems, and it
also helps performance. (Those paying attention might have noticed
some large parts of the atomspace were nuked to oblivion in the last 6
months. It's gotten smaller and simpler... again.)

One humble suggestion for hyperon: instead of trying to reinvent the
atomspace, you might have more luck and less expended effort by just
... using the atomspace as-is, and layering what you need on top.
Don't create headaches where they don't need to exist. Focus on doing
the experiments that you need to do, get them to work right. After
they work, after you get good results, then and only then start
thinking about what kind of infrastructure you need. Don't put the
cart before the horse. Don't over-engineer.

Everyone here has observed that I do most of my work in scheme.
There's a reason for this: when coding in scheme, one never-ever runs
into problems where the language/compiler is trying to stop you from
doing "the obvious thing" (as OCaml is doing, and as Rust is doing,
apparently.) This flexibility of being able to do anything you want,
anything you feel like, without having to stop to ask questions on
stackexchange, is one of the huge benefits of scheme/lisp.

Socially, though, it is a curse: because everything is so easy,
programmers never feel much of an urge to standardize anything,
instead working happily with their home-grown solutions. There is a
worthwhile essay about this to read and ponder:

http://www.winestockwebdesign.com/Essays/Lisp_Curse.html

I think the above essay explains vsbogd's problem with Rust. (about 5
paragraphs down.)

Since I'm on the topic: perhaps my biggest #1 greatest "discovery"
about the atomspace is that it would be really really really nice to
have a vector (matrix, tensor) API to it. To be able to say "here's a
vector of Atoms". Because, well, vectors are really really useful. I
implemented one, it's here:
https://github.com/opencog/atomspace/tree/master/opencog/matrix
Unfortunately, it is in ... scheme, so the pythonistas and rustaceans
are out of luck. Someone should recreate this API in C++ (and thus get
R and python bindings "for free") ...

Why the heck am I talking about this? It's an example of a
"discovery". I didn't start with a profound theoretical insight or
fundamentalist proclamation. I started with a tiny collection of
small-time utilities to make writing code easier. Over time, it
evolved and got more sophisticated, as the actual needs became clear.
Now that the shape of the thing is clear, and the need for it is
fairly obvious, one can go back and "harden" the code, formalize it,
rewrite it, tune it, optimize it.

Put the horse before the cart: Let experiments and quick-n-dirty
attempts be the horse. They'll pull the cart of infrastructure, the
code base, in the direction you want to go in. Find what works,
first. Make it industrial-strength second.

-- Linas

Luke Peterson

Dec 29, 2021, 8:14:07 PM
to ope...@googlegroups.com
Hey Linas!

All your points about “Engineering” vs. “Knowing what you are Building” are right on.  Rust is a systems-level language that you bring in when you have a pretty good idea about what you are building and you care about execution speed and long-term maintainability.  It’s not a good platform for experimentation.  That said, I can experiment faster in Rust than I ever could in C++.  And I never ever lose days to debugging memory stompers or data races.  Which is pretty huge for me as those cost me a lot of momentum on past projects.

So, what I wrote was not to suggest Rust is a bad fit for Hyperon’s Atomspace.  Just that the default `Box<dyn Trait>` approach typically used for polymorphism in Rust may need to be extended with some additional customization (implemented with `unsafe`) to make it do what you want.  Still, a little bit of `unsafe` in Rust is better than everything being unsafe in C++.

Rust is an awesome language for a whole host of reasons from the performance to the memory safety to the well-thought-out std library.  (Almost) every limitation in the language is there for a good reason - it’s just that those reasons might not entirely apply to our use case and we might have made different choices if we were designing a perfect language to implement something like Atomspace.

On Dec 29, 2021, at 6:52 PM, Linas Vepstas <linasv...@gmail.com> wrote:

> On Tue, Dec 28, 2021 at 8:11 PM Luke Peterson <luketp...@gmail.com> wrote:
>
> > I concluded that the Rust trait system and dyn dispatch mechanism was a bad fit - at least without quite a bit of additional plumbing and glue.  Things it needs to do in order to perform the compile-time monomorphization prohibit the kind of fast-and-loose polymorphism that you need.
>
> Speaking of polymorphism, I've discovered that OCaML has a similar set
> of issues. My zeroth impression was that it's a modern typed
> functional language with high performance. Should be a good fit for
> the AtomSpace, right?
>
> Nope. After fiddling with it for a few weeks, I "discovered" the OCaml
> type system was static, compile-time typed, and there are no
> provisions for run-time type-casting/polymorphism. That means, for
> example, if I define a type "Atom" at the beginning, and then, minutes
> or hours later add something like ProteinNode, there's no way to pass
> this to something expecting an Atom; conversely, given an Atom,
> there's no way to discover it's actually a ProteinNode.  (no
> polymorphic lists, either)

I don’t know OCaml at all, but Rust does have a polymorphism mechanism, via the `dyn` keyword.  This basically tells the compiler to build a vtable.  It works well most of the time.
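
As a rough illustration of the `dyn` mechanism described here, a minimal sketch of a heterogeneous list follows. The names `Atom`, `ConceptNode`, `ProteinNode`, and `describe` are hypothetical and chosen only for this example; they are not taken from Hyperon or the AtomSpace.

```rust
// Hypothetical names for illustration only; not the Hyperon or AtomSpace API.
trait Atom {
    fn name(&self) -> String;
}

struct ConceptNode(String);
struct ProteinNode(String);

impl Atom for ConceptNode {
    fn name(&self) -> String {
        format!("Concept({})", self.0)
    }
}

impl Atom for ProteinNode {
    fn name(&self) -> String {
        format!("Protein({})", self.0)
    }
}

/// Collects the names from a heterogeneous list of atoms.
/// Each `a.name()` call is dispatched through the trait object's vtable.
fn describe(atoms: &[Box<dyn Atom>]) -> Vec<String> {
    atoms.iter().map(|a| a.name()).collect()
}
```

Calling `describe` on a `Vec<Box<dyn Atom>>` that mixes both node types returns one string per element, so a `ProteinNode` added later can be passed wherever an `Atom` is expected.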

The main annoyance with it is that it’s incompatible with generics: a trait whose methods take generic type parameters generally can’t be turned into a `dyn` trait object.  Again this makes sense, because a generic monomorphizes into many different implementations, and you wouldn’t want the number of vtable entries to explode (https://rust-lang.github.io/rfcs/0255-object-safety.html).  But what it means is that you often have to do type erasure and create dynamic wrappers manually; see, for example, https://github.com/dtolnay/dyn-clone.  Runtime-performance-wise, you’re no worse off than if you were working in a language that did the indirection behind the scenes - but sometimes the extra keystrokes are annoying.
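
To make the "dynamic wrappers" point concrete, here is a hand-written sketch of the kind of type erasure that a crate like dyn-clone automates. It does not use that crate's API; `Atom`, `ConceptNode`, and `clone_box` are hypothetical names assumed for the example.

```rust
// Hypothetical names for illustration; this is a hand-rolled pattern,
// not the dyn-clone crate's API.
trait Atom {
    fn name(&self) -> String;

    // Object-safe stand-in for `Clone::clone`. `Clone` cannot be called
    // through `dyn Atom` directly because `clone` returns `Self`.
    fn clone_box(&self) -> Box<dyn Atom>;
}

#[derive(Clone)]
struct ConceptNode(String);

impl Atom for ConceptNode {
    fn name(&self) -> String {
        self.0.clone()
    }

    fn clone_box(&self) -> Box<dyn Atom> {
        // The concrete type is known here, so the ordinary `Clone` works.
        Box::new(self.clone())
    }
}

// With the erased helper in place, the boxed trait object itself is `Clone`.
impl Clone for Box<dyn Atom> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}
```

With this in place a `Vec<Box<dyn Atom>>` can be cloned like any other vector. The cost is one extra method per trait and one extra line per implementing type, which is the "extra keystrokes" trade-off mentioned above.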

As far as getting from a `dyn Trait` object back to the concrete object you started with, Rust offers the `Any` trait (https://doc.rust-lang.org/std/any/index.html), which works in the majority of cases.  Unfortunately it’s limited to types that don’t borrow from other objects (`'static` types), because the lifetime tracking is resolved at compile time, so you couldn’t safely guarantee a reference didn’t outlive its referent.  So if you know this is ok in your situation, you’ll need to use `unsafe` to tell the compiler to let you do it.

These kinds of limitations make a lot of sense when considering the overall design goals of Rust, but aren’t ideal for implementing an executable node graph.  Luckily you can use `unsafe` in a few judicious places and make Rust do what you want.
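
A minimal sketch of the safe `Any` route, assuming all node types are `'static` (they own their data). The `as_any` helper method and the `Atom`, `ProteinNode`, `ConceptNode`, and `residues_of` names are hypothetical; only `std::any::Any` and its `downcast_ref` method come from the standard library.

```rust
use std::any::Any;

// Hypothetical trait and node types for illustration only.
trait Atom {
    fn name(&self) -> String;

    // Exposes the concrete value as `&dyn Any` so callers can try a downcast.
    fn as_any(&self) -> &dyn Any;
}

struct ProteinNode {
    name: String,
    residues: usize,
}

struct ConceptNode {
    name: String,
}

impl Atom for ProteinNode {
    fn name(&self) -> String {
        self.name.clone()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Atom for ConceptNode {
    fn name(&self) -> String {
        self.name.clone()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the residue count if the atom is really a `ProteinNode`.
fn residues_of(atom: &dyn Atom) -> Option<usize> {
    atom.as_any()
        .downcast_ref::<ProteinNode>()
        .map(|protein| protein.residues)
}
```

`residues_of` yields `Some(..)` for a `ProteinNode` and `None` for anything else, which covers the "given an Atom, discover it's actually a ProteinNode" case as long as no node type holds a borrowed reference.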

Linas Vepstas

Dec 29, 2021, 11:27:56 PM
to opencog
Hi Luke,

On Wed, Dec 29, 2021 at 7:14 PM Luke Peterson <luketp...@gmail.com> wrote:
>
> Hey Linas!
>
> That said, I can experiment faster in Rust than I ever could in C++.

Heh. I'm not advocating for C++. In the early days of the AtomSpace, I
argued strongly against it, but I was over-ruled. So it goes. I've
come to peace with it. Then again, there weren't many viable
candidates, way back then.

> And I never ever lose days to debugging memory stompers or data races. Which is pretty huge for me as those cost me a lot of momentum on past projects.

"Doctor Doctor, it hurts when I do this" ... "well, don't do that!"

FYI, some stats:
* The AtomSpace itself is 2KLOC grand total, and has one mutex in it.
It's used in only one file, that's under 300 LOC in size. The
protected region is tiny.
* The basic Atoms are under 3KLOC and have two mutexes: one for the
incoming set, one for the key-value store. Both protect minuscule
regions.
* The atom factory is 700 LOC and uses one mutex.
* The RocksDB backend is 1.6 KLOC and has two mutexes.

So about 7.5KLOC grand total. This is not a system with great
complexity. The hardest part is to figure out how to further simplify
it, without hurting performance.

The subsystems that actually are complex and hard to maintain:
* The pattern engine. Static analysis: 8KLOC plus 8KLOC runtime: so
16KLOC total, which is more than 2x larger than the atomspace itself
(!) Zero mutexes
* guile bindings: 6KLOC of C++ (same size as the atomspace itself!) 7
mutexes for stuff.
* assorted scheme utilities (including docs) 5KLOC
* python bindings: 4KLOC, with 4 mutexes, including the global python GIL.
* rainbow of assorted atom types: 20+ KLOC of which 7KLOC is GPL
boilerplate! Zero mutexes.
* The vector/matrix API: 8KLOC of scheme; a handful of mutexes
* Unit tests: 71 KLOC -- Note that this is larger than everything else
combined, which is about 60KLOC

The above numbers are meant to give you an idea of where the *actual*
complexity is. Don't imagine that it is in the atomspace: the
atomspace is small, and getting smaller. I'm comparing 7.5KLOC to
130KLOC of "other stuff"; the atomspace itself is about 5%
grand-total of what's in the github repo of the same name. The git
repo could (should?) be broken up into disjoint parts to make it more
manageable.

Other github repos:
* "unified" rule engine: 12 KLOC source plus 43KLOC unit tests
* PLN 13 KLOC plus 2KLOC unit tests
* Miner: 9KLOC plus 10KLOC unit tests

> So, what I wrote was not to suggest Rust is a bad fit for Hyperon’s Atomspace.

Well, I'm suggesting that since the current atomspace is 7KLOC, and is
"done", stable and debugged, just use that. Don't reinvent it.

The issue is, I guess, the 200 KLOC of "other stuff" built on top of
the atomspace; not clear to me how much of that other stuff you want
to replicate. I don't understand where the boundaries are being drawn.

> our use case

I've been unable to find any public descriptions of hyperon. Are there any?

-- linas

Vitaly Bogdanov

Dec 30, 2021, 1:42:16 PM
to opencog
Hi Luke,

My topic at users.rust-lang.org is more about lifetimes, and Alice and Quinedot finally helped me find the issues in my code. But I started writing this implementation of the Visitor because I also found that it is impossible to get full information about a type in Rust. Specifically, given a trait object, one cannot downcast it to another trait object that the underlying type also implements. Adding RTTI to Rust is probably a complex issue because a new trait can be implemented for a type even in a separate library. Not adding RTTI from the very beginning even seems logical, and my question is whether there are other ways to implement functionality that requires this sort of operation.

One example of the problem is transmuting the items in a heterogeneous container. The container should be parameterized by some common trait, but after putting an item into such a container you cannot downcast the item to the transmuting trait. Downcasting to the specific type is also a problem if this type is generic and listing all type parameter values in advance is not possible.
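
One stable-Rust workaround for this kind of cross-cast, sketched under assumptions: the common trait carries an opt-in method that returns the second trait object when the concrete type implements it. This is not the `Visitable<T>` design discussed in this thread, and `Item`, `Describe`, `as_describe`, `Plain`, `Labeled`, and `descriptions` are all hypothetical names.

```rust
// Hypothetical traits and types for illustration only.
trait Describe {
    fn describe(&self) -> String;
}

// The common trait the container is parameterized by.
trait Item {
    fn id(&self) -> u32;

    // Opt-in cross-cast: types that also implement `Describe` override this.
    fn as_describe(&self) -> Option<&dyn Describe> {
        None
    }
}

struct Plain(u32);
struct Labeled(u32, String);

impl Item for Plain {
    fn id(&self) -> u32 {
        self.0
    }
}

impl Describe for Labeled {
    fn describe(&self) -> String {
        format!("#{} {}", self.0, self.1)
    }
}

impl Item for Labeled {
    fn id(&self) -> u32 {
        self.0
    }
    fn as_describe(&self) -> Option<&dyn Describe> {
        Some(self)
    }
}

/// Visits only those items that can be viewed through the second trait.
fn descriptions(items: &[Box<dyn Item>]) -> Vec<String> {
    items
        .iter()
        .filter_map(|item| item.as_describe())
        .map(|d| d.describe())
        .collect()
}
```

The limitation is the one already noted: the set of target traits has to be known when the common trait is written, so a trait implemented later in a separate library cannot be reached this way without some form of runtime registry.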

The solution I tried to implement is parameterizing a container by a sum of traits including the `Visitable<T>` trait. Then each item type in the container should implement the `Visitable<T>` trait to cast self to the required type `T`. I thought it would allow me to use a common implementation of the `Visitor<T>` to transmute any collection which is parameterized by `.. + Visitable<T>`. Unfortunately it doesn't work in stable Rust because trait upcasting is experimental, but it looks like in principle this approach works.

It is similar to building an upper bound for the set of types present in the container. I am not specifically interested in the least upper bound; any upper bound should be enough if upcasting works. I need to think about this further, and https://crates.io/crates/pergola is interesting from this perspective. It can be even more useful for thinking about the Hyperon type system.

I posted another example of the issue at https://stackoverflow.com/questions/70504911/how-to-implement-an-introspection-of-a-trait-content . I suspect it could also be covered by `Visitable`/`Visitor`, but when I tried to apply it I found that I need to parameterize `Plan<T, R>` by an additional `P0: Plan<(), R>`; I am not sure if it will compile, and it looks ugly.

All of the issues above seem to be solvable by introducing some kind of RTTI. Thanks for the link to https://crates.io/crates/rattish . I am trying to stick to the stable Rust compiler, but it is worth looking at the code anyway. I also found https://docs.rs/query_interface/0.3.5/query_interface/ while searching for an answer on StackOverflow.

Thanks,
Vitaly


Luke Peterson

Dec 30, 2021, 9:33:40 PM
to ope...@googlegroups.com
Hi Vitaly,

> The solution I tried to implement is parameterizing a container by a sum of traits including `Visitable<T>` trait. Then each item type in container should implement the `Visitable<T>` trait to cast self to the required type `T`. I thought it will allow me to use common implementation of the `Visitor<T>` to transmute any collection which is parameterized by `.. + Visitable<T>`. Unfortunately it doesn't work in stable Rust because trait upcasting is experimental but looks like in principle this approach works.

I’m a little confused on the purpose of the generic T trait parameter on Visitable<T>.  Are you wanting your container to contain a heterogeneous mix of T, or will T be the same for all objects in the container?  In general, I’ve found there are only a few situations where generic parameters on traits make sense.  Like the From / Into traits.  Usually you want associated types or ideally no type at all if possible.

Anyway, you can get around the lack of trait upcasting using an additional method in your other trait(s) along the lines of `as_visitable`.  Check out this playground for an example.
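
A sketch of the `as_visitable` idea under assumptions (this is not the linked playground): a non-generic `Visitable` supertrait, a hypothetical `Node` trait standing in for the "other trait", and a manual upcast method. `Leaf` and `visit_all` are likewise invented for the example.

```rust
// Hypothetical traits for illustration; `Visitable` is shown without the
// generic parameter to keep the sketch small.
trait Visitable {
    /// Appends whatever this item wants a visitor to see.
    fn accept(&self, out: &mut Vec<String>);
}

// The "other trait" the container is parameterized by.
trait Node: Visitable {
    fn label(&self) -> String;

    // Manual upcast from `dyn Node` to `dyn Visitable`, for compilers
    // without built-in trait upcasting.
    fn as_visitable(&self) -> &dyn Visitable;
}

struct Leaf(String);

impl Visitable for Leaf {
    fn accept(&self, out: &mut Vec<String>) {
        out.push(self.0.clone());
    }
}

impl Node for Leaf {
    fn label(&self) -> String {
        self.0.clone()
    }
    fn as_visitable(&self) -> &dyn Visitable {
        // The concrete type is known here, so the coercion is allowed.
        self
    }
}

/// Walks a container of `dyn Node` and visits each item as `dyn Visitable`.
fn visit_all(nodes: &[Box<dyn Node>]) -> Vec<String> {
    let mut seen = Vec::new();
    for node in nodes {
        let visitable: &dyn Visitable = node.as_visitable();
        visitable.accept(&mut seen);
    }
    seen
}
```

Each implementing type has to write the one-line `as_visitable` body itself, since only the concrete type can perform the coercion; that boilerplate is the price of staying on a compiler without trait upcasting.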


-Luke

Luke Peterson

Dec 30, 2021, 10:01:15 PM
to ope...@googlegroups.com

Oh.  I think I see what you’re trying to do... you want differently typed versions of your visitor function.  I didn’t understand that the first time I read your message.  When I’ve dealt with this problem in the past, I ended up rolling my own custom version of the `Any` trait with the safety requirements relaxed in ways that were appropriate for my code.

Here is the `Any` trait’s implementation in the Rust standard lib, and in particular look at `downcast_ref` on line 219


-Luke