First impressions and thoughts


Mark Schwarzmann

Oct 5, 2018, 12:05:59 PM
to cell-lang
Hi Giovanni,

I came across the language and read through some of the information on the site. The language has a lot of interesting ideas, some of which (like relational automata) I never encountered before. I'm quite impressed.

I have a few questions and thoughts, though, if you'll oblige me :)

1) Like you, I'm quite fond of the relational model. But I'm not sure I understand how relations are queried in the language. From the examples on the site, I gather that this is done via plain functions and relation comprehensions. Am I right? The syntax certainly seems "cleaner" than SQL, or even relational algebra. Do you intend to provide the standard RA operators (intersection, join, division, etc.) in the standard library?

2) Regarding the execution model of message handlers (deferred updates), I'm not aware of any other language that uses such a model, but in C++ the copy-and-swap idiom involves a similar idea.

The idiom says that if you are writing a class method that mutates the underlying object (for example, an assignment operator), you should implement it by first copying the object, then mutating the copy as needed, and finally swapping the underlying object with that copy. This way, the underlying object only ever goes through two states, similar to the situation in your execution model.

However, the primary motivation for this idiom is not cleaner or easier-to-parallelize code, but to achieve a strong exception guarantee, i.e. to ensure that if an exception is thrown while the method is executing, the underlying object remains in its original state and, importantly, it still satisfies the invariants of the class. The only operation in the method that changes the object's state is the swap, and swap functions are expected to never throw.
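
In case a sketch helps, here's a minimal illustration of the idiom (the class and all names are invented for this example): the assignment operator takes its parameter by value, so the copy, which is the only step that can throw, happens before the object is touched, and the single mutation of the object is a non-throwing swap.

```cpp
#include <utility>
#include <vector>

// Minimal copy-and-swap sketch. The only mutation of *this is the
// noexcept swap, so assignment either fully succeeds or leaves the
// object in its original state (strong exception guarantee).
class IntBuffer {
public:
    explicit IntBuffer(std::vector<int> data = {}) : data_(std::move(data)) {}

    friend void swap(IntBuffer& a, IntBuffer& b) noexcept {
        using std::swap;
        swap(a.data_, b.data_);
    }

    // Pass by value: the copy (the only step that can throw) is made
    // at the call site, before the body runs.
    IntBuffer& operator=(IntBuffer other) noexcept {
        swap(*this, other);  // the object jumps from old state to new in one step
        return *this;
    }

    const std::vector<int>& data() const { return data_; }

private:
    std::vector<int> data_;
};
```

Note how the two-state property falls out of the structure: all the work happens on the copy, and the swap is the single point where the visible state changes.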

To be clear, this is an idiom that isn't enforced by the language in any way, and a lot of C++ programmers probably don't use it at all.


If you're curious how I came across the language: I found it through this post on Hacker News. I wasn't surprised that it is inspired by the "Out of the Tar Pit" paper, a favorite of mine. I learned of that paper from this excellent list by fogus. Maybe you'll find the other papers on his list as inspiring and insightful :)

Good luck,
Mark

cell-lang

Oct 8, 2018, 5:42:59 AM
to cell-lang
Hi Mark,

there's indeed no query language at the moment. All the operations you can do with relations right now are searches, lookups and linear scans, and any logic that in a database would be implemented in SQL or relational algebra has to be implemented in Cell using the functional language.

I do plan to include at least bits and pieces of a query language at some point, but I'm going to model it after Datalog rather than relational algebra. Not only are non-trivial queries normally much easier to express in Datalog, but relational algebra is (as far as I can tell) specifically designed for the particular flavor of the relational model used in relational databases, and wouldn't work well in Cell, where relations can (at the moment) be at most ternary, and are meant to encode a single attribute or relationship. "Wide" relations of the type used in SQL may make sense for databases, but it seems to me that Datalog and atomic relations integrate much better with a general-purpose programming language.

The motivation is also different. Since whatever logic you can implement with a query can already be implemented with the functional language (although in a clumsier form, of course), a query language in and of itself is not much of a priority for me at the moment: it's a lot of work for limited gains. What I would like to have, instead, is materialized views that are updated incrementally (and efficiently) as the underlying stored relations change, because without that one would be forced (for performance reasons) to introduce redundancy in the schemas more often than I'd like (there's a rather convoluted example of that in the "Why relations are better than objects" page, in the "Declarative query languages" paragraph).

I have to admit that until I read your email, I was one of those C++ programmers who were completely unaware of the copy-and-swap idiom. It's very different, though, from the deferred updates model used in Cell. For example, the issue with the Game automaton described at www.cell-lang.net/updates.html would not appear with copy-and-swap, because every update to the state of the copy would still be visible to the code that follows. With copy-and-swap, if I understand it correctly, the update appears atomic to the clients of the class, but methods of the class itself still see all the intermediate states. And then there's of course the fact that any non-trivial update to the state of an application typically involves updating several objects, so the object graph itself would end up going through a number of intermediate states even if each object is updated atomically. The approach used in Cell is, in a sense, more radical, and its consequences, both good and bad, more drastic.

The papers listed at the link you provided are indeed very interesting. In fact, both papers in the "I've Seen the Future, Brother: It is Murder" section, "Out of the Tar Pit" and "Dynamo: Amazon's Highly Available Key-value Store", have influenced the design of Cell. I got from the latter the idea of using a user-provided function to reconcile divergent replicas of an automaton (or, more precisely, to reconcile their diverging message histories), which will be crucial in Cell's future network architecture (https://github.com/cell-lang/network-model). "Time, Clocks, and the Ordering of Events in a Distributed System" also looks very interesting, and might actually be relevant for the network architecture. I'll read it soon.

I'm a bit surprised you managed to find the website through a post that's nearly one year old. That was my first post ever about Cell, and unfortunately the only one that ever managed to attract any attention on Hacker News. I tried to post again there every time I released something new, but subsequent posts always went completely unnoticed.

Thanks for the link and for getting in touch.

Regards,
  Giovanni

Mark Schwarzmann

Oct 11, 2018, 2:03:22 PM
to cell-lang
On Monday, October 8, 2018 at 12:42:59 PM UTC+3, cell-lang wrote:
I'm a bit surprised you managed to find the website through a post that's nearly one year old. That was my first post ever about Cell, and unfortunately the only one that ever managed to attract any attention on Hacker News. I tried to post again there every time I released something new, but subsequent posts always went completely unnoticed.


I saw the post back then but didn't have time to look into the language (I'm also following several other experimental programming languages).

Have you tried posting about the language in the programming subreddit? It's a pretty big programming community.

Best,
Mark

cell-lang

Oct 17, 2018, 5:51:29 AM
to cell-lang
Yes, I regularly post in several subreddits, including the one you mention, and it's definitely easier for a post to be noticed there. Still, it would be nice to be able to reach the Hacker News community as well...

Mark Schwarzmann

Oct 20, 2018, 11:00:26 AM
to cell-lang
I forgot something. Another great paper that isn't on Fogus' list, but would be on mine, is Gregor Kiczales' "Towards a New Model of Abstraction in Software Engineering". There is also a lecture of his which is pretty much equivalent to the paper. It has implications for pretty much all software design, including PL design.

-Mark



cell-lang

Oct 24, 2018, 3:47:25 AM
to cell-lang
I've watched the talk a couple of times, and maybe I'm missing something, but it seems to me that the ideas outlined in it have not aged well (maybe it's just that these days I find it hard to take seriously anything that mentions OOP, unless it's to criticize it). The general idea of specifying the semantics of a software system at a very high level of abstraction and then providing the compiler/infrastructure with declarative performance hints (which do not affect the semantics) to help it generate an efficient implementation is as valid today as it was back then, of course. But my impression is that the class of issues it can be effectively applied to is rather narrow, as it's squeezed between the cases in which it's really hard to disentangle semantics and optimization (like, for instance, the choice of algorithms or, in distributed applications, the "division of labor" between network nodes) and the cases in which the compiler/infrastructure should be able to figure out how to optimize the generated code on its own, without involving the developer at all.

Even many of the examples provided in the talk feel rather outdated, as they have been made moot by advances in compilers (e.g. inline declarations) or hardware (virtual memory), or were just caused by terrible design (spreadsheets and window systems). The DBMS and HPF examples may still be valid today, but I'm not nearly competent enough to have an informed opinion on them.

The talk hints at using the same approach for something more than just performance optimization, but I'm not aware of any actual example of that, nor would I be able to imagine one on my own. It kind of feels to me like one of those ideas that initially looked promising, but then failed to yield anything useful, or at least fell far short of expectations (I actually got worked up about quite a few such ideas in over twenty years of software development; computational reflection, which is mentioned in the talk, was one of them).

Incidentally, I was initially intrigued by the title of the talk ("Why Black Boxes are so Hard to Reuse"), as I've a strong dislike for the idea of having functions as "black boxes" or, more precisely, I've a strong dislike for using functions to structure computation exactly because a function is too much of a black box. But then the talk turned out to be about something else entirely.

If I had to choose one paper to complement Fogus' list, my vote would without a moment's hesitation go to the original map/reduce paper from Google. For me personally, it was nothing short of a revelation, on par with things like my first encounter with functional programming. That was the paper that made me realize that everything I thought about network programming was just plain wrong...

Mark Schwarzmann

Oct 28, 2018, 9:46:15 AM
to cell-lang
The talk is a bit outdated, yes. I think the reputation of OOP at the time of the talk was just about the opposite of its reputation today. Having said that, I've heard that the style of OOP in CLOS (which Kiczales helped design) and Smalltalk is quite different from what we refer to as OOP today. I have no first-hand experience with these languages, though.

But that's tangential to the topic of the talk, which is the difficulty of making useful, reusable abstractions. OOP has failed us in that respect, but the programming community keeps exploring other paradigms. For instance, DOD (data-oriented design) is being touted as an alternative to OOP these days (e.g. this talk); C++ keeps expanding its generic/metaprogramming capabilities (templates, constexpr, concepts, etc.) and relies heavily on them in the STL; functional programming and the actor model have been popularized as the "proper" way to deal with concurrent computation; etc.

Some languages favor a minimalistic style, offering only very modest abstraction capabilities (one of them - functions - is pretty much always there). But all nontrivial software relies on some form(s) of abstraction, so this merely pushes the problem to libraries.

Performance is mentioned so many times in the talk because that's the channel through which abstractions typically leak. One of the motivations behind the DOD paradigm I mentioned is achieving good cache locality, which (its proponents claim) can't be achieved with object-oriented design. In functional programming, where the main abstraction is function composition, you can find yourself in a situation where x -> f(x) is fast on average and x -> g(x) is fast on average but x -> f(g(x)) is very slow on average.
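
A contrived but concrete sketch of that trap (my own example, not one from the talk): let g be a fast sort, and let f be a naive quicksort that picks the first element as pivot. Each is fast on typical (random) inputs, but g's output is exactly f's worst case, so the composition f(g(x)) is always quadratic even though f and g are both fast in isolation.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// g: sorts its input; fast (std::sort is O(n log n) on any input).
std::vector<int> g(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Naive quicksort with first-element pivot: O(n log n) on random data,
// but O(n^2) on already-sorted data -- exactly what g produces.
void quicksort(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (lo >= hi) return;
    int pivot = v[lo];
    std::ptrdiff_t i = lo;
    for (std::ptrdiff_t j = lo + 1; j <= hi; ++j)
        if (v[j] < pivot) std::swap(v[++i], v[j]);
    std::swap(v[lo], v[i]);
    quicksort(v, lo, i - 1);
    quicksort(v, i + 1, hi);
}

// f: fast on average inputs, quadratic on sorted ones.
std::vector<int> f(std::vector<int> v) {
    if (!v.empty()) quicksort(v, 0, static_cast<std::ptrdiff_t>(v.size()) - 1);
    return v;
}

// f(g(x)) is still correct, but every call degrades to f's worst case:
// the performance characteristics leak through the composition even
// though the functional abstraction hides them completely.
```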

Joel Spolsky made the same point as Kiczales, but he didn't offer any solution to it. Kiczales proposes making abstractions into "white boxes" and allowing users to modify them according to their needs, in a principled way. Presumably this didn't work as expected, as the programming community turned to other paradigms. I don't have a solution myself, but I find the problem pervasive.

Also, I think the paper/talk is particularly interesting from a pedagogical perspective. From my experience, CS programs tend to overstate the value of abstraction and understate its pitfalls.


On Wednesday, October 24, 2018 at 10:47:25 AM UTC+3, cell-lang wrote:
If I had to choose one paper to complement Fogus', my vote would without a moment's hesitation go to the original map/reduce paper from Google. For me personally, it was nothing short of a revelation, on par with things like my first encounter with functional programming. That was the paper that made me realize that everything I thought about network programming was just plain wrong...

I'll be sure to check it out, though I'd probably need to learn a bit about distributed programming to appreciate the ideas in the paper.

cell-lang

Nov 4, 2018, 7:16:09 AM
to cell-lang
My remark about any mention of OOP putting me off was made half in jest, and wasn't supposed to be taken too seriously. I did realize that the proposals in the talk had nothing to do with OOP. As for Smalltalk, I know very little about it, but I don't see how it solves any of the problems of OOP, which if anything I suspect are exacerbated by the fact that it's OOP all the way down. The only Smalltalk implementation I know something about is Squeak: I think I can see (in part at least) the benefits that such a platform can offer compared to other very dynamic but more conventional languages like Ruby, but as far as I can tell those are the benefits that come from having a highly integrated and homogeneous platform, not from OOP, and I don't see any particular reason they could not be had with a different paradigm. I know nothing at all about CLOS, so I cannot comment there.

I didn't mean to deny the performance problems created by abstraction, or to suggest that we should ignore them. It's just that I'm somewhat skeptical about the specific solution Kiczales was proposing. It would be great if it could be made to work, and in fact I've tried to apply it to the design of Cell, but I've found it very difficult. Let me give you an example. I'll soon start working on the optimization of relational automata, and a big part of that is deciding what low-level data structures should be used to implement each specific relation. Here are some of the issues I'll have to deal with:

  A) Every attribute of an entity or relationship in Cell is typically stored in its own relation, and can be used in a search. The data structures that are used at the moment are reasonably efficient, but if an attribute is never searched on, it can be implemented even more efficiently using data structures very similar to those used in the Entity/Component architecture, which is a close relative of DOD.

  B) The optimal way to implement an optional attribute depends a lot on how often that attribute is present: an attribute that is present 90% of the time requires different data structures from one that is present only 10% of the time, if one wants to optimize memory consumption. That could make a noticeable difference in the presence of a large number of optional attributes.

  C) The attributes of an entity could be stored separately, each one in its own "column", or some or all of them could be merged together (at the physical level) to get something more similar to the "wide" relations/tables that are common in SQL. The optimal way to merge those attributes/columns depends of course on how they're used.

At first glance, it looks like there's an opportunity to put Kiczales's approach to good use here: the programmer could augment the declaration of a relational schema with declarative annotations that would guide the compiler through all those choices. But it's not at all clear to me that it would make much of a difference in practice, or even that it would be the best way to go about it. Case A), for example, can be decided easily by the compiler on its own: all it needs to do is scan the entire codebase to see if a given attribute is ever used in a search, and implement it accordingly. Case B) is a bit more complex, as optimal decision-making requires information that is not available at compile time, but one way to deal with it is to have the generated code keep track of how often the attribute is present, and use that information to transparently switch between different representations at runtime. Case C) is probably the most complex one, but again the compiler could try to guess the best way to merge those columns with an analysis of the source code. And for both B) and C) the ideal thing to do would probably be to profile the application at runtime and then have the compiler use that information to choose the optimal implementation. That's of course highly nontrivial, and I'm certainly not going to try it with Cell, but it's the kind of thing that has been done before.
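
The runtime-switching idea in case B) can be sketched roughly like this (everything here is hypothetical: the names, the threshold, and the specific choice of a hash map vs. a dense column): the container counts how often the attribute is present and migrates from a sparse representation to a dense one once the attribute becomes common, without the callers ever noticing.

```cpp
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical optional attribute keyed by a dense entity id.
// Sparse representation (hash map): cheap when the attribute is
// usually absent. Dense representation (one column slot per entity):
// cheap when it is usually present. The switch is transparent.
template <typename T>
class OptionalAttribute {
public:
    explicit OptionalAttribute(std::size_t num_entities)
        : dense_(false), universe_(num_entities) {}

    void set(std::size_t id, T value) {
        if (dense_) {
            if (!column_[id]) ++present_;
            column_[id] = std::move(value);
        } else {
            present_ += sparse_.count(id) ? 0 : 1;
            sparse_[id] = std::move(value);
        }
        maybe_switch();
    }

    std::optional<T> get(std::size_t id) const {
        if (dense_) return column_[id];
        auto it = sparse_.find(id);
        return it == sparse_.end() ? std::nullopt : std::optional<T>(it->second);
    }

private:
    // Migrate to the dense column once more than half the entities
    // carry the attribute (the threshold is arbitrary here).
    void maybe_switch() {
        if (!dense_ && universe_ > 0 && present_ * 2 > universe_) {
            column_.assign(universe_, std::nullopt);
            for (auto& [id, v] : sparse_) column_[id] = std::move(v);
            sparse_.clear();
            dense_ = true;
        }
    }

    bool dense_;
    std::size_t universe_;
    std::size_t present_ = 0;
    std::unordered_map<std::size_t, T> sparse_;
    std::vector<std::optional<T>> column_;
};
```

A real implementation would of course also need deletion, migration in the other direction, and hysteresis to avoid flapping between the two representations, but the point is that the decision never surfaces in the schema or in user code.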

One last thing that I would like to add is that abstraction and performance are not always incompatible, and sometimes performance problems can actually be solved with more abstraction. The relational model is one such example: the fact that it's so abstract actually gives a lot of leeway in its physical implementation, and I'm convinced it can provide at least some of the performance benefits of DOD but with a much higher-level programming style. Another example is declarative query languages, where the query engine can rewrite and optimize a query in ways that cannot be done with conventional programming languages.

As for map/reduce, you really don't need to be an expert in network programming to appreciate it. The core idea is (IMHO) absolutely brilliant, but it's also pretty simple. About 20 years ago I read, and was very impressed by, a paper on network programming written by a bunch of guys at Sun Microsystems. It was sort of a response to CORBA and other "distributed objects" programming models that were very much in fashion at the time, and were meant to "make the network transparent" by hiding the difference between local and remote objects. The paper pointed out that the network is NOT transparent, and went on to list and discuss a number of issues that make network programming different from local programming: the fact that a network has limited bandwidth and (more importantly) latency; that you need to deal with things like communication failures, partial failures and concurrency; and a few more that I can't remember (I repeatedly looked for that paper in recent years but was never able to find it; if anyone reads this and recognizes the paper I'm talking about, and has a link to it, please let me know).

All CORBA and similar models could do was provide a semi-uniform way to invoke methods on local and remote objects, but they could not hide the semantic differences between the two operations. So for a while I just regarded those network programming issues as unavoidable, and believed that the actor model, as implemented for example in Erlang, was the best thing we could hope for. But reading about map/reduce made me see that that belief was wrong, and that there actually was a way to make the network transparent (in some situations at least), just not one that I could have ever imagined on my own.
The core idea is that, instead of building a distributed application by explicitly implementing the software that runs on each network node and all the communication between them, which is very error-prone and time-consuming, you can structure your computation according to a specific abstract model that allows a compiler to automatically generate a complete and very robust network application while preserving its exact semantics. In the case of map/reduce, the computation has to be expressed, very loosely speaking, as the composition of a map and a reduce stage, but that's not the important bit: the same idea can be applied to many other types of computation, and even to stateful software. And that's exactly what I'd like to do with Cell, once I finally get around to working on its network architecture.
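
To make the shape of the model concrete, here's a toy, single-process stand-in for map/reduce (a word count, which is the example used in the paper itself; all names here are invented). The user writes only the map and reduce functions; the "framework", which in the real system runs across many machines and handles partitioning, shuffling and fault tolerance, is reduced here to a couple of loops:

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// map: one input record -> list of (key, value) pairs.
std::vector<std::pair<std::string, int>> map_fn(const std::string& line) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream words(line);
    std::string w;
    while (words >> w) out.emplace_back(w, 1);  // emit (word, 1) per occurrence
    return out;
}

// reduce: one key and all its values -> one aggregated value.
int reduce_fn(const std::vector<int>& values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

// The "framework": group the mapped pairs by key (the shuffle phase),
// then run reduce on each group. In a real deployment these loops are
// distributed across the cluster; the user-visible semantics of
// map_fn/reduce_fn are exactly the same either way.
std::map<std::string, int> word_count(const std::vector<std::string>& lines) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& line : lines)
        for (auto& [k, v] : map_fn(line)) groups[k].push_back(v);
    std::map<std::string, int> result;
    for (const auto& [k, vs] : groups) result[k] = reduce_fn(vs);
    return result;
}
```

The transparency comes from the fact that nothing in map_fn or reduce_fn knows, or needs to know, whether it is running on one machine or a thousand.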
