Function names as a language

Meir Goldenberg

Jan 12, 2026, 11:44:27 AM
to software-d...@googlegroups.com

Hi John,

Thank you for writing this important book.

One idea that particularly resonates with me is the role of functions not just as units of behavior, but as a way of creating a language for the code. For example, a call like

add_null(attribute);

can read more naturally and convey intent more clearly than something like

data.put(attribute, null);

—even though the function itself may be considered shallow.

I’d be very curious to hear your take on this perspective.

Thank you,
Meir

Justin Hill

Jan 12, 2026, 2:14:34 PM
to Meir Goldenberg, software-d...@googlegroups.com
I'm not John Ousterhout, but I'd like to add some ideas here.

As for the example...

> add_null(attribute);
>
> can read more naturally and convey intent more clearly than something like
>
> data.put(attribute, null);

I agree on the former reading more naturally, but would add that it's also vague and too open to interpretation. I would like to vehemently disagree that either example conveys any intent at all.

I would consider set_null(attribute) a less misleading word choice, but still devoid of intent. If you want to convey intent, then you need to actually specify the intent, with something like disable(attribute), reset(attribute), or forget(attribute).

A function like add_null(attribute) is the sort of thing that makes sense to write, but it hides details and leaves too many interpretations open to a reader. Without context, a reader can look at data.put(attribute, null) and, while it's clunky, understand what's going on. I don't even know the language, but it conveys setting an attribute of a key-value collection to null, and it also implies there are many attributes. The example of add_null(attribute) doesn't even convey the idea that the attribute is a scalar value, or that it is related to any sort of collection. If anything, add_null implies that a value is being added to a collection.
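
To make the naming point concrete, here is a minimal Python sketch. The AttributeStore class and all of its method names are hypothetical, invented for illustration; they are not taken from the book or from the code behind the original example.

```python
class AttributeStore:
    """Hypothetical key-value store of attributes (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, attribute, value):
        # Generic operation: states what happens, not why.
        self._data[attribute] = value

    def forget(self, attribute):
        # Intent-revealing name: the value is discarded,
        # while the key stays known to the store.
        self._data[attribute] = None

    def get(self, attribute):
        return self._data[attribute]


store = AttributeStore()
store.put("nickname", "Jay")
store.forget("nickname")
print(store.get("nickname"))  # None
```

Both put and forget end up storing None here; only the second name tells a reader why the caller wanted that.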

As for the broader idea of "creating a language" with small functions as a programming strategy...

There are two broad programming language groups, Lisps and Forth-likes, where the emergent act of programming is less about using a programming language in a standardized way, and more about using the language as a programming language toolkit. The style of programming is often to create a language specific to solving the problem at hand. "Programming a Problem-Oriented Language" from Chuck Moore, and the Forth books from Leo Brodie expand on this concept and argue for many small functions. I'd say Forth is more amenable to this style than Java, and the examples are much stronger than in something like Clean Code. I'd also argue that this is the approach taken by SICP, but just like Forth, it's more inherent to the Lisp programming model.

I'd say in my experience, creating many small and specific functions does help accomplish a task and deliver software. There is some overlap with DDD ideas, especially, if you can manage to keep the terms in your code very close to the terms used within the knowledge domain as a whole.

That said, while it may help accomplish an immediate goal, my experience is also that the custom-built language is far less amenable to change. Even when done right, the cost to understand the program from scratch becomes exponentially higher. The cost of entry is not just to understand the host language, but also the hosted language. A contributor (even an original author after some time) has to understand more terms and very complex interrelations. Language design is a skill, and most people making software today do not have experience in the domain. Beginner mistakes are to have inconsistent calling conventions, or to violate the principle of least surprise. Experience and discipline help. Intermediate mistakes in my experience involve time management and planning: "Oh yes, it's an immutable collection, although if you do X it will mutate Y. I needed to do Z and it was easier this way. I thought I'd get that fixed, but I do plan to update it in a future version..."

In any case, I would say I've seen projects follow this "creating a language" approach in numerous languages. It works better in languages like Lisps and Forth-likes where this is an established programming approach. It's much less helpful in verbose languages like Java or C#. It's almost impossible for me to follow in more abstract contexts like Haskell/Scala type systems, in Rust/Kotlin metaprogramming (macros/DSLs), or in more disparate contexts like collections of shell scripts etc.

"Creating a language" works much better on "passion projects" with small numbers of dedicated and determined authors. For projects in larger contexts of time and scope, especially with a rotating cast of contributors, I find it's much more useful to stick to idioms established in the language rather than trying to create your own idioms.

Anyway, thanks for reading!
Justin



Meir Goldenberg

Jan 12, 2026, 3:59:43 PM
to Justin Hill, software-d...@googlegroups.com

Hi Justin,
Thank you for the detailed reply — it helped clarify the risks much better than I could articulate.

I understand that languages like Lisp explicitly support language growth. But when working in a language like Python, do you ever treat “this introduces necessary vocabulary” as a deciding factor for defining a function?

Best,
Meir

Justin Hill

Jan 12, 2026, 4:40:10 PM
to Meir Goldenberg, software-d...@googlegroups.com
Thanks Meir!

> ...in a language like Python, do you ever treat “this introduces necessary vocabulary” as a deciding factor for defining a function?

Good question, I'm still learning myself. I'd say that there are multiple points in "The Zen of Python" that I'd read as discouraging a large vocabulary:
  • Explicit is better than implicit.
  • Special cases aren't special enough to break the rules.
  • There should be one-- and preferably only one --obvious way to do it.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
And only a few points that I could read as encouraging a large vocabulary:
  • Beautiful is better than ugly.
  • Readability counts.
...but I don't think adding to vocabulary is the only way to achieve beauty/readability. I'd conclude that it's not idiomatic in Python to create projects with large vocabularies. That said, there are a handful of examples of language extension that began as nonstandard additions to the language, types and datatables. That's not a huge list, though.

For me, the "necessary vocabulary" of a project will revolve around two things:
  1. The domain. This informs the nouns, the concrete types and the structs/classes/etc. With finance as an example, software will fare better using terms like "invoice" and "charge" etc in exactly the same way the business uses them. It is very common, in my experience, for engineers to take a fuzzy understanding of the problem being solved (usually arising from fuzzy definitions, often further obscured through a telephone game) and define their software in terms that are unique to the software. The business may use the word "Invoice" and the software may define the same thing in terms of a "Receipt" resulting in perpetual miscommunication and introducing an unending need for translation. I've seen the same happen in nearly every domain I have worked.
  2. The actions being performed. This informs the verbs, the abstract types/interfaces/traits and the functions/methods/etc. This is more tied to the inherent complexity of the problem being solved, and what abstractions will actually be useful to solve the problem. Does it make sense to have lots of mutable classes? Stateless workers? Monolith? Distributed services? Request/reply? Event-based? Streaming?
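
The first point, keeping nouns identical to the business's terms, could be sketched roughly as follows. The Invoice and Charge classes, their fields, and the sample amounts are assumptions made up for illustration, not part of any real system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Charge:
    """A single billed item, named the way the business names it."""
    description: str
    amount: Decimal


@dataclass(frozen=True)
class Invoice:
    """Called 'Invoice' because the business says 'invoice', not 'Receipt'."""
    number: str
    charges: tuple

    def total(self):
        # Start from Decimal("0") so the sum stays a Decimal.
        return sum((charge.amount for charge in self.charges), Decimal("0"))


invoice = Invoice(
    "INV-001",
    (Charge("Consulting", Decimal("1200.00")), Charge("Travel", Decimal("85.50"))),
)
print(invoice.total())  # 1285.50
```

With names like these, a sentence from the business ("the invoice total is the sum of its charges") maps onto the code without translation.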
For me, I think it's easier to motivate a valuable abstraction (like the "Narrow and Deep" concept from Ousterhout's book) than it is to motivate a valuable vocabulary. The former feels easier to catch when refactoring: when you see a repeated pattern of actions with only minor changes throughout a project (or across many projects), it's easy to recognize what a valuable abstraction will be.

I think without actually seeing the repetition, any attempt to make valuable abstractions is going to start as a guess at best. I think it's worth putting a lot of scrutiny on the semantics of a function (or API etc) that is referenced in many places, and less worth scrutiny for functions of limited use (a "helper function" only referenced in the file it's defined in, for example).

In other words, the fewer times a function is referenced, the less effort it takes to change it. When new call sites are introduced for a function, these are the best times to revisit the other uses of the function and its total utility. If you are always performing similar things before and after, does that mean the function doesn't do enough? Is it doing too much, so that its result always needs to be corrected afterward because the function does something surprising and unnecessary? Are there many comments explaining what the function does because the naming is unclear?
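
One of these questions, "does the function not do enough?", can be illustrated with a small sketch. The function names and the email-lookup scenario are hypothetical, chosen only to show callers repeating the same preparation step.

```python
def find_user(users, key):
    """Shallow version: expects every caller to normalize the key first."""
    return users.get(key)


def find_user_by_email(users, email):
    """Deeper version: owns the normalization each caller was repeating."""
    return users.get(email.strip().lower())


users = {"ada@example.com": "Ada"}

# Before: each call site repeats the same preparation.
first = find_user(users, " Ada@Example.com ".strip().lower())
second = find_user(users, "ADA@example.com".strip().lower())

# After: the repeated step lives in one place.
third = find_user_by_email(users, " Ada@Example.com ")

print(first, second, third)  # Ada Ada Ada
```

The repetition at the first two call sites is the kind of signal that only becomes visible once a function has several users.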

Just some food for thought. I'd be interested in more opinions on the subject. I think the amount of "necessary vocabulary" is also going to vary drastically by style and taste. I think the Lisp and Forth communities mentioned earlier in the thread favor styles and tastes for larger vocabularies. And as a full disclosure, I really, really love programming in those languages, especially Forth and other concatenative languages, but I think it's hard to argue that the "create your own language" approach makes a project anything but as challenging to work on as a project in a different programming language entirely.
Justin

John Ousterhout

Jan 12, 2026, 7:13:25 PM
to Justin Hill, Meir Goldenberg, software-d...@googlegroups.com
To me,

add_null(attribute)

is no more natural or obvious than

data.put(attribute, null)

Both are pretty obvious in terms of what they do. If anything, I think 'data.put' is slightly clearer because the term 'add_null' makes me wonder what it means to 'add' if the attribute already exists (will this make a second one? generate an error?).

Both are shallow in my book, with 'add_null' being even shallower than 'data.put'. So, I see disadvantages to 'add_null' but no advantages.

At a higher level, I don't know what is meant in this discussion by 'creating a language' or 'introducing vocabulary'. Doesn't every interface 'create a language' and 'introduce vocabulary'? Is 'introducing vocabulary' a good thing? For example, is a bigger vocabulary better than a smaller one?  I would answer 'not necessarily' to both of these questions, just as I would answer 'not necessarily' if you asked whether introducing new interfaces is a good thing. I suspect that this will come around to something like deep and shallow classes. If a new 'vocabulary' is easy to learn and allows me to express a large number of tasks cleanly and simply, then it's probably good. If the vocabulary is large and complex, with lots of special cases (such as 'add_null'), or if it doesn't make it easier for me to express lots of tasks, then it probably isn't good.
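
The ambiguity around 'add' can be sketched in Python, using a plain dict in place of the unspecified 'data' object. Both functions below are hypothetical readings of the same name; nothing in the thread says which one the original add_null was meant to be.

```python
def add_null_overwrite(data, attribute):
    # Reading 1: behaves like put(); any existing value is replaced.
    data[attribute] = None


def add_null_strict(data, attribute):
    # Reading 2: "add" means the attribute must not exist yet.
    if attribute in data:
        raise KeyError(f"{attribute!r} already exists")
    data[attribute] = None


data = {"color": "red"}
add_null_overwrite(data, "color")
print(data)  # {'color': None}

try:
    add_null_strict(data, "color")
except KeyError as error:
    print("strict reading rejects the call:", error)
```

A caller who sees only the name add_null cannot tell which of these two behaviors to expect.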

-John-

Justin Hill

Jan 12, 2026, 9:33:28 PM
to John Ousterhout, Meir Goldenberg, software-d...@googlegroups.com
Appreciate your input!

I was reading "creating a language" and a "large vocabulary" as favoring a large quantity of shallow functions.

I'd also recommend checking out the conversation with Uncle Bob, specifically the section on "Method Length" https://github.com/johnousterhout/aposd-vs-clean-code

Meir Goldenberg

Jan 13, 2026, 11:31:21 AM
to John Ousterhout, Justin Hill, software-d...@googlegroups.com

Thank you for the follow-up. Let me give another example to clarify what I mean.

Consider computing the area of a triangle with side lengths stored in variables a, b, and c using Heron’s formula. That formula is expressed in terms of the half-perimeter of the triangle.

Reading triangle_perimeter(a, b, c) / 2 requires less mental effort than reading (a + b + c) / 2, because the former explicitly states the geometric meaning of that sum—it introduces vocabulary specific to the application domain.

Justin Hill

Jan 13, 2026, 12:20:00 PM
to Meir Goldenberg, John Ousterhout, software-d...@googlegroups.com
With that exact example, I cannot imagine a context where it would be worth defining a function taking exactly 3 numbers as arguments and returning the sum. If it's truly difficult in the context for a reader to understand why three numbers should be added, a comment (or extra variable) can also convey the context needed.

I think a function for Heron's formula makes more sense as a reasonable abstraction:

import math

def triangle_area_from_sides(a, b, c):
    """Given three sides of a triangle, applies Heron's formula to return the area."""
    perimeter = a + b + c
    s = perimeter / 2  # s = semiperimeter

    return math.sqrt(s * (s - a) * (s - b) * (s - c))
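
As a quick check of the function above, here is a small usage sketch. The function is repeated in condensed form so the snippet runs on its own; the sample side lengths are illustrative assumptions.

```python
import math


def triangle_area_from_sides(a, b, c):
    """Given three sides of a triangle, applies Heron's formula to return the area."""
    s = (a + b + c) / 2  # s = semiperimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


print(triangle_area_from_sides(3, 4, 5))  # 6.0

# Lengths that cannot form a triangle make the product negative,
# so math.sqrt raises ValueError instead of returning an area.
try:
    triangle_area_from_sides(1, 2, 10)
except ValueError as error:
    print("not a triangle:", error)
```

The function validates nothing itself; it only fails on impossible inputs as a side effect of the square root.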

Meir Goldenberg

Jan 13, 2026, 1:56:14 PM
to Justin Hill, John Ousterhout, software-d...@googlegroups.com

Hi Justin,

Thank you very much for the insight.

To clarify your position, suppose I have a Triangle class with accessors (e.g., @property in Python) for its three sides—setting aside the question of whether exposing them directly is a good design choice.

In that context, would you still see no reason to define a perimeter method?

Thank you,
Meir

Justin Hill

Jan 13, 2026, 2:03:55 PM
to Meir Goldenberg, John Ousterhout, software-d...@googlegroups.com
If you're asking whether a Triangle class could reasonably expose a perimeter(self) function or perimeter property, of course.

If you're asking whether a Triangle class could reasonably expose a static function (i.e. no self/this) like the perimeter(a, b, c) described before, that seems silly to me.

Justin Hill

Jan 13, 2026, 4:01:12 PM
to Meir Goldenberg, John Ousterhout, software-d...@googlegroups.com
I'm making the assumption that, in the example that already has a Triangle class, the triangle is already a justified abstraction in whatever context it's being used. If a triangle class is justified, I don't see an issue with exposing common properties of the abstraction, and a perimeter is a common property. Whether or not it's worth implementing would have to depend on the context.

Also I think I would disagree that "attaching the name to the object ... makes it worthwhile." I don't think this is about the value of names beyond a perimeter being a known concept in the domain of geometry. 

For me the difference in value is that if there is already a triangle abstraction, perimeter(triangle) (equivalent expressive power to OO syntax of triangle.perimeter()) is a superior calling convention to perimeter(a, b, c) and allows for a much more "narrow and deep" interface.

The implementation could be a reference to a static field that triangle saves, the addition of 3 side lengths, it could be calculated from 3 points in space, it could be calculated from two sides and an angle... Or it could be inferred from all of these depending on the known properties.
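
One of those possibilities, a perimeter calculated from three points in space, might look like the sketch below. This Triangle class is hypothetical and stores 2D vertices; it is one assumed representation among the several listed above.

```python
import math


class Triangle:
    """Hypothetical triangle stored as three (x, y) vertices."""

    def __init__(self, p, q, r):
        self._vertices = (p, q, r)

    def perimeter(self):
        # Side lengths are derived on demand; callers never see them.
        p, q, r = self._vertices
        return math.dist(p, q) + math.dist(q, r) + math.dist(r, p)


triangle = Triangle((0, 0), (3, 0), (0, 4))
print(triangle.perimeter())  # 12.0
```

A caller writes triangle.perimeter() the same way whether the class stores vertices, side lengths, or a cached value.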

I'm assuming a Triangle class may be more sophisticated than a struct containing three side lengths. It's hard for me to imagine a context where a primary concern of triangles would be side lengths... Maybe something like software for quilting patterns? 





On Tue, Jan 13, 2026, 12:39 Meir Goldenberg <mgol...@gmail.com> wrote:

Hi Justin,

So, the free function triangle_perimeter(a, b, c) does not add clarity beyond  perimeter = a + b + c. But when the same computation is exposed as triangle.perimeter(), it may add value beyond perimeter = t.a + t.b + t.c. What is it about attaching the name to the object that makes it worthwhile?

Thank you,
Meir

Meir Goldenberg

Jan 13, 2026, 4:20:15 PM
to Justin Hill, John Ousterhout, software-d...@googlegroups.com

Hi Justin,

So, the free function triangle_perimeter(a, b, c) does not add clarity beyond  perimeter = a + b + c. But when the same computation is exposed as triangle.perimeter(), it may add value beyond perimeter = t.a + t.b + t.c. What is it about attaching the name to the object that makes it worthwhile?

Thank you,
Meir

Dan Cross

Jan 13, 2026, 7:38:52 PM
to Meir Goldenberg, Justin Hill, John Ousterhout, software-d...@googlegroups.com
On Tue, Jan 13, 2026 at 4:20 PM Meir Goldenberg <mgol...@gmail.com> wrote:
> Hi Justin,

I'm going to take a stab at this.

> So, the free function triangle_perimeter(a, b, c) does not add clarity beyond perimeter = a + b + c.

No, it emphatically does not, assuming there's some context here that
makes it obvious that a, b, and c each represent the length of a side
of a triangle.

The signature, `triangle_perimeter(a, b, c)`, forces the programmer to
know superfluous details about the lengths of the triangle's sides.
Wrapping a named function around trivial grade-school arithmetic in
_this_ manner doesn't buy you anything: it's not useful because it
doesn't tell anyone anything they don't already know. This name
doesn't usefully raise the level of abstraction.

It's actively harmful because it comes with the substantial cost of
defining something concrete and named (a function or whatever) in a
way that divorces it from the context. The level of abstraction is
lowered by tying it to the details of side lengths, but as defined it
appears easy to misuse: I don't see anything preventing me from
passing any three quantities a, b, and c that I want to that function,
as nothing enforces that they are sides of triangles, or even if they
are, that they are unique sides of the same triangle.

Similarly, _by itself_ just having access to the lengths of the sides
isn't terribly useful information. A description consisting only of
those lengths doesn't uniquely identify a triangle, but rather
describes an infinite number of congruent triangles, as they can be
arbitrarily translated, rotated, or reflected.

So as an interface this is simultaneously less flexible while being
overly permissive.
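
The "overly permissive" half of that claim can be sketched in Python. The Triangle class below is hypothetical and stores side lengths purely to keep the example short; its validation rule is the triangle inequality.

```python
def triangle_perimeter(a, b, c):
    # Accepts any three numbers; nothing ties them to a real triangle.
    return a + b + c


class Triangle:
    """Hypothetical class that validates its side lengths once, on creation."""

    def __init__(self, a, b, c):
        longest = max(a, b, c)
        # The longest side must be shorter than the other two combined.
        if min(a, b, c) <= 0 or longest >= a + b + c - longest:
            raise ValueError("side lengths do not form a triangle")
        self._sides = (a, b, c)

    def perimeter(self):
        return sum(self._sides)


print(triangle_perimeter(1, 2, 100))  # 103, although no such triangle exists

try:
    Triangle(1, 2, 100)
except ValueError as error:
    print(error)  # side lengths do not form a triangle
```

The free function has no place to enforce the constraint, while the class checks it once and every later call can rely on it.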

> But when the same computation is exposed as triangle.perimeter(), it may add value beyond perimeter = t.a + t.b + t.c.

The consumer of `triangle.perimeter()` doesn't have to know the
details of how the triangle is represented and doesn't have to care.
They don't have to care that there are three quantities, what those
quantities are, their order, or anything else.

That said, it's true that one needs access to the side lengths to
implement Heron's formula, in which case it may be clearer to simply
sum them oneself. But that wasn't the question: the question was
whether calling something called `triangle_perimeter(a, b, c)` was
better than `a + b + c`, and in context, it's hard to see any argument
for how it might be. For that matter, whether one uses Heron's formula
to compute the area also seems like a pretty irrelevant detail: a
better abstraction would be to expose an `area` method and implement
that however one sees fit. Critically, the caller doesn't have to
care.
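
A minimal sketch of that idea, assuming a hypothetical Triangle stored as 2D vertices: two classes expose the same `area` method, one using the shoelace formula and one using Heron's formula, and the calling code is identical for both.

```python
import math


class Triangle:
    """Hypothetical triangle stored as three (x, y) vertices."""

    def __init__(self, p, q, r):
        self._vertices = (p, q, r)

    def area(self):
        # Shoelace formula on the vertices; no side lengths involved.
        (x1, y1), (x2, y2), (x3, y3) = self._vertices
        return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2


class HeronTriangle(Triangle):
    def area(self):
        # Same interface, different algorithm: Heron's formula.
        p, q, r = self._vertices
        a, b, c = math.dist(p, q), math.dist(q, r), math.dist(r, p)
        s = (a + b + c) / 2
        return math.sqrt(s * (s - a) * (s - b) * (s - c))


corners = ((0, 0), (3, 0), (0, 4))
for shape in (Triangle(*corners), HeronTriangle(*corners)):
    print(shape.area())  # 6.0 both times
```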

> What is it about attaching the name to the object that makes it worthwhile?

It's not about attaching the name to the object, it's about separating
the abstraction of an interface, as consumed by some caller, from the
concrete details of a type's representation and the implementation of
algorithms that operate on that type and its representation.

I hate to anthropomorphize it, but the "Triangle" class already
"knows" what it is and how it is represented; the interface
`triangle_perimeter(a, b, c)` forces the programmer to contend with
that when they wouldn't, and indeed shouldn't, otherwise have to.

`triangle.perimeter()` is a pleasant syntactic interface that hides
that, but isn't strictly necessary: C programmers would be used to
seeing something like `triangle_perimeter(&triangle)` or something
similar that would do largely the same thing. Similarly, they wouldn't
be surprised by something like, `triangle_area(&triangle)`. The thing
that they might, and should, object to would be something similarly
named that takes the three side-length arguments: particularly if they
have to extract those from the object instance anyway.

- Dan C.

Meir Goldenberg

Jan 14, 2026, 11:25:27 AM
to Dan Cross, Justin Hill, John Ousterhout, software-d...@googlegroups.com

Thank you all.

Would defining a free function like triangle_perimeter(a, b, c) also be considered a violation of the principle from Chapter 7, “Different Layer, Different Abstraction”?

Thank you,
Meir

Jonathan Camenisch

Jan 14, 2026, 10:14:08 PM
to Dan Cross, Meir Goldenberg, Justin Hill, John Ousterhout, software-d...@googlegroups.com
I would like to interject an observation on the original inspiration of this thread:


> One idea that particularly resonates with me is the role of functions not just as units of behavior, but as a way of creating a language for the code.

Many good points have been raised here about the disadvantages of adding too many functions or the wrong functions. I think it's notable that all of these points could be viewed as arguments in favor of crafting a better language as opposed to a worse language. That is, we could affirm that introducing functions does have the effect of shaping a language for the codebase, or a way of forming and communicating ideas. However, all else being equal, creating more vocabulary does not create a better language. Rather, much as with deep modules, the best language would be the one where each term provides maximum value, both in itself and in how it works with other terms.

That is a simplistic way of describing it, but I hope I haven't confused the point. I think the "linguistic" impact of the way you design your functions is pretty profound, and the material in this thread reinforces that.


