C++ Syntax Possibilities

Bill Clare

Nov 16, 2016, 2:00:59 PM

to SG8 - Concepts

Introduction

This note suggests some generalization and simplification of C++ language syntax, largely based on extensions of what is currently in “Concepts Lite”. There is some general discussion of reasons for what exists, and of future evolution. Since it is a rather long note for a forum entry, skipping to the Examples section might provide a quick overview of some of the resulting possibilities.

A Word file with the same contents is also attached.

Objectives

Previous notes posted here have made suggestions for simplifying and expanding approaches to C++ templates and Concepts. This note is designed to expand on that and to suggest a start at a significant simplification of code constructs and syntax.

As additional capabilities have been added to the language, C++ syntax constructs have become increasingly distracting relative to basic code functionality. Some emphasis on simplification is already happening, such as not having to specify the same type repeatedly in template parameters, in requires clauses, and in function parameters. However, the complexities seem to be growing at a greater rate.

The attempt here is to define syntax that reads naturally in expressing the programmer’s abstractions and intent, rather than implementation characteristics. A measure of this is code that “does what it says” and “says what it does”, which is hard enough to achieve without also having to encode, in the syntax, instructions that tell the compiler what the code is trying to do. The code does need to speak to the compiler as well as to the human reader, but, if it is well crafted, doing the latter should suffice for the former. Ultimately, the focus needs to be on the abstractions the programmer needs to convey the essence of a computation or of a data type. Often, this abstraction can also lead to increased generality.

This note discusses an incomplete series of topics, and there are at best incomplete arguments for any particular topic. The hope, however, is that taken together, this might suggest fundamental approaches that significantly increase the abstraction level of the language.

Precision and Lack of Clarity

A Latin teacher once described a good translation as being both accurate and idiomatic. The analogy to programming languages is a need for both precision and clarity. Ultimately syntax needs to be precise, both for the compiler and the language expert. Equally important though, it requires clarity to convey the intent of the programmer, as well as the demands of syntax.

Consider const, volatile, reference (&) and rvalue reference (&&). It is easy to add qualifiers such as const and reference to a type specification. For removing these qualifiers, a variety of syntactic circumlocutions have been invented, some of which remove the qualification and others of which do not. While there is value in supporting capabilities through library facilities rather than language change, simple use of a negation symbol could lead to code that is simpler and intuitively expressive; for instance: (type ~const ~&).

Often focus on language processing precision leads to confusing and arcane constructs that suffer in programmer clarity such as:

requires requires {typename …type }; . . .

concept bool {return requires . . . };

Here the ellipses indicate areas of code where the programmer’s construct is buried in a cascade of syntax constructs. This lack of simplicity and clarity in syntax has undoubtedly raised some suspicions about the underlying consistency for Concepts, and this hesitancy may account for much of the reluctance to approve Concepts for C++.

More difficult are library facilities for interacting with compiler constructs for references and value categories (rvalue, glvalue, prvalue, . . .) and “reference collapsing” rules, through an amazing set of “std::xxx” constructs. It is an interesting question whether many experts could look at a set of “decltype” declarations and correctly parse their meaning. More direct approaches to capturing programmer intent here are discussed under the topic of function parameters.
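A small illustration of the decltype point, assuming C++11 or later (the variable and function names are hypothetical): the same variable yields different types depending on whether it is written as a bare name or as a parenthesized expression, and calls yield different reference types depending on value category.

```cpp
#include <type_traits>
#include <utility>

int sample = 0;
int&& make_xvalue();  // declared only; used solely inside decltype

// Unparenthesized name: the declared type of the variable.
static_assert(std::is_same_v<decltype(sample), int>);
// Parenthesized: an lvalue expression, so decltype yields an lvalue reference.
static_assert(std::is_same_v<decltype((sample)), int&>);
// A call returning an rvalue reference is an xvalue: decltype yields T&&.
static_assert(std::is_same_v<decltype(make_xvalue()), int&&>);
// std::move(sample) is likewise an xvalue.
static_assert(std::is_same_v<decltype(std::move(sample)), int&&>);
// A prvalue such as sample + 1 yields the plain type.
static_assert(std::is_same_v<decltype(sample + 1), int>);
```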

Compatibility

Of course, compatibility is a dominant issue. Other approaches, such as Java, D and C#, are spin-offs that have sought simplification in critical areas, but have lost much of the basic underlying power of C++. An alternative is new syntax that leaves the base capabilities in place, and even suggests enhancements.

However, forcing new features into old syntax can lead to awkward constructs that often disguise the programmer’s intent. Constructs for references to references, with internal collapsing rules to permit move semantics and perfect forwarding, are an example of such obfuscation.
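The collapsing rules can be observed directly with an alias template (a sketch; add_rref and binds_as_lvalue are hypothetical names): applying && to an lvalue reference yields an lvalue reference, which is what lets a forwarding parameter T&& accept both lvalues and rvalues.

```cpp
#include <cassert>
#include <type_traits>

// Applying && through an alias can form a "reference to reference",
// which the language then collapses.
template <typename T>
using add_rref = T&&;

static_assert(std::is_same_v<add_rref<int>, int&&>);
static_assert(std::is_same_v<add_rref<int&>, int&>);    // & then && collapses to &
static_assert(std::is_same_v<add_rref<int&&>, int&&>);  // && then && stays &&

// For an lvalue argument T is deduced as int&, so the parameter T&&
// collapses to int&; for an rvalue argument T is deduced as int.
template <typename T>
bool binds_as_lvalue(T&&) {
    return std::is_lvalue_reference_v<T>;
}
```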

It is useful to consider compatibility of language elements, constructs, abstractions and performance separately from compatibility of concrete syntax. New syntax can often be achieved solely with extensions and additions. However, revisions at the level of syntax parsing need not necessarily exclude the possibility for alternative compiler syntax processing signaled by pragmas. Such pragmas are already used by tools, for such purposes as debugging assertions and enforcing local consistency for coding styles and standards. Not systematizing these with language facilities adds considerable code complexity along with non-standard compiler options and directives, and often encourages preprocessing macros.

As a fairly simple example of syntax extension, the compiler could be made aware of line breaks and indentation to simplify punctuation and emphasize clarity. “Modernization” of C++ syntax could start with removal of trigraph support.

Such considerations are at the boundaries of compiler, preprocessor and tool support. Standardization of a minimal canonical set of controls can increase current portability, consistency and clarity.

The stability issue, then, is compatibility for interoperability of new and existing code that may be expressed with somewhat different syntax. That is, compatibility is based on what the language says, rather than necessarily on how it says it.

Terminology

Terminology used for syntax can be obtuse and even misleading. “Static” is an interesting example (see below).

Sometimes terms are introduced that have technical meaning for language processing or are based on, at best, an analogy with mathematical concepts that seem somewhat obscure. Mathematical rigor is a lofty, but unachievable and misleading goal for software. Mathematical underpinnings can be useful but seem best left to technical documentation, rather than common use. For instance, axioms are useful for theorem proving, but the implied notion is misleading when they are only definitions of type properties. To be really useful they need to be incorporated in a test facility.

A Google query for “regular types” produces an assortment of unrelated concepts. “Structured” seems more user-friendly than “semi-regular” to indicate operations for allocation, deallocation, copy, move, assign, and swap. Similarly, “entity” could indicate basic classes that are “structured” and that also have comparison and ordering operations defined.

Other approaches, such as reuse of existing keywords, lead to ambiguity and confusion.

Syntax Rules

The complexity to which the C++ standard has evolved is breathtaking. Undoubtedly much of this is unavoidable; it comes with the territory and the need to be precise. Perhaps, though, there are significant strategies for simplification.

One factor that seems to drive complexity to a significant extent is a desire for the syntax to be complete, in the sense that the compiler tries to decide the meaning of constructs under a wide variety of circumstances. This can lead to the design of elaborate interpretations that assume that “the compiler knows best”. Under particular conditions, the compiler-driven interpretations may be unintuitive, and even ambiguous, relative to the programmer’s intent. The alternative for rule simplification is not to guess at intent: rather than defining exceptions to exceptions, simply declare an error and require the programmer to provide clarification.

An immediate example, perhaps, is the seven situations used for overload resolution. It can make sense for the compiler to apply reasonable conversions to meet parameter declarations. However, when this produces multiple candidates, and there is no easily remembered and intuitive rule for selection, the compiler should not try for a “best” match.
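A frequently cited case of an unintuitive "best" match, sketched with a hypothetical describe() overload pair: a string literal reaches bool through standard conversions (array-to-pointer, then boolean conversion), and any standard conversion sequence outranks the user-defined conversion needed to construct std::string.

```cpp
#include <cassert>
#include <string>

// Two overloads a reader might expect to be selected "by meaning".
std::string describe(bool) { return "bool"; }
std::string describe(const std::string&) { return "string"; }

// Selects describe(bool): the literal decays to const char*, which converts
// to bool by a standard conversion, beating the user-defined conversion.
std::string pick_for_literal() { return describe("hello"); }
```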

Type Extensions

Previous notes have suggested that template type parameters are essentially “Abstract Types”. That is, a notion of incomplete types that abstract from details of size, ordering and layout associated with struct-based declarations. Abstract types can be extended through instantiation, much as base types are extended by derived classes. Such extensions can be bidirectional, as in role-actor constructs.

In this way, declarations of the Abstract Types with all of the constructs for template definition can allow template code to appear, to be written and to be compiled essentially the same as any other code. Other notes discuss binding of compiled templates to actual concrete types.

Variants provide a complementary notion of type extension. Rather than providing common structure that can adapt to different functionality, they support common functions that can adapt to different structure.
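In current C++, std::variant with std::visit is perhaps the nearest standard counterpart: one function is written once and adapts to whichever alternative the value holds. The shape types and area() below are hypothetical:

```cpp
#include <cassert>
#include <variant>

constexpr double kPi = 3.141592653589793;

struct Circle { double r; };
struct Square { double side; };

// One value type that may hold either structure.
using Shape = std::variant<Circle, Square>;

// One overload per structure; std::visit dispatches on the held alternative.
struct AreaOf {
    double operator()(const Circle& c) const { return kPi * c.r * c.r; }
    double operator()(const Square& s) const { return s.side * s.side; }
};

double area(const Shape& s) { return std::visit(AreaOf{}, s); }
```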

Aspects

As a basis for management of software component dependencies, a class declaration can specify different interfaces, or Aspects, for different users. This is essentially a generalization of the “friend” concept.

Aspects and Abstract Types both form partial views of a type. Aspects are parts of an abstract or concrete type. Abstract Types are parts that still need to be mapped to a concrete type. From this perspective, Abstract Types are essentially an Aspect of a type.

To avoid overspecification of an interface, a user can require some subset of Aspects, while still allowing for use of optional constructs, where available. For instance, a complete set of member and non-member functions for comparison can be constructed from generic “<” and “==” operators, while still allowing for direct use of other constructs if available.
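A minimal sketch of deriving the remaining comparisons from < and ==, assuming a hypothetical CRTP helper named ComparableFromLessAndEqual (not a standard facility; C++20's defaulted operator<=> is a related standard route to generated ordering operators):

```cpp
#include <cassert>

// Hypothetical helper: a type that supplies < and == gets >, <=, >= for free.
// (In C++20, a != b is already rewritten from == by the compiler.)
template <typename Derived>
struct ComparableFromLessAndEqual {
    friend bool operator>(const Derived& a, const Derived& b) { return b < a; }
    friend bool operator<=(const Derived& a, const Derived& b) { return !(b < a); }
    friend bool operator>=(const Derived& a, const Derived& b) { return !(a < b); }
};

// Hypothetical type supplying only the two basic operators.
struct Version : ComparableFromLessAndEqual<Version> {
    int major_part;
    int minor_part;
    Version(int ma, int mi) : major_part(ma), minor_part(mi) {}
    bool operator<(const Version& o) const {
        return major_part < o.major_part ||
               (major_part == o.major_part && minor_part < o.minor_part);
    }
    bool operator==(const Version& o) const {
        return major_part == o.major_part && minor_part == o.minor_part;
    }
};
```

This covers only the required part. If Version also declared its own operator>, the call could become ambiguous, so the "optional construct, where available" part would need an explicit preference rule.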

Alternatives can also be used to avoid underspecification of a Capability by allowing variant approaches. For instance, an iteration Capability could be implemented with a “for” loop that uses end, last, count or conditional member and non-member termination indicators, or with list recursion, or with either, depending on the Capabilities supplied.

Capabilities

Concepts are a specification mechanism for template type declarations. Concept syntax specifications are proposed for generic type parameters and for their interactions. In this sense, they are fully equivalent to, and indeed must match, the declaration syntax for the supplied types.

Concepts, then, are basically an interface specification. This suggests three observations. First, the same declarations are applicable to, and could be used similarly for, both the generic declarations and the supplied type declarations. Second, such declarations are significant whether or not they are used for templates. Third, where the declarations are equivalent but do not match directly, mappings can be provided.

This suggests a more general notion of “Capabilities” as a fundamental construct for specifying both what is required and what is supplied by an interface. A user specifies the Capability declarations for what it needs, and the implementation specifies them for what it supplies. The Capability declaration, itself, can specify mappings for alternative constructs for its use or support.

This supports the fundamental abstraction of a software system consisting of Capabilities built from other Capabilities.

Capabilities extend type abstractions. Type abstractions encapsulate representation and method for an interface based on syntax declarations. Capabilities capture basic invariants that are also independent of representation and method, and they generalize this to independence of syntax expression and data source.

Also, Capabilities extend the notion of a class, which abstracts operations on a single data object, to an abstraction of operations on the interactions among multiple objects. These interactions can be at a common level, as in transformations, or nested, as with STL containers.

Capability syntax is inherently an extended Abstract Type declaration. Capability declarations consist of a parameterized set of type specifications along with value expressions that use these types, and, where required, they include mappings for one form of type or value expression to another. Proposals for a Concepts specification provide a base for such an approach.

Since Capabilities are specifications at the level of interfaces, they can also provide a base for systematically inserting code for metrics, reporting, logging, error handling, etc.
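As a rough illustration of inserting such code at an interface boundary, here is a hypothetical instrumented() wrapper that records entry and exit around any callable via std::invoke. The logging scheme is an assumption for illustration, not part of the proposal.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sink for interface-level records.
struct CallLog {
    std::vector<std::string> entries;
};

// Records "enter"/"leave" around the call without touching the callee.
// Non-void results are returned by value.
template <typename F, typename... Args>
auto instrumented(CallLog& log, const std::string& name, F&& f, Args&&... args) {
    log.entries.push_back("enter " + name);
    if constexpr (std::is_void_v<std::invoke_result_t<F, Args...>>) {
        std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
        log.entries.push_back("leave " + name);
    } else {
        auto result = std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
        log.entries.push_back("leave " + name);
        return result;
    }
}
```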

Mappings

Often there is a direct correspondence between the constructs needed by a Capability and those supplied for it. When the Capabilities are equivalent, but specified differently, a Capability interface declaration can provide mappings for data, for operations and for types.

Data mappings can be for location, organization, representation and format.

Operation mappings include basic computational constructs. The STL provides examples for iteration (begin, next, end), access (read, write), structure (copy, move, assignment, swap) and comparison (ordering, equivalence, equality). Invoke, callable, and parameter packs provide models for functions and their parameters. Considerable extension of callable is possible, such as providing “invoke” specifications for remote procedure calls, queued calls, messages and events, and exceptions.
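std::invoke (C++17) already gives one uniform call form over several kinds of callable, including pointers to member functions and to data members. A minimal sketch with a hypothetical Account type:

```cpp
#include <cassert>
#include <functional>
#include <string>

struct Account {  // hypothetical type
    std::string owner;
    int balance = 0;
    int doubled() const { return 2 * balance; }
};

int add_one(int v) { return v + 1; }

// One call form, four kinds of callable.
int via_function(int v) { return std::invoke(add_one, v); }
int via_lambda(int v) { return std::invoke([](int w) { return w * 10; }, v); }
int via_member_function(const Account& a) { return std::invoke(&Account::doubled, a); }
int via_data_member(const Account& a) { return std::invoke(&Account::balance, a); }
```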

The capabilities above can be represented with an abstract syntax for value expressions. This can be extended to type expressions that ensure coherent correspondence among related types and type constructs. These could include capabilities for generation, modification and test (reference classification, qualifiers, attributes, derivation, inclusion, traits, . . . ), compare, transform (cast), etc.

Parts

Systematic extension of such capabilities could form the basis for fundamental components necessary to drive some domain specific languages or even a small general purpose application development language.

Additional Capabilities can be developed for logic (and, or, xor, not), arithmetic (plus, minus, multiply, divide, remainder), bit fields (get, set), references (pointer, index, map, path), decisions (condition, tree), database (read, update, insert, remove, find), sequence (visit, accumulate, merge, sort, . . .), transactions (do, undo, prepare, commit, backout), control (start, init, re-init, run, pause, stop), state machines (state, event, transition), traversal (route, visit), use case (objective, step), etc. More basic capabilities support such capabilities as readable, assignable, call, parameter, referenced, related, etc. These capability specifications can provide a mechanism for alternative expressions, as well as hiding the complexities of C++ type syntax.

The ultimate objective however, is general support for parts driven software development and construction. This has been a long sought after goal for many, and often there have been extravagant claims for success. Existing technology has indeed produced a considerable collection of parts. The focus here, however, is not on parts themselves, or on discovering the holy grail of a fundamental and universal set of parts. Rather, the key focus is on the mechanisms for connecting parts.

Abstract Syntax

An abstract syntax represents the semantics of a syntax construct and the components or terms used to denote those semantics. As with other abstractions there is often generality here that can lead to increased functionality.

A concrete syntax represents a particular representation of the components, usually as some set of keywords, brackets and special symbols. Alternatively, it can use diagrams and tables and other presentation forms. Capabilities also represent a way of representing the semantics of a construct independently of any particular syntax. In addition they extend this with support for alternative syntax components and for alternative data.

An abstract syntax supports interoperability among different syntactical representations. Capabilities support interoperability for different syntax, syntax components and data sources.

Name Use Differentiation

Clarity in the use of names for identification of different categories of identifiers is critical in any syntax. This is particularly important for generic code. For instance, elaborate syntax can be used with brackets and keywords to distinguish type and object names and their scopes. For clarity in code reading, naming conventions, such as leading capitals, trailing “type” suffixes or plural “s” suffixes, are often used to distinguish type names from object names. Far simpler, and more consistent, would be a distinguishing naming convention defined by the language itself. This would provide language consistency and clarity. More significantly, it could, in fact, greatly simplify the syntax.

An extreme approach to this can be found in what is known as Hungarian Notation, a rather elaborate approach to conveying type information. More appropriate, however, is to focus on the differing uses of names for the primary abstractions of object, type and abstract type. For instance, references to types, classes, structs, and unions might be identified with a prefix of “#”. Abstract Types represent a higher level of abstraction and could be declared with a prefix such as “##” or “!”. The prefix can be used when the identifier is introduced to simplify syntax. It is probably not as important for subsequent use of the identifier, but its consistent use can enhance clarity in many contexts.

Example: template<typename T> concept bool C = true;

becomes: ##C

Also, since they are fundamentally very different things, it is useful to distinguish assignment of type expressions (←) from object expressions (=), as well as abstract type and namespace components (::) from object components (.).

With this, most uses of “typedef”, “typename”, “using”, “rebind”, and even “template” and angle brackets can be eliminated. Consider the following, which just sputters with “type” syllables:

typedef typename BaseType::value_type value_type;

An alternative might be a type expression such as:

#value ← ##Base::#value;

Auto

Auto is a useful placeholder for a type that can be deduced. This, however, loses information since the actual resulting type name is hidden. An alternative is “#type” to capture the name of the type.

For example:

#length length = vector.size( ) ;

For consistency, the basic auto keyword placeholder could be replaced with a simple “#”, and thus eliminate confusion with allocation concepts.

Examples

A basic example can be taken from STL proposals:

      template<typename T>
            concept bool Readable() {
                  return requires (T i) {
                        typename Value_type<T>;
                        {*i} -> const Value_type<T>&;
                  };
            }

This defines a property of iterators. The more general concept, also named Readable in the examples below, captures the general Capability to extract a value from an expression, not only from a dereferenced iterator.

The example assumes that the base language defines operations for the use of readable and writeable values (such as assignable). The Capability declarations below provide a means for accessing values through diverse syntax constructs. For instance, a template might access a data value stored in a container, or a value generated by a particular supplier through a function. The using code doesn’t care which.

Examples: requirements in square brackets use &&, ||, ? and ~ for “and”, “or”, “optional” and “not” expressions.

Assignable { ##Writeable w && ##Readable r; w = r; };

Readable {

// Value

(##Value v) { [ (v ?const ?&) || (v() ?const ?&) || (v.get() ?const ?&) ] };

// Access

(##Value v, #Readable r) { r → v };

// Pointer
(##Pointer p , ##Pointer::#Readable r ) {* ?const p → r};

// Function

(##Function f, ##Argument a, #Readable r) { f(a) → r };

// Member

(const ##Base b, #Member m, const #Readable r)
{ [ (b.m → r) || (b.m( ) → r) || (b.m.get( ) → r) ] };

// Component

(##C ← (##Map||##Vector||##array) c, ##Index i, #Readable r) { c[i] → r };

. . .

// A common Get function, similar to “invoke” for “callable” could be used.

Get (##Pointer p, # Readable r)

{ *p → r };

Get (##Function f , ##Parameter a, # Readable r)

{f(a) → r};

Get (##Base b, #Member m, #Readable r)

{ (b.m → r ?&) || (b.m( ) → r ?&) || (b.m.get( ) → r) };

Get (##C ← (##Map||##Vector||##array) c, ##Index i, ##C::#Readable r)

{ c[i] →r };

};

. . .
##Writeable { Readable ~const, . . . } ;

##Modifiable { ##Readable && ##Writeable };

Then we can have:

x int ← ##Writeable::Set (map, index) ; // specified by user or supplier

y int ← ##Readable::Get (function, argument) ; // specified by supplier or user

x = y ; //invokes Get and Set

The above is a fairly trivial example, but it illustrates the principle of separation of mechanisms for access to data constructs from the functional operations on the data values.
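A loose approximation of this separation in current C++, assuming hypothetical type-erased accessors built on std::function (far less general than the proposed Capability syntax): the assignment sees only values, while each accessor decides where its value comes from or goes to.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical: wraps any way of producing a value.
template <typename T>
struct ReadableAccess {
    std::function<T()> get;
};

// Hypothetical: wraps any way of storing a value.
template <typename T>
struct WriteableAccess {
    std::function<void(const T&)> set;
    // Assignment from a reader invokes get on the source and set on the target.
    WriteableAccess& operator=(const ReadableAccess<T>& r) {
        set(r.get());
        return *this;
    }
};
```

Here x could write into a map slot and y could compute its value from a function, while the line x = y stays the same.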

Container access provides some interesting possibilities for syntax commonality for generalizations such as “insert”.

Insert (##V←##Vector v, ##V::#Element e) →##V v

{v.emplace_back (e) →v};

Insert (##V←##Vector v, ##V::#Position p, ##V::#Element e) → ##V v

{ v.emplace(p, e) → v };

Insert (##L←##List l, ##L::#Position p, ##L::#Element e)→ ##L l

{ l.insert(p, e) → l };

Insert(##M←##Map m, ##M::#Key k, ##M::#Element e) →##M m
{ m.emplace(k, e) → m };

Insert(##M←##Map m, ##M::#Pair p) →##M m
{ m.emplace(p.first, p.second) → m };

Insert(##M←##Map m, ##M::#Element e) →##M m
{ m.emplace(e.key( ), e) → m };

Or, more generally:

Insert (##C←##Container c, ##C::#Element e) →##C c

{ Insert(c, e) → c };

Insert (##C←##Container c, ##C::#Position p, ##C::#Element e ) →##C c

{ Insert(c, p, e) → c };

And, still better yet, overload “<<” for container and stream insert and “.” for element access and use:

c << e ;

c <<(p) e ; /* or */ c.p << e ;

Another useful extension could be based on “invoke”, to supply a continuation function call. This supports a useful pattern for pipes. Also, this could be combined with tests to handle special conditions, errors and exceptions.
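A minimal sketch of the continuation idea, assuming a hypothetical then() adaptor and an overloaded | operator in a hypothetical pipeline namespace (C++20 ranges use a similar pipe notation for views):

```cpp
#include <cassert>
#include <functional>
#include <utility>

namespace pipeline {

// Holds the next step of the pipe.
template <typename F>
struct Then {
    F f;
};

template <typename F>
Then<F> then(F f) {
    return Then<F>{std::move(f)};
}

// value | then(f) invokes f on value; found by argument-dependent lookup.
template <typename T, typename F>
auto operator|(T&& value, Then<F> step) {
    return std::invoke(step.f, std::forward<T>(value));
}

}  // namespace pipeline
```

Each step receives the previous result, so error or exception handling could be added as another kind of step.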

Somewhat different are requirements for expressions over type constructs. For example:

      template<typename T>
            concept bool Input_range() {
                  return requires(T range) {
                        typename Iterator_type<T>;
                        requires Input_iterator<Iterator_type<T>>();
                  };
            }

This can become:

##Input_range ← ##Range::#Iterator::category == Input ;

An objective for these syntax constructs is that, with little practice, the syntax can be read, and generally written, directly from their intended semantics.

Principles

These examples illustrate fundamental principles of context access, of process abstraction, and of generality.

· The approach effectively provides a general mechanism for computations over values, that is separated from the context in which the values reside, and from where they must be extracted and inserted.

This essentially provides a base for exploiting many of the constructs and facilities of functional programming.

· It provides a mechanism for system and user libraries to encapsulate and thereby exploit the extensive syntax mechanisms that are evolving for C++, in a manner that lends to increased readability and understanding of the using code functionality.

· More generally, it can support a range of abstractions for types, data and processing, which can then be separately mapped to concrete implementations.

Function Parameters

Much of the complexity of parameter type specification is tied up with the complexities of mapping function call arguments to declaration parameters.

Function declarations specify how the function can use supplied parameters with various categories of modes for read, reference, modify, copy and move. C++ specifications use const and reference decorators to indirectly indicate what is intended. This started out to be reasonably straightforward, until references to references and special support for “rvalues” were needed.

Much, if not all, of the needed simplification and clarity can come from syntax for the declaration of a parameter that directly indicates its purpose and its use. In many cases this can start with simply distinguishing inputs from outputs in the function declaration. Some languages do this with specific “in” and “out” mode specifications for the parameter. Since “in” is the most common, it can be elided. Also, whether or not a copy is made internally by or for the function is of no concern to the caller or to the declaration. With this, “out” parameters indicate values that the function can modify. Return parameters are values the function can return, including tuples. These categories can be specified separately. For instance:

functionName (inParms . . . )(outParms . . . ) → result;

Function invocation could require the extra brackets for the sake of clarity. The compiler can decide, at least for most cases, among copy, move (if available), reference, const, pointer extraction, copy elision, etc. Also, if there is some particular need in the declaration, a programmer can specify conversions, to be used or preferred.
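For comparison, the closest current C++ spelling encodes the mode in each parameter type: const references for inputs, non-const references for outputs, and a returned tuple for results. The min_max function is a hypothetical illustration:

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// in:     const std::vector<int>&  (read only; must be non-empty)
// out:    std::string& log         (the function may modify it)
// result: std::tuple<int, int>     (several return values)
std::tuple<int, int> min_max(const std::vector<int>& values, std::string& log) {
    int lo = values.front();
    int hi = values.front();
    for (int v : values) {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    log += "min_max over " + std::to_string(values.size()) + " values";
    return {lo, hi};
}
```

A caller can unpack the result with structured bindings: auto [lo, hi] = min_max(values, log);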

“Move semantics” introduces a completely different mode of parameter use; i.e., values that can be invalidated. These are used to avoid copying temporaries, and they are chosen implicitly through function overload resolution. They can be clearly indicated with an additional mode category.

functionName (inParms . . . )(outParms . . . )(moveParms . . . )→ result;
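In current C++ this move mode is expressed only indirectly, by an rvalue-reference overload chosen during overload resolution. A sketch with a hypothetical Inbox type:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Inbox {
    std::vector<std::string> items;
    // "in" mode: the argument is copied and left untouched.
    void put(const std::string& s) { items.push_back(s); }
    // "move" mode: the argument may be left in a valid but unspecified state.
    void put(std::string&& s) { items.push_back(std::move(s)); }
};
```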

There is also a category of generic parameters that can be passed to a function and can be of any of the above categories. These are then forwarded to a specific function invocation, for which all the rules of function overload selection and validation are applied when the function is instantiated. Such usage can be indicated, for instance, with double parentheses; i.e., ((genericParms)).
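The nearest current C++ construct is a pack of forwarding references passed on with std::forward, so that overload resolution happens at the final call with each argument's original value category. forward_call and category() are hypothetical names:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Passes every argument on unchanged, preserving lvalue/rvalue-ness.
template <typename F, typename... Args>
decltype(auto) forward_call(F&& f, Args&&... args) {
    return std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
}

// Overload pair that reports which value category it received.
std::string category(const std::string&) { return "lvalue"; }
std::string category(std::string&&) { return "rvalue"; }
```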

Captured Names

Although it is often looked upon as dangerous, the use, or “capture”, of names from outside a function declaration or definition is common, but it often lacks clarity.

For lambdas, this use is explicitly indicated with special bracket syntax. This can be extended for any function declaration.
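For reference, the capture forms C++ lambdas support today, shown in a hypothetical capture_demo() function: by value, by reference, and C++14 init-capture.

```cpp
#include <cassert>

int capture_demo() {
    int base = 10;
    int hits = 0;
    auto by_value = [base] { return base; };           // copies base now
    auto by_ref = [&hits] { return ++hits; };          // refers to hits
    auto init = [limit = base * 2] { return limit; };  // init-capture, computed now
    base = 99;  // does not affect by_value's copy or init's limit
    by_ref();
    by_ref();
    return by_value() + hits + init();  // 10 + 2 + 20
}
```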

For example:

functionName (inputs . . . )(outputs . . . ) → result [=name1, &name2, . . . ] {. . .};

or better,

functionName(inputs. . .)(outputs. . .) → result [name1 . . .][name2 . . .] {. . .};

Also, while capture occurs in the scope of the function declaration, it can also be useful to provide capture in the context of the function definition and, possibly even with generics, in the instantiation context. This could be indicated by repeating the prefix characters for the unusual latter cases, e.g. “==” and “===”. Since such uses are unusual and require special notice, it might be simpler to use double and triple brackets in the declaration.

As part of this, the square brackets can also introduce template requirements:

fun(##A a, ##B b) (##C c)→ result [x, (c= a*x+b ) ] { . . . } ;

Name Scope Use

Clarity in the scope of names used in code is critical to understanding the intent of the programs in which they occur. This is relevant at different levels of brackets, functions, parameters, name scope, derived class definitions and local declarations, etc.

Lack of clarity about the scope of a name can lead to interesting magazine columns. More critically, it can lead to subtle bugs, especially when code is later modified and extended.
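One classic instance, sketched below with hypothetical class names: adding an f(double) to a derived class silently hides every base-class f, so an existing call d.f(1) changes meaning without any diagnostic.

```cpp
#include <cassert>

struct Base {
    int f(int) const { return 1; }
};

// Declaring any f here hides every Base::f; f(1) now converts 1 to double.
struct Derived : Base {
    int f(double) const { return 2; }
};

// A using-declaration brings Base::f back into overload resolution.
struct Fixed : Base {
    using Base::f;
    int f(double) const { return 2; }
};
```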

Prefix characters are also useful here to highlight the use of names that are not part of the local specification. Function parameters in some languages are addressed with a leading character such as “$”. Captured names require more clarity and perhaps a prefix such as “@”. Data member names in a class often have a “_” prefix, in both the declaration and use. Such specifications are critical to understanding code and could be inserted by the compiler (or a tool) automatically, if not supplied. Short alias prefixes derived from namespaces are useful to identify the source, and hence the meaning, of names.

Some consolidation and systemization of such conventions can be either explicitly specified by syntax, or implicitly by rules declared to the compiler by pragmas.

Factored Attributes

Keywords for public, protected, private, namespace and using directives all introduce bracketed sections of declarations with common attributes.

This could be systematically extended to provide structuring for any attribute. It would be especially useful for long lists of declarations.

Also, it could clarify the use of “static”, which connotes little of what it actually means. Inside a class or function scope, a keyword such as “fixed” might be clearer. Outside a class or function context, names can be declared “file scope”, or, perhaps better, “local”.
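The unrelated meanings of static can be shown side by side (a sketch with hypothetical names):

```cpp
#include <cassert>

// 1. Namespace scope: internal linkage (visible only in this translation unit).
static int file_local_helper() { return 7; }

// 2. Block scope: storage that persists across calls.
int next_ticket() {
    static int counter = 0;  // initialized once, kept between calls
    return ++counter;
}

// 3. Class scope: one member shared by the class rather than one per object.
struct Registry {
    static int instances;
    Registry() { ++instances; }
};
int Registry::instances = 0;
```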

A related category might be “thread local”.

Leading Names in Declarations

Much of the preceding discussion aims to introduce names at the left of a line of code, to make it easy to scan the code looking for a declaration.

Declarations can be obscure when they bury a name declaration in the middle of a complex syntax structure. Declarations involving function types, such as functions that return function pointers, are a simple and obvious example.
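The C standard library's signal() has exactly this shape. A sketch with hypothetical declarations install_raw and install, showing the buried form next to an alias-based form that puts the declared name at the left; both denote the same function type:

```cpp
#include <type_traits>

// Declarator form (the shape of C's signal): the name is buried inside
// the return type, which wraps around the parameter list.
void (*install_raw(int sig, void (*handler)(int)))(int);

// Alias form: the declared name appears near the left.
using Handler = void (*)(int);
Handler install(int sig, Handler handler);

// Trailing return type: another left-leading spelling.
auto install_trailing(int sig, Handler handler) -> Handler;

static_assert(std::is_same_v<decltype(install_raw), decltype(install)>);
static_assert(std::is_same_v<decltype(install), decltype(install_trailing)>);
```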