Output parameters


odo...@gmail.com

Dec 9, 2017, 2:22:47 PM
to ISO C++ Standard - Future Proposals
It would be useful for C++ to have an idiomatic way to express strictly output parameters. This would be especially useful for self-documenting APIs, and would help prevent usage errors at zero runtime cost.

It should be easy to implement using templates. Something like the following:

void foo(out_ref<int> o) { o = 10; }

int main()
{
    int x;
    foo(x);
    cout << x << endl;
}

I've attached my (probably buggy) proof-of-concept.

I'm curious why the STL doesn't already have something like this. Was it overlooked, or is there a good reason against having this?
out_ref.h
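
The attachment is not reproduced in this thread. As a rough illustration only, a write-only wrapper along the following lines would let the example above compile. The class body is an assumption made for this sketch, not the contents of out_ref.h, and `demo` is a hypothetical helper.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch of an out_ref<T>; not the attached out_ref.h.
template <typename T>
class out_ref {
public:
    // Implicit on purpose, so that foo(x) works as in the post.
    out_ref(T& target) : target_(&target) {}

    // Write-only access: assignment stores into the referenced object.
    out_ref& operator=(const T& value) { *target_ = value; return *this; }
    out_ref& operator=(T&& value) { *target_ = std::move(value); return *this; }

    // Deliberately no conversion to T and no getter: the callee cannot
    // read the caller's (possibly uninitialized) value through this type.

private:
    T* target_;
};

void foo(out_ref<int> o) { o = 10; }

// Usage as in the post: x starts uninitialized and is only written by foo.
int demo() {
    int x;
    foo(x);
    assert(x == 10);
    return x;
}
```

Nothing here forces the callee to assign before returning; the wrapper only prevents reads, so it documents intent rather than enforcing it.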

Nicol Bolas

Dec 9, 2017, 2:36:56 PM
to ISO C++ Standard - Future Proposals, odo...@gmail.com
On Saturday, December 9, 2017 at 2:22:47 PM UTC-5, odo...@gmail.com wrote:
> It would be useful for C++ to have an idiomatic way to express strictly output parameters.

Or we can just return multiple values from the function by sticking them in a struct. To put it another way, functions should not have "strictly output parameters". I can understand taking input/output parameters (like a `vector` that you add data to), but there's no reason to have "strictly output parameters" in C++.

Jonathan Müller

Dec 9, 2017, 2:50:06 PM
to std-pr...@isocpp.org
I covered that topic in depth in this blog post:
http://foonathan.net/blog/2016/10/26/output-parameter.html

Jake Arkinstall

Dec 9, 2017, 2:58:58 PM
to std-pr...@isocpp.org
On 9 Dec 2017 19:36, "Nicol Bolas" <jmck...@gmail.com> wrote:
> functions should not have "strictly output parameters". I can understand taking input/output parameters (like a `vector` that you add data to), but there's no reason to have "strictly output parameters" in C++.

I agree. Parameters that are strictly for output don't really fit into modern C++, or at least that is the impression I get. I certainly haven't seen it in quite some time.

I also don't see why describing a parameter in a special way would be useful. If a function accepts a non-const reference, that is already a glaring indication that the intention is to change the value. Is there any real difference between that and a parameter that is designed purely to be modified without being read first? Not that I can think of, but I welcome anyone to correct me.

Vicente J. Botet Escriba

Dec 9, 2017, 3:14:10 PM
to std-pr...@isocpp.org, Nicol Bolas, odo...@gmail.com
I agree. We don't have a way to ensure strictly output parameters in C++. Or do we?

It would be great to have some motivating examples where out parameters are better than returning the values in the result type.

As Jonathan wrote in his blog, an out factory seems a good way to state clearly at the call site that the parameter is an output. I believe this could be the most motivating advantage of such a class.
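
A minimal sketch of the call-site idea, assuming a wrapper with an explicit constructor plus a small factory function. The names `out_param`, `out`, `read_name` and `demo` are chosen for illustration and are not taken from the blog post.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper: the explicit constructor is what forces the
// caller to spell out the intent.
template <typename T>
class out_param {
public:
    explicit out_param(T& target) : target_(&target) {}

    // The only operation offered is writing the result.
    void set(T value) { *target_ = std::move(value); }

private:
    T* target_;
};

// Factory that deduces T, so the call site reads out(x).
template <typename T>
out_param<T> out(T& target) { return out_param<T>(target); }

void read_name(out_param<std::string> name) { name.set("example"); }

std::string demo() {
    std::string s;
    // read_name(s);    // would not compile: the constructor is explicit
    read_name(out(s));  // the call site says that s is written to
    return s;
}
```

Compared with a plain `std::string&`, the difference is visible only at the call: the reader of `read_name(out(s))` does not need to look up the signature to see that `s` is modified.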

If all the useful cases of out parameters require that the instance is at least constructed, then maybe an in_out parameter could be useful.

Just my 2cts
Vicente

odo...@gmail.com

Dec 9, 2017, 4:38:10 PM
to ISO C++ Standard - Future Proposals, jmck...@gmail.com, odo...@gmail.com
I can think of one case where output parameters are arguably a better choice than returning a struct/tuple: when you're returning multiple values of the same type, which aren't inherently related to each other.

If they're different types, you can use a tuple -- the type parameters will give you info about what's being returned. If the data is inherently related, then it arguably deserves its own type.

Consider the following example, though:

Case #1

tuple<int, int, int> calculate_averages(vector<int> values);

Case #2

struct averages_out
{
    int mean;
    int median;
    int mode;
};

averages_out calculate_averages(vector<int> values);

Case #3

void calculate_averages(vector<int> values, int& mean, int& median, int& mode);

Case #4

void calculate_averages(vector<int> values, output_parameter<int> mean, output_parameter<int> median, output_parameter<int> mode);

Case #1 would be the clear winner, if tuples had named parameters somehow. As it stands, I'd be hesitant to go this route even if I documented the return type thoroughly.

For case #2, since the three results aren't really related, it feels clumsy to have a separate struct which is used only in one function.

In case #3 it seems pretty clear that they are supposed to be output parameters whose initial values are not used as input. However, this isn't explicit.

Case #4 seems the least bad to me. Output parameters aren't great, but the fact that the function definition describes its usage is a big win, IMO.


Note, I think the recommendations given here are good, but I don't think they apply to the above example: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-out-multi

Also note, I thought about it, and I think Jake is right that a non-const reference is clearly an input/output parameter. A separate type for this doesn't seem useful to me.

Jake Arkinstall

Dec 9, 2017, 5:12:20 PM
to std-pr...@isocpp.org


On 9 Dec 2017 21:38, <odo...@gmail.com> wrote:
> For case #2, since the three results aren't really related, it feels clumsy to have a separate struct which is used only in one function.

It seems, to me, to be perfectly reasonable to have a dedicated struct for this specific situation. Although the individual parameters are essentially independent, they do have quite a lot of meaning as a collective (which pretty much sums up the point of a struct, to me at least).

I don't know. Maybe I'm too liberal with structs and classes. But I rarely come across a legitimate time when a return type only occurs once - especially when the project matures somewhat and it turns out there is a lot more you want to do with the result than you originally anticipated. Returning a custom object also gives you a lot more flexibility during refactoring, too, especially if it has a decent amount of encapsulation.

However, now I'm going quite off topic, because it is all to do with personal coding preferences. I cannot deny that there are attractive benefits to using out parameters, particularly if you have sufficient reason to only ever calculate a small number of fixed results and sufficient need to have control over their scope. If one result takes up a large amount of RAM, for example, you might have great reasons to have it disposed of as soon as possible, even if other return values are needed later on.

Nicol Bolas

Dec 9, 2017, 6:52:19 PM
to ISO C++ Standard - Future Proposals, jmck...@gmail.com, odo...@gmail.com
On Saturday, December 9, 2017 at 4:38:10 PM UTC-5, odo...@gmail.com wrote:
> I can think of one case where output parameters are arguably a better choice than returning a struct/tuple: when you're returning multiple values of the same type, which aren't inherently related to each other.
>
> If they're different types, you can use a tuple -- the type parameters will give you info about what's being returned. If the data is inherently related, then it arguably deserves its own type.
>
> Consider the following example, though:
>
> Case #1
>
> tuple<int, int, int> calculate_averages(vector<int> values);
>
> Case #2
>
> struct averages_out
> {
>     int mean;
>     int median;
>     int mode;
> };
>
> averages_out calculate_averages(vector<int> values);
>
> Case #3
>
> void calculate_averages(vector<int> values, int& mean, int& median, int& mode);
>
> Case #4
>
> void calculate_averages(vector<int> values, output_parameter<int> mean, output_parameter<int> median, output_parameter<int> mode);
>
> Case #1 would be the clear winner, if tuples had named parameters somehow. As it stands, I'd be hesitant to go this route even if I documented the return type thoroughly.
>
> For case #2, since the three results aren't really related, it feels clumsy to have a separate struct which is used only in one function.
>
> In case #3 it seems pretty clear that they are supposed to be output parameters whose initial values are not used as input. However, this isn't explicit.
>
> Case #4 seems the least bad to me. Output parameters aren't great, but the fact that the function definition describes its usage is a big win, IMO.

I can't really agree with this analysis.

First, Case #4 has a lot of noise in the function declaration. This makes it hard to read the important parts of the function's interface. Second, calling it is going to consist of the user declaring 3 variables (probably on 3 lines) and passing them in. This imposes a lot of noise at the call site. Not only that, it breaks DRY a bit, since you have to use the right types for your output variables.

By contrast, Case #2 imposes zero noise in the function declaration. Oh sure, the function declaration is not self-contained, but the return type's name ought to be a reasonable clue as to what's going on.

It's the difference between:

int mean, median, mode;
calculate_averages(vec, mean, median, mode);

and:

auto [mean, median, mode] = calculate_averages(vec);

Case #2 also has one built-in advantage that the other three don't have: you don't have to remember the order of the parameters. In all other cases, if you forget that `median` is the second one, you're boned. But in Case #2, all you need to do is:

auto stats = calculate_averages(vec);

And you just use `stats.median`. You cannot get it wrong. Unless you use structured binding, in which case you're no worse off than before. That is, it is your choice to get it wrong. With the others, it's built-in.

The only "disadvantage" I see with case #2 is that you have to come up with a decent name for the aggregate. But considering the advantages, that's pretty minor.
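
To make the comparison concrete, here is a small C++17 sketch of Case #2 with both access styles. The implementation details (integer arithmetic, upper middle element as the median, a non-empty input) and the helper `demo` are assumptions made for the example; only the struct and function names come from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <numeric>
#include <vector>

struct averages_out {
    int mean;
    int median;
    int mode;
};

// Assumes a non-empty input; results are ints, as in the thread's signatures.
averages_out calculate_averages(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    const int mean = std::accumulate(values.begin(), values.end(), 0)
                     / static_cast<int>(values.size());
    const int median = values[values.size() / 2];  // upper middle for even sizes

    // Count occurrences and pick the most frequent value.
    std::map<int, int> counts;
    for (int v : values) ++counts[v];
    const auto most = std::max_element(
        counts.begin(), counts.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; });

    return {mean, median, most->first};
}

int demo() {
    const std::vector<int> vec{1, 2, 2, 3, 7};

    // Structured binding: concise, but the order has to be remembered.
    auto [mean, median, mode] = calculate_averages(vec);

    // Member access: the names carry the meaning, the order is irrelevant.
    const auto stats = calculate_averages(vec);

    assert(mean == stats.mean && median == stats.median && mode == stats.mode);
    return stats.mean + stats.median + stats.mode;
}
```

Swapping two names in the structured binding would still compile, which is the ordering risk described above; `stats.median` cannot be mixed up in that way.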



Brittany Friedman

Dec 9, 2017, 8:11:11 PM
to std-pr...@isocpp.org
I agree that #2 is the right solution. One change that might make #2 more palatable would be to allow structs to be declared as part of a function's return type. MSVC seems to accept this code but I don't think it is conforming.

struct { int mean; int median; int mode; } calc_averages(vector<int>);

odo...@gmail.com

Dec 9, 2017, 9:30:47 PM
to ISO C++ Standard - Future Proposals


On Saturday, December 9, 2017 at 7:11:11 PM UTC-6, Brittany Friedman wrote:
> I agree that #2 is the right solution. One change that might make #2 more palatable would be to allow structs to be declared as part of a function's return type. MSVC seems to accept this code but I don't think it is conforming.
>
> struct { int mean; int median; int mode; } calc_averages(vector<int>);

I like that syntax, but according to [dcl.fct]/9 it shouldn't be allowed. I wonder why this is expressly disallowed -- it seems like an optimal solution to this kind of an issue, assuming a named struct wasn't appropriate.
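
For comparison, one thing that is already valid since C++14 is a struct defined inside the function body and returned through a deduced return type. This is not the syntax discussed above, and it only works when the function definition is visible to callers (for example in a header). The function `divide` and the helper `demo` are hypothetical examples.

```cpp
#include <cassert>

// Hypothetical example function. The result type is defined inside the
// function body and is reachable by callers only through `auto`.
inline auto divide(int dividend, int divisor) {
    struct result {
        int quotient;
        int remainder;
    };
    return result{dividend / divisor, dividend % divisor};
}

int demo() {
    auto r = divide(17, 5);                      // members are still named
    auto [quotient, remainder] = divide(17, 5);  // structured binding works too
    assert(r.quotient == quotient && r.remainder == remainder);
    return r.quotient * 10 + r.remainder;
}
```

The trade-off is that the type cannot be named in a declaration elsewhere, so it cannot appear in a separate prototype or be reused by a second function.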

bastie...@gmail.com

Dec 9, 2017, 11:51:05 PM
to ISO C++ Standard - Future Proposals, odo...@gmail.com
If your concern is documentation, you can just add an attribute named out to the parameter.

void foo([[out]] int &o) { o = 10; }

It's how Microsoft does it: https://msdn.microsoft.com/en-us/library/sw5633k4.aspx


Vicente J. Botet Escriba

Dec 10, 2017, 12:49:59 AM
to std-pr...@isocpp.org, odo...@gmail.com, jmck...@gmail.com
On 09/12/2017 at 22:38, odo...@gmail.com wrote:
> I can think of one case where output parameters are arguably a better choice than returning a struct/tuple: when you're returning multiple values of the same type, which aren't inherently related to each other.

From http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f20-for-out-output-values-prefer-return-values-to-output-parameters

If a type is expensive to move (e.g., array<BigPOD>), consider allocating it on the free store and return a handle (e.g., unique_ptr), or passing it in a reference to non-const target object to fill (to be used as an out-parameter).

Since C++17, copy elision is guaranteed in some cases.

Consider:

struct Package {      // exceptional case: expensive-to-move object
    char header[16];
    char load[2024 - 16];
};

If copy elision doesn't apply to fill:

Package fill();       // Bad: large return value
void fill(Package&);  // OK

Package pkg;
fill(pkg);

In this case an out parameter may be welcome:

void fill(out_param<Package>);  // Better?
Package pkg;
fill(out(pkg));

If copy elision could apply to fill:

Package fill();       // OK: copy elision
void fill(Package&);  // Bad
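
One way to check whether elision really applies is to make the type neither copyable nor movable: in C++17, returning a prvalue still compiles because the object is constructed directly in the caller's storage. The sketch below reuses the `Package` layout from above; the deleted operations and the helper `demo` are additions made for the illustration.

```cpp
#include <cassert>

struct Package {
    char header[16];
    char load[2024 - 16];

    Package() : header{}, load{} {}

    // Neither copyable nor movable: any copy or move is a compile error.
    Package(const Package&) = delete;
    Package& operator=(const Package&) = delete;
};

Package fill() {
    // A prvalue: in C++17 it initializes the caller's object directly.
    return Package();
}

int demo() {
    Package pkg = fill();  // compiles only because no copy or move is needed
    assert(pkg.header[0] == 0);
    return static_cast<int>(sizeof pkg);
}
```

Returning a named local variable instead (`Package p; return p;`) would be rejected for this type, because that form of elision is optional and still requires an accessible copy or move constructor.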


> If they're different types, you can use a tuple -- the type parameters will give you info about what's being returned. If the data is inherently related, then it arguably deserves its own type.
>
> Consider the following example, though:
>
> Case #1
>
> tuple<int, int, int> calculate_averages(vector<int> values);
>
> Case #2
>
> struct averages_out
> {
>     int mean;
>     int median;
>     int mode;
> };
>
> averages_out calculate_averages(vector<int> values);
>
> Case #3
>
> void calculate_averages(vector<int> values, int& mean, int& median, int& mode);
>
> Case #4
>
> void calculate_averages(vector<int> values, output_parameter<int> mean, output_parameter<int> median, output_parameter<int> mode);
>
> Case #1 would be the clear winner, if tuples had named parameters somehow. As it stands, I'd be hesitant to go this route even if I documented the return type thoroughly.

The Ranges TS has tagged tuples. Maybe you could consider them.


> For case #2, since the three results aren't really related, it feels clumsy to have a separate struct which is used only in one function.
>
> In case #3 it seems pretty clear that they are supposed to be output parameters whose initial values are not used as input. However, this isn't explicit.
>
> Case #4 seems the least bad to me. Output parameters aren't great, but the fact that the function definition describes its usage is a big win, IMO.
>
> Note, I think the recommendations given here are good, but I don't think they apply to the above example: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-out-multi

If you are talking about the in-out parameter, I agree.

istream& operator>>(istream& is, string& s);    // much like std::operator>>()

In this case we are forced by the standard library design.

However, for in-out parameters, it is sometimes better to create an object that holds the in-out parameter and to pass the other parameters to a member function of that object.

line_reader rd(in_out(cin)); // line_reader holds the string that getline returns by reference on each call

// `while` has no init-statement form, so loop and break explicitly
for (;;)
{
  auto [succeed, line] = rd.getline();
  if (!succeed) break;
  // use line here
}
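
A possible shape for such a `line_reader`, as a sketch only: the member names, the nested `result` struct and the `count_lines` helper are assumptions, and the `in_out` wrapper is left out so that the example stays self-contained.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

class line_reader {
public:
    struct result {
        bool succeed;
        const std::string& line;  // refers to the reader's internal buffer
    };

    explicit line_reader(std::istream& in) : in_(in) {}

    // Reads the next line into the reused buffer and reports success.
    result getline() {
        const bool ok = static_cast<bool>(std::getline(in_, buffer_));
        return {ok, buffer_};
    }

private:
    std::istream& in_;    // the in-out stream lives inside the object
    std::string buffer_;  // reused across calls, so no per-line allocation is forced
};

// Hypothetical helper that drives the reader over a block of text.
int count_lines(const std::string& text) {
    std::istringstream input(text);
    line_reader rd(input);
    int n = 0;
    for (;;) {
        auto [succeed, line] = rd.getline();
        if (!succeed) break;
        assert(line.find('\n') == std::string::npos);
        ++n;
    }
    return n;
}
```

The returned reference is only valid until the next call to `getline`, which is the usual cost of reusing a buffer in this way.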


> Also note, I thought about it, and I think Jake is right that a non-const reference is clearly an input/output parameter. A separate type for this doesn't seem useful to me.

I can live with just a reference. Some people coming from the C world prefer a pointer because it makes explicit at the call site that the parameter is in-out.
If I wanted to be explicit at the call site, I would use an in_out_param<T> and an in_out function to pass the parameter.
Such cases should be rare enough to justify such a cumbersome syntax.

The problem with functions that take in-out parameters is that they don't compose well, so a cumbersome syntax for them could push people toward a better design. This is like explicit casts: casts are a sign of bad design most of the time, and making them explicit helps to identify a possible refactoring.

Vicente


inkwizyt...@gmail.com

Dec 10, 2017, 8:16:34 AM
to ISO C++ Standard - Future Proposals, odo...@gmail.com
I think an attribute is a good solution: it can be ignored without changing the meaning of the program, and it can generate warnings if used in the wrong way.

Nicol Bolas

Dec 10, 2017, 11:44:59 AM
to ISO C++ Standard - Future Proposals
On Saturday, December 9, 2017 at 8:11:11 PM UTC-5, Brittany Friedman wrote:
> I agree that #2 is the right solution. One change that might make #2 more palatable would be to allow structs to be declared as part of a function's return type. MSVC seems to accept this code but I don't think it is conforming.
>
> struct { int mean; int median; int mode; } calc_averages(vector<int>);

There was a proposal for adding that, P0222/P539. The committee shot it down. Personally, I don't blame them.

Unnamed structs used for this purpose would only make sense when you:

1. Have a one-off data structure used to return multiple values. That is, no other code returns or takes this particular aggregate of values.
2. Cannot come up with a reasonable name for it.

This is far more rare than you might think. Consider `from/to_chars`. They need a way to return multiple values (pointer + error). But all of the `from_chars` overloads return the same value. "from_chars_result" may not be the best name for the return value, but the fact that you have multiple overloads returning the same thing effectively requires it.

So how much code would this unnamed struct thing be useful with, really? One-offs like this are not particularly common.
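
As an illustration of the `from_chars` point, here is a short C++17 sketch. `parse_int` is a hypothetical helper; `std::from_chars` and the `ptr` and `ec` members of `std::from_chars_result` are the standard `<charconv>` interface. Note that `from_chars` itself writes the parsed number through a reference parameter and returns the pointer and error code as a named struct.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: succeeds only if the whole text is a valid int.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* last = text.data() + text.size();

    // from_chars_result is a named struct with members ptr and ec,
    // shared by every from_chars overload.
    const auto [ptr, ec] = std::from_chars(text.data(), last, value);

    if (ec != std::errc() || ptr != last) return std::nullopt;
    return value;
}
```

Because every overload returns the same `from_chars_result`, code like this does not change when the parsed type changes from `int` to `long`.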

Jake Arkinstall

Dec 10, 2017, 12:48:09 PM
to std-pr...@isocpp.org
One-off return types often occur... the first time you use them. Then, the next time you use them, which is often the case later down the line, you're tied to using whatever approach you used the first time unless you want to either be inconsistent or introduce breaking changes.

I think of this as a DRY-trap. If you are to keep repeated code to a minimum, you need to make decisions early on such that you don't end up forced into repeating code. For example, with strictly output parameters, you need to create those variables before the function is called (and if you call this function in several places, this means repeating). If you want to use an unnamed struct, you have to do the same thing unless you auto it out (but then you might as well just name that struct and be done with the matter).

This argument only applies to the justification "it's only used once". There are other justifications, such as resource usage (covered by Vicente) and scope control ({ cheap a; {expensive b; do_something(a,b); handle_b(b); } use_lots_of_resources(); handle_a(a); } ) for which I have no rebuttal. They're completely valid.

I'd sooner go for a proposal that somehow adds some fairy dust to make returning equally as effective in these outlying cases, than one which standardises (and thus legitimizes) use of out parameters for the foreseeable future. Though if the former is even possible, whoever comes up with it should probably prepare to receive a lot of hate mail from compiler writers.

Nicol Bolas

Dec 10, 2017, 1:40:44 PM
to ISO C++ Standard - Future Proposals
On Sunday, December 10, 2017 at 12:48:09 PM UTC-5, Jake Arkinstall wrote:
> One-off return types often occur... the first time you use them. Then, the next time you use them, which is often the case later down the line, you're tied to using whatever approach you used the first time unless you want to either be inconsistent or introduce breaking changes.
>
> I think of this as a DRY-trap. If you are to keep repeated code to a minimum, you need to make decisions early on such that you don't end up forced into repeating code. For example, with strictly output parameters, you need to create those variables before the function is called (and if you call this function in several places, this means repeating). If you want to use an unnamed struct, you have to do the same thing unless you auto it out (but then you might as well just name that struct and be done with the matter).
>
> This argument only applies to the justification "it's only used once". There are other justifications, such as resource usage (covered by Vicente) and scope control ({ cheap a; {expensive b; do_something(a,b); handle_b(b); } use_lots_of_resources(); handle_a(a); } ) for which I have no rebuttal. They're completely valid.

I'm trying to understand this use case.

You have a function which returns two values: one is "cheap" and the other is "expensive". Therefore, the caller may want to give the "expensive" one a shorter lifetime. Note that the caller is not required to do so; such a user simply might want to.

In order for this to matter, "expensive" would have to be expensive in terms of stack space. If it were simply "expensive" in terms of dynamic storage (containers), then you could simply remove the data from `expensive` (move it into a temporary, for example).

So here's a narrow list of elements needed for this to come up:

1. The function returns an object of sufficient size that callers may want to get rid of it as soon as conditions permit.
2. The function returns multiple values, at least one of which is the aforementioned gargantuan object.
3. The multiple return values are sufficiently unrelated that they could reasonably have different lifetimes. That is, it is reasonable to use the non-gargantuan object even after the gargantuan object is gone.

This is a subset-of-a-subset-of-a-subset. I can't think of a circumstance where this would happen. The closest thing I can come up with here is something like `expected<expensive, cheap>`, where the other return value is something like an error code. But here, you're talking about divergent code paths, since `expected` is a sum type, not a product type.

Just because a use case is "valid" doesn't mean we need special language to support it. If the valid use case is not particularly common, we can just use the existing tools to handle it. The default mechanism should be to use return values and structs; if there is some overriding concern in a specific case, then we always have the option to take references.

Ugly solutions for unusual circumstances are fine.

Lastly, if `cheap` truly is "cheap", then there's nothing stopping you from doing this:

cheap a;
{
    auto [c, b] = do_something();
    handle_b(b);
    a = c;
}

It may not be the most optimal solution possible, but it gets the job done. And it retains all of the advantages of a struct-based solution.

Indeed, we may someday allow structured binding to directly assign the decomposed subobjects to existing objects. That would remove the need to use a different name and the `a = c` part. So all you would be left with is the overhead of one return value that never gets used after it gets assigned.

I think we can live with that.
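
For functions that return a `std::tuple` or `std::pair` rather than a struct, `std::tie` can already assign into existing objects today, at the price of requiring default-constructible, assignable types. A sketch with hypothetical stand-ins for `cheap`, `expensive` and `do_something`:

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the thread's `cheap` and `expensive` types.
using cheap = int;
using expensive = std::vector<std::string>;

std::tuple<cheap, expensive> do_something() {
    return std::make_tuple(3, expensive{"x", "y", "z"});
}

cheap demo() {
    cheap a = 0;
    {
        expensive b;
        std::tie(a, b) = do_something();  // assigns into existing objects
        assert(b.size() == 3);            // stands in for handle_b(b)
    }                                     // b is destroyed here
    return a;                             // a outlives b
}
```

This trades the named members of a struct for positional access, so the ordering concern raised earlier in the thread applies here as well.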

> I'd sooner go for a proposal that somehow adds some fairy dust to make returning equally as effective in these outlying cases, than one which standardises (and thus legitimizes) use of out parameters for the foreseeable future. Though if the former is even possible, whoever comes up with it should probably prepare to receive a lot of hate mail from compiler writers.

The reason we have structured binding for multiple return values is that it doesn't require changes to the basic ABIs used for communicating between functions. The kind of "fairy dust" you're talking about would probably require such changes; a function that uses this syntax will need an entirely new way of communicating through ABIs.

That's going to be a hard sell, especially since the best motivation thus far is a corner case.