Simulating piping one program into another, and into another

Frederick Gotham

Aug 5, 2019, 6:23:01 AM

So let's say I have three programs. Normally I would run these three programs at the commandline as follows:

prog1 | prog2 | prog3

So let's say I take the source code for these 3 programs and try to combine them into one. So I rename the three 'main' functions and then make a new 'main'. I start off with code like this:

int main_prog1();
int main_prog2();
int main_prog3();

int main()
{
    int const retval1 = main_prog1();
    int const retval2 = main_prog2();
    int const retval3 = main_prog3();

    // Nonzero if any stage reported failure.
    return retval1 | retval2 | retval3;
}

An alternative method would be to start two more threads so that each line would be processed "on the fly", but for now I'm going to work with one thread, so 'main_prog1' will finish completely before 'main_prog2' begins.
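For reference, the threaded variant could be sketched roughly as below. This is only an illustration under assumptions: the `LinePipe` class, the `run_pipeline` helper and the bodies of the three stages are hypothetical stand-ins, not code from the real programs, and the stages take their pipes as parameters instead of using renamed streams. The queue is unbounded, so unlike a real pipe there is no back-pressure on a fast producer.

```cpp
#include <cassert>
#include <cctype>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical in-process "pipe": an unbounded, blocking queue of lines.
class LinePipe {
public:
    void write(std::string line)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // writing after close() is a logic error
            lines_.push(std::move(line));
        }
        ready_.notify_one();
    }

    // Signals end of input, like closing the write end of a pipe.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a line is available; returns false at end of input.
    bool read(std::string &line)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !lines_.empty() || closed_; });
        if (lines_.empty())
            return false;
        line = std::move(lines_.front());
        lines_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> lines_;
    bool closed_ = false;
};

// Hypothetical stage 1: produces three lines.
int main_prog1(LinePipe &out)
{
    out.write("alpha");
    out.write("beta");
    out.write("gamma");
    out.close();
    return 0;
}

// Hypothetical stage 2: upper-cases each line as soon as it arrives.
int main_prog2(LinePipe &in, LinePipe &out)
{
    std::string line;
    while (in.read(line)) {
        for (char &c : line)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        out.write(line);
    }
    out.close();
    return 0;
}

// Hypothetical stage 3: collects the final lines.
int main_prog3(LinePipe &in, std::vector<std::string> &result)
{
    std::string line;
    while (in.read(line))
        result.push_back(line);
    return 0;
}

// Runs stages 1 and 2 on their own threads and stage 3 on the caller's.
int run_pipeline(std::vector<std::string> &result)
{
    LinePipe pipe12, pipe23;
    int retval1 = 0, retval2 = 0;

    std::thread t1([&] { retval1 = main_prog1(pipe12); });
    std::thread t2([&] { retval2 = main_prog2(pipe12, pipe23); });
    int const retval3 = main_prog3(pipe23, result);

    t1.join();
    t2.join();
    return retval1 | retval2 | retval3;  // nonzero if any stage failed
}
```

A real `main` would just call `run_pipeline` and return its result. Each stage must close its output when it finishes, otherwise the next stage blocks forever waiting for more lines.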

How would you go about doing this?

Here's what I'm thinking so far. . .

Go through the code for prog1 and replace all occurrences of "cout" with "cout1". Do the same with prog2 (i.e. cout2). Same goes for standard input (cin2, cin3).

Next create a header file with something like:

#include <sstream>

extern std::stringstream cout1, cout2;

static std::stringstream &cin2 = cout1;
static std::stringstream &cin3 = cout2;

So the first program will write to cout1, and then the second program will read from cin2.

So the previous code snippet becomes something like:

#include <sstream>

std::stringstream cout1, cout2;

std::stringstream &cin2 = cout1;
std::stringstream &cin3 = cout2;

int main_prog1();
int main_prog2();
int main_prog3();

int main()
{
    int const retval1 = main_prog1();

    // Rewind the read position so prog2 reads prog1's output from the start.
    cout1.seekg(0, std::ios::beg);

    int const retval2 = main_prog2();

    cout2.seekg(0, std::ios::beg);

    int const retval3 = main_prog3();

    // Nonzero if any stage reported failure.
    return retval1 | retval2 | retval3;
}
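To make the idea concrete, here is a small self-contained sketch with toy stages filled in. The stage bodies and the `run_pipeline` name are hypothetical, chosen only so that the data flow can be followed from start to finish. The return values are combined with `|` on the assumption that the combined program should report failure if any stage fails.

```cpp
#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>

// In-memory stand-ins for the two pipes.
std::stringstream cout1, cout2;
std::stringstream &cin2 = cout1;
std::stringstream &cin3 = cout2;

// Hypothetical stage 1: originally wrote these lines to std::cout.
int main_prog1()
{
    cout1 << "alpha\nbeta\ngamma\n";
    return 0;
}

// Hypothetical stage 2: upper-cases every line it reads.
int main_prog2()
{
    std::string line;
    while (std::getline(cin2, line)) {
        for (char &c : line)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        cout2 << line << '\n';
    }
    return 0;
}

// Hypothetical stage 3: numbers the lines and writes to the real std::cout.
int main_prog3()
{
    std::string line;
    int n = 0;
    while (std::getline(cin3, line))
        std::cout << ++n << ": " << line << '\n';
    return 0;
}

// Runs the stages one after another on a single thread.
int run_pipeline()
{
    int const retval1 = main_prog1();
    assert(cout1.good());  // stage 1's writes should not have failed
    cout1.seekg(0, std::ios::beg);

    int const retval2 = main_prog2();
    cout2.seekg(0, std::ios::beg);

    int const retval3 = main_prog3();

    return retval1 | retval2 | retval3;  // nonzero if any stage failed
}
```

Calling `run_pipeline()` from `main` prints the three upper-cased, numbered lines. Note that the whole output of each stage is held in memory before the next stage starts, which a real pipe would not require.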

Have any of you ever done this before? What do you think of my idea? What way would you do it?
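One variation that might avoid renaming every `cout` and `cin` is to leave the three programs untouched and temporarily swap the buffers behind `std::cout` and `std::cin` using `rdbuf`. The sketch below is a different technique from the renaming approach, shown only for comparison. `RedirectGuard`, `run_pipeline` and the toy stage bodies are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Points a stream at another buffer and restores the old one on scope exit.
class RedirectGuard {
public:
    RedirectGuard(std::ios &stream, std::streambuf *target)
        : stream_(stream), saved_(stream.rdbuf(target))
    {
        assert(target != nullptr);
    }
    ~RedirectGuard() { stream_.rdbuf(saved_); }  // also clears error flags
    RedirectGuard(const RedirectGuard &) = delete;
    RedirectGuard &operator=(const RedirectGuard &) = delete;

private:
    std::ios &stream_;
    std::streambuf *saved_;
};

// Hypothetical stages, written against the ordinary standard streams.
int main_prog1()
{
    std::cout << "alpha\nbeta\n";
    return 0;
}

int main_prog2()
{
    std::string line;
    while (std::getline(std::cin, line))
        std::cout << line.size() << '\n';  // emit each line's length
    return 0;
}

int main_prog3()
{
    int n = 0, sum = 0;
    while (std::cin >> n)
        sum += n;
    std::cout << "total " << sum << '\n';
    return 0;
}

int run_pipeline()
{
    std::stringstream pipe1, pipe2;  // in-memory stand-ins for the pipes
    int retval1 = 0, retval2 = 0, retval3 = 0;
    {
        RedirectGuard out(std::cout, pipe1.rdbuf());
        retval1 = main_prog1();
    }
    {
        RedirectGuard in(std::cin, pipe1.rdbuf());
        RedirectGuard out(std::cout, pipe2.rdbuf());
        retval2 = main_prog2();
    }
    {
        RedirectGuard in(std::cin, pipe2.rdbuf());
        retval3 = main_prog3();  // writes to the real std::cout
    }
    return retval1 | retval2 | retval3;  // nonzero if any stage failed
}
```

This only captures I/O that goes through the C++ stream objects. Anything the programs do with `printf`, `scanf` or raw file descriptors would bypass the redirection.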

Frederick

Jorgen Grahn

Aug 5, 2019, 7:49:00 AM

On Mon, 2019-08-05, Frederick Gotham wrote:
>
> So let's say I have three programs. Normally I would run these three
> programs at the commandline as follows:
>
> prog1 | prog2 | prog3

That's a core idea in the Unix world, yes.

> So let's say I take the source code for these 3 programs and try to
> combine them into one.

But why? If you have a problem that can be solved with a Unix
pipeline, count yourself lucky. There's no drop-in replacement
in C++ or elsewhere, except possibly in functional languages like
Haskell or Erlang.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Öö Tiib

Aug 5, 2019, 8:48:34 AM

On Monday, 5 August 2019 13:23:01 UTC+3, Frederick Gotham wrote:
> So let's say I have three programs. Normally I would run these three programs at the commandline as follows:
>
> prog1 | prog2 | prog3

More or less, there are typically some command line arguments.

>
> So let's say I take the source code for these 3 programs and try to combine them into one.

Before doing this, you should think about why. Are pipes inefficient for
your use case? Boost.Interprocess offers plenty of tools for more
efficient inter-process communication. Do you hope for optimizations
across the interfaces between modules? Streams won't allow much of that
anyway. Do the modules share a lot of code? Use shared objects or DLLs.

> Have any of you ever done this before? What do you think of my idea? What way would you do it?

I like to keep modules small if possible. I have gone in the other
direction and split a single large code base into several. A frequent
example: I have moved filters/converters for old, rarely used file
formats, versions, or functionality into separate, rarely used processes.
That lets the main processing module use a single input format/version
and a single output format/version, which can simplify it a lot.
It can cause some performance hit for the rarely used functionality, but
the more frequently needed modules load and execute quicker and take
fewer resources. I have sometimes replaced pipes with RPC so that I can
do more than pipes allow and can spread the modules across different
hosts more easily. The only thing needed for such decisions is to
collect statistics on the frequency and performance of feature usage.
That can be tricky with on-premise or embedded software (which is what
C++ is often used for).

James Kuyper

Aug 5, 2019, 9:58:58 AM

On 8/5/19 6:22 AM, Frederick Gotham wrote:
>
> So let's say I have three programs. Normally I would run these three programs at the commandline as follows:
>
> prog1 | prog2 | prog3
>
> So let's say I take the source code for these 3 programs and try to combine them into one. So I rename the three 'main' functions and then make a new 'main'. I start off with code like this:
>
> int main_prog1();
> int main_prog2();
> int main_prog3();
>
> int main()
> {
> int const retval1 = main_prog1();
> int const retval2 = main_prog2();
> int const retval3 = main_prog3();
>
> return retval1 & retval2 & retval3;
> }
>
> An alternative method would be to start two more threads so that each line would be processed "on the fly", but for now I'm going to work with one thread, so 'main_prog1' will finish completely before 'main_prog2' begins.
>
> How would you go about doing this?

Offhand, I would do "prog1 | prog2 | prog3" - it's a lot simpler and in
many contexts can be more efficient. Why do you want to take a different
approach?

Szyk Cech

Aug 5, 2019, 11:28:40 AM

On 05.08.2019 15:58, James Kuyper wrote:
> Offhand, I would do "prog1 | prog2 | prog3" - it's a lot simpler and in
> many contexts can be more efficient. Why do you want to take a different
> approach?

Try debugging prog1, prog2 and prog3 simultaneously...

Paavo Helde

Aug 5, 2019, 12:41:13 PM

Why would I want to do that? One of the most important benefits of
modular design like in "prog1 | prog2 | prog3" is better localization of
problems, so the system can be debugged one component at a time, making
the task *much* easier.

James Kuyper

Aug 5, 2019, 12:57:20 PM

If the three programs interacted with each other, directly or indirectly, by any method other than the pipeline, the OP's suggestion wouldn't work. If they interact only through the pipeline, there's no need to debug them simultaneously. Debug the first program while dumping its output to a file; debug the second program while reading from that file and dumping to a second file; debug the third program while reading from the second file.