On 11/11/2015 10:54 PM, Christian Gollwitzer wrote:
>
> [snip]
> After looking up the telegram problem, I could understand
> why you would solve it using your own implementation of pipes.
Well that's a difference between Q&A sites like SO, and discussion
groups like clc++: in the Q&A sites (like SO) there's usually LESS than
meets the eye, while in the discussion groups (like clc++) there's often
MORE than meets the eye.
Or at least it's so in the periods when the groups work as intended, and
for clc++ it was so originally.
Historically clc++ has had two long periods of extreme waywardness. The
first such period almost turned the group into a Windows programming
group, resulting in the creation of the moderated clc++m. In the second
such period, which apparently is just over, the group was dominated by
non-technical people; only Odin knows where they came from.
> [snip] Implementing pipes in a language with coroutines is indeed
> a very simple thing, in fact Python uses generators for almost all for
> loops today.
Well, Python probably has some coroutine facility among its included
batteries, but the generators are, as far as I know, all /stackless/.
The difference is that a generator function yields only in code within
that function itself, so that it only needs a single stack frame of
state, and can be mechanically "inverted" (by the compiler, or by hand)
to be expressed as a member function of an object with that state, plus
just a little more. In contrast, a full (stackful) coroutine can yield
also within a function that it calls, or anywhere, which means that it
requires its own full-blown stack: it's more heavy-weight, and a bit
more general.
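
To make the "inversion" concrete, here is a minimal hand-written sketch
of a generator that yields n, n-1, ..., 1, expressed as an object with a
single resuming member function. The names "Countdown" and "next" are
made up for this illustration; they are not from any library.

```cpp
// Hand-inverted equivalent of a generator that yields n, n-1, ..., 1.
// All the state the generator needs between two yields is the one local
// variable n, so it fits in an object; no separate stack is required.
class Countdown
{
    int n_;

public:
    explicit Countdown( int const n ): n_( n ) {}

    // Each call corresponds to resuming the generator up to its next yield.
    // Returns false when the generator has finished; `value` is then untouched.
    auto next( int& value ) -> bool
    {
        if( n_ <= 0 ) { return false; }
        value = n_--;
        return true;
    }
};
```

A caller drives it with a loop such as "while( gen.next( v ) ) { ... }".
A stackful coroutine can't be flattened this way in general, because the
yield may happen several calls deep.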
> Ignoring the "use pipes" requirement, I am still failing to
> see how this is surprisingly difficult.
Yes, as I mentioned in the original posting, I also fail to see any
difficulty. So then that's two. I think the lecturer blindly passed on
an evaluation from the 1950s or something.
> I am not even convinced that the streaming is part of the specification.
> Loading all of the file into memory and working from there, which leaves
> aside all the buffering issues, would also be a correct solution,
> wouldn't it?
That would probably be necessary for the more sophisticated approach of
trying to optimize some global measure of nice line-justification.
One problem with such an approach is that in the worst case a single
edit can cause rippling both ways, up and down, through megabytes of
text.
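
As an illustration of what such a global measure could look like, here
is a sketch that picks line breaks minimizing the sum of squared unused
columns over all lines except the last. That particular measure, and the
name "optimal_breaks", are my assumptions for the example, not part of
the problem statement. It needs all the words in memory up front.

```cpp
#include <limits>
#include <string>
#include <vector>

// Returns the index of the first word of each line, chosen so that the sum
// of (unused columns)^2 over all lines but the last is minimal.
auto optimal_breaks( std::vector<std::string> const& words, int const max )
    -> std::vector<int>
{
    int const n = static_cast<int>( words.size() );
    long long const inf = std::numeric_limits<long long>::max();
    std::vector<long long> cost( n + 1, inf );  // cost[i]: best cost for words i..n-1
    std::vector<int> next( n + 1, n );          // next[i]: first word of the following line
    cost[n] = 0;
    for( int i = n - 1; i >= 0; --i ) {
        int width = -1;     // Cancels the space counted before the line's first word.
        for( int j = i; j < n; ++j ) {
            width += 1 + static_cast<int>( words[j].length() );
            bool const fits = (width <= max);
            if( !fits && j > i ) { break; }
            // A single over-long word gets a line of its own, at no penalty.
            long long const slack = (fits? max - width : 0);
            long long const penalty = (j == n - 1? 0 : slack*slack);
            if( penalty + cost[j + 1] < cost[i] ) {
                cost[i] = penalty + cost[j + 1];
                next[i] = j + 1;
            }
            if( !fits ) { break; }
        }
    }
    std::vector<int> starts;
    for( int i = 0; i < n; i = next[i] ) { starts.push_back( i ); }
    return starts;
}
```

With the words "aaa bb cc ddddd" and a maximum of 6 this starts lines at
words 0, 1 and 3, i.e. "aaa" / "bb cc" / "ddddd", where the simple greedy
approach gives "aaa bb" / "cc" / "ddddd". Since every break depends on
all the text after it, an edit near the end can move breaks near the
beginning, which is the rippling.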
Two inter-related and more pressing problems: global justification can
be /unpredictable/ for the user, and it can be so complex that it
attracts bugs. For example, using Microsoft Word I sometimes struggle to
get it to adopt a sensible formatting of a paragraph. Sometimes I have
to insert a manual line break, which is very undesirable because it's
invisible in normal editing and then can wreak havoc with later edits,
especially automated ones such as global search and replace. And
regarding bugs, Word has a tendency to not handle page breaks correctly.
Again that requires manual formatting intervention. :(
> [snip]
> That applies to your code also: you use operator >> to tokenize the
> input. AFAIK there is no way to influence operator >> how it should
> treat whitespace. In unicode this is a non-trivial problem, since there
> exist non-breaking spaces and spaces with zero-width, direction-changer
> and similar complicated stuff.
Yeah. But for sure, using Unicode is necessary to do it at all. And in
Windows that means using wide strings and streams (with proper setup).
> [snip]
> but your fix introduces many bugs.
>
>> int const max = stoi( args[1] );
>
> What happens if you pass negative number? What happens if you pass text?
> What happens if you pass more than two arguments or no argument?
> In the latter case I think you are lucky that the code does not invoke UB.
> argv[argc] contains a single NULL byte, i.e. an empty string IIRC.
The code does indeed provoke UB if the program is run without an
argument: args[1] is then a null pointer, not an empty string, and
constructing the std::string argument of stoi from a null pointer is UB.
Such an invocation is a breach of /contract/. When the contract is
breached, anything can happen. This is not a bug. It's ordinary good
design, according to the principles of (1) clear contracts and (2) not
spending time on functionality that may never be needed (Knuth described
how a very elaborate piece of code to handle a special case lay dormant
for several years, and then failed on its first call).
The code expresses the contract very neatly. I chose to express just the
minimal possible contract compatible with the problem requirements.
There are several possible more elaborate contracts, e.g. guaranteed
exception for breach, but IMO this kind of usability is irrelevant.
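
For comparison, a sketch of what the "guaranteed exception for breach"
contract could look like. The helper name "max_width_from" and the
message texts are invented for the illustration; the posted program does
none of this.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns the command line into a maximum line width,
// throwing instead of invoking UB when the contract is breached.
auto max_width_from( int const n_args, char const* const* const args ) -> int
{
    if( n_args != 2 ) {
        throw std::invalid_argument( "expected exactly one argument, the maximum line width" );
    }
    std::size_t n_chars = 0;
    // std::stoi itself throws std::invalid_argument for text with no leading
    // number, and std::out_of_range for a value that doesn't fit in an int.
    int const result = std::stoi( args[1], &n_chars );
    if( args[1][n_chars] != '\0' || result <= 0 ) {
        throw std::invalid_argument( "the maximum line width must be a positive integer" );
    }
    return result;
}
```

In "main" one would then write "int const max = max_width_from( n_args,
args );" inside a try block and report the exception's what() text. It's
roughly four times the code of the original line, for invocations that
the minimal contract simply rules out.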
> [snip]
> This code is almost the same as mine;
Well, your code, which was almost identical to my earlier function
"words_to_lines" in the flow based solution, was pretty close to
reasonably shortest -- it just needed a little pruning for the goal of
shortness.
>> #include <iostream>
>> #include <string>
>> using namespace std;
>>
>> auto main( int, char** args ) -> int
>> {
>> int const max = stoi( args[1] );
> see above
>> wstring word, separator, line;
> separator is not used, is it?
>> while( wcin >> word ) {
>> if( int(line.length() + 1 + word.length()) > max ) {
>> wcout << line << endl;
>> line.clear();
>> }
>> line += (line.length()? L" " : L"") + word;
>
> I don't like treating integers as boolean values; I would rather write
> line.length() > 0, assuming that you did not do that just for brevity?
Heh, you caught me. :)
>> }
>> wcout << line << endl;
>> }
>>
>> In case you'd think there's a missing "return": nope.
>
> I don't understand why you can leave it off
"main" is a very special function and among other special properties, it
has a default function result of 0 in both C (since C99, its
§5.1.2.2.3/1, but not in earlier C89/C90) and C++ (since the first
standard C++98, its §3.6.1/5).
> I believe that this
> solution, at least to the problem as stated, is easier to understand
> than any of your pipeline versions.
Oh, there was just 1 pipeline version.
And as mentioned, that's the version that's almost identical to your
code, or vice versa. ;-)
That's because it expresses the same fundamental idea of treating the
problem as a flow of words between a word extractor and a line composer.
The word flow is the main focus in the code. In contrast, the other two
solutions I posted focused on input lines, and output lines.
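
A sketch of that view with the two ends made explicit, in case it helps.
This is not the code I posted earlier: the names "extract_words",
"Line_composer" and "wrap_text" are invented here, and unlike the short
program above it doesn't emit an empty line before an over-long first
word, or for empty input.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

using Word_sink = std::function<void( std::wstring const& )>;

// Word extractor: pushes each whitespace-delimited word downstream.
void extract_words( std::wistream& in, Word_sink const& sink )
{
    std::wstring word;
    while( in >> word ) { sink( word ); }
}

// Line composer: collects words, and emits a line when the next word won't fit.
class Line_composer
{
    std::wostream&  out_;
    std::size_t     max_;
    std::wstring    line_;

public:
    Line_composer( std::wostream& out, std::size_t const max ): out_( out ), max_( max ) {}

    void add( std::wstring const& word )
    {
        if( !line_.empty() && line_.length() + 1 + word.length() > max_ ) {
            out_ << line_ << L'\n';
            line_.clear();
        }
        if( !line_.empty() ) { line_ += L' '; }
        line_ += word;
    }

    // Emits the last, partially filled line, if any.
    void finish()
    {
        if( !line_.empty() ) { out_ << line_ << L'\n'; }
        line_.clear();
    }
};

// Wires the two ends together, here over in-memory streams.
auto wrap_text( std::wstring const& text, std::size_t const max ) -> std::wstring
{
    std::wistringstream in( text );
    std::wostringstream out;
    Line_composer composer( out, max );
    extract_words( in, [&]( std::wstring const& word ) { composer.add( word ); } );
    composer.finish();
    return out.str();
}
```

The same wiring works with wcin and wcout in place of the string
streams; the only thing flowing between the two ends is words.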