Concatenating strings efficiently

Paul

unread,

Oct 17, 2018, 5:44:09 AM10/17/18

to

A popular coding-interview question is copy-pasted below.

String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).

At the end of the post, I will put my ultra-naive C++ solution. The standard
way of approaching this problem (which will be tackled by users of other
languages, too) assumes that operations like += are inefficient because
s+="12" involves moving everything in the string.
The conventional wisdom is therefore that you need to do something more
sophisticated like creating a stringBuilder class.
But aren't basic string operations like "+=" optimised for efficiency in C++?

Is there anything wrong with my solution below?
Thanks,

Paul

#include <iostream>
#include <string>
// Run length encoding as in CTCI
// Return a string which represents
// the number of times each letter occurred in a run.

std::string encode(const std::string& str)
{
std::string result;
if(str.empty())
return result;

char prev = str[0];
int counter = 1;
for(int i = 1; i < str.size(); ++i)
if(str[i] == prev)
++counter;
else
{
result += prev;
result += std::to_string(counter);
prev = str[i];
counter = 1;
}
result += prev;
result += std::to_string(counter);

return result;
}

int main() // Some basic tests which all pass.
{
std::cout << encode("a") << "\n" << encode ("aaa") << "\n" << encode ("aabbc");

}

Jorgen Grahn

unread,

Oct 17, 2018, 6:39:41 AM10/17/18

to

On Wed, 2018-10-17, Paul wrote:
> A popular coding-interview question is copy-pasted below.
>
> String Compression: Implement a method to perform basic string
> compression using the counts of repeated characters. For example,
> the string aabcccccaaa would become a2blc5a3. You can assume the
> string has only uppercase and lowercase letters (a - z).
>
> At the end of the post, I will put my ultra-naive C++ solution. The
> standard way of approaching this problem (which will be tackled by
> users of other languages, too) assumes that operations like += are
> inefficient because s+="12" involves moving everything in the
> string. The conventional wisdom is therefore that you need to do
> something more sophisticated like creating a stringBuilder class.

There's one already: std::ostringstream.

> But aren't basic string operations like "+=" optimised for
> efficiency in C++?

Yes, in the sense that it's almost never something to worry about.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Paul

unread,

Oct 17, 2018, 7:37:20 AM10/17/18

to

Hi Jorgen,

Thanks for your reply.
But should ostringstream be used or is my solution fine?
Is the text wrong for suggesting that this is a problem.
Admittedly, the main audience is java coders but it should
at least say something like "If you're using C++, the naive
solution works perfectly fine, as is."
Here's what the text says about a java solution that basically
translates my solution to java.

The runtime is O(p + k^2), where p is the size of the original string and k is the number of character sequences. For example, if the string is aabccdeeaa, then there are six character sequences. It's slow because string concatenation operates in O( n^2) time (see StringBuilder on pg 89).
We can fix this by using a StringBuilder. ...

Paul

Öö Tiib

unread,

Oct 17, 2018, 8:19:26 AM10/17/18

to

The string::operator+= is implemented using
string::append not using operator+(string,string)
like the naive document seems to claim. If you want to
optimize away any mid-way allocations then do
result.reserve(str.size()*2) first since increasing
by factor of 2 is what that "packing" usually
achieves.

Öö Tiib

unread,

Oct 17, 2018, 8:47:53 AM10/17/18

to

On Wednesday, 17 October 2018 15:19:26 UTC+3, Öö Tiib wrote:
> If you want to
> optimize away any mid-way allocations then do
> result.reserve(str.size()*2) first since increasing
> by factor of 2 is what that "packing" usually
> achieves.

Forgot to mention that this approach (reserve +
appends) usually beats all those ostringstream
solutions by some margin.

Paul

unread,

Oct 17, 2018, 9:29:23 AM10/17/18

to

In case we attack the "naive document" too strongly,
the context is that this is a programming problem to be solved
in any language. The majority of people would choose java.
So the claim about needing a StringBuilder is aimed at java
coders. Is it still wrong? Why would java and C++ have different
+= functions?

Paul

Jorgen Grahn

unread,

Oct 17, 2018, 9:40:09 AM10/17/18

to

On Wed, 2018-10-17, Paul wrote:
> On Wednesday, October 17, 2018 at 11:39:41 AM UTC+1, Jorgen Grahn wrote:
>> On Wed, 2018-10-17, Paul wrote:
>> > A popular coding-interview question is copy-pasted below.
>> >
>> > String Compression: Implement a method to perform basic string
>> > compression using the counts of repeated characters. For example,
>> > the string aabcccccaaa would become a2blc5a3. You can assume the
>> > string has only uppercase and lowercase letters (a - z).
>> >
>> > At the end of the post, I will put my ultra-naive C++ solution. The
>> > standard way of approaching this problem (which will be tackled by
>> > users of other languages, too) assumes that operations like += are
>> > inefficient because s+="12" involves moving everything in the
>> > string. The conventional wisdom is therefore that you need to do
>> > something more sophisticated like creating a stringBuilder class.
>>
>> There's one already: std::ostringstream.
>>
>> > But aren't basic string operations like "+=" optimised for
>> > efficiency in C++?
>>
>> Yes, in the sense that it's almost never something to worry about.

> Thanks for your reply.
> But should ostringstream be used or is my solution fine?

I like ostringstreams because they're streams, and I like Unix
pipelines. If your function read from an istream and wrote to an
ostream, it would be useful not only for short strings but for
gigabytes of data.

I also prefer ostream formatting to foo + std::to_string(bar) which
smells like Java and doesn't let you control the formatting of bar.

On the other hand, if you just need to concatenate three strings into
a fourth string, it's clearer to write a + b + c than to create an
ostringstream, stream into it, and pull out the resulting string.

> Is the text wrong for suggesting that this is a problem.

> Admittedly, the main audience is java coders but it should
> at least say something like "If you're using C++, the naive
> solution works perfectly fine, as is."
> Here's what the text says about a java solution that basically
> translates my solution to java.
>
> The runtime is O(p + k^2), where p is the size of the original
> string and k is the number of character sequences. For example, if
> the string is aabccdeeaa, then there are six character
> sequences. It's slow because string concatenation operates in O(
> n^2) time (see StringBuilder on pg 89). We can fix this by using a
> StringBuilder. ...

I think Tiib answered this. AFAICT, that text is about Java, and you
cannot apply it to programming in general.

Bo Persson

unread,

Oct 17, 2018, 10:58:18 AM10/17/18

to

Why would they not be different?

A C++ std::string is mutable, so can be modified in place. If it has
enough extra room at the end, appending a (small) string can be
extremely fast.

And replacing "aaa" with the shorter "a3" can always be done in place,
without any reallocation.

A Java string cannot be modified once created, so we need a separate
StringBuilder to build strings.

Bo Persson

Manfred

unread,

Oct 17, 2018, 11:19:34 AM10/17/18

to

On 10/17/2018 2:19 PM, Öö Tiib wrote:
> If you want to
> optimize away any mid-way allocations then do
> result.reserve(str.size()*2) first since increasing
> by factor of 2 is what that "packing" usually
> achieves.

In fact that is the worst case length of the encoded string.

bol...@cylonhq.com

unread,

Oct 17, 2018, 11:25:16 AM10/17/18

to

On Wed, 17 Oct 2018 16:58:00 +0200
Bo Persson <b...@gmb.dk> wrote:
>A Java string cannot be modified once created, so we need a separate
>StringBuilder to build strings.

The more I find out about Java the more I wonder what Gosling was smoking
when he "designed" it.

Richard

unread,

Oct 17, 2018, 1:19:31 PM10/17/18

to

[Please do not mail me a copy of your followup]

Paul <peps...@gmail.com> spake the secret code
<75361e24-4863-4af4...@googlegroups.com> thusly:

>Is there anything wrong with my solution below?

No. It's fine because it passes your tests and it's not worth
optimizing further unless your profiling shows this to be a
bottleneck.

As an exercise, I took your problem and wrote an alternative solution
just for the fun of it. I'm not saying mine is "better", just an
alternative. I had fun writing it, thanks for the problem idea!

#include <algorithm>
#include <cassert>
#include <iostream>
#include <string>

std::string encode(const std::string & str)
{
std::string result{str};
size_t dest = 0;
int counter = 0;
for (size_t source = 0; source < str.size(); ++source)
{
if (str[source] == str[dest])
{
if (counter == 0)
{
result[dest++] = str[source];
}
++counter;
}
else
{
if (counter > 1)
{
std::string count = std::to_string(counter);
std::copy(count.begin(), count.end(), &result[dest]);
dest += count.size();
counter = 1;
}
result[dest++] = str[source];
}
}
if (counter > 1)
{
std::string count = std::to_string(counter);
std::copy(count.begin(), count.end(), &result[dest]);
dest += count.size();
}
if (dest < str.size())
{
result.erase(dest);
}
return result;
}

int main()
{
assert(std::string{"a"} == encode("a"));
assert(std::string{"a3"} == encode("aaa"));
assert(std::string{"a2b2c"} == encode("aabbc"));
assert(std::string{"a3bc3d"} == encode("aaabcccd"));
}
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Öö Tiib

unread,

Oct 17, 2018, 2:15:48 PM10/17/18

to

Because Java does not have keywords to denote immutability. Therefore
in Java StringBuilder is a mutable sequence of characters and String
is immutable sequence of characters. You assume that since Java needs
other class but String for efficiency concatenating strings then C++
should have same problem.

That is not the case, std::string does it alone and already very
efficiently. In C++ std::string is mutable sequence of characters
and const std::string is immutable sequence of characters.

Additionally C++17 added std::string_view that can be used to
refer to any string, character array or sub-part of such. When
using it with string literals it can be compile-time constant.
So from C++ job candidate who claims senior level I might ask to
implement a program that does such "compression" compile-time.
Such request would not make even sense in Java. ;)

Chris M. Thomasson

unread,

Oct 17, 2018, 5:34:29 PM10/17/18

to

On 10/17/2018 2:43 AM, Paul wrote:
> A popular coding-interview question is copy-pasted below.
>
> String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).
>
> At the end of the post, I will put my ultra-naive C++ solution. The standard
> way of approaching this problem (which will be tackled by users of other
> languages, too) assumes that operations like += are inefficient because
> s+="12" involves moving everything in the string.
> The conventional wisdom is therefore that you need to do something more
> sophisticated like creating a stringBuilder class.
> But aren't basic string operations like "+=" optimised for efficiency in C++?
>
> Is there anything wrong with my solution below?
> Thanks,

[snip code]
>

That works fine. Since we can assume the string only has upper and lower
case (a - z), we can store abcdxxxxxefgh as abcdx5efgh instead of:
a1b1c1d1x5e1f1g1h1.

The decoder can check for bytes that are out-of-range, perhaps
isdigit(). If a byte is outside the range, then it is reading a
character count.

bitrex

unread,

Oct 17, 2018, 10:49:50 PM10/17/18

to

On 10/17/2018 05:43 AM, Paul wrote:
> A popular coding-interview question is copy-pasted below.
>
> String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).

Interviewer has to be a big JERK to ask to store "b1" instead of just b"
and call it a "compression" algorithm, lol. or "a2" for that matter...

this is what I would call an "ultra-naive" solution I suppose:

#include <deque>
#include <stack>
#include <iostream>

std::string encode(const std::string& long_string) {
if (long_string.empty()) {
return long_string;
}

std::deque<std::stack<char>> stack_queue;
stack_queue.emplace_back(std::stack<char>{});
stack_queue.back().push(long_string.at(0)); // push first character

for (auto foo_it = long_string.begin() + 1; foo_it != long_string.end();
++foo_it) {
if (stack_queue.back().top() == *foo_it) {
stack_queue.back().push(*foo_it);
} else {
stack_queue.emplace_back(std::stack<char>{});
stack_queue.back().push(*foo_it);
}
}

std::string result;

for (const auto& stack : stack_queue) {
switch (stack.size()) {
case 2:
result += stack.top();
case 1:
result += stack.top();
break;

default:
result += stack.top();
result += std::to_string(stack.size());
}
}

return result;
}

int main() {
std::string foostring = "abbcccddddeeeeeffffff";
std::cout << encode(foostring) << std::endl;
}

Juha Nieminen

unread,

Oct 18, 2018, 4:03:48 AM10/18/18

to

bitrex <us...@example.net> wrote:
>> String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).
>
> Interviewer has to be a big JERK to ask to store "b1" instead of just b"
> and call it a "compression" algorithm, lol. or "a2" for that matter...

Where do you see a "b1"?

Juha Nieminen

unread,

Oct 18, 2018, 4:05:44 AM10/18/18

to

Paul <peps...@gmail.com> wrote:
> At the end of the post, I will put my ultra-naive C++ solution. The standard
> way of approaching this problem (which will be tackled by users of other
> languages, too) assumes that operations like += are inefficient because
> s+="12" involves moving everything in the string.

If you were to create this kind of "compression" format, surely you can
spare the first few characters to store the length of the original string?

If not, you can traverse the "compressed" string once and resolve the
original length that way. (Of course you would have to make measurements
to see if it actually makes it faster.)

Ian Collins

unread,

Oct 18, 2018, 4:09:39 AM10/18/18

to

On 18/10/18 15:49, bitrex wrote:
> On 10/17/2018 05:43 AM, Paul wrote:
>> A popular coding-interview question is copy-pasted below.
>>
>> String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).
>
> Interviewer has to be a big JERK to ask to store "b1" instead of just b"
> and call it a "compression" algorithm, lol. or "a2" for that matter...

https://en.wikipedia.org/wiki/Run-length_encoding

--
Ian.

Öö Tiib

unread,

Oct 18, 2018, 9:55:31 AM10/18/18

to

That is perfect, then it is sure that there won't be reallocations.
For example if to compress the text of current post with that
algorithm then it will result in bit shorter than 2 times longer
size but the difference would be 9 bytes or so. If that matters
then we can use result.shrink_to_fit() at end.

Scott Lurndal

unread,

Oct 18, 2018, 10:51:57 AM10/18/18

to

probably depends on the font the NNTP client is using. Looks like bee ell
with the font -adobe-courier-medium-r-normal--11-80-100-100-m-60-iso8859-1.

Öö Tiib

unread,

Oct 18, 2018, 10:56:17 AM10/18/18

to

On Thursday, 18 October 2018 05:49:50 UTC+3, bitrex wrote:
> On 10/17/2018 05:43 AM, Paul wrote:
> > A popular coding-interview question is copy-pasted below.
> >
> > String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).
>
> Interviewer has to be a big JERK to ask to store "b1" instead of just b"
> and call it a "compression" algorithm, lol. or "a2" for that matter...

I also think that packing one character as "b1" is a waste.
However indicating 2 can be profitable. Majority of char strings that
we ever face are UTF8 encoded. Then "☠☠" could be compressed
as "☠2". That is shorter by two bytes (because "☠" takes three
bytes in UTF8).

What could still cause increase of result would be number characters
in input text. These should be escaped with something, for example
with '\xFE' or '\xFF' that are only used by its BOM that uses both and
is usually missing anyway.

bitrex

unread,

Oct 18, 2018, 2:38:55 PM10/18/18

to

Interview questions can be little tricky, I think it's fine to ask
"write a program that encodes a string this way: a2b1c5a3" without
further commentary on some hypothetical real-world purpose, as a question.

Yes interviewer can say "this is a string-compression algorithm you're
writing" yeah a SHITTY string-compression algorithm. One doesn't really
want to get into a debate over run-length encoding strategies with an
interviewer, however I don't think. So one is left to guess whether they
really mean what they say.

If I'm ever in the interviewer-position I think I'll try to keep it
abstract and not add a lot of my own verbage as to how the test might
relate to some real world task it cuts down on chance for
mis-interpretation.

I also think interviewers who e.g. knock off points for a solution that
say, uses a loop to sum the first ten integers rather than the
closed-form summation formula is lame. Quick-and-correct solution in
interview situation should beat clever-and-possibly-error-prone
solution. If interviewer wants clever/efficient then interviewer should
ask for that, specifically. i.e. in the OP's post the subject says
"efficient" but the text of the post says nothing about efficiency
requirement.

Maybe this seems a little nitpicky and it may be but damn people lose
out on good jobs they want over stuff like this!

james...@alumni.caltech.edu

unread,

Oct 18, 2018, 3:11:31 PM10/18/18

to

The input string doesn't contain any 'l' characters, so using 'l' rather than '1' was probably a typo.

Chris M. Thomasson

unread,

Oct 18, 2018, 4:42:55 PM10/18/18

to

On 10/17/2018 7:49 PM, bitrex wrote:
> On 10/17/2018 05:43 AM, Paul wrote:
>> A popular coding-interview question is copy-pasted below.
>>
>> String Compression: Implement a method to perform basic string
>> compression using the counts of repeated characters. For example, the
>> string aabcccccaaa would become a2blc5a3. You can assume the string
>> has only uppercase and lowercase letters (a - z).
>
> Interviewer has to be a big JERK to ask to store "b1" instead of just b"
> and call it a "compression" algorithm, lol. or "a2" for that matter...

;^)

The "a2" just makes the decoder "slower". Imho, "aa" versus "a2" equal
no savings, and just makes the logic more complicated. However, "aaa"
versus "a3" _does_ make things a little smaller...

[...]

bitrex

unread,

Oct 18, 2018, 6:03:09 PM10/18/18

to

It was probably done that way in like, friggin', 1967 as someone pointed
out with a wikipedia entry which mentioned run-length-encoding of analog
B&W television signals because the computer doing it had 12 bytes of RAM
and anything more sophisticated needed a government UNIVAC machine to
implement.

why am I implementing an encoding scheme for black and white television
on this test?!?!??!!?

bitrex

unread,

Oct 18, 2018, 6:08:44 PM10/18/18

to

On 10/18/2018 04:42 PM, Chris M. Thomasson wrote:

I like avoiding solutions for any problem that involve raw integer
counters being incremented and/or decremented like in the original
solution. Raw counters are the flippin' devil!

Alf P. Steinbach

unread,

Oct 18, 2018, 8:39:47 PM10/18/18

to

On 17.10.2018 11:43, Paul wrote:
> A popular coding-interview question is copy-pasted below.
>
> String Compression: Implement a method to perform basic string
> compression using the counts of repeated characters. For example, the
> string aabcccccaaa would become a2blc5a3. You can assume the string
> has only uppercase and lowercase letters (a - z).

For most strings encountered by a program this will be string
/expansion/, not compression.

I'm not sure about what interviewers of fresh-from-college new employees
know or understand these days, but as an interviewer I'd expect the
prospective employee to note that there is a definite potential for
improvement in the scheme.

> At the end of the post, I will put my ultra-naive C++ solution. The standard
> way of approaching this problem (which will be tackled by users of other
> languages, too) assumes that operations like += are inefficient because
> s+="12" involves moving everything in the string.

I've never heard of a “standard way” to do that, and the assumption that
`+=` is inefficient, is wrong.

> The conventional wisdom is therefore that you need to do something more
> sophisticated like creating a stringBuilder class.
> But aren't basic string operations like "+=" optimised for efficiency in C++?

They are.

And from C++11 there are overloads that pick up the temporary nature of
partial results in e.g. `s = string() + "a" + "b" + "c" + "d" + "e";`.
The complexity is linear in the final result size. Maybe the very
incorrect “wisdom” you heard was for C++03, or Java.

C++17 § 24.3.3.1 paragraphs 1 and 2:
<paraphrase>
template<class charT, class traits, class Allocator>
auto operator+(
basic_string<charT, traits, Allocator>&& lhs,
const basic_string<charT, traits, Allocator>& rhs
)
-> basic_string<charT, traits, Allocator>;

Returns: std::move(lhs.append(rhs))
</paraphrase>

Also do ignore the advice you've got about using `ostringstream`. That
introduces inefficiencies in a number of ways, including that it
needlessly involves the C++ locale machinery.

> Is there anything wrong with my solution below?

Well, let's look at it.

> #include <iostream>

No no, don't include that in a header, if you can make do with `<iosfwd>`.

I guess it's only for the `main` function.

Then I'd put that `#include` just before the `main` function.

> #include <string>
> // Run length encoding as in CTCI
> // Return a string which represents
> // the number of times each letter occurred in a run.
>
> std::string encode(const std::string& str)
> {
> std::string result;
> if(str.empty())
> return result;

My eyes find `str == ""` to be a more clear condition.

Less words to digest.

> char prev = str[0];
> int counter = 1;

At this point you know that the result size will be roughly 2*n where n
is the size of `str`. Maybe a little more. So I'd do a

const int n = str.size();
result.reserve( 3*n );

> for(int i = 1; i < str.size(); ++i)

Comparing signed and unsigned will make compilers cough up a warning.

If you didn't already have a definition of `n` then a good way to both
avoid the warning and guarantee a single call to `size` would be

for( int i = 1, n = str.size(); i < n; ++i )

But let's assume `n` is defined, as shown. Also, since you can /assume/
that the string contains only letters, and not arbitrary binary data,
I'd make that

for( int i = 1; i <= n; ++i )

... which takes care of processing also the last character within the
loop body.

And which gives an opportunity to explain to the interviewer how that is
well-defined behavior.

> if(str[i] == prev)
> ++counter;
> else
> {
> result += prev;
> result += std::to_string(counter);
> prev = str[i];
> counter = 1;
> }

Uhm, indentation. Tip: free programs such AStyle can format your code
for you. Also most programmers' editors have that functionality, or can
use e.g. that the AStyle program as a plug-in.

> result += prev;
> result += std::to_string(counter);

See above; it can be avoided.

> return result;
> }
>
> int main() // Some basic tests which all pass.
> {
> std::cout << encode("a") << "\n" << encode ("aaa") << "\n" << encode ("aabbc");
>
> }

Cheers & hth.,

- Alf

Jorgen Grahn

unread,

Oct 19, 2018, 1:34:08 AM10/19/18

to

On Thu, 2018-10-18, bitrex wrote:
...

> Interview questions can be little tricky, I think it's fine to ask
> "write a program that encodes a string this way: a2b1c5a3" without
> further commentary on some hypothetical real-world purpose, as a question.
>
> Yes interviewer can say "this is a string-compression algorithm you're
> writing" yeah a SHITTY string-compression algorithm. One doesn't really
> want to get into a debate over run-length encoding strategies with an
> interviewer, however I don't think. So one is left to guess whether they
> really mean what they say.

> If I'm ever in the interviewer-position I think I'll try to keep it
> abstract and not add a lot of my own verbage as to how the test might
> relate to some real world task it cuts down on chance for
> mis-interpretation.

YMMV, but I think the interview is more useful (to both parts) if
it's about more than coding to specifications, no matter how silly.
(Unless the job is like that, which no job should be.)

> I also think interviewers who e.g. knock off points for a solution that
> say, uses a loop to sum the first ten integers rather than the
> closed-form summation formula is lame. Quick-and-correct solution in
> interview situation should beat clever-and-possibly-error-prone
> solution. If interviewer wants clever/efficient then interviewer should
> ask for that, specifically. i.e. in the OP's post the subject says
> "efficient" but the text of the post says nothing about efficiency
> requirement.

If I was the interviewer I'd want the interviewee to ask for
efficiency requirements, or go for (as you say) quick-and-correct, but
explain that that's what she does. Anyway, I wouldn't like to see her
getting bogged down in micro-optimizations noone asked for.

> Maybe this seems a little nitpicky and it may be but damn people lose
> out on good jobs they want over stuff like this!

Don't know about where you are, but here it's just as much about the
interviewee finding out if the job is any good. Although it's unclear
if test like this one helps anyone.

Ralf Goertz

unread,

Oct 19, 2018, 3:26:38 AM10/19/18

to

I certainly see why that is the case. If there are no runs (immediate
repetitions) one would need to write a 1 after each letter. OTOH each
run gets either encoded to a sequence of the same (length 2) or shorter
(length >2) size. But what about the '\0' at the end of the string?
Since I learned elsethread that str[str.size()] is valid, doesn't it
need to be stored as well? So if str is "ab" in reality three chars are
stored, right? If we reserve four chars for result it is not enough for
the five chars needed for "a1b1". I guess this is taken care of by
always reserving one char more than asked for.

Manfred

unread,

Oct 19, 2018, 6:57:34 AM10/19/18

to

In the context of std::string "ab" is 2 chars long, and calling
result.reserve(2*str.size()) will guarantee that result will accommodate
"a1b1" (i.e. 4 chars) with no reallocation (see also capacity()).
In both cases the terminating '\0' is handled transparently by std::string.

Jorgen Grahn

unread,

Oct 19, 2018, 7:56:50 AM10/19/18

to

A better way of stating it is: there *is* no zero-termination with
std::string, except in the C strings compatibility part of the
interface.

And you can have a std::string which /contains/ '\0'. It's just
another character.

Ralf Goertz

unread,

Oct 19, 2018, 7:59:05 AM10/19/18

to

Am Fri, 19 Oct 2018 12:57:22 +0200
schrieb Manfred <non...@add.invalid>:

> In the context of std::string "ab" is 2 chars long, and calling
> result.reserve(2*str.size()) will guarantee that result will
> accommodate "a1b1" (i.e. 4 chars) with no reallocation (see also
> capacity()). In both cases the terminating '\0' is handled
> transparently by std::string.

Hm, transparently means that when I access str[str.size()] it will
internally be handled differently than str[str.size()-1] (assuming that
str is not empty)? I always thought operator [] does unchecked access so
that it doesn't care whether it crosses boundaries. How can that work at
runtime when the str is indexed is a variable whose value happens to be
str.size()?

james...@alumni.caltech.edu

unread,

Oct 19, 2018, 9:35:44 AM10/19/18

to

On Friday, October 19, 2018 at 7:56:50 AM UTC-4, Jorgen Grahn wrote:
...

> A better way of stating it is: there *is* no zero-termination with
> std::string, except in the C strings compatibility part of the
> interface.

Do you consider std::string::operator[] and std::string::at() to be in
the C strings compatibility part of the interface? 20.3.2.5p2 requires
that, given std::string<charT> str, then str[str.size()] return "a
reference to an object of type charT with value charT()". Paragraph 6
requires the same of str.at(str.size()).

Alf P. Steinbach

unread,

Oct 19, 2018, 12:37:11 PM10/19/18

to

On 19.10.2018 02:39, Alf P. Steinbach wrote:
> [snip]

> At this point you know that the result size will be roughly 2*n where n
> is the size of `str`. Maybe a little more. So I'd do a
>
> const int n = str.size();
> result.reserve( 3*n );

Uhm, that was dumb. The result size will never be >2n. So,

result.reserve( 2*n );

Sorry, not sure where that blindness came from. :(

Cheers!,

- Alf

Manfred

unread,

Oct 19, 2018, 12:50:45 PM10/19/18

to

On 10/19/2018 1:58 PM, Ralf Goertz wrote:
> Am Fri, 19 Oct 2018 12:57:22 +0200
> schrieb Manfred <non...@add.invalid>:
>
>> In the context of std::string "ab" is 2 chars long, and calling
>> result.reserve(2*str.size()) will guarantee that result will
>> accommodate "a1b1" (i.e. 4 chars) with no reallocation (see also
>> capacity()). In both cases the terminating '\0' is handled
>> transparently by std::string.
>
> Hm, transparently means that when I access str[str.size()] it will
> internally be handled differently than str[str.size()-1] (assuming that
> str is not empty)?

No, transparently means that reserve(X) would internally reserve (X+1),
as an example, in the same way as capacity() would return
(__internal_bufsize-1).

I always thought operator [] does unchecked access so
> that it doesn't care whether it crosses boundaries. How can that work at
> runtime when the str is indexed is a variable whose value happens to be
> str.size()?
>

Pre C++11 std::string's /could/ work this way (having something like: if
pos == size() return __nulchar; in the implementation of
operator[](size_t pos)).
Nonetheless the additional requirements of C++11 and after do complicate
things.
In any case, this is about implementation details that are out of the
scope of the dictate of the standard.

Manfred

unread,

Oct 19, 2018, 12:58:08 PM10/19/18

to

I was pretty much tempted to write so, but then I realized that the
example given by cppreference.com to access the string buffer:

const char* c = &str[0];

(https://en.cppreference.com/w/cpp/string/basic_string/operator_at)
does make it /very/ difficult to have a std::string buffer without a
terminating '\0'.
That said, also because of your note below, I agree that std::string's
are best manipulated without assuming a terminating '\0';
This just yields to better code, since the rationale of std::string was
meant to be alternative to C strings.

Jorgen Grahn

unread,

Oct 19, 2018, 3:16:58 PM10/19/18

to

The short answer is I simplified my answer, ignoring the special
s[s.end()] feature discussed elsewhere.

But also, I guess I fail to see any use for that feature. One of the
most important things to remember about the C++ containers is to never
go past their std::end(), to manage the invariants around that.
I don't need or want relaxed rules for one of them.

Jorgen Grahn

unread,

Oct 19, 2018, 3:36:44 PM10/19/18

to

Yes. I didn't mean to disagree with you, but Ralf seemed to have the
misconception that there is a terminating \0 in any std::string -- thus
carrying an unnecessary handicap over from C into C++.

(Not that there's really a '\0' in any C string either; it's halfway
there, halfway not. Traditionally, a fertile source of bugs.)

>> And you can have a std::string which /contains/ '\0'. It's just
>> another character.

/Jorgen

Alf P. Steinbach

unread,

Oct 19, 2018, 4:25:24 PM10/19/18

to

On 19.10.2018 21:36, Jorgen Grahn wrote:
> [snip]

> (Not that there's really a '\0' in any C string either; it's halfway
> there, halfway not. Traditionally, a fertile source of bugs.)

Possibly getting more acute with `std::string_view`.

Apparently there's a meme going around that the modern way to accept a
string argument is to use `std::string_view` as formal argument type.

That accepts both a C string and a `std::string`. OK so far. But it also
accepts a substring without zero-termination, so if that string is
passed further on to some C function, one could be in dire straits.

Cheers!,

- Alf

Pavel

unread,

Oct 20, 2018, 2:54:06 AM10/20/18

to

Paul wrote:
> A popular coding-interview question is copy-pasted below.
>
> String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).
>

> At the end of the post, I will put my ultra-naive C++ solution. The standard
> way of approaching this problem (which will be tackled by users of other
> languages, too) assumes that operations like += are inefficient because
> s+="12" involves moving everything in the string.

> The conventional wisdom is therefore that you need to do something more
> sophisticated like creating a stringBuilder class.
> But aren't basic string operations like "+=" optimised for efficiency in C++?
>

> Is there anything wrong with my solution below?

It depends on your definition of "wrong". Below are a few thoughts:

1. encode() should provide required functionality most of the time (assuming the
current locale-dependent encoding of counts was agreed with the interviewer).
This is not wrong.
2. encode() may fail under extreme memory pressure where more memory-efficient
solution would work. Whether this is wrong depends on whether the interviewee
asked the interviewer all the correct questions and on interviewer answers. (*)
3. encode() is not most memory-efficient and, at most architectures, not most
time-efficient. Whether this is wrong depends on (*).

A more memory-efficient (and probably more time-efficient) solution would
calculate the exact capacity required by the result, then use only one memory
allocation to reserve that capacity in the result, then actually encode. The
interviewee would have to solve a couple of challenges to implement it that
would allow him or her demonstrate his or her more advanced skills in this
solution, specifically:

- how to avoid code duplication (demonstrate knowledge of "template method"
design pattern and maybe of lambdas);
- how to correctly and efficiently compute the encoded size of to-be-encoded
string::size_type value, possibly making it a reusable [template] function;
- reason about whether locale-independent (or fixed "C" locale) encoding of
size_type is more appropriate for this problem; demonstrate how to encode
string::size_type to a pre-existing memory buffer without additional allocations;
- managerial skills (start with a simple solution, accurately estimate for the
interviewer the benefits and implementation cost for every following improvement).
- maybe more (throw in parallelism?)

Long story short, this interview question can turn out more or less simple (and
the ultra-naive C++ solution -- more or less wrong) depending on how far the
interviewer plans to go with it and whether interviewee is willing to play
along. This may be one reason for why it is popular.

HTH
-Pavel

> Thanks,
>
> Paul
>
> #include <iostream>

> #include <string>
> // Run length encoding as in CTCI
> // Return a string which represents
> // the number of times each letter occurred in a run.
>
> std::string encode(const std::string& str)
> {
> std::string result;
> if(str.empty())
> return result;
>

> char prev = str[0];
> int counter = 1;

> for(int i = 1; i < str.size(); ++i)

> if(str[i] == prev)
> ++counter;
> else
> {
> result += prev;
> result += std::to_string(counter);
> prev = str[i];
> counter = 1;
> }
> result += prev;
> result += std::to_string(counter);
>

Ralf Goertz

unread,

Oct 20, 2018, 3:03:40 AM10/20/18

to

Am 19 Oct 2018 19:36:30 GMT
schrieb Jorgen Grahn <grahn...@snipabacken.se>:

> Yes. I didn't mean to disagree with you, but Ralf seemed to have the
> misconception that there is a terminating \0 in any std::string --
> thus carrying an unnecessary handicap over from C into C++.

Actually, I didn't think that. Having programmed in C++ for more than
two decades (although not as a professional programmer but in order to
do my job efficiently) I could not imagine it would be legal to access a
string in that way. My "misconception" was that I thought that string
was basically a class which (among others, notably iterators) contains
two size_t variables, one containing the capacity and the other the size
of the string. Accessing it beyond its size would be UB but wouldn't
fail as long as it is within the bounds of the capacity. Therefore,
operator[](size_t pos) would do nothing more than returning the
character stored at "pos". The legality of using str.size() as index
complicates things as Manfred put it and I still see no point in being
allowed to do so. (I guess this is true for you as well if I read your
other post correctly.) I know very little about optimisation but I think
the need to check for one special value of "pos" can a heavy burden
performance-wise.

> (Not that there's really a '\0' in any C string either; it's halfway
> there, halfway not. Traditionally, a fertile source of bugs.)

Is there not? I thought that's how strings are terminated in C.

Paul

unread,

Oct 20, 2018, 11:24:12 AM10/20/18

to

Very interesting thoughts. Any ideas for references to implement some of
these more advanced ideas? I'm googling the template design method and
everything else in your post to try to create a better solutions to learn
from this interview, and I think I would do much better with more
concrete guidance.
Many thanks,

Paul

Manfred

unread,

Oct 20, 2018, 2:45:57 PM10/20/18

to

On 10/20/2018 8:53 AM, Pavel wrote:
> Paul wrote:
>> A popular coding-interview question is copy-pasted below.
>>
>> String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. You can assume the string has only uppercase and lowercase letters (a - z).
>>
>> At the end of the post, I will put my ultra-naive C++ solution. The standard
>> way of approaching this problem (which will be tackled by users of other
>> languages, too) assumes that operations like += are inefficient because
>> s+="12" involves moving everything in the string.
>> The conventional wisdom is therefore that you need to do something more
>> sophisticated like creating a stringBuilder class.
>> But aren't basic string operations like "+=" optimised for efficiency in C++?
>>
>> Is there anything wrong with my solution below?
>
> It depends on your definition of "wrong". Below are a few thoughts:
>

[...]

>
> A more memory-efficient (and probably more time-efficient) solution would
> calculate the exact capacity required by the result, then use only one memory
> allocation to reserve that capacity in the result, then actually encode. The
> interviewee would have to solve a couple of challenges to implement it that
> would allow him or her demonstrate his or her more advanced skills in this
> solution, specifically:
>
> - how to avoid code duplication (demonstrate knowledge of "template method"
> design pattern and maybe of lambdas);
> - how to correctly and efficiently compute the encoded size of to-be-encoded
> string::size_type value, possibly making it a reusable [template] function;
> - reason about whether locale-independent (or fixed "C" locale) encoding of
> size_type is more appropriate for this problem; demonstrate how to encode
> string::size_type to a pre-existing memory buffer without additional allocations;
> - managerial skills (start with a simple solution, accurately estimate for the
> interviewer the benefits and implementation cost for every following improvement).
> - maybe more (throw in parallelism?)

OTOH the proposal would have the side effect of being unusable with
stream I/O of indefinite length.
Although this is not a requirement posed by the assignment as is, it is
relevant in the sense that such a RLE algorithm is independent of the
length of the string, i.e. it can be implemented as an inline filter.
Constraining it to strings whose length is to be known in advance would
be a significant limitation of its potential applications.

This is one example where optimization should be evaluated in the light
of the specific application.

For the sake of curiosity, here is how it would like to make it entirely
streamlined:

===============================================================
#include <iterator>
#include <cstdlib>

namespace RleIt
{

using Char = char;
using InputIterator = std::istreambuf_iterator<Char>;
using OutputIterator = std::ostreambuf_iterator<Char>;

// this is one solution to avoid allocating a string for the
// formatted counter
static void encodeDecimal(OutputIterator& outIt, long val)
{
if(0 < val)
{
std::ldiv_t d = std::div(val, 10L);

encodeDecimal(outIt, d.quot);
// this only works if ascii is a subset of Char
*outIt++ = OutputIterator::traits_type::to_char_type('0' + d.rem);
}
}

void encode(InputIterator inIt, OutputIterator outIt)
{
static const InputIterator endIt{}; // end()
static const Char nul{}; // '\0'

size_t cnt{0};
Char cur{nul};

while(endIt != inIt)
{
cur = *inIt++;
if(endIt!= inIt && *inIt == cur)
{
++cnt;
}
else
{
*outIt++ = cur;
encodeDecimal(outIt, cnt+1);

cur = endIt != inIt ? *inIt : nul;
cnt = 0;
}
}
}
} // namespace RleIt

#include <iostream>

int main()
{
RleIt::encode(
RleIt::InputIterator{std::cin},
RleIt::OutputIterator{std::cout});
}

===============================================================
$ echo "abaabbcaaaqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq" |
./a.out
a1b1a2b2c1a3q50
1$
(the last 1 is the terminating LF inserted by 'echo')
===============================================================

Pavel

unread,

Oct 20, 2018, 5:39:29 PM10/20/18

to

Paul wrote:
> On Saturday, October 20, 2018 at 7:54:06 AM UTC+1, Pavel wrote:

...

>
>
> Very interesting thoughts. Any ideas for references to implement some of
> these more advanced ideas? I'm googling the template design method and
> everything else in your post to try to create a better solutions to learn
> from this interview, and I think I would do much better with more
> concrete guidance.

Please see below. I implemented the more efficient solution I referred to.
Admittedly, it is difficult to go all the way to this solution during the
interview; but it is still possible to keep moving in the right direction by
incremental improvements: the interviewee will either gain himself/herself more
time to finish the problem or the interviewer moves on to another problem when
it is obvious for for them the interviewee knows what s/he is doing and will
finish the job eventually.

It is sometimes difficult to write a generic template method like RunDumbRle
below correctly off the bat. I usually write at least 2 instances and make them
as similar as possible after which the correct line between generic and specific
behaviors usually becomes more clear. As an illustration, I left the two
original concrete functions in the comments below the code.

HTH,
-Pavel

> Many thanks,
>
> Paul
>

// ------------ cut here ---------
#include <algorithm>
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

typedef string::size_type Size;

Size
CalcEncodedCounterSize(Size counter)
{
for (Size result = 1; ; ++result)
if ((counter /= 10) == 0)
return result;
}

vector<Size>
CalcEncodedCounterSizeLimits(Size maxSize)
{
const Size MaxEncodedSize = CalcEncodedCounterSize(maxSize);

vector<Size> result;
result.push_back(9);
for (vector<Size>::size_type nNines = 2; nNines < MaxEncodedSize; ++nNines)
result.push_back(result.back() * 10 + 9);

return result;
}

Size
GetEncodedCounterSize(Size counter)
{
static const Size MaxCounter = string().max_size();
assert(counter <= MaxCounter);
static const vector<Size> EncodedCounterSizeLimits(
CalcEncodedCounterSizeLimits(MaxCounter));

const vector<Size>::size_type nLimits = EncodedCounterSizeLimits.size();
for (vector<Size>::size_type i = 0; i < nLimits; ++i) {
if (counter <= EncodedCounterSizeLimits[i])
return i + 1;
}

return nLimits;
}

void
EncodeCounterToString(string& result, Size counter)
{
const static char Digits[] = "0123456789";

Size fromIndex = result.size();
do {
result += Digits[counter % 10];
counter /= 10;
} while (!!counter);

reverse(result.begin() + fromIndex, result.end());
}

template<typename R/*result type*/, typename I/*result initializer*/,
typename U/*result updater*/>
R
RunDumbRle(const string& str, I initFunc, U updateFunc)
{
assert(!str.empty());

R result(initFunc());

char prevChar = str[0];
const Size strLen = str.size();
if (strLen == 1) {
updateFunc(result, 1, prevChar);
return result;
}

Size counter = 1;

for (Size i = 1; ; ++i) {
if (i == strLen) {
updateFunc(result, counter, prevChar);
break;
}

const char curChar = str[i];
if (curChar == prevChar) {
++counter;
continue;
}

updateFunc(result, counter, prevChar);
prevChar = curChar;
counter = 1;
}

return result;
}

string
DumbRleEncode(const string& str)
{
if (str.empty())
return str;

const Size resultSize = RunDumbRle<Size>(str,
[]{return 0;},
[](Size& result, Size counter, char){
result += GetEncodedCounterSize(counter) + 1;
});

return RunDumbRle<string>(str,
[&resultSize](){
string result;
result.reserve(resultSize);
return result;
},
[](string& result, Size counter, char c) {
result += c;
EncodeCounterToString(result, counter);
});
}

int
main(int, char*[])
{
assert(CalcEncodedCounterSize(1) == 1);
assert(GetEncodedCounterSize(1) == 1);
assert(DumbRleEncode("") == "");
assert(DumbRleEncode("a") == "a1");
assert(DumbRleEncode("ab") == "a1b1");
assert(DumbRleEncode("abc") == "a1b1c1");
assert(DumbRleEncode("abbbbbbc") == "a1b6c1");
assert(DumbRleEncode("aaaaaaaaaac") == "a10c1");
assert(DumbRleEncode(string(1000, 'b')) == "b1000");

return 0;
}

/*
string
DumbRleEncodeOrig(const string& str)
{
if (str.empty())
return string();

string result;
result.reserve(CalcDumbRleEncodedSize(str));

char prevChar = str[0];
const Size strLen = str.size();
if (strLen == 1) {
result += prevChar;
EncodeCounterToString(result, 1);
return result;
}

Size counter = 1;
for (Size i = 1; ; ++i) {
if (i == strLen) {
result += prevChar;
EncodeCounterToString(result, counter);
break;
}

const char curChar = str[i];
if (curChar == prevChar) {
++counter;
continue;
}

result += prevChar;
EncodeCounterToString(result, counter);
prevChar = curChar;
counter = 1;
}

return result;
}

Size
CalcDumbRleEncodedSize(const string& str)
{
assert(!str.empty());

Size result = 0;

char prevChar = str[0];
Size counter = 1;
const Size strLen = str.size();
for (Size i = 1; ; ++i) {
if (i == strLen) {
result += GetEncodedCounterSize(counter) + 1;
break;
}

const char curChar = str[i];
if (curChar == prevChar) {
++counter;
continue;
}

result += GetEncodedCounterSize(counter) + 1;
prevChar = curChar;
counter = 1;
}

return result;
}
*/

Pavel

unread,

Oct 20, 2018, 6:11:33 PM10/20/18

to

Agree. This is another direction where the interviewee can try steer the
interview after providing quick-and-dirty solution to the original problem. The
interviewer will less likely play along as this changes his requirements -- but
who knows what they have in mind. An interview is as much psychological as a
technical challenge...

Neat solution, BTW. I like your recursive encodeDecimal, very haskellish. There
is no need to compare inIt to endIt more than once though.

-Pavel

Alf P. Steinbach

unread,

Oct 20, 2018, 7:53:51 PM10/20/18

to

On 20.10.2018 20:45, Manfred wrote:
> // this is one solution to avoid allocating a string for the
> // formatted counter
> static void encodeDecimal(OutputIterator& outIt, long val)
> {
>     if(0 < val)
>     {
>       std::ldiv_t d = std::div(val, 10L);
>
>       encodeDecimal(outIt, d.quot);
>       // this only works if ascii is a subset of Char
>       *outIt++ = OutputIterator::traits_type::to_char_type('0' + d.rem);
>     }
> }

The C++ standard guarantees that the decimal digits are consecutive.

There's no such guarantee for the English letters.

Since I like things simple and robust I don't like this function: it
relies too much on client code supplying a safe output iterator, and the
recursive call will most probably not be optimized out, so, inefficient.

But then I don't like the C++17 standard library solution either, the
`std::to_chars` function
(https://en.cppreference.com/w/cpp/utility/to_chars). They boast that
"std::to_chars is locale-independent, non-allocating, and non-throwing".
But that's basic requirements.

In particular there is apparently no way to compute a required buffer
size. If so then that's kind of über-silly. IMHO.

When the time comes where all common compilers support `std::to_chars`
one can write some simple wrappers. Maybe that time has already come, a
month or two ago. Not sure, I haven't checked.

Cheers!,

- Alf

Manfred

unread,

Oct 21, 2018, 11:26:21 AM10/21/18

to

On 10/21/2018 1:53 AM, Alf P. Steinbach wrote:
> On 20.10.2018 20:45, Manfred wrote:
>>    // this is one solution to avoid allocating a string for the
>>    // formatted counter
>>    static void encodeDecimal(OutputIterator& outIt, long val)
>>    {
>>      if(0 < val)
>>      {
>>        std::ldiv_t d = std::div(val, 10L);
>>
>>        encodeDecimal(outIt, d.quot);
>>        // this only works if ascii is a subset of Char
>>        *outIt++ = OutputIterator::traits_type::to_char_type('0' + d.rem);
>>      }
>>    }
>
> The C++ standard guarantees that the decimal digits are consecutive.

Good remark, thanks!

>
> There's no such guarantee for the English letters.
>
> Since I like things simple and robust I don't like this function: it
> relies too much on client code supplying a safe output iterator, and the
> recursive call will most probably not be optimized out, so, inefficient.

In fact in all of the code I posted there is absolutely no error
checking - which makes it inapt for anything else than a post example.

I believe the recursive call cannot be optimized out, since output must
occur /after/ the recursion.
This is a direct consequence that the digits have to be output in
reverse order with respect to the div() sequence.
In this respect I am not sure it is /that/ inefficient, it basically
replaces allocation of a (possibly growing) string buffer with a call
sequence on the stack. I guess in general the latter is more efficient
than the former.
There are most probably better solutions, though..

Manfred

unread,

Oct 21, 2018, 12:19:25 PM10/21/18

to

On 10/21/2018 12:11 AM, Pavel wrote:
> Neat solution, BTW. I like your recursive encodeDecimal, very haskellish. There
> is no need to compare inIt to endIt more than once though.

Yes, nor there is need to dereference inIt more than once, so that a
couple of local variables would be in order.
Also, an option could be to have encodeDecimal as a recursive lambda.

Martijn van Buul

unread,

Oct 23, 2018, 4:17:55 AM10/23/18

to

* bitrex:

> why am I implementing an encoding scheme for black and white television
> on this test?!?!??!!?

. ^^^^^^^^^

Maybe this is why: to test your response to things you don't appreciate.

--
Martijn van Buul - pi...@dohd.org

bol...@cylonhq.com

unread,

Oct 23, 2018, 4:49:05 AM10/23/18

to

On Tue, 23 Oct 2018 08:17:43 -0000 (UTC)
Martijn van Buul <pi...@dohd.org> wrote:
>* bitrex:
>> why am I implementing an encoding scheme for black and white television
>> on this test?!?!??!!?

>.. ^^^^^^^^^

>
>Maybe this is why: to test your response to things you don't appreciate.

More likely to test his ability to formulate an algorithm from scratch without
just resorting to lego brick programming, ie slapping together APIs and
frameworks developed by someone else. Any idiot can do the latter as millions
of javascript and python programmers demonstrate every day.

Öö Tiib

unread,

Oct 23, 2018, 6:44:50 AM10/23/18

to

On Tuesday, 23 October 2018 11:17:55 UTC+3, Martijn van Buul wrote:
> * bitrex:
> > why am I implementing an encoding scheme for black and white television
> > on this test?!?!??!!?
> . ^^^^^^^^^
>
> Maybe this is why: to test your response to things you don't appreciate.

It could be fruitful to try to combine RLE into some real compression
algorithm. For example as alternative for escaping totally transparent
parts of image (or white parts of paper). Implementing it separately feels
pointless in modern world where non-naive implementation of LZW is
often quicker than memcpy.

James Kuyper

unread,

Oct 23, 2018, 7:54:31 AM10/23/18

to

On 10/20/18 03:03, Ralf Goertz wrote:
> Am 19 Oct 2018 19:36:30 GMT
> schrieb Jorgen Grahn <grahn...@snipabacken.se>:

...

>> (Not that there's really a '\0' in any C string either; it's halfway
>> there, halfway not. Traditionally, a fertile source of bugs.)
>
> Is there not? I thought that's how strings are terminated in C.

What he's referring to is the ambiguous status of the '\0'. For some
purposes, it's part of the string. For other purposes, it's the part
prior to the '\0' that is the actual string, the '\0' just lets you know
where the string ends. This isn't actually ambiguous if you use clear
terminology - which I deliberately failed to do in this message, to make
the ambiguity clearer. However, it is actually necessary to be careful
about such issues in order to avoid ambiguity, and it's that need to be
careful which he's referring to.

bitrex

unread,

Oct 23, 2018, 12:40:54 PM10/23/18

to

Poor bitrex has to formulate algorithm from scratch but interviewer gets
to pull problem from Wikipedia. Hmph. I see how it is.

Christian Gollwitzer

unread,

Oct 23, 2018, 3:54:57 PM10/23/18

to

Am 23.10.18 um 12:44 schrieb Öö Tiib:

> On Tuesday, 23 October 2018 11:17:55 UTC+3, Martijn van Buul wrote:
>> * bitrex:
>>> why am I implementing an encoding scheme for black and white television
>>> on this test?!?!??!!?
>> . ^^^^^^^^^
>>
>> Maybe this is why: to test your response to things you don't appreciate.
>
> It could be fruitful to try to combine RLE into some real compression
> algorithm. For example as alternative for escaping totally transparent
> parts of image (or white parts of paper).

Whenever you load or save a JPEG image, you use RLE compression, which
is used as a pre-processor before Huffman entropy coding. I can barely
think of another compression algorithm that is used as widely as JPEG,
so it's not at all a 70's only concept.

Christian

Öö Tiib

unread,

Oct 23, 2018, 7:07:06 PM10/23/18

to

Yes, there it also packs zeroes. My point was that it is not useful alone
but in combination with other compression algorithms be it statistics
(like Huffman) or dictionary (like LZW). I actually already tried to
express that but you considered it good idea to cut it in reply:

Christian Gollwitzer

unread,

Oct 24, 2018, 2:53:21 AM10/24/18

to

Am 24.10.18 um 01:06 schrieb Öö Tiib:

Sorry I missed that. I thought you were saying this is a theoretical
concept, while I was trying to say that it is indeed part of one of the
most widely used compression schemes. So I think we agree :)

Christian

bol...@cylonhq.com

unread,

Oct 24, 2018, 5:45:02 AM10/24/18

to

So what? The company isn't testing the interviewers abilities, its testing
yours and they don't owe you a job. If you can't implement an algorithm
from scratch then perhaps move over to web coding where your lack of ability
won't be such an issue.

Jorgen Grahn

unread,

Oct 24, 2018, 7:42:02 AM10/24/18

to

On Sat, 2018-10-20, Ralf Goertz wrote:
> Am 19 Oct 2018 19:36:30 GMT
> schrieb Jorgen Grahn <grahn...@snipabacken.se>:
>
>> Yes. I didn't mean to disagree with you, but Ralf seemed to have the
>> misconception that there is a terminating \0 in any std::string --
>> thus carrying an unnecessary handicap over from C into C++.
>
> Actually, I didn't think that. Having programmed in C++ for more than
> two decades (although not as a professional programmer but in order to
> do my job efficiently) I could not imagine it would be legal to access a
> string in that way. My "misconception" was that I thought that string
> was basically a class which (among others, notably iterators) contains
> two size_t variables, one containing the capacity and the other the size
> of the string. Accessing it beyond its size would be UB but wouldn't
> fail as long as it is within the bounds of the capacity. Therefore,
> operator[](size_t pos) would do nothing more than returning the
> character stored at "pos". The legality of using str.size() as index
> complicates things as Manfred put it and I still see no point in being
> allowed to do so. (I guess this is true for you as well if I read your
> other post correctly.)

Yes. I'm sure there /is/ a point, but I don't see it.

> I know very little about optimisation but I think
> the need to check for one special value of "pos" can a heavy burden
> performance-wise.

I suppose the way to satisfy the s[s.size()] thing is to keep
operator[] as-is, but make sure there's always padding at the end of
the string. The same padding, I guess, makes it possible to implement
s.c_str() without an extra buffer.

>> (Not that there's really a '\0' in any C string either; it's halfway
>> there, halfway not. Traditionally, a fertile source of bugs.)
>
> Is there not? I thought that's how strings are terminated in C.

Someone who shares my view answered that.

bitrex

unread,

Oct 24, 2018, 3:51:27 PM10/24/18

to

The tech biz needs more arrogant, humorless nerds. That's what it needs!

bitrex

unread,

Oct 24, 2018, 3:55:27 PM10/24/18

to

Get a load of this cat. Motherfucker probably thinks he's doing people a
favor by hiring them. I somehow doubt that.

bol...@cylonhq.com

unread,

Oct 25, 2018, 6:03:39 AM10/25/18

to

On Wed, 24 Oct 2018 15:55:18 -0400

Did you just step out of a 1970s B movie?

Chris M. Thomasson

unread,

Oct 25, 2018, 4:27:39 PM10/25/18

to

Fritz the Cat?

Paavo Helde

unread,

Oct 26, 2018, 2:37:02 PM10/26/18

to

On 17.10.2018 12:43, Paul wrote:
> The standard
> way of approaching this problem (which will be tackled by users of other
> languages, too) assumes that operations like += are inefficient because
> s+="12" involves moving everything in the string.
> The conventional wisdom is therefore that you need to do something more
> sophisticated like creating a stringBuilder class.
> But aren't basic string operations like "+=" optimised for efficiency in C++?

In a decent C++ implementation string+= automatically grows the internal
buffer exponentially so there is no reason for it to be inefficient.
Some years ago I measured this for my usage cases and string+= won with
a factor like 2 over ostringstream<< with all compilers I tested
(presumably because streams are choked up by virtuals and locale
dependencies).

Since then I do not feel guilty any more for not using a fancy stream or
builder for simple string concatenation, += reads the best and works the
best.

YMMV, of course.

bitrex

unread,

Oct 26, 2018, 2:57:41 PM10/26/18

to

Were you upset when they canceled Galactica 1980?

Jorgen Grahn

unread,

Oct 27, 2018, 4:03:06 AM10/27/18

to

On Fri, 2018-10-26, Paavo Helde wrote:
> On 17.10.2018 12:43, Paul wrote:
>> The standard
>> way of approaching this problem (which will be tackled by users of other
>> languages, too) assumes that operations like += are inefficient because
>> s+="12" involves moving everything in the string.
>> The conventional wisdom is therefore that you need to do something more
>> sophisticated like creating a stringBuilder class.
>> But aren't basic string operations like "+=" optimised for efficiency in C++?
>
> In a decent C++ implementation string+= automatically grows the internal
> buffer exponentially so there is no reason for it to be inefficient.
> Some years ago I measured this for my usage cases and string+= won with
> a factor like 2 over ostringstream<< with all compilers I tested
> (presumably because streams are choked up by virtuals and locale
> dependencies).

That (winning over ostringstream) sounds likely. When all the pieces
are already std::string, at least.

> Since then I do not feel guilty any more for not using a fancy
> stream or builder for simple string concatenation, += reads the best
> and works the best.
>
> YMMV, of course.

I still prefer ostringstreams, because I don't want to be pressured to
use std::string rather than more specific, homegrown classes. And I
don't want to add both to_string() functions and ostream streaming to
those classes.

Having control over the formatting would be another reason, except I
still haven't learned how to use ostream formatting properly ... and
anyway I let the classes decide how to format themselves, so it's not
something I do at the "top" level.

bol...@cylonhq.com

unread,

Oct 29, 2018, 5:35:05 AM10/29/18

to

On Fri, 26 Oct 2018 14:57:31 -0400

Gutted mate. Still, at least I now had free time to upgrade the Cylons control
system to C++. The old FORTRAN version had quite a few bugs.

Chris M. Thomasson

unread,

Oct 29, 2018, 5:21:36 PM10/29/18

to

lol. :^)