Longest word in a dictionary problem -- which container

Paul

unread,

Nov 19, 2018, 5:19:51 AM11/19/18

to

Below is my solution to finding the longest word in a dictionary given
a collection of letters. For more (but perhaps still not total) clarity,
please see my code below. It's an interview question and interviewers
don't usually give total clarity either.
I would love to see what people think of my solution. In particular
is it better to use a vector or a set for the sorted container
(or doesn't it matter?)
[BTW, this solution wasn't actually presented at the interview itself,
so even if it's good it doesn't mean I got the job].

Many thanks.

// Practising the common interview question of finding a longest word in a given dictionary
// with all characters in a given array of letters. The array may contain duplicate characters.
// In that case, the number of times that a char, c, can occur = the number of times
// c is duplicated in the array
#include <string>
#include <unordered_set>
#include <vector>
#include <iostream>
#include <algorithm>
#include <set>

// First step is to acquire the character counts in the array of letters.
std::vector<int> charCounts(const std::string& arrayOfLetters)
{
std::vector<int>counts(256);
for(char c : arrayOfLetters)
++counts[c];

return counts;
}

std::string LongestWord(const std::string& arrayOfLetters, const std::unordered_set<std::string>& Dictionary)
{
// Preprocessing of array of letters
const std::vector<int>& characterCounts = charCounts(arrayOfLetters);
// Test the longest words first
auto cmp = [](const std::string& word1, const std::string& word2)->bool{return word1.size() > word2.size();};

const std::set<std::string, decltype(cmp)> sortedDictionary(Dictionary.begin(), Dictionary.end(), cmp);
// A vector solution would have been as below. Should I have done this?
/*std::vector<std::string> sortedDictionary(Dictionary.begin(), Dictionary.end());
std::sort(sortedDictionary.begin(), sortedDictionary.end(), cmp);*/

// Sorted by length so we only need to find the first word that works.
for(const std::string& w : sortedDictionary)
{
std::vector<int> charPositions(256);
bool wordWorks = true;
for(char c : w)
if(++charPositions[c] > characterCounts[c])
{
wordWorks = false;
break;
}
if(wordWorks)
return w;
}

std::cout << "\No words in the dictionary satisfied the conditions";
return "";
}

// Testing code -- "acce" is returned which is correct.
// Needs more tests, though.
int main()
{
std::unordered_set<std::string> words = {"acceca", "ace", "acce", "accce" };
std::cout << LongestWord("acecad", words);
}

Paul

unread,

Nov 19, 2018, 5:27:11 AM11/19/18

to

Funny how you always see instant problems as soon as you post something.
Big stylistic issue in the above in that I don't use continue; which
is the tailor-made tool. Will change this.
Please comment if you see anything else.

Paul

Ben Bacarisse

unread,

Nov 19, 2018, 6:21:51 AM11/19/18

to

Paul <peps...@gmail.com> writes:

> Below is my solution to finding the longest word in a dictionary given
> a collection of letters. For more (but perhaps still not total) clarity,
> please see my code below. It's an interview question and interviewers
> don't usually give total clarity either.

I would have assumed you were to read a dictionary from a file. Having
a dictionary explicit in the code does not strike me as a likely
interview question.

> I would love to see what people think of my solution. In particular
> is it better to use a vector or a set for the sorted container
> (or doesn't it matter?)

I would not use a container at all, since I would be reading a file of
words (you can find many such lists on the Web). My guess is that the
interviewer would like to see that you do this without storing the
dictionary in memory.

> [BTW, this solution wasn't actually presented at the interview itself,
> so even if it's good it doesn't mean I got the job].
>
> Many thanks.
>
> // Practising the common interview question of finding a longest word in a given dictionary
> // with all characters in a given array of letters. The array may
> contain duplicate characters.

"with all" sounds odd. The longest word with them all is a word that is
the length of the array, so the only test in the program will be if
there is such a word, but...

> // In that case, the number of times that a char, c, can occur = the number of times
> // c is duplicated in the array

... "can occur" clearly indicates that not all occurrences must be
matched.

I know this is a quibble about words, but interviewers are likely to
care about comments too.

<snip code>
--
Ben.

Juha Nieminen

unread,

Nov 19, 2018, 8:16:48 AM11/19/18

to

Paul <peps...@gmail.com> wrote:
> Below is my solution to finding the longest word in a dictionary given
> a collection of letters.

The easiest solution to this, which at first glance might look like a rather
naive beginner solution, but which is, in fact, relatively efficient (at
least for this kind of task), is to simply go through each word in the
dictionary and see if you can form that word with those given letters.
(The most efficient way to do *that* is to sort the letters of the word
and compare it to the given letters (which have been themselves sorted
as a preprocessing step). I think you can figure out how to make this
comparison so that what's being checked is "can this word be formed
by using letters from this group of letters?")

This might not be the absolutely fastest way of doing it that's
possible, but in most cases it's efficient enough.

Ben Bacarisse

unread,

Nov 19, 2018, 7:50:31 PM11/19/18

to

Paul <peps...@gmail.com> writes:

> Below is my solution to finding the longest word in a dictionary given
> a collection of letters. For more (but perhaps still not total) clarity,
> please see my code below. It's an interview question and interviewers
> don't usually give total clarity either.

My interpretation (and solution) is shown below, after some space in
case some reader does not want to see other solutions just yet.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

#include <cstring>
#include <map>
#include <iostream>

bool characters_match(const std::map<char, int> &counts,
const std::string &word)
{
std::map<char, int> wd_counts;
for (auto c : word)
if (counts.find(c) == counts.end() || counts.at(c) == wd_counts[c])
return false;
else wd_counts[c] += 1;
return true;
}

int main(int argc, char *argv[])
{
if (argc == 2) {
size_t max_len = std::strlen(argv[1]);
std::map<char, int> counts;
for (size_t i = 0; i < max_len; i++)
counts[argv[1][i]] += 1;

size_t cur_max_len = 0;
std::string word, best_word = "";
while (std::cin >> word)
if (word.length() > cur_max_len &&
word.length() <= max_len &&
characters_match(counts, word)) {
cur_max_len = word.length();
best_word = word;
}
std::cout << "Best: " << best_word << "\n";
}
else std::cerr << "one string argument please.\n";
}

--
Ben.

Louis Krupp

unread,

Nov 20, 2018, 3:14:36 AM11/20/18

to

On Mon, 19 Nov 2018 13:16:39 -0000 (UTC), Juha Nieminen
<nos...@thanks.invalid> wrote:

>Paul <peps...@gmail.com> wrote:
>> Below is my solution to finding the longest word in a dictionary given
>> a collection of letters.
>
>The easiest solution to this, which at first glance might look like a rather
>naive beginner solution, but which is, in fact, relatively efficient (at
>least for this kind of task), is to simply go through each word in the
>dictionary and see if you can form that word with those given letters.
>(The most efficient way to do *that* is to sort the letters of the word
>and compare it to the given letters (which have been themselves sorted
>as a preprocessing step). I think you can figure out how to make this
>comparison so that what's being checked is "can this word be formed
>by using letters from this group of letters?")

Once the letters in the pattern are sorted, you've got the makings of
a regular expression; for example, "acecad" would become
"^a{0,2}c{0,2}d?e?$" . You could sort the letters in the words as Juha
suggested and then offload the pattern matching to the Standard
Library.

>
>This might not be the absolutely fastest way of doing it that's
>possible, but in most cases it's efficient enough.

You might be able to speed it up a little by ignoring words that are
shorter than the longest word that matches the pattern.

If nothing else, sorting the letters and building the pattern makes it
really obvious what the code is doing. How often does an interviewer
see a candidate do that?

Louis

Juha Nieminen

unread,

Nov 20, 2018, 9:55:44 AM11/20/18

to

Louis Krupp <lkr...@nospam.pssw.com.invalid> wrote:
> Once the letters in the pattern are sorted, you've got the makings of
> a regular expression; for example, "acecad" would become
> "^a{0,2}c{0,2}d?e?$" . You could sort the letters in the words as Juha
> suggested and then offload the pattern matching to the Standard
> Library.

No need to do it that complicated.

Since the original word has all of its letters sorted, and the given
letters are also sorted, all you have to do is traverse both one
letter at a time, advancing in the given letters only when the
word letter doesn't match. (If a given letter becomes larger
than the word letter in this manner, you can stop because you know
that the word has a letter that's not among the input letters).

In other words, take the first letter of the word, and advance in
the given letters until you find it (or if a given letter becomes larger
than the word letter, you can stop because it won't be found. Likewise
if you reach the end of the given letters, of course.) Then advance in
the word letters until you encounter a different letter. Rinse and repeat
until you have gone through the entire word.

The above is a bit lacking as a full algorithm explanation, but I'm sure
it ought to be enough to give an idea.

Alf P. Steinbach

unread,

Nov 20, 2018, 7:50:47 PM11/20/18

to

On 19.11.2018 11:19, Paul wrote:
> [snip]

> // Practising the common interview question of finding a longest word in a given dictionary
> // with all characters in a given array of letters. The array may contain duplicate characters.
> // In that case, the number of times that a char, c, can occur = the number of times
> // c is duplicated in the array
>

> [snip]

> // Testing code -- "acce" is returned which is correct.
> // Needs more tests, though.
> int main()
> {
> std::unordered_set<std::string> words = {"acceca", "ace", "acce", "accce" };
> std::cout << LongestWord("acecad", words);
> }
>

"acce", said to be a correct answer, doesn't contain all the characters
in "acecad".

As an example of another word that contains /some/ of the characters,
"ace" does.

None of words contain /all/ of the characters.

And none of the words are /permutations/ of the characters.

What's the rule here?

Cheers!,

- Alf

Louis Krupp

unread,

Nov 21, 2018, 1:50:11 AM11/21/18

to

True, the implementation isn't complicated, but in my opinion
describing what it does takes longer than laying out these steps:

1. Sort the letters in the pattern.
2. Generate a regular expression.
3. Sort the letters in each input string.
4. Call routines in <regex> to do the matching.
5. Keep track of the longest matching input string.

The regular expression library has been tested, so there's (hopefully)
no need to worry about that.

If you'd rather do everything in, say, Python, the steps are simple.
If you want it to happen faster, use C++ and <regex>. If it's still
not fast enough, the implementation you laid out is probably the way
to go. Doing it first with regular expressions -- whether it's in
Python or in C++ -- would give you a baseline you could use for
testing.

( I almost said "do it in Python or a shell script" but then I thought
about what a nightmare it would be to do it in a shell script. There
was a time when I would have said "Perl," meaning Perl 5, but that's
out of fashion now, so ... Perl 6. Yes, that's it. Perl 6. :))

Louis

Juha Nieminen

unread,

Nov 21, 2018, 2:40:15 AM11/21/18

to

Louis Krupp <lkr...@nospam.pssw.com.invalid> wrote:
> If you'd rather do everything in, say, Python, the steps are simple.
> If you want it to happen faster, use C++ and <regex>. If it's still
> not fast enough, the implementation you laid out is probably the way
> to go. Doing it first with regular expressions -- whether it's in
> Python or in C++ -- would give you a baseline you could use for
> testing.

Firstly, I have my doubts that building an ascii string that's a regexp
that does what you want is any simpler than doing the comparison yourself.

Secondly, the original task is, essentially and pretty much, comparing
binary bytes to each other. Going the route of creating a dynamic string
from the binary data, have a library parse that string, allocate more
data containers within itself (probably some kind of tree or such),
and then run a complicated regexp matching algorithm not only sounds
like it's needlessly inefficient, but also needlessly complicated for
such a relatively simple task. It's like shooting flies with a cannon.

Thirdly, depending on what kind of thing it is that the task is trying
to teach, it may actually be more didactic and beneficial to do the
comparison yourself, as a programming exercise.

Louis Krupp

unread,

Nov 21, 2018, 5:48:05 AM11/21/18

to

The world would be a scary place if everyone agreed with me.

Louis

Vir Campestris

unread,

Nov 22, 2018, 4:15:58 PM11/22/18

to

On 19/11/2018 10:19, Paul wrote:
> Below is my solution to finding the longest word in a dictionary given
> a collection of letters.

You want to rework your question.

I think a dictionary is a list of words, so I'd read through it looking
for the longest gap between two non-word characters.

But it's apparent that isn't at all what you mean.

Andy

Pavel

unread,

Nov 22, 2018, 8:01:45 PM11/22/18

to

Paul wrote:
> Below is my solution to finding the longest word in a dictionary given
> a collection of letters. For more (but perhaps still not total) clarity,
> please see my code below. It's an interview question and interviewers
> don't usually give total clarity either.
> I would love to see what people think of my solution. In particular
> is it better to use a vector or a set for the sorted container
> (or doesn't it matter?)
> [BTW, this solution wasn't actually presented at the interview itself,
> so even if it's good it doesn't mean I got the job].
>
> Many thanks.
>
> // Practising the common interview question of finding a longest word in a given dictionary
> // with all characters in a given array of letters. The array may contain duplicate characters.
> // In that case, the number of times that a char, c, can occur = the number of times
> // c is duplicated in the array
> #include <string>
> #include <unordered_set>
> #include <vector>
> #include <iostream>
> #include <algorithm>
> #include <set>
>
> // First step is to acquire the character counts in the array of letters.
> std::vector<int> charCounts(const std::string& arrayOfLetters)
> {
> std::vector<int>counts(256);

vector<unsigned>?

> for(char c : arrayOfLetters)
> ++counts[c];

(unsigned char)c ?

>
> return counts;
> }
>
> std::string LongestWord(const std::string& arrayOfLetters, const std::unordered_set<std::string>& Dictionary)
> {
> // Preprocessing of array of letters
> const std::vector<int>& characterCounts = charCounts(arrayOfLetters);
> // Test the longest words first
> auto cmp = [](const std::string& word1, const std::string& word2)->bool{return word1.size() > word2.size();};
>
> const std::set<std::string, decltype(cmp)> sortedDictionary(Dictionary.begin(), Dictionary.end(), cmp);
> // A vector solution would have been as below. Should I have done this?
> /*std::vector<std::string> sortedDictionary(Dictionary.begin(), Dictionary.end());
> std::sort(sortedDictionary.begin(), sortedDictionary.end(), cmp);*/

why do you have to sort? this will allocate tons of memory for your sorted set
and will pass the dictionary O(log(size)) times -- all to save a part of one pass?

Öö Tiib

unread,

Nov 23, 2018, 10:18:54 AM11/23/18

to

On Monday, 19 November 2018 12:19:51 UTC+2, Paul wrote:
> Below is my solution to finding the longest word in a dictionary given
> a collection of letters. For more (but perhaps still not total) clarity,
> please see my code below. It's an interview question and interviewers
> don't usually give total clarity either.
> I would love to see what people think of my solution. In particular
> is it better to use a vector or a set for the sorted container
> (or doesn't it matter?)
> [BTW, this solution wasn't actually presented at the interview itself,
> so even if it's good it doesn't mean I got the job].
>
> Many thanks.
>
> // Practising the common interview question of finding a longest word in a given dictionary
> // with all characters in a given array of letters. The array may contain duplicate characters.
> // In that case, the number of times that a char, c, can occur = the number of times
> // c is duplicated in the array
> #include <string>
> #include <unordered_set>
> #include <vector>
> #include <iostream>
> #include <algorithm>
> #include <set>
>
> // First step is to acquire the character counts in the array of letters.
> std::vector<int> charCounts(const std::string& arrayOfLetters)
> {
> std::vector<int>counts(256);

That is pessimization of std::array<int,256> into std::vector<int>.

> for(char c : arrayOfLetters)
> ++counts[c];
>
> return counts;
> }
>
> std::string LongestWord(const std::string& arrayOfLetters, const std::unordered_set<std::string>& Dictionary)
> {
> // Preprocessing of array of letters
> const std::vector<int>& characterCounts = charCounts(arrayOfLetters);
> // Test the longest words first
> auto cmp = [](const std::string& word1, const std::string& word2)->bool{return word1.size() > word2.size();};

There is point to sort outside when several arrays of letters need
to be validated for same dictionary. It is likely wasteful to sort
for one pass search.

> const std::set<std::string, decltype(cmp)> sortedDictionary(Dictionary.begin(), Dictionary.end(), cmp);

That std::set destroys your algorithm when there are multiple entries
with same length in dictionary. Your test data below does not have
multiple entries with same length but generally it is the case.
Some std::multiset would work.

> // A vector solution would have been as below. Should I have done this?
> /*std::vector<std::string> sortedDictionary(Dictionary.begin(), Dictionary.end());
> std::sort(sortedDictionary.begin(), sortedDictionary.end(), cmp);*/

In sorted dictionary you should first skip "too long" entries
with std::upper_bound.

> // Sorted by length so we only need to find the first word that works.
> for(const std::string& w : sortedDictionary)
> {
> std::vector<int> charPositions(256);
> bool wordWorks = true;
> for(char c : w)
> if(++charPositions[c] > characterCounts[c])
> {
> wordWorks = false;
> break;
> }

I did not understand what you meant by "using continue" in
other post? The break makes perfect sense here.

> if(wordWorks)
> return w;
> }

>
> std::cout << "\No words in the dictionary satisfied the conditions";
> return "";
> }
>
> // Testing code -- "acce" is returned which is correct.
> // Needs more tests, though.
> int main()
> {
> std::unordered_set<std::string> words = {"acceca", "ace", "acce", "accce" };
> std::cout << LongestWord("acecad", words);
> }

Download some words from somewhere
stuff: https://github.com/dwyl/english-words
Then you can use subsets of those for whatever tests.

If you try to maximize performance of your program then
you may discover that there are more efficient algorithms
than std::sort for sorting dictionary with lot of entries with
equal length by length.
These are likely not good idea to write in answer for a test
assignment.