string::find complexity

Christopher J. Pisz

unread,

Jan 27, 2017, 10:17:57 AM1/27/17

to

I am going to be looking up whether or not my custom data structure
contains a string one character at a time.

For example:

Check "c"
Check "ca"
Check "cat"
Check "cate"
Check "categ"
..
Check "category"
...

All I need to know is if my data structure contains the string or not.

I was thinking of trying to implement my own Radix Trie.
https://en.wikipedia.org/wiki/Radix_tree

I'd give it a method contains and see if any node in the Trie contains
the string in question.

It will only contain the actual full words, but allow me to search for
the words as they are being typed out.

When doing insertions into this data structure, it appears as though I
am going to do a std::string find on all children to find if the value
and the node/edge (depending on how I implement) has characters in
common and how many.

How complex is std::find(substr, offset)?
I wonder if I am barking up the wrong tree.

Christopher J. Pisz

unread,

Jan 27, 2017, 10:39:42 AM1/27/17

to

Nm. I don't think I'll be using std::string find at all. Since I am
always comparing string at position 0. I guess I am looking at O(n) for
equality basically.

Paavo Helde

unread,

Jan 27, 2017, 10:44:20 AM1/27/17

to

www.cplusplus.com says about string::find complexity: "Unspecified, but
generally up to linear in length()-pos times the length of the sequence
to match (worst case)."

The non-worst-case complexity is most probably just linear in length()-pos.

Anyway, from your descriptions it looks like you are trying to match
beginnings of words. If so, you should put these words in a map or set
and use lower_bound() for looking up the start point of the stretch of
the matching words.

hth
Paavo

Alf P. Steinbach

unread,

Jan 27, 2017, 5:24:26 PM1/27/17

to

On 27.01.2017 16:17, Christopher J. Pisz wrote:
>
[snip]

> How complex is std::find(substr, offset)?
> I wonder if I am barking up the wrong tree.

It's clear that your task is finding a substring, possibly just a string
prefix, and `std::find` does not support either of those tasks. The
title of the post also indicates that you meant to refer to
`std::basic_string::find` here. I.e., `s.find( substr, offset )` with
`s` a `std::basic_string<T>` for some suitable character type `T`.

`std::basic_string::find` has four overloads, two of which match the
informal argument specification above.

The complexity is not specified by the standard, but with simplest
possible straightforward implementation a call `s.find( t )` has complexity

O(SL*length(t))

where SL = max( length(s) - length(t), 0).

This is then the worst case.

For the best case a quality implementation should tend to complexity
O(SL), by never examining a character in s more than once. Wikipedia on
the Knuth-Morris-Pratt: “the expected complexity of searching string S[]
of length k is on the order of k comparisons or O(k)”. However, the
expected complexity, what you'll usually get, is not the worst case,
whcih is as noted above. I don't know how likely the worst case is.

As I recall it I first became familiar with the KMP algorithm via an
article in Scientific American by Niklaus Wirth (!), about how one could
apply simple transformations to algorithms. But I may possibly have a
false memory there, because I've never been able to find that article
again. On the other hand I've not searched for it lately.

Cheers & hth.!,

- Alf