unordered_set in C++

1,792 views
Skip to first unread message

abhishek gupta

unread,
Mar 10, 2019, 4:25:50 AM3/10/19
to std-dis...@isocpp.org
Hi,

I am converting a C file to C++. Since the functions will still be called from C code I will place the entire file in extern "C" block.
The file contains following code-

struct node{
char* name;
struct node* next;
};

static struct node* list;  //file scope

void insertInList(FILE*){
read file line-by-line and add names present in file to the list
}
bool isNamePresent(char* name){
//iterate through Linked-list & returnt true if present
}

Now, it appears to me that the complexity of 'isNamePresent' can be improved by using unordered_set<string> . But, looking at customer usage it appears that generally few names are entered in the list.(sometimes it is just 1)

Q1) So, should I still change the code to use unordered_set? Will it be still considered a good change in terms of performance or any other terms?If so please explain why??
Also, does scenarios like "what if user enters hundred thousand names in file" are considered during software development when we know the general use pattern?

Q2) How should I write the set in the file? What is the difference between following lines written in global space-

static std::unordered_set<std::string> st;
vs 
namespace{
static std::unordered_set<std::string> st;
}//anonymous namespace

Is the first one initialized with some garbage value?

Regards,
Abhishek Gupta

Andrew Tomazos

unread,
Mar 10, 2019, 6:04:31 AM3/10/19
to std-discussion, connect.a...@gmail.com
On Sun, Mar 10, 2019 at 6:25 PM abhishek gupta <connect.a...@gmail.com> wrote:
Hi,

I am converting a C file to C++. Since the functions will still be called from C code I will place the entire file in extern "C" block.
The file contains following code-

struct node{
char* name;
struct node* next;
};

static struct node* list;  //file scope

void insertInList(FILE*){
read file line-by-line and add names present in file to the list
}
bool isNamePresent(char* name){
//iterate through Linked-list & returnt true if present
}

Now, it appears to me that the complexity of 'isNamePresent' can be improved by using unordered_set<string> . But, looking at customer usage it appears that generally few names are entered in the list.(sometimes it is just 1)

Q1) So, should I still change the code to use unordered_set? Will it be still considered a good change in terms of performance or any other terms?If so please explain why??
Also, does scenarios like "what if user enters hundred thousand names in file" are considered during software development when we know the general use pattern?

Construct a benchmark that exercises the API in a realistic scenario.  Record a baseline.  Make your candidate optimization.  Compare performance against baseline.  If better, commit optimization.


Q2) How should I write the set in the file? What is the difference between following lines written in global space-

static std::unordered_set<std::string> st;
vs 
namespace{
static std::unordered_set<std::string> st;
}//anonymous namespace

Is the first one initialized with some garbage value?

They are both default-constructed.

This is more of a StackOverflow question.  Try posting in the [C++] tag at www.stackoverflow.com.  (Check if your question is already answered before posting.)

Thiago Macieira

unread,
Mar 10, 2019, 11:48:11 AM3/10/19
to std-dis...@isocpp.org
On Sunday, 10 March 2019 03:04:15 PDT Andrew Tomazos wrote:
> > Q1) So, should I still change the code to use unordered_set? Will it be
> > still considered a good change in terms of performance or any other
> > terms?If so please explain why??
> > Also, does scenarios like "what if user enters hundred thousand names in
> > file" are considered during software development when we know the general
> > use pattern?
>
> Construct a benchmark that exercises the API in a realistic scenario.
> Record a baseline. Make your candidate optimization. Compare performance
> against baseline. If better, commit optimization.

And when doing that, try a third container: a vector. Your original singly-
linked list is actually the worst possible container due to cache non-locality
and due to memory overhead.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel System Software Products



Thiago Macieira

unread,
Mar 11, 2019, 11:46:22 PM3/11/19
to std-dis...@isocpp.org
On Sunday, 10 March 2019 08:48:06 PDT Thiago Macieira wrote:
> And when doing that, try a third container: a vector. Your original singly-
> linked list is actually the worst possible container due to cache
> non-locality and due to memory overhead.

And as others have found out, that is getting very difficult. The current Git
version of xkbcommon can only be built with Meson and you can't install Meson
on RHEL 6.6 (its Python3 is too old).

You can build the current xkbcommon release. And past releases that are
greater than the minimum that Qt requires, of course.

But the point remains: at some point, it becomes difficult to actually build
those dependencies because the distribution is old. Also note Qt 6 will not
ship bundled libraries. There will be a way to install them from sources, but
then we go back to the problem of actually building them.

Thiago Macieira

unread,
Mar 12, 2019, 12:00:19 AM3/12/19
to std-dis...@isocpp.org
On Monday, 11 March 2019 20:46:17 PDT Thiago Macieira wrote:
> And as others have found out, that is getting very difficult. The current
> Git version of xkbcommon can only be built with Meson and you can't install
> Meson on RHEL 6.6 (its Python3 is too old).

Oops, somehow I managed to write the reply to a completely different message
here, but I have no clue how.

Please ignore.
Reply all
Reply to author
Forward
0 new messages