Re: [lock-free] About concurrent hash table


Kimo Crossman

Aug 6, 2012, 2:42:01 AM
to lock...@googlegroups.com
Hi, you might want to look at the archives of the concurrency-interest mailing list for the Java JDK:

http://cs.oswego.edu/mailman/listinfo/concurrency-interest 

On Sun, Aug 5, 2012 at 7:43 PM, sunqxj <sun...@gmail.com> wrote:
Hi all, 

I am looking for a good concurrent hash table that may see quite heavy read-write conflicts. Also, the hash table must be lossless, which means it needs a linked list to resolve collisions. I came across many blogs and web sites, and I saw lots of people publish their solutions and show good performance on their benchmarks. Could someone tell me where to find those benchmarks in order to make a fair comparison among the different solutions?

Thanks in advance. 

sunqxj

Aug 6, 2012, 2:45:30 AM
to lock...@googlegroups.com
Sorry, I forgot to mention that I am looking for a C or C++ solution.

Dmitriy V'jukov

Aug 6, 2012, 3:27:21 AM
to lock...@googlegroups.com

On Aug 6, 2012 9:14 AM, "sunqxj" <sun...@gmail.com> wrote:
>
> Hi all, 
>
> I am looking for a good concurrent hash table that may see quite heavy read-write conflicts. Also, the hash table must be lossless, which means it needs a linked list to resolve collisions. I came across many blogs and web sites, and I saw lots of people publish their solutions and show good performance on their benchmarks. Could someone tell me where to find those benchmarks in order to make a fair comparison among the different solutions?

Use your application. That's the best benchmark.
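
For example, a minimal harness along these lines (illustrative only: std::unordered_map behind a single mutex as the baseline, with a read/write mix you should set to match your application) can be re-run with each candidate table swapped in behind the same find/insert calls:

#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Baseline: std::unordered_map guarded by one mutex. Swap in each
// candidate concurrent table behind the same read/write operations.
int main() {
    std::unordered_map<int, int> table;
    std::mutex mtx;
    const int kThreads = 4;
    const int kOps = 1000000;          // operations per thread
    const int kReadsPerWrite = 9;      // tune to your real read/write mix

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&, t] {
            unsigned rng = t * 2654435761u + 1;
            for (int i = 0; i < kOps; ++i) {
                rng = rng * 1664525u + 1013904223u;   // cheap LCG key stream
                int key = rng % 100000;
                std::lock_guard<std::mutex> guard(mtx);
                if (i % (kReadsPerWrite + 1) == 0)
                    table[key] = i;                   // write
                else
                    (void)table.count(key);           // read
            }
        });
    }
    for (auto& w : workers) w.join();
    double ms = std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - start).count();
    std::printf("%.2f Mops/s\n", kThreads * kOps / (1000.0 * ms));
    return 0;
}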

Nicola Bonelli

Aug 6, 2012, 2:55:20 PM
to lock...@googlegroups.com
Hi sunqxj,

hash tables are data structures that work well with concurrent access.
If you have too many collisions, you are probably using a wrongly sized
table, or the hash function is not working well on your data set.
In addition, for spatial locality, you might consider a vector instead
of a linked list for collisions, especially if the values to be stored
are small enough.

From the concurrency point of view, please don't forget to avoid false
sharing, and don't use a reader-writer mutex, which, as shown on
Dmitriy's site, prevents even readers from scaling linearly on a
multi-core architecture.
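
Both points sketched in C++ (illustrative only, not a complete table; a real table would hold an array of these buckets, and 64 bytes is assumed as the cache-line size):

#include <mutex>
#include <utility>
#include <vector>

// Illustrative bucket only: collisions live in a contiguous vector
// (one cache-friendly linear scan) and each bucket is padded to its
// own cache line, so threads touching neighbouring buckets do not
// false-share.
struct alignas(64) Bucket {
    std::mutex lock;                              // per-bucket, not global
    std::vector<std::pair<int, void*>> entries;   // key -> pointer

    void insert(int key, void* value) {
        std::lock_guard<std::mutex> guard(lock);
        entries.emplace_back(key, value);
    }
    void* find(int key) {
        std::lock_guard<std::mutex> guard(lock);
        for (const auto& e : entries)
            if (e.first == key) return e.second;
        return nullptr;
    }
};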

My 2 cents,
Nicola



On Mon, Aug 6, 2012 at 4:43 AM, sunqxj <sun...@gmail.com> wrote:
> Hi all,
>
> I am looking for a good concurrent hash table that may see quite heavy
> read-write conflicts. Also, the hash table must be lossless, which means
> it needs a linked list to resolve collisions. I came across many blogs
> and web sites, and I saw lots of people publish their solutions and show
> good performance on their benchmarks. Could someone tell me where to find
> those benchmarks in order to make a fair comparison among the different
> solutions?
>
> Thanks in advance.



--
The difference between theory and practice is bigger in practice than in theory.

xiaoqing jin

Aug 6, 2012, 3:55:12 PM
to lock...@googlegroups.com

Hi Nicola,

Thanks for the tips. I am examining all the options here. I favor the lockless design, and as for the change from a linked list to a vector, thanks for the suggestion; it is a worthwhile change. In my application, the key would be the index into the hash table and the value would be a pointer to a data structure. Also, I do not understand the part about reader-writer mutexes preventing readers from scaling linearly on a multi-core architecture. Please bear with a novice. Thanks in advance.

Nicola Bonelli

Aug 6, 2012, 4:03:11 PM
to lock...@googlegroups.com

Please read the following article:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem

You might also want to read:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex

where a distributed rw mutex is presented.
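
Not the actual code from the article, but the core idea can be sketched like this (assumes C++17's std::shared_mutex; the slot count and thread-to-slot mapping are illustrative):

#include <array>
#include <functional>
#include <shared_mutex>
#include <thread>

// Sketch of a distributed rw mutex in the spirit of the article above:
// one rw lock per slot, each padded to its own cache line. A reader
// locks only "its" slot, so readers never ping-pong a shared cache
// line; a writer locks every slot, in a fixed order to avoid deadlock.
class DistributedRWMutex {
    static const size_t kSlots = 16;                 // illustrative
    struct alignas(64) Slot { std::shared_mutex m; };
    std::array<Slot, kSlots> slots_;

    static size_t my_slot() {
        return std::hash<std::thread::id>()(std::this_thread::get_id()) % kSlots;
    }
public:
    size_t lock_shared() {                           // returns slot for unlock
        size_t s = my_slot();
        slots_[s].m.lock_shared();
        return s;
    }
    void unlock_shared(size_t s) { slots_[s].m.unlock_shared(); }
    void lock()   { for (auto& s : slots_) s.m.lock(); }    // writer side
    void unlock() { for (auto& s : slots_) s.m.unlock(); }
};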

Ciao,
Nicola

xiaoqing jin

Aug 6, 2012, 4:07:00 PM
to lock...@googlegroups.com

Thanks. I finished reading all his posts before I came to this group.
I think some of the ideas need practice to understand thoroughly. Thanks for the hint.

Samy Bahra

Aug 6, 2012, 6:47:59 PM
to lock...@googlegroups.com
Concurrency Kit (just google "Concurrency Kit") has an SPMC hash table optimized for TSO architectures, and URCU (just google URCU) has a lock-less hash table; the latter is in the process of a major overhaul. Also, though it is dead, you may be interested in "nbds". URCU is C++-compatible; CK is not.

Regards.
--
Samy Al Bahra [http://repnop.org]

xiaoqing jin

Aug 6, 2012, 10:23:46 PM
to lock...@googlegroups.com
I am also trying to compare CDS (Concurrent Data Structures), http://sourceforge.net/projects/libcds/, a C++ template library of lock-free and fine-grained lock-based algorithms, and I will tune it for my applications.
Thanks for the suggestions. I will look into them.
--
Xiaoqing Jin

Email: ji...@cs.ucr.edu
University of California, Riverside


pww19...@gmail.com

Apr 2, 2015, 2:28:48 AM
to lock...@googlegroups.com, sun...@gmail.com

Title:       The core of the core of the big data solutions -- Map
Author:      pengwenwei
Email:       
Language:    c++
Platform:    Windows, linux
Technology:  Perfect hash algorithm
Level:       Advanced
Description: Map algorithm with high performance
Section:     MFC c++ map stl
SubSection:  c++ algorithm
License:     (GPLv3)

    Download demo project - 1070 Kb
    Download source - 1070 Kb

Introduction:
In C++ programs, maps are used everywhere, and the bottleneck of program performance is often the performance of the map, especially in the case of large data whose records are so closely related that the data cannot be distributed and processed in parallel. So the performance of the map becomes the key technology.

In my work in the telecommunications industry and the information security industry, I dealt with large volumes of low-level data, especially the most complex data of the information security industry, and none of it can do without maps.

For example: IP tables, MAC tables, telephone number lists, domain name resolution tables, ID number lookups, Trojan and virus signature matching for cloud-based scanning, etc.

The STL map does a binary (tree) search, so it has the worst performance of the three. Google's hash map currently has the best performance and memory usage, but it still has a probability of repeated collisions. Big-data systems now rarely use a map with any collision probability, especially where fees are involved; the results cannot be wrong.

Now I am putting my algorithm out here. There are three kinds of map, and after the build step the result is a hash map. You can test the comparison yourself: my algorithm has zero probability of collisions, yet its performance is also better than a hash algorithm, and even its ordinary performance is not much different from Google's.

My algorithm is a perfect hash algorithm. Its key indexing and its compression principle are out of the ordinary; most importantly, it has a completely different structure, so the key-index compression is fundamentally different. The most direct benefit for a program is that where the original map solution needed ten servers, I now need only one.
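
The post does not include the algorithm itself, so for reference only, here is the general shape of a naive perfect-hash build in C++: for a fixed, non-empty key set, retry seeds until no two keys collide, after which every lookup is a single probe with zero collisions. (Brute-force seed search is purely illustrative; practical schemes such as CHD or BDZ construct the function in roughly linear time.)

#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Illustrative brute-force "perfect hash", not pwwhashmap's algorithm.
static uint64_t hash_str(const std::string& s, uint64_t seed) {
    uint64_t h = seed ^ 14695981039346656037ULL;      // FNV-1a variant
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
    return h;
}

struct PerfectTable {
    uint64_t seed = 0;
    std::vector<std::optional<std::pair<std::string, int>>> slots;

    // Build for a fixed, non-empty key set: keep retrying seeds until
    // every key lands in a distinct slot.
    static PerfectTable build(const std::vector<std::pair<std::string, int>>& kv) {
        PerfectTable t;
        const size_t m = kv.size() * 2;               // load factor 0.5
        for (uint64_t seed = 1;; ++seed) {
            std::vector<std::optional<std::pair<std::string, int>>> s(m);
            bool ok = true;
            for (const auto& p : kv) {
                size_t i = hash_str(p.first, seed) % m;
                if (s[i]) { ok = false; break; }      // collision: try next seed
                s[i] = p;
            }
            if (ok) { t.seed = seed; t.slots = std::move(s); return t; }
        }
    }

    // One hash, one probe: zero collisions by construction.
    const int* find(const std::string& k) const {
        const auto& slot = slots[hash_str(k, seed) % slots.size()];
        return (slot && slot->first == k) ? &slot->second : nullptr;
    }
};
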
Disclaimer: the code cannot be used for commercial purposes; for commercial applications, you can contact me via QQ 75293192.
Download:
https://sourceforge.net/projects/pwwhashmap/files

Applications:
First, modern warfare cannot do without mass information queries; if a query of enemy target information slows down by one second, it could delay a fighter and lead to the failure of the entire battle. Information retrieval is inseparable from maps; if military products used pwwHashMap instead of the traditional map, they would come out the winner.

Second, the performance of a router determines surfing speed; just replace the map in open-source router code with pwwHashMap and its speed can increase tenfold.
There are many tables to query and update in the router's DHCP protocol, such as IP and MAC tables, and all of these are implemented with maps. But until now all of these maps have used the STL library, whose performance is very low, while using a hash map brings a probability of error, so the load can only be dispersed across multiple routers. Using pwwHashMap, you can save at least ten sets of equipment.

Third, Hadoop is recognized as the big-data solution at present, and the most fundamental thing about it is its extremely heavy use of maps instead of SQL and tables. Hadoop assumes amounts of data so huge that the data cannot be moved at all, so the analysis must be carried out locally. But if the maps in the open-source Hadoop code were changed to pwwHashMap, performance would increase a hundredfold without any problems.


Background to this article that may be useful, such as an introduction to the basic ideas presented:
http://blog.csdn.net/chixinmuzi/article/details/1727195


On Monday, August 6, 2012 at 10:43:30 AM UTC+8, sunqxj wrote: