Title: The core of the big data solutions -- Map
Author: pengwenwei
Email: wenwei19710430
Language: c++
Platform: Windows, linux
Technology: Perfect hash algorithm
Level: Advanced
Description: A high performance map algorithm
Section MFC c++ map stl
SubSection c++ algorithm
License: (GPLv3)
Download demo project - 1070 Kb
Download source - 1070 Kb
Introduction:
For the c++ program, map is used everywhere. And the bottleneck of program performance is often the performance of map. Especially in following two scenarios, the performance of map becomes the most critical technology:
- Big Data
- Because of strong bussiness coupling, unable to realize data distribution and parallel processing
I have been working in telecommunication and information security industry for a few years, and have a lot of experience with bottom big data. From my perspective, information security industry's data is the most complicated. All of them relay on map. For example, IP table, MAC table, telephone number list, domain name resolution table, ID number table query, the cloud killing of trojan/virus characteristic code etc..
The STL map utilizes binary chop, which causes a bad performance. Google Hash map has the best performance at present, but it has repeated collision probability. Currently the big data rarely use a collision probability map. In some special scenarios (Eg. charing/billing), any mistake is unacceptable.
Now I would like to publish my algorithms here. It includes three kinds of map. You will get the hash map after building. By comparing the test result, you will find that my algorithm has zero collision probability, but its performance is better than the hash algorithm. Even the general map's performance has no much difference with Google.
My algorithm is a perfect hash algorithm. Its key index algorithm and the theory of compression algorithm is out of the ordinary. The most important point is that, it uses a completely different structure, so the key index compression is fundamentally different. The most direct benefit for program is that, you need only one server to support the solution which needs ten servers before.
Declare: the code can not be used for commercial purposes. For commercial applications, please contact with me through QQ 75293192.
Download:
https://sourceforge.net/projects/pwwhashmap/filesYou can find my design ideas from following article:
http://blog.csdn.net/chixinmuzi/article/details/1727195Development Manual:
There are three different map algorithm, used in different application scenarios:
1,memMap: Based on memory No hard disk consumption.
2,diskMap: Based on the hard disk No memory consumption.
3,hashMap: No delete function, but the best performance.
memMap and diskMap can turn to hashMap by memMap2HashMap and diskMap2HashMap.
In addition,Provide Google HashMap for comparative test.
Algorithm of Google,the performance is not good, but also has the collision probability .