CityHash 1.0.3 has been released!

163 views
Skip to first unread message

gpike

unread,
Oct 6, 2011, 5:10:06 PM10/6/11
to cityhash-discuss
CityHash 1.0.3 has been released!

http://code.google.com/p/cityhash/

enjoy,

Geoff

Sam

unread,
Jan 3, 2013, 1:18:44 AM1/3/13
to cityhash...@googlegroups.com, gp...@google.com
Could you explain how to use city hash with google sparse hashmap? I'm using visual studio 2010.
Thanks in advance.

chinapo...@gmail.com

unread,
May 6, 2015, 3:56:51 AM5/6/15
to cityhash...@googlegroups.com, gp...@google.com
Title:       The core of the big data solutions -- Map
Author:      pengwenwei
Email:       
Language:    c++
Platform:    Windows, linux
Technology:  Perfect hash algorithm
Level:       Advanced
Description: A high performance map algorithm
Section      MFC c++ map stl
SubSection   c++ algorithm
License:     (GPLv3)

    Download demo project - 1070 Kb
    Download source - 1070 Kb

Introduction:
For the c++ program, map is used everywhere. And the bottleneck of program performance is often the performance of map. Especially in following two scenarios, the performance of map becomes the most critical technology:
    - Big Data
    - Because of strong bussiness coupling, unable to realize data distribution and parallel processing

I have been working in telecommunication and information security industry for a few years, and have a lot of experience with bottom big data. From my perspective, information security industry's data is the most complicated. All of them relay on map. For example, IP table, MAC table, telephone number list, domain name resolution table, ID number table query, the cloud killing of trojan/virus characteristic code etc..

The STL map utilizes binary chop, which causes a bad performance. Google Hash map has the best performance at present, but it has repeated collision probability. Currently the big data rarely use a collision probability map. In some special scenarios (Eg. charing/billing), any mistake is unacceptable.

Now I would like to publish my algorithms here. It includes three kinds of map. You will get the hash map after building. By comparing the test result, you will find that my algorithm has zero collision probability, but its performance is better than the hash algorithm. Even the general map's performance has no much difference with Google.

My algorithm is a perfect hash algorithm. Its key index algorithm and the theory of compression algorithm is out of the ordinary. The most important point is that, it uses a completely different structure, so the key index compression is fundamentally different. The most direct benefit for program is that, you need only one server to support the solution which needs ten servers before.

Declare: the code can not be used for commercial purposes. For commercial applications, please contact with me through QQ 75293192.

Download:
https://sourceforge.net/projects/pwwhashmap/files

You can find my design ideas from following article:
http://blog.csdn.net/chixinmuzi/article/details/1727195


Development Manual:
There are three different map algorithm, used in different application scenarios:
1,memMap:  Based on memory No hard disk consumption.
2,diskMap: Based on the hard disk No memory consumption.
3,hashMap: No delete function, but the best performance.
memMap and diskMap can turn to hashMap by memMap2HashMap and diskMap2HashMap.

In addition,Provide Google HashMap for comparative test.
Algorithm of Google,the performance is not good, but also has the collision probability .

Reply all
Reply to author
Forward
0 new messages