inverting a hash function

Kartit

unread,

Nov 18, 2009, 7:36:04 AM11/18/09

to

Hi,
I need to implement an algorithm to invert a hash function but I am
not sure if it is possible:
given a sequence of characters : ABC and a Hash function the following
algorithm produces a hash.

for each character i in the sequence of characters "ABC"
H(X)=5453 XOR Char[i] (ascii code E.G 49);
repeat till end of characters

if I apply the the algorithm on one character then the same algorithm
gives me the original input text(easy), but with several characters it
changes. How should I proceed the analysis if I am bound with memory
limits?I guess birthday attack would not be possible.

Kartit

unread,

Nov 18, 2009, 7:42:53 AM11/18/09

to

The output of the previous algorithm is a long and the sequence of
characters varies between 5 and 20. the returned long is a signed 64-
bit integer type

for each character i in the sequence of characters "ABC"
H(X)=5453 XOR Char[i] (ascii code E.G 49);
repeat till end of characters

return hash(long)

bmearns

unread,

Nov 18, 2009, 8:42:58 AM11/18/09

to

By definition, hashes are meant to be non-reversible, but the validity
of that depends of course on how good the hash algorithm is. However,
I can say for certain that you won't (in general) find a unique
solution to your problem: the output is only 64 bits and the input is
up to 160, so there are more possible inputs than there are possible
outputs. That means at least some outputs will have two or more
possible inputs so even if you can find those inputs (i.e., reverse
the hash) you won't know which one was actually used. It's easy to
understand in the context of familiar functions like a parabola: f(x)
= x^2. If f(x) = 4, what's x? It could be 2 or it could be -2, you
have no way of knowing, and it's the same for your hash. But I might
be going too much into detail about something you already know.

I'm a little confused about your algorithm, though: What is X in H(X)?
The equation you've given isn't really valid without some explanation
of X and i. I'm assuming you're supposed to perform H() on each
character of the input, but how do you put them together? This
algorithm isn't going to produce a single 64 bit output, it's going to
produce one 16-bit output per input character. A hash function
requires some way to combine each of these per-character outputs into
a single value, like XORing them all together, or adding them or
whatever. Something more like this:

H = 0;
for(i=0; i<inputLength; i++)
{
hi = 5453 ^ input[i]; //per-character hash
H += hi //some way of accumulating/combining
the per-character hashes into a single value.
}

If you clarify the algorithm and possibly put it into context (what
exactly are you trying to achieve?), it might improve the chances that
someone can help you.

-Brian

rossum

unread,

Nov 18, 2009, 12:58:41 PM11/18/09

to

On Wed, 18 Nov 2009 04:36:04 -0800 (PST), Kartit <ana...@gmail.com>
wrote:

This is an extremely weak hash function. It does not incorporate
position so all permutations of the same string will give the same
result:

hash(ABC) = hash(ACB) = hash(BAC) = hash(CAB) etc.

Even with a better hash function you will not be able to invert it
perfectly since the input can be longer than the output so there must
be duplicate inputs giving the same output.

rossum

Unruh

unread,

Nov 18, 2009, 4:38:35 PM11/18/09

to

Kartit <ana...@gmail.com> writes:

>Hi,
>I need to implement an algorithm to invert a hash function but I am
>not sure if it is possible:
>given a sequence of characters : ABC and a Hash function the following
>algorithm produces a hash.

A hash is many to one. Ie there are many many original texts which
produce the same hash. As an example, the first letter of a word is a
hash of that word.
Obviously if you have "a" as the hash you cannot invert it. The original
could have been
apple
apricot
able
assunder
or many many other words.

>for each character i in the sequence of characters "ABC"
>H(X)=5453 XOR Char[i] (ascii code E.G 49);
>repeat till end of characters

That is not a hash. a hash is a mapping from the input to a fixed length
output. MD5 produces 128 bit outputs, no matter how long the input. What
you are describing is a particularly useless encryption.
( and encryption is a one to one mapping of the input to the output)

>if I apply the the algorithm on one character then the same algorithm
>gives me the original input text(easy), but with several characters it
>changes. How should I proceed the analysis if I am bound with memory
>limits?I guess birthday attack would not be possible.

A better example of a hash would be
H(i)=H(i-1)XOR Char[i]
H(0)=5453

Noob

unread,

Nov 19, 2009, 5:12:23 AM11/19/09

to

Unruh wrote:

> The original could have been
> apple
> apricot
> able
> assunder
> or many many other words.

Assunder?

As in the donkey below? :-)

Kartit

unread,

Nov 19, 2009, 6:52:52 AM11/19/09

to

> -Brian- Hide quoted text -
>
> - Show quoted text -

Thanks all for your answer:
The problem can be described as follows:
I am given a hash and I should return a string.
The algorithm is the following
long encode(CHAR input[]){
long num= 5453 ;

for(i = 0; i<str.length; i++)
num= (num* 22) ^ input[i];
return key;
}
the input can be any ascii printable character starting from character
'!' and ending with character '~' in the ascii table. Is it possible
to invert, several inputs can has the same output so I think it is
mathematically not possible to factorize XOR operations since we don't
know the order of the inputs used.

Message has been deleted

rossum

unread,

Nov 19, 2009, 8:39:51 AM11/19/09

to

On Thu, 19 Nov 2009 03:59:26 -0800 (PST), Kartit <ana...@gmail.com>
wrote:

>On Nov 18, 12:36 pm, Kartit <ana...@gmail.com> wrote:

>sorry I miss wrote the algorithm
>it is
>long encode(char input[]){
>long hashed=5453;
>for each character in input
>hashed = (hashed * 22) XOR input[i];
>return hashed;
With this hash algorithm we can use a 64 character string to map to
any desired 64 bit long. Each character of the string can be mapped
to each bit of the target value. Pick two characters, one ending in a
0 bit and one in a 1 bit in their unicode representation - I picked
'A' and 'B'. Set up a 64 character array and scan it from the right
hand end flipping characters where the corresponding bits in the
target number and the current hash do not match.

The example code below is written for clarity, not for speed.

rossum

// *** Begin Java code ***

public static void main(String[] args) {

Random rand = new Random();

for (int i = 0; i < 10; ++i) {
long target = rand.nextLong();
String inverted = invertHash(target);

System.out.println("Target value = " + target);
System.out.println("Hash string = " + inverted);
System.out.println("Hash = " + calcHash(inverted));
System.out.println();
}

System.out.println("Done.");
} // end main()

static long calcHash(String text) {
long hash = 5453;
for (int i = 0; i < text.length(); ++i) {
hash *= 22;
hash ^= text.charAt(i);
}
return hash;
}

private static String invertHash(long target) {
// Set up array with all one character.
char[] result = new char[64];
for (int i = 0; i < 64; ++i) { result[i] = 'A'; }

// Check bits in reverse order, flipping where no match.
for (int i = 63; i >= 0; --i) {
long currentHash = calcHash(new String(result));
if (getBit(i, target) != getBit(i, currentHash)) {
result[i] = 'B';
}
}
return new String(result);
}

static int getBit(int n, long bits) {
return (int)((bits >>> (63 - n)) & 1L);
}

// *** End Java code ***

Kartit

unread,

Nov 19, 2009, 6:46:00 PM11/19/09

to

On Nov 19, 1:39 pm, rossum <rossu...@coldmail.com> wrote:
> On Thu, 19 Nov 2009 03:59:26 -0800 (PST),Kartit<ana...@gmail.com>

> // *** End Java code ***- Hide quoted text -

>
> - Show quoted text -

What about Rainbow tables, I read about them somewhere? any idea how
are they implmeneted?

Maaartin

unread,

Nov 19, 2009, 11:16:12 PM11/19/09

to

On Nov 20, 12:46 am, Kartit <ana...@gmail.com> wrote:
> What about Rainbow tables, I read about them somewhere? any idea how
> are they implmeneted?

Just UTFG and look at the first result: http://en.wikipedia.org/wiki/Rainbow_table
But you don't need it, this hash is very weak as rossum has shown. If
you're looking for shorter string leading to a given hash value, just
modify more bits at once, I suppose you can get strings of 16 or less
characters without much effort.

What is the reason for using 22 as the multiplication factor? Because
of it being even, the hash looses all information about all but the
last 64 characters, i.e.,
hash(a.concat(suffix)) == hash(b.concat(suffix))
for any String a, b, and suffix with suffix.length() >= 64.

Kartit

unread,

Nov 20, 2009, 4:55:10 AM11/20/09

to

actually the algorithm was a modification to the following algo:
long H= 5381;

for(i = 0; i<str.length; i++)

H= (H* 33) ^ str[i];
But I don't see how rossum's algorithm can find the initial string
given there are 93 ascii codes between printables characters codes
33-126 . In his case it is only using two characters 'A' 'B', in
reality there are many inputs.

rossum

unread,

Nov 20, 2009, 5:16:09 AM11/20/09

to

On Thu, 19 Nov 2009 15:46:00 -0800 (PST), Kartit <ana...@gmail.com>
wrote:

>What about Rainbow tables, I read about them somewhere? any idea how
>are they implmeneted?
Rainbow tables would probably be overkill for reversing this hash
function - it is not a cryptographic hash and can easily be reversed.
I am an amateur so if I can reverse it then it must be extremely weak.

The Wikipedia article on Rainbow Tables is reasonable and has some
useful links to further reading.

rossum

unread,

Nov 20, 2009, 5:38:00 AM11/20/09

to

On Fri, 20 Nov 2009 01:55:10 -0800 (PST), Kartit <ana...@gmail.com>
wrote:

>On Nov 20, 4:16 am, Maaartin <grajc...@seznam.cz> wrote:

>> On Nov 20, 12:46 am, Kartit <ana...@gmail.com> wrote:
>>
>> > What about Rainbow tables, I read about them somewhere? any idea how
>> > are they implmeneted?
>>
>> Just UTFG and look at the first result:http://en.wikipedia.org/wiki/Rainbow_table
>> But you don't need it, this hash is very weak as rossum has shown. If
>> you're looking for shorter string leading to a given hash value, just
>> modify more bits at once, I suppose you can get strings of 16 or less
>> characters without much effort.
>>
>> What is the reason for using 22 as the multiplication factor? Because
>> of it being even, the hash looses all information about all but the
>> last 64 characters, i.e.,
>> hash(a.concat(suffix)) == hash(b.concat(suffix))
>> for any String a, b, and suffix with suffix.length() >= 64.
>
>actually the algorithm was a modification to the following algo:
> long H= 5381;
>
> for(i = 0; i<str.length; i++)
> H= (H* 33) ^ str[i];

That is a stronger hash - my algorithm cannor deal with odd
multipliers such as 33.

>But I don't see how rossum's algorithm can find the initial string
>given there are 93 ascii codes between printables characters codes
>33-126 . In his case it is only using two characters 'A' 'B', in
>reality there are many inputs.

No algorithm can find the original string; all any algorithm can do is
to find a string that gives the samehash result. There are an
infinite number of inputs that will give that particular output.

If the input strings have a specific format then you may be able to
narrow down the search somewhat.

rossum

bmearns

unread,

Nov 20, 2009, 8:30:27 AM11/20/09

to

Not to be rude, but this is the third time now you've changed the
algorithm. What exactly are you working on here?

As several people (including myself) have mentioned, it's impossible
to find for certain the "initial string" that was used to create a
hash because this is a many-to-one function. There are literally an
infinite number of inputs that can lead to any given hash, so there is
no way, given only the hash, to figure out which of these inputs was
actually used. That's the basis behind rossum's use of 'A' and 'B':
he's just producing one of the possible inputs that could produce the
required hash. Because of that multiplication by 22, each character in
the input shifts the current hash up by 1 bit (and modifies it
besides, but that's basically irrelevant). So for all intents and
purposes, you only need to concern yourself with the least significant
bit of each input character (the one that will not overlap with the
existing hash): if an lsb of 1 doesn't work, then an lsb of 0 will.
That's not to say that you can just swap out any letter in the input
for one with the same lsb and get the same hash, because the other
bits do affect the hash. In other words, whether you need a 1 or a 0
in the lsb for the current character depends on the specific
characters that have already been used.

But like rossum said, multiplying by 33 completely changes it. 33
isn't even, so there's none of this shifting going on. For each input
character, all of the bits completely overlap (in general) with the
existing hash. I think it's still probably not a cryptographically
secure hash, though: I don't know how to reverse it, but I'm sure a
half-way decent cryptanalyst would be able to pretty easily, without
resorting to rainbow tables.

-Brian