Changing a single character in the file completely alters its fingerprint. For example, if the first character of Nice weather is changed to lowercase (nice weather), the SHA256 hash function generates an entirely different fingerprint.
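To see this effect directly, here is a minimal Python sketch that uses the standard hashlib module. The helper name fingerprint is introduced only for this illustration, and the text is assumed to be encoded as UTF-8 before hashing.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Return the SHA-256 fingerprint of text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = fingerprint("Nice weather")
modified = fingerprint("nice weather")

print(original)
print(modified)

# Count the positions at which the two fingerprints happen to agree
matching = sum(a == b for a, b in zip(original, modified))
print(f"{matching} of 64 hexadecimal characters match")
```

For two unrelated random sequences, roughly one position in 16 would be expected to match by chance, so a handful of coincidences does not mean the fingerprints are related.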
Good hash functions produce fingerprints that resemble those that would be obtained if the fingerprint sequence were chosen uniformly at random. In particular, for any possible random result (a sequence of 64 hexadecimal characters), it is infeasible to find a data file F with this fingerprint in a reasonable amount of time.
Method 1. You scroll through the entire space of passwords. You calculate the fingerprint, h(P), for each password, checking to see whether it appears in the stolen data. You do not need a lot of memory, because prior results are deleted with each new attempt, although you do, of course, have to keep track of the possibilities that have been tested.
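Method 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: passwords are three lowercase letters so that the loop finishes instantly, SHA-256 stands in for h, and the "stolen" fingerprints are built from known passwords. The names h, brute_force and stolen are hypothetical.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def h(password: str) -> str:
    """Fingerprint of a password (SHA-256, hexadecimal)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def brute_force(stolen, length, alphabet=ascii_lowercase):
    """Try every password of the given length; return {fingerprint: password} for hits."""
    found = {}
    for symbols in product(alphabet, repeat=length):
        candidate = "".join(symbols)
        fp = h(candidate)  # computed, compared, then discarded
        if fp in stolen:
            found[fp] = candidate
    return found

# Hypothetical stolen fingerprints, built here from known passwords for the demo
stolen = {h("cat"), h("dog")}
recovered = brute_force(stolen, length=3)
print(sorted(recovered.values()))
```

Because product enumerates the candidates in a fixed order, keeping track of what has been tested amounts to remembering the current position in that enumeration.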
Scrolling through all the possible passwords in this way takes a long time. If your computer runs a billion tests per second, you will need 26¹²/(10⁹ × 3,600 × 24) days (1,104 days), or about three years, to complete the task. The feat is not impossible; if you happen to have a computer network of 1,000 machines, about one day will suffice. It is not feasible, however, to repeat such a calculation every time you wish to test additional data, such as when you obtain a new set of username/fingerprint pairs. (Because you have not saved the results of your computations, you would need an additional 1,104 days to process the new information.)
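The estimate can be checked with a short calculation. The sketch below reads the formula as 26 possible symbols, 12 characters per password and a billion tests per second; the constant names are chosen for this illustration.

```python
ALPHABET_SIZE = 26            # symbols per position, as in the formula
PASSWORD_LENGTH = 12          # characters per password, as in the formula
TESTS_PER_SECOND = 10**9      # one billion tests per second
SECONDS_PER_DAY = 3_600 * 24

candidates = ALPHABET_SIZE ** PASSWORD_LENGTH
days = candidates / (TESTS_PER_SECOND * SECONDS_PER_DAY)

print(f"{candidates:,} candidate passwords")
print(f"{int(days):,} full days on one machine ({days / 365:.2f} years)")
print(f"{days / 1_000:.2f} days on a network of 1,000 machines")
```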
This memory requirement is too large. Method 2 is no more feasible than method 1. Method 1 requires too many computations, and method 2 requires too much memory. Both cases are problematic: either each new password takes too long to compute, or precomputing all possibilities and storing all the results is too large a task.
Is there some compromise that requires less computing power than method 1 and less memory than required for method 2? Indeed, there is. In 1980 Martin Hellman of Stanford University suggested an approach that was improved in 2003 by Philippe Oechslin of the Swiss Federal Institute of Technology in Lausanne and further refined more recently by Gildas Avoine of the National Institute of Applied Sciences of Rennes (INSA Rennes) in France. It demands less computing power than method 1 in exchange for using a little more memory.
Here is how it works: First, we need a function R that transforms a fingerprint h(P) into a new password R(h(P)). One might, for instance, consider fingerprints as numbers written in the binary numeral system and consider passwords as numbers written in the K numeral system, where K is the number of allowable symbols for passwords. Then the function R converts data from the binary numeral system to the K numeral system. For every fingerprint h(P), it computes a new password R(h(P)).
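One possible form of R is sketched below in Python. It is an assumption-laden illustration, not the function used in any particular tool: passwords are taken to be 12 lowercase letters (K = 26), SHA-256 stands in for h, and the fingerprint is folded into the password space with a modulo before being rewritten in base K.

```python
import hashlib
from string import ascii_lowercase

ALPHABET = ascii_lowercase  # assumption: K = 26 allowable symbols
LENGTH = 12                 # assumption: fixed password length

def h(password: str) -> bytes:
    """Fingerprint of a password (SHA-256, raw bytes)."""
    return hashlib.sha256(password.encode("utf-8")).digest()

def R(fingerprint: bytes, alphabet: str = ALPHABET, length: int = LENGTH) -> str:
    """Read the fingerprint as a binary number and rewrite it in base K."""
    k = len(alphabet)
    # Fold the 256-bit number into the range of valid passwords
    n = int.from_bytes(fingerprint, "big") % k ** length
    symbols = []
    for _ in range(length):
        n, digit = divmod(n, k)
        symbols.append(alphabet[digit])
    return "".join(reversed(symbols))

p0 = "niceweatherx"  # an arbitrary 12-letter starting password
p1 = R(h(p0))
print(p0, "->", p1)
```

R does not invert h. It only maps each fingerprint to some valid password, which is all the chain construction requires.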
To generate a data point in this table, we start from a possible password P0, compute its fingerprint h(P0), and then compute a new possible password R(h(P0)), which becomes P1. Next, we continue this process from P1. Without storing anything other than P0, we compute the sequence P1, P2, ... until the fingerprint, written in binary, starts with 20 zeros; that fingerprint is designated h(Pn). Such a fingerprint occurs only once in about 1,000,000 fingerprints because the result of a hash function is similar to the result of a uniform random draw, and 2²⁰ is roughly equal to 1,000,000. The password/fingerprint pair [P0, h(Pn)], containing the fingerprint that starts with 20 zeros, is then stored in the table.
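The chain construction can be sketched as follows. To keep the demonstration fast, this Python example assumes four-letter lowercase passwords and stops at 8 leading zero bits rather than 20; it also adds a step limit, which the description above does not mention, in case a chain runs into a cycle. All names (h, R, is_distinguished, build_chain) are hypothetical.

```python
import hashlib
from string import ascii_lowercase

ALPHABET = ascii_lowercase
LENGTH = 4          # assumption: short passwords so the demo runs quickly
ZERO_BITS = 8       # assumption: 8 leading zero bits instead of 20
MAX_STEPS = 10_000  # assumption: give up if the chain seems to be cycling

def h(password: str) -> bytes:
    return hashlib.sha256(password.encode("utf-8")).digest()

def R(fp: bytes) -> str:
    """Rewrite a fingerprint as a password in base K."""
    n = int.from_bytes(fp, "big") % len(ALPHABET) ** LENGTH
    symbols = []
    for _ in range(LENGTH):
        n, digit = divmod(n, len(ALPHABET))
        symbols.append(ALPHABET[digit])
    return "".join(reversed(symbols))

def is_distinguished(fp: bytes) -> bool:
    """True if the fingerprint starts with ZERO_BITS zero bits."""
    return int.from_bytes(fp, "big") >> (8 * len(fp) - ZERO_BITS) == 0

def build_chain(p0: str):
    """Return (P0, h(Pn), n), or None if no endpoint is reached in MAX_STEPS."""
    password = p0
    for n in range(MAX_STEPS):
        fp = h(password)
        if is_distinguished(fp):
            return p0, fp, n  # only P0 and h(Pn) need to be stored
        password = R(fp)
    return None

for start in ("abcd", "pass", "wxyz"):
    result = build_chain(start)
    if result is None:
        print(start, "-> no endpoint found within MAX_STEPS")
    else:
        p0, end_fp, n = result
        print(f"[{p0}, {end_fp.hex()[:16]}...] after {n} reductions")
```

With 8 zero bits, a chain ends after about 256 steps on average; with the 20 zero bits described above, the average would be about a million.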
For a good database with almost no gaps, the memory needed to store the calculated pairs is a million times smaller than that needed for method 2, as described earlier. That is less than four one-terabyte hard disks. Easy. Also, as will be seen, using the table to derive passwords from stolen fingerprints is quite doable.
The sequences below represent separate chains of calculations leading from passwords (M0, N0, ..., Q0) to fingerprints and other passwords, until the desired fingerprint (and thus the password that precedes it) pops out. (The long dotted line represents many other lines similar to the top two.)
Many computations must be done to establish the first and last column of the rainbow table. By storing only the data in these two columns and by recomputing the chain, hackers can identify any password from its fingerprint.
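The lookup step can be illustrated with a toy table. The Python sketch below follows the construction described in the text: a single function R, and chains that end at a fingerprint with leading zeros. It is not a full rainbow table, which would use a different reduction function in each column. The parameters are assumptions chosen so that the example runs in a moment: three-letter passwords, 4 leading zero bits and a step limit to discard cycling chains.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

ALPHABET = ascii_lowercase
LENGTH = 3       # assumption: tiny password space (26^3) for the demo
ZERO_BITS = 4    # assumption: chains end at 4 leading zero bits, not 20
MAX_STEPS = 500  # assumption: give up on chains that appear to cycle

def h(password: str) -> bytes:
    return hashlib.sha256(password.encode("utf-8")).digest()

def R(fp: bytes) -> str:
    n = int.from_bytes(fp, "big") % len(ALPHABET) ** LENGTH
    symbols = []
    for _ in range(LENGTH):
        n, digit = divmod(n, len(ALPHABET))
        symbols.append(ALPHABET[digit])
    return "".join(symbols)

def is_endpoint(fp: bytes) -> bool:
    return int.from_bytes(fp, "big") >> (8 * len(fp) - ZERO_BITS) == 0

def walk(password: str):
    """Yield (password, fingerprint) pairs along a chain, up to its endpoint."""
    for _ in range(MAX_STEPS):
        fp = h(password)
        yield password, fp
        if is_endpoint(fp):
            return
        password = R(fp)

def build_table(starts):
    """Map each endpoint fingerprint to the start passwords that reach it."""
    table = {}
    for p0 in starts:
        last_fp = None
        for _, last_fp in walk(p0):
            pass
        if is_endpoint(last_fp):  # chains that never ended are discarded
            table.setdefault(last_fp, []).append(p0)
    return table

def lookup(table, target: bytes):
    """Return a password whose fingerprint is target, or None if not covered."""
    # Step 1: walk forward from the target to the end of its chain
    fp = target
    for _ in range(MAX_STEPS):
        if is_endpoint(fp):
            break
        fp = h(R(fp))
    # Step 2: replay every stored chain that ends at that fingerprint
    for p0 in table.get(fp, []):
        for password, candidate_fp in walk(p0):
            if candidate_fp == target:
                return password
    return None

all_passwords = ["".join(s) for s in product(ALPHABET, repeat=LENGTH)]
table = build_table(all_passwords[::20])  # every 20th password starts a chain
stolen = h("dog")                         # hypothetical stolen fingerprint
print(lookup(table, stolen))              # the password, or None if not covered
```

Several start passwords can lead to the same endpoint when chains merge, which is why each table entry holds a list. A lookup returns None when no stored chain passes through the target, so coverage depends on how many chains are computed.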
Jean-Paul Delahaye is a professor emeritus of computer science at the University of Lille in France and a researcher at the Research Center in Computer Science, Signal and Automatics of Lille (CRIStAL). He recently published Les Mathématiciens Se Plient au Jeu (Belin, 2017), a French collection of articles from Pour la Science.
Scientific American is part of Springer Nature, which owns or has commercial relations with thousands of scientific publications (many of them can be found at www.springernature.com/us). Scientific American maintains a strict policy of editorial independence in reporting developments in science to our readers.