C++ Password Hashing

1 view

Skip to first unread message

Brook Mithani

unread,

Aug 5, 2024, 2:05:04 AM8/5/24

to cordytersou

Passwordhashing is a one-way process of securing plain text password by creating a bit string of a fixed size called hash using cryptographic hash function. Cryptographic hash functions designed to be a one-way function, that is, a function which is infeasible to invert.

The gist of authentication is to provide users with a set of credentials, such as username and password, and to verify that they provide the correct credentials whenever they want access to the application. Hence, we need a way to store these credentials in our database for future comparisons. However, storing passwords on the server side for authentication is a difficult task. Let's explore one of the mechanisms that make password storage secure and easier: hashing.

A simple approach to storing passwords is to create a table in our database that maps a username with a password. When a user logs in, the server gets a request for authentication with a payload that contains a username and a password. We look up the username in the table and compare the password provided with the password stored. A match gives the user access to the application.

As explained by Dan Cornell from the Denim Group, cleartext refers to "readable data transmitted or stored in the clear", for example, unencrypted. You may have also seen the terms plaintext and plain text. What's the difference? According to Cornell, plaintext refers to data that will serve as the input to a cryptographic algorithm, while plain text refers to unformatted text, such as the content of a plain text file or .txt. It's important to know the distinction between these terms as we move forward.

Storing passwords in cleartext is the equivalent of writing them down in a piece of digital paper. If an attacker was to break into the database and steal the passwords table, the attacker could then access each user account. This problem is compounded by the fact that many users re-use or use variations of a single password, potentially allowing the attacker to access other services different from the one being compromised. That all sounds like a security nightmare!

A more secure way to store a password is to transform it into data that cannot be converted back to the original password. This mechanism is known as hashing. Let's learn more about the theory behind hashing, its benefits, and its limitations.

In cryptography, a hash function is a mathematical algorithm that maps data of any size to a bit string of a fixed size. We can refer to the function input as message or simply as input. The fixed-size string function output is known as the hash or the message digest. As stated by OWASP, hash functions used in cryptography have the following key properties:

Commonly used hashing algorithms include Message Digest (MDx) algorithms, such as MD5, and Secure Hash Algorithms (SHA), such as SHA-1 and the SHA-2 family that includes the widely used SHA-256 algorithm. Later on, we are going to learn about the strength of these algorithms and how some of them have been deprecated due to rapid computational advancements or have fallen out of use due to security vulnerabilities.

Using SHA-256, we have transformed a random-size input into a fixed-size bit string. Notice how, despite the length difference between python1990K00L and python, each input produces a hash of the same length. Why's that?

Using hexdigest(), you produced a hexadecimal representation of the hash value. For any input, each message digest output in hexadecimal format has 64 hexadecimal digits. Each digit pair represent a byte. Thus, the digest has 32 bytes. Since each byte holds 8 bits of information, the hash string represent 256 bits of information in total. For this reason, this algorithm is called SHA-256 and all of its inputs have an output of equal size.

Some hash functions are widely used but their properties and requirements do not provide security. For example, cyclic redundancy check (CRC) is a hash function used in network applications to detect errors but it is not pre-image resistant, which makes it unsuitable for use in security applications such as digital signatures.

Throughout this article, we are going to explore the properties that make a hash function suitable for usage in security applications. To start, we should know that even if we were to find the details on how the input to a cryptographic hash function gets computed into a hash, it would not be practical for us to reverse the hash back into the input. Why's that?

The modulo operator gives us the remainder of a division. For example, 5 mod 3 is 2 since the remainder of 5 / 3 is 2 using integer division. This operation is deterministic, given the same input always produces the same output: mathematically, 5 / 3 always results in 2. However, an important characteristic of a modulo operation is that we cannot find the original operands given the result. In that sense, hash functions are irreversible.

Knowing that the result of a modulo operation is 2 only tells us that x divided by y has a reminder of 2 but it doesn't tell us anything about x and y. There is an infinite number of values that could be substituted for x and y for x mod y to return 2:

Another virtue of a secure hash function is that its output is not easy to predict. The hash for dontpwnme4 would be very different than the hash of dontpwnme5, even though only the last character in the string changed and both strings would be adjacent in an alphabetically sorted list:

The irreversible mathematical properties of hashing make it a phenomenal mechanism to conceal passwords at rest and in motion. Another critical property that makes hash functions suitable for password storage is that they are deterministic.

A deterministic function is a function that given the same input always produces the same output. This is vital for authentication since we need to have the guarantee that a given password will always produce the same hash; otherwise, it would be impossible to consistently verify user credentials with this technique.

To integrate hashing in the password storage workflow, when the user is created, instead of storing the password in cleartext, we hash the password and store the username and hash pair in the database table. When the user logs in, we hash the password sent and compare it to the hash connected with the provided username. If the hashed password and the stored hash match, we have a valid login. It's important to note that we never store the cleartext password in the process, we hash it and then forget it.

Whereas the transmission of the password should be encrypted, the password hash doesn't need to be encrypted at rest. When properly implemented, password hashing is cryptographically secure. This implementation would involve the use of a salt to overcome the limitations of hash functions.

Hashing seems pretty robust. But if an attacker breaks into the server and steals the password hashes, all that the attacker can see is random-looking data that can't be reversed to plaintext due to the architecture of hash functions. An attacker would need to provide an input to the hash function to create a hash that could then be used for authentication, which could be done offline without raising any red flags on the server.

The attacker could then either steal the cleartext password from the user through modern phishing and spoofing techniques or try a brute force attack where the attacker inputs random passwords into the hash function until a matching hash is found.

A brute-force attack is largely inefficient because the execution of hash functions can be configured to be rather long. This hashing speed bump will be explained in more detail later. Does the attacker have any other options?

Since hash functions are deterministic (the same function input always results in the same hash), if a couple of users were to use the same password, their hash would be identical. If a significant amount of people are mapped to the same hash that could be an indicator that the hash represents a commonly used password and allow the attacker to significantly narrow down the number of passwords to use to break in by brute force.

Additionally, through a rainbow table attack, an attacker can use a large database of precomputed hash chains to find the input of stolen password hashes. A hash chain is one row in a rainbow table, stored as an initial hash value and a final value obtained after many repeated operations on that initial value. Since a rainbow table attack has to re-compute many of these operations, we can mitigate a rainbow table attack by boosting hashing with a procedure that adds unique random data to each input at the moment they are stored. This practice is known as adding salt to a hash and it produces salted password hashes.

With a salt, the hash is not based on the value of the password alone. The input is made up of the password plus the salt. A rainbow table is built for a set of unsalted hashes. If each pre-image includes a unique, unguessable value, the rainbow table is useless. When the attacker gets a hold of the salt, the rainbow table now needs to be re-computed, which ideally would take a very long time, further mitigating this attack vector.

According to Jeff Atwood, "hashes, when used for security, need to be slow." A cryptographic hash function used for password hashing needs to be slow to compute because a rapidly computed algorithm could make brute-force attacks more feasible, especially with the rapidly evolving power of modern hardware. We can achieve this by making the hash calculation slow by using a lot of internal iterations or by making the calculation memory intensive.

A slow cryptographic hash function hampers that process but doesn't bring it to a halt since the speed of the hash computation affects both well-intended and malicious users. It's important to achieve a good balance of speed and usability for hashing functions. A well-intended user won't have a noticeable performance impact when trying a single valid login.