Proposal: Rotating Salt Algorithm for Password Hashes

Status

This is a draft.

Summary

This page describes a proposal for a method of generating password hashes that would make it more difficult for an attacker to obtain the cleartext password from a password hash.  This is done by applying salts to the hash in dynamic and non-standard ways.  The goal is to create an system of hashing and storing passwords that will make dictionary attacks against passwords require significantly more work.

Hash

This proposal is not concerned with how a hash is generated, and assumes that a hash is generated using a sufficiently strong and irreversible hashing algorithm such as SHA1.

Salt

A salt is a pseudo-random string that is combined with the cleartext before it is hashed.  The hash then has the same salt combined with it (see next section).  When validating a password, the salt is extracted from the salted hash, applied to the password that is being tested, and the salted cleartext password is hashed.  The result is compared to the raw hash and tested for a match.  It looks something like this:

  // salted_hash is the hash of the correct password stored in a database somewhere.
  // guess is the cleartext password we are testing against salted_hash
  raw_hash = extract_hash(salted_hash)
  salt = extract_salt(salted_hash)
  salted_guess = apply_salt(guess, salt)
  hashed_guess = hash(salted_guess)
  is_match = (hashed_guess == raw_hash)

A few questions arise from the above: how are the salt and hash extracted from salted_hash?  See the next section.  How is the salt applied to the guess?  The reverse of extract_salt (see the next section).  How is the salted_guess hashed?  See the previous section (we don’t care, it’s not relevant to this proposal.  It’s a strong, irreversible hash).

The purpose of salting a hash, as opposed to just storing the raw hash of the cleartext password, is to render useless any pre-rendered hash dictionaries.  For an attacker to test a password against a dictionary of pre-rendered hashes, they would have to have rendered this dictionary using the same salt as your password.  If your salt is sufficiently random, the attacker is unlikely to have such a dictionary on hand.

Briefly, salting hashes slows down the attack process considerably, but does not make it any more computationally intensive.

Salting Algorithms

A salting algorithm in this context is an algorithm for combining a raw hash with a salt into a salted hash.  A salting algorithm must be reversible; given a salted hash it must be possible to programmatically extract the original raw hash and salt.

As an example, a common salting algorithm is as simple as:

  salt = generate_salt()                         // "1234"
  salted_cleartext = concatenate(salt, ":", cleartext_password) // "1234:password"
  raw_hash = hash(salted_cleartext)              // "2fa775528b939009d08f56fda4cd19cb9bcb557f"
  salted_hash = concatenate(salt, ":", raw_hash) // "1234:2fa775528b939009d08f56fda4cd19cb9bcb557f"

This is simple and reversible.  A cleartext password guess can be tested using:

  // Assume our guess is correct, "password"
  // password_hash is "1234:2fa775528b939009d08f56fda4cd19cb9bcb557f", from the previous example.
  salt = extract_salt(password_hash)             // "1234"
  raw_hash = extract_hash(password_hash)         // "2fa775528b939009d08f56fda4cd19cb9bcb557f"
  salted_guess = concatenate(salt, ":", guess)   // "1234:password"
  hashed_guess = hash(salted_guess)              // "2fa775528b939009d08f56fda4cd19cb9bcb557f"

The problem with this commonly used algorithm (and it’s practically similar cousins) is the ease with which an attacker can identify the salt from the raw hash.  More complex algorithms can be used in an attempt to obfuscate where the salt and the where the password lie.  Continuing the above example using the salt “1234″ and the raw_hash “2fa775528b939009d08f56fda4cd19cb9bcb557f”, one could inject the salt into the hash as:

  2fa7175528b9329009d0384f56fda4cd19cb9bcb557f

Note that it’s much harder, given the above string, to determine where the salt lies.  If you look carefully you can see that the individual characters of the salt “1″, “2″, “3″, and “4″ have been interspersed throughout the original hash.  This creates a situation wherein if the attacker doesn’t know the salting algorithm, doesn’t know the salt, doesn’t know the raw hash, and doesn’t know the original password, an attacker needs to not just test a dictionary of password guesses against the hash, but also a dictionary of salting algorithms per password.

If the attacker knows the salting algorithm, their job is no harder than if a simpler salting algorithm were used. This will be addressed later.

Multiple Salting Algorithms

Using complex salting algorithms makes an attacker’s job harder if they don’t know your salting algorithm, but this relies entirely on a secret that (once exposed) renders your algorithm useless no matter how complex.  If this secret is leaked or already known by an inside attacker, an attacker’s job is not much harder than if you used the common “<salt>:<hash>” salting algorithm.

To mitigate (not eliminate) this drawback, one can support multiple salting algorithms using salting algorithm codes (SAC).  An SAC would be an identifier attached to the beginning of the password hash that identifies which salting algorithm was used.  Checking a password guess against the known hash might change to look something like the algorithm below.

In this next example, the password_hash uses a more complicated salting algorithm of inserting the 4 character salt at indexes 4, 12, 18 and 19.  It also now begins with “001″ which is a three digit code identifying which salting algorithm was used for this hash.  In other words, “001″ tells us that the salt is at characters 4, 12, 18, and 19, and that the remaining characters are the raw hash.

  // Assume our guess is correct, "password"
  // password_hash is "0012fa7175528b9329009d0384f56fda4cd19cb9bcb557f".
  sac = extract_sac(password_hash)                 // "001"
  salted_hash = extract_salted_hash(password_hash) // "2fa7175528b9329009d0384f56fda4cd19cb9bcb557f"
  salt = extract_salt(salted_hash, sac)            // "1234"
  raw_hash = extract_hash(salted_hash, sac)        // "2fa775528b939009d08f56fda4cd19cb9bcb557f"
  salted_guess = concatenate(salt, ":", guess)     // "1234:password"
  hashed_guess = hash(salted_guess)                // "2fa775528b939009d08f56fda4cd19cb9bcb557f"

This allows us to create a large number of secret salting algorithms which are identified by a unique string attached to the password hash.  This means that if an attacker knows or is able to discover one SAC and its associated algorithm, their job of cracking the hash is only made easier for a subset of password hashes using that same algorithm.  Obviously, if they don’t know the meaning of the SAC codes this makes their already difficult job even more so.

The next section addresses the fact that this still relies on a secret (just a larger pool of secrets) and is easily undermined by an insider, source code leak, or other kind of exposure.

SAC Rotation

This proposal introduces a set of secrets, and that is all it will ever be.  As such, it will always be “vulnerable” (in that it doesn’t make the attackers job harder; it will never make their job easier than standard “<salt>:<hash>” mechanisms) to exposure of this secret.  This section proposes a means of mitigating the damage by rotating the set of SACs and their associated algorithms on a regular basis or following key events such as termination of an knowledgeable employee or a known security breach.

Think of a set of SACs and associated algorithms in generations.  When it makes sense to do so (following a breach, on a regular basis, etc), it is possible to rotate the current generation of SACs to produce a new generation of SACs. 

At a high level, the process may work as follows:

  • A new set of SACs (let’s call it B) and associated algorithms is generated.
  • Going forward, all newly generated password hashes use the B generation of SACs.
  • For each existing hash in the database using an old generation SAC (let’s call it A):
    • The salt and raw hash are extracted based on SAC A, which is indicated by the hash prefix.
    • The salt is injected back into the raw hash using SAC B, and B is the new hash prefix.

Thus with a relatively small amount of effort you can make past secrets obsolete and useless.

Known Issues

This presents a new weakness that must be address.  If an attacker knows the SACs and associated algorithms from a given generation (G1) and are able to obtain a set of password hashes when G1 is the current generation, they can use this information (the ability to derive the salt and raw hash from these G1 hashes) to assist in discovering the algorithm used to salt any later generation hashes they are able to obtain.  If this attacker steals a database of hashes from G9, they can systematically look for passwords hashes that contain the same set of characters as their G1 counterpart and easily figure out the G9 algorithm.

The underlying problem is that if the raw hash and salt stay the same during a rotation, an attacker who knew the raw hash and salt during any previous generation can determine the meaning behind the new SAC so long as they are able to obtain the new password hash and the password owner has not logged in.