passwords part 1 how they get hacked

I’m sure you’ve heard something about passwords getting hacked recently; Yahoo, Formspring, Last.fm, Sony, etc. One might even call it an epidemic - it seems like almost twice a week there’s a security breach happening on some major service. It’s come to my attention that, while the breaches are ultimately the developer’s faults, the majority of the world knows very little about what really happens once they signup for a service with a password. In response to that fact, I’ve decided to start this three-part series regarding exactly how passwords work, how they should work, and ultimately why they suck no matter what. To properly understand why we need to use passwords, we need to first understand why bad passwords don’t work, and how they get hacked.

So you’ve just registered for an awesome new service. It asked you a variety of questions - first name, last name, maiden name of your brother’s divorced ex-girlfriend, and most importantly a username and password. Once you hit that submit button, there’s a crazy world of security that those credentials fly into, and whether it’s truly secure or not depends on the developer’s skill and how much they really care about your credentials. There’s dozens (probably hundreds) of different ways that your credentials can be handled, but I’m only going to focus on the three most common ones here.

Clear-Text

How they're Stored

Clear-text passwords is by far the most frowned-upon method for storing passwords. Unfortunately, regardless of how strong the developer community urges people to avoid this, it's still increasingly common. Especially in older services (like Yahoo Voice) you see clear-text passwords very often, because good security standards had not been developed yet.

The storage method here is pretty straight forward. There’s a database in the background somewhere that contains two fields - username and password. As soon as you hit that registration button the service will store your credentials in the database exactly as you typed them - no encryption, no hashing, no nothing. When you try to log in the next time the service will do a simple query to check that the login credentials you entered match a pair in the database. If so, they’ll let you in.

A good way to detect if a service is using clear-text passwords is to try and do a password reset (“Forgot your password?”). If you do a password reset and they send you an email containing your old password, that means they are storing your passwords in clear-text. Keep in mind, if a service is using any method other than clear-text, they will never be able to tell you what your password is. This is a dead giveaway, and warrants an email to the webmaster telling them they’re going against all sorts of standards and are abusing their user’s security.

How they get Hacked

Here's a fact that you're going to wish you didn't know: no server is secure. No matter how much time, effort, and research goes into making sure no one malicious can access a server, there's always a way. The reward is so high that you can always be assured that someone will have done more research and can find a way to break in. Once this somebody breaks into the server, it's a simple command ("SELECT * FROM users") to dump every users passwords. Lucky for them, the vast majority of people use the same password for every service, so now they've jumped from your Yahoo Voice password to your bank account.

One-Way Hashing

How they are Stored

You may have heard some of these terms before: md5, sha1, encrypt, jargon. Those (aside from jargon) are all terms referring to some sort of hashing technique. Hashing means taking some sort of text and converting it into something else, usually not recognizable. One-way hashing is a variant of hashing that cannot be done in reverse. That means that once a piece of text gets hashed there is no conventional way to change that hash back into the original text. For example, md5 hashing the text "password" would result in the text "5f4dcc3b5aa765d61d8327deb882cf99".

Hashing passwords is a significantly safer way to store passwords. Similar to clear-text passwords, developers will take the username and password you entered, and put them into their database. However, before they put the password in they will run a hashing function on it. This is wonderful, because it means that the developers have no direct access to your credentials, and neither does anyone that hacks into their database.

When you log in the next time, the developer will go into their database and find whichever password is matched with the username that you entered. They will then run the same hashing function on the password that you entered to login, and see if it matches the hash that was stored when you registered. If it does, you’re authenticated and you are logged in.

How they get Hacked

As I noted above, I want to stress that one-way hashing is significantly safer than clear-text passwords, but it is not 100% safe. Most secure hashing algorithms have been around for a long time, and hackers have been fighting with them just as long. For a long time hackers resorted to something called brute-forcing. Essentially, the hackers would run a script that would run a hashing function on every possible letter/number combination until they got a hash that matched your password. This is why you're commonly told to have an 8 character password with upper and lowercase letters, as well as number and ideally symbols - the more characters in your password, the longer it will take for a brute-force attack to find it. Unfortunately, an average computer could crack the password "pas$w0Rd" in just 57 days. Experienced hackers often have powerful computers, or even botnets that can help them find those passwords much faster.

More recently, though, hackers have built rainbow tables. Rainbow tables are large databases of text and their hashed equivalent. It’s as if someone ran a brute-force attack on all the most likely passwords, and stored them in a database. This means that now hackers can take the large list of hashed passwords they have, and compare it against their rainbow table. When they find two hashes that match, they know what your password is. This means that hackers can turn an entire database of hashed passwords into clear-text in a matter of hours.

Salted One-Way Hashes

How they are Stored

Salted one-way hashes take the same principles from one-way hashes, but add a little bit of randomization into the equation. Again, these credentials will still be stored in a database like the previous two examples, and the password will be hashed like in one-way hashing. However, before they hash the password they mix it up with some 'salt'. Salt is a random string that is uniquely generated for each user. The great thing with salting is that there are many different ways that a developer can choose to mix the salt with the password, and the hacker has no idea which was chosen. For instance, if your password is "password" and the salt is "salt", I could change it to "saltpassword", "passwordsalt", "passswaorldt", or any other combination. The salted password is then hashed, and the username, password, and salt are stored in the database.

When you go to log in the next time the developer will grab the salt and the hashed password that match your username. They will use the same salting combination from before on your login password, then they will hash it and see if it matches the stored hash. If it does, you’re authenticated and can log in.

How they get Hacked

As a a general rule, I consider services that are using salted one-way hashes on their password to have passed the threshold to be considered "secure". Not in the sense that no one can break into them, but in the sense that if a major breach occurs, it will take a hacker a significant amount of work to crack everyone's passwords. The key here lies in two facts - the hacker doesn't know how the developer combined the password and the salt, and the hacker doesn't have pre-generated rainbow tables.

However, this mechanism is still not unbeatable. If a hacker were to get a salted list of hashes, they would first take a known value - likely a user that they registered while they were trying to break in, and they know the password to. There are only a handful of likely combinations that the developer would have used to combine the salt and the password - there can be nearly infinite, but most developers stick to known patterns because complex ones require unnecessary programming time and processing to achieve. The hacker can try all of the likely combinations of the known password and the known salt until they find the correct pattern that the developer chose. Once they have the pattern, they can begin brute forcing on each user until they get the correct password. Generally the hacker would not run a full brute-force attack on every password, because they would have to run a unique attack on every user due to the salting. Instead they use a list of a couple thousand of the most common passwords, and brute-force from there. Naturally they won’t crack every users account, but they will likely make a significant dent.

Bonus: Host-Proof Hosting

Host-Proof Hosting is a pretty uncommon method for securing applications, but it is generally regarded as the most secure way to handle user's passwords. In fact, Host-Proof Hosting is so secure that I - nor any of my research - can't think of any way to steal the passwords without infecting the client with a virus or physically observing the password being entered. Host-Proof Hosting takes away all the risk of storing passwords in a database by simply not storing passwords. Instead, all of the data that you generate while interacting with the application will be encrypted with a two-way hash that can only be decoded with a key. The encrypted data is sent to your browser, and is decrypted using your password as the key. By using this method, your password never actually has to be sent to the server - all of the hashing and decoding can be done on your local computer.

Check out part two of the series, where I discuss methods for creating a better password policy.