A new chapter in password cracking is about to begin. (Photo: Laurie Harker, Minneapolis Star Tribune / Getty Images)
Jeremi M Gosney (@jmgosney) is a world-renowned password cracker and security expert. He is the Founder & CEO of the password-cracking firm Sagitta HPC, and a member of the Hashcat development team. Jeremi also helps run the Security BSides Las Vegas, Hushcon, and PasswordsCon conferences.
Me: “The full dump from the 2012 LinkedIn breach just dropped, so you’re probably not going to see much of me over the next week.”
If you’re just waking up from a coma you would be forgiven for thinking that it’s still 2012.
But no, it’s 2016 and the LinkedIn breach is back from the dead—on its four-year anniversary, no less.
If you had a LinkedIn account in 2012, there’s a 98 percent chance your password has been cracked.
Back in 2012, fellow professional password cracker d3ad0ne (who sadly passed away in 2013) and I made short work of the first LinkedIn password dump, cracking over 90 percent of the 6.4 million password hashes in just under one week.
Following that effort, I did a short write-up ironically titled The Final Word on the LinkedIn Leak.
But those 6.4 million unique hashes posted on a Russian password-cracking forum in June 2012 only accounted for a fraction of the total LinkedIn database.
This second dump, on the other hand, contains 177.5 million password hashes for 164.6 million users, which aligns perfectly with LinkedIn’s user count in the second quarter of 2012.
After validating the data that I received with several individuals, I concluded that this does appear to be a nearly complete dump of the user table from the 2012 LinkedIn hack.
I say “nearly complete” because there are some e-mail addresses in the dump that do not have hashes associated with them (the hash was replaced with the string “xxx”), and there are also some hashes that are not associated with an e-mail address (the e-mail address is NULL). While I presume the hashes not associated with any e-mail address are deleted accounts, I cannot even venture a guess as to why some of the password hashes are missing.
That’s the way it goes when you’re working with second-hand data from an unknown source—you just can’t get a pristine database dump these days.
You may think those 178 million password hashes are a lot, and you wouldn’t be wrong.
But some 362 million passwords, allegedly from Myspace, have recently been posted for sale elsewhere on the darkweb. What makes the LinkedIn breach more notable? While Myspace has also acknowledged its breach, that data holds very little analytical value because the passwords were dramatically altered before being hashed.
Those passwords were all converted to lowercase and truncated to just 10 characters, so it’s impossible for us to know what the original input data was.
Further, two of the top 10 passwords from the Myspace list appear to be created by spammers creating fake profiles and likely do not reflect the choices of actual end-users.
So as it stands today, the LinkedIn breach is the largest and most relevant publicly-acknowledged password breach in Internet history.
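To see why the Myspace preprocessing described above destroys information, here is a minimal Python sketch. The lowercase-and-truncate step comes from the description above; the choice of SHA-1 as the hash and the sample passwords are assumptions made purely for illustration.

```python
import hashlib


def myspace_style_hash(password: str) -> str:
    """Lowercase the input, keep the first 10 characters, then hash it.

    SHA-1 is an assumption for this sketch; the point is the lossy
    normalization that happens before hashing.
    """
    normalized = password.lower()[:10]
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# Three clearly different (hypothetical) passwords...
candidates = ["CorrectHorse1", "correcthorse2", "CORRECTHORSEBATTERY"]

# ...all normalize to "correcthor", so they share a single digest.
digests = {myspace_style_hash(p) for p in candidates}
print(len(digests))  # 1
```

Cracking that one digest only ever yields the normalized string, so the case, length, and tail of whatever the user actually typed cannot be recovered.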
Password cracking and the age of enlightenment
As Ars explained a few months after the first batch of LinkedIn passwords spilled, password cracking is an endless feedback loop. We crack passwords so that we can learn about passwords, which helps us crack more passwords, which we can then analyze and use to crack still more. We start off with a small amount of data that enables us to crack a small number of passwords.
Those passwords then give us some insight into how passwords are created, which enables us to crack more in the future.
And it’s not just passwords we’re interested in, either.
Any short, low-entropy, human-generated string (usernames and screen names, e-mail addresses, and so on) is potentially useful.
We’ve learned that, absent external factors such as password complexity policies, the username selection process is not all that different from the password selection process.
The more data we can accumulate and analyze, the more successful we are at cracking passwords.
Back in the early days of password cracking, we didn’t have much insight into the way people created passwords on a macro scale.
Sure, we knew about passwords like 123456, password, secret, letmein, monkey, etc., but for the most part we were attacking password hashes with rather barbaric techniques—using literal dictionaries and stupid wordlists like klingon_words.txt. Our knowledge of the top 1,000 passwords was at least two decades old. We were damn lucky to find a password database with only a few thousand users, and when you consider the billions of accounts in existence even back then, our window into the way users created passwords was little more than a pinhole.
Those were the dark ages of password cracking.
The age of enlightenment came after 32 million non-unique plaintext passwords from RockYou were leaked to the Internet.
Suddenly that pinhole turned into a porthole, and for the first time in history we got a solid look at how users were creating passwords on a mass scale.
The RockYou breach revolutionized password cracking. No longer were we using crap like list_of_kitchen_appliance_manufacturers.txt for wordlists.
Everyone was just using rockyou.txt, and they were cracking a significant percentage of passwords. Markov statistics, mangling rules, everything was being based off what we learned from the RockYou passwords.
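For readers who have never seen a wordlist attack with mangling rules, the idea can be sketched in a few lines of Python. This is a toy illustration, not how Hashcat works internally: the three-word list, the handful of rules, and the sample "leaked" passwords are all hypothetical, and unsalted SHA-1 is used only because it keeps the example short.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical mangling rules: each one turns a base word into a candidate.
RULES = [
    lambda w: w,                                      # word as-is
    lambda w: w.capitalize(),                         # Word
    lambda w: w + "1",                                # word1
    lambda w: w + "123",                              # word123
    lambda w: w.capitalize() + "!",                   # Word!
    lambda w: w.replace("a", "@").replace("o", "0"),  # simple leetspeak
]


def crack(target_hashes, wordlist):
    """Return {digest: plaintext} for every target hit by word + rule."""
    remaining = set(target_hashes)
    recovered = {}
    for word in wordlist:
        for rule in RULES:
            candidate = rule(word)
            digest = sha1_hex(candidate)
            if digest in remaining:
                recovered[digest] = candidate
                remaining.discard(digest)
    return recovered


# Hypothetical leaked hashes: three human-style passwords and one random one.
leaked = [sha1_hex(p) for p in ["monkey123", "Letmein!", "p@ssw0rd", "xK9#vQ2m"]]
wordlist = ["password", "letmein", "monkey"]

found = crack(leaked, wordlist)
print(len(found), "of", len(leaked), "recovered")  # 3 of 4 recovered
```

Three words and six rules recover every human-style password here and miss only the random one. Real attacks use the same approach with far larger wordlists and rule sets derived from previously cracked passwords.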
The RockYou breach coincided with another turning point in password cracking history: the advent of general-purpose GPU computing.
By harnessing the parallel processing capabilities of graphics cards we could now crack password hashes tens of times faster than with a regular CPU. Meanwhile, software like Hashcat helped bring GPU password cracking into the mainstream, displacing now-obsolete techniques like rainbow tables.
Instead of pushing pixels, we were pushing RockYou-powered passwords, and we were cracking password hashes with unprecedented speed and success.
This fueled a wave of new password research, and when other large password breaches came our way—eHarmony, Stratfor, Gawker, and LinkedIn, for instance—we were ready and waiting.
But most post-RockYou breaches have paled in comparison to the latest LinkedIn leak. Breaches from Zappos, Evernote, and LivingSocial (with 24 million, 50 million and 50 million respectively) would have made for fantastic password statistics, except those hashes never saw the light of day.
I’m sure the Adobe breach (at 130 million) was an amazing win for whoever stole the encryption key, but the rest of us are stuck playing a crossword puzzle.
It’s certainly possible that there are some other large password databases slowly making their way across the darkweb from companies that don’t even know that they’ve been breached, but as far as confirmed data breaches go, RockYou was the previous password cracking standard for relevant and useful breaches.
In light of the site’s breach, those endless LinkedIn “your connection did X!” e-mails seem harmless.
Bloomberg for Getty Images
As in 2012, I was lucky to get my hands on this new LinkedIn data about a week after its announcement. Using a single Sagitta HPC Brutalis packed with eight Nvidia GTX Titan X graphics cards, I managed to recover 85 percent of the passwords on the first day, despite the fact that I was cracking so many passwords so quickly that the whole system slowed to a crawl. Working with the rest of the Hashcat development team, we managed to reach 88 percent by the end of the third day, and we crossed the 90 percent threshold on the fourth day.
This all happened a full two days faster than when working with the first LinkedIn dump, which contained only a small fraction of the number of hashes. On the sixth day, we teamed up with rival password cracking team CynoSure Prime to close out the effort at a solid 98 percent, cracking a total of 173.7 million passwords.
While the RockYou breach revolutionized password cracking with “only” 32 million passwords, this second wave of LinkedIn data is nearly six times larger.
And given how many times this data has changed hands over the past two weeks, it’s surely just a matter of time before the full data is made publicly available. When it is, any password cracker worth their salt (ha!) should be able to crack 80-90 percent of the passwords on their own.
This means hackers will soon have a drop-in replacement for RockYou that is over five times more effective: a new de facto wordlist, new patterns to analyze to generate new rules, and new statistics for probabilistic password cracking. When you take both RockYou and LinkedIn and combine them with eHarmony, Stratfor, Gawker, Gamigo, Ashley Madison, and dozens of other smaller public password breaches, hackers will simply be more prepared than ever for the next big breach.
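The figures quoted in this section can be checked with some quick arithmetic. The short Python sketch below uses only numbers stated in the article (in millions); the variable names are mine.

```python
# All figures are in millions and come straight from the article.
total_hashes_m = 177.5   # hashes in the second LinkedIn dump
cracked_m = 173.7        # hashes recovered by the end of the effort
rockyou_m = 32.0         # plaintext passwords in the RockYou leak
first_dump_m = 6.4       # hashes in the original 2012 dump

crack_rate = cracked_m / total_hashes_m    # share of the dump recovered
size_ratio = total_hashes_m / rockyou_m    # LinkedIn relative to RockYou
combined_m = first_dump_m + total_hashes_m # both LinkedIn dumps together

print(f"{crack_rate:.1%}")   # 97.9%
print(f"{size_ratio:.1f}x")  # 5.5x
print(f"{combined_m:.1f}M")  # 183.9M
```

So the 98 percent figure is 173.7 of 177.5 million rounded up, the new dump is roughly five and a half times the size of RockYou, and the two LinkedIn dumps together come to about 184 million hashes.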
A global failure made worse
Let’s quickly remember why we hash passwords in the first place: password hashing is an insurance policy.
It ensures that should the password database be compromised in any way or through any vector, including physical theft, the passwords will not be recovered until engineers have an opportunity to identify and contain the breach, notify the public, and give users an opportunity to change their passwords anywhere else they may have used them.
The stronger and slower the password hashing is, the more time a site buys for itself and its users in the event of a breach.
Therein lies the problem. We’ve known about the necessity of slow hashing since the 1970s, yet due to a global failure in threat modeling, adoption has been extremely low.
It is only in light of a string of high-profile breaches in the last five years that slow hashing has begun to make its way into the mainstream.
Thanks to services like LinkedIn, who negligently failed to employ slow hashing (the combined 184 million passwords dumped in 2012 and this year all used unsalted SHA1), hackers have had more than a few fantastic opportunities to collect and analyze massive amounts of password data.
What this means is that even if the next big breach does employ slow hashing, it likely will not be anywhere near as effective as it would have been even five years ago. Post-LinkedIn, it will take hackers far fewer attempts to guess the correct password than it otherwise would have.
That’s not to say that online services shouldn’t employ slow hashing today.
If they aren’t using something like bcrypt or Argon2 for password storage, then they’re doing things very, very wrong.
But slow hashing is no longer as effective a solution as it could have been had it only been adopted sooner. Hackers again have the upper hand.
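The difference between raw, unsalted SHA-1 and a salted, deliberately slow hash can be sketched with nothing but the Python standard library. Note the substitution: bcrypt and Argon2 require third-party packages, so this sketch uses PBKDF2-HMAC-SHA256 as a stand-in for "something slow and salted". The iteration count and the sample password are assumptions for illustration, not a recommendation.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, chosen only for this sketch


def unsalted_sha1(password: str) -> str:
    """Raw SHA-1: one fast operation, identical output for identical input."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def slow_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Salted PBKDF2 (a stdlib stand-in for bcrypt/Argon2)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, digest = slow_hash(password, salt)
    return hmac.compare_digest(digest, expected)


# Two hypothetical users who picked the same password.
same_sha1 = unsalted_sha1("linkedin2012") == unsalted_sha1("linkedin2012")
salt_a, hash_a = slow_hash("linkedin2012")
salt_b, hash_b = slow_hash("linkedin2012")

print(same_sha1)                                  # True: one crack covers both
print(hash_a == hash_b)                           # False: salts differ
print(verify("linkedin2012", salt_a, hash_a))     # True
```

With unsalted SHA-1, every user who chose the same password shares one digest, so a single correct guess cracks all of them at once, and each guess costs a single hash operation. With a salt and a work factor, every stored password must be attacked separately, and every guess costs hundreds of thousands of operations.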
Examining the breach, LinkedIn didn’t have very much of an insurance policy.
It was employing raw SHA1 for password hashing, but perhaps even worse is the fact that the company never even attempted to cash in on it.
Back in 2012, they failed to identify and acknowledge the breach in a timely fashion, and when they eventually did, they apparently only forced a password reset for the accounts belonging to the initial 6.4 million hashes.
The evidence suggests that the remaining 165 million accounts were allowed to use those same compromised passwords.
That’s not the way this should work. When you suspect a password database has been compromised, even just in part, you cash in on that insurance policy immediately by activating your incident response team and your public relations team.
Companies ideally should notify the general public and users in an expedited manner, forcing a password reset for all users as soon as the breach is contained and the threat has been eradicated.
By the time LinkedIn made a statement about the breach, in contrast, I already had 70 percent of the passwords cracked.
Every moment LinkedIn hesitated was potentially devastating for its users.
And for the love of god, do not try to downplay the incident by saying something stupid like “Most of the passwords on the list appear to remain hashed and hard to decode.” Instead, companies should just acknowledge the plain and simple fact that if password hashes have been accessed, users are at real and measurable risk of account takeovers.
This data has been making its way around the darkweb for four years now.
If we professional password crackers could get this dump to 98 percent in six days, then surely those who have had years to work on it have achieved similar success. Who knows what such crackers have used the data for.
If you had a LinkedIn account in 2012 and have since been the victim of a hacking attempt or identity theft, this very well could be the reason why.
So what actions do you, the user, need to take now?
For starters, go change your passwords for LinkedIn and any other services where you may have used the same or similar password.
For as many bad passwords as there were in the LinkedIn dump, there were certainly a lot of really fantastic ones, too.
Given the fact that it may take service providers years to identify and acknowledge that your account has been compromised—as criminals could be doing literally anything with your credentials in the meantime—it is important to recognize that having a unique password per account is far more important than length, complexity, randomness, or anything else you’ve been told that you need.
By using a unique password for each of your accounts, you are limiting the scope of a breach to just that one account.
The average person has at least 26 online accounts; IT professionals usually have hundreds.
It is absolutely crucial that you employ a good password manager, and let your password manager generate a new random password for each of your accounts.
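What "generate a new random password for each of your accounts" amounts to can be sketched with Python's standard `secrets` module. This is a rough illustration of the idea, not how any particular password manager is implemented; the 20-character length, the character set, and the account names are assumptions.

```python
import secrets
import string

# Assumed character set: letters, digits, and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Draw each character independently from a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Hypothetical accounts: each one gets its own independent password.
accounts = ["linkedin.example", "bank.example", "mail.example"]
vault = {name: generate_password() for name in accounts}

for name in accounts:
    print(name, len(vault[name]))
```

Because each password is drawn independently, a breach at one service reveals nothing about the credentials used anywhere else.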
And when you do catch wind of a site or service being compromised, always change your password immediately—even if you do not receive an e-mail from the service instructing you to do so.
Finally, ensure you have multi-factor authentication or two-step verification enabled for your most critical accounts. While I personally have yet to be impressed by any vendor’s MFA/2SV deployment, it does generally add an extra hurdle for hackers to jump through.
It can certainly be effective.
By following this advice, you personally can stay one step ahead of hackers… even if your service providers can’t.