To a smart attacker, Twitter and other social networks are veritable cornucopias of personal information being broadcast for the world to see.
Scammers are already employing them for so-called “open source information gathering,” but the researchers at this year’s Black Hat conference felt that they could do better.
They built a machine-learning model that generates highly clickable spear phishing links for Twitter.
The talk was led by ZeroFOX Senior Data Scientist Philip Tully and Data Scientist John Seymour.
Their goal was to find a better and more efficient way to phish victims over social media.
For those unaware, phishing is the art and science of sending phony messages to victims in order to get them to willingly surrender information or money.
There are, as you read this, thousands of bogus websites designed to look exactly like PayPal or other services.
Links to these sites are sent to victims, who then enter their usernames and passwords, not realizing that the information is going to the scammer.
Phishing goes hand-in-hand with social engineering.
A classic example is an attacker who simply calls a target’s office on the phone, pretending to be a confused employee and asking for a password or other critical information. Remember, it pays to double check that the CEO actually sent an email asking for money before you make that wire transfer.
Tully and Seymour’s idea was to take the grunt work out of targeted spear phishing without lowering the success rate.
Tully said they took inspiration from the classic Eliza bot, which simply repeats statements back as questions.
A more recent touchstone was Microsoft’s TayandYou Twitter bot. Like Tay, the research team wanted to use a machine learning system in order to craft better messages than Eliza.
Tay, of course, had its problems. “It turned semi-disastrous because Twitter is a sewer of content,” said Tully, referring to the end result of the Tay experiment as a “race-baiting terrible nazi bot.”
Most of us are familiar with machine learning and neural networks through those trippy Deep Dream images Google released.
Simply put, the idea is to create a system that can be “taught” to perform a task, and then perform that task autonomously.
The kicker is that machine learning systems are meant to get better, and be capable of dealing with novel circumstances not foreseen by their creators.
It’s a technology that’s gaining a lot of interest; so much so that Google I/O 2016 saw the search giant positioning itself as a leader in machine learning.
Finding a Target
The research team used a multi-step process, beginning with a corpus of Twitter users to target.
The team then applied a machine learning algorithm called clustering, which divided the list into similar groups.
To do this, the team’s system automatically scraped data from Twitter accounts, such as follower and following counts, how long the user had been on Twitter, and whether the user had changed their account from the default settings. No eggs, in other words.
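The grouping step can be sketched in a few lines. The article does not say which clustering algorithm the team used, so this example assumes plain k-means with a deterministic farthest-point start. The account names, feature values, and the choice of three clusters are all invented for illustration.

```python
import math

# Hypothetical accounts (all values invented for illustration):
# (followers, following, account_age_days, changed_default_profile)
ACCOUNTS = {
    "alice": (120, 150, 900, 1),
    "bob": (95, 180, 700, 1),
    "carol": (15000, 300, 2500, 1),
    "dave": (22000, 410, 3000, 1),
    "egg1": (2, 40, 10, 0),
    "egg2": (0, 55, 5, 0),
}


def scale(rows):
    """Min-max scale each feature column to [0, 1] so no feature dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]


def initial_centroids(points, k):
    """Start at the first point, then keep adding the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=20):
    """Plain k-means (an assumed algorithm): returns a cluster index per point."""
    centroids = initial_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


names = list(ACCOUNTS)
labels = kmeans(scale(list(ACCOUNTS.values())), k=3)
clusters = dict(zip(names, labels))
```

Scaling matters here: without it, the raw follower count would swamp the other features and the default-profile flag would barely register.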
Among the clusters, the team sought individual high-value targets.
Tully said the team looked at how similar targets are to their own cluster, versus how different targets are from other clusters.
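Comparing an item's fit with its own cluster against its distance from the others resembles the standard silhouette coefficient. The article does not name the metric the team used, so treat this as one plausible reading. The points and labels below are toy values.

```python
import math


def silhouette(index, points, labels):
    """Silhouette value in [-1, 1] for the point at `index`.

    Assumes at least two clusters are present in `labels`.
    """
    own = labels[index]
    distances = {}
    for i, (p, lab) in enumerate(zip(points, labels)):
        if i != index:
            distances.setdefault(lab, []).append(math.dist(points[index], p))
    if own not in distances:
        return 0.0  # a cluster of one scores 0 by convention
    # a: mean distance to the other members of the point's own cluster.
    a = sum(distances[own]) / len(distances[own])
    # b: mean distance to the members of the nearest other cluster.
    b = min(sum(d) / len(d) for lab, d in distances.items() if lab != own)
    if max(a, b) == 0:
        return 0.0
    return (b - a) / max(a, b)


# Toy 2-D data: the last point sits between the two groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (2.5, 2.5)]
labels = [0, 0, 0, 1, 1, 0]

core_score = silhouette(0, points, labels)
edge_score = silhouette(5, points, labels)
```

A value near 1 means the point sits firmly inside its cluster, while a value near 0 means it lies on the boundary between two clusters. In this toy data the first point scores higher than the in-between point.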
Twitter, they point out, has a fairly user-friendly API, making it easy to gather lots of information in a way that a computer can use.
Twitter is also a relaxed space, at least linguistically.
“Twitter is a good venue to use this,” said Tully. “They have a low bar; you don’t have to create the most convincing or compelling Tweet because of broken syntax.”
Twitter is also a valuable platform for attack because the character limit encourages the use of URL shorteners. You probably wouldn’t click on the link www.paypals.com.ru/hackingtools, but you might click on bit.ly/totallysafelink.
Getting to Know You
Once targets had been selected, the team used a different set of machine learning algorithms to scour the target’s Twitter history and craft the perfect phishing Tweet.
This makes the bogus Twitter accounts seem not only more natural, but far more relevant to the victim, increasing the likelihood of a successful phish.
For the construction of bogus Tweets, the team tried two different methodologies. One was built on Markov chains, which look at a corpus of words to calculate the most likely word to follow another word.
It’s like predicting tomorrow’s weather knowing what happened today, based on years of historical weather data.
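The Markov chain idea fits in a short sketch. This is a generic first-order word model, not the team's code, and the tiny corpus and function names are invented for illustration. Each word maps to the list of words seen directly after it, and generation repeatedly picks one of those followers at random.

```python
import random
from collections import defaultdict

# Invented, harmless corpus for illustration.
CORPUS = [
    "my cat loves the sunny window",
    "the cat sleeps all day",
    "the sunny day makes my cat happy",
]


def build_chain(texts):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=12, seed=None):
    """Walk the chain from `start`, choosing each next word at random."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length cap or when the last word has no known follower.
    while len(words) < max_words and words[-1] in chain:
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


chain = build_chain(CORPUS)
sentence = generate(chain, "the", seed=1)
```

Because followers are stored with repetition, a word that follows another more often is proportionally more likely to be picked. "Training" is a single pass over the text, and nothing in the model is specific to English.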
Tully said the team also looked at an LSTM (long short-term memory) algorithm, which is far more complex.
This system required 2 million Tweets to train it and nearly a week of computational work.
Despite LSTM being the superior system, they found that Markov-generated Tweets worked just as well.
The Markov model also had the benefit of taking milliseconds to train, and it can be used on any language.
Interestingly, the team found out that keeping their Tweets topical to the targets was far easier than expected. “At first we were doing a lot of topic modeling,” said Seymour. “But then we found a bag of words worked really friggin’ well.”
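A bag of words is just a word-frequency count that ignores word order. The sketch below shows the general technique, not the team's implementation. The stopword list, tokenizer pattern, and sample posts are assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "my", "i", "for", "on", "at"}


def bag_of_words(posts):
    """Count how often each non-stopword appears across all posts."""
    counts = Counter()
    for post in posts:
        for word in re.findall(r"[a-z0-9#']+", post.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


# Invented sample posts.
posts = [
    "My cat knocked the plant off the shelf again",
    "Is there a cat cafe in town?",
    "New plant for the office, hoping the cat ignores it",
]

bag = bag_of_words(posts)
top_terms = [word for word, _ in bag.most_common(2)]
```

The most frequent remaining words act as a rough topic summary, which is much cheaper than fitting a topic model.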
The final step was to automatically send out the generated Tweet at a random time during which the victim was known to be active.
Did you know that the Twitter API can return activity history? It sure can.
In the team’s pilot program, they started looking at hashtags like #cat, #infosec, and #pokemongo.
To their surprise, the benign links included with their generated Tweets were being clicked on 17 percent of the time after only two hours.
Two days later, the click-through rate stood at 30 to 60 percent, depending on how strictly you define a “real” user on bot-infested Twitter.
This easily beat existing data on generated spam messages, which the team said had a success rate of about 5 to 14 percent.
It was a little short of the 45 percent success rate of highly targeted spear phishing messages, but the team pointed out that their Tweets were created with a fraction of the time and labor.
Many tech companies, like Google, Facebook, and Microsoft, have declared that the next big thing is bots.
These would-be digital representatives would speak or type with you naturally, and could help you find your way home or purchase movie tickets for you. What these researchers propose is that in addition to helpful bots, bad bots may be watching you as well.