Academic and industry researchers are applying big data analysis techniques to find security flaws in terabytes of software code.
A non-profit research lab, working with Stanford University, is developing a machine learning system that will analyze terabytes of software code to find security flaws and fix them.
Draper Laboratory, a non-profit research institute formerly part of the Massachusetts Institute of Technology, is building the system in collaboration with a group at Stanford University led by machine learning pioneer Andrew Ng.
Dubbed DeepCode, the system has already been used to detect security vulnerabilities such as the Heartbleed Bug in OpenSSL, Brad Gaynor, associate director for Cyber Systems at Draper, told eWEEK in an email interview.
The institute is currently increasing the amount of data on which DeepCode bases its decisions by a factor of 1,000, he said.


“DeepCode is a fundamentally new approach to cyber security,” Gaynor said. “The system collects and ingests massive amounts of software, makes this software searchable, indexes the known bugs and security vulnerabilities, and identifies—in new or existing code—matches to any previously identified flaws.”

Researchers have worked for decades to build systems to warn of potential vulnerabilities in software. Commercial systems typically focus on static analysis, where source code is analyzed for known bad patterns, or dynamic analysis, where software execution is observed for signs of defects.
However, such approaches tend to only find known classes of software vulnerabilities and produce a high proportion of false positives.
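To make the limitation concrete, here is a minimal sketch of pattern-based static analysis in Python. It is an illustration only, not DeepCode or any commercial product: the rule set, the `scan_source` function and the sample C snippet are all hypothetical, chosen because `gets`, `strcpy` and `sprintf` are C library calls commonly flagged as unsafe.

```python
import re

# Hypothetical rule set (an assumption for illustration): C library calls
# that are commonly flagged because they do not check buffer sizes.
KNOWN_BAD_PATTERNS = {
    "gets": "reads input with no bounds check",
    "strcpy": "copies without checking destination size",
    "sprintf": "formats without checking destination size",
}

# Match any listed name as a whole word followed by an opening parenthesis.
CALL_RE = re.compile(r"\b(" + "|".join(KNOWN_BAD_PATTERNS) + r")\s*\(")


def scan_source(source):
    """Return (line_number, function_name, reason) for each flagged call."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            findings.append((line_number, name, KNOWN_BAD_PATTERNS[name]))
    return findings


# Hypothetical C snippet used only to exercise the scanner.
SAMPLE_C = """\
void greet(char *name) {
    char buf[16];
    strcpy(buf, name);
    /* strcpy(buf, name) is mentioned here only in a comment */
    printf("hello %s", buf);
}
"""

if __name__ == "__main__":
    for finding in scan_source(SAMPLE_C):
        print(finding)
```

The scanner reports the real `strcpy` call on line 3 of the snippet, but it also reports the mention inside the comment on line 4, which is a false positive, and it would miss any flaw that is not already in its rule list.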
By using machine learning and pattern analysis techniques, two fundamental areas of artificial intelligence research, researchers hope that DeepCode will learn what good code and bad code look like, according to Draper. Once the system is trained to recognize vulnerabilities, the researchers will use it to identify flawed code and recommend repairs.

“Ultimately, the goal of DeepCode is to find all instances of all known software bugs,” Gaynor told eWEEK. “We quantitatively measure the accuracy of our analytics, and will share statistically-meaningful accuracy data as we roll out the initial platform features over the coming months.”
The team working on DeepCode says it previously used the same technology to identify subtle attacks in progress by analyzing large volumes of network traffic. In an academic paper published in November, industry and academic researchers reported using a similar machine-learning system to identify otherwise undetected command-and-control traffic within an enterprise environment.
Draper is working with Stanford University and well-known machine-learning pioneer Andrew Ng, an associate professor at the university who also co-founded online-learning platform Coursera and created its popular machine learning course.
The professor worked with Google to create the “Google Brain” project, which used machine learning and thousands of clustered computers to attempt to mimic some aspects of the human mind. Ng is currently chief scientist at Chinese search firm Baidu.
The DeepCode project has been funded by both the U.S. Air Force Research Laboratory and the Defense Advanced Research Projects Agency (DARPA) as part of the agency’s Mining and Understanding Software Enclaves (MUSE) program.
Draper Laboratory has other contracts with the U.S. government, including acting as the attackers, or Red Team, in various simulated cyber-attack exercises to assess federal agencies’ system defenses.
