reader comments 69
Share this story

A demonstration of SQL injection in action.
Thirty-one-year-old Laurie Love is currently staring down the possibility of 99 years in prison.

After being extradited to the US recently, he stands accused of attacking systems belonging to the US government.

The attack was allegedly part of the #OpLastResort hack in 2013, which targeted the US Army, the US Federal Reserve, the FBI, NASA, and the Missile Defense Agency in retaliation over the tragic suicide of Aaron Swartz as the hacktivist infamously awaited trial.
Love is accused of participating in the #OpLastResort initiative through SQL injection attacks, an increasingly common tactic.
SQL injections have recently been detected against state electoral boards, and these attacks are regularly implicated in thefts of financial info.

Today, they’ve become a significant and recurring problem.
SQL injection attacks exist at the opposite end of the complexity spectrum from buffer overflows, the subject of our last in-depth security analysis. Rather than manipulating the low-level details of how processors call functions, SQL injection attacks are generally used against high-level languages like PHP and Java, along with the database libraries that applications in these languages use. Where buffer overflows require all sorts of knowledge about processors and assemblers, SQL injection requires nothing more than fiddling with a URL.
As with buffer overflows, SQL injection flaws have a long history and continue to be widely used in real-world attacks.

But unlike buffer overflows, there’s really no excuse for the continued prevalence of SQL injection attacks: the tools to robustly protect against them are widely known.

The problem is, many developers just don’t bother to use them.
One of Microsoft’s less valuable innovations
The earliest description of these attacks probably came in 1998, when security researcher Jeff Forristal, writing under the name “rain.forest.puppy,” wrote about various features of Microsoft’s IIS 3 and 4 Web servers in the hacker publication Phrack.
IIS came with several extensions that provided ways to generate webpages based on data from databases.

Then and now, most databases use variants of a language called SQL (Structured Query Language) to manipulate their data.

Databases using SQL organize data into tables built up of rows and columns.

Each table represents a particular kind of thing.

Each column of the table represents a particular fact about that thing, and each row of the table is an instance of that thing.
So, for example, a table named “people” might have columns for “age” and “name,” with each row in the table representing a specific person.
SQL is used for both defining these tables and columns and for manipulating the rows within them.
IIS had several different ways of writing SQL commands—”queries”—to find and retrieve information in an SQL database.

The best-known and longest-lived of these is ASP (Active Server Pages), a system for writing webpages with embedded programming (usually using JavaScript or VBScript) that typically include substantial amounts of database access.

At the time, IIS also included something called IDC (Internet Database Connector) that was a less flexible way of sending an SQL query to a database and tabulating the results.
Sometimes, the SQL query that these IDC and ASP files used for grabbing information from the database was hardcoded; that is, the same query was used every single time the webpage was loaded.

But because the query was often written to take one or more parameters, the data shown on that page could change as those parameters changed.
For example, an online store might have a page named order.asp to display the contents and status of an order.

Each order is identified by an order ID, with the order ID passed as a parameter to the page as part of the URL: order.asp?orderID=1234, say, to view the order with ID 1234.

The ASP page would take the parameter from the URL and combine it into an SQL query, which was then sent to the database for lookup.
What Forristal noticed was that the way these parameters were combined to build the query meant that an attacker could force the database to execute other queries of the attacker’s choosing.

This act of subverting the application to run queries chosen by an attacker is called SQL injection.
A popular tool to this day
Since their initial discovery, SQL injection flaws have routinely been discovered in the wild and used to compromise vast quantities of data. While Forristal looked at Microsoft’s software first, SQL injection was an industry-wide problem; sites using Java, PHP, ColdFusion, Ruby, and Python have all had SQL injection flaws.
Virtually every technology that can be used to build dynamic, database-driven websites is susceptible to SQL injection.
And attacks using SQL injection are abundant.

Earlier this year, a Florida man was charged with felony hacking after using SQL injection to read sensitive data from an election site.
SQL injection vulnerabilities in the Joomla CMS and popular WordPress plugins have put hundreds of thousands of blogs at risk of attack. Security firm HBGary was devastatingly attacked in 2011 after members of the Anonymous collective discovered SQL injection flaws in a custom-developed content management system.

The group responsible for that attack would later go on to call itself Lulzsec.
What makes these attacks particularly valuable to attackers—and devastating to victims—is their dual power to both read sensitive data from a database and to write new or updated data to the database. With this power to inject SQL, attackers could potentially read usernames, passwords, credit card numbers, Social Security numbers, or whatever else happened to be stored in the database.
But they could also add their own data.
In a well-designed system, passwords, for example, won’t be stored as plain text but instead will be irreversibly transformed into a hashed value. Merely reading the database therefore won’t be sufficient to let an attacker log in and snoop around; while the database will contain the necessary username, it won’t reveal the password.

The ability to write to the database, however, means that an attacker can simply create additional user accounts—ones for which they do know the passwords—and log in with those.
Similarly, if the database is used to store any kind of auditing information, such as tracking logins and other user activity, the ability to write to it means that hackers can clean up after themselves, deleting any record of their attack.
To understand how these attacks happen, we need to understand how Web applications use databases.

Basic SQL queries have a fairly simple structure.

For example, a call to search for a particular order might look like

SELECT * FROM orders WHERE orderid=1234

The * means “retrieve all the columns,” orders is the name of the table of data, and WHERE orderid=1234 restricts the data to only include the rows where the value of the orderid column is 1234.

The order ID number being searched for would typically be taken from parameter embedded in the URL.
So the 1234 from order.asp?orderID=1234 is combined with SELECT * FROM orders WHERE orderid= to produce the SQL query SELECT * FROM orders WHERE orderid=1234, and that query will be sent to the database.
The fragment of code that does this might look something like:

// first, pull the order ID from the URL query
var orderid = Request.QueryString(“orderid”);
// then use the order ID to construct a query
var query = “SELECT * FROM orders WHERE orderid = ” + orderid;
// and send the whole string to the database to execute

But two things together make this problematic.

First, SQL databases allow multiple queries to be strung together one after the other.

To perform two searches, one could write:

SELECT * FROM orders WHERE orderid=1234 SELECT * FROM orders WHERE orderid=5678

Different options and databases might change this a little—for example, by requiring a semicolon between the two queries—but the basic principle is the same.
Second, URLs are not constrained to being numbers; they’re just text strings.
So while the order ID should be a single number like 1234, it doesn’t have to be.

An attacker could request, say, order.asp?orderID=1234 SELECT * FROM orders WHERE orderid=5678, putting an SQL fragment into the URL itself.
If the application does not take care to protect itself against SQL injection attacks, the query it constructs will include the attacker’s code, and the database will run both queries together.
The way the query string is built will often change the exact code the attacker has to write.
Integer parameters, as with the order ID, are typically the easiest to handle because they’re typically not modified or altered in any way.
String parameters can introduce some additional complexity, because strings have to be wrapped in quote marks within the SQL query.

The code to do this might look something like:

// first, pull the name from the URL query
var customername = Request.QueryString(“name”);
// then use the name to construct a query
var query = “SELECT * FROM customers WHERE customername = ‘” + customername + “‘”;
// and send the whole string to the database to execute

In this situation, simply appending an SQL command to the URL won’t work because of those single quote marks. Making a request to, say, customer.asp?name=SELECT * FROM orders would produce the following query:

SELECT * FROM customers WHERE customername = ‘SELECT * FROM orders’

This won’t reveal any data; it will just look for a customer with the rather implausible name of “SELECT * FROM orders”.

But the attacker can solve this, prepending the attack query with a single quote mark, producing:

SELECT * FROM customers WHERE customername = ” SELECT * FROM orders’

That stray quote mark at the end of the query might cause the database to reject the query, so the usual solution is to comment it out, so that it is ignored, using -⁠-.

Accordingly, the URL would look like customer.asp?name=’ SELECT * FROM orders -⁠⁠-, and the query would be:

SELECT * FROM customers WHERE customername = ” SELECT * FROM orders –‘

Query in, data out
Attackers can do a lot with the ability to simply run queries.

Destructive actions in particular, such as deleting data, don’t really need much in the way of output.
Some databases let you run command-line programs from within queries, which again can be all hackers need to do their dirty work.

But generally, an attacker needs to read information as well as write it, both to retrieve valuable data stored in the database and to learn about the names of the database’s tables and columns.
Running a query of the attacker’s choosing is one thing; seeing the results is another. When a query is run, the usual response from the database is a table of data.

Applications vary in the way they react to this.
Some applications will simply present the user with every bit of data returned in the table, making it very easy for an attacker to extract information from the system.
Others will expect the result table to have particular columns and ignore any data not included in those columns.

This isn’t an insurmountable problem for attackers, as they may be able to ensure that the data they want appears in one of those columns by altering the form of the query they write.
Similarly, an application may iterate through every row of data returned by the query and show them all to the attacker.

Again, this represents the easiest situation for data exfiltration.

Alternatively, if the application expects only a single row to be returned, it may only show the first row that the injected query returns.

The application might even issue an error message if it’s expecting only a single row to be returned by the database.

These situations make extracting data more complex, though again, it can be addressed by structuring the injected query just so.
The other main response to a query is an error message, indicating that something went wrong.

Typically when injecting SQL, this will be an error from the database to complain that the query was malformed in some way.

Again, applications vary in how helpful they make this error message.

The best case for the attacker is when the developer is lazy and simply spits back the full error message to all and sundry.

This is often done during development, as it makes the develop/test/debug cycle quicker, but it shouldn’t ever happen in real production systems.
To take advantage of this, the attacker can ensure that every injected query results in an error that reveals some information.

For example, trying to convert a string into an integer will usually result in an error, and that error message will include the string that couldn’t be converted.
So, to extract the database name, an attacker could run a query looking like this (in SQL Server):

SELECT * FROM orders WHERE orderid=CONVERT(int, db_name())

db_name() here is an SQL Server function that returns the name of the current database.

CONVERT tries to convert that name into an integer.

This will fail, and it will produce an error message looking like:

Conversion failed when converting the nvarchar value ‘databasenamehere’ to data type int.

Thereby revealing the database’s name to an attacker.

Going in blind
Better applications will show some generic error page, robbing the attacker of this useful information.

Even when the application is strict in what it displays—suppressing error messages, showing only columns that it’s supposed to, and not showing multiple rows when only one is expected—attackers who can inject SQL might be able to piece together the information they need.
One common technique for doing this is called “blind” SQL injection. In blind injection attacks, the attacker cannot write a query that simply displays the desired information.
Instead, the attacker constructs queries that extract one bit of data (literally, in the common case) at a time.

Consider again the query to show an order:

SELECT * FROM orders WHERE orderid=1234

Presuming that order exists, this request will return exactly one row. We can add additional terms to the query and it will still return that one row:

SELECT * FROM orders WHERE orderid=1234 AND 1=1

Conversely, we could make it return exactly zero rows to get some kind of an “order not found” page:

SELECT * FROM orders WHERE orderid=1234 AND 1=0

Armed with this knowledge—we have a website that is returning one particular page for “true” and one particular page for “false”—we can start probing the database for information.

For example, perhaps we know that the victim is running Microsoft SQL Server, and we have an exploit of some kind that works against, say, SQL Server 2012. We therefore want to know if the system is running SQL Server 2012.

To do that in SQL Server, we’d run a query like:

SELECT SERVERPROPERTY(‘ProductMajorVersion’) = 11

This will return true if running on SQL Server 2012 but false otherwise.

This can be injected to produce a query such as:

SELECT * FROM orders WHERE orderid=1234 AND SERVERPROPERTY(‘ProductMajorVersion’) = 11

If running on SQL Server 2012, this will produce the normal order page. Otherwise, it will produce the error “no order found” page, thereby giving the attacker a single piece of information. More advanced techniques built on the same basic principle can allow several bits of information to be extracted at once.
It will not be a great surprise to learn that extracting information bit by bit is slow and tedious.

Accordingly, a variety of tools exist to automate the process.

They can systematically send a barrage of requests to a site vulnerable to SQL injection in order to extract data from it.

Because this process is slow, blind injection won’t be used to perform mass extraction.
Instead, it might be used to retrieve key pieces of information, such as administrative login credentials, and those credentials will be used to take further system action.
String sanitization
SQL injection is widely known and, because so much data in databases is sensitive, can be extraordinarily high-impact. Why do developers keep writing such bugs?
Much discussion of SQL injection describes it as being substantially a problem of input validation.

The order ID in our example above, for example, should be an integer, not an arbitrary piece of text, and so we could fix the code to make it safe by forcing the order ID to be a number, ignoring any other characters.

For numeric data this is easy enough to enforce, as converting strings into integers is easy and robust.
But for strings, which legitimately could contain any character under the sun, the process is a little harder. Or perhaps more accurately, it’s a process that developers continue to screw up.
The basic idea is one familiar to many developers: the special characters (such as quote marks) need to be transformed so that the query treats them as part of a string, rather than the end of a string.
In SQL Server, quote marks are escaped by doubling them.

A single quote mark, ‘, is used to denote the start or end of a string; two quote marks together, ”, are treated as a literal quote mark when included as part of a string. MySQL uses backslashes for a similar purpose; to embed a single quote into a string, it must be written as \’ (and to embed a backslash, use \\).
This should, then, be straightforward for developers to handle: make sure any quote marks in the input are escaped, and the database will treat them correctly.

This can be done, and it does work if it’s done consistently, with every single string properly cleaned up.

But if even one is forgotten, the security of the entire system can be undermined.
Often, the “solutions” to this problem are inadequate at best. Many devs will have come across Web forms that prohibit characters such as the single quote from being used in usernames or passwords.

They might even go further and prohibit SQL-like words, such as SELECT, from being used.

These things are telltale signs that a developer is vaguely aware of SQL injection (and perhaps certain other problems, such as cross-site scripting) but has either opted not to make a robust fix or lacks confidence in the fix they have applied.
Better yet: stop treating everything as a string
How security flaws work: SQL injectionThe better solution requires changing how we think of SQL injection.
The problem is the way that these applications add the parameters to their SQL queries.
In most programming languages, there’s a hard distinction between the various special keywords and syntax of the programming languages, on one hand, and the data values that the program manipulates, on the other.

For example, many languages have a keyword goto to jump from one place in the code to another.

They also support text strings, and a text string could contain the text “goto”.

These two things are separated; the program will never see the “goto” text string and try to jump to a different location.

Although the string happens to look like a language keyword, it is opaque; most languages don’t try to treat data as if it were part of the program, so there is no risk that a string that reads “goto” will accidentally be interpreted as a goto keyword.
(A few languages, such as shell scripting and tcl, blur this distinction in some ways, but few people use these on the Web.)
SQL, too, maintains this distinction and does not confuse data as if it were commands. Most programs that work with databases, however, are not written in SQL directly.
Instead, they build their SQL queries as strings of text and send those strings to the database to process.
By manipulating the queries as pieces of text, however, they create confusion between program commands and data.
String manipulation is done without concern for the syntax and grammar of SQL.
String manipulation doesn’t distinguish between the query and the query’s parameters, and so it doesn’t correctly handle those troublesome quote marks.

A mere string doesn’t know that some quote marks should be escaped, because they’re meant to be parameters, and that others should not, because they’re mean to be part of a query.
Developers use strings to build queries because building strings is a basic part of almost all programming languages, and so it’s a very simple, familiar way for developers to accomplish tasks. Most development languages don’t have any integrated language support for SQL (though C# does have something along those lines), so using strings and string manipulation is the simplest way to get started with writing database-driven applications.
Most beginners’ tutorials will start with string manipulation, so the habit is inculcated pretty quickly, too.

At some point budding developers will get bitten by an SQL injection bug, so they’ll try escaping strings, and that will be as good as it gets.
But mainstream database APIs offer a better way of working with SQL—better both because it isn’t prone to SQL injection attacks, and better because it can have slightly superior performance, too.

And unlike escaping of inputs, this alternative approach can’t really be screwed up.
It doesn’t rely on checking and validating every single parameter individually (leaving an application vulnerable if even one parameter should be neglected).
That better way is through the use of prepared statements. Prepared statements provide separation between the query and the parameters the query uses.

The exact details vary slightly from language to language and API and API, but the basic idea is the same: instead of splicing together a bunch of strings to create the query, a prepared statement uses a query with placeholders used for each parameter.

The values that these parameters should take aren’t populated by manipulating strings; they’re assigned using the prepared statement API.
For example, using C# and the ADO.NET API, the code would look something like this:

// create the prepared statement object from a database connection
SqlCommand command = connection.CreateCommand();
// define the query that the prepared statement uses, with @name being a placeholder for a parameter
command.CommandText = “SELECT * FROM customers WHERE customername = @name”;
// define the parameter as being a string type of any length
command.Parameters.Add(“@name”, SqlDbType.VarChar, -1);
// and set the parameter’s value to the query string from the URL
command.Parameters[“@name”].Value = Request.QueryString(“customername”);

The API knows that the @name parameter should be a string (varchar is the SQL name for variable-length character sequences), so when the @name parameter has its value specified (in the last line), the API will always do the right thing.
It doesn’t matter if attackers try to pass in quote marks, escape characters, or anything else as they try to trick the system into executing their SQL query; the system will always treat the parameter as string data.
Using prepared statements like this rules out any chance of forgetting to escape data or escaping it improperly, because it entirely removes the need to escape data in the first place.

The query can still fail—an attempt to put string data into an integer parameter, for example, will still cause an error—but it will do so in a safe way that does not risk executing attacker-specified SQL queries.
A problem that shouldn’t still be with us, but is
When looking at buffer overflows, one of the major themes was a never-ending series of escalations between defenders and attackers. New techniques, such as non-executable memory and address space layout randomization, have been introduced in operating systems to try to make buffer overflows easier to detect by developers and harder to exploit by attackers, but the core problem is caused primarily by the behavior of the C and C++ languages.
It will not go away until those languages are either abandoned or radically altered.

There’s no perfect solution that’s compatible with C and C++, so we have instead this range of mitigations and defenses.
The same is not true of SQL injection. Prepared statements are a robust and virtually universal solution to the problem of SQL injection.

Every API worth using supports them, and yet SQL injection flaws remain in abundance.

Commercial software, open source software, custom-developed software—they’re all afflicted.
But with developers apparently unwilling to do things the right way, it’s time for the creators of these frameworks to do the right thing and force all database access to go through prepared statements.

This won’t make it impossible for wayward developers to use string manipulation and continue to create highly exploitable flaws, but it will make it much less attractive as the default approach to writing data-driven applications.
Likewise, tutorials and training need to start doing things the right way from the outset.
Instead of creating bad habits and then trying to fix them, developers should be taught to do them properly from the outset.
All this makes SQL injection a particularly frustrating flaw.

This is something that should be an historic artifact, a relic from a time before people understood the importance of writing secure applications, before widespread awareness of hackers and the techniques they use.

That it isn’t shows that there’s something not quite right about industry software development practices: we know how to do better, but we’re not. Our credit cards, passwords, and other sensitive data are going to continue to be placed at needless risk as a result.

Leave a Reply