- "Buffer overflows
- Injection attacks
- DoS attacks
- Memory leakage
- Information disclosure
- Compromised systems"
What is the common factor among all of those vulnerability classes? If you have heard advice on how to prevent or fix them, chances are that advice prescribed input validation. It's a glib and common answer, especially for web application vulnerabilities: SQL injection, cross-site scripting, command injection, and others. The phrase has over 100 million Google hits. But this advice may be hurting security more than it is helping.
When developers hear "input validation", they unsurprisingly think they must add checks to the code that reads the user's input, from places like POST or GET parameters, to ensure it is valid. For example, abc is not a valid email address while abc@google.com is. An empty string is not a valid address, while 123 John & Betty's St. is. Unfortunately, this doesn't stop anything. If that (valid) address is placed into the SQL query "SELECT * FROM people WHERE address = '$address'", the apostrophe ends the string literal early and breaks the query. Whoever runs the application could be hacked via this SQL injection vulnerability and lose all their data.
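To make the failure concrete, here is a minimal sketch using Python's built-in sqlite3 module. The people table and its columns are assumptions made up for the illustration; the query mirrors the one above. The address would pass any reasonable validity check, yet the interpolated query no longer parses.

```python
import sqlite3

# Hypothetical schema, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, address TEXT)")

# A perfectly valid street address.
address = "123 John & Betty's St."

# Naive string interpolation, as in the query from the text.
query = "SELECT * FROM people WHERE address = '%s'" % address

try:
    conn.execute(query)
    broke = False
except sqlite3.OperationalError as exc:
    # The apostrophe closed the string literal early, so the rest
    # of the address is parsed as SQL syntax.
    broke = True
    print("query failed:", exc)
```

An attacker who controls the text after the apostrophe controls the SQL that follows it, which is the whole problem.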
The only way to ensure SQL injection vulnerabilities are prevented is to ensure each of the database calls cannot incorporate dynamic SQL command/query syntax. Ensure each query string is composed of nothing but static strings, with any dynamic elements safely escaped and filled in by the database driver using prepared statements at the point the data is sent to the database. Internally (at least with drivers that escape values on the client side), the address 123 John & Betty's St., which is a valid input, becomes the encoded value '123 John & Betty\'s St.' before insertion into the database query.
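As a rough sketch of the same idea in Python, the built-in sqlite3 module accepts a static query string with ? placeholders and takes the dynamic values separately. The table and column names are assumptions for illustration. Note that sqlite3 binds the value rather than rewriting it into an escaped literal, but the effect for the application is the same: the address can never be parsed as SQL.

```python
import sqlite3

# Hypothetical schema, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, address TEXT)")

address = "123 John & Betty's St."

# The query text is a static string; the driver fills in the values.
conn.execute(
    "INSERT INTO people (name, address) VALUES (?, ?)",
    ("John", address),
)

rows = conn.execute(
    "SELECT name, address FROM people WHERE address = ?",
    (address,),
).fetchall()

print(rows)
```

The stored value comes back unchanged, apostrophe and ampersand included, so nothing downstream has to undo an encoding.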
Similarly, the best way to ensure we do not have JavaScript injection and errors is to not create JavaScript strings ourselves, but to use library code, such as PHP's json_encode, at the point that we create the JavaScript to send to the browser. Internally, the (once again, valid) address 123 John & Betty's St. becomes the encoded value "123 John & Betty's St." before insertion into the generated JSON.
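For comparison, a small sketch with Python's json.dumps, which plays the same role as json_encode here. The variable names are made up for the example. The address only gains surrounding double quotes, while a value containing double quotes or newlines gets backslash escapes.

```python
import json

address = "123 John & Betty's St."

# Encode at the point where the JavaScript/JSON is generated.
js_literal = json.dumps(address)
snippet = "var address = " + js_literal + ";"
print(snippet)

# A value with characters that are special inside a JSON string.
tricky = 'say "hi"\n'
tricky_literal = json.dumps(tricky)
print(tricky_literal)
```

One caveat worth keeping in mind: JSON encoding alone does not make a value safe to place inside an inline script element in an HTML page, because that is a different output context with its own rules.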
The best way to prevent command injection vulnerabilities is to provide each argument to a library that will properly encode/escape the arguments and assemble the final command. For example, use Java's Runtime.exec with a string array instead of a combined command string your code put together, or PHP's escapeshellarg on each of the dynamic arguments. In this case the valid 123 John & Betty's St. becomes '123 John & Betty'\''s St.' before insertion into the command to be executed.
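Both approaches have rough Python equivalents, sketched here under the assumption that a child Python process is an acceptable stand-in for a real command. Passing a list to subprocess.run avoids the shell entirely, and shlex.quote escapes a single argument for a POSIX shell. shlex.quote writes the embedded apostrophe as '"'"' rather than the '\'' form that escapeshellarg produces; a POSIX shell reads both the same way.

```python
import shlex
import subprocess
import sys

address = "123 John & Betty's St."

# Option 1: pass arguments as a list, so no shell parses them.
# The child process simply echoes its first argument back.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", address],
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout.strip()
print(echoed)

# Option 2: if a shell command string is unavoidable, quote each
# dynamic argument at the point the command is assembled.
quoted = shlex.quote(address)
print(quoted)
```

The list form is usually the easier one to get right, since there is no quoting step to forget.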
The best way to prevent cross-site scripting vulnerabilities and HTML errors is to properly escape data for insertion into generated HTML pages using library functions like PHP's htmlspecialchars or <%= in Rails' erb at the point we create the HTML. Internally, the valid address 123 John & Betty's St. would turn into the encoded string 123 John &amp; Betty's St. before insertion into the HTML or, if we used ENT_QUOTES to encode for insertion into an HTML attribute, 123 John &amp; Betty&#039;s St.
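The same two encodings can be sketched with Python's html.escape, used here as a stand-in for htmlspecialchars. With quote=False only &, < and > are encoded; with the default quote=True the quote characters are encoded too, which is what an attribute value needs. Python writes the apostrophe as &#x27; where PHP writes &#039;; browsers treat the two identically.

```python
import html

address = "123 John & Betty's St."

# For text between tags: encode &, < and > only.
body_text = html.escape(address, quote=False)
print(body_text)

# For attribute values: encode quote characters as well.
attr_text = html.escape(address)
input_tag = "<input name='address' value='" + attr_text + "'>"
print(input_tag)
```

html.unescape reverses either form back to the original address, which is a quick way to confirm nothing was lost.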
The method used for escaping special characters to encode data for each of these outputs is different. If we attempt to do our validation and encoding only once, as an address comes in, we will likely leave injection flaws open on some avenues and encoding artifacts corrupting the data elsewhere. This is why we do our encoding at the point we send the data out to another destination, and not before.
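A short sketch of the artifacts that early encoding causes, again in Python and with made-up variable names. The address is HTML-encoded once at input time and stored that way; it then shows up corrupted in JSON output and double-encoded in HTML output. Encoding the raw value at each output avoids both.

```python
import html
import json

address = "123 John & Betty's St."

# Anti-pattern: HTML-encode at input time and store the result.
stored = html.escape(address, quote=False)

# Later the stored value is sent out as JSON: the entity leaks through.
as_json = json.dumps(stored)
print(as_json)

# Or a template escapes it again on the way into a page.
as_html = html.escape(stored, quote=False)
print(as_html)

# Encoding at the point of output keeps the stored data raw.
good_json = json.dumps(address)
good_html = html.escape(address, quote=False)
print(good_json)
print(good_html)
```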
"Input validation" is a bad way to describe this process. Why don't we call it "output encoding" instead?