Classifying the Information Flow Lifecycle of Variables - Mitigating Injection Attacks
I have long felt that giving developers a secure coding framework is more effective than teaching them secure coding, especially at non-software companies. Continuing in that vein, here are some thoughts on mitigating injection attacks such as cross-site scripting, SQL injection, and OS command injection.
The idea was inspired by a recent visit to an auto repair shop, where the mechanic wanted to run a dye test to locate the exact valve that was leaking in my car. It also builds on the Black Hat talk by Brian Chess and Jacob West on taint propagation.
The lifecycle of any variable declared in application code falls into one of the following categories:
- The variable is set by the application code only and passed around within the application server itself and never leaves the server host; we will call this a "server side variable".
- The variable is set by the application code only and passed around to other server side components like a database server, but never passed to a client browser; we will call this a "multiple server side variable".
- The variable is set by the application code only and can be passed around between a server and a client, with the intention that the client never tampers with the variable; we will call this a "server fixed variable".
- The variable is intended to be set by the client side and is passed around between a server and a client one or more times; we will call this a "client side variable".
The scenarios above identify four variable types based on information flow or, equivalently, four trust levels for variables in application code. Now imagine a secure coding framework that can tell these four types apart by declaring them with a standard nomenclature.
For example, a server fixed variable would always be declared with the tag "pi" in its declaration (stringpi x, intpi y, and so on), and a client side variable would always be declared with the tag "sigma" (stringsigma x, intsigma y, and so on).
With this ability to classify variables by type, the framework could apply a specific input validation or output encoding technique depending on the variable type.
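To make the nomenclature concrete, here is a minimal sketch in Python. Python has no declarations like stringpi x, so I am assuming the tag is carried by a small wrapper object instead; the names Flow, Tagged, string_pi and string_sigma are hypothetical and not part of any existing framework.

```python
from dataclasses import dataclass
from enum import Enum


class Flow(Enum):
    """The four information-flow categories listed above."""
    SERVER_SIDE = "server side variable"
    MULTIPLE_SERVER_SIDE = "multiple server side variable"
    SERVER_FIXED = "server fixed variable"   # the "pi" tag
    CLIENT_SIDE = "client side variable"     # the "sigma" tag


@dataclass(frozen=True)
class Tagged:
    """A string value that carries its flow category wherever it goes."""
    value: str
    flow: Flow


def string_pi(value: str) -> Tagged:
    """Hypothetical stand-in for declaring `stringpi x`."""
    return Tagged(value, Flow.SERVER_FIXED)


def string_sigma(value: str) -> Tagged:
    """Hypothetical stand-in for declaring `stringsigma x`."""
    return Tagged(value, Flow.CLIENT_SIDE)


session_id = string_pi("a1b2c3")
comment = string_sigma("<script>alert(1)</script>")
print(session_id.flow.value)  # server fixed variable
print(comment.flow.value)     # client side variable
```

Because the category travels with the value rather than living in a comment or a naming habit, any code that later receives the value can ask what it is dealing with.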
Let's say a client side variable keeps its tag through every operation performed on it, from the moment it is declared in the application code. Since it is a client side variable, and hence a low trust variable, a regex check is applied to it on each operation.
Similarly, let's say a server side variable never leaves the server and is hence a high trust variable; a regex check is applied to it only at compile time of the application code.
For example, every time a stringpi is declared, operated upon, moved, or copied, the framework applies the input validation technique best suited to that variable type, such as a regex check, parameterization, or a whitelist approach.
Similarly, a client side variable might have both input validation and output encoding applied to it to prevent, say, a cross-site scripting attack. With this approach, we achieve a number of objectives in mitigating injection attacks:
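The "check on every operation" idea could look roughly like the sketch below for a client side string. The class name StringSigma, the ValidationError exception, and the particular allow-list pattern are all my assumptions for illustration; a real framework would choose the pattern per field.

```python
import re

# Assumed allow-list: letters, digits, spaces and a little punctuation.
SIGMA_PATTERN = re.compile(r"[A-Za-z0-9 .,_-]*")


class ValidationError(ValueError):
    """Raised when a client side value fails the regex check."""


class StringSigma:
    """Hypothetical client side ("sigma") string, re-checked on every operation."""

    def __init__(self, value: str):
        if SIGMA_PATTERN.fullmatch(value) is None:
            raise ValidationError(f"rejected client side value: {value!r}")
        self._value = value

    @property
    def value(self) -> str:
        return self._value

    def __add__(self, other):
        # The result is built through __init__, so it is validated again
        # and stays a StringSigma: the tag survives the operation.
        other_value = other.value if isinstance(other, StringSigma) else str(other)
        return StringSigma(self._value + other_value)

    def __radd__(self, other):
        # Handles `plain_str + sigma`, which would otherwise lose the tag.
        return StringSigma(str(other) + self._value)


name = StringSigma("alice")
greeting = "Hello, " + name
print(type(greeting).__name__, greeting.value)  # StringSigma Hello, alice

try:
    StringSigma("x'; DROP TABLE users; --")
except ValidationError as err:
    print(err)
```

Note that concatenating with an ordinary string still yields a StringSigma, so the low trust tag cannot be dropped by accident. Python has no compile time in the sense used above, so the server side case is not shown here.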
- No variable goes unobserved, since it can be traced through the standard nomenclature.
- As soon as a variable is operated upon, the framework applies the requisite input validation or output encoding technique based on the variable type.
- The developer's job gets easier: when a developer works on an independent module that plugs into the master code, the variable type is already identified, so the framework automatically applies the corresponding input validation or output encoding technique.
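The last objective can be sketched as follows: the module author only declares the category, and a framework-owned sink decides what to do with it. Everything here (Flow, Tagged, emit_html, comment_widget) is hypothetical, redeclared so the block runs on its own, and the policy of HTML-escaping both browser-bound categories is my assumption rather than a rule from the talk.

```python
import html
from dataclasses import dataclass
from enum import Enum


class Flow(Enum):
    SERVER_SIDE = 1
    MULTIPLE_SERVER_SIDE = 2
    SERVER_FIXED = 3
    CLIENT_SIDE = 4


@dataclass(frozen=True)
class Tagged:
    value: str
    flow: Flow


def emit_html(var: Tagged) -> str:
    """Hypothetical framework sink: picks the handling from the tag."""
    if var.flow in (Flow.SERVER_SIDE, Flow.MULTIPLE_SERVER_SIDE):
        # These categories are never meant to reach a browser.
        raise PermissionError(f"{var.flow.name} value must not be sent to a client")
    # Assumed policy: anything bound for the browser is HTML-escaped.
    return html.escape(var.value, quote=True)


def comment_widget(comment: Tagged) -> str:
    """Module code: no encoding logic here, only a call to the sink."""
    return "<p>" + emit_html(comment) + "</p>"


user_comment = Tagged("<script>alert(1)</script>", Flow.CLIENT_SIDE)
print(comment_widget(user_comment))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>

db_password = Tagged("s3cret", Flow.MULTIPLE_SERVER_SIDE)
try:
    comment_widget(db_password)
except PermissionError as err:
    print(err)
```

The module never mentions escaping, yet the script tag comes out encoded, and a value that was declared as server-only is stopped before it can leak to the client.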