Context
Published in PHP Architect on 22 Dec 2005
A textbook definition of context is “the circumstances in which an event occurs; a setting.” When speaking about PHP security, the context of your data is important. This is better explained by example:
<?php
$username = 'chris';
$filename =
"http://host/profile.php?user=$username";
$contents = file_get_contents($filename);
?>
Let’s examine each line of code individually:
- A literal string is assigned to the variable $username. The data is chris, and its context is a PHP variable.
- The data within $username is interpolated within another string, and this string is assigned to the variable $filename. The data is http://host/profile.php?user=chris, and its context is a PHP variable.
- This is the most important line of code in the example, primarily because it changes the context in which the data within $username and $filename is used. Because file_get_contents() can be used to fetch content from remote sources (if allow_url_fopen is enabled, which it is by default), this line of code initiates an HTTP request. Whereas, before, the data in $filename was just a string, now it’s actually being used as a URL. More importantly, the data in $username is now the value of a query string parameter within this URL. The context has changed.
Because chris is safe to be used as a query string parameter without interfering with the format of the URL, this works as expected. However, it is always best to escape output, and this is no exception. I prefer to use arrays to help me keep up with data that has been prepared for a specific context:
<?php
$url = array();
$username = 'chris';
$url['username'] = urlencode($username);
$filename =
"http://host/profile.php?user={$url['username']}";
$contents = file_get_contents($filename);
?>
Although using urlencode() on chris is unnecessary (it does nothing), this example illustrates a best practice. It’s not always possible to solve output problems by restricting the format of your data, and urlencode() is an escaping function created specifically for this purpose: preserving data in the context of a URL.
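To see urlencode() do real work, here is a minimal sketch with a value that would otherwise interfere with the format of the URL. The username below is hypothetical, chosen only because it contains spaces and an ampersand:

```php
<?php
// Hypothetical value containing characters that have meaning in a URL.
$username = 'chris & friends';

$url = array();
$url['username'] = urlencode($username);

// Without escaping, the ampersand would start a second query string parameter.
$unescaped = "http://host/profile.php?user=$username";
echo $unescaped, "\n";

// With escaping, the whole value stays inside the user parameter.
$filename = "http://host/profile.php?user={$url['username']}";
echo $filename, "\n";
// http://host/profile.php?user=chris+%26+friends
```

Decoding the escaped value with urldecode() returns the original string, which is the sense in which the data has been preserved.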
Here’s another example:
<?php
$target = 'http://host/profile.php?user=chris';
$query_string = "?target={$target}";
$filename =
"http://host/login.php{$query_string}";
$contents = file_get_contents($filename);
?>
What should be escaped with urlencode()?
- $target
- $query_string
- $filename
- None of the above
The first answer is correct, because $target is being used as the value of a query string parameter:
<?php
$url = array();
$target = 'http://host/profile.php?user=chris';
$url['target'] = urlencode($target);
$query_string = "?target={$url['target']}";
$filename =
"http://host/login.php{$query_string}";
$contents = file_get_contents($filename);
?>
The data within $filename is the following:
http://host/login.php?target=http%3A%2F%2Fhost%2Fprofile.php%3Fuser%3Dchris
Therefore, the value of $_GET['target'] in the login.php script is the following:
http://host/profile.php?user=chris
In other words, the original value has been preserved, despite the fact that the data has existed in multiple contexts.
Another important point is that the escaping needs to be performed on individual parameter values; it should not modify the overall format of the URL.
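The difference between escaping a parameter value and escaping the whole URL can be sketched as follows. This is only an illustration: parse_url() and parse_str() stand in for what PHP does when it populates $_GET in login.php.

```php
<?php
$target = 'http://host/profile.php?user=chris';

// Correct: escape only the parameter value.
$url = array();
$url['target'] = urlencode($target);
$filename = "http://host/login.php?target={$url['target']}";

// Recover the parameter roughly the way the receiving script would see it.
$query = parse_url($filename, PHP_URL_QUERY);
parse_str($query, $params);
echo $params['target'], "\n";
// http://host/profile.php?user=chris

// Incorrect: escaping the entire URL destroys its format,
// so it is no longer recognizable as a URL at all.
$broken = urlencode("http://host/login.php?target={$target}");
echo $broken, "\n";
```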
HTML
Most PHP apps generate output that is rendered by a browser, and this is usually of type text/html, typically identified in a Content-Type HTTP entity header:
Content-Type: text/html
Preferably, the character set is also indicated:
Content-Type: text/html; charset=UTF-8
HTML has its own format, and the data we generate in PHP can affect that format. For example:
<?php
$first = 'Chris';
$last = 'Shiflett';
$name = "{$first} {$last}";
echo "<p><b>$name</b></p>";
?>
In this simple example, the name is going to be displayed in bold and within its own paragraph. This markup is intentional. The data within $first and $last is intended to be raw data, however, free of markup or anything that might be interpreted by the browser. Therefore, it’s best to escape it:
<?php
$html = array();
$first = 'Chris';
$last = 'Shiflett';
$html['first'] = htmlentities($first, ENT_QUOTES, 'UTF-8');
$html['last'] = htmlentities($last, ENT_QUOTES, 'UTF-8');
$name = "{$html['first']} {$html['last']}";
echo "<p><b>$name</b></p>";
?>
Although the escaping is unnecessary in this case, skipping this step can easily create cross-site scripting (XSS) vulnerabilities:
<?php
$first = $_POST['first'];
$last = $_POST['last'];
$name = "{$first} {$last}";
echo "<p><b>$name</b></p>";
?>
Although this example also illustrates a failure to filter input, escaping alone prevents the cross-site scripting (XSS) vulnerability:
<?php
$html = array();
$first = $_POST['first'];
$last = $_POST['last'];
$html['first'] = htmlentities($first, ENT_QUOTES, 'UTF-8');
$html['last'] = htmlentities($last, ENT_QUOTES, 'UTF-8');
$name = "{$html['first']} {$html['last']}";
echo "<p><b>$name</b></p>";
?>
Without ensuring that the data in $first and $last is only considered to be raw data, an attacker can exploit any client-side technology by providing data in $_POST['first'] and/or $_POST['last'] that takes advantage of this context.
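As a minimal sketch of such an attack and its prevention, assume a hypothetical value submitted as the first name. A plain variable stands in for $_POST['first'] so that the example runs on its own:

```php
<?php
// Stand-in for $_POST['first']; the payload is a hypothetical attack value.
$first = "<script>alert('XSS');</script>";

$html = array();
$html['first'] = htmlentities($first, ENT_QUOTES, 'UTF-8');

// The browser now displays the markup as text instead of executing it.
echo "<p><b>{$html['first']}</b></p>\n";
// <p><b>&lt;script&gt;alert(&#039;XSS&#039;);&lt;/script&gt;</b></p>
```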
Sometimes, even in the context of HTML, there are subtle differences:
<input type="text" name="user" value="<?php echo $_POST['user']; ?>" />
This also illustrates a cross-site scripting (XSS) vulnerability, but an exploit must be slightly different, because the context of the data within $_POST['user'] is now the attribute of an HTML tag. For example, to generate a simple popup window in JavaScript, the following value can be provided:
"><script>alert('XSS');</script><"
Luckily, these subtle differences don’t affect PHP developers much, because htmlentities(), when called with ENT_QUOTES as in these examples, accounts for the characters that can alter the context of data within HTML text and quoted attribute values.
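One detail worth sketching is the ENT_QUOTES flag used throughout these examples. The comparison below assumes a hypothetical payload aimed at a single-quoted attribute, and it passes ENT_COMPAT explicitly to show what happens when single quotes are left alone:

```php
<?php
// Hypothetical payload that tries to close a single-quoted attribute.
$user = "' onmouseover='alert(1)";

// ENT_COMPAT escapes double quotes only, so the single quotes survive.
$compat = htmlentities($user, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES escapes both kinds of quotes.
$quotes = htmlentities($user, ENT_QUOTES, 'UTF-8');

// The first tag gains an injected onmouseover attribute; the second does not.
echo "<input type='text' name='user' value='{$compat}' />\n";
echo "<input type='text' name='user' value='{$quotes}' />\n";
```

Passing ENT_QUOTES every time means the escaping holds regardless of which quote style the surrounding markup happens to use.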
SQL
When PHP developers send data to a database, it is often through the use of an SQL query:
$sql = 'SELECT * FROM users';
There are many databases with which PHP can communicate, and each client library has its own function that executes SQL queries. For example, MySQL provides mysql_query():
$result = mysql_query($sql);
Databases, much like PHP variables, are designed to store data, so this context is safe for any data (within practical limitations such as the amount of memory or disk space available). However, this is not the case with the SQL query:
$sql = "SELECT *
FROM users
WHERE username = '{$_POST['username']}'
AND password = '{$_POST['password']}'";
This example creates an SQL injection vulnerability (and suggests that passwords are stored improperly, but that’s irrelevant to the present topic), because an attacker can provide data in $_POST['username'] and/or $_POST['password'] that modifies the format of the SQL query. For example, my favorite username for testing authentication forms is the following:
chris' --
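Because two hyphens (--) indicate the start of a comment that runs to the end of the line, this effectively reduces the SQL query to the following (assuming the rest of the WHERE clause sits on that same line; MySQL also expects whitespace after the hyphens):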
SELECT *
FROM users
WHERE username = 'chris'
If this is being used to verify access credentials, I can gain access to the chris account without knowing the password. Of course, another important thing to note is that these assumptions are only true if the data in $sql is executed by a database. Until a function like mysql_query() is used, this data is just a string in a PHP variable.
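Because the query is just a string until it is executed, the effect of this username can be inspected without a database. This sketch uses plain variables in place of $_POST, adds a trailing space to the username, and keeps the WHERE clause on one line, since a -- comment ends at the line break. Cutting the string at the hyphens is a crude illustration, not an SQL parser:

```php
<?php
// Stand-ins for $_POST['username'] and $_POST['password'].
$username = "chris' -- ";
$password = 'anything';

// Vulnerable: the data is interpolated without escaping.
$sql = "SELECT * FROM users WHERE username = '{$username}' AND password = '{$password}'";
echo $sql, "\n";
// SELECT * FROM users WHERE username = 'chris' -- ' AND password = 'anything'

// Everything after the two hyphens is a comment, so only this part matters.
$effective = rtrim(substr($sql, 0, strpos($sql, '--')));
echo $effective, "\n";
// SELECT * FROM users WHERE username = 'chris'
```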
Luckily, there are escaping functions that preserve data in the context of an SQL query. However, because of subtle differences in the ways various databases interpret and execute SQL, it is best to use a database-specific escaping function. For example:
<?php
$mysql = array();
$mysql['username'] =
mysql_real_escape_string($_POST['username']);
$mysql['password'] =
mysql_real_escape_string($_POST['password']);
$sql = "SELECT *
FROM users
WHERE username = '{$mysql['username']}'
AND password = '{$mysql['password']}'";
?>
Because mysql_real_escape_string() considers the character encoding of the current connection to MySQL, a connection to the database must exist.
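The same idea can be tried end to end without a MySQL server. The sketch below swaps in PDO with an in-memory SQLite database (this assumes the pdo_sqlite extension is available) and uses PDO::quote() as the database-specific escaping function. Unlike mysql_real_escape_string(), PDO::quote() also adds the surrounding quotes. The table and its single row are made up for the example:

```php
<?php
// Assumption: pdo_sqlite is installed; SQLite stands in for MySQL here.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (username TEXT, password TEXT)");
$db->exec("INSERT INTO users (username, password) VALUES ('chris', 'secret')");

// Stand-ins for $_POST['username'] and $_POST['password'].
$username = "chris' --";
$password = 'wrong';

// Unescaped: the comment removes the password check, so the row matches.
$sql = "SELECT * FROM users WHERE username = '{$username}' AND password = '{$password}'";
$unsafe_rows = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);

// Escaped for this database: the quote in the data is preserved as data.
$sqlite = array();
$sqlite['username'] = $db->quote($username);
$sqlite['password'] = $db->quote($password);
$sql = "SELECT * FROM users WHERE username = {$sqlite['username']} AND password = {$sqlite['password']}";
$safe_rows = $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);

echo count($unsafe_rows), " row(s) without escaping\n";
echo count($safe_rows), " row(s) with escaping\n";
```

The array is named after the database it was prepared for, following the convention used in the examples above.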
URLs
URLs adhere to a strict format, and the context of data depends upon where in the URL it is used. For example:
$host = $_POST['host'];
header("Location: http://{$host}/");
The context of the data in $host is intended to be the hostname of a URL used in a Location header. There isn’t actually an escaping function to help you ensure that the data within $host is only considered to be a hostname. In this case, it is actually best to ensure that the hostname adheres to the proper format, or, better, that it is one of a known set of valid values:
<?php
$clean = array();
switch ($_POST['host']) {
case 'shiflett.org':
case 'faculty.co':
$clean['host'] = $_POST['host'];
break;
default:
/* Error: unknown host, so stop before the redirect */
exit;
}
header("Location: http://{$clean['host']}/");
?>
Without this filtering, the example illustrates an HTTP response splitting vulnerability.
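The same whitelist can be expressed as a lookup against an array, which may be easier to maintain as the set of hosts grows. This is a sketch only: the function name filter_host() is hypothetical, and the host list is taken from the example above.

```php
<?php
/**
 * Return the host if it is one of the known valid values, or null otherwise.
 * filter_host() is a hypothetical helper, not a built-in PHP function.
 */
function filter_host($host)
{
    $valid = array('shiflett.org', 'faculty.co');

    // Strict comparison avoids surprises from PHP's type juggling.
    return in_array($host, $valid, true) ? $host : null;
}

$clean = array();
$clean['host'] = filter_host('shiflett.org');

// A value that tries to smuggle a second header is simply not in the list.
$rejected = filter_host("shiflett.org\r\nSet-Cookie: sid=attacker");

if ($clean['host'] !== null) {
    // Only data that passed the filter reaches the header line.
    $location = "Location: http://{$clean['host']}/";
    echo $location, "\n";
}
```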
In a URL, there is typically only one context in which the data is dynamic: the values of query string parameters. As demonstrated earlier, this data can be escaped in order to preserve it by simply using urlencode():
<?php
$url = array();
$url['zip'] = urlencode($_POST['zip']);
header("Location: http://host/weather.php?zip={$url['zip']}");
?>
Of course, understanding context doesn’t eliminate the need to filter input, and this is a perfect example. If the ZIP is expected to be five digits, a simple check will ensure that it is. In general, filtering ensures data integrity while escaping ensures data preservation.
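A minimal sketch of such a check, assuming the ZIP must be exactly five ASCII digits. The helper name is_valid_zip() is hypothetical, and a plain variable stands in for $_POST['zip']:

```php
<?php
/**
 * Hypothetical helper: true only for strings of exactly five digits.
 */
function is_valid_zip($zip)
{
    return is_string($zip) && strlen($zip) === 5 && ctype_digit($zip);
}

$clean = array();
$url = array();

$zip = '10001'; // stand-in for $_POST['zip']

if (is_valid_zip($zip)) {
    $clean['zip'] = $zip;                   // filtered: known to be five digits
    $url['zip'] = urlencode($clean['zip']); // escaped: prepared for the URL context
    echo "Location: http://host/weather.php?zip={$url['zip']}\n";
}
```

Filtering and escaping are kept as separate steps here, with one array per stage, so it stays clear which guarantee each variable carries.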
Nested Contexts
One of the most frequently asked questions on PHP mailing lists and forums is how to properly escape a link. For example:
<a href="<?php echo $link; ?>">Click Here</a>
This becomes slightly more complicated when $link consists of other data:
$link = "http://host/weather.php?zip={$zip}";
What needs to be escaped, and how?
- Escape $link with htmlentities().
- Escape $link with urlencode().
- Escape $zip with htmlentities().
- Escape $zip with urlencode().
The correct answers are the last and then the first, in that order, because the context of the data in $zip is the value of a query string parameter, and the context of the data in $link is HTML.
For example:
<?php
$html = array();
$url = array();
$url['zip'] = urlencode($zip);
$link = "http://host/weather.php?zip={$url['zip']}";
$html['link'] = htmlentities($link, ENT_QUOTES, 'UTF-8');
?>
<a href="<?php echo $html['link']; ?>">Click Here</a>
This is why the HTML entity for an ampersand (&amp;) is used to separate query string parameters in URLs when the URLs exist within HTML.
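With more than one parameter, the effect on the ampersand becomes visible. In this sketch the second parameter, units, is hypothetical and exists only to put a separator into the URL:

```php
<?php
$html = array();
$url = array();

$url['zip'] = urlencode('10001');
$url['units'] = urlencode('metric'); // hypothetical second parameter

// URL context: parameters are separated by a literal ampersand.
$link = "http://host/weather.php?zip={$url['zip']}&units={$url['units']}";

// HTML context: the ampersand itself must be escaped.
$html['link'] = htmlentities($link, ENT_QUOTES, 'UTF-8');

echo "<a href=\"{$html['link']}\">Click Here</a>\n";
// <a href="http://host/weather.php?zip=10001&amp;units=metric">Click Here</a>
```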
Until Next Time…
I hope this article helps you better understand and appreciate context. As stated earlier, many web application vulnerabilities can be traced to a developer’s failure to properly account for context.
Although escaping is emphasized more than filtering in this article, escaping should never be considered a substitute for filtering. (Some examples omit filtering in order to focus on context.) However, many common web application vulnerabilities such as cross-site scripting (XSS) and SQL injection are escaping problems, not filtering problems, and it’s always best to address the root cause of a problem rather than a symptom.
Until next month, be safe.