In this article, we will explore some of the many data filters that PHP offers. The filter functions have been available since PHP 5.2, where the filter extension is part of the core and enabled by default, so no separate installation is required. We will also look at what data filtering is and why it should be used in web applications.
What is Data Filtering?
PHP is often characterized as a 'weak' programming language. This is mostly because PHP is known as an easy language to learn and is often used as a stepping stone into web programming. Much of this reputation comes from books and tutorials that concentrate on how easy it is to write PHP programs that collect data and send it on by email or to a database, while never mentioning data validation. Beginners then go on to write these 'easy' scripts and find themselves exposed to SQL injection and other attacks that are easily preventable. One reason data validation is left out is that validating user input is considered too 'complicated' for beginners and does not fit the notion that PHP is supposed to be 'easy' to program with. In reality, it takes only a few simple steps to validate user input.
So what exactly do we mean by data validation, and why is it so important? Validation becomes important as soon as your application starts to accept user input. The rule of thumb is not to trust any data that comes from outside your application, i.e. from forms or through the browser, whereas data that originates within your application is 'safe'. Any data that comes from outside needs to be validated or 'sanitized' before it is accepted into your application. An example of 'safe' data is:
$myvar = "A safe variable";
The code above contains a variable that is defined within your application and can therefore be trusted. The following data, by contrast, cannot be trusted:
$user = $_POST['username'];
$ID = $_GET['id'];
The first variable, $user, holds data that comes from a form used to collect the user name. This data cannot be trusted since it comes from outside our application. On its own the variable is not harmful, but if it is used in a database query it could present a security problem (as we will demonstrate shortly). The second variable, $ID, comes from a query string, which can easily be tampered with because it is shown in the browser's address bar. For example, if your link was generated like this:
delete.php?id=<?php echo $row['id']; ?>
Then the browser will show the following URL when the delete script is run:
delete.php?id=2
An attacker can then simply change the number to a letter to make the application fail, and the resulting error messages often reveal more information about the application than you would like.
Let's take a practical look at one of the most common and serious attacks that take place when data validation is not implemented, SQL injections. Below is a sample of a login script. Assume that a form takes the user name and password and sends it to a processing script that grants a user access to the rest of the application if their login details are correct:
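A minimal sketch of such a processing script is shown here. The table name users and the column names username and password are assumptions, the helper build_login_query() is hypothetical, and the query is only built and printed rather than sent to a database:

```php
<?php
// Hypothetical helper: pastes the submitted values straight into the SQL text.
// Table and column names are assumptions made for this sketch.
function build_login_query($user, $pass)
{
    return "SELECT * FROM users WHERE username='" . $user
         . "' AND password='" . $pass . "'";
}

// In the real script these values would come from $_POST.
$user = 'alice';
$pass = 'secret';

$query = build_login_query($user, $pass);
echo $query, PHP_EOL;
// SELECT * FROM users WHERE username='alice' AND password='secret'
```

If the query returns a row, the script would treat the user as logged in.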
With ordinary input the above code works as intended, but consider what happens if the variables contain the following information:
$user = "Dantago !Noabes";
$pass = "x' OR 'a'='a";
What does the above information do? The data contained in the $pass variable is trying to fool MySQL into thinking that the user is authenticated and should have access to the rest of the application. How does it do that? Take a look at what the MySQL query looks like with the above variables:
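As a rough sketch, assuming a hypothetical users table with username and password columns (the real schema is not specified), the query can be rebuilt and printed like this:

```php
<?php
// Hypothetical query builder: user input is interpolated without any escaping.
function build_login_query($user, $pass)
{
    return "SELECT * FROM users WHERE username='$user' AND password='$pass'";
}

$user = "Dantago !Noabes";
$pass = "x' OR 'a'='a";

$query = build_login_query($user, $pass);
echo $query, PHP_EOL;
// SELECT * FROM users WHERE username='Dantago !Noabes' AND password='x' OR 'a'='a'
```

Because AND binds more tightly than OR in SQL, the WHERE clause is read as (username='...' AND password='x') OR 'a'='a'.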
The condition password='x' OR 'a'='a' will always evaluate to true, since 'a' always equals 'a'. If the user's password were, for example, 'generic', the effect would be the same as writing password='x' OR 'generic'='generic', which is true. So the user is authenticated without knowing the password.
Using PHP Filters
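So how does PHP help to validate data? You can create your own filters or use the data filters that come with PHP version 5.2 and above. In addition, you can prevent SQL injection by escaping input with mysql_real_escape_string(). Note that the mysql extension this function belongs to was deprecated in PHP 5.5 and removed in PHP 7.0; on newer versions, use mysqli_real_escape_string() or, better, prepared statements.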
Let's look at how we can avoid SQL injection using the code from our previous
example:
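The sketch below shows the effect of escaping on the malicious input. Since mysql_real_escape_string() needs a live database connection, the hypothetical helper escape_for_demo() uses addslashes() as a stand-in so that the example runs on its own. For this particular input the result is the same, but addslashes() is not a safe replacement in real code. The table and column names are again assumptions.

```php
<?php
// Stand-in for mysql_real_escape_string(), used only so this sketch runs
// without a database connection. Do not rely on addslashes() in production.
function escape_for_demo($value)
{
    return addslashes($value);
}

$user = escape_for_demo("Dantago !Noabes");
$pass = escape_for_demo("x' OR 'a'='a");

// Hypothetical table and column names.
$query = "SELECT * FROM users WHERE username='$user' AND password='$pass'";
echo $query, PHP_EOL;
// SELECT * FROM users WHERE username='Dantago !Noabes' AND password='x\' OR \'a\'=\'a'
```

Each quote inside the password is now preceded by a backslash, so it stays part of the string value instead of ending it.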
Again, there is nothing wrong with the code above. It will now look like this in MySQL:
The difference now is that the quotes in the password are escaped, so MySQL will try to match the user name Dantago !Noabes with the literal password x' OR 'a'='a and will fail (unless that exact string really is the user's password).
Data validation does not only revolve around SQL attacks. Other data can be validated as well, such as checking that an email address or URL is written in the proper format, or ensuring that a particular value is of the right type. This is particularly useful for checking that a query string value passed to the application is what it is supposed to be. For example, a user ID is usually an integer and not a letter or string. This can be checked with is_numeric() or enforced by casting the value with intval().
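As an illustration of checking an ID from the query string, here is a small sketch that uses filter_var() with FILTER_VALIDATE_INT. The helper name parse_id() and the rule that IDs start at 1 are assumptions made for the example:

```php
<?php
// Hypothetical helper: returns the ID as an int, or null when the raw
// query-string value is not a positive integer.
function parse_id($raw)
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 1),
    ));
    return $id === false ? null : $id;
}

var_dump(parse_id('2'));      // int(2)
var_dump(parse_id('abc'));    // NULL
var_dump(parse_id('-3'));     // NULL

// is_numeric() on its own also accepts non-integer numbers.
var_dump(is_numeric('2.5'));  // bool(true)
```

Unlike a plain cast, this approach rejects bad input instead of silently turning it into 0.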
PHP 5 comes with the following filter functions (full list of functions available on the PHP website):
PHP Filter Functions:
Function | Description
filter_has_var() | Checks if a variable of a specified input type exists
filter_id() | Returns the ID number of a specified filter
filter_input() | Gets input from outside the script and filters it
filter_input_array() | Gets multiple inputs from outside the script and filters them
filter_list() | Returns an array of all supported filters
filter_var_array() | Gets multiple variables and filters them
filter_var() | Gets a variable and filters it
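To give a feel for how these functions fit together, the following sketch filters several values at once with filter_var_array() and then looks up a filter by name. The sample data and array keys are made up for illustration:

```php
<?php
// Sample data standing in for submitted form fields (values are made up).
$data = array(
    'email' => 'user@example.com',
    'age'   => '42',
    'site'  => 'not a url',
);

// Map each key to the filter that should be applied to it.
$rules = array(
    'email' => FILTER_VALIDATE_EMAIL,
    'age'   => FILTER_VALIDATE_INT,
    'site'  => FILTER_VALIDATE_URL,
);

$clean = filter_var_array($data, $rules);
var_dump($clean['email']); // string(16) "user@example.com"
var_dump($clean['age']);   // int(42)
var_dump($clean['site']);  // bool(false)

// filter_list() returns filter names; filter_id() maps a name to its ID.
$known  = in_array('validate_email', filter_list(), true);
$sameId = filter_id('validate_email') === FILTER_VALIDATE_EMAIL;
var_dump($known);  // bool(true)
var_dump($sameId); // bool(true)
```

A value that fails validation comes back as false, so use strict comparison (===) when checking the results.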
PHP Filters
Filter | Description
FILTER_CALLBACK | Call a user-defined function to filter data
FILTER_SANITIZE_STRING | Strip tags, optionally strip or encode special characters
FILTER_SANITIZE_STRIPPED | Alias of "string" filter
FILTER_SANITIZE_ENCODED | URL-encode string, optionally strip or encode special characters
FILTER_SANITIZE_SPECIAL_CHARS | HTML-escape '"<>& and characters with ASCII value less than 32
FILTER_SANITIZE_EMAIL | Remove all characters, except letters, digits and !#$%&'*+-/=?^_`{|}~@.[]
FILTER_SANITIZE_URL | Remove all characters, except letters, digits and $-_.+!*'(),{}|\\^~[]`<>#%";/?:@&=
FILTER_SANITIZE_NUMBER_INT | Remove all characters, except digits and +-
FILTER_SANITIZE_NUMBER_FLOAT | Remove all characters, except digits, +- and optionally .,eE
FILTER_SANITIZE_MAGIC_QUOTES | Apply addslashes()
FILTER_UNSAFE_RAW | Do nothing, optionally strip or encode special characters
FILTER_VALIDATE_INT | Validate value as integer, optionally from the specified range
FILTER_VALIDATE_BOOLEAN | Return TRUE for "1", "true", "on" and "yes", FALSE otherwise; with the FILTER_NULL_ON_FAILURE flag, FALSE only for "0", "false", "off", "no", and "", and NULL for anything else
FILTER_VALIDATE_FLOAT | Validate value as float
FILTER_VALIDATE_REGEXP | Validate value against regexp, a Perl-compatible regular expression
FILTER_VALIDATE_URL | Validate value as URL, optionally with required components
FILTER_VALIDATE_EMAIL | Validate value as e-mail
FILTER_VALIDATE_IP | Validate value as IP address, optionally only IPv4 or IPv6 or not from private or reserved ranges
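A few of these filters in action, as a short sketch with filter_var(). The input strings are invented for the example:

```php
<?php
// Validation filters return the value (converted where relevant) or false.
$email = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$ip    = filter_var('192.168.0.1', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);
$flag  = filter_var('yes', FILTER_VALIDATE_BOOLEAN);
$maybe = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

// Sanitizing filters remove or encode characters instead of rejecting input.
$digits  = filter_var('abc-12x3', FILTER_SANITIZE_NUMBER_INT);
$address = filter_var('us er@exa(mple).com', FILTER_SANITIZE_EMAIL);

// FILTER_CALLBACK hands the value to a function of your choice.
$shouted = filter_var('hello', FILTER_CALLBACK, array('options' => 'strtoupper'));

var_dump($email);   // string(16) "user@example.com"
var_dump($ip);      // string(11) "192.168.0.1"
var_dump($flag);    // bool(true)
var_dump($maybe);   // NULL
var_dump($digits);  // string(4) "-123"
var_dump($address); // string(16) "user@example.com"
var_dump($shouted); // string(5) "HELLO"
```

Note the difference in intent: a validation filter tells you whether the input is acceptable, while a sanitizing filter changes the input, which may or may not leave you with something meaningful.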
If you are running PHP 5.2 or higher, you should have access to these functions. To verify that they actually exist, run a script that calls phpinfo() and scroll down to the section titled filter.
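The same check can also be done from code. This is a small sketch using the standard extension_loaded() and function_exists() functions:

```php
<?php
// Both checks are expected to succeed on PHP 5.2+ unless the filter
// extension was explicitly disabled when PHP was built.
$hasExtension = extension_loaded('filter');
$hasFunction  = function_exists('filter_var');

echo 'filter extension loaded: ', $hasExtension ? 'yes' : 'no', PHP_EOL;
echo 'filter_var() available: ', $hasFunction ? 'yes' : 'no', PHP_EOL;
```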
If the filter section is missing and you are running your own server, you can install the extension yourself. On Linux or Unix, type on the command line:
pecl install filter
If you are running Windows, download php_filter.dll from https://pecl4win.php.net/ext.php/php_filter.dll and save it in your extensions folder. Make sure that the file matches the PHP version installed on your system, and restart the server so that PHP loads it.
So how do the functions work? You may have noticed that the function names are unwieldy and a pain to type out. There are also only seven filter functions, and of these only four actually do any filtering. filter_has_var(), filter_input(), and filter_input_array() all work on external input, the same data that populates superglobal arrays such as $_GET and $_POST. Instead of passing the superglobal itself, you refer to it by its equivalent filter constant. Below is a list of some of the constants:
Table 3. Showing Constants.
Constant | Superglobal
INPUT_COOKIE | $_COOKIE variables
INPUT_GET | $_GET variables
INPUT_POST | $_POST variables
INPUT_SERVER | $_SERVER variables
This list is not complete, so consult your PHP manual or visit https://www.php.net for a full list of constants.
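To show how the constants are used, here is a sketch that reads an id parameter from the query string with filter_has_var() and filter_input(). The helper name read_id_from_query() and the messages are assumptions made for the example:

```php
<?php
// Hypothetical helper: reads ?id=... from the query string.
// Returns an int on success, false if the value is not an integer,
// and null if the parameter is absent.
function read_id_from_query()
{
    if (!filter_has_var(INPUT_GET, 'id')) {
        return null;
    }
    return filter_input(INPUT_GET, 'id', FILTER_VALIDATE_INT);
}

$id = read_id_from_query();
if ($id === null) {
    echo 'No id supplied', PHP_EOL;
} elseif ($id === false) {
    echo 'id is not an integer', PHP_EOL;
} else {
    echo 'Deleting record ', $id, PHP_EOL;
}
```

These functions look at the data PHP received with the request, not at the current contents of $_GET, so assigning to $_GET inside the script does not change what they see. Run from the command line, where there is no query string, the script reports that no id was supplied.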