Data Filtering [cont'd]
The Three Filters
So let's take a look at the three filter functions that we just discussed. First up is the filter_has_var() function, which determines whether a variable exists in an input array. It takes two arguments: one of the input constants listed in table 3 on page one of this article, and the name of the variable. For example, to check whether the POST input contains a variable called 'avar', you would do:
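A minimal sketch of that check, assuming the form was submitted with the POST method. Keep in mind that filter_has_var() looks at the original request data, so a value assigned to $_POST inside the script is not seen by it:

```php
<?php
// Check whether the POST input contains a variable named 'avar'.
$hasAvar = filter_has_var(INPUT_POST, 'avar');

if ($hasAvar) {
    echo "'avar' was submitted";
} else {
    echo "'avar' is missing";
}
```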
The filter_list() function, when executed, returns a list of the filters that are supported by the current server. Below is a list of filters supported by my server:
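The exact list depends on the PHP version and build, so rather than trusting a fixed list it is easier to print it on your own server. A short sketch that shows each filter name next to its numeric ID:

```php
<?php
// Print the name and numeric ID of every filter this PHP build supports.
$filters = filter_list();

foreach ($filters as $name) {
    echo $name, ' => ', filter_id($name), "\n";
}
```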
The third and final function is filter_id(). It takes one argument, a filter name as listed by the filter_list() function, and is executed in the following way: filter_id($avalue). When executed, it returns the numerical ID of the matching filter, e.g.:
filter_id('float');
259
So how is this function useful for you? It lets you use a filter name as a substitute for the PHP constants used by the main filter functions. For example, instead of using FILTER_VALIDATE_FLOAT as an argument when filtering input, you can use:
filter_id('float')
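To see that the two really are interchangeable, here is a small sketch. The input value '3.14' and the name 'no_such_filter' are made up for the illustration:

```php
<?php
$id = filter_id('float');                 // numeric ID of the 'float' filter
var_dump($id === FILTER_VALIDATE_FLOAT);  // bool(true)

// Filtering by name and by constant gives the same result.
$viaName     = filter_var('3.14', filter_id('float'));
$viaConstant = filter_var('3.14', FILTER_VALIDATE_FLOAT);
var_dump($viaName === $viaConstant);      // bool(true)

// filter_id() returns false for a name that is not a known filter.
$unknown = filter_id('no_such_filter');
var_dump($unknown);                       // bool(false)
```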
The remaining four filter functions work best when you set their options correctly. PHP offers over thirteen filters, which can broadly be categorized in the following way:
Sanitizing Filters - These filters strip certain characters from a given value and return the sanitized version. An example use would be:
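A sketch of such a call, assuming the input string '10.3' and the FILTER_SANITIZE_NUMBER_INT filter (both picked here for illustration):

```php
<?php
// FILTER_SANITIZE_NUMBER_INT keeps only digits and the plus and minus signs,
// so the decimal point is dropped.
$clean = filter_var('10.3', FILTER_SANITIZE_NUMBER_INT);
echo $clean;
```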
When you run the code above you will get the following result:
103
Customized Filters - The callback filter (FILTER_CALLBACK) accepts a user-defined function and processes the data according to that function.
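As a rough sketch of a customized filter, the function name is passed through the 'options' key. The normalize_spaces() function below is a made-up example, not something PHP provides:

```php
<?php
// Hypothetical user-defined function: collapse runs of whitespace and trim.
function normalize_spaces($value)
{
    return trim(preg_replace('/\s+/', ' ', $value));
}

$result = filter_var(
    "  Hello   world \n",
    FILTER_CALLBACK,
    array('options' => 'normalize_spaces')
);
var_dump($result); // string(11) "Hello world"
```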
Validating Filters - These filters check whether a given value is of a particular type, such as float or integer, or conforms to a specific format, such as a URL.
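A sketch of a validating filter at work, assuming the string '20' as input (chosen for illustration):

```php
<?php
// FILTER_VALIDATE_INT returns the value converted to an integer on success
// and false on failure.
$valid = filter_var('20', FILTER_VALIDATE_INT);
var_dump($valid);
```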
When you run the code above you will get the following result:
int(20)
The filters are identified by constants, and many of them have options, flags, or in some cases both. Most flags can only be used with the filters they belong to. The exceptions are the two listed below, which can be used with any filter:
Flag | Description |
FILTER_REQUIRE_SCALAR | This rejects any value that isn't scalar. In other words, the value must be one of the following data types: Boolean, integer, floating point number, resource (such as a database link or file handle), or string. |
FILTER_REQUIRE_ARRAY | This rejects any value that isn't an array. When this flag is set, the filter is applied to all elements of the array. It also works on multidimensional arrays, applying the filter recursively to each level. |
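A short sketch of how FILTER_REQUIRE_ARRAY changes the behaviour of a filter. The sample values are invented for the illustration:

```php
<?php
$values = array('1', '2', 'abc');

// Without a flag a scalar is expected, so an array fails validation.
$plain = filter_var($values, FILTER_VALIDATE_INT);
var_dump($plain); // bool(false)

// FILTER_REQUIRE_ARRAY applies the filter to every element.
$each = filter_var($values, FILTER_VALIDATE_INT, FILTER_REQUIRE_ARRAY);
var_dump($each);  // elements: int(1), int(2), bool(false)

// The same flag rejects a value that is not an array.
$notArray = filter_var('1', FILTER_VALIDATE_INT, FILTER_REQUIRE_ARRAY);
var_dump($notArray); // bool(false)
```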
Below is a list of some of the constants used to sanitize and validate data:
Filter | Options | Flags | Description |
FILTER_SANITIZE_EMAIL | No options | No flags | Removes all characters except letters, digits, and !#$%&'*+-/=?^_`{|}~@.[]. |
FILTER_SANITIZE_ENCODED | No options | FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH, FILTER_FLAG_ENCODE_LOW, FILTER_FLAG_ENCODE_HIGH | URL-encodes a string. Setting the flags optionally strips or encodes characters with an ASCII value of less than 32 (LOW) or greater than 127 (HIGH). |
FILTER_SANITIZE_MAGIC_QUOTES | No options | No flags | Escapes single and double quotes by inserting a backslash in front of them in the same way as the addslashes() function. Removed in PHP 8.0, where FILTER_SANITIZE_ADD_SLASHES takes its place. |
FILTER_SANITIZE_NUMBER_FLOAT | No options | FILTER_FLAG_ALLOW_FRACTION, FILTER_FLAG_ALLOW_THOUSAND, FILTER_FLAG_ALLOW_SCIENTIFIC | Removes all characters except digits and the plus and minus signs. The flags optionally permit a decimal fraction, the thousands separator, and scientific notation (using uppercase or lowercase E). When a flag is set, the decimal point and thousands separator are left untouched. If FILTER_FLAG_ALLOW_FRACTION is not set, the decimal point is removed but the fraction digits are kept; for example, 10.5 becomes 105. |
FILTER_SANITIZE_NUMBER_INT | No options | No flags | Removes all characters except digits and the plus and minus signs. The decimal point is removed, if present, but the fraction digits are kept; for example, 10.0 becomes 100. |
FILTER_VALIDATE_EMAIL | No options | No flags | Checks that a value conforms to the email format. |
FILTER_VALIDATE_FLOAT | decimal | FILTER_FLAG_ALLOW_THOUSAND | Checks for a floating point number or integer; returns false for any other data type. The decimal option permits the use of a comma as the decimal point. Setting the flag accepts numbers containing a thousands separator (comma is the default, but period is used when decimal is set to ','). The returned value is always stripped of the thousands separator, with a period as the decimal point. |
FILTER_VALIDATE_INT | min_range, max_range | FILTER_FLAG_ALLOW_OCTAL, FILTER_FLAG_ALLOW_HEX | Checks for an integer; returns false for any other data type. Specify the minimum and maximum acceptable values as an associative array using min_range and max_range (you can set just one or both together). Flags permit octal and hexadecimal numbers. Rejects numbers with a decimal point, even if the fraction is 0, for example, 10.0. |
FILTER_VALIDATE_REGEXP | regexp | No flags | Validates a value against a Perl-compatible regular expression. The whole value is returned, not just the part that matches the regular expression. |
FILTER_VALIDATE_IP | No options | FILTER_FLAG_IPV4, FILTER_FLAG_IPV6, FILTER_FLAG_NO_PRIV_RANGE, FILTER_FLAG_NO_RES_RANGE | Checks that a value is an IP address. Flags allow you to accept only IPv4 or only IPv6 addresses, or to reject addresses from private or reserved ranges. |
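To make the difference between options and flags a little more concrete, here is a sketch that runs a few of the filters from the table through filter_var(). All sample values are invented for the illustration:

```php
<?php
// Options go in an 'options' sub-array; flags can be passed directly
// as the third argument (or under a 'flags' key).
$range = array('options' => array('min_range' => 1, 'max_range' => 10));
var_dump(filter_var('7', FILTER_VALIDATE_INT, $range));   // int(7)
var_dump(filter_var('15', FILTER_VALIDATE_INT, $range));  // bool(false)
var_dump(filter_var('10.0', FILTER_VALIDATE_INT));        // bool(false)

// Hexadecimal notation is only accepted when the flag is set.
var_dump(filter_var('0x1A', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX)); // int(26)

// The decimal point survives only with FILTER_FLAG_ALLOW_FRACTION.
var_dump(filter_var('10.5', FILTER_SANITIZE_NUMBER_FLOAT)); // string(3) "105"
var_dump(filter_var('10.5', FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION)); // string(4) "10.5"

// A comma as the decimal point.
$comma = array('options' => array('decimal' => ','));
var_dump(filter_var('1,5', FILTER_VALIDATE_FLOAT, $comma)); // float(1.5)

// IPv4 only, and not from a private range.
$ipFlags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE;
var_dump(filter_var('192.168.1.1', FILTER_VALIDATE_IP, $ipFlags)); // bool(false)
var_dump(filter_var('8.8.8.8', FILTER_VALIDATE_IP, $ipFlags));     // string(7) "8.8.8.8"
```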
For a full list of constants for both validation and sanitizing filters, please visit the PHP website. The filters themselves are designed to be used with both single variables and multiple variables. To work with single variables, PHP has two filter functions:
filter_input()
filter_var()
The only difference between the two is that the filter_input() function processes variables that came in through input sources such as $_POST and $_GET, while filter_var() processes variables from any other source. The filter_input() function takes four arguments:
type - refers to the input source that you intend to use, given as a constant, i.e. INPUT_GET for $_GET
variable - refers to the name of the variable you want to filter
filter - refers to the filter that you want to use. If omitted, PHP uses the default filter
options - refers to any flags or options you want to set for the operation.
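The four arguments can be sketched as follows. The field name 'age' and the range are assumptions made for the example. Note the two different failure values: null when the variable was not sent at all, and false when it was sent but failed the filter:

```php
<?php
// type: INPUT_GET; variable: 'age' (hypothetical field name);
// filter: FILTER_VALIDATE_INT; options: an allowed range.
$age = filter_input(
    INPUT_GET,
    'age',
    FILTER_VALIDATE_INT,
    array('options' => array('min_range' => 0, 'max_range' => 120))
);

if ($age === null) {
    echo "'age' was not sent";
} elseif ($age === false) {
    echo "'age' is not an integer between 0 and 120";
} else {
    echo "age is $age";
}
```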
Needless to say, the filter_var() function does not take the type argument. Its first argument is the value to filter, and the filter and options arguments work the same way. As a way to demonstrate how to filter variables, I've created a small form with the following code:
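A sketch of what such a form could look like. The field names 'input' and 'val' and the 'float' entry match the code discussed later on this page; the other entries in $constants, the render_filter_form() helper, and the process.php target are assumptions made for the illustration:

```php
<?php
// Map of filter names (as understood by filter_id()) to the constants they stand for.
$constants = array(
    'int'            => 'FILTER_VALIDATE_INT',
    'float'          => 'FILTER_VALIDATE_FLOAT',
    'validate_email' => 'FILTER_VALIDATE_EMAIL',
    'number_int'     => 'FILTER_SANITIZE_NUMBER_INT',
);

// Hypothetical helper: build the form markup as a string.
function render_filter_form(array $constants, $action)
{
    $html  = '<form method="post" action="' . htmlspecialchars($action) . '">' . "\n";
    $html .= '  <input type="text" name="input">' . "\n";
    $html .= '  <select name="val">' . "\n";
    foreach ($constants as $name => $label) {
        $html .= '    <option value="' . htmlspecialchars($name) . '">'
               . htmlspecialchars($label) . "</option>\n";
    }
    $html .= "  </select>\n";
    $html .= '  <input type="submit" value="Filter">' . "\n";
    $html .= "</form>\n";
    return $html;
}

// 'process.php' is a placeholder for the script that handles the form.
echo render_filter_form($constants, 'process.php');
```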
And the code to process the form input:
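A sketch of the processing side, assuming the form fields are called 'input' and 'val'. Unlike the one-line version discussed below, this variant first checks that the submitted filter name is one PHP knows, which is a precaution added here rather than part of the original listing:

```php
<?php
// Read the selected filter name; with no filter argument the default filter is used.
$filterName = filter_input(INPUT_POST, 'val');

// Translate the name to a filter ID; filter_id() returns false for unknown names.
$filterId = is_string($filterName) ? filter_id($filterName) : false;

if ($filterId === false) {
    echo 'Unknown or missing filter.';
} else {
    // null: 'input' was not sent; false: the value failed the filter.
    $result = filter_input(INPUT_POST, 'input', $filterId);
    var_dump($result);
}
```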
So what have we done here? Basically, we created a form that has a text box that takes input from the user and a menu box that is populated with the various filter constants. The idea is that a user enters a value and then selects a filter option. The list of constants is not exhaustive, so feel free to add more constants to the array that populates the menu. The $constants array has the following format:
'float' => 'FILTER_VALIDATE_FLOAT'
Now when the user selects a constant, say FILTER_VALIDATE_FLOAT, the value 'float' is sent to the processing PHP code. This value is then fed to the filter_input() function:
$constants = filter_input(INPUT_POST,'input', filter_id($_POST['val']));
You might be asking why we are using the filter_id() function here. This is because we need to convert the string 'float' to its constant equivalent. Look at the section of the article that deals with the three functions at the beginning of this page to get a better idea of what happens with the code. The constant equivalent of 'float' is FILTER_VALIDATE_FLOAT, and its numeric value is what the filter_input() function actually requires.
Filtering Multiple Variables
Filtering multiple variables goes along the same lines as filtering single variables. There are two functions that are used to deal with multiple variables:
filter_input_array()
filter_var_array()
The filter_input_array() function takes the following arguments:
type - refers to the input source that you intend to use, given as a constant, i.e. INPUT_GET for $_GET
definition - refers to an array that defines the arguments. In this case it's a multidimensional array that determines how the variables are to be filtered.
The filter_var_array() function takes the following arguments:
data - refers to an array containing the variables that you want to filter
definition - same as for the filter_input_array() function.
Below are some examples of how to use the above functions. Let's start with the filter_var_array() function:
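A sketch of such a call. Only the 'name' entry comes from this article; the 'age', 'email', and 'phone' fields are assumptions added for illustration. The excerpt further down filters 'name' with FILTER_SANITIZE_STRING, which is deprecated as of PHP 8.1, so this sketch swaps in FILTER_SANITIZE_SPECIAL_CHARS:

```php
<?php
// The values to filter ('age' and 'email' are hypothetical fields).
$data = array(
    'name'  => 'Dantago',
    'age'   => '34',
    'email' => 'not-an-email',
);

// One entry per variable: either a filter constant, or an array with
// 'filter' plus optional 'flags' and 'options' keys.
$definition = array(
    'name'  => FILTER_SANITIZE_SPECIAL_CHARS,
    'age'   => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 0, 'max_range' => 120),
    ),
    'email' => FILTER_VALIDATE_EMAIL,
    'phone' => FILTER_UNSAFE_RAW, // not present in $data, so it comes back as null
);

$result = filter_var_array($data, $definition);
var_dump($result);
// 'name' => "Dantago", 'age' => int(34), 'email' => bool(false), 'phone' => NULL
```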
The code above should be easy to understand; first, we set the variables that we want to filter in the array:
...$data = array('name' => 'Dantago',
Then we define how we want them to be filtered:
$definition = array('name' => array('filter' => FILTER_SANITIZE_STRING, 'flags' => FILTER_FLAG_NO_ENCODE_QUOTES),
Now, if you want to filter values that come from a form, you will need to use the filter_input_array() function. For variables coming from a form, use the same design as $definition above and simply change this:
$result = filter_var_array($data, $definition);
to this:
$result = filter_input_array(INPUT_POST, $definition); // or INPUT_GET
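A slightly fuller sketch of the form variant. The field names are assumptions; the check before using the result is there because filter_input_array() does not return an array when no input of the requested type was received:

```php
<?php
// Same layout as $definition above ('name' and 'age' are hypothetical fields).
$definition = array(
    'name' => FILTER_SANITIZE_SPECIAL_CHARS,
    'age'  => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 0),
    ),
);

$result = filter_input_array(INPUT_POST, $definition);

if (!is_array($result)) {
    echo 'No POST data received.';
} else {
    var_dump($result);
}
```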
Conclusion
While these functions provide excellent data validation for PHP applications, they are a bit cumbersome to use. I would suggest that anyone intending to use them create a class that wraps them, which makes them easier to use and speeds up development.
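As one possible starting point, here is a sketch of such a class. The class name InputFilter and its methods are invented for this example; it simply collects a definition array and hands it to filter_var_array() or filter_input_array():

```php
<?php
// Hypothetical wrapper: the class and method names are illustrative only.
class InputFilter
{
    private $rules = array();

    // Register a filter (plus optional 'options'/'flags') for a field.
    // Returns $this so that calls can be chained.
    public function rule($field, $filter, array $settings = array())
    {
        $this->rules[$field] = array('filter' => $filter) + $settings;
        return $this;
    }

    // Filter an arbitrary array, e.g. data that does not come from a form.
    public function apply(array $data)
    {
        return filter_var_array($data, $this->rules);
    }

    // Filter request input; $type is INPUT_POST, INPUT_GET, etc.
    public function applyInput($type)
    {
        return filter_input_array($type, $this->rules);
    }
}

$filter = new InputFilter();
$filter->rule('age', FILTER_VALIDATE_INT, array('options' => array('min_range' => 0)))
       ->rule('email', FILTER_VALIDATE_EMAIL);

$clean = $filter->apply(array('age' => '34', 'email' => 'user@example.com'));
var_dump($clean); // 'age' => int(34), 'email' => "user@example.com"
```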
Original: June 24, 2009