Data Filtering with PHP / Page 2 | WebReference




Data Filtering [cont'd]

The Three Filters

So let's take a look at the three filter functions that we just discussed. First up is the filter_has_var() function, which determines whether a variable exists in an input source. It takes two arguments: the first is one of the input-type constants listed in Table 3 on page one of this article, and the second is the name of the variable. For example, to check whether the $_POST array contains a variable called 'avar', you would call filter_has_var(INPUT_POST, 'avar').
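A minimal sketch of that check in context. One detail worth knowing: filter_has_var() looks at the input the request actually arrived with, not at the current contents of $_POST, so assigning $_POST['avar'] inside a script does not make it return true.

```php
<?php
// filter_has_var() inspects the original request input, not the $_POST
// array itself, so values assigned to $_POST in code are not seen.
if (filter_has_var(INPUT_POST, 'avar')) {
    echo "'avar' was submitted via POST\n";
} else {
    echo "'avar' was not submitted\n";
}
```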

The filter_list() function returns an array containing the names of all the filters supported by the current PHP installation. The exact list depends on the PHP version and build, so it can differ from one server to another.
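To see the list on your own server, a short loop over filter_list() is enough. This sketch also prints each filter's numeric ID using filter_id(), which is described next:

```php
<?php
// Print every filter name this PHP build supports, with its numeric ID.
foreach (filter_list() as $name) {
    printf("%-20s %d\n", $name, filter_id($name));
}
```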

The third and final function is filter_id(). It takes one argument, a filter name as returned by filter_list(), and is called like this: filter_id($avalue). It returns the integer ID of that filter, or false if no filter has that name. For example:

filter_id('float');

would return:

259

So how is this function useful for you? Its main use is as a substitute for the filter constants that the main filter functions expect as arguments. For example, instead of passing FILTER_VALIDATE_FLOAT when filtering input, you can pass:

filter_id('float')
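A quick sketch confirming that the name and the constant are interchangeable; the input value '3.14' is just an illustration:

```php
<?php
// filter_id() returns the same integer that the FILTER_* constant holds,
// so these two calls are equivalent.
$byConstant = filter_var('3.14', FILTER_VALIDATE_FLOAT);
$byName     = filter_var('3.14', filter_id('float'));

var_dump(filter_id('float') === FILTER_VALIDATE_FLOAT); // bool(true)
var_dump($byName === $byConstant);                      // bool(true)

// An unknown name yields false, so check it before passing it on.
var_dump(filter_id('no_such_filter'));                  // bool(false)
```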

The remaining four filter functions, filter_input(), filter_var(), filter_input_array() and filter_var_array(), work best when you set their options correctly. PHP offers more than a dozen filters, which can broadly be categorized as follows:

Sanitizing Filters - These filters strip certain characters from a value and return the sanitized version. An example use would be:
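A minimal sketch that produces the result shown, assuming an input string of '10.3' (any value whose only digits are 1, 0 and 3, in that order, gives the same output):

```php
<?php
// FILTER_SANITIZE_NUMBER_INT keeps only digits and the + and - signs,
// so the decimal point in '10.3' is removed. The result is a string.
echo filter_var('10.3', FILTER_SANITIZE_NUMBER_INT), "\n";
```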

When you run the code above you will get the following result:

103

Customized Filters - The FILTER_CALLBACK filter passes the value to a user-defined function and returns whatever that function returns, so the data can be processed however your application requires.
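A sketch of a custom filter. The function normalizeUsername() and its trim-and-lowercase rule are hypothetical; FILTER_CALLBACK takes the callable through the 'options' key:

```php
<?php
// Hypothetical rule for illustration: trim whitespace and lowercase.
function normalizeUsername(string $value): string
{
    return strtolower(trim($value));
}

// FILTER_CALLBACK calls the function named in 'options' with the value.
$clean = filter_var('  DanTago ', FILTER_CALLBACK, ['options' => 'normalizeUsername']);
var_dump($clean); // string(7) "dantago"
```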

Validating Filters - These filters check whether a value is of a particular type, such as float or integer, or conforms to a specific format, e.g. a URL.
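A minimal sketch that produces the result shown, assuming the input string '20':

```php
<?php
// FILTER_VALIDATE_INT returns an int on success and false on failure.
var_dump(filter_var('20', FILTER_VALIDATE_INT));

// A value with a fractional part fails validation and returns false.
$invalid = filter_var('20.5', FILTER_VALIDATE_INT);
```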

When you run the code above you will get the following result:

int(20)

The filters are identified by constants, and many of them accept options, flags, or both. Most flags can only be used with the filters they belong to; the two listed below are the exception and can be used with any filter:

FILTER_REQUIRE_SCALAR - Rejects any value that isn't scalar. In other words, the value must be a Boolean, integer, floating-point number, or string; arrays are rejected.
FILTER_REQUIRE_ARRAY - Rejects any value that isn't an array. When this flag is set, the filter is applied to every element of the array. It also works on multidimensional arrays, applying the filter recursively to each level.
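A short sketch of both flags; the sample $ages array is made up for illustration. When a filter function is given only flags and no options, the flags can be passed directly as the third argument:

```php
<?php
// FILTER_REQUIRE_ARRAY: the input must be an array; the filter runs on
// every element, recursing into nested arrays.
$ages = ['20', 'abc', ['30', '4.5']];
$perElement = filter_var($ages, FILTER_VALIDATE_INT, FILTER_REQUIRE_ARRAY);
var_dump($perElement); // elements: 20, false, [30, false]

// The same flag makes a scalar input fail.
$notArray = filter_var('20', FILTER_VALIDATE_INT, FILTER_REQUIRE_ARRAY);

// FILTER_REQUIRE_SCALAR: an array input fails outright.
$notScalar = filter_var($ages, FILTER_VALIDATE_INT, FILTER_REQUIRE_SCALAR);
```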

Below are some of the constants used to sanitize and validate data, with the options and flags each one accepts:

FILTER_SANITIZE_EMAIL (options: none; flags: none) - Removes all characters except letters, digits, and !#$%&'*+-=?^_`{|}~@.[]
FILTER_SANITIZE_ENCODED (options: none; flags: FILTER_FLAG_STRIP_LOW, FILTER_FLAG_STRIP_HIGH, FILTER_FLAG_ENCODE_LOW, FILTER_FLAG_ENCODE_HIGH) - URL-encodes a string. The flags optionally strip or encode characters with an ASCII value below 32 (LOW) or above 127 (HIGH).
FILTER_SANITIZE_MAGIC_QUOTES (options: none; flags: none) - Escapes single quotes, double quotes, backslashes and NUL bytes with a backslash, in the same way as the addslashes() function. This filter was deprecated in PHP 7.3 in favor of FILTER_SANITIZE_ADD_SLASHES and removed in PHP 8.0.
FILTER_SANITIZE_NUMBER_FLOAT (options: none; flags: FILTER_FLAG_ALLOW_FRACTION, FILTER_FLAG_ALLOW_THOUSAND, FILTER_FLAG_ALLOW_SCIENTIFIC) - Removes all characters except digits and the plus and minus signs. The flags optionally permit a decimal point, the thousands separator, and scientific notation (using uppercase or lowercase E); when the matching flag is set, that character is left untouched. Without FILTER_FLAG_ALLOW_FRACTION the decimal point is removed, but not the fraction, so 10.5 becomes 105.
FILTER_SANITIZE_NUMBER_INT (options: none; flags: none) - Removes all characters except digits and the plus and minus signs. A decimal point is removed, but not the fraction, so 10.0 becomes 100.
FILTER_VALIDATE_EMAIL (options: none; flags: none) - Checks that a value conforms to the email address format.
FILTER_VALIDATE_FLOAT (options: decimal; flags: FILTER_FLAG_ALLOW_THOUSAND) - Checks for a floating-point number or integer; returns false for any other data type. The decimal option permits the use of a comma as the decimal point. Setting the flag accepts numbers containing a thousands separator (comma is the default, but period is used when decimal is set to ','). The returned value is always stripped of the thousands separator, with a period as the decimal point.
FILTER_VALIDATE_INT (options: min_range, max_range; flags: FILTER_FLAG_ALLOW_OCTAL, FILTER_FLAG_ALLOW_HEX) - Checks for an integer; returns false for any other data type. Specify the minimum and maximum acceptable values as an associative array using min_range and max_range (you can set just one, or both together). The flags permit octal and hexadecimal numbers. Numbers with a decimal point are rejected even if the fraction is 0, for example 10.0.
FILTER_VALIDATE_REGEXP (options: regexp; flags: none) - Validates a value against a Perl-compatible regular expression. The whole value is returned, not just the part that matches the regular expression.
FILTER_VALIDATE_IP (options: none; flags: FILTER_FLAG_IPV4, FILTER_FLAG_IPV6, FILTER_FLAG_NO_PRIV_RANGE, FILTER_FLAG_NO_RES_RANGE) - Checks that a value is an IP address. The flags let you accept only IPv4 or only IPv6 addresses, or reject addresses in the private or reserved ranges.
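The following sketch exercises several entries from the list above, using made-up sample values. It shows the calling convention: options go inside an 'options' sub-array, while flags either go under a 'flags' key or, when there are no options, are passed directly as the third argument.

```php
<?php
// Validating filters with options and flags.
$range  = ['options' => ['min_range' => 1, 'max_range' => 120]];
$age    = filter_var('42', FILTER_VALIDATE_INT, $range);              // 42
$tooOld = filter_var('150', FILTER_VALIDATE_INT, $range);             // false
$hex    = filter_var('0x1A', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX); // 26

$comma     = filter_var('3,14', FILTER_VALIDATE_FLOAT,
    ['options' => ['decimal' => ',']]);                               // 3.14
$thousands = filter_var('1,000.5', FILTER_VALIDATE_FLOAT,
    FILTER_FLAG_ALLOW_THOUSAND);                                      // 1000.5

// The whole value is returned even though only the digits match.
$regexp = filter_var('abc123', FILTER_VALIDATE_REGEXP,
    ['options' => ['regexp' => '/\d+/']]);                            // 'abc123'

$ipv4   = filter_var('192.168.1.10', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);          // '192.168.1.10'
$public = filter_var('192.168.1.10', FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE); // false

// Sanitizing filters return cleaned strings.
$noFraction = filter_var('10.5', FILTER_SANITIZE_NUMBER_FLOAT);       // '105'
$fraction   = filter_var('10.5', FILTER_SANITIZE_NUMBER_FLOAT,
    FILTER_FLAG_ALLOW_FRACTION);                                      // '10.5'
$email      = filter_var('<joe@example.com>', FILTER_SANITIZE_EMAIL); // 'joe@example.com'
```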

For a full list of constants for both validating and sanitizing filters, please visit the PHP website. The filters themselves are designed to work with both single variables and multiple variables. To work with single variables, PHP has two filter functions:

filter_input()
filter_var()

The main difference between the two is that the filter_input() function reads a variable directly from the request input (the same data that populates superglobals such as $_POST and $_GET), while filter_var() filters a value from any other source. The filter_input() function takes four arguments:

type - the input source, given as one of the INPUT_* constants, e.g. INPUT_GET for variables sent in the query string
variable - the name of the variable you want to filter
filter - the filter that you want to use. If omitted, PHP uses the default filter, FILTER_DEFAULT
options - an associative array of options and/or flags, or a bitmask of flags, for the operation

The filter_var() function takes the same arguments except type; instead, its first argument is the value to be filtered. As a way to demonstrate how to filter variables, I've created a small form with the following code:
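A minimal sketch of such a form. The field names 'input' and 'val' match the processing code later on this page; the action file name process.php and every menu entry other than 'float' are hypothetical. Each key is a filter name that filter_id() understands, and each value is the constant name shown to the user.

```php
<?php
// Hypothetical menu contents: keys are filter_list() names,
// values are the constant names displayed in the menu.
$constants = [
    'int'            => 'FILTER_VALIDATE_INT',
    'float'          => 'FILTER_VALIDATE_FLOAT',
    'validate_email' => 'FILTER_VALIDATE_EMAIL',
    'validate_url'   => 'FILTER_VALIDATE_URL',
    'number_int'     => 'FILTER_SANITIZE_NUMBER_INT',
];

// Build one <option> per filter, escaping values for HTML.
$options = '';
foreach ($constants as $name => $label) {
    $options .= sprintf(
        "    <option value=\"%s\">%s</option>\n",
        htmlspecialchars($name),
        htmlspecialchars($label)
    );
}

// A text box named "input" and a menu named "val".
$form = "<form method=\"post\" action=\"process.php\">\n"
      . "  <input type=\"text\" name=\"input\">\n"
      . "  <select name=\"val\">\n" . $options . "  </select>\n"
      . "  <input type=\"submit\" value=\"Filter\">\n"
      . "</form>\n";
echo $form;
```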

And the code to process the form input:
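A slightly more defensive sketch of the processing script, assuming it is saved as a hypothetical process.php and receives the 'input' and 'val' fields. Rather than passing $_POST['val'] straight to filter_id(), it reads the menu choice with filter_input() and checks for false first, because filter_id() returns false for an unknown name:

```php
<?php
// Read the chosen filter name, e.g. 'float'; null if it was not posted.
$filterName = filter_input(INPUT_POST, 'val');
$filterId   = is_string($filterName) ? filter_id($filterName) : false;

if ($filterId === false) {
    // No menu choice was posted, or PHP has no filter with that name.
    echo "Unknown or missing filter name.\n";
} else {
    // false = the value failed the filter; null = 'input' was not posted.
    $result = filter_input(INPUT_POST, 'input', $filterId);
    var_dump($result);
}
```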

So what have we done here? We created a form with a text box that takes input from the user and a drop-down menu populated with various filter constants. The idea is that the user enters a value and then selects a filter. The list of constants is not exhaustive, so feel free to add more entries to the array that populates the menu. The $constants array has the following format:

'float' => 'FILTER_VALIDATE_FLOAT'

When the user selects a constant, its key (here, 'float') is sent to the PHP processing code. This value is then fed to the filter_input() function:

$result = filter_input(INPUT_POST, 'input', filter_id($_POST['val']));

You might be asking why we are using the filter_id() function here. We need it to convert the string 'float' into its constant equivalent; the discussion of the three functions at the beginning of this page explains what happens in this code. The constant equivalent of 'float' is FILTER_VALIDATE_FLOAT, which is exactly what filter_input() expects as its filter argument.

Filtering Multiple Variables

Filtering multiple variables follows the same lines as filtering single variables. There are two functions for dealing with multiple variables:

filter_input_array()
filter_var_array()

The filter_input_array() function takes the following arguments:

type - the input source you intend to use, given as one of the INPUT_* constants, e.g. INPUT_GET
definition - an array that specifies how each variable is filtered. Its keys are variable names, and each value is either a filter constant or an array with 'filter', 'flags' and 'options' keys, which makes it a multidimensional array.

The filter_var_array() function takes the following arguments:
data - an array containing the variables that you want to filter
definition - the same as for the filter_input_array() function.

Below are some examples of how to use these functions, starting with filter_var_array():
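A self-contained sketch in the same shape as the excerpts that follow. Only the 'name' key echoes the article; the 'age' and 'email' fields and all sample values are hypothetical. FILTER_SANITIZE_STRING, which the definition excerpt below uses, is deprecated as of PHP 8.1, so this sketch swaps in FILTER_SANITIZE_FULL_SPECIAL_CHARS with the same FILTER_FLAG_NO_ENCODE_QUOTES flag:

```php
<?php
// Values to filter ('age' and 'email' are hypothetical extra fields).
$data = [
    'name'  => 'Dantago <b>"Dan"</b>',
    'age'   => '34',
    'email' => 'dan(at)example.com',
];

// One entry per key: a filter constant, or an array of filter/flags/options.
$definition = [
    'name'  => [
        'filter' => FILTER_SANITIZE_FULL_SPECIAL_CHARS,
        'flags'  => FILTER_FLAG_NO_ENCODE_QUOTES, // leave " and ' as-is
    ],
    'age'   => [
        'filter'  => FILTER_VALIDATE_INT,
        'options' => ['min_range' => 0, 'max_range' => 130],
    ],
    'email' => FILTER_VALIDATE_EMAIL,
];

$result = filter_var_array($data, $definition);
var_dump($result);
```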

The code above should be easy to understand; first, we set the variables that we want to filter in the array:

...$data = array('name' => 'Dantago',

Then we define how we want them to be filtered:

$definition = array('name' => array('filter' => FILTER_SANITIZE_STRING, 'flags' => FILTER_FLAG_NO_ENCODE_QUOTES),

If you want to filter values that come from a form, you need the filter_input_array() function instead. Use the same design as $definition above and simply change this:

$result = filter_var_array($data, $definition);

to this (use INPUT_GET instead of INPUT_POST for variables sent in the query string):

$result = filter_input_array(INPUT_POST, $definition);

Conclusion

While these functions provide excellent data validation for PHP applications, they are a bit cumbersome to use. I would suggest that anyone intending to use them create a class around them; this not only makes them easier to use but also saves time and speeds up development.
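As one possible shape for such a class, here is a minimal sketch (PHP 7.4 or later). The class name InputFilter and its methods are hypothetical; the class simply collects a definition array and hands it to filter_var_array() or filter_input_array(), so one set of rules can be reused for arbitrary arrays and for request input:

```php
<?php
// Hypothetical wrapper class; InputFilter and its methods are illustrative.
final class InputFilter
{
    /** @var array<string, array{filter: int, flags: int, options: array}> */
    private array $rules = [];

    // Register a filter (plus optional flags and options) for one field.
    public function rule(string $field, int $filter, int $flags = 0, array $options = []): self
    {
        $this->rules[$field] = ['filter' => $filter, 'flags' => $flags, 'options' => $options];
        return $this;
    }

    // Filter any array of values, e.g. data read from a file or an API.
    public function apply(array $data): array
    {
        $result = filter_var_array($data, $this->rules);
        return is_array($result) ? $result : [];
    }

    // Filter request data directly; $type is INPUT_GET, INPUT_POST, etc.
    public function fromInput(int $type): array
    {
        $result = filter_input_array($type, $this->rules);
        return is_array($result) ? $result : [];
    }
}

// Usage: define the rules once, then reuse them.
$filter = (new InputFilter())
    ->rule('age', FILTER_VALIDATE_INT, 0, ['min_range' => 0, 'max_range' => 130])
    ->rule('email', FILTER_VALIDATE_EMAIL);

$clean = $filter->apply(['age' => '34', 'email' => 'not-an-email']);
var_dump($clean); // age => int(34), email => bool(false)
```

The fluent rule() method keeps each field's definition on one line, which addresses most of the verbosity of hand-built definition arrays.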

Original: June 24, 2009

