Perl Subroutine Primer | WebReference

Perl Subroutine Primer

By Dan Ragle



If you've not been scripting for long (and even if you've been scripting for a long time) you may think that one of the simplest ways to repeat sections of code in your Perl scripts is to copy and paste the code and then rename the variables that the code is applied to. For example, if we have three variables, named $charlie, $snoopy, and $woodstock, and we need to apply a regex substitution to each of them, we might be tempted to write:
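A minimal sketch of such copy-and-paste code, assuming the substitution in question trims leading and trailing white space (the sample values and the exact regex are made up for illustration):

```perl
use strict;
use warnings;

# Sample values are assumptions, chosen only to have stray white space.
my $charlie   = "  Charlie Brown  ";
my $snoopy    = "\tSnoopy ";
my $woodstock = " Woodstock\n";

# The same substitution, copied and pasted once per variable.
$charlie   =~ s/^\s+|\s+$//g;
$snoopy    =~ s/^\s+|\s+$//g;
$woodstock =~ s/^\s+|\s+$//g;

print "[$charlie] [$snoopy] [$woodstock]\n";
```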

This is relatively simple to do, but invariably, such an approach will become more and more cumbersome the longer your script becomes (and, for the programmers who follow you, the longer it lives). For example, what happens later in the code when you realize you have several more variables, all of which must have the exact same substitution applied? Do you search for the original code, then copy and paste it again and again? Doing so not only wastes your time, but also introduces greater possibilities for error, since each time you copy the code (and edit it for your new variable names), you could copy something incorrectly or accidentally edit the code in a way contrary to your intention. More importantly, consider the ongoing maintenance requirements for the code. Should your initial substitution prove to be faulty or incomplete, or simply need extension, a maintenance programmer (perhaps even you) will need to find and re-edit each instance of the code to correct the problem.

You may have already guessed (if the title didn't already give it away) that we're going to examine the use of user-defined subroutines in Perl: blocks of code that can accept, operate on, and/or return variables and values, and that can be reused throughout your Perl script without your needing to copy and paste them each time they're required. When you define and use subroutines, the code logic is neatly contained in a single location of the script and can be called (and later modified, when necessary) without requiring you to track down every instance of the logic in the script at large. Our small copy and paste sample from above, for example, could be altered to look like this:
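One possible shape for that version is sketched below. The name trim_space comes from the discussion that follows; its body, the use of return, and the sample values are assumptions made for illustration:

```perl
use strict;
use warnings;

my $charlie   = "  Charlie Brown  ";
my $snoopy    = "\tSnoopy ";
my $woodstock = " Woodstock\n";

# The logic now lives in one place and is simply called three times.
$charlie   = trim_space($charlie);
$snoopy    = trim_space($snoopy);
$woodstock = trim_space($woodstock);

print "[$charlie] [$snoopy] [$woodstock]\n";

# Returns a copy of its first parameter with leading and trailing
# white space removed.
sub trim_space {
    my $string = shift;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}
```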

Admittedly, we've not saved ourselves any space in this small example, but later when we need to trim the white space from additional strings (and even later, when we need to modify or amend our trim_space functionality in a future version of the script) we'll be thankful that we've already created and defined a subroutine to do it; code that we can just call directly instead of copying and pasting it repetitively.

Subroutines in Perl are easy to define and use; but like practically any other aspect of any programming language, there are some rules that you need to know --and follow-- to make the most of them (and avoid shooting yourself in the foot). The remainder of this tutorial focuses on the basic uses of subroutines in the Perl language; including how to define them, call them, pass variable parameters into them, and to return unique values from them.

Basic Subroutine Definitions in Perl

Perl subroutines can be defined as either named or anonymous subroutines.

A named subroutine can be declared and defined anywhere in a Perl script, using the syntax:
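The general form is shown in the comment below, followed by one hypothetical instance of it (the name print_banner and its body are made up for illustration):

```perl
# General form:
#
#     sub NAME BLOCK
#
# A hypothetical instance, where NAME is print_banner and BLOCK is
# everything between (and including) the curly braces:
sub print_banner {
    print "=" x 20, "\n";
}
```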

Where NAME is the name you select for your subroutine, and BLOCK is the actual programming logic, surrounded by curly braces ({...}). When selecting names for your general subroutines, it's recommended that you use all lower case names; since, by convention, subroutine names with all capital letters indicate actions that are triggered automatically as needed by Perl; such as AUTOLOAD or DESTROY, and mixed case names are often utilized as package names. If you haven't already, you'll learn about these technologies later as you continue your Perl education. But for now, keep your user-defined subroutine names to all lower case and you'll avoid this ambiguity in the future.

Here's an example of a basic subroutine definition, as well as the code we might use to execute (call) it:
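A minimal sketch, assuming a subroutine named say_hello that takes no parameters and simply prints a fixed greeting:

```perl
use strict;
use warnings;

# Main execution logic: call the subroutine twice.
say_hello();
say_hello();

# Subroutine definitions, collected at the bottom of the script.
sub say_hello {
    print "Hello, World!\n";
}
```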

In general, subroutines can be called from anywhere within your Perl script (we'll talk more about the actual calling process later in this article), whether or not the subroutine's definition appears before the call. Because of this, programmers often collect all of a script's subroutines in the same place, such as the bottom of the script after the main execution logic. This is the approach we used in the above example script.

When calling subroutines in some specific situations, however, it's necessary that the subroutine be declared before it can actually be called. In these cases, you can choose to either declare and define the subroutine before the code that calls it; or you can just declare the subroutine (without initially providing a code block) with the intent to "fill in the code details" at some other point in the application. For example:
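The sketch below shows one way this might look. The name hello_string comes from the discussion that follows; its parameter and return value are assumptions:

```perl
use strict;
use warnings;

sub hello_string;    # forward declaration: a name, but no code block yet

# Called without parentheses and without an ampersand.
my $greeting = hello_string "Dan";
print $greeting;

# The code details are filled in later in the script.
sub hello_string {
    my $name = shift;
    return "Hello, $name!\n";
}
```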

When we call the hello_string subroutine in the above example without parentheses (and without the older-style & notation), we are calling it as a list operator, which may change the precedence of the subroutine call relative to other operators in the same statement. To use a subroutine as a list operator in this way, the subroutine must already have been declared, which is what the sub hello_string; statement above accomplishes. This statement is called a forward declaration, and it is rarely needed: for standard user-defined subroutines that are called using parentheses, as we'll discuss later, forward declarations are unnecessary. You need one only if you want to call your user-defined subroutines in the same manner as the built-in Perl list operators, such as print or shift. Still, it's there when you need it.

A subroutine can also be defined as an anonymous subroutine, which is assigned as a reference to a variable and then called by dereferencing the variable assigned to:
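A minimal sketch, using a hypothetical variable named $say_hello to hold the reference:

```perl
use strict;
use warnings;

# No NAME after sub; the resulting code reference is stored in a scalar.
# Note the semicolon: this is an ordinary assignment statement.
my $say_hello = sub {
    my $name = shift;
    return "Hello, $name!";
};

my $arrow_style     = $say_hello->("Dan");   # arrow dereference
my $ampersand_style = &$say_hello("Dan");    # older ampersand dereference

print "$arrow_style\n$ampersand_style\n";
```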

If the idea of defining anonymous subroutines using references in this way confuses you, you may want to take a look at our previous tutorial which focused on the use of references to create Nested Data Structures in Perl.

Subroutines can also be defined with prototypes, which give Perl a limited, compile-time way to check the number and kinds of parameters that are passed to them. We'll revisit this process later in this tutorial; but for now, let's look at how subroutines are called in Perl.

Calling Subroutines

To call a named subroutine, you can use one of several possible formats. The most common formats are demonstrated below:
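The four formats discussed below can be sketched as follows. The body of say_hello_to, including its fallback to "World" when no name is supplied, is an assumption made so that the script runs cleanly:

```perl
use strict;
use warnings;

# Declared and defined before the calls, so every format below is legal.
sub say_hello_to {
    my $name = shift;
    $name = "World" unless defined $name;   # hypothetical fallback
    print "Hello, $name!\n";
}

say_hello_to("Dan");
say_hello_to "Dan";
&say_hello_to("Dan");
&say_hello_to;    # passes along the current @_ (empty at file level)
```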

Each of these formats has properties that are unique to it. Let's briefly examine each:

say_hello_to("Dan");
In this format the subroutine is called and executed using the passed in string literal parameter Dan (we'll discuss the passing of parameters later in this article). Since our intention to call a subroutine was made clear (via the inclusion of the parentheses) the actual subroutine (say_hello_to) could have been defined anywhere within the script (or imported in, or required, etc.). This format is probably the most common in modern Perl scripts.

say_hello_to "Dan";
Again, we call the subroutine and pass it the same parameter. In this case, we haven't used an ampersand or parentheses, thus our intent here may be ambiguous to the perl interpreter. Therefore, as we discussed briefly in the previous section, Perl requires that we have declared the subroutine (and optionally also defined its code) before we actually call it in this manner. In the above example, we both declared and defined the code for the say_hello_to subroutine prior to our actual call, so we're ok.

&say_hello_to("Dan");
This call is most similar to the first, and is common in older Perl scripts; the subroutine is executed with the string literal parameter Dan. When you call a subroutine in this manner, note that prototype checking is disabled. We'll discuss prototypes in a bit more detail later.

&say_hello_to;
Finally, we call the subroutine without passing any parameters. In this special case, where we use the ampersand and we omit any parameters, the special @_ array is passed to the subroutine as is (i.e., with whatever it currently contains).
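To make that last case concrete, here is a small sketch in which one subroutine hands its own @_ on to another. All three subroutine names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper that reports what it received.
sub describe_args {
    return scalar(@_) . " argument(s): @_";
}

sub with_ampersand {
    return &describe_args;     # the current @_ is passed along as is
}

sub with_parentheses {
    return describe_args();    # explicit, empty parameter list
}

my $shared = with_ampersand("Charlie", "Snoopy");
my $empty  = with_parentheses("Charlie", "Snoopy");

print "$shared\n";   # 2 argument(s): Charlie Snoopy
print "$empty\n";    # 0 argument(s):
```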

Note here an important distinction between calling a named subroutine, and referring to the name of a named subroutine. When we call a subroutine --i.e., actually execute the code of the subroutine-- we (typically) use one of the formats described above. When we refer to the subroutine's name, however --without executing its code-- we always include the leading ampersand (&), just as we would always refer to a named array with its leading @, or a named hash with its leading % sign. For example, in the following assignment:
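A sketch of such an assignment, with a hypothetical my_sub that returns a fixed string:

```perl
use strict;
use warnings;

sub my_sub { return "result of my_sub"; }   # hypothetical subroutine

my $ref_to_sub = \my_sub;   # no ampersand: my_sub is called right here

print ref($ref_to_sub), "\n";   # SCALAR
print $$ref_to_sub, "\n";       # result of my_sub
```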

we see that what is assigned to $ref_to_sub is not a reference to the subroutine itself, but instead a reference to the value returned by calling the subroutine. In this context we must use the ampersand, since it's the name of the subroutine we want and not the call:
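With the ampersand in place, the same sketch (again using a hypothetical my_sub) yields a code reference that can be called later:

```perl
use strict;
use warnings;

sub my_sub { return "result of my_sub"; }   # hypothetical subroutine

my $ref_to_sub = \&my_sub;   # ampersand: refers to my_sub without calling it

print ref($ref_to_sub), "\n";   # CODE
print $ref_to_sub->(), "\n";    # the call happens here instead
```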

In the first example, had my_sub not been declared before the assignment (my $ref_to_sub = \my_sub;) and use strict was in force, then the compiler would've complained about the bareword my_sub.

Passing Parameters

While some subroutines may be helpful as static code blocks (that simply repeat the exact same action as needed without any variation) most subroutines become more useful by passing parameters--sending variable values to the subroutine that it should act on, and possibly receiving unique values back from the subroutine in response.

In Perl, the passing of parameters to and from subroutines is very straightforward. Parameter values are always passed to and returned from subroutines as a list of scalar values; and those values can be referenced within the subroutine via the special @_ array; which Perl automatically sets up on each entry to the subroutine to contain the values that have been passed to it. Consider, for example, this expansion of our say_hello_to subroutine above:
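One way such an expansion might look is sketched here; greeting every name in @_ is an assumption, as are the sample values:

```perl
use strict;
use warnings;

my $parameter_name = "Dan";
say_hello_to($parameter_name, "Charlie", "Snoopy");

# Greets every name it is given; @_ holds the passed-in values.
sub say_hello_to {
    foreach my $name (@_) {
        print "Hello, $name!\n";
    }
}
```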

Note that we can access the @_ array and expect it to have the values passed to the subroutine without having to do anything special on our part; Perl sets up this array automatically for us. And our earlier version of say_hello_to shows you a common way to convert a passed parameter to a "named" parameter, by just assigning the values to a private (my) variable, i.e.:
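A sketch of that construct follows. The upper-casing step is an assumption, added only to show that the private copy can be changed freely:

```perl
use strict;
use warnings;

sub say_hello_to {
    my $name = shift;    # copy the first parameter into a private variable
    $name = uc $name;    # changes the copy only
    return "Hello, $name!";
}

my $parameter_name = "Dan";
my $greeting = say_hello_to($parameter_name);

print "$greeting\n";          # Hello, DAN!
print "$parameter_name\n";    # Dan
```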

Recall that shift, when not provided with a parameter, automatically defaults to operating on the @_ array. The above construct is very common in Perl programming.

In the above example, we created a copy of the value of the parameter and placed it in $name so we could work with it. We could modify $name in the say_hello_to subroutine above without fear of changing the variable that was passed to it (in this case, $parameter_name). In computer programming parlance, this is called pass-by-value; i.e., only the values of the passed parameters are used, without changing the underlying parameters in the main program. In some cases, however, you want to adjust the underlying parameter: the actual variable in the program that you are passing to the subroutine. Perl provides two primary ways to accomplish this.

First, you can refer directly to the individual elements of the @_ array ($_[0], $_[1], and so on), and changes made to those elements will be reflected in the actual variables that you passed to the subroutine. For example:
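A minimal sketch, using a hypothetical trim_in_place subroutine:

```perl
use strict;
use warnings;

# $_[0] is an alias for the caller's variable, not a copy of it.
sub trim_in_place {
    $_[0] =~ s/^\s+|\s+$//g;
}

my $snoopy = "  Snoopy  ";
trim_in_place($snoopy);

print "[$snoopy]\n";   # [Snoopy]
```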

Note, however, that only the individual elements of the @_ array are treated in this manner. Assigning values directly to the @_ array will not only not change the passed in values, but it will also remove your ability to refer to the passed in values themselves.
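The following sketch illustrates the point. The subroutine name new_oz and the replacement values come from the discussion below; the starting contents of the array are an assumption:

```perl
use strict;
use warnings;

my @animals = ('lions', 'tigers', 'bears');   # assumed starting values

new_oz(@animals);
print join(', ', @animals), "\n";   # lions, tigers, bears

sub new_oz {
    @_ = ('giraffes', 'hippos', 'kangaroos');   # replaces @_ wholesale
    $_[0] = 'snakes';   # no longer tied to the caller's first element
}
```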

In the above example, you may have expected the print statement to print giraffes, hippos, kangaroos (or perhaps snakes, hippos, kangaroos) but in fact it's the original, unchanged array that is printed, since the assignment to the @_ array in the new_oz subroutine has no effect on the values that were "passed into" the subroutine.

The second way to pass a variable into a subroutine such that it can be changed within the subroutine is to pass it by reference; that is, instead of passing the scalar values of the variable(s) we instead pass a reference to the variable itself:
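A minimal sketch, with a hypothetical add_animal subroutine that receives a reference to an array:

```perl
use strict;
use warnings;

# Receives a reference to the caller's array plus a value to append.
sub add_animal {
    my ($list_ref, $animal) = @_;
    push @$list_ref, $animal;   # dereference to reach the real array
}

my @animals = ('lions', 'tigers');
add_animal(\@animals, 'bears');   # backslash creates the reference

print join(', ', @animals), "\n";   # lions, tigers, bears
```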

Again, in computer programming parlance, this is known as pass-by-reference. If this use of references confuses you or is new to you, you may wish to check out our earlier tutorial on nested data structures in Perl.

Note that in either of the above usages, you can only change from within the subroutine something that was actually changeable in the main program, i.e., if you pass a literal value to the subroutine and try to change it directly, you'll get an error:
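A sketch of that failure, using a hypothetical make_upper subroutine. The eval block is there only so the script can report the error instead of dying:

```perl
use strict;
use warnings;

# Changes its first parameter in place.
sub make_upper {
    $_[0] = uc $_[0];
}

my $name = "dan";
make_upper($name);      # fine: $name is a real, changeable variable
print "$name\n";        # DAN

# A string literal is not changeable, so this call raises an error.
my $succeeded = eval { make_upper("dan"); 1 };
my $error     = $@;
print "Error: $error" unless $succeeded;
```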

We noted earlier that all parameters passed to and from Perl subroutines are passed as a simple list of scalar values. This means that when you pass an array or a hash to the subroutine, it's not the named array or hash that is passed to the subroutine, but instead all of the individual values are sent to the subroutine in a long list, i.e.:
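The flattening can be seen by simply counting what arrives in @_. The subroutine name count_parameters and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Reports how many scalar values arrived in @_.
sub count_parameters {
    return scalar @_;
}

my @animals = ('lions', 'tigers', 'bears');
my %sounds  = (lions => 'roar', bears => 'growl');

# The array contributes 3 values; the hash contributes 2 keys + 2 values.
my $flattened = count_parameters(@animals, %sounds);

# Passing references instead sends exactly two scalars.
my $by_reference = count_parameters(\@animals, \%sounds);

print "$flattened\n";      # 7
print "$by_reference\n";   # 2
```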

As you can imagine, this can be very inefficient if your hashes/arrays are large (or the subroutine is called often). Passing such parameters by reference as in the earlier example is more efficient.

We can now define subroutines and pass to them unique values to be operated on. But how do we return unique values to the calling program, as we might with a mathematical function?

