Perl format Primer (1/2)
By Dan Ragle
Though often referred to with the backronym "Practical Extraction and Report Language," Perl is probably more often associated with tasks involving the parsing and manipulation of files and data than with the actual, formal reporting of that data. Taking a close look at your corporate or Web host's servers (assuming you are privileged enough to look at such things) will often reveal multiple Perl-based scripts with a host of critical functions, such as receiving data from a Web form and storing it in a database, reading a log file a line at a time and producing a separate text file with summarized results, synchronizing specific data elements among multiple servers, sending automated E-mail messages to a list of users, and so on. Less frequent, however, are the reporting tasks assigned to the language: the formal presentation of parsed data in a format that allows more casual data browsers (i.e., human beings) to make sense of the information.
Though Perl shines when it comes to making data manipulation jobs simple to code and execute, it's not entirely without formal reporting possibilities. In this brief tutorial, we'll examine the two main core functions that can be used to create formatted reports with the Perl language: format, for the insertion of data elements into formatted report lines; and write, which outputs the formatted results to a file or STDOUT for examination.
format
While you are certainly welcome to code up a Perl-based report manually (tracking line counts, page numbers, and page headers; lining up the output of print statements so that data is listed in the appropriate column; developing separate subroutines to print page and column header information; etc.), the core format function provides a template-based means to produce simple reports without requiring such manual shenanigans. In a nutshell, you use format to define report line and header templates, which are then automatically filled in and used whenever the write function is called. We'll examine write in more detail later in this tutorial; for now, let's focus on creating the report line and header formats with format.
To define a line or header template for your report, you use the following syntax:
format [name] =
picture line 1
argument line 1
picture line 2
argument line 2
...
picture line n
argument line n
.
The dot at the end is not a typo; it's the official end of a format definition. It must be the first--and only--character on the line in order for Perl to interpret it as the end of the format template.
The name supplied for the format is important, because it determines which filehandle the format is applied to. Each filehandle in your Perl script automatically uses the format that has the same name as the filehandle. In other words:
# open up MYFILE for writing
open(MYFILE,">myfile.txt") or die "Can't open up myfile: $!\n";
my ($name,$salary);
# now this line format will automatically apply to MYFILE
format MYFILE =
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>> Salary: @###########.##
$name, $salary
.
For now, ignore the picture and argument lines within the format, and concentrate on the name. Since this format declaration was named MYFILE, it's automatically applied to the MYFILE filehandle; in other words, when we write to MYFILE, Perl will automatically use the MYFILE format.
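To see the naming rule in action, here is a minimal runnable sketch that completes the MYFILE fragment above with an actual write. As assumptions made purely for illustration, MYFILE is opened on an in-memory string (a Perl 5.8+ feature) instead of myfile.txt so the result is easy to inspect, and the field widths are arbitrary.

```perl
use strict;
use warnings;

my ($name, $salary);

# Open MYFILE on an in-memory string instead of myfile.txt
# (illustrative only; a real report would normally write to a file).
open(MYFILE, '>', \my $report) or die "Can't open MYFILE: $!\n";

# Because this format is named MYFILE, write(MYFILE) uses it automatically.
format MYFILE =
Name: @<<<<<<<<<<<<<<<<<<< Salary: @######.##
      $name,                       $salary
.

($name, $salary) = ('John Smith', 78293.22);
write MYFILE;
close(MYFILE);

print $report;
```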
If no name is supplied to format, then STDOUT is assumed, in which case the format declaration automatically applies to STDOUT. Note that format names have their own namespace in Perl, so you are allowed to create names that are identical to existing variables or functions (though, for the sake of clarity, you may not want to do this). As we shall see later, it is also possible to assign a specially named format directly to any specific filehandle, even if the format name is not the same as the filehandle itself.
In addition to the default report line format for each filehandle as described above, a special format can be defined as the page header for each filehandle. To define a page header format for a file, you add the _TOP suffix to the format name:
# open up MYFILE for writing
open(MYFILE,">myfile.txt") or die "Can't open up myfile: $!\n";
my ($name,$salary);
# now this line format will automatically apply to MYFILE
format MYFILE =
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>> Salary: @###########.##
$name, $salary
.
# and this page header format will automatically apply to MYFILE
format MYFILE_TOP =
Employee Names and Salaries Page: @>>>>>>>>>>
$%
------------------------------------------------------------------
.
To assign a page header for STDOUT, you would define a format named STDOUT_TOP.
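Here is a runnable sketch pairing a MYFILE body format with a MYFILE_TOP header. For illustration only, MYFILE is opened on an in-memory string rather than a real file, and the second employee and the field widths are invented. With the default page length of 60 lines both rows fit on one page, so the header should appear exactly once, showing page 1.

```perl
use strict;
use warnings;

my ($name, $salary);

open(MYFILE, '>', \my $report) or die "Can't open MYFILE: $!\n";

# Body line, used for every write(MYFILE).
format MYFILE =
Name: @<<<<<<<<<<<<<<<<<<< Salary: @######.##
      $name,                       $salary
.

# Page header, printed automatically at the top of each page.
# Inside the format, $% is the page number of the handle being written.
format MYFILE_TOP =
Employee Names and Salaries          Page: @>>
                                           $%
----------------------------------------------
.

for my $row (['John Smith', 78293.22], ['Jane Doe', 81250]) {
    ($name, $salary) = @$row;
    write MYFILE;    # the first write also triggers the header
}
close(MYFILE);

print $report;
```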
The picture line of a format definition declares the literal text and the field definitions that will appear within the body of your report. Literal text is exactly that: it will appear exactly as you specify it within the written output. Field definitions begin with either a caret or an at sign (^ or @) and denote a field into which a supplied value is automatically inserted at run time. Immediately following the caret or at sign is a series of characters that denote the justification and length of the data that will be placed within the field; the leading ^ or @ itself counts as one character of the field width, so @<<< defines a four-character field:
Character | Meaning |
> | right justified |
# | right justified (numeric only; can include a decimal point) |
< | left justified |
| (vertical bar) | center justified |
* | variable width (used as @* or ^*); fill in all data from the value |
The argument lines must immediately follow each picture line, and should contain only those variables or expressions (separated by commas) that will actually be filled into the field definitions on the previous line. The variables or expressions can actually appear anywhere on the line, but for readability I suggest you follow the typical convention of visually lining up the data variables with the appropriate format fields (if possible), as in the examples above and below. Data is always justified within the defined length of the field in the format; the field width is never expanded or contracted to fit the data you are plugging in (with the exception of * fields, which we'll discuss later). If the actual data you provide exceeds the width of the field, the data will be truncated. You may wrap your arguments onto more than one line if you enclose all the arguments in curly braces, but if you do so, the opening curly brace must be the first token on the first line.
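A small sketch of the two rules just described: a value wider than its field is truncated rather than widening the field, and a long argument list can be wrapped across lines inside curly braces. The names, the REPORT handle (opened on an in-memory string for easy inspection), and the field widths are made up for illustration.

```perl
use strict;
use warnings;

my $name   = 'Bartholomew Fitzgerald-Smythe';   # deliberately too long
my $title  = 'Director of Operations';
my $salary = 78293.22;

open(REPORT, '>', \my $report) or die "Can't open REPORT: $!\n";

# The short name field truncates $name instead of growing to fit it.
# The arguments are wrapped over two lines inside braces; the opening
# brace is the first token on the first argument line.
format REPORT =
@<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @######.##
{ $name,
  $title,                                 $salary }
.

write REPORT;
close(REPORT);

print $report;
```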
format definitions are processed by the perl interpreter at compile time, not run time, so any variables that you wish to use within your format must be visible at the point where the format is declared: a lexically scoped (my) variable must be declared earlier, in a scope that encloses the format, while a package (global) variable can be used freely, because its current value is simply looked up when write is called. (The actual values contained in the variables can, of course, change throughout the script.) The format definitions themselves are global in nature, so you can only have one format of a given name per package. If you define the same format name twice in a package (Perl issues a "redefined" warning when warnings are enabled), the last definition the compiler sees is the one that will be applied throughout that package whenever you write with that format name, even if that definition occurs after the write in question.
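The compile-time behavior can be sketched as follows: write is called before the format appears in the source, yet the format is still found, because it was compiled before the program started running. The $message lexical is declared near the top of the file so the format can see it; the REPORT handle (opened on an in-memory string) and the message text are illustrative assumptions.

```perl
use strict;
use warnings;

my $message = 'Formats are compiled before the program runs.';

open(REPORT, '>', \my $report) or die "Can't open REPORT: $!\n";

# This write appears before the format in the source file, but the
# format was already compiled at compile time, so it is found anyway.
write REPORT;
close(REPORT);

print $report;

# $message is a lexical declared above this point, so the format can see it.
format REPORT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$message
.
```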
Some examples should help you see how basic format definitions are used. For each of the formats, assume that the $name variable has already been defined and contains John Smith, and that the $salary variable contains 78293.22.
format MYFILE =
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>> Salary: @###########.##
$name, $salary
.
Produces the output:
Name:                    John Smith Salary:        78293.22
While:
format MYFILE =
Name: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< Salary: @###########.##
$name, $salary
.
Produces the output:
Name: John Smith                    Salary:        78293.22
And:
format MYFILE =
Name: @|||||||||||||||||||||||||||| Salary: @###########.##
$name, $salary
.
Produces the output:
Name:          John Smith           Salary:        78293.22
In each of the above cases, we could also have used > (right justification) for the salary field; for our purposes:
format MYFILE =
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>> Salary: @>>>>>>>>>>>>>>
$name, $salary
.
would have worked just as well. Note that in this latter case, we couldn't have specified an explicit decimal place, as the . and anything following it would have been treated as literal text to be displayed by Perl. This is fine if you know for certain that your numeric data is all preformatted to the same precision (if you're using sprintf to format the numbers, for example), but for numbers with potentially differing precisions or formats the results will probably not be what you want. For example, if instead of 78293.22 in the previous example we had the number 78293.20, our output using the template above would have looked like this:
Name:                    John Smith Salary:         78293.2
While this is technically accurate, it looks unprofessional in a printed report and can be difficult to read in a columnar format. Using the # based notation avoids this problem, as Perl automatically rounds and formats the numbers to appear within the defined field with the specified decimal places in the correct position. The flip side of this convenience is the fact that # based fields must be filled with numbers; attempting to assign a text string to such a field will produce a nicely formatted zero in place of the actual text you wished to display.
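This sketch puts the two notations side by side for the 78293.20 case, plus a value with extra decimals to show the rounding that #-based fields perform. The REPORT handle (opened on an in-memory string for easy inspection), the $bonus value, and the field widths are assumptions made for the example.

```perl
use strict;
use warnings;

my $salary = 78293.20;    # Perl stringifies this as "78293.2"
my $bonus  = 1234.567;    # more decimals than the field allows

open(REPORT, '>', \my $report) or die "Can't open REPORT: $!\n";

# @######.## always shows two decimals, rounding if needed;
# @>>>>>>>>> only right-justifies the number's default string form.
format REPORT =
Numeric field: @######.##
               $salary
Text field:    @>>>>>>>>>
               $salary
Rounded:       @######.##
               $bonus
.

write REPORT;
close(REPORT);

print $report;
```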
For fields defined with an @ sign (as in all the examples above), you may use either plain variables or the result of an expression to fill data into the field, for example:
format MYFILE =
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>> Salary: @>>>>>>>>>>>>>>
&last_name_first($name), $salary
.
where the last_name_first function is defined elsewhere in your code and returns the desired value. The argument line is evaluated (in list context) before the results are plugged into the format, which makes the use of arrays legal, too:
my @data_line=("John Smith",78293.22);
format MYFILE =
Name: @>>>>>>>>>>>>>>>>>>>>>>>>>>>> Salary: @>>>>>>>>>>>>>>
@data_line
.
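A runnable sketch of both forms. The body of last_name_first is not given in the text, so the version here is a hypothetical stand-in that simply moves the last word to the front; the Jane Doe row and the REPORT handle (opened on an in-memory string) are also illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper playing the role of last_name_first above:
# turns "John Smith" into "Smith, John".
sub last_name_first {
    my ($full) = @_;
    my @parts  = split ' ', $full;
    my $last   = pop @parts;
    return @parts ? "$last, @parts" : $last;
}

my $name      = 'John Smith';
my $salary    = 78293.22;
my @data_line = ('Jane Doe', 81250);

open(REPORT, '>', \my $report) or die "Can't open REPORT: $!\n";

# Argument lines are evaluated in list context, so a function call
# or a whole array can supply the values.
format REPORT =
Name: @<<<<<<<<<<<<<<<<<<< Salary: @######.##
      last_name_first($name),      $salary
Name: @<<<<<<<<<<<<<<<<<<< Salary: @######.##
      @data_line
.

write REPORT;
close(REPORT);

print $report;
```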
Thus far, we have discussed only the use of @ type fields in a format definition. A ^ field denotes special processing for the defined field. With # based fields, the field is automatically cleared (blanked) if the data value is undefined. For all other justifications, the data is "filled" into the field: as much data as can fit is placed into the field, and the data variable is then modified so that the text actually placed into the field is removed from it. Because of this special processing, the values supplied to caret text fields must be scalar variables that contain a text string.
Exactly how much data is placed into the field (and subsequently removed from the variable) depends on the setting of the special internal variable $:, which by default contains a space, a newline, and a dash. In other words, Perl fills in as much of the field as possible, up to the last newline, space, or dash that fits, and then removes that much data from the variable. The following before-and-after example should help to explain the concept:
my $my_string = "This is center justified text, longer than the format.";
open(MYFILE,">myfile.txt") or die "Can't open up myfile: $!\n";
format MYFILE =
^|||||||||||||||||||||||||||||||||||
$my_string
.
write MYFILE;
# myfile.txt now contains:
# This is center justified text,
print "$my_string\n"; # prints "longer than the format."
Thus, it's important to note that the variables used in caret-based fill-in fields are modified each time write is processed.
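Because each write consumes part of the variable, a single fill-in field can be reused in a loop until the string runs out. In this sketch the REPORT handle (opened on an in-memory string), the sentence, and the field width are illustrative; it shows the variable being emptied as the text flows over several lines.

```perl
use strict;
use warnings;

my $text = 'Caret fields remove whatever they print from the variable they are given.';

open(REPORT, '>', \my $report) or die "Can't open REPORT: $!\n";

# A fixed-width, left-justified fill-in field.
format REPORT =
^<<<<<<<<<<<<<<<<<<<
$text
.

# Each write removes the printed chunk from $text, so looping until the
# variable is empty flows the whole sentence over several lines.
write REPORT while length $text;
close(REPORT);

print $report;
print "Left in \$text: '$text'\n";    # nothing remains
```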
So what good are these fill in fields? They allow you to easily "flow" data over multiple lines in a single format definition. For example:
my $name="John Smith";
my $salary=78293.20;
my $job_desc="John's job is to dominate the world with his wits and a toothbrush.";
format =
Name: @<<<<<<<<<<<<<<<<<<<<<<< Salary: @###########.##
$name, $salary
Job Description: ^<<<<<<<<<<<<<<<<<<<<<<<<<<
$job_desc
^<<<<<<<<<<<<<<<<<<<<<<<<<<
$job_desc
^<<<<<<<<<<<<<<<<<<<<<<<<<<
$job_desc
^<<<<<<<<<<<<<<<<<<<<<<<<<<
$job_desc
.
write;
# result:
#
# Name: John Smith Salary: 78293.20
# Job Description: John's job is to dominate
# the world with his wits and
# a toothbrush.
#
Notice in the above example that an additional blank line is printed at the end of the job description block. This is because the data we provided didn't completely fill our defined format; i.e., we defined 4 lines to hold the data, but then only provided 3 lines of actual data to use. You can use a single tilde (~) on a line to suppress any line that would be completely blank due to a lack of data. Compare the above results with this:
my $name="John Smith";
my $salary=78293.20;
my $job_desc="John's job is to dominate the world with his wits and a toothbrush.";
format =
Name: @<<<<<<<<<<<<<<<<<<<<<<< Salary: @###########.##
$name, $salary
Job Description: ^<<<<<<<<<<<<<<<<<<<<<<<<<<
$job_desc
^<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$job_desc
^<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$job_desc
^<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$job_desc
.
write;
# result:
#
# Name: John Smith Salary: 78293.20
# Job Description: John's job is to dominate
# the world with his wits and
# a toothbrush.
Note the extra blank line is now gone. On lines that are displayed, the tilde itself will be replaced with a single blank space.
Placing two tildes consecutively on a line tells the perl interpreter to repeat that line, continuously replacing the fields with the supplied expressions, until the data is exhausted. Thus, the above example could even be shortened to this:
my $name="John Smith";
my $salary=78293.20;
my $job_desc="John's job is to dominate the world with his wits and a toothbrush.";
format =
Name: @<<<<<<<<<<<<<<<<<<<<<<< Salary: @###########.##
$name, $salary
Job Description: ^<<<<<<<<<<<<<<<<<<<<<<<<<<
$job_desc
^<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
$job_desc
.
write;
# result:
#
# Name: John Smith Salary: 78293.20
# Job Description: John's job is to dominate
# the world with his wits and
# a toothbrush.
You can use the double tilde construct with @ type fields, too; just make sure that the expression you use will eventually run out of data. For example, the following will not produce the effect you want, but will instead die with a "Runaway format" error:
my @peanuts=("Charlie","Lucy","Linus","Snoopy","Woodstock");
# this is WRONG!
format =
Peanuts characters:
@<<<<<<<<<<<<<<<<<<<<<< ~~
@peanuts
.
write;
What you probably wanted was this:
my @peanuts=("Charlie","Lucy","Linus","Snoopy","Woodstock");
format =
Peanuts characters:
@<<<<<<<<<<<<<<<<<<<<<< ~~
shift(@peanuts)
.
write;
# result:
# Peanuts characters:
# Charlie
# Lucy
# Linus
# Snoopy
# Woodstock
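The shift version empties @peanuts as a side effect. If you still need the list afterward, one option (a sketch, not the only approach) is to let the format consume a working copy. The @queue array and the in-memory REPORT handle are illustrative assumptions; the `// ''` turns the final undef from shift into an empty string, which still ends the ~~ repetition and should keep use warnings from reporting an uninitialized value (the // operator needs Perl 5.10 or later).

```perl
use strict;
use warnings;

my @peanuts = ("Charlie", "Lucy", "Linus", "Snoopy", "Woodstock");
my @queue;    # working copy that the format is allowed to consume

open(REPORT, '>', \my $report) or die "Can't open REPORT: $!\n";

# ~~ repeats the line until the field comes up empty.
format REPORT =
Peanuts characters:
@<<<<<<<<<<<<<<<<<<<<<< ~~
shift(@queue) // ''
.

@queue = @peanuts;    # refill the copy before each write
write REPORT;
close(REPORT);

print $report;
print scalar(@peanuts), " names still in \@peanuts\n";
```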
Finally, let's take note of one last formatting construct you may find helpful. When you declare a field with @*, it's the same as saying "Fill in all of the data from this value, regardless of its length." The value is printed as-is, including any embedded newlines (only a final newline is dropped), so the length of the resulting line, or even the number of lines, won't be known until run time. You can use an asterisk with caret fields, too: a ^* field prints the value one line at a time (up to the next newline) and removes that line from the variable, just as the other caret fields remove the text they print.
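A small sketch of an @* field holding a multi-line value; the notes text and the REPORT handle (opened on an in-memory string for easy inspection) are illustrative assumptions. Because the value is printed verbatim, the number of output lines depends on the data rather than on the picture line, and the variable itself is left unchanged.

```perl
use strict;
use warnings;

my $notes = "Shipped on time.\nCustomer asked for a second, much longer invoice line.\n";

open(REPORT, '>', \my $report) or die "Can't open REPORT: $!\n";

# @* prints the whole value verbatim, embedded newlines included;
# only the final newline is dropped before the line is ended.
format REPORT =
Notes:
@*
$notes
End of notes.
.

write REPORT;
close(REPORT);

print $report;
```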
Certain internal Perl variables are available for you to use directly within your format templates and to control how formats are applied to output files. On the next page we examine those variables and then conclude our tutorial with a look at the write function.
Created: December 1, 2005
Revised: December 9, 2005
URL: https://webreference.com/programming/perl/format/index.html