Sams Teach Yourself XML in 24 Hours, Complete Starter Kit, 3rd Edition. Part 1
Files and Directories in Perl
Excerpted from Sams Teach Yourself Perl in 24 Hours, 3rd Edition by Clinton Pierce. ISBN 0672327937, Copyright © 2005. Used with the permission of Sams Publishing.
HOUR 10
Files and Directories
What You'll Learn in This Hour
Files in your operating system provide a convenient set of storage concepts for data. The OS enables a name to be given to the data (a filename) and provides an organizational structure, called a file system, so that you can find the data later. Your computer's file system then organizes files into groups called directories, sometimes called folders. These directories can store files or other directories.
This nesting of directories inside directories provides a treelike structure to the file system on your computer. Each file is part of a directory, and each directory is part of a parent directory. In addition to providing an organizational structure for your files, the operating system also stores data about each file: when the file was last read, when it was last modified, who created it, the current size of the file, and so on. This information is called metadata (see Hour 5, "Working with Files"). This organization is true of almost all modern computer operating systems.
In the case of the Macintosh (pre-Mac OS X), this structure still holds true, except that the top-level directory is called a Volume, and the subdirectories are called Folders.
Perl allows you to access this structure, modify the organization, and examine the information about the files. The functions that Perl uses for these tasks are all derived from the Unix operating system, but they work just fine under whatever operating system Perl happens to be running on. Perl's file system manipulation functions are portable, meaning that if you use Perl's functions to manipulate your files and query them, you should have no problems running your code under any operating system Perl supports, provided that the directories are structured similarly.
Getting a Directory Listing
The first step in obtaining directory information from your system is to create a directory handle. A directory handle is something like a filehandle, except that instead of a file's contents, you read the contents of a directory through the directory handle. To open a directory handle, you use the opendir function:
opendir dirhandle, directory
In this syntax, dirhandle is the directory handle you want to open and directory is the name of the directory you want to read. If the directory handle cannot be opened (because you don't have permission to read the directory, the directory doesn't exist, or for some other reason), the opendir function returns false.
Directory handle names should be constructed similarly to filehandle names, using the rules for variable names outlined in Hour 2 ("Perl's Building Blocks: ..."). For example:
opendir(TEMPDIR, '/tmp') || die "Cannot open /tmp: $!";
All the examples in this hour use forward slashes (/) in the Unix style because they are less confusing than the backslashes (\) used by Windows and MS-DOS and work just as well with those operating systems as with Unix.
Now that the directory handle is open, you use the readdir function to read it:
readdir dirhandle;
In a scalar context, readdir returns the next entry in the directory, or undef if none are left. In a list context, readdir returns all the (remaining) directory entries. The names returned by readdir include files, directories, and (for Unix) special files; they are returned in no particular order. The directory entries . and .. (representing the current directory and its parent directory) are also returned by readdir. The directory entries returned by readdir do not include the pathname as part of the name returned.
The following example shows how to read a directory:
opendir(TEMP, '/tmp') || die "Cannot open /tmp: $!";
@FILES=readdir TEMP;
closedir(TEMP);
In the preceding snippet, the entire directory is read into @FILES. Most of the time, however, you're not interested in the . and .. entries. To read the directory handle and eliminate those entries, you can enter the following:
@FILES=grep(!/^\.\.?$/, readdir TEMP);
The regular expression (/^\.\.?$/) matches a name that consists of exactly one or two literal dots. Because the match is negated with !, grep keeps only the entries that do not match, which eliminates . and .. from the list. To get all the files with a particular extension, you use the following:
@FILES=grep(/\.txt$/i, readdir TEMP);
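The two calling contexts of readdir can be compared side by side. The following is only a sketch under a few assumptions: it uses a scratch directory from the core File::Temp module (so the result doesn't depend on what is in /tmp), the three filenames are made up for illustration, and it uses lexical directory handles (my $dh) rather than the bareword handles shown in this hour.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory and sample files, used only for illustration.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(notes.txt README.TXT data.csv)) {
    open(my $fh, '>', "$dir/$name") || die "Cannot create $dir/$name: $!";
    close($fh);
}

# Scalar context: one entry per call, undef when no entries are left.
opendir(my $dh, $dir) || die "Cannot open $dir: $!";
my @one_at_a_time;
while (defined(my $entry = readdir $dh)) {
    next if $entry =~ /^\.\.?$/;    # skip . and ..
    push @one_at_a_time, $entry;
}
closedir($dh);

# List context: all entries at once, filtered by extension with grep.
opendir($dh, $dir) || die "Cannot open $dir: $!";
my @text_files = grep(/\.txt$/i, readdir $dh);
closedir($dh);

print join(", ", sort @one_at_a_time), "\n";
print join(", ", sort @text_files), "\n";
```

The lists are sorted before printing because readdir returns entries in no particular order.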
The filenames returned by readdir do not contain the pathname used by opendir. Thus, the following example will probably not work:
opendir(TD, "/tmp") || die "Cannot open /tmp: $!";
while($file = readdir TD) {
    # The following is WRONG
    open(FILEH, $file) || die "Cannot open $file: $!\n";
    # Process the file here...
}
closedir(TD);
Unless you happen to be working in the /tmp directory when you run this code, the open(FILEH, $file) statement will fail. For example, if the file myfile.txt exists in /tmp, readdir returns myfile.txt. When you open myfile.txt, you actually need to open /tmp/myfile.txt using the full pathname. The corrected code is as follows:
opendir(TD, "/tmp") || die "Cannot open /tmp: $!";
while($file=readdir TD) {
    # Right!
    open(FILEH, "/tmp/$file") || die "Cannot open $file: $!\n";
    # Process the file here...
}
closedir(TD);
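If you would rather not paste the directory name and filename together by hand, one possible variation is to build the full pathname with the core File::Spec module. This is a sketch, not the approach used in this hour: the scratch directory stands in for /tmp, the file myfile.txt and its contents are created just for the demonstration, and the -f test is added so that ., .., and any subdirectories are skipped.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for /tmp, with one sample file to find.
my $dir    = tempdir(CLEANUP => 1);
my $sample = File::Spec->catfile($dir, 'myfile.txt');
open(my $out, '>', $sample) || die "Cannot create $sample: $!";
print $out "hello\n";
close($out);

my %first_line_of;
opendir(my $dh, $dir) || die "Cannot open $dir: $!";
while (defined(my $file = readdir $dh)) {
    # readdir returns only the name, so join it to the directory first.
    my $path = File::Spec->catfile($dir, $file);
    next unless -f $path;    # skip ., .., and subdirectories

    open(my $fh, '<', $path) || die "Cannot open $path: $!";
    my $line = <$fh>;
    close($fh);
    chomp $line if defined $line;
    $first_line_of{$file} = $line;
}
closedir($dh);

print "$_: $first_line_of{$_}\n" for sort keys %first_line_of;
```

File::Spec->catfile chooses the path separator for the operating system Perl is running on, which keeps the directory name out of the string literal.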
Globbing
The other method of reading the names of files in a directory is called globbing. If you're familiar with the command prompt in MS-DOS, you know that the command dir *.txt prints a directory listing of all the files that end in .txt. In Unix, globbing (sometimes called wildcard matching) is done by the shell, but ls *.txt has nearly the same result: The files whose names end in .txt are listed.
Perl has an operator for doing just this job; it's called glob. The syntax for glob is
glob pattern
where pattern is the filename pattern you want to match. The pattern can contain directory names and portions of filenames. In addition, the pattern can contain any of the special characters listed in Table 10.1. In a list context, glob returns all the files (and directories) that match the pattern. In a scalar context, the files are returned one at a time each time glob is queried.
Now check these examples of globbing:
# All of the .h files in /usr/include
my @hfiles=glob('/usr/include/*.h');
# Text or document files that contain 1999
my @curfiles=glob('*1999*.{txt,doc}');
# Printing a numbered list of filenames
$count=1;
while( $name=glob('*') ) {
    print "$count. $name\n";
    $count++;
}
An important difference between glob and opendir/readdir/closedir is that glob returns the pathname used in the pattern, whereas the opendir/readdir/closedir functions do not. For example, glob('/usr/include/*.h') returns '/usr/include' as part of any matches; readdir does not.
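To see this difference for yourself, you can list the same directory both ways. The following is a rough sketch: the scratch directory and the names a.h and b.h are invented for the demonstration, and it assumes the scratch directory's pathname contains no spaces or wildcard characters (glob would treat those specially).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory holding two header files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.h b.h)) {
    open(my $fh, '>', "$dir/$name") || die "Cannot create $dir/$name: $!";
    close($fh);
}

# glob keeps the directory portion of the pattern in every match.
my @from_glob = glob("$dir/*.h");
@from_glob = sort @from_glob;

# readdir returns bare names with no directory portion.
opendir(my $dh, $dir) || die "Cannot open $dir: $!";
my @from_readdir = sort grep { /\.h$/ } readdir $dh;
closedir($dh);

print "glob:    $_\n" for @from_glob;
print "readdir: $_\n" for @from_readdir;
```

The glob results can be passed straight to open, whereas the readdir results need the directory name added first.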
So which should you use? It's completely up to you. However, using the opendir/readdir/closedir functions tends to be a much more flexible solution, and they are used in most of the examples throughout this book.
Perl offers an alternative way to write pattern globs. Simply placing the pattern inside the angle operator (<>) makes the angle operator behave like glob:
@cfiles = <*.c>; # All files ending in .c
The syntax that uses the angle operator for globbing is older and can be confusing. In this book, I will continue to use the glob operator instead for clarity.
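One source of the confusion is that the same angle brackets read a line when they hold a filehandle but match filenames when they hold a pattern. The sketch below illustrates both uses; the scratch directory and the filenames are hypothetical, and it assumes the directory's pathname contains no spaces or wildcard characters.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory with two .c files and one other file.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.c notes.txt)) {
    open(my $fh, '>', "$dir/$name") || die "Cannot create $dir/$name: $!";
    print $fh "first line of $name\n";
    close($fh);
}

# A pattern inside the angle operator is globbed...
my @via_angle = <$dir/*.c>;
# ...which gives the same list as the glob operator.
my @via_glob = glob("$dir/*.c");

# A plain filehandle inside the angle operator reads a line instead.
open(my $in, '<', "$dir/notes.txt") || die "Cannot open $dir/notes.txt: $!";
my $line = <$in>;
close($in);

print "Same list\n" if "@via_angle" eq "@via_glob";
print $line;
```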
Created: March 27, 2003
Revised: February 3, 2006
URL: https://webreference.com/programming/perl_24/1