Minimal Perl for Unix and Linux People: Part 2/Page 5
Perl as a (Better) Find Command: Part 2
6.5.1 Using Perl for reliable timestamp sorting
A classic problem is that of identifying the most recently modified (i.e., newest) file within a particular branch of the file system, which might reflect the most recent order received, the latest blog uploaded, the last Unix configuration file modified, and so forth. To find the newest file, a knowledgeable Unix programmer might compose a command like the following:
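The command itself did not survive in this copy of the page. Going by the description that follows, it was presumably a pipeline along these lines. The scratch directory, file names, and timestamps below are assumptions, added only so the sketch can be run safely anywhere:

```shell
# Scratch directory with three files of known, distinct modification times.
# (Directory, names, and dates are illustrative assumptions.)
dir=$(mktemp -d)
touch -t 202001010000 "$dir/old"
touch -t 202101010000 "$dir/middle"
touch -t 202201010000 "$dir/new"

# find emits the pathnames, xargs hands them to ls as arguments,
# ls -lrdt lists them oldest-first, and tail -1 keeps the last listing.
newest=$(find "$dir" -type f -print | xargs ls -lrdt | tail -1)
echo "$newest"

rm -rf "$dir"
```

With only three files, everything fits into a single ls invocation, so this small case does report the file named new.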
What does that pipeline do? The find command emits the pathnames of the relevant files; the xargs command submits them as arguments to ls, whose -lrdt options sort their listings in ascending order by modification time; and then the tail -1 command peels off the listing that comes out last: the one for the newest file. At least, you'd expect it to be the pathname of the newest file, on the basis of (dodgy) advice from books or colleagues, or your own experiences with similar commands.
As discussed earlier, it's considered fiendishly clever to use xargs with find instead of an -exec clause, because doing so is guaranteed to minimize the number of processes required to handle all the arguments. In fact, the find | xargs approach is so efficient, and so highly revered in Unix culture, and so impressive to your colleagues, and so, well, cool, that the only bad thing you could possibly say about its use for this task is: It's not guaranteed to produce the correct results! 18
Why can't it be trusted? Because the ls command isn't guaranteed to sort all the filenames in one batch. That can lead to an incorrect result, because the most recent file from the final batch is always the last one provided as input to tail and therefore the one emitted by the pipeline. Therefore, if so many filenames are presented to xargs that it has to divvy them up for processing by two or more ls commands, there's no guarantee that the file of interest will be processed in the critical final batch and that the correct pathname will emerge from the pipeline.
Note that this isn't a criticism of xargs itself, which does an admirable job of running the separate ls commands as efficiently as possible. The problem is that sorting isn't an operation that can be done in piecemeal fashion: all the filenames must be sorted in one batch. For this reason, the find | xargs approach just isn't suited to solving this problem.
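The batching problem is easy to reproduce on a small scale. The sketch below is an illustration, not something from the original text: it uses the -n 2 option of xargs to force batches of two names, whereas in real use xargs splits its input only when the argument list grows too long. The file names and dates are made up, and the names are fed in a fixed order with printf so that the outcome doesn't depend on the order in which find happens to visit them:

```shell
# Scratch files with known modification times (names and dates are made up).
dir=$(mktemp -d)
touch -t 202201010000 "$dir/newest"
touch -t 201901010000 "$dir/old1"
touch -t 202001010000 "$dir/old2"
touch -t 202101010000 "$dir/old3"

# Emit the four pathnames in a fixed order, one per line.
list() { printf '%s\n' "$dir/newest" "$dir/old1" "$dir/old2" "$dir/old3"; }

# One batch: ls sorts all four names together, so tail -1 sees the true newest.
single=$(list | xargs ls -rdt | tail -1)

# Two batches of two (forced with -n 2): each ls sorts only its own pair,
# so tail -1 reports the newest file of the LAST batch, which is old3.
batched=$(list | xargs -n 2 ls -rdt | tail -1)

echo "one batch:   $single"
echo "two batches: $batched"

rm -rf "$dir"
```

The single-batch run names the file called newest, while the two-batch run names old3, because the truly newest file was sorted in the first batch and its listing never reached the end of the output.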
The modified solution shown next uses a custom Perl script called most_recent_file instead of xargs, which has two distinct advantages:
- It always produces the correct answer.
- It works even on non-Unix systems that have Perl.19
Here are the results from using the xargs-based technique shown earlier, and its Perl alternative, for finding the most recently modified file under /etc on my Linux-equipped laptop:
The wrong answer is the one produced by the first pipeline, because find generated so many arguments that xargs couldn't present them all to ls in one batch. In contrast, most_recent_file (shown in Listing 6.1) always produces the correct answer.
That script may look intimidating at first, due to its size, but if you look more closely, you'll see that it's mostly comments.
It starts by using the stat function to obtain the file's data. The element at index 9 of the list that stat returns is the time of the file's last modification, represented as a large integer: the number of seconds that elapsed between an ancient reference point (on Unix systems, the start of 1970, UTC) and that time.
The rest of the script is devoted to keeping constant track of the most recent modification time seen thus far, along with its associated filename, and then printing the "winning" name after all input has been processed (in the END block). The logic goes like this: If the current file's $mtime value is larger than the largest one seen thus far (stored in $newest), the current filename replaces the earlier one as our latest idea of the one most recently modified.
That's all it takes to write a Perl script that avoids the predilection of the xargs-based solution for identifying the wrong file as most recently modified, when many must be examined.
Next, we'll discuss another limitation of xargs, and how Perl can once again be of assistance. It involves wrangling pathnames that contain whitespace characters, which has historically been a vexing problem for Unix system administrators.