Using RSS News Feeds
The XML::RSS Module
Now that you've had a chance to glance at two RSS examples, it's time to introduce the XML::RSS module. XML::RSS is a subclass of XML::Parser, a Perl module maintained by Clark Cooper that uses James Clark's Expat C library. XML::RSS was developed to simplify parsing and manipulating RSS files. A deep understanding of XML is not a prerequisite for using XML::RSS, since the XML details are hidden behind the class interface.
While XML::RSS is capable of creating RSS files, we will focus on parsing existing RSS files in this column. You can read more about the capabilities of XML::RSS in the module's documentation, which you can view by typing:
    perldoc XML::RSS
The Code
Well, let's look at the code, shall we? Lines 16-17 load the XML::RSS and LWP::Simple modules. We've already talked about XML::RSS in brief, but what does LWP::Simple do? Good question! The answer is simple (pun intended): it's a procedural interface for interacting with a Web server. It's also the little cousin of LWP::UserAgent, a fuller object-oriented interface. We'll use one of the library's subroutines later in the code to fetch an RSS file from the Web.
In lines 20-21 we initialize two variables that we're going to use later.
Line 25 starts the main code body. The first thing we do is verify that the user typed exactly one command-line parameter. This parameter is then assigned to the $arg variable on line 28.
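As a rough illustration of this argument check (not the article's own listing), the step might be sketched as follows. The helper name single_arg, the usage message, and the sample file name news.rdf are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original listing): return the one
# expected command-line argument, or die with a usage message.
sub single_arg {
    my @args = @_;
    die "Usage: rss2html.pl <RSS file or URL>\n" unless @args == 1;
    return $args[0];
}

# In the real script this would be single_arg(@ARGV); a sample value is
# used here so the sketch runs without any command-line input.
my $arg = single_arg('news.rdf');
print "Argument: $arg\n";
```

Wrapping the check in a subroutine is only a convenience for trying it out; a script could just as well test the size of @ARGV inline.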
Next, we create a new instance of the XML::RSS class and assign the reference to the $rss variable on line 31.
Now we must determine whether the command-line parameter the user entered is an HTTP URL or a file on the local file system (lines 34-46). On line 34, we use a regular expression to look for the characters http:. If the command-line argument starts with these characters, we can safely assume that the user intends to retrieve an RSS file from a Web server. On line 35 we pass the argument to the get() function, which is part of LWP::Simple, and assign the result to the $content variable. On line 36 we call die() if $content is empty, which means there was an error retrieving the RSS file. If the RSS file was downloaded successfully, we call $rss->parse($content) on line 38, which parses the RSS file and stores the results in the object's internal structure.
If the command-line argument does not contain the http: characters, we assume that it is a file rather than a URL (lines 41-46). The first thing we do is assign the value of $arg to the $file variable and test whether the file exists (lines 42-43). Then we call $rss->parsefile($file) (line 45), which parses the RSS file and stores the results in the object's internal structure. The parsefile() method parses a file, whereas the parse() method parses the string that's passed to it.
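The URL-or-file decision described above can be sketched roughly as follows. This is a minimal sketch, not the article's listing: the names is_http_url and load_rss, the error messages, and the example.com URL are assumptions, and anchoring the pattern at the start of the string is a guess based on the prose. The calls to XML::RSS->new, parse(), parsefile(), and LWP::Simple's get() are the ones the text names.

```perl
use strict;
use warnings;

# True when the argument looks like an HTTP URL. Anchoring the match at
# the start of the string is an assumption made for this sketch.
sub is_http_url {
    my ($arg) = @_;
    return $arg =~ /^http:/i ? 1 : 0;
}

# Hypothetical wrapper: build an XML::RSS object from a URL or a local
# file. XML::RSS and LWP::Simple are loaded on demand, so is_http_url()
# can be tried even where those modules are not installed.
sub load_rss {
    my ($arg) = @_;
    require XML::RSS;
    my $rss = XML::RSS->new;

    if (is_http_url($arg)) {
        require LWP::Simple;
        my $content = LWP::Simple::get($arg);    # undef on failure
        die "Could not retrieve $arg\n" unless defined $content;
        $rss->parse($content);                   # parse a string
    }
    else {
        die "File \"$arg\" does not exist\n" unless -e $arg;
        $rss->parsefile($arg);                   # parse a file on disk
    }
    return $rss;
}

# Show how two sample arguments would be classified.
for my $arg ('http://www.example.com/news.rdf', 'news.rdf') {
    printf "%-35s => %s\n", $arg, is_http_url($arg) ? 'URL' : 'file';
}
```

Keeping the pattern test in its own small subroutine makes the branch easy to check in isolation, without a network connection or an RSS file on disk.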
Lastly, we call the print_html subroutine on line 49, which converts the RSS object into nicely formatted HTML.
print_html
As you examine this subroutine, you will begin to understand the internal structure of the XML::RSS object. The critical portion of the subroutine is on lines 76-79. In this foreach loop, we iterate over each of the RSS items.
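To make the item loop concrete, here is a small sketch that walks a hand-built structure shaped like a parsed XML::RSS object, which keeps channel data under the channel key and a list of item hashes under the items key. The sample data, the subroutine names items_to_html and escape_html, and the list markup are assumptions for illustration; the article's print_html may format its output differently.

```perl
use strict;
use warnings;

# Stand-in for a parsed XML::RSS object. The values are made up.
my $rss = {
    channel => { title => 'Example Channel', link => 'http://www.example.com/' },
    items   => [
        { title => 'First story', link => 'http://www.example.com/1.html' },
        { title => 'Q & A',       link => 'http://www.example.com/2.html' },
    ],
};

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical formatter: turn each item into a linked list entry.
sub items_to_html {
    my ($rss) = @_;
    my $html = "<ul>\n";
    foreach my $item (@{ $rss->{items} }) {
        # Skip items that lack a title or a link.
        next unless defined $item->{title} && defined $item->{link};
        $html .= sprintf(qq{<li><a href="%s">%s</a></li>\n},
            escape_html($item->{link}), escape_html($item->{title}));
    }
    return $html . "</ul>\n";
}

print items_to_html($rss);
```

Because items_to_html only reads the items key, the same subroutine should work unchanged on an object returned by parse() or parsefile().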
Next, let's take a look at rss2html.pl in action.
Produced by Jonathan
Eisenzopf and
Created: September 1, 1999
Revised: September 1, 1999
URL: https://www.webreference.com/perl/tutorial/8/