RSS and Atom in Action: Newsfeed Formats | Page 4
4.3 The simple fork: RSS 2.0
Let's return to the RSS history lesson. Not everybody was happy about the new RSS 1.0 format, especially Dave Winer, who had argued against RDF and lobbied to keep RSS as simple as possible. Winer rejected RSS 1.0 and released a new version of RSS, a minor revision of RSS 0.91, which he called Really Simple Syndication (RSS) 0.92. Thus, RSS was forked. The RDF advocates urged users to go with the RDF-based RSS 1.0 specification, and Winer urged users to stick with simple, safe, and compatible RSS 0.92.
Winer continued to develop the simple fork of RSS. He published new specifications for RSS 0.93 and RSS 0.94. With each release, he tweaked the format and added more metadata. In RSS 0.93, he added new subelements to the <item> element: <pubDate> and <expirationDate>. In RSS 0.94, he dropped <expirationDate> from the specification. Eventually, Winer published what he called the final version of RSS, dubbed RSS 2.0.
4.3.1 The elements of RSS 2.0
The RSS 2.0 specification provides a detailed description of each element allowed in an RSS 2.0 newsfeed. You can find the specification here: https://blogs.law.harvard.edu/tech/rss. Figure 4.3 summarizes the XML elements that make up RSS 2.0, using the same notation as our previous figures, with one twist. Elements shown in gray were added subsequent to RSS 0.91.
Some of the new elements added since RSS 0.91 deserve explanation:
4.3.2 Enclosures and podcasting
The <enclosure> element was added to RSS 0.92 in 2002 and it remains in RSS 2.0, but it was not widely used until 2004, when the podcasting craze began. Podcasting is the practice of distributing audio files via RSS. Specialized podcast client software looks for enclosures, downloads each enclosed file, and copies it to your Apple iPod. The word podcasting is something of a misnomer because any sort of file can be distributed as an <enclosure>, not just audio files destined for an iPod or other digital audio player.
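As a rough illustration of the first step a podcast client performs, the following Python sketch scans a feed for <enclosure> elements and collects their url, length, and type attributes. The sample feed, its URLs, and the function name find_enclosures are invented for this example; downloading the files and copying them to a player are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" length="123456" type="audio/mpeg"/>
    </item>
    <item>
      <title>Show notes only</title>
      <description>No enclosure here</description>
    </item>
    <item>
      <title>Slides</title>
      <enclosure url="http://example.com/slides.pdf" length="2048" type="application/pdf"/>
    </item>
  </channel>
</rss>"""


def find_enclosures(feed_xml):
    """Return a (url, length, mime_type) tuple for every <enclosure> in the feed."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        for enc in item.findall("enclosure"):
            # A missing length attribute is treated as 0 in this sketch.
            length = int(enc.get("length", "0"))
            found.append((enc.get("url"), length, enc.get("type")))
    return found


for url, length, mime_type in find_enclosures(SAMPLE_FEED):
    print(url, length, mime_type)
```

Note that the second enclosure is a PDF rather than an audio file, which the format permits; a client that only wants audio could filter on the type attribute.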
For more information about podcasting, see chapter 18, which presents a podcast server, and chapter 19, where we build a podcast download client you can use to automate the download of RSS enclosures.
4.3.3 Extending RSS 2.0
By the time RSS 2.0 was released in August 2002, everybody recognized the value of RSS 1.0-style extension modules. Winer decided to allow the same type of extensions to RSS. He did so by adding this sentence to the RSS 2.0 specification: "An RSS feed may contain elements not described on this page, only if those elements are defined in a namespace."
Funky RSS
At the same time that Winer added extensions to RSS, he also made all of the subelements of <item> optional. You must specify either a title or a description, but nothing else is required. Because of this, some users started to substitute elements from other XML specifications, such as Dublin Core, for the optional standard elements. For example, they started using the Dublin Core <dc:date> instead of the native RSS <pubDate>. And some started to use the Content Module <content:encoded> element to include item content instead of using the native RSS <description> element. Winer discourages this practice because it makes parsing RSS more complex. He calls newsfeeds that employ it funky, but such newsfeeds are perfectly valid according to the RSS 2.0 specification. Unfortunately, funky RSS is a fact of life, and if you are writing an RSS parser, you'll have to take it into account. We'll show you how to do this in chapter 5.
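To make the parsing cost of funky RSS concrete, here is a minimal Python sketch of one way a parser might cope: look for the native element first and fall back to the Dublin Core or Content Module substitute. This is not the approach developed in chapter 5; the sample feed and the helper names first_text and parse_items are assumptions made for this example, and the choice to prefer native elements over their substitutes is arbitrary.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the Dublin Core and Content modules.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical sample feed: one native item and one "funky" item.
FUNKY_FEED = """<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>Native item</title>
      <pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate>
      <description>Plain description</description>
    </item>
    <item>
      <title>Funky item</title>
      <dc:date>2006-01-03T10:00:00Z</dc:date>
      <content:encoded><![CDATA[<p>Full content</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def first_text(item, paths):
    """Return the text of the first path that matches, or None if none match."""
    for path in paths:
        text = item.findtext(path, namespaces=NS)
        if text is not None:
            return text
    return None


def parse_items(feed_xml):
    """Extract title, date, and body, accepting native or funky elements."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            # Dates are returned as raw strings; the two formats differ.
            "date": first_text(item, ["pubDate", "dc:date"]),
            "body": first_text(item, ["description", "content:encoded"]),
        })
    return items


for entry in parse_items(FUNKY_FEED):
    print(entry)
```

The sketch leaves dates as raw strings because the native and Dublin Core elements carry different date formats, so a real parser would also need to normalize them.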
4.4 The nine incompatible versions of RSS
After learning the history of the RSS specifications, the fork, and the funkiness, you may not be too surprised to learn that parsing RSS is tricky. The RSS specifications on both sides of the fork are informal and simple, perhaps too simple. Simplicity and informality can be virtues, but for specifications, they cause problems. No version of RSS has gone through a rigorous standardization process, and it shows. An influential blogger named Mark Pilgrim has been following the development of RSS closely, and he has made some important contributions. Working with Sam Ruby, another influential blogger, Pilgrim developed a newsfeed validation service at https://www.feedvalidator.org/ that handles all of the commonly used RSS and Atom newsfeed formats. He also wrote one of the best newsfeed parsers available, the Universal Feed Parser, which we'll cover in chapter 5. Pilgrim pointed out that there were nine incompatible versions of RSS. Table 4.1 summarizes these incompatible versions and the author, date, and status of each.
For more information on each of these versions of RSS, see the specifications found on the Web at the following addresses:
From a developer's perspective, the RSS situation looks like a nightmare, but it's really not that bad. The good news is that if you stick to the basic elements (<item>, <title>, <description>, <pubDate>, and <link>) or you use a good parsing library, you'll be able to parse RSS with relative ease. We'll show you how to do it in the next chapter. The even better news is that help is on the way, and its name is Atom.
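As a small taste of how little is needed when a feed sticks to the basic elements, the following Python sketch reads <title>, <link>, <description>, and <pubDate> from each <item> using only the standard library. It is not the code from the next chapter; the sample feed and the function name parse_basic are invented for this example, and it assumes <pubDate> values in the RFC 822 style that RSS 2.0 uses.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed, invented for illustration only.
BASIC_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello, world</description>
      <pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate>
    </item>
    <item>
      <description>An item with only a description</description>
    </item>
  </channel>
</rss>"""


def parse_basic(feed_xml):
    """Return one dict per <item>, with None for any element that is absent."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        published = None
        if raw_date:
            try:
                published = parsedate_to_datetime(raw_date.strip())
            except (TypeError, ValueError):
                published = None  # leave unparseable dates unset
        entries.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
            "published": published,
        })
    return entries


for entry in parse_basic(BASIC_FEED):
    print(entry)
```

Because every subelement of <item> is optional, the sketch treats each one as possibly missing rather than assuming it is present.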
This excerpt is taken from Chapter 4 of RSS and Atom in Action, written by Dave Johnson, and published by Manning Publications Co., Copyright © 2006 Manning Publications Co. All rights reserved.