RSS and Atom in Action: Newsfeed Formats | Page 4 | WebReference

4.3 The simple fork: RSS 2.0

Let's return to the RSS history lesson. Not everybody was happy about the new RSS 1.0 format, especially Dave Winer, who had argued against RDF and lobbied to keep RSS as simple as possible. Winer rejected RSS 1.0 and released a new version of RSS, a minor revision of RSS 0.91, which he called Really Simple Syndication (RSS) 0.92. Thus, RSS was forked. The RDF advocates urged users to go with the RDF-based RSS 1.0 specification, and Winer urged users to stick with simple, safe, and compatible RSS 0.92.

Winer continued to develop the simple fork of RSS. He published new specifications for RSS 0.93 and RSS 0.94. With each release, he tweaked the format and added more metadata. In RSS 0.93, he added new subelements to the <item> element: <pubDate> and <expirationDate>. In RSS 0.94, he dropped <expirationDate> from the specification. Eventually, Winer published what he called the final version of RSS, dubbed RSS 2.0.

4.3.1 The elements of RSS 2.0

The RSS 2.0 specification provides a detailed description of each element allowed in an RSS 2.0 newsfeed. You can find the specification here: https://blogs.law.harvard.edu/tech/rss. Figure 4.3 summarizes the XML elements that make up RSS 2.0, using the same notation as our previous figures, with one twist. Elements shown in gray were added subsequent to RSS 0.91.

Figure 4.3: The XML elements that make up RSS 2.0

Some of the new elements added since RSS 0.91 deserve explanation:

  • You can now specify categories at the channel level or at the item level by using the <category> element. Multiple categories are allowed. If you are using a well-known taxonomy or categorization system, you can note that by specifying the URI of the taxonomy in the optional domain attribute.
  • The item-level <comments> element may be used to specify the URL of the comments page for a specific weblog entry.
  • The item-level <guid> element can be used to specify a globally unique ID (GUID) for each item. Unless you specify the attribute isPermaLink="false", the GUID will be considered the permanent link to the web representation of the newsfeed item. Unfortunately, this introduces the opportunity for confusion because the <link> element is sometimes used as the permanent link to the item.
  • The item-level <author> element lets you specify an author's email address. If you want to specify the author's name, you can use the Dublin Core module's <dc:creator> element.
  • The item-level <enclosure> element can be used to attach a file to an item. To include a file, you must specify the file's URL, content type, and length in bytes, using the url, type, and length attributes.
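To make the <guid> and <category> rules above concrete, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree. The sample item, its URLs, and the permalink() helper are hypothetical and exist only for illustration. The fallback to <link> is one reasonable policy, not something the specification mandates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample item; every URL and value here is made up.
ITEM_XML = """
<item>
  <title>Example entry</title>
  <link>http://example.com/blog/2005/01/entry.html</link>
  <category domain="http://example.com/taxonomy">Java</category>
  <category>Weblogs</category>
  <comments>http://example.com/blog/2005/01/entry.html#comments</comments>
  <guid isPermaLink="false">tag:example.com,2005:entry-42</guid>
  <author>jane@example.com</author>
</item>
"""


def permalink(item):
    """Return a best guess at the item's permanent link, or None."""
    guid = item.find("guid")
    if guid is not None and guid.text:
        # When the attribute is absent, isPermaLink is treated as "true".
        if guid.get("isPermaLink", "true").strip().lower() == "true":
            return guid.text.strip()
    # Assumption: fall back to <link> when the GUID is not a permalink.
    link = item.findtext("link")
    return link.strip() if link else None


item = ET.fromstring(ITEM_XML)

# Each category is a (domain, name) pair; domain is None when omitted.
categories = [(c.get("domain"), c.text) for c in item.findall("category")]

print(permalink(item))
print(categories)
```

Because this sample GUID is marked isPermaLink="false", the helper ignores it and returns the <link> value instead.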

4.3.2 Enclosures and podcasting

The <enclosure> element was added in RSS 0.92, released in December 2000, and it remains in RSS 2.0, but it was not widely used until 2004, when the podcasting craze began. Podcasting is the practice of distributing audio files via RSS. Specialized podcast client software looks for enclosures, downloads each enclosed file, and copies it to your Apple iPod. The word podcasting is something of a misnomer because any sort of file can be distributed as an <enclosure>, not just audio files destined for an iPod or other digital audio player.
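The first step a podcast client performs, looking for enclosures, might be sketched in Python as follows. This is an illustration under stated assumptions, not the client built later in the book: the feed content, the URLs, and the find_enclosures() function are hypothetical, and the download step is left out so that the example runs without network access.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with one enclosed audio file and one plain item.
FEED_XML = """<rss version="2.0">
  <channel>
    <title>Example podcast</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/audio/ep1.mp3"
                 length="12216320" type="audio/mpeg"/>
    </item>
    <item>
      <title>Show notes only</title>
      <description>No attachment here.</description>
    </item>
  </channel>
</rss>"""


def find_enclosures(feed_xml):
    """Return one dict per item that carries an <enclosure> with a url."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enc = item.find("enclosure")
        if enc is None or not enc.get("url"):
            continue  # nothing to download for this item
        length = enc.get("length", "")
        found.append({
            "title": item.findtext("title"),
            "url": enc.get("url"),
            "type": enc.get("type"),
            # Real feeds sometimes carry a bad length, so parse defensively.
            "length": int(length) if length.isdigit() else None,
        })
    return found


enclosures = find_enclosures(FEED_XML)
for e in enclosures:
    print(e["title"], e["url"], e["type"], e["length"])
```

Nothing in the code is specific to audio: the type attribute is reported as-is, so the same loop would pick up video, PDF, or any other enclosed file.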

For more information about podcasting, see chapter 18, which presents a podcast server, and chapter 19, where we build a podcast download client you can use to automate the download of RSS enclosures.

4.3.3 Extending RSS 2.0

By the time RSS 2.0 was released in August 2002, everybody recognized the value of RSS 1.0-style extension modules. Winer decided to allow the same type of extensions to RSS. He did so by adding this sentence to the RSS 2.0 specification: "An RSS feed may contain elements not described on this page, only if those elements are defined in a namespace."
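One way to read that sentence is as a rule a program can check: any child of <item> that is not a core RSS 2.0 element must live in a namespace. The Python sketch below applies that reading. The undeclared_extensions() helper and the <mood> element are hypothetical, and the core element list is our own transcription of the item subelements in the RSS 2.0 specification.

```python
import xml.etree.ElementTree as ET

# Core <item> subelements defined by RSS 2.0 (transcribed by hand).
CORE_ITEM_ELEMENTS = {
    "title", "link", "description", "author", "category",
    "comments", "enclosure", "guid", "pubDate", "source",
}

# Hypothetical item: <dc:creator> is a namespaced extension,
# while <mood> is an invented element with no namespace.
ITEM_XML = """
<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Example</title>
  <dc:creator>Jane Doe</dc:creator>
  <mood>cheerful</mood>
</item>
"""


def undeclared_extensions(item):
    """Return child tags that are neither core elements nor namespaced."""
    problems = []
    for child in item:
        # ElementTree represents namespaced tags as "{namespace-uri}local".
        if child.tag.startswith("{"):
            continue
        if child.tag not in CORE_ITEM_ELEMENTS:
            problems.append(child.tag)
    return problems


item = ET.fromstring(ITEM_XML)
print(undeclared_extensions(item))
```

Here <dc:creator> passes because it is defined in the Dublin Core namespace, while the bare <mood> element is reported.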

Funky RSS

At the same time that Winer added extensions to RSS, he also made all of the subelements of <item> optional. You must specify either a title or a description, but nothing else is required. Because of this, some users started to substitute elements from other XML specifications, such as Dublin Core, for the optional standard elements. For example, they started using the Dublin Core <dc:date> instead of the native RSS <pubDate>. And some started to use the Content Module <content:encoded> element to include item content instead of using the native RSS <description> element. Winer discourages this practice because it makes parsing RSS more complex. He calls newsfeeds that employ it funky, but such newsfeeds are perfectly valid according to the RSS 2.0 specification. Unfortunately, funky RSS is a fact of life, and if you are writing an RSS parser, you'll have to take it into account. We'll show you how to do this in chapter 5.
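A parser can cope with funky feeds by trying the native element first and then falling back to the substituted one. The Python sketch below shows that idea and is not the approach presented in chapter 5. The first_text() and read_item() helpers and the sample item are hypothetical. The order of preference is also an assumption: some parsers prefer <content:encoded> over <description> because it usually holds the full text.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the Dublin Core and Content modules.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical funky item: no <pubDate> and no <description>.
FUNKY_ITEM = """
<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <title>Funky entry</title>
  <dc:date>2005-06-01T12:00:00Z</dc:date>
  <content:encoded><![CDATA[<p>Full text here.</p>]]></content:encoded>
</item>
"""


def first_text(item, paths):
    """Return the text of the first path that yields non-empty text."""
    for path in paths:
        text = item.findtext(path, namespaces=NS)
        if text and text.strip():
            return text.strip()
    return None


def read_item(item):
    """Read an item, trying native elements before funky substitutes."""
    return {
        "title": first_text(item, ["title"]),
        # The date is returned as a raw string; see the note below.
        "date": first_text(item, ["pubDate", "dc:date"]),
        "content": first_text(item, ["description", "content:encoded"]),
    }


entry = read_item(ET.fromstring(FUNKY_ITEM))
print(entry)
```

Note that the two date elements use different formats (<pubDate> follows RFC 822, while <dc:date> normally uses the ISO 8601 style shown here), so a real parser would still need to normalize the returned string.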

4.4 The nine incompatible versions of RSS

After learning the history of the RSS specifications, the fork, and the funkiness, you may not be too surprised to learn that parsing RSS is tricky. The RSS specifications on both sides of the fork are informal and simple—perhaps too simple. Simplicity and informality can be virtues, but for specifications, they cause problems. No version of RSS has gone through a rigorous standardization process, and it shows. An influential blogger named Mark Pilgrim has been following the development of RSS closely, and he has made some important contributions. Working with Sam Ruby, another influential blogger, Pilgrim developed a newsfeed validation service at https://www.feedvalidator.org/ that handles all of the commonly used RSS and Atom newsfeed formats. He also wrote one of the best newsfeed parsers available, the Universal Feed Parser, which we'll cover in chapter 5. Pilgrim pointed out that there were nine incompatible versions of RSS. Table 4.1 summarizes these incompatible versions and the author, date, and status of each.

Table 4.1: The nine incompatible versions of RSS

For more information on each of these versions of RSS, see the specifications found on the Web at the following addresses:

From a developer's perspective, the RSS situation looks like a nightmare, but it's really not that bad. The good news is that if you stick to the basic elements (<item>, <title>, <description>, <pubDate>, and <link>) or you use a good parsing library, you'll be able to parse RSS with relative ease. We'll show you how to do it in the next chapter. The even better news is that help is on the way, and its name is Atom.


This excerpt is taken from Chapter 4 of RSS and Atom in Action, written by Dave Johnson, and published by Manning Publications Co., Copyright © 2006 Manning Publications Co. All rights reserved.
