The Anatomy of an RSS Feed | WebReference

By Kris Hadlock


RSS has become the standard data format for communicating news, updates or any other type of information that a company or individual wants to syndicate to a large audience. The name is an acronym for Really Simple Syndication. RSS is an XML format made up of designated elements that are the same in every RSS feed and that conform to the XML 1.0 specification. Keeping these elements consistent is what gives RSS aggregators a standardized data format to consume. In this article we'll take a look at the elements in this structure.

An RSS feed always starts with an <rss> element, whose version attribute specifies the version of RSS that the feed conforms to. Here we focus on the RSS 2.0 format because it's the most commonly used today.

The <rss> element has a single child, the <channel> element. This element contains everything else in the feed: the information that describes the feed and the content itself.
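As a minimal sketch of this outer structure, the following Python snippet uses the standard library's xml.etree.ElementTree module to build an empty feed skeleton. The function name build_empty_feed is hypothetical and exists only for illustration.

```python
import xml.etree.ElementTree as ET


def build_empty_feed(version="2.0"):
    """Return an <rss> element with a version attribute and one empty <channel>."""
    rss = ET.Element("rss", {"version": version})
    # The <channel> element is the single child of <rss>.
    ET.SubElement(rss, "channel")
    return rss


feed = build_empty_feed()
# Serialize the skeleton to a string so the nesting is visible.
print(ET.tostring(feed, encoding="unicode"))
```

The skeleton is not yet a usable feed, because the channel is still empty, but it shows the nesting that every RSS 2.0 document shares.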

The feed itself is described by elements placed at the beginning of the <channel>. The required <channel> elements are <title>, <link> and <description>. Optional channel elements are <language>, <copyright>, <managingEditor>, <webMaster>, <pubDate>, <lastBuildDate>, <category>, <generator>, <docs>, <cloud>, <ttl>, <image>, <rating>, <textInput>, <skipHours> and <skipDays>.

  • language – The language of the content in the channel.
  • copyright – The copyright notice for the content of the channel.
  • managingEditor – An e-mail address for the editorial content producer.
  • webMaster – An e-mail address for the webmaster.
  • pubDate – A date that represents the publication date for the content in the channel.
  • lastBuildDate – The last date and time that the content was changed.
  • category – Specifies one or more categories that the channel belongs to.
  • generator – The program that created the channel.
  • docs – URL for the documentation for the format of the RSS feed.
  • cloud – Identifies a "cloud" service that aggregators can register with to be notified when the channel is updated.
  • ttl – Stands for time to live; the number of minutes that the channel can be cached before it is refreshed from the source.
  • image – Specifies an image file to be displayed in the channel.
  • rating – PICS rating for the channel.
  • textInput – A text input field that can be displayed with the channel.
  • skipHours – A hint that tells aggregators which hours of the day they can skip when checking the channel.
  • skipDays – A hint that tells aggregators which days of the week they can skip when checking the channel.
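The channel elements above can be read with a few lines of code. The sketch below, again in Python with xml.etree.ElementTree, checks for the three required elements and reads a handful of the optional ones. The sample feed, the example.com URL and the describe_channel function are all assumptions made for illustration, not part of any particular aggregator.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")

# Hypothetical feed used only for this example.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://www.example.com/</link>
    <description>Posts from a hypothetical weblog</description>
    <language>en-us</language>
    <ttl>60</ttl>
    <skipHours><hour>0</hour><hour>1</hour></skipHours>
  </channel>
</rss>"""


def describe_channel(feed_xml):
    """Summarize the channel-level elements of an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        raise ValueError("expected an <rss> element with a <channel> child")
    # Collect the names of any required elements that are absent.
    missing = [tag for tag in REQUIRED_CHANNEL_ELEMENTS if channel.find(tag) is None]
    ttl_text = channel.findtext("ttl")
    return {
        "version": root.get("version"),
        "missing_required": missing,
        "language": channel.findtext("language"),
        # <ttl> is expressed in minutes; None means the element is absent.
        "ttl_minutes": int(ttl_text) if ttl_text else None,
        # <skipHours> holds <hour> children numbered 0 through 23.
        "skip_hours": [int(hour.text) for hour in channel.findall("skipHours/hour")],
    }


info = describe_channel(SAMPLE_FEED)
print(info)
```

Optional elements that are absent simply come back as None or an empty list, which mirrors the way an aggregator has to treat them: present in some feeds, missing in others.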

The content of an RSS feed is grouped into items. An item could be a news story from a news Web site, a blog post from a weblog and so on. The following feed consists of a single item from a weblog, which represents one post. Typically an RSS feed for a weblog has multiple items, one for each post on the blog. Following is an example of the RSS feed data that can be found in a blog.
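As an illustration, a hypothetical weblog feed with a single item might look like the XML embedded in the Python sketch below. Every title, URL and date in it is invented for the example. The snippet parses the XML with xml.etree.ElementTree to confirm that it is well formed and to list the items it contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical weblog feed: one channel holding one item (a single post).
WEBLOG_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://www.example.com/blog/</link>
    <description>A hypothetical weblog</description>
    <item>
      <title>First Post</title>
      <link>http://www.example.com/blog/first-post</link>
      <description>The body of the first post.</description>
      <pubDate>Thu, 14 Sep 2006 09:00:00 GMT</pubDate>
      <guid>http://www.example.com/blog/first-post</guid>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(WEBLOG_FEED).find("channel")
items = channel.findall("item")  # each <item> is one post
post_titles = [item.findtext("title") for item in items]

print(f"{channel.findtext('title')}: {len(items)} item(s)")
for post_title in post_titles:
    print(" -", post_title)
```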

This feed structure is standard for a weblog, as it consists of the most commonly used elements. The structure is abstract, which is why it can carry any form of information, yet the element names make the meaning of the data straightforward. Each item element can contain the following sub-elements to describe the item in detail:

  • guid – A string that uniquely identifies the item.
  • pubDate – The date that the item was published.
  • title – The title of the item; in this case it's the title of the weblog post.
  • description – The main data for the item; in this case it's the body of the weblog post.
  • link – The full URL of the page on which the item appears in detail.
  • author – The e-mail address of the author of the item.
  • category – Allows the item to be included in one or more categories.
  • comments – The URL of a page that contains comments related to the item.
  • enclosure – Describes a media object, if one is attached to the item.
  • source – The RSS channel that the item came from.
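To show how a program might consume these sub-elements, here is a short Python sketch that turns each item into a small record. The FeedItem class, the parse_items function and the sample feed are hypothetical. Only xml.etree.ElementTree and email.utils.parsedate_to_datetime from the standard library are used, the latter because RSS dates follow the RFC 822 date format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Optional

# Hypothetical feed with one fully described item.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://www.example.com/blog/</link>
    <description>A hypothetical weblog</description>
    <item>
      <guid>http://www.example.com/blog/first-post</guid>
      <pubDate>Thu, 14 Sep 2006 09:00:00 GMT</pubDate>
      <title>First Post</title>
      <description>The body of the first post.</description>
      <link>http://www.example.com/blog/first-post</link>
      <author>author@example.com (Example Author)</author>
      <category>RSS</category>
      <category>XML</category>
      <enclosure url="http://www.example.com/audio/first-post.mp3"
                 length="123456" type="audio/mpeg" />
    </item>
  </channel>
</rss>"""


@dataclass
class FeedItem:
    guid: Optional[str]
    title: Optional[str]
    link: Optional[str]
    description: Optional[str]
    author: Optional[str]
    published: Optional[datetime]
    categories: List[str] = field(default_factory=list)
    enclosure_url: Optional[str] = None


def parse_items(feed_xml):
    """Return one FeedItem per <item> found in the channel."""
    channel = ET.fromstring(feed_xml).find("channel")
    parsed = []
    for node in channel.findall("item"):
        pub_date = node.findtext("pubDate")
        enclosure = node.find("enclosure")
        parsed.append(FeedItem(
            guid=node.findtext("guid"),
            title=node.findtext("title"),
            link=node.findtext("link"),
            description=node.findtext("description"),
            author=node.findtext("author"),
            # Convert the RFC 822 date string into a datetime object.
            published=parsedate_to_datetime(pub_date) if pub_date else None,
            # An item may carry several <category> elements.
            categories=[c.text for c in node.findall("category")],
            # The media URL lives in the enclosure's url attribute.
            enclosure_url=enclosure.get("url") if enclosure is not None else None,
        ))
    return parsed


posts = parse_items(FEED)
for post in posts:
    print(post.title, post.published, post.categories)
```

Every field except the category list is optional in the record, since a given feed may omit any of these sub-elements.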

Conclusion

RSS is a format that's become the standard for syndicating information as data. This allows developers to rely on the structure of the files to create programs that can read or parse the data into a readable format.

About the Author

Kris Hadlock is the co-founder of 33Inc alongside Robert Hoekman. He is the author of Ajax for Web Application Developers and has been a feature writer for numerous Web sites and design magazines.

Original: September 14, 2006