What Is a "404" Error?
When a Web site visitor requests a nonexistent URL from a Web server, the server sends the visitor an error page. This event is recorded as a 404 Not Found error in the Web server log. Encountering an error page is a frustrating experience for a Web site visitor, and studies have indicated this is a leading reason why people leave Web sites.
There are several possible causes for a 404 Not Found error:
- Incorrect or outdated link on one or more of your pages
- Incorrect or outdated link to your site from another site
- Search engine index contains an outdated page
- Outdated user bookmark
- Visitor made error when manually entering a URL
Identify and Fix Incorrect and Outdated Links on Your Pages
Free online link checking services include:
- W3C Link Checker checks links and anchors in Web pages or full Web sites.
- LinkScan/QuickCheck checks one page at a time interactively.
- Dr. Watson checks one page at a time interactively.
Formats vary, but in most cases you will receive a list of links for each page that was checked. The report will show which links produced an error. Some reports only list the bad links; some include additional information about errors they find.
Link checkers may not validate links within scripts or non-HTTP links such as FTP or mailto links. Non-HTTP links do not generate 404 Not Found errors, but are mentioned here because the browser error messages and mail nondelivery messages these types of links generate when they do not work are just as frustrating to site visitors. Because broken non-HTTP links are more difficult to identify, their maintenance requires special attention in site management planning.
Identify Other Web Pages with Old or Broken Links to Your Site
If your server log contains 404 Not Found errors that don't seem to be generated from your own Web pages, they may be the result of links on other Web sites. If you have access to referrer page information in your Web server logs or reports, you can use this information to identify sites with links which have generated 404 Not Found errors on your site.
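As an illustration, assuming your server writes the widely used "combined" log format (which records the referring page just after the status code and response size), a 404 entry caused by an outside link might look like the line below; the IP address, date, and URLs are placeholders:

    203.0.113.5 - - [10/Oct/2023:13:55:36 -0700] "GET /oldpage.html HTTP/1.1" 404 512 "http://www.othersite.com/links.html" "Mozilla/4.0 (compatible)"

The quoted URL near the end of the line is the referrer: the page whose link produced the 404 Not Found error on your site.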
When you identify a Web page with a bad link to your site, either from referrer page information or a search, visit the page and look for the link to your site. Check to see whether it needs to be updated or corrected. If so, look for a contact to whom you can provide the correct link information.
Help the Search Engines
Sometimes you may need to publish Web pages that are expected to have a very short life. For these ephemeral pages, it may be desirable to avoid search engine indexing altogether. Meta robots tags are HTML tags which can be included in a Web page header to instruct search engine robots not to index the page by using the noindex directive. The same tag can also ask search engines not to follow any links from the page by including a nofollow directive. Here is an example of a header:
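A minimal sketch of such a header follows; the page title is just a placeholder:

    <head>
    <title>A Short-Lived Page</title>
    <meta name="robots" content="noindex,nofollow">
    </head>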
Practice Good Web Site Ecology
The obvious way to prevent your URLs from becoming outdated within your own Web site, in links from other Web sites, and in your visitors' bookmarks, is to never change them. Unfortunately, this is more easily said than done.
Even if your site is not a business site, register a domain name for it. If you create your site using an ISP's domain name, and later wish to change ISPs, it may be impossible to direct visitors from your old site location to your new one.
Careful planning of your information space can help reduce the number of URL changes you need to make. Consider the life expectancy of your information in your planning. When information becomes out of date, will you replace it with new information at the same URL? Will you keep it as archival information? Will you replace it with a summary of the old information and a link to newer information? Think of ways to reduce, reuse, and recycle to create URLs that will live forever even if some of the information they represent changes.
When planning ahead doesn't work, redirects can be a useful technique to gently guide your visitors to the information they want in its new location. Some browsers will even update their bookmark database to use the new URL in the future if the user had bookmarked the old URL.
There are two types of redirects: client side and server side.
Client side redirects provide a simple way to transport a visitor to a different page. This method requires replacing each page which has been moved or deleted with its own redirect page. Redirect pages include meta refresh tags in the header section of the document. Because some search engines penalize sites which use refresh tags, it's a good idea to use them together with meta noindex tags.
The example below shows a header that would redirect users to https://www.mysite.com/otherdirectory/otherpage.html:
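A minimal sketch of such a redirect header, combining the refresh tag with a noindex tag as suggested above (the page title is a placeholder), might look like this:

    <head>
    <title>This Page Has Moved</title>
    <meta http-equiv="refresh" content="15;url=https://www.mysite.com/otherdirectory/otherpage.html">
    <meta name="robots" content="noindex">
    </head>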
Client side redirects are processed by the user's browser. The "15" in the meta refresh tag in the example instructs the browser to wait 15 seconds before fetching the new page. It is possible to set this value to 0, but doing so makes it difficult for visitors to return to previously visited pages using their back buttons. For this reason, and because client side redirects are not supported by some older browsers, the body of your redirect page should explain that the requested page has been superseded or moved and provide a link to the new page (the same one used in the refresh tag), including its URL. Redirect pages represent your site just as much as your content pages do. They should be friendly and helpful, and they should conform with the rest of your site design.
Server side redirects instruct your Web server to give visitors a different page when they request a non-existent URL. They are usually implemented at the directory level rather than on a page by page basis as client side redirects are. Server side redirects are processed by the Web server, not visitors' browsers. They can be implemented in different ways on different servers. For example, they may require placing information in the server configuration file, or you may need to place the redirect commands in an .htaccess file. When possible, redirect users to the information they were seeking in the original directory rather than making them look for it from your home page or via a search.
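As one sketch of how this can look, on an Apache server that permits .htaccess overrides, a directory-level redirect can be written with a Redirect directive; the directory names below are placeholders:

    # Send any request under /olddirectory to the same path under /newdirectory,
    # returning a permanent (301) redirect to browsers and search engines.
    Redirect permanent /olddirectory https://www.mysite.com/newdirectory

Other servers provide equivalent features, so check your server's documentation for the exact syntax it uses.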
Make Your URLs Error-Resistant
The best URLs are short and simple. When this is not possible, you can still reduce the chances of typos and other URL problems by avoiding upper-case letters and special characters in your URLs.
Most Web servers treat the URLs "www.mysite.com/myfile.html", "www.mysite.com/MyFile.html", and "www.mysite.com/MYFILE.HTML" as different documents. Using all lower-case characters for directory and file names reduces capitalization errors when people type URLs by hand. Similarly, URLs which contain underscores can be problematic because underscores can look like spaces when viewed online as links.
Other characters should be avoided in file and directory names because they may be interpreted in a special way by the server or the browser and produce different results in a URL than you intended. These include colons (:), forward slashes (/), tildes (~), percent signs (%), at symbols (@), question marks (?), plus signs (+), equal signs (=), ampersands (&), carets (^), curly braces ({ }), square brackets ([ ]), and commas (,).
Give Your Visitors What They Came For
There are a number of techniques you can use to reduce 404 Not Found errors and minimize the frustration that can lose visitors. Some may be more helpful for your site than others. By using these techniques when you organize, create, and maintain your Web pages, you can provide a better experience for the users of your site.
About the Author
Marsha Glassner spent about five years as a webmaster at a federal agency in San Francisco. She has also done "tons" of user training and support, which has had a significant effect on her Web philosophy.