Advanced Web Performance Optimization [cont'd]
A specific caching example
Let's look at a specific example as we build up the caching efficiency for WebSiteOptimization.com's logo, l.gif. First we request the image from Internet Explorer:
To demonstrate the default Apache configuration, we eliminated the cache control directives from our httpd.conf file, and the response was as follows:
This image was last modified June 19, 2004, and will not be changed for some time. It is clear from these response headers that this object does not change frequently and can be safely cached for at least a year into the future. Note the lack of Expires or Cache-Control headers, and the inclusion of an ETag header for the image. Next we'll show how to add cache control headers.
mod_expires and mod_headers. For Apache, mod_expires and mod_headers handle cache control through HTTP headers sent from the server. Because they are installed by default, you only need to configure them. Before adding the following lines, first check that the modules are not already enabled; on many operating systems, they are enabled by default. For Apache 1.3.x, enable the expires and headers modules by adding the following lines to your httpd.conf configuration file:
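A minimal sketch of what these lines typically look like on an Apache 1.3 build with dynamic module support. The libexec/ path is an assumption; it depends on how Apache was compiled and packaged.

```apache
# Apache 1.3.x: load the shared objects, then activate the modules.
# The libexec/ path is an assumption; adjust it to your installation.
LoadModule expires_module libexec/mod_expires.so
LoadModule headers_module libexec/mod_headers.so

# In 1.3, AddModule lines must come after any ClearModuleList directive.
AddModule mod_expires.c
AddModule mod_headers.c
```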
For Apache 2.0, enable the modules in your httpd.conf file like so:
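A minimal sketch for Apache 2.0, assuming the modules were built as shared objects in the usual modules/ directory under ServerRoot (the path is an assumption):

```apache
# Apache 2.0: LoadModule alone is enough (AddModule was removed in 2.0).
# The modules/ path is relative to ServerRoot and is an assumption here.
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
```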
Target files by extension for caching
One quick way to enable cache control headers for existing sites is to target files by extension. Although this method has some disadvantages (notably the requirement of file extensions), it has the virtue of simplicity. To turn on mod_expires, set ExpiresActive to on:
Next, target your website's root HTML directory to enable caching for your site in one fell swoop. Note that the default web root shown in the following code (/var/www/htdocs) varies among operating systems.
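A minimal sketch of such a block, using the expiry values discussed in this section. It assumes mod_expires is loaded; the list of file extensions is an assumption, and access-control directives are omitted because their syntax differs between Apache versions. ExpiresDefault is used inside each FilesMatch section because it is the mod_expires directive that accepts these A/M values.

```apache
# Turn on mod_expires processing.
ExpiresActive On

<Directory "/var/www/htdocs">
    # Default: expire 300 seconds after access.
    ExpiresDefault A300

    # HTML files: 86,400 seconds (one day) after access.
    <FilesMatch "\.html$">
        ExpiresDefault A86400
    </FilesMatch>

    # Images, external JavaScript, and CSS: 31,536,000 seconds (one year).
    # The extension list is an assumption; extend it to match your site.
    <FilesMatch "\.(gif|jpg|png|js|css)$">
        ExpiresDefault A31536000
    </FilesMatch>
</Directory>
```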
ExpiresDefault A300 sets the default expiry time to 300 seconds after access (A) (using M300 would set the expiry time to 300 seconds after file modification). The FilesMatch segment sets the cache control header for all .html files to 86,400 seconds (one day). The second FilesMatch section sets the cache control header for all images, external JavaScript, and Cascading Style Sheet (CSS) files to 31,536,000 seconds (one year).
Note that you can target your files with a more granular approach using multiple directory sections, like this:
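As a rough sketch, separate Directory sections can each carry their own expiry policy. The subdirectory names below are hypothetical, and the block assumes ExpiresActive On has already been set at the server level.

```apache
# Hypothetical layout: static images rarely change, news pages change often.
<Directory "/var/www/htdocs/images">
    # One year after access.
    ExpiresDefault A31536000
</Directory>

<Directory "/var/www/htdocs/news">
    # Five minutes after access.
    ExpiresDefault A300
</Directory>
```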
For truly dynamic content you can force resources not to be cached by setting a maximum age of zero seconds and instructing caches not to store the resource anywhere (or you can set the expiry time to A0 or M0):
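One way to express this with mod_headers is sketched below. The cgi-bin path is a hypothetical location for dynamically generated content, and the block assumes mod_headers is loaded.

```apache
# Hypothetical directory holding dynamically generated content.
<Directory "/var/www/cgi-bin">
    # max-age=0 marks the response as immediately stale;
    # no-store asks browsers and proxies not to keep a copy at all.
    Header set Cache-Control "max-age=0, no-store"
</Directory>
```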
Target files by MIME type. The disadvantage of the preceding method is its reliance on the existence of file extensions. In some cases, webmasters elect to use URIs without extensions for portability. A better method is to use the ExpiresByType directive of the mod_expires module. As the name implies, ExpiresByType targets resources for caching by MIME type, like this:
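A minimal sketch of a MIME-type-based configuration, mirroring the expiry values from the extension-based approach. The exact MIME types and offsets used on the site are not shown here, so the list below is an assumption; in particular, the type Apache assigns to .js files depends on the server's mime.types mapping.

```apache
<Directory "/var/www/htdocs">
    # Let .htaccess files override these settings.
    AllowOverride All

    ExpiresActive On
    # Fallback for any type not listed below.
    ExpiresDefault "access plus 300 seconds"

    # Text likely to change: short offset.
    ExpiresByType text/html "access plus 1 day"

    # Infrequently changing assets: long offset.
    ExpiresByType text/css "access plus 1 year"
    ExpiresByType application/javascript "access plus 1 year"
    ExpiresByType image/gif "access plus 1 year"
    ExpiresByType image/jpeg "access plus 1 year"
    ExpiresByType image/png "access plus 1 year"
</Directory>
```

Replacing access with modification in any of these lines would count from the file's last-modified time instead.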
These httpd.conf directives set the same parameters, only in a more flexible and readable way. For expiry commands you can use access or modified, depending on whether you want to start counting from the last time the file was accessed or from the last time the file was modified. In the case of WebSiteOptimization.com, we chose to use short access offsets for text files likely to change, and longer access offsets for infrequently changing images.
Note the AllowOverride All directive. This allows webmasters to override these settings with .htaccess files for directory-based authentication and redirection. However, overriding the httpd.conf file causes a performance hit because Apache must traverse the directory tree looking for .htaccess files.
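Where a directory needs no per-directory overrides, the lookup cost can be avoided for that part of the tree. A small sketch, using a hypothetical images directory:

```apache
# Hypothetical: this directory needs no per-directory overrides.
# With AllowOverride None, Apache does not look for .htaccess files
# in this directory or below it.
<Directory "/var/www/htdocs/images">
    AllowOverride None
</Directory>
```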
After updating the httpd.conf file with the preceding MIME-based code, we restart the HTTP daemon in Apache for Linux using the service command from the shell prompt (for example, service httpd restart).
Red Hat Enterprise, Fedora, and CentOS all make use of the service command. Note that the commands to restart the HTTP daemon vary among operating systems. On most systems, you can use the apachectl command or the /etc/init.d/apache2 init script to start, stop, or restart Apache. Some administrators choose to do Apache configuration and control entirely through a web interface such as Webmin, or through an OS-specific graphical utility.
HTTP header results. We updated the httpd.conf configuration file with the MIME type code in the preceding section. Let's look at how the headers change when we request the WebSiteOptimization.com logo (l.gif):
The headers for our home page logo now look like this:
As a result, this resource now has cache control headers. We left the ETag in because we use a single server. Note that the Server field is also stripped down to save some header overhead. This is done with the ServerTokens directive:
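The exact value used on the site is not shown here. As a sketch, Prod is the shortest setting ServerTokens offers:

```apache
# Server-wide directive; it is not valid inside <Directory> or .htaccess.
# Prod sends only the product name: "Server: Apache".
# Min adds the version number; Full (the default) also adds the
# operating system and compiled-in module information.
ServerTokens Prod
```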
This minimizes the response header from this:
to the minimal:
Our images are now cacheable for one year. We could eliminate other headers, such as Cache-Control, ETag, and Accept-Ranges, but we don't gain as much by doing so.
Cache control with Microsoft IIS. You can do cache control in Internet Information Server (IIS) by accessing the IIS Manager and setting headers on files or folders. First, navigate with the IIS Manager to the file or directory that you want to target (see Figure 9.6, "Using IIS Manager to set caching policy").
Right-click it, choose Properties, and select the HTTP Headers tab, which holds the content expiration options. Check "Enable content expiration" and then set the appropriate time frame (see Figure 9.7, "Setting content expiration in IIS").
If your site is not organized in directories for cache control optimization, it can be quite cumbersome to set cache control policies for a large number of files. See https://www.port80software.com/support/articles/developforperformance2 for more details about IIS cache control. You can't set cache control headers by MIME type settings with this technique, so Port80 wrote CacheRight to deal with this issue. CacheRight is basically "mod_expires plus" for IIS.
Using mod_cache
With Apache version 2.2, mod_cache has become suitable for production use. mod_cache implements a content cache that you can use to cache local or proxied content.
This improves performance by temporarily storing resources in faster storage. It can use one of two provider modules for storage management:
mod_disk_cache, which implements a disk-based storage manager.
mod_mem_cache, which implements a memory-based storage manager. You can configure mod_mem_cache to operate in two modes: caching open file descriptors or caching objects in heap storage. You can use mod_mem_cache to cache locally generated content or to cache backend server content for mod_proxy when configured using ProxyPass (a.k.a. reverse proxy).
Content is stored in and retrieved from the cache using URI-based keys. Content with access protection is not cached. Example 9.1 shows a sample mod_cache configuration file.
Example 9.1. Sample mod_cache configuration file
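A minimal sketch of what such a configuration might look like on Apache 2.2, modeled on the disk cache sample in the Apache mod_cache documentation. The module paths and the cache root directory are assumptions; in Apache 2.4 the disk provider was renamed mod_cache_disk.

```apache
# Module paths are relative to ServerRoot and are assumptions.
LoadModule cache_module modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

<IfModule mod_cache.c>
    <IfModule mod_disk_cache.c>
        # Hypothetical cache location; it must be writable by Apache.
        CacheRoot /var/cache/apache
        # Cache all URLs at or below / using the disk provider.
        CacheEnable disk /
        # Five levels of subdirectories, three characters per name.
        CacheDirLevels 5
        CacheDirLength 3
    </IfModule>
</IfModule>
```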
CacheDirLevels, set to 5, is the number of directory levels below the cache root that will be included in the cache data. CacheDirLength, set to 3, sets the number of characters in proxy cache subdirectory names.
For more details, see the Apache documentation.
This chapter is an excerpt from the book, Website Optimization: Speed, Search Engine & Conversion Rates Secrets by Andrew B. King, published by O'Reilly Media, Inc., July 2008, ISBN 0596515081, Copyright 2008 O'Reilly Media, Inc.
Andrew B. King is the President of Website Optimization, LLC, a Web performance and search engine marketing firm based in Ann Arbor, Michigan. King is the author of two books on Web site optimization, Website Optimization: Speed, Search Engine & Conversion Rates Secrets, (O'Reilly, 2008), and Speed Up Your Site (New Riders, 2003). Before starting his own company, King worked for Jupitermedia as managing editor of WebReference.com and JavaScript.com.
Original: August 25, 2008