What is HTTP Delta Encoding?
5. Delta encoding and rsync compared
To create a delta-encoded representation of an instance, the server needs both the current instance and some previous instance (ideally one already cached at the client) from which to compute the difference. A server that supports delta encoding must therefore store and manage a set of older instances. Another approach, called rsync [13], avoids the need for the server to have both instances available.
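The bookkeeping this implies can be sketched as follows. This is a minimal illustration, not the mechanism of any real server. The `VersionStore` class, its `respond` method, and the copy/add operation tuples are hypothetical. The store keys old instances by entity tag and evicts the oldest ones. Python's `difflib` stands in for a real delta algorithm such as vcdiff. If the client's cached instance has already been evicted, the server must fall back to sending the full instance.

```python
import difflib
from typing import Optional


class VersionStore:
    """Hypothetical server-side store of recent instances, keyed by entity tag."""

    def __init__(self, max_versions: int = 4):
        self.max_versions = max_versions
        self.versions = {}  # etag -> body; dicts preserve insertion order

    def add(self, etag: str, body: bytes) -> None:
        self.versions[etag] = body
        while len(self.versions) > self.max_versions:
            del self.versions[next(iter(self.versions))]  # evict the oldest instance

    def respond(self, current_etag: str, client_etag: Optional[str]):
        """Return ("delta", ops) if the client's base instance is still stored,
        otherwise ("full", body)."""
        current = self.versions[current_etag]
        base = self.versions.get(client_etag) if client_etag else None
        if base is None:
            return "full", current
        ops = []
        sm = difflib.SequenceMatcher(None, base, current, autojunk=False)
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "equal":
                ops.append(("copy", i1, i2 - i1))  # reuse bytes the client already has
            elif j2 > j1:  # "replace" or "insert": ship the new bytes
                ops.append(("add", current[j1:j2]))
            # "delete" needs no data: those base bytes are simply not copied
        return "delta", ops


def apply_delta(base: bytes, ops) -> bytes:
    """Client side: rebuild the current instance from its cached base."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

The `max_versions` limit shows the trade-off the paragraph describes. Keeping more old instances raises the chance that a client's cached copy can serve as a delta base, at the cost of server storage.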
In rsync, the client starts by segmenting its cache entry into a set of fixed-size blocks and then computes a special kind of checksum for each block. It sends the list of checksums to the server in its request. The server then searches the current instance for blocks with matching checksums and sends an encoding of the current instance that refers to, rather than includes, any block the client already holds. Because the special checksum can be cheaply "rolled" forward one byte at a time, the server can match blocks that appear at arbitrary offsets in the new instance, so the rsync encoding can be very compact if the change is a small insertion.
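The exchange can be sketched in a few functions. This is a simplified model under stated assumptions, not rsync's actual implementation or wire format. The weak checksum follows the rolling (a, b) scheme from the rsync algorithm. MD5 stands in for the strong hash. Only full blocks of the cached copy go into the signature, so its short tail is always resent literally. The function names `client_signature`, `server_encode`, and `client_decode` are hypothetical.

```python
import hashlib

M = 1 << 16  # modulus for each 16-bit half of the weak checksum


def weak_checksum(block: bytes):
    """rsync-style weak checksum (a, b) of one block."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def strong_checksum(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # stand-in strong hash (an assumption)


def client_signature(old: bytes, B: int) -> dict:
    """Client: map weak checksum -> {strong checksum: block index} for full blocks."""
    sig = {}
    for start in range(0, len(old) - B + 1, B):
        block = old[start:start + B]
        sig.setdefault(weak_checksum(block), {})[strong_checksum(block)] = start // B
    return sig


def server_encode(new: bytes, sig: dict, B: int) -> list:
    """Server: ("block", i) for blocks the client holds, ("literal", bytes) otherwise."""
    ops, literal, k = [], bytearray(), 0
    if len(new) >= B:
        a, b = weak_checksum(new[:B])
    while k + B <= len(new):
        candidates = sig.get((a, b))  # cheap weak check first
        idx = candidates.get(strong_checksum(new[k:k + B])) if candidates else None
        if idx is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("block", idx))
            k += B
            if k + B <= len(new):
                a, b = weak_checksum(new[k:k + B])
            continue
        # No match: emit one literal byte and roll the window forward by one byte.
        out_byte = new[k]
        literal.append(out_byte)
        if k + B < len(new):
            a = (a - out_byte + new[k + B]) % M
            b = (b - B * out_byte + a) % M
        k += 1
    literal += new[k:]  # tail shorter than one block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def client_decode(old: bytes, ops: list, B: int) -> bytes:
    """Client: splice cached blocks and literal bytes into the new instance."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * B:(value + 1) * B] if kind == "block" else value
    return bytes(out)
```

Here the server never needs the client's old instance. It sees only the signature, and the rolling update lets it test every byte offset without recomputing a whole block's checksum.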
However, rsync must resend an entire block if even one byte within it changes. Smaller blocks reduce this cost, but the client must then send a longer list of block checksums. Therefore, while rsync can be easier to implement than delta encoding, it might not be as efficient for transferring small differences.
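The block-size trade-off can be made concrete with a rough cost model. The model is an assumption that ignores rsync's real framing and compression. Each checksum is taken to cost a hypothetical 20 bytes (for example, a 4-byte weak plus a 16-byte strong checksum). Each isolated changed byte is assumed to force one whole block to be resent.

```python
import math


def rsync_costs(file_size: int, block_size: int, changed_bytes: int = 1,
                checksum_bytes: int = 20) -> tuple:
    """Rough (request, response) sizes in bytes under the assumed model:
    one checksum per block in the request, one full block per changed byte."""
    request = math.ceil(file_size / block_size) * checksum_bytes
    response = changed_bytes * block_size
    return request, response


def best_block_size(file_size: int, changed_bytes: int = 1,
                    checksum_bytes: int = 20) -> int:
    # Minimizing (N/B)*c + k*B over B gives B = sqrt(N*c/k).
    return round(math.sqrt(file_size * checksum_bytes / changed_bytes))


for B in (100, 1000, 10000):
    req, resp = rsync_costs(100_000, B)
    print(f"block={B:>6}  request={req:>6}  response={resp:>6}  total={req + resp}")
```

For a 100,000-byte file with a single changed byte, this model gives totals of 20,100, 3,000, and 10,200 bytes for 100-, 1,000-, and 10,000-byte blocks. The optimum sits near 1,414-byte blocks. A delta encoder holding both instances could describe the same one-byte change in a handful of bytes.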
6. Delta encoding algorithms and formats
The key to success with delta encoding is generating a very compact representation of the difference between two similar files. A naive approach would be to use a simple text-based format such as the output of the UNIX "diff -e" command, but this does not work for non-text inputs and works relatively poorly even for text files. Compressing the output helps a little, but because "diff" outputs every changed line in its entirety, it cannot produce compact results for single-byte changes.
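The cost of line-oriented differencing can be illustrated by comparing how much new material a line-level diff and a byte-level diff must carry for a one-byte change. This sketch does not emulate "diff -e" itself. It uses Python's `difflib.SequenceMatcher` at both granularities as a stand-in, and `changed_size` is a hypothetical helper.

```python
import difflib


def changed_size(old, new) -> int:
    """Characters of new material a copy/insert delta must ship.
    Works on strings (units are characters) or lists of lines."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(len(item)
               for tag, _, _, j1, j2 in sm.get_opcodes()
               if tag in ("replace", "insert")
               for item in new[j1:j2])


def line_diff_cost(old: str, new: str) -> int:
    return changed_size(old.splitlines(keepends=True), new.splitlines(keepends=True))


def byte_delta_cost(old: str, new: str) -> int:
    return changed_size(old, new)


old = "a" * 200 + "\nsecond line\n"
new = "a" * 99 + "b" + "a" * 100 + "\nsecond line\n"  # one character changed
print(line_diff_cost(old, new), byte_delta_cost(old, new))
```

The line-level version must resend the whole 201-character first line, while the byte-level version needs only the single changed character. This is the gap that compressing "diff" output cannot close.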
Several formats have been developed specifically for representing deltas between files of any content type. Of the known algorithms, the best (in terms of output compactness, not necessarily encoding or decoding speed) is vcdiff [9, 10].
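The core of a vcdiff-style delta is a short program of instructions that rebuilds the target from a source. The sketch below models only the three instruction types of the VCDIFF format: ADD (literal bytes), COPY (bytes from an address space consisting of the source followed by the target decoded so far), and RUN (one byte repeated). It deliberately omits windows, address caches, and the binary encoding, so it is a simplified model rather than a vcdiff decoder. The `Add`, `Copy`, `Run`, and `decode` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Add:
    data: bytes  # literal bytes appended to the target


@dataclass
class Copy:
    addr: int  # offset into source + target-so-far
    size: int


@dataclass
class Run:
    byte: int  # single byte value repeated `size` times
    size: int


def decode(source: bytes, instructions) -> bytes:
    target = bytearray()
    for inst in instructions:
        if isinstance(inst, Add):
            target += inst.data
        elif isinstance(inst, Run):
            target += bytes([inst.byte]) * inst.size
        elif isinstance(inst, Copy):
            # Copy byte by byte so a COPY may overlap the bytes it is producing.
            for n in range(inst.size):
                a = inst.addr + n
                target.append(source[a] if a < len(source) else target[a - len(source)])
        else:
            raise TypeError(f"unknown instruction: {inst!r}")
    return bytes(target)


print(decode(b"hello world", [Copy(0, 6), Add(b"there"), Run(ord("!"), 3)]))
```

Allowing COPY to address target data already produced lets repeated content within the new file itself be encoded compactly, even with no source at all. For example, `decode(b"", [Add(b"ab"), Copy(0, 6)])` yields `b"abababab"`.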
Delta encoding will be most useful once it is widely supported by Web servers, proxies, and clients, which depends on the adoption of a standard extension to HTTP for delta encoding. Over the past several years, a group of people has been developing such a standard. Although this group is not an official IETF working group, we have been following the IETF standards process and hope eventually to produce a formal IETF standard.
The current specification for basic delta encoding [11] has been submitted as a "Proposed Standard," but it has not yet (as of this writing) been accepted. Acceptance would be only the first of several steps in the IETF standardization process [2], and the design could still change.
There is also a document describing the vcdiff encoding format [10], but it, too, is not yet finished.
We hope that the framework we have developed for delta encoding in HTTP can be extended further, both to support delta clustering (see section 1) and perhaps to support rsync (see section 5).
Written by Jeffrey C. Mogul and
Revised: November 14, 2000
URL: https://webreference.com/internet/software/servers/http/deltaencoding/intro/3.html