Roadmap96: MAP16 - FTP File Compression | WebReference


MAP16: FTP FILE COMPRESSION

"Travel is glamorous only in retrospect."
-- Paul Theroux, quoted in The Observer

Own a personal computer for any amount of time and you will quickly realize that the amount of storage space on your computer is limited. One way to deal with this problem is to use a compression software program that "squishes" your unused computer programs, thus freeing up a little more of your disk space for other programs.

It turns out that storage space problems are not limited solely to personal computers. As the number of files that are available through FTP increases daily, FTP sites are actively looking for ways to squeeze more files into a limited amount of space. The FTP sites usually accomplish this by using file compression.

The good news is that a compressed file takes up a lot less space on the FTP site's computer. The bad news is that a compressed file is absolutely useless until you uncompress (or "decompress") it.

Wait ... it gets worse. Before you can uncompress a file, you have to know what compression method was used to compress the file in the first place. Unfortunately, there is no one standard FTP file compression method; there are HUNDREDS of different file compression methods in use today. :(

If you have to know what compression method was used before you can uncompress a file, how are you ever going to figure out which method was used? Well, it is actually pretty easy:

  1. Most FTP directories have a READ.ME file (or README, README.TXT, README-uploads, etc.) that shows an index of all the files that are in that directory. Some really nice FTP sites have expanded READ.ME files that mention what compression method was used and where you can get a free copy of the software needed to uncompress the files.
  2. Look at the files' extensions. By looking at the extensions and comparing them to the chart below, you will be able to determine what compression method was used and what particular software is needed to uncompress the file.

Fortunately, most uncompression software is either public domain (meaning that it is completely free) or shareware (meaning that you can get a copy of it for free, but the author expects you to send him or her some money for the program if you decide to keep it and use it). Best of all, most uncompression software is available through FTP! :)

The list below shows some of the most popular extensions that you are bound to encounter during your visits to FTP sites around the world. For each extension, it also shows the transfer mode needed to retrieve the file, the software package you need to uncompress the file after you retrieve it, and some additional comments.

Paraphrasing something I said in MAP01, I want you to be aware that the one compression method that is not listed below is going to be the one compression package that you ADORE. Please do not take this personally. There are literally HUNDREDS of compression methods in use today and there is no way that I can list all of them.

SUGGESTION: Save the following list, and use it as a reference tool when you encounter an extension that you have never seen before. Also, please notice that the following list talks about "archie". Archie is an FTP search tool that we will discuss tomorrow.

The following list was adapted, with permission, from "The EFF's Guide to the Internet."

FILE           TRANSFER UNCOMPRESS
EXTENSION      MODE     PACKAGE      ADDITIONAL COMMENTS
------------   ------   ----------   -------------------
 
.txt or .TXT   ASCII                 By itself, this means the file is
                                     a document rather than a program,
                                     and does not need to be 
                                     uncompressed.
 
.ps or .PS     ASCII                 A PostScript document (in Adobe's
                                     page description language).  You 
                                     can print this file on any 
                                     PostScript-capable printer or use 
                                     a previewer, like the GNU 
                                     project's GhostScript.
 
.doc or .DOC   ASCII                 Another common extension for text
                                     documents.  Be careful, though: 
                                     .doc and .DOC extensions are also 
                                     sometimes used for Microsoft Word 
                                     documents (which are Binary 
                                     files).  The duck theory will 
                                     help you determine the
                                     difference.  No decompression is 
                                     needed, unless it is followed by:
 
.Z             Binary   uncompress   This indicates a Unix compression
                                     method.  After you download the 
                                     file, you can uncompress it by 
                                     typing
 
                                          uncompress filename.Z
 
                                     and pressing ENTER on your host 
                                     system's command line.  "u16.zip" 
                                     is an MS-DOS program that will
                                     let you download .Z files and 
                                     uncompress them on your own 
                                     computer.  The Macintosh 
                                     equivalent program is called 
                                     MacCompress (use "Archie" to find 
                                     these).
 
.zip or .ZIP   Binary   PKZip or     This indicates the file has been
                        Zip/Unzip    compressed with a common MS-DOS
                                     compression program, known as 
                                     PKZIP (use "Archie" to find 
                                     PKZ204G.EXE or later).  Many 
                                     Unix systems will let you un-ZIP 
                                     a file with a program called 
                                     "unzip".
 
.gz            Binary   gunzip       GNU zip (gzip), a common Unix
                                     compression format.  Despite the
                                     name, it is not the same as the
                                     MS-DOS .zip format.  To
                                     uncompress, type
 
                                          gunzip filename.gz
 
                                     on your host system's command
                                     line.
 
.zoo or .ZOO   Binary   zoo          A Unix and MS-DOS compression
                                     format.  Use a program called
                                     "zoo" to uncompress.
 
.shar or .Shar ASCII    unshar       A Unix shell archive: a plain-text
                                     script that bundles several files.
                                     Use "unshar" to unpack it.
 
.tar           Binary   tar          Another Unix format, often used
                                     to bundle several related files
                                     into one large file (tar itself
                                     does not compress).  All Unix
                                     systems will have a program 
                                     called "tar" for "un-tarring" 
                                     such files.  Often, a "tarred" 
                                     file will also be compressed with 
                                     gzip (.tar.gz or .tgz) or with
                                     compress (.tar.Z), so you first
                                     have to use "gunzip" or
                                     "uncompress" and then "tar".
 
.sit or .Sit   Binary   StuffIt      A Macintosh format that requires
                                     the StuffIt program.
 
.sea or .SEA   Binary   none         A Macintosh format that is a 
                                     self-extracting archive.  No 
                                     decompression program is needed.
 
.bin or .BIN   Binary   MacBinary+   A Macintosh format that requires 
                                     MacBinary+ to uncompress.
 
.ARC           Binary   ARC or       Another MS-DOS format, which
                        ARCE         requires the use of the ARC
                                     or ARCE programs.
 
.LZH           Binary   LHARC        Another MS-DOS format; requires
                                     the use of LHARC.
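
If you would rather let your computer do the looking up, the list above can be turned into a small lookup table. The following is only a sketch, written in Python: the names EXTENSIONS and lookup are made up for this illustration, only a handful of rows from the list are included, and matching is done without regard to upper or lower case (an assumption on my part, since the list shows both forms for most extensions).

```python
# Hypothetical lookup table built from a few rows of the list above.
# Keys are lowercase extensions; values are (transfer mode, uncompress package).
# None means no uncompress package is needed.
EXTENSIONS = {
    ".txt": ("ASCII", None),
    ".ps": ("ASCII", None),
    ".z": ("Binary", "uncompress"),
    ".zip": ("Binary", "PKZip or Zip/Unzip"),
    ".gz": ("Binary", "gunzip"),
    ".zoo": ("Binary", "zoo"),
    ".tar": ("Binary", "tar"),
    ".sit": ("Binary", "StuffIt"),
    ".sea": ("Binary", None),
    ".arc": ("Binary", "ARC or ARCE"),
}


def lookup(filename):
    """Return (mode, package) for the file's last extension, or None if unknown."""
    dot = filename.rfind(".")
    if dot == -1:
        return None  # no extension at all, e.g. "README"
    return EXTENSIONS.get(filename[dot:].lower())


if __name__ == "__main__":
    for name in ("roadmap.tar.gz", "paper.ps", "README"):
        print(name, "->", lookup(name))
```

Notice that only the LAST extension is checked, so a file like "roadmap.tar.gz" is reported as a gunzip file. That matches the order in which you would actually unpack it: gunzip first, then tar.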

There are a few last words of caution from our friends at the EFF:

Check the size of a file before you get it. The Net moves data at phenomenal rates of speed. But that 500,000-byte file that gets transferred to your host system in a few seconds could take more than an hour or two to download to your computer if you're using a 2400-baud modem. Your host system may also have limits on the amount of bytes you can store on-line at any one time.

Also, although it is really extremely unlikely you will ever get a file infected with a virus, if you plan to do much downloading over the Net, you'd be wise to invest in a good anti-viral program, just in case. (1)

Also, if you are a PC user, you'll want to avoid downloading files with extensions like ".sit" or ".hqx". Those are Macintosh files that probably will not run on your PC.
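
To get a feel for the file-size caution above, here is a back-of-the-envelope calculation written as a short Python sketch. It assumes that each byte costs about 10 bits on the phone line (8 data bits plus a start bit and a stop bit) and that the modem really delivers its rated number of bits per second. Both are assumptions of mine, and the function name transfer_seconds is made up for this example.

```python
def transfer_seconds(size_bytes, bits_per_second, bits_per_byte=10):
    """Best-case transfer time in seconds.

    bits_per_byte=10 is an assumption: 8 data bits plus start and stop bits.
    Protocol overhead, retries, and line noise are ignored.
    """
    return size_bytes * bits_per_byte / bits_per_second


if __name__ == "__main__":
    seconds = transfer_seconds(500_000, 2400)
    print(f"Best case: about {seconds / 60:.0f} minutes")
```

Under these ideal assumptions a 500,000-byte file needs roughly 35 minutes at 2400 bits per second. Treat that as a floor, not a promise: noisy lines, protocol overhead, and slower modems all push the real figure higher.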

 

FTPMAIL AND BINARY FILES

In MAP15, I showed you that it is possible to get FTP files using e-mail by sending an e-mail letter to an FTPmail server with the following commands in the body of your e-mail letter:

          reply <your Internet address>
          connect <FTP site address>
          <transfer mode>
          chdir <directory>
          get <filename>
          quit

Before I introduce you to the new stuff, there are a couple of things that I want to review with you.

First, the

          reply <your Internet address>

command tells the FTPmail server where you want the file sent. If you use the example that I gave you yesterday

          reply [email protected]
          connect rs.internic.net
          ascii
          chdir /internic/faq
          get roadmap.faq
          quit

without changing the "reply" address, FTPmail is going to send the file to *ME*, not to you. Please remember to change the "reply" line to include *YOUR* Internet e-mail address.

Also, I did not mention this yesterday, but FTPmail limits you to only one CHDIR command per letter. Finally, yesterday I asked you to contact your local Internet Service Provider to see if they place any size limits on file transfers. If they do, there is an additional command that you need to add to your list of commands:

          chunksize <size>

This command will break the files into chunks that your system can handle. If your system has a 50,000-character limit on messages from the Internet, your chunksize command should be

          chunksize 49000

(you want to make sure that your chunksize is smaller than your system's limit). This command will break your file into 49,000-character chunks and will then send the chunks to you.
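
If you are curious how many e-mail letters a given file will turn into, the arithmetic is simple: divide the file size by the chunksize and round up. Here is a small Python sketch of that idea. The function names and the 1,000-character safety margin are my own choices for illustration (the margin simply reproduces the 50,000 to 49,000 example above), and the sketch assumes that FTPmail fills every chunk except the last one completely.

```python
def pick_chunksize(limit, margin=1000):
    """Choose a chunksize safely below the system's message limit.

    The 1000-character margin is an assumption for this example.
    """
    return limit - margin


def chunk_count(file_size, chunksize):
    """Number of chunks needed, rounding up (integer arithmetic only)."""
    return -(-file_size // chunksize)


if __name__ == "__main__":
    size = pick_chunksize(50_000)
    print("chunksize", size)
    print("a 120,000-character file arrives in", chunk_count(120_000, size), "chunks")
```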

You already know how to retrieve ASCII files using FTPmail. Today, I am going to show you how to retrieve Binary files using FTPmail.

Binary file transfers using FTPmail are not difficult ... they just require a few additional steps. Because all e-mail has to be in ASCII form, FTPmail has to encode your Binary file into ASCII before it can e-mail the file to you. Once you get the file, you can then decode the file back into Binary.

Fortunately, there are two ways that FTPmail can encode Binary files into ASCII. The first way it can do this is through something called "uuencode." As long as you have a "uudecode" program -- and "uudecode" programs are all over the place (chances are your site has "uudecode" stored on its system) -- the whole process is simple. The second encoding type that you can use is called "btoa" (Binary to ASCII). Your local Internet Service Provider will be able to tell you a little more about "btoa".
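
To see what "encoding Binary into ASCII" looks like in practice, here is a minimal Python sketch that uses the standard library's binascii module, which can encode and decode individual uuencoded lines. The function names are made up for this example. A real uuencoded file also carries a "begin" line with the filename and an "end" line; this sketch leaves those out and only shows the round trip of the data itself.

```python
import binascii


def uuencode_bytes(data):
    """Encode bytes as uuencoded lines (no 'begin'/'end' wrapper).

    binascii.b2a_uu accepts at most 45 bytes per call, so the data
    is processed in 45-byte slices, one output line per slice.
    """
    lines = []
    for start in range(0, len(data), 45):
        lines.append(binascii.b2a_uu(data[start:start + 45]))
    return b"".join(lines)


def uudecode_bytes(text):
    """Decode lines produced by uuencode_bytes back into the original bytes."""
    return b"".join(
        binascii.a2b_uu(line) for line in text.splitlines(keepends=True)
    )


if __name__ == "__main__":
    original = bytes(range(256))          # every possible byte value
    encoded = uuencode_bytes(original)
    print(encoded.decode("ascii"))        # printable characters only
    print("round trip ok:", uudecode_bytes(encoded) == original)
```

The encoded text is noticeably longer than the original data, because every 3 bytes of Binary become 4 printable characters. Keep that growth in mind when you pick a chunksize.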

So, to get ASCII files using FTPmail, you would use the following commands in the body of your letter to the FTPmail address:

          reply <your Internet address>
          connect <FTP site address>
          ascii
          chdir <directory>
          chunksize <size>
          get <filename>
          quit

and to get Binary files using FTPmail, you would use the following commands in the body of your letter to the FTPmail address:

          reply <your Internet address>
          connect <FTP site address>
          <uuencode or btoa>
          chdir <directory>
          binary
          chunksize <size>
          get <filename>
          quit
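
If you send a lot of FTPmail requests, you could generate the body of your letter instead of typing it each time. The following Python sketch simply reproduces the two command layouts shown above, in the same order. The function name ftpmail_request and the sample values are hypothetical, and "you@example.com" is only a placeholder for YOUR Internet address.

```python
def ftpmail_request(reply, site, directory, filename,
                    binary=False, encoding="uuencode", chunksize=None):
    """Build the body of an FTPmail letter, following the layouts above."""
    lines = [f"reply {reply}", f"connect {site}"]
    if binary:
        # Binary layout: encoding type, then chdir, then "binary".
        lines.append(encoding)            # "uuencode" or "btoa"
        lines.append(f"chdir {directory}")
        lines.append("binary")
    else:
        # ASCII layout: "ascii", then chdir.
        lines.append("ascii")
        lines.append(f"chdir {directory}")
    if chunksize is not None:
        lines.append(f"chunksize {chunksize}")
    lines.append(f"get {filename}")
    lines.append("quit")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder address; the site, directory, and file come from the
    # ASCII example earlier in this lesson.
    print(ftpmail_request("you@example.com", "rs.internic.net",
                          "/internic/faq", "roadmap.faq", chunksize=49000))
```

Only one directory is accepted on purpose, since FTPmail limits you to one CHDIR command per letter.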

 

HOMEWORK

Take a break. You have earned it. :)

SOURCES

(1) "The EFF's Guide to the Internet." Reprinted by permission.




Originally written by Patrick Douglas Crispen