Drag and Drop CGI
Chapter 12
A Perl-based Web Site Search Engine
Features of the ICE Script
- Returns links to pages matching search criteria
- Searches multiple directories
- Thesaurus feature for synonyms and abbreviations
- Allows ``and'' and ``or'' combinations of search words
- Very simple installation
- Runs on UNIX, Windows, and the Mac
- Very simple to generate search indexes
- Suitable for Web sites of up to a few thousand documents
What You Need to Use This Script
- Perl version 4 or 5
- Text editor
- FTP client
- Telnet client
Why You Would Want to Use This Script
We'll go on record as saying that any text-heavy site over ten full pages in length would benefit from a search engine. Before you run off and implement this script, though, take a moment to think about your objectives in building the site. Sites like the following probably do not require a search engine:
- A personal ``vanity'' page
- A graphics-heavy portfolio site for a sole proprietor, a small service provider, or a product provider
- A straightforward ``storefront'' site consisting mostly of graphics and some marketing copy (even if the site is 10 to 15 pages in length)
- The typical converted-brochure, four-page ``gotta-have-a-Web-site-now'' presence with which many small companies begin their Internet life
A logical layout complemented by a thoughtfully designed index and/or visual sitemap will probably do the job for sites like these.
Sites like the following, on the other hand, would benefit from a search engine:
- A site that offers extensive product or technical support content
- A site featuring dynamic content (e.g., catalogues that have products and descriptions that change frequently)
- Any database-driven site, particularly those used in corporate intranets
- A Web site of any size that visitors will need to search for particular words or concepts
To ensure good performance at runtime (regardless of Web site size), we selected a search engine that relies on a precompiled index. This has a couple of implications for creating and maintaining the site. First, graphics and ``artistic text'' (graphics that contain words) are not captured when indexing Web pages. And second, the index must be regenerated whenever new site content is added. In either case, the ease-of-use benefits to the site visitor outweigh the disadvantages.
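To make the idea of a precompiled index concrete, here is a minimal sketch in Perl 5. It is not ICE's actual code or index format. The page names, the sample text, and the build_index and search routines are all hypothetical, invented only to illustrate the technique. The point is that the expensive work (reading every page) happens once, ahead of time, and each query consults only the finished word list. It also shows why the index goes stale when pages change, and why words that exist only inside graphics never make it in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page contents; a real indexer would read the files from disk.
my %pages = (
    'index.html'   => 'Welcome to our support site',
    'install.html' => 'How to install the search script',
    'faq.html'     => 'Support questions about the search index',
);

# Build step (run ahead of time): map each lowercased word to the
# set of pages that contain it.
sub build_index {
    my ($pages) = @_;
    my %index;
    for my $file (keys %$pages) {
        for my $word (map { lc($_) } $pages->{$file} =~ /(\w+)/g) {
            $index{$word}{$file} = 1;
        }
    }
    return \%index;
}

# Query step (run per request): consult only the index, never the pages.
# $mode is 'and' (every word must match) or 'or' (any word may match).
sub search {
    my ($index, $mode, @words) = @_;
    my %hits;
    for my $word (map { lc($_) } @words) {
        $hits{$_}++ for keys %{ $index->{$word} || {} };
    }
    my $needed = $mode eq 'and' ? scalar(@words) : 1;
    my @found  = grep { $hits{$_} >= $needed } keys %hits;
    return sort @found;
}

my $index = build_index(\%pages);

print join(', ', search($index, 'and', 'support', 'search')), "\n";
# prints: faq.html
print join(', ', search($index, 'or', 'install', 'welcome')), "\n";
# prints: index.html, install.html
```

If a new page were added to %pages after build_index had run, search would not find it until the index was rebuilt. That is the maintenance cost described above.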
"This code is free, but copyrighted. No liability whatsoever is accepted for any loss or damage of any kind resulting from any defect or inaccuracy in this information or code.
Feel free to modify the forms' front-end according to your needs, but please leave the pointer to the ICE homepage in there, so that people can always find an up to date version of the software.
If you really like ICE, or if you want to inspire new features or enhancements, you may feel free to send me a token of appreciation. This could be a sample of your favorite beer, music, or literature - or even a postcard from your home town.
If you use ICE on a professional server, or install it as a commercial service for your customer, I'd appreciate a small shareware fee."
We decided to use ICE instead of writing our own search script for several reasons. First, it was freely distributable, and Herr Neuss gave us permission to include it in this book. Second, it is written in Perl, while most other search engines use C. Third, and most important, it already exists and it has been real-world tested.
Nerd Note
One of the cardinal rules of the Internet and programming:
Never code yourself something you can snarf off the Net for free.
We enthusiastically subscribe to this idea, as you can see from the various scripts included in this book. To loosely paraphrase a famous dead guy: "If I reach great heights, it is only because I stand on the shoulders of giants" (Sir Isaac Newton, circa 1676).
Copyright © 1997 Addison-Wesley Pub Co. and
Created: Oct. 24, 1997
Revised: Oct. 27, 1997
URL: https://webreference.com/dev/dndcgi/start.html