March 18, 2000 - The trie Data Structure
March 18, 2000 The trie Data Structure Tips: March 2000
Yehuda Shiran, Ph.D.
|
The trie data structure is based on two principles: a fixed set of indices and hierarchical indexing. The first requirement is usually met when you can index the data by some or all of the digits 0 to 9, or by the alphabet. For example, let's take the ten digits 0 to 9. At the top level you have a 10-element array. Each of these array's elements may point to another 10-element array, and so on. If you index the data by the alphabet, there will be 26 elements at the top level array, and each element will point to another 26-element array, and so on.
One of the advantages of the trie data structure is that its hierarchy depth depends on the amount of data stored in it. Each element of data is stored at the highest level of the hierarchy that still allows a unique retrieval. As the data structure getting loaded with data, former data elements lose their uniqueness and need to be pushed down the hierarchy. Thus, the more data being loaded the deeper the hierarchy will be.
Let's take an example. Suppose we have an element that can be identified by the sequence 8376. Assume also that the trie data structure consists of the trie[]
array at the top level. If 8376 is the only data element to be stored, it can be stored as trie[8]
. Assume now that another element labeled as 8453 is added. To make room for the second element, we allocate another 10-element array and have trie[8]
point to it. We push 8376 down and now is indexed as trie[8][3]
, while the new element, 8453 will be stored under trie[8][3]
. Storing all eight presidents is explained on the next page.
Learn about our trie-based dialer in Column 57, The Doc Dialer, Part I: Introduction.