Performance Optimizations for High Speed JavaScript / Page 3

Performance Optimizations for High Speed JavaScript [con't]

Technique 3: Avoid adding short strings to long strings

Our previous result looked good--very good, in fact. The improved function stringFill2() is much faster due to the use of our first two optimizations. Would you believe it if I told you that it can be improved to be many times faster than it is now?

Yes, we can accomplish that goal. Right now we need to explain how we avoid appending short strings to long strings.

The short-term behavior appears to be quite good, in comparison to our original function. Computer scientists like to analyze the "asymptotic behavior" of a function or computer program algorithm, which means to study its long-term behavior by testing it with larger inputs. Sometimes without doing further tests, one never becomes aware of ways that a computer program could be improved. To see what will happen, we're going to create a 200-byte string.

The problem that shows up with `stringFill2()`

Using our timing function, we find that the time increases to 62.54 microseconds for a 200-byte string, compared to 27.68 for a 100-byte string. It seems like the time should be doubled for doing twice as much work, but instead it's tripled or quadrupled. From programming experience, this result seems strange, because if anything, the function should be slightly faster since work is being done more efficiently (200 bytes per function call rather than 100 bytes per function call). This issue has to do with an insidious property of JavaScript strings: JavaScript strings are "immutable."

Immutable means that you cannot change a string once it's created. By adding on one byte at a time, we're not using up one more byte of effort. We're actually recreating the entire string plus one more byte.

In effect, to add one more byte to a 100-byte string, it takes 101 bytes worth of work. Let's briefly analyze the computational cost for creating a string of N bytes. The cost of adding the first byte is 1 unit of computational effort. The cost of adding the second byte isn't one unit but 2 units (copying the first byte to a new string object as well as adding the second byte). The third byte requires a cost of 3 units, etc.

C(N) = 1 + 2 + 3 + ... + N = N(N+1)/2 = O(N²). The symbol O(N²) is pronounced Big O of N squared, and it means that the computational cost in the long run is proproportional to the square of the string length. To create 100 characters takes 10,000 units of work, and to create 200 characters takes 40,000 units of work.

This is why it took more than twice as long to create 200 characters than 100 characters. In fact, it should have taken four times as long. Our programming experience was correct in that the work is being done slightly more efficiently for longer strings, and hence it took only about three times as long. Once the overhead of the function call becomes negligible as to how long of a string we're creating, it will actually take four times as much time to create a string twice as long.

(Historical note: This analysis doesn't necessarily apply to strings in source code, such as html = 'abcd\n' + 'efgh\n' + ... + 'xyz.\n', since the JavaScript source code compiler can join the strings together before making them into a JavaScript string object. Just a few years ago, the KJS implementation of JavaScript would freeze or crash when loading long strings of source code joined by plus signs. Since the computational time was O(N^2) it wasn't difficult to make Web pages which overloaded the Konqueror Web browser or Safari, which used the KJS JavaScript engine core. I first came across this issue when I was developing a markup language and JavaScript markup language parser, and then I discovered what was causing the problem when I wrote my script for JavaScript Includes.)

Clearly this rapid degradation of performance is a huge problem. How can we deal with it, given that we cannot change JavaScript's way of handling strings as immutable objects? The solution is to use an algorithm which recreates the string as few times as possible.

To clarify, our goal is to avoid adding short strings to long strings, since in order to add the short string, the entire long string also must be duplicated.

How the algorithm works to avoid adding short strings to long strings

Here's a good way to reduce the number of times new string objects are created. Concatenate longer lengths of string together so that more than one byte at a time is added to the output.

For instance, to make a string of length N = 9:

Doing this required creating a string of length 1, creating a string of length 2, creating a string of length 4, creating a string of length 8, and finally, creating a string of length 9. How much cost have we saved?

Old cost C(9) = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 9 = 45.

New cost C(9) = 1 + 2 + 4 + 8 + 9 = 24.

Note that we had to add a string of length 1 to a string of length 0, then a string of length 1 to a string of length 1, then a string of length 2 to a string of length 2, then a string of length 4 to a string of length 4, then a string of length 8 to a string of length 1, in order to obtain a string of length 9. What we're doing can be summarized as avoiding adding short strings to long strings, or in other words, trying to concatenate strings together that are of equal or nearly equal length.

For the old computational cost we found a formula N(N+1)/2. Is there a formula for the new cost? Yes, but it's complicated. The important thing is that it is O(N), and so doubling the string length will approximately double the amount of work rather than quadrupling it.

The code that implements this new idea is nearly as complicated as the formula for the computational cost. When you read it, remember that >>= 1 means to shift right by 1 byte. So if n = 10011 is a binary number, then n >>= 1 results in the value n = 1001.

The other part of the code you might not recognize is the bitwise and operator, written &. The expression n & 1 evaluates true if the last binary digit of n is 1, and false if the last binary digit of n is 0.

New highly-efficient `stringFill3()` function

It looks ugly to the untrained eye, but it's performance is nothing less than lovely.

Let's see just how well this function performs. After seeing the results, it's likely that you'll never forget the difference between an O(N²) algorithm and an O(N) algorithm.

(See Figure 3)

stringFill1() takes 88.7 microseconds (millionths of a second) to create a 200-byte string, stringFill2() takes 62.54, and stringFill3() takes only 4.608. What made this algorithm so much better? All of the functions took advantage of using local function variables, but taking advantage of the second and third optimization techniques added a twenty-fold improvement to performance of stringFill3().

Deeper analysis

What makes this particular function blow the competition out of the water?

As I've mentioned, the reason that both of these functions, stringFill1() and stringFill2(), run so slowly is that JavaScript strings are immutable. Memory cannot be reallocated to allow one more byte at a time to be appended to the string data stored by JavaScript. Every time one more byte is added to the end of the string, the entire string is regenerated from beginning to end.

Thus, in order to improve the script's performance, one must precompute longer length strings by concatenating two strings together ahead of time, and then recursively building up the desired string length.

For instance, to create a 16-letter byte string, first a two byte string would be precomputed. Then the two byte string would be reused to precompute a four-byte string. Then the four-byte string would be reused to precompute an eight byte string. Finally, two eight-byte strings would be reused to create the desired new string of 16 bytes. Altogether four new strings had to be created, one of length 2, one of length 4, one of length 8 and one of length 16. The total cost is 2 + 4 + 8 + 16 = 30.

In the long run this efficiency can be computed by adding in reverse order and using a geometric series starting with a first term a1 = N and having a common ratio of r = 1/2. The sum of a geometric series is given by a1 / (1-r) = 2N.

This is more efficient than adding one character to create a new string of length 2, creating a new string of length 3, 4, 5, and so on, until 16. The previous algorithm used that process of adding a single byte at a time, and the total cost of it would be n (n + 1) / 2 = 16 (17) / 2 = 8 (17) = 136.

Obviously, 136 is a much greater number than 30, and so the previous algorithm takes much, much more time to build up a string.

To compare the two methods you can see how much faster the recursive algorithm (also called "divide and conquer") is on a string of length 123,457. On my FreeBSD computer this algorithm, implemented in the stringFill3() function, creates the string in 0.001058 seconds, while the original stringFill1() function creates the string in 0.0808 seconds. The new function is 76 times faster.

The difference in performance grows as the length of the string becomes larger. In the limit as larger and larger strings are created, the original function behaves roughly like C1 (constant) times N², and the new function behaves like C2 (constant) times N.

From our experiment we can determine the value of C1 to be C1 = 0.0808 / (123457)² = .00000000000530126997, and the value of C2 to be C2 = 0.001058 / 123457 = .00000000856978543136. In 10 seconds, the new function could create a string containing 1,166,890,359 characters. In order to create this same string, the old function would need 7,218,384 seconds of time.

This is almost three months compared to ten seconds!

Impressed? Continue to the next page to learn about "buffering."

[prev] [next]