Archive for the ‘performance’ Category

Found an Optimization in my Netflix Code

Tuesday, April 29th, 2008

I have this loop that loops over the 100 million plus entires of the Netflix Prize data set. I had read over the weekend something on reddit.com about performance, and there was a whole thread trying to answer the question: Why does performance matter? Many people were arguing that there is no reason to focus on performance. Some people even showed that by rewriting some examples, it would be slightly faster. Though, in their rewrites, they made the code more unreadable and not as optimal as it could have been. (it involved factoring numbers)

The biggest thing that most people seem to miss about performance is that why question. They think it can require large changes and if the code becomes unreadable, that is the cost of performance. I disagree. Performance can usually be gained with no cost to readability. As for why, because there is no reason to make the machine do more work than is needed to generate the correct answer. Back to my example.

This loop needed to update these 100 million pieces of data. In the loop, there was a number that worked out to be a constant. I had assumed that the compiler would generate the correct code: a divide by a constant can be turned into a multiply by the inverse for a huge speedup. It didn’t generate that code for me. After I made the code change myself, it went from 62 seconds to 47 seconds, just by adding a 1/ and changing the /= to a *=. Seems like that is worth it.

The second big win was from data ordering. In this loop, there is a 2-d array. I had moved this code from a different subsection where it was accessing the array like this: array[constant][variable] - iterating over the right-hand subscript. In this loop though, the access was different: array[variable][constant], which means it was not accessing memory contiguously per iteration. I made another small change, I swapped the subscripts and updated my dataset to reflect this change. After this change, the code went to 25 seconds per iteration.

So, with no changes in readability, I was able to go from 62 to 25 seconds, which is a big win. Why do it? Well, I can now run 2,000 iterations in 14 hours rather than the 34 hours it took before, to generate the exact same result. These are the kinds of important changes people should look for within their own code, once their algorithm works and is readable.

-Edward

Order (n^2) in time

Tuesday, April 22nd, 2008

I was reading a set of posts on reddit about functional programming.  At some point in the thread, the topic went to order of algorithms.  Someone commented that he had implemented something in an order n-squared since he was only using a few elements in the list.  He said that since no one noticed, it didn’t matter.  It was very frustrating to read that.

I wrote about this in my blog when I worked at Adobe and will write a bit about it now.  In all of my travels as a specialist in code optimization, the thing I see the most is people implementing code in O(n^2) that could be written in linear or constant time.  I know Netscape had a few of them when I was working on that project.  Since it is so pervasive, let me give some concrete examples of why/when/how to look for this.

Let’s create a table with elements being the number of elements in question, linear and n-squared being the time it would take to processes that number of elements with that algorithm:

Elements Linear n-squared
1 1 1
10 10 100
100 100 10,000
1,000 1,000 1,000,000

See, it grows pretty fast!  If your n is 1, 10, or even 100, no one will notice a difference in the time.  But, if the number of elements goes a just a bit more, suddenly the time take will be noticeable.  But, there is a catch, sometimes it pays to use n-squared to much higher ranges.  The actual order is M*O(n) + K for linear, and M*O(n^2) + K for n-squared, with M and K different for each.  So, let us create a new table:  1000 * n + 1000 will be the linear, and 1 * n^2 + 1 will be the n-squared:

Elements Linear n-squared
1 2000 2
10 11,000 101
100 101,000 10,001
1,000 1,001,000 1,000,001

In this case, the linear takes more than a thousand elements before it is better than the n-squared algorithm.  M represents something that happens with each operation, like a data copy, while the K represents things like setup time, pre-calculating, etc.

So, the important thing to remember, algorithm selection depends on the dataset it will be used on.  In most cases, the number of elements will grow to be a lot larger than initially expect, so it almost always pays to switch to a linear algorithm if you can find one.  There are rare cases where the number of elements, the setup cost, and per operation cost of the linear doesn’t make it worth using, so you have to be aware of that.  This is only considering the speed cost of an algorithm, there is also the same set of arguments for memory cost.

I’ll leave with this: at one point Netscape’s browser used linked lists with no tail pointer (and it still might), which means the operation to build a linked list would be n-squared.  This saved a 4-byte pointer, and if the lists were small enough, there was no impact on performance.  The easiest way to find n-squared is feed in a thousand or even ten thousand elements and see where the code “freezes” the machine.  Creating a website with a thousand elements would create a thousand link list elements, which would cause a million operations, which would then take many seconds to display while the user is waiting.  Switching it over to linear would have meant that one could create a page with a million elements before seeing this kind of slowdown, and the thousand element page would have been instantaneous from the users perspective.

-Edward