Archive for the ‘Netflix Prize’ Category

Found an Optimization in my Netflix Code

Tuesday, April 29th, 2008

I have this loop that loops over the 100 million plus entires of the Netflix Prize data set. I had read over the weekend something on reddit.com about performance, and there was a whole thread trying to answer the question: Why does performance matter? Many people were arguing that there is no reason to focus on performance. Some people even showed that by rewriting some examples, it would be slightly faster. Though, in their rewrites, they made the code more unreadable and not as optimal as it could have been. (it involved factoring numbers)

The biggest thing that most people seem to miss about performance is that why question. They think it can require large changes and if the code becomes unreadable, that is the cost of performance. I disagree. Performance can usually be gained with no cost to readability. As for why, because there is no reason to make the machine do more work than is needed to generate the correct answer. Back to my example.

This loop needed to update these 100 million pieces of data. In the loop, there was a number that worked out to be a constant. I had assumed that the compiler would generate the correct code: a divide by a constant can be turned into a multiply by the inverse for a huge speedup. It didn’t generate that code for me. After I made the code change myself, it went from 62 seconds to 47 seconds, just by adding a 1/ and changing the /= to a *=. Seems like that is worth it.

The second big win was from data ordering. In this loop, there is a 2-d array. I had moved this code from a different subsection where it was accessing the array like this: array[constant][variable] - iterating over the right-hand subscript. In this loop though, the access was different: array[variable][constant], which means it was not accessing memory contiguously per iteration. I made another small change, I swapped the subscripts and updated my dataset to reflect this change. After this change, the code went to 25 seconds per iteration.

So, with no changes in readability, I was able to go from 62 to 25 seconds, which is a big win. Why do it? Well, I can now run 2,000 iterations in 14 hours rather than the 34 hours it took before, to generate the exact same result. These are the kinds of important changes people should look for within their own code, once their algorithm works and is readable.

-Edward