I have just spent a couple of late evenings hacking away on a piece of performance critical code and I have naturally been profiling it extensively. Here are a few lessons learned that might help some of you doing the same.
Running code under a profiler will affect the performance characteristics In most cases a profiler will give a good indication of your performance bottlenecks, but there are occasions when it will point you in the wrong direction. My biggest mistake was to spend 4+ hours rewriting a small (10 lines) routine over and over again since dotTrace (running in Tracing profiling mode) indicated that it took > 20% of the execution time. It seems as if the Tracing profiling mode in dotTrace will skew the results heavily when making frequent calls to small routines. When running the same code in Sampling mode my small method didn't even show up. Doing "custom profiling" with the Stopwatch class also showed the same result as with Sampling mode. [UPDATE! using the EAP version of dotTrace 4 and running with CPU Instruction tracing the Tracing mode worked a lot better - comparing it to Sampling mode the difference was usually in the range 1 - 2 % which is quite acceptable]
Use real-life data when measuring performance This is a no-brainer, but I was running with highly artificial data for a long time since I was "just in early development stages - will fix it later"-mode.
Don't wait too long with performance testing Use performance testing early in the project to evaluate the efficiency using various approaches to solving the problem at hand. Doing a complete rewrite after you have a nice, working (but too slow) solution is not fun.
Don't optimize too soon This may seem contrary to the previous paragraph, but this is just trying to point out that early performance testing should not make you focus on fine-tuning a few methods, but keep your eyes on the big picture. Keep your code as clean as possible for as long as possible.
I have personally found that "Optimize too soon" is usually where I fail. The simple reason is that it is so much fun comparing measurements and see that you have just cut another 2% in execution time...