A few weeks ago Mark Drew wrote a series of blog entries on the topic of performance. In fairness to Mark, he does state more than once that his primary goal was to compare the relative performance of creating structs versus CFCs versus Java objects. However, it's the comparison of various CFML server engines in two of his blog entries that caught many people's eyes, including mine, and that's what I'd like to comment on here.
As I commented on one of Mark's blog entries, the "loop 10,000 times" style of micro-benchmark he used in his testing is seriously flawed. It's very difficult to draw any broad conclusions or make meaningful generalizations from the results of such tests. Among the problems with these types of tests are:
- Micro-benchmarks can make differences that are small and insignificant seem large and important, to an arbitrary degree. Let's assume the difference between two operations is one-tenth of a millisecond: with 10 iterations the difference is one millisecond--insignificant; with 100 iterations it's 10 milliseconds--still no big deal; but with 10,000 iterations it's 1,000 milliseconds--whoa, that's a big difference! Want to make the difference seem even larger? Just run more iterations. The question is: how many of these operations do you realistically expect to perform during execution of a single request? (See the first sketch after this list.)
- Micro-benchmarks don't reflect real-world applications. They artificially isolate individual operations and give you no context about the relative importance of those operations versus everything else that happens when you execute a request. For example, accessing an external resource--such as a database, web service, LDAP server, mail server, etc.--is going to be hundreds or thousands of times slower (or more!) than executing any tag or function that doesn't touch an external resource. (See the second sketch after this list.)
- Micro-benchmarks don't reflect the real-world execution environment. Most developers work on single-processor computers (though this has been changing recently), and micro-benchmarks are usually executed within a single request. In the real world, most web servers are multi-processor machines, and the most important question is: how does the code behave under load? That is, what does performance look like when multiple requests execute simultaneously?
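To make the first point concrete, here's a minimal sketch (our own illustration, not one of Mark's tests) of this style of self-timed micro-benchmark. Note that the server is measuring itself with getTickCount(), and the headline number scales directly with the iteration count:

<!--- A self-timed micro-benchmark of the kind described above. --->
<!--- The per-operation cost is tiny; the total only looks large --->
<!--- because we multiply it by a big, arbitrary iteration count. --->
<cfset iterations = 10000>
<cfset start = getTickCount()>
<cfloop from="1" to="#iterations#" index="i">
    <cfset st = StructNew()>
    <cfset st.name = "Bob" & i>
</cfloop>
<cfset elapsed = getTickCount() - start>
<cfoutput>#iterations# iterations: #elapsed# ms</cfoutput>

Run it with ten times as many iterations and the reported number grows ten times larger, even though nothing about the per-operation cost has changed.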
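The second point can be illustrated the same way. Here's a sketch comparing one trip to an external resource against a purely in-memory operation; the datasource name "myDSN" and the "person" table are hypothetical stand-ins, not from Mark's tests or ours:

<!--- One round trip to an external resource (hypothetical datasource and table)... --->
<cfset start = getTickCount()>
<cfquery name="qPeople" datasource="myDSN">
    SELECT name, age FROM person
</cfquery>
<cfset queryTime = getTickCount() - start>
<!--- ...versus a purely in-memory operation, which will typically register --->
<!--- at or near 0 ms, below the resolution of the timer. --->
<cfset start = getTickCount()>
<cfset st = StructNew()>
<cfset st.name = "Bob">
<cfset structTime = getTickCount() - start>
<cfoutput>query: #queryTime# ms, struct: #structTime# ms</cfoutput>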
The bottom line is that micro-benchmarks can cause you to focus on trivial or insignificant items, and can produce misleading results that don't translate to the real world. Micro-benchmarks are like parking your car in the driveway and revving the engine while it's in neutral to see how fast it'll go. Sure, you can measure the maximum RPMs, which will give some indication of how fast the car might go, but it doesn't tell you what's really going to happen when you put it into gear and head out onto the highway.
To illustrate these points, we created some tests that we think are more meaningful (though they still have some serious limitations, as I'll discuss below) to see what results we'd get. Basically, we modified one of Mark's original tests to remove the nested loops and instead execute the CreateObject function a fixed number of times within a loop:
<cfloop from="1" to="#x#" index="i">
    <cfset oItem = CreateObject('component', 'Person')>
    <cfset oItem.setName("Bob" & i)>
</cfloop>
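One note on the snippet above: the loop count x isn't defined there. A minimal sketch of one way to supply it--the URL-parameter wiring here is our assumption for illustration, not necessarily how the actual test page was written:

<!--- Hypothetical wiring: the load tester requests the page as --->
<!--- createobject.cfm?x=100, then ?x=200, and so on. --->
<cfparam name="url.x" default="100">
<cfset x = url.x>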
For completeness, here's what Person.cfc looks like:
<cfcomponent>
    <cfset this.name = "">
    <cfset this.age = 0>
    <cffunction name="setName"><cfargument name="name"><cfset this.name = arguments.name></cffunction>
    <cffunction name="setAge"><cfargument name="age"><cfset this.age = arguments.age></cffunction>
</cfcomponent>
Then we used a commercial load testing tool (Microsoft Application Center Test) configured to run three simultaneous clients with no wait time between requests so there were always three requests executing simultaneously. We ran successive test sessions at increments of 100--first 100 CreateObject calls, then 200, then 300, etc., up to 1000--and measured the average response times for one-minute test durations.
The test server has two 3.0GHz Intel Xeon processors with hyperthreading and 1.0GB of RAM, and runs Windows Server 2003 and IIS 6.0 (that is, it's a pretty typical modern web server). We ran the tests with CFMX 7.0.2, BlueDragon Server JX 6.2.1 and 7.0 beta2, BlueDragon.NET 6.2.1 and 7.0 beta2, and, just for fun, last night's developer build of BD JX and BD.NET 7.0 (we've done some performance enhancements in the month or so since BD 7.0 beta2 was built, and I was curious to see how it did).
Here are the results, which are quite different from Mark's micro-benchmarks:
[Chart: average response time for each engine as the number of CreateObject calls per request increases from 100 to 1000]
What can we conclude from these results? For one thing, if you relied on Mark's micro-benchmarks to conclude that BlueDragon is slower than CFMX for CreateObject calls, you'd be wrong. However, I'd caution against generalizing too broadly from these results, because these tests contain some (but not all) of the same flaws as the micro-benchmarks.
For example, while you may have an application that creates 100, or 500, or even 1000 CFCs via CreateObject, are they really going to be all of the same type? Probably not. And the Person.cfc from Mark's tests is pretty simple--what happens if you're creating CFCs that contain dozens of functions with lengthy or complex logic of their own instead of just two simple setters? How will that affect performance?
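Just to suggest the shape of such a test--this is a purely hypothetical sketch, not something we measured--a "heavier" CFC might look like:

<cfcomponent>
    <cfset this.name = "">
    <!--- ...imagine dozens more properties and functions here... --->
    <cffunction name="validateName">
        <!--- stand-in for lengthy internal logic: repeated regex work --->
        <cfset var matched = 0>
        <cfset var i = 0>
        <cfloop from="1" to="100" index="i">
            <cfset matched = ReFind("^[A-Za-z]+", this.name)>
        </cfloop>
        <cfreturn matched GT 0>
    </cffunction>
</cfcomponent>

Whether creating and calling hundreds of these behaves like the two-setter case is exactly the kind of thing a micro-benchmark won't tell you.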
Further, I had only three simultaneous clients running (though with no wait times, this is a higher load than you might think at first). What happens if the number of clients is higher? Or lower? Or if we add wait times?
My main point is that the only meaningful performance testing is testing of your application, in your production environment, under traffic conditions you reasonably expect to see in the real world. Be skeptical of any artificial performance benchmarks--especially micro-benchmarks--and always look at them with a critical eye, asking: "How generalizable are these results?" and "How do these results relate to the real world?"
There's only one way for you to know which CFML server is the fastest for you and your application: test them yourself.
One final point: we do all of our performance testing of BlueDragon on multi-CPU servers, using external load testers to generate various levels of traffic. Any claims we make regarding BlueDragon performance are based on these types of tests and on testing of real-world customer applications--not on micro-benchmarks, nor on any testing that relies on the server measuring itself (as micro-benchmarks typically do).