Vince Bonfanti's Weblog

Comparing CFML Server Performance (a response to Mark Drew)

A few weeks ago Mark Drew wrote a series of blog entries on the topic of performance. In fairness to Mark, he does state more than once that his primary goal was to compare the relative performance of creating structs versus CFCs versus Java objects. However, it's the comparison of various CFML server engines in two of his blog entries that caught many people's eyes, including mine, and that's what I'd like to comment on here.

As I commented on one of Mark's blog entries, the "loop 10,000 times" style of micro-benchmark he used in his testing is seriously flawed. It's very difficult to draw any broad conclusions or make meaningful generalizations from the results of such tests. Among the problems with these types of tests are:

  • Micro-benchmarks can make differences that are small and insignificant seem large and important, to an arbitrary degree. Let's assume the difference between two operations is one-tenth of a millisecond; with 10 iterations the difference is one millisecond--insignificant; with 100 iterations the difference is 10 milliseconds--still no big deal; but with 10,000 iterations the difference is 1,000 milliseconds--whoa, that's a big difference! Want to make the difference seem even larger? Just run more iterations. The question is: how many of these operations do you realistically expect to perform during execution of a single request?

  • Micro-benchmarks don't reflect real-world applications. They artificially isolate individual operations and don't give you any context about the relative importance of these operations versus what else might be happening when you execute a request. For example, accessing an external resource--such as a database, or web service, or LDAP server, or mail server, etc.--is going to be hundreds or thousands of times slower (or more!) than executing any tag or function that doesn't access an external resource.

  • Micro-benchmarks don't reflect the real-world execution environment. Most developers use single-processor computers (though this has been changing recently), and the micro-benchmarks are usually executed within a single request. In the real world, most web servers are multi-processor machines, and the most important question is: how does the server behave under load? That is, what does the performance look like when executing multiple requests simultaneously?
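
The arithmetic in the first bullet, and the point about context in the second, can be made concrete with a short sketch (in Python, purely for illustration). The 0.1 ms per-operation difference is the figure from the example above; the 50 ms cost of an external call is a hypothetical number chosen only to give the comparison some scale.

```python
# Per-operation difference between two approaches, from the example above.
PER_OP_DIFF_MS = 0.1
# Hypothetical cost of one database or web-service call (an assumption).
EXTERNAL_CALL_MS = 50.0


def amplified_difference_ms(iterations: int) -> float:
    """Total difference that a loop of `iterations` operations reports."""
    return PER_OP_DIFF_MS * iterations


def relative_to_external(iterations: int, external_calls: int) -> float:
    """The amplified difference as a fraction of the time spent in
    `external_calls` external calls within the same request."""
    return amplified_difference_ms(iterations) / (external_calls * EXTERNAL_CALL_MS)


if __name__ == "__main__":
    # The same per-operation difference looks bigger the longer the loop runs.
    for n in (10, 100, 10_000):
        print(f"{n:>6} iterations: {amplified_difference_ms(n):8.1f} ms")
    # A request that performs 20 of these operations plus one external call.
    print(f"relative to the external call: {relative_to_external(20, 1):.1%}")
```

The loop count is the only thing that changes between the "insignificant" and the "whoa" results, which is why the number of operations a real request performs matters more than the size of the loop in the benchmark.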

The bottom line is that micro-benchmarks can cause you to focus on trivial or insignificant items, and can produce misleading results that don't translate to the real world. Micro-benchmarks are like parking your car in the driveway and revving the engine while it's in neutral to see how fast it'll go. Sure, you can measure the maximum RPMs, which will give some indication of how fast the car might go, but it doesn't tell you what's really going to happen when you put it into gear and head out onto the highway.

In order to illustrate these points, we created some tests that we think are more meaningful (but which still have some serious limitations, as I'll discuss below), to see what results we'd get. Basically what we did was modify one of Mark's original tests to remove the nested loops and instead execute the CreateObject function a fixed number of times within a loop:

<cfset x="100">     
<cfloop from="1" to="#x#" index="i">
    <cfset oItem = CreateObject('component', 'Person')>
    <cfset oItem.setname("Bob" & i )>
    <cfset oItem.setage(20)>
</cfloop>
done!

For completeness, here's what Person.cfc looks like:

<cfcomponent>
    <cfset this.name = "">
    <cfset this.age = 0>
	
    <cffunction name="setName">
        <cfargument name="name">
        <cfset this.name = arguments.name>
    </cffunction>

    <cffunction name="setAge">
        <cfargument name="age">
        <cfset this.age = arguments.age>
    </cffunction>
</cfcomponent>

Then we used a commercial load testing tool (Microsoft Application Center Test) configured to run three simultaneous clients with no wait time between requests so there were always three requests executing simultaneously. We ran successive test sessions at increments of 100--first 100 CreateObject calls, then 200, then 300, etc., up to 1000--and measured the average response times for one-minute test durations.
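
A minimal sketch of this style of measurement follows, written in Python rather than CFML because the load generator should run outside the server being tested. It is not a reconstruction of Application Center Test; it only mimics the setup described above: a fixed number of clients, each issuing requests back to back with no wait time for a fixed duration, with the average response time computed at the end. The function names and the URL pattern are hypothetical.

```python
import threading
import time
import urllib.request
from typing import Callable, List


def fetch(url: str) -> None:
    """Issue one HTTP GET and read the whole response body."""
    with urllib.request.urlopen(url) as response:
        response.read()


def run_load_test(request: Callable[[], None],
                  clients: int = 3,
                  duration_s: float = 60.0) -> float:
    """Run `clients` threads that call `request` back to back (no wait
    time) until `duration_s` has elapsed, then return the average
    response time in milliseconds."""
    timings_ms: List[float] = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def client() -> None:
        while time.monotonic() < deadline:
            start = time.perf_counter()
            request()
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            with lock:
                timings_ms.append(elapsed_ms)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if not timings_ms:
        raise RuntimeError("no requests completed")
    return sum(timings_ms) / len(timings_ms)


def sweep(base_url: str) -> None:
    """Repeat the test at 100, 200, ..., 1000 calls per request.
    `base_url` must contain one `{}` placeholder for the call count."""
    for calls in range(100, 1001, 100):
        url = base_url.format(calls)
        avg = run_load_test(lambda: fetch(url))
        print(f"{calls:>5} CreateObject calls: {avg:.1f} ms average")
```

Calling `sweep("http://localhost/createobject-test.cfm?calls={}")` would run the ten one-minute sessions in sequence. That assumes a variant of the test page that reads its loop count from a URL parameter; the page shown above hard-codes `x`, so it would have to be edited between sessions instead.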

The test server has dual-CPU 3.0GHz Intel Xeon processors with hyperthreading, 1.0GB of RAM, and is running Windows Server 2003 and IIS 6.0 (that is, it's a pretty typical modern web server). We ran the tests with CFMX 7.0.2, BlueDragon Server JX 6.2.1 and 7.0 beta2, BlueDragon.NET 6.2.1 and 7.0 beta2, and just for fun, last night's developer build of BD JX and BD.NET 7.0 (we've done some performance enhancements in the month or so since BD 7.0 beta2 was built and I was curious to see how it did).

Here are the results, which are quite different from Mark's micro-benchmarks:

What can we conclude from these results? For one thing, if you relied on Mark's micro-benchmarks to conclude that BlueDragon is slower than CFMX for CreateObject calls, you'd be wrong. However, I'd caution against generalizing too broadly from these results, because these tests contain some (but not all) of the same flaws as the micro-benchmarks.

For example, while you may have an application that creates 100, or 500, or even 1000 CFCs via CreateObject, are they really going to be all of the same type? Probably not. And the Person.cfc from Mark's tests is pretty simple--what happens if you're creating CFCs that contain dozens of functions with lengthy or complex logic of their own instead of just two simple setters? How will that affect performance?

Further, I only had three simultaneous clients running (though with no wait times, this is a higher load than you might think at first). What happens if this is higher? Or lower? Or if we add wait times?

My main point is that the only meaningful performance testing is testing of your application in your production environment under traffic conditions that you reasonably expect to see in the real world. Be skeptical of any artificial performance benchmarks--especially micro-benchmarks--and always examine them with a critical eye, asking: "How generalizable are these results?" and "How do these results relate to the real world?"

There's only one way for you to know which CFML server is the fastest for you and your application: test them yourself.

One final point. We do all of our performance testing of BlueDragon on multi-CPU servers using external load testers to generate various levels of traffic. Any claims we make regarding BlueDragon performance are based on these types of tests and on testing of real-world customer applications, not on micro-benchmarks or on any testing that relies on the server measuring itself (as is typically done with micro-benchmarks).

Comments
Thank you, Vince, for taking the time to do this sort of benchmarking. As you also said in your comment to Mark's entry, this "fallacy of large loop testing" is something I've argued against in the past. I had more anecdotal assertions from some leading CFML developers, but had not yet done a conclusive benchmark like this.

Of course, there will be those who will argue against this conclusion, even with (or using) your own caveats, just as they did against my proposition. It just goes against conventional wisdom, but I do think it represents real wisdom. Certainly, it would be nice to see still more such benchmarking to offer still more conclusive evidence.

As to the observation about CF vs BD, that's certainly another whole kettle of fish, and there your caveats seem especially germane. But I appreciate you making a case against the fallacious use of large loops for "proof cases" of alternative approaches.

For those interested, the blog entry I'd done in the past is at:

http://bluedragon.blog-city.com/fallacy_of_loop_te...

I did write it while I worked with Vince and I posted it then on the BD blog I kept at the time, but the assertion made wasn't at all BD-specific.
# Posted By Charlie Arehart | 12/12/2006 11:10 PM
Hi Vince, you are definitely correct: the charts look very skewed on my blog because the difference increases massively and ALL charts will (or should!) have an upward differential trend.

Again, this series of posts was actually meant to test the performance of different objects against each other, on different engines. It wasn't a dig at, or support for, any engine.

I shall set up a load testing tool when I have some time and run the tests again to see whether my assumption that loading up Java objects is faster holds.
# Posted By Mark Drew | 12/13/2006 3:04 AM
You are of course right to say "the only meaningful performance testing is testing of your application in your production environment under traffic conditions that you reasonably expect to see".
However, the speed of CreateObject() originally came up in the context of creating a (large) number of VOs to send back to a Flex client via Remoting. In this case, the speed differences (even if only a few ms) really do stack up quickly.
# Posted By Tom Chiverton | 12/13/2006 4:27 AM
Hi Tom,

I assume the "speed difference" you're referring to is the difference between StructNew() and CreateObject()? If so, then yes, there is some valid information to be gleaned from Mark's micro-benchmarks comparing the two, and you're probably better off using StructNew().

However, if you really do need to have several hundred CreateObject() calls within a CFML page and you're trying to decide which CFML server engine will perform better running that page, Mark's micro-benchmarks could lead you to the wrong conclusion. You'll get better information by doing the type of performance testing I've described here.
# Posted By vinceb | 12/13/2006 8:00 AM
Vince,

Just out of curiosity, did you test Railo against both CFMX and BD? I know Mark also used Railo in his tests and would be interested to know how it fit into the picture performance wise. By the way, BD is a great product and I'm not trying to take anything away from that.

Thanks!
# Posted By C Brockman | 12/13/2006 8:07 AM
No, due to time constraints we didn't test Railo.
# Posted By vinceb | 12/13/2006 9:14 AM
I too would like to see railo tested
# Posted By Jeff Gladnick | 12/13/2006 9:35 AM
Hi Vince,

Nice to read that BD7 beat CFMX 7.x. I had that impression too, since in my tests, in contrast to Mark Drew's, BD was significantly faster. But I must admit I used the BD7 beta, so, since I guess you made some performance improvements, it's clear why BD7 is faster.
I will test your script on Railo 1.1 and post the results in our newsgroup. But I think it would only be fair if you would do the tests with Railo.

Regards from Zürich/Switzerland

Gert
# Posted By Gert Franz | 12/13/2006 10:10 AM
Hi again,

I just ran the tests as I posted earlier and here are the results. Railo 1.0 and my local Beta II of Bluedragon 7.0 are about the same.
Railo 1.0 100.5% average, Bluedragon 7.0 100%. But Railo 1.1 is at about 94.4% average time. I will post the detailed results in the Railo blog tomorrow.
But I am sure that concerning Components the .net version of BD will beat Railo 1.1 as well.
I must say that I'm impressed by how well BD7 competes when it comes to components.
In our own tests BD beat MX in nearly every test.

Gert
# Posted By Gert Franz | 12/13/2006 10:58 AM
Interesting article and debates going on. I'm curious if anyone has any speculation on why the graph path of CFMX 7.0.2 and BD.NET 6.2.1 follow each other nearly identically? I would have thought these 2 engines would show more unique performance differences considering they run on different virtual machines.
# Posted By Jordan Clark | 12/13/2006 7:41 PM