Global Spin: Performance Myths

Performance Myths

Before any performance problems can be solved, it's important to understand the pitfalls a Web developer can encounter while attempting to optimize a Web application. It's difficult enough to come to the conclusion that an application should be tested for performance before it's offered to the public. But after the decision is made, it's sometimes even more difficult to get beyond technology preconceptions to discover the aspects of an application that might cause performance bottlenecks when put into production. For applications that are already being used by site visitors, it can be difficult to realize that problem areas aren't always obvious and that the solutions to performance problems aren't necessarily straightforward.

It isn't always easy to pinpoint the real reason why an application is perceptibly slow. There are many links in the chain between the Web server and the site visitor, and any one of them can suffer from performance degradation that is noticeable by the user. A perceived lag in image load time, for instance, can be the result of a slow connection between the user and the Internet, or it could be caused by the Web server's connection. In addition, load from other users accessing the same images could cause the same effect, as could load from users accessing unrelated applications and tying up server resources.

Performance problems also can affect each other, causing greater or lesser perceived slowdowns. In the case of slow graphics, a slow connection from the user to the Internet could mask a slow connection from the Web server to the Internet, but it also could exacerbate any delays in the transmission caused by server lag. When a site is deemed slow overall by disgruntled visitors, it's difficult to tell which of these factors had the most effect on the slowdown and which of the factors are solvable within the scope of the Web server.

Perception itself can be misleading. A slow site can appear slower when site elements rely on unusual browser tricks or high-bandwidth technologies for basic navigation or form input. A Flash interface that takes two seconds to download might take ten seconds to initialize on a computer with a slow processor, for instance, and the user might not be able to tell the real reason for the slowdown. Even a site that relies on mouseovers to activate meaningful elements of a visual navigation interface might run into trouble when the "on" graphics take more time to load than it takes the user to overshoot the graphic and move on to the next menu choice. In cases like these, simple confusion over interface elements might make a site seem less responsive than it should, which makes duplicating the source of the slowdown even more difficult for site personnel who have no such difficulties.

With Web applications, the same uncertainties apply. Perceived lag can be caused by server-side slowness or network delays. Server-side performance loss can come from many sources, including system architecture, network connection, and simple server overload from too many interested visitors. Because each visitor feels like the unique user of the Web application, it's all too likely that performance loss from any source is intolerable and will cause users to go elsewhere.

Luckily, common gateway interface (CGI) applications all share a few common performance bottlenecks that can be easily identified and removed. On the flip side, Perl CGI programs won't be helped by many performance enhancements that would seem obvious in other environments. Optimizing the application at the source code level, benchmarking the application at runtime to gauge the performance differences between one style and another, and switching to a different database based on external benchmark scores are all unlikely to solve performance problems that plague Perl CGI programs. Even worse, switching the development language from Perl to a compiled language such as C or Java is likely to do more harm than good. Each "solution" is more likely to drain development time than fix performance issues because it doesn't address the real reasons for performance loss.

Program Runtime

As odd as it might seem, the time it takes a Perl CGI application to run is rarely a factor in the application's performance. To understand this, it's important to know the difference between program runtime and the total time and system resources taken when using a program to complete a specific task.

With a normal instance of a program written in Perl, the program is run interactively at the command line. After this happens, the program is loaded off disk, and the Perl compiler:

Also is loaded off disk
Is executed
Configures itself based on files it loads from the disk
Accepts the Perl program as an argument
Parses the program and checks for syntax errors
Compiles the program and optimizes it
Executes the resultant bytecode

Because the first seven steps usually take less than a second, a simple Perl program run in this fashion seems to run instantaneously (that is, in less than a second). Similarly, the total execution time of more complex programs seems to depend solely on the length and complexity of the program being run. A log analysis program that takes twelve seconds from start to finish spends most of that twelve seconds analyzing logs, so the fraction of a second spent on the first seven steps is seen as trivial in ordinary use.

Perl CGI programs are run using all the same steps, but the relative importance of each step becomes skewed greatly because of the demands placed on a Web server in normal use. Although an administrative Perl program is likely to be run once by a single user willing to wait for a response, a Web application written in Perl is more likely to be accessed hundreds of times per second by users expecting subsecond response times, most of whom are likely to run the same or similar Web applications again within a few seconds. In this environment, the time spent in the first seven steps is no longer trivial; that is, "less than a second" becomes much too long when compared to a runtime of a few milliseconds. The processor power used in performing the main task of the Web application has to be divided up between the running instance of an application and dozens of other copies, which are in the process of being loaded off disk, compiled, and instantiated.
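The gap between startup cost and runtime can be observed directly. The following is a minimal sketch, not a rigorous measurement: it assumes the core Time::HiRes module is available, and the subroutine names and the sample size of 20 are arbitrary choices made for illustration. It times a full launch of the Perl interpreter on a trivial program and compares that to the same trivial work done inside a process that is already running.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time one full launch of the Perl interpreter running a trivial program.
# $^X holds the path of the perl binary that is running this script.
sub time_launch {
    my $start = time;
    system($^X, '-e', '1') == 0 or die "could not launch perl: $?";
    return time - $start;
}

# Time the same trivial work done inside the already-running process.
sub time_in_process {
    my $start = time;
    my $x = 1;
    return time - $start;
}

my $launches = 20;    # arbitrary sample size, chosen for illustration
my $total    = 0;
$total += time_launch() for 1 .. $launches;

printf "average launch:  %.6f s\n", $total / $launches;
printf "in-process work: %.6f s\n", time_in_process();
```

The exact figures depend on the machine and on what the operating system already has cached, so the numbers are best read as a ratio rather than as absolute values.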

Compile Time is Much Longer

When any Perl program is run, the Perl runtime first compiles it from source code. This process isn't trivial even for the tiniest of Perl programs; in those cases, Perl spends much more time and processing power compiling the program than it does running the program itself.

This should make perfect sense to anyone familiar with the process necessary to compile programs in a language such as C. For C programs, the compile step is one that takes minutes or hours, even for a program designed to run for only a few seconds. Perl is similar in this regard, even though the time scales are seconds and milliseconds, respectively, for most Perl Web applications.

The effects of the compile step aren't usually noticed when executing a Perl program. This is because the compile step of most Perl programs is only a second or two when executed by a single user, far less than most users would notice when executing an average system program. This happens no matter how long the program itself takes to run. The program either executes very quickly, in which case, the added compile time is seen as a reasonable total runtime, or it executes over a longer time period than the compile time, in which case, the time spent compiling is inconsequential compared to runtime.

If the same circumstances were to apply to a C program, the effect would be much more noticeable. If a standard program took half an hour to initialize before performing a simple two-second task, the overall perceived runtime would be interminable. Even if the program took an hour to run after the initial half hour of compiling, it would still seem like an incredible waste to compile the program before every use.

On the time scale used by most Web applications, this is exactly the case. A Web application usually needs to respond in milliseconds to keep up with the rate of incoming requests, so a half second spent compiling the program is an eternity. Compared to that eternity, a millisecond faster or slower isn't likely to make much of a difference.
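A rough way to see compile time and runtime as separate costs is to compile a piece of Perl source on demand and time the two phases independently. The sketch below assumes Time::HiRes is available; the routine held in $source and the name compile_timed are invented for this example. A string eval compiles only a small fragment, so it stands in for, rather than reproduces, the compilation of a whole program and its modules.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Source for a small routine, held as a string so it can be compiled on demand.
my $source = 'sub { my $n = shift; my $sum = 0; $sum += $_ * $_ for 1 .. $n; return $sum }';

# Compile the source text and report how long the compilation took.
sub compile_timed {
    my ($src) = @_;
    my $start = time;
    my $code  = eval $src or die "compile failed: $@";
    return ($code, time - $start);
}

my ($code, $compile_time) = compile_timed($source);

# Run the compiled routine and time only the execution.
my $start    = time;
my $result   = $code->(10);
my $run_time = time - $start;

printf "compile: %.6f s, run: %.6f s, result: %d\n",
    $compile_time, $run_time, $result;
```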

Disk I/O is Much Slower

The first optimization any Web server makes to its operation is caching files in memory instead of accessing them from the disk. The reason for this is raw speed; a file in memory can be accessed much more quickly than a file on disk. Because a Web site is likely to be made up of hundreds of small text files that change very infrequently, it's possible to keep a large number of them in memory for quicker access. If files are accessed a thousand times a minute, the Web server gets an orders-of-magnitude speed increase by loading the files off disk once a minute instead of each time they're accessed. On top of this, it's usually possible to check each file to see if it's been changed since it was last read from disk. This saves the need to reload files unless they've been altered.

The Web server isn't able to cache files used by a CGI Web application, however. The main program file has to be read from disk each time, as do Perl module files, data files, and any other supporting files for the application. Some of these files might be cached within the Perl CGI program if they're being called from within the program itself; however, none of the files associated with compiling the application is going to be cached by Perl because the files are used only once throughout the compilation process.

Even if files used during runtime are cached for use later in the same program, within the CGI model, there's no way to cache these files for use by later instances of the program, let alone for other Perl programs that use the same files. A text file being searched for the words "dog" and "cat," for instance, could be loaded into memory once and used both by the "dog" search routine and the "cat" search routine; however, the next time the program is called to perform the same search on the same file, it has to load the file from disk all over again. This repeated disk access is likely to be much more time-consuming than the algorithms that actually perform the search.
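The kind of caching described above can be sketched in a few lines. This is an illustration under assumptions rather than a recommended implementation: the names cached_slurp, %cache, and $disk_reads are hypothetical, and the temporary file exists only to make the example self-contained. The cache rereads a file only when its modification time changes, and it lives only as long as the process does, which is exactly the limitation the CGI model imposes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %cache;             # path => { mtime => ..., content => ... }
my $disk_reads = 0;    # counts how often a file is actually read from disk

# Return a file's contents, reading from disk only when the file is new
# to the cache or its modification time has changed.
sub cached_slurp {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    defined $mtime or die "cannot stat $path: $!";

    my $entry = $cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        open my $in, '<', $path or die "cannot open $path: $!";
        local $/;    # slurp mode: read the whole file at once
        my $content = <$in>;
        close $in;
        $disk_reads++;
        $entry = $cache{$path} = { mtime => $mtime, content => $content };
    }
    return $entry->{content};
}

# A small sample file to search.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "the dog chased the cat\n";
close $fh;

# Both searches share a single read of the file.
my $dogs = () = cached_slurp($path) =~ /dog/g;
my $cats = () = cached_slurp($path) =~ /cat/g;
print "dog: $dogs, cat: $cats, disk reads: $disk_reads\n";
```

Within one run, the "dog" and "cat" searches share one disk read. A second invocation of the script starts with an empty %cache and reads the file again.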

Data Structure Setup Takes Time

A program that provides access to an external data set has to have access to that data internally before it can deal with the data in a meaningful way. An Extensible Markup Language (XML) document, for instance, has to be translated from the text representation of the document into a Perl data structure, which can be directly acted upon by Perl functions. (This process is generally called "parsing the document," but it can involve many more steps than simply breaking the document into chunks. More detail on handling XML can be found in Chapter 16, "XML and Content Management.") Similar processes have to occur when accessing a database, reading a text file off disk, or retrieving data from the network. In all cases, the data has to be translated into a form that's usable within the Perl program.

The process of setting up data structures as a program runs, also known as instantiation, can be very time-consuming. In fact, the time taken by instantiating data might be hundreds of times greater than the time spent actually processing it. Connecting to a busy database can take as long as five or six seconds, for instance, while retrieving data from the same database might take only a fraction of a second under the same circumstances. Another example would be parsing a log file into a hash structure for easier access and aggregation, which could take seconds while the actual aggregation functions take only milliseconds.

Again, note that the time scales on which a Web application operates are likely to seem minuscule, but the important relationship to consider is the relative time and processing power taken by instantiating a Web application as compared to running it. The combined total of these two steps might still be much less than a second, but when evaluating the source of performance bottlenecks, it would be counterproductive to concentrate on a process that takes up only a small percentage of the total runtime while ignoring parts of the process that take up the majority.
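The log file example can be made concrete with a short sketch that times instantiation and processing separately. Everything here is assumed for illustration: the log format is a simplified "path status bytes" line, the data is generated rather than read from a real log, and the names parse_log and busiest_path are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Made-up log lines in a simplified "path status bytes" format.
my @lines = map { sprintf "/page%d 200 %d", $_ % 10, 100 + $_ } 1 .. 50_000;

# Instantiation: turn the raw text into a hash of statistics per path.
sub parse_log {
    my ($lines) = @_;
    my %stats;
    for my $line (@$lines) {
        my ($path, $status, $bytes) = split ' ', $line;
        $stats{$path}{hits}++;
        $stats{$path}{bytes} += $bytes;
    }
    return \%stats;
}

# Processing: pick the path with the most hits (ties broken by name).
sub busiest_path {
    my ($stats) = @_;
    my ($top) = sort {
        $stats->{$b}{hits} <=> $stats->{$a}{hits} || $a cmp $b
    } keys %$stats;
    return $top;
}

my $start      = time;
my $stats      = parse_log(\@lines);
my $setup_time = time - $start;

$start = time;
my $top          = busiest_path($stats);
my $process_time = time - $start;

printf "setup: %.6f s, processing: %.6f s, busiest: %s\n",
    $setup_time, $process_time, $top;
```

Because the aggregation touches only the handful of keys in the finished hash while the setup touches every line, the setup phase would be expected to dominate, though the actual ratio depends on the data.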

The Effect of Minor Optimization

Understandably, the first thing a Perl CGI programmer looks at when trying to make an application run faster is the program code, even when the time it takes to run that code is a fraction of the time spent loading and compiling it. The urge comes from Perl's reputation as a beginner language. If it's so easy to write a program in Perl, the idea goes, most early Perl programs must be written badly with many inefficiencies. This idea is particularly popular among programmers with a background in C. Much of the design work that goes into a C program addresses the need to optimize the program's memory footprint, processor load, and runtime.

From a Perl programmer's perspective, though, optimization is a secondary concern. The primary concern for many Perl programmers is to complete the tasks required of the program using as many existing Perl techniques and modules as possible. In this respect, Perl can be seen more accurately as a solution set rather than a simple programming language. The final result is to make creating a particular application possible with the tools Perl makes available. For Web applications, this becomes even more crucial; in many cases, it's more important for a Web programmer to meet a deadline and deliver a complete and functioning Web application than it is to achieve the greatest possible performance within that application. Performance is seen more as a pleasant side effect than a primary goal, especially when the goal is getting the job done, not merely getting it done faster.

Luckily, Perl does most of the optimizing for the programmer. Because Perl programs tend to use similar styles and techniques, many of those techniques have been quietly optimized over the years to give common programs a performance boost without the need to explicitly optimize the program. In addition, many Perl optimizations are expressed in programmatic hints that have made their way into desired Perl programming style.

Because of these subtle optimizations that Perl recognizes and encourages, it's actually possible to waste time modifying a particular piece of Perl code that already is being quietly optimized behind the scenes. In fact, it's more likely than not that optimizing code written in a Perl style by rewriting it using the rules of another language (such as C) might actually cause more harm than good.

There is an exception to this rule, but it involves major optimization outside the bounds of Perl rather than minor optimization within them. Modules can be written in C or other compiled languages for use with Perl, and within these modules, it's possible to optimize data structures and algorithms that give real performance improvements in cases in which Perl isn't capable of providing them on a general scale. Many common Perl modules use compiled code to provide performance improvements to often-used algorithms and data structures, so it's possible to benefit from these optimizations without having to develop them from scratch.

Optimization Can Increase Development Time

Web programming is a special discipline that spans a variety of skills, but it usually ends up falling between the cracks when it comes time to plan the budget. Web applications are likely to be misunderstood because the possibilities of Web sites are still being explored, and the timelines and decisions surrounding Web applications are likely to vary from nebulous to terrifying.

In a situation like this, the cost of optimizing a Web application can be prohibitive when compared to the potential benefits of spending that time improving the application by adding new features or fixing interface errors. Better yet, it would be a boon to Web programmers to find ways to improve performance across the board without incurring so much of a penalty to development time. In this way, simple optimizations that cause large improvements in performance are far more valuable than minor optimizations that cause only incremental performance increases. Minor optimization always can be carried out at a later time, but in most Web development environments, it's much more likely that new features and updated interfaces will create new work and tighter schedules before that point is ever reached.

An Exception: XS-Optimized Modules

One optimization that can make a real difference when dealing with Web applications is XS, which is the interface language provided by Perl for extending the language using compiled modules written in C. The idea is analogous to the programmer's ability in C to optimize algorithms by writing them in assembly language for a specific processor–a technique that is widely used in programs that need to perform specific calculations with high performance. XS bridges the gap between compiled C–with all the restrictions and processor-specific performance optimizations–and Perl.

Modules written using XS can provide noticeable improvements to both runtime and processor load by compiling optimized handlers for large data structures and complex algorithms that Perl would be less than efficient in handling by itself. It does this by enabling a module developer to specify the interface to a Perl module in the XS language, which is then compiled into a shared library. The XS language can contain complete C functions or it can reference existing C header files to provide a direct or indirect interface to those functions. After it is compiled, the shared library and its associated Perl wrapper module can be used as any other Perl module would be. This enables XS-optimized modules to replace existing all-Perl modules if there's a need to improve performance or to provide a link to existing C code.

Two areas in which XS noticeably improves performance are database access and XML processing. In the case of database access, the Perl DBI module provides a Perl interface to database driver modules written in C. This enables the individual drivers for each database to be written using C-based interfaces, which are more commonly found in vendor-provided development toolkits than pure Perl interfaces would be. It also encourages driver modules to optimize the raw transfer of data to and from the Perl interface routines in the main DBI module, while providing a seamless interaction between the driver layer and those routines. For XML processing, similar XS modules enable Perl XML processors to use existing parsers and algorithms written for C applications, which optimize the handling of data structures that would otherwise be unwieldy if represented as native Perl references and variables.

Luckily, most of the common cases in which XS is needed already have been addressed by the modules that are most commonly used. DBI, as mentioned before, uses XS to optimize data transfer speeds whenever possible. XML parsing performance is improved using the expat parser (through the XML::Parser module), and interactions with the parsed documents can be optimized by using Xerces for Document Object Model (DOM) processing, Sablotron for XSLT transformations, and Orchard for XML stream handling. (More details on XML processing using Perl are available in Chapter 16.) In each case, the interface to the module is left up to Perl, and C is used simply to accelerate those parts of the modules that are likely to get the most frequent use.
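The same pattern can be seen on a small scale with a module that ships with Perl. The sketch below uses List::Util, which is not one of the modules discussed above; it is chosen only because it is a core module whose functions are implemented in C through XS in typical installations. The names perl_sum and seconds_for, the list size, and the call count are arbitrary choices for this example.

```perl
use strict;
use warnings;
use List::Util qw(sum);
use Time::HiRes qw(time);

my @numbers = (1 .. 100_000);

# Pure-Perl summation, written as an explicit loop.
sub perl_sum {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

# Time a code reference over a fixed number of calls.
sub seconds_for {
    my ($code, $calls) = @_;
    my $start = time;
    $code->() for 1 .. $calls;
    return time - $start;
}

# Both routines must agree before their timings mean anything.
perl_sum(@numbers) == sum(@numbers) or die "results differ";

printf "pure Perl:  %.4f s\n", seconds_for(sub { perl_sum(@numbers) }, 50);
printf "List::Util: %.4f s\n", seconds_for(sub { sum(@numbers) }, 50);
```

From the calling program's point of view, the compiled routine is used exactly as a Perl subroutine would be, which is the convenience the XS approach is meant to provide.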

Of course, additional XS modules can be created as the need arises to incorporate C libraries or compile code for performance enhancements. Graphics libraries, windowing toolkits, and proprietary communications protocols are some of the many uses that XS modules can be written to address. This frees the core of Perl to handle the interfaces between these libraries. The development of compiled modules using XS is a topic that is outside the scope of this book, but it can be noted here that most of these performance enhancements are similarly outside the realm of minor optimization and are generally exceptions to the rule.

Perl and C Differences

It would seem logical that the performance problems Perl sees are due to it being an interpreted language. There's no distinct compilation step seen when running a Perl program, so it's easy to come to the conclusion that the Perl compiler is actually an interpreter; it's also common to hear Perl referred to as a scripting language, which lumps it into the same category as shell scripting languages or high-level languages, such as Lisp. Both of these are interpreted and neither is known for high performance. These ideas are reinforced by Perl programmers' recent forays into graphical user interface programming, which usually result in an application that is less responsive overall than a comparable C program.

Most programs in the Web environment that can be compared to Perl CGI applications are both compiled and optimized for performance. The Web server, for instance, is almost always a compiled C program optimized to serve files to remote clients as fast as possible. Server modules, as well, are compiled in with the Web server and optimized to provide their services–such as authentication or encryption–with as little overhead as possible. Because these applications are optimized for performance and all are compiled, the connection between compiled applications and performance optimization is a strong and obvious one.

Writing a compiled application in C to avoid the overhead of interpreting Perl programs might seem to be the answer. This also would seem to enable more thorough optimizations to be made using the standard techniques developed for other compiled C programs. As common as it is, however, this answer is wrong. Compiled CGI applications are just as slow as their Perl brethren because the differences seen between Perl and C programs in other arenas aren't nearly as pronounced when encountered in Web applications.

Even if it did improve the performance of a Web application in use, compiling a Web application presents its own problems during development. Traditional single-user applications have slow development cycles that end when the application is shipped to a user. This comes from the complexity and robustness of end-user applications, such as Microsoft Word, that are required to perform many tasks in a coordinated fashion with few errors and very little chance to fix errors.

On the other hand, Web applications are likely to change rapidly, and they are likely to be created and modified by the same developers who use them. This comes from the Web tradition of editing text files–usually HTML or plain text–for immediate distribution by way of the Web site. Changes are made interactively as errors show up on the pages displayed. In many cases, a page is changed and changed again many times over the course of a few minutes as variations of page layout or word choice are made and tested in context.

Because Web applications are edited and tested in an environment that encourages near-instant feedback, the time spent compiling a C application after every change might become cumbersome when developing for the Web. Developers might be discouraged from trying different variations of an implementation because the time spent compiling each possibility would be prohibitive.

C CGI is Still Slow

Compiled CGI programs are still CGI programs. Chapter 5, "Architecture-Based Performance Loss," describes a bottleneck that can afflict any CGI process. A compiled C program is no exception; C programs still have to be loaded from disk and executed, program variables and data structures still have to be instantiated, and any files related to the program still have to be loaded fresh from disk each time the program runs.

Any optimizations a C process would normally benefit from in single-user use are still tiny in comparison to the overhead caused by CGI invocation of the program. In fact, the major difference in overhead between a CGI program written in C and a work-alike written in Perl is the compilation step necessary before the Perl program is executed. Because the Perl program has to pass through this compile step every time it is executed, and because the C program is compiled once before it is executed the first time and then run as a system binary from then on, it would seem that Perl could never catch up to the performance achievable by the equivalent C program. Fortunately, both compilation overhead and instantiation overhead are artifacts of the way CGI programs handle requests, so even this difference between Perl and C can easily be overcome by switching away from the CGI protocol.
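The effect of moving away from per-request invocation can be sketched without any Web server at all. The following toy model is an assumption-laden illustration, not a description of any particular persistent environment: expensive_setup is a hypothetical stand-in for loading, compiling, and instantiating, and the two handler styles simply count how often that setup runs over one hundred simulated requests.

```perl
use strict;
use warnings;

my $setups = 0;    # counts how many times the setup work is performed

# Stand-in for expensive startup work (loading files, building structures).
sub expensive_setup {
    $setups++;
    return { map { $_ => $_ * $_ } 1 .. 1000 };
}

# CGI style: every request pays for the setup again.
sub handle_cgi_style {
    my ($n) = @_;
    my $squares = expensive_setup();
    return $squares->{$n};
}

# Persistent style: setup runs once, and the returned closure serves requests.
sub make_persistent_handler {
    my $squares = expensive_setup();
    return sub { my ($n) = @_; return $squares->{$n} };
}

handle_cgi_style($_) for 1 .. 100;
my $cgi_setups = $setups;

$setups = 0;
my $handler = make_persistent_handler();
$handler->($_) for 1 .. 100;
my $persistent_setups = $setups;

print "CGI-style setups: $cgi_setups, persistent setups: $persistent_setups\n";
```

The request-handling code is identical in both styles; only the point at which the setup happens changes, which is why the language the handler is written in matters less than the invocation model.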

C Optimizations Aren't Automatic

It's been said that it's easier to write a bad C program than it is to write a bad Perl program. Conversely, it's easier to write an optimized Perl program than it is to write an optimized C program. C enables great flexibility when developing a program, but this flexibility can be detrimental when writing Web applications.

Issues such as memory allocation and garbage collection are anathema to Web programmers because they add complexity and difficulty to the application design process. A Web programmer doesn't want to have to deal with transitory memory leaks, data type mismatches, and system corruption due to incorrect memory allocation when there are more important issues with which to be concerned. These issues include consistent user interfaces and easy access to data sources. Perl's assumptions might not produce the pinnacle of optimized code, but that is more than made up for by the sheer number of optimizations and corrections Perl makes that the Web programmer never has to know about.

Perl offers optimizations that are automatic, like the optimized regular expression engine and the XS-optimized DBI module, which interfaces with a database (see Chapter 14, "Database-Backed Web Sites"). In Perl 5.6, for instance, many optimizations were made to the Perl compiler itself, which affected common Perl keywords, such as sort, by increasing the optimization of nonstandard ways of calling the function. These improvements are present in each version of every module or core release made available.

Because of the high-level abstraction inherent in developing a Perl program, future optimizations to Perl will be utilized by Perl-based applications automatically. This can't be said for a compiled C program for two reasons. First, the optimizations present in a particular version of a C compiler or in a particular version of a system library aren't likely to be so common as to improve the performance of all, or even a majority of, the Web applications written in C. Second, even if that were the case, the applications would have to be recompiled against the new version of the compiler or libraries, which requires a separate step to be performed each time the improvements are made available.

C Programs Still Connect to a Database

One similarity between Perl programs and C programs when creating Web applications is the supporting applications and services to which either application would need to connect. Because a Web application is likely to interact with system resources such as database servers, network support applications, groupware, and other server applications, a Web application written in Perl is likely to rely on transactions between these system applications as much as it relies on internal processing within the Perl application. Similarly, a Web application written in C or Java would have to integrate the same system-level applications, so any bottlenecks caused by system applications would cause the same performance problems for Web applications written in either Perl or C.

A database server, for example, is going to be accessed in similar ways by both a Perl application and a C application. Either one is likely to use the same underlying network interface because Perl module developers generally use the C libraries made available by database server developers. Even if the modules so created behave differently from the standpoint of the programmer who is developing Web applications in Perl, transactions between the database and the Perl application are still likely to be identical to those between the database server and a C application. This is especially true for database servers such as Oracle that rely heavily on network protocols to facilitate database interaction because the network protocols bridge the gap between any specific implementations of the protocols in C or Perl.

One important difference between database access in C and Perl is the ease with which Perl applications can be written to access a database. Database access from Perl has become both efficient and easy to develop in the past few years due to the DBI interface module, which creates a generalized database interface for use within Perl programs. Optimized interfaces to each supported database can then be written to conform to the DBI specification, which enables database driver developers to concentrate on the details of interfacing with a particular database without having to develop an understandable application program interface (API) for use by application developers. It also enables application developers to learn one database interaction API and apply it to any database made available to DBI. The DBI module is covered in more detail in Chapter 14.

Java, a language commonly used for Web development, illustrates this myth perfectly. Although Java is a precompiled language, database performance from a Java application is likely to be slower overall than database performance from a Perl application. This is due in large part to the available Java Database Connectivity (JDBC) drivers being used to access database servers from Java servlets and Java applications. These drivers aren't nearly as optimized for performance as are their C or Perl counterparts, so any application that sees heavy database use is likely to be slowed considerably by the inefficiency of JDBC drivers.

Misleading Benchmarks

Another myth when testing a Perl CGI program for Web performance is that reasonable information can be garnered from the benchmark facilities provided within Perl. The mere presence of these tools would seem to indicate that they would be useful in diagnosing the cause of performance loss, but the tools themselves aren't suited to diagnosing the problems a Web application would face. In fact, most of the results given when using these tools on a Web application are confusing at best and ludicrous at worst.

Perl benchmarks are designed only to test the relative performance of algorithms within the scope of a Perl program. These benchmarks, even when run against an entire Perl program, would measure only runtime and would completely ignore compile time and the other steps necessary to run a program. Because those other steps are much more likely to affect the overall performance of a Web application, the results themselves are useless even when they seem to make sense.
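For contrast, the following sketch shows the kind of question the Benchmark module is suited to answer: which of two in-program algorithms is faster. The two string-building routines, their names, and the iteration count are invented for this example. Nothing in the result reflects load time, compile time, or Web server behavior.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my @words = map { "word$_" } 1 .. 1000;

# Two ways of building the same comma-terminated string.
sub by_concat { my $out = ''; $out .= "$_," for @words; return $out }
sub by_join   { return join(',', @words) . ',' }

# The comparison is only meaningful if both produce the same output.
by_concat() eq by_join() or die "implementations disagree";

# Time each routine over the same number of iterations. The 'none' style
# keeps timethese from printing its own per-routine report.
my $results = timethese(5000, {
    concat => \&by_concat,
    join   => \&by_join,
}, 'none');

# Print a table of relative rates for the two routines.
cmpthese($results);
```

The table that cmpthese prints compares only the time spent inside the two subroutines, after the program has already been loaded and compiled.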

Benchmarking also doesn't take the special nature of Web requests into account. Differing connection speeds cause variable effects on a Web server, which in turn can change the performance characteristics of the Web application being accessed. Because these effects can be duplicated only within the Web server environment itself, the aspects of the applications that need to be tested are outside the scope of the benchmarking tools. Better tools are available, but they would have to operate outside of Perl entirely.

Benchmarks Measure Only Runtime

One common mistake when benchmarking a Perl CGI Web application is using the Benchmark module to time the program from start to finish. It would seem reasonable that capturing the time when the program starts and the time when the program ends, and then subtracting the former from the latter, would give a reasonably accurate measure of the total time the program takes to run. Benchmark.pm provides a group of convenience functions that capture the current time, partitioned into user and system CPU time as well as "wallclock" or perceived time. It also can perform calculations on pairs of these times to determine the difference between them. Listing 8.1 gives an example of how Benchmark is used in such a fashion.

Listing 8.1 Benchmarking a Program

01 #!/usr/bin/perl
02 
03 require 5.6.0;
04 use warnings;
05 use strict;
06
07 use Benchmark;
08
09 my $t1 = Benchmark->new();
10
11 print "Content-type: text/plain\n\n";
12
13 print "This is a CGI script...\n";
14
15 for my $x (1..99)
16 {
17   my $square = $x * $x;
18   print "The square of $x is $square.\n";
19 }
20 
21 print "...which does something silly.\n";
22 
23 my $t2 = Benchmark->new();
24 
25 my $total_time = timediff($t2,$t1);
26 
27 print "\nThis script took ". timestr($total_time)  . " to run.\n\n";

The program in Listing 8.1 illustrates a basic way in which the Benchmark module might be used to determine the total runtime of a program. Lines 03 through 05 include the basic version check and pragmas that are essential to any well-written Perl program. Line 07 includes the Benchmark module itself, and line 09 instantiates a new Benchmark time object, which is assigned to the variable $t1. (Note that this object is more complex than the time stamp given by the built-in localtime function, but for the purposes of this program, it can be thought of in the same way.) The time stored in $t1 is the start time of the program proper. Note that this isn't really the start time of the program being benchmarked, but the start time of the first accessible part of the program; it isn't feasible to get a time before this point.

Lines 11 through 21 of the program make up the core of the program being benchmarked. In theory, if we weren't benchmarking the program, these lines would comprise the program in its entirety. In this case, the program does very little and does it pretty inefficiently; it prints a line of text, uses a simple but inelegant algorithm to determine the square of each number from 1 to 99, and then prints each result. Line 21 declares the folly of such an enterprise and ends the program proper. This section of the program could be replaced with any Perl program, CGI or otherwise.

After the central program has finished, line 23 instantiates a second Benchmark time object to capture the moment. Because we now have the start time and end time of the program, line 25 uses the timediff function provided by the Benchmark module to determine the difference between time $t2 and time $t1, which is returned as a third time object and stored in $total_time. Line 27 then converts that into a string and displays it.

The first problem with timing a program in this fashion becomes apparent as soon as the program is run. Running this program from the command line produces the following output:

Listing 16.

Content-type: text/plain

This is a CGI script...
The square of 1 is 1.
The square of 2 is 4.
The square of 3 is 9.
...
The square of 97 is 9409.
The square of 98 is 9604.
The square of 99 is 9801.
...which does something silly.

This script took 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) to run.

Something doesn't look right in this output; the last line states that the program took no time to run. As it turns out, the Benchmark module records "wallclock" time only in increments of seconds, so a program like this, which takes only a few milliseconds to run, won't even register as having taken any time. Also, it would appear that this process required only 10 milliseconds of CPU time, which gives some indication of the load on the processor caused by executing the program; however, that still doesn't translate to any meaningful measure of time that could be used to test the overall performance of a Web application.
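A finer-grained timer removes the zero-second reading, but not the underlying problem. The following is a minimal sketch, assuming the Time::HiRes module is installed (it ships with modern Perl distributions); it times the same squares loop in fractional seconds. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Record the start time with microsecond resolution.
my $start = [gettimeofday];

# The same work as the loop in the listing above, collected into a list.
my @lines;
for my $x (1 .. 99) {
    my $square = $x * $x;
    push @lines, "The square of $x is $square.\n";
}

# Fractional seconds elapsed between $start and now.
my $elapsed = tv_interval($start);

print @lines;
printf "Runtime of the loop only: %.6f seconds\n", $elapsed;
```

The result is now a nonzero fraction of a second, but it still covers only the code between the two time stamps. Everything that happens before the first line executes remains unmeasured.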

It might be argued that the program itself does nothing useful, causing the output to be skewed toward zero. Unfortunately, most common Web applications return a similar value because the time taken by Perl to execute this type of application is usually less than a second. For instance, substituting Listing 7.7 from Chapter 7, "Perl for the Web," gives very similar output:

Listing 16.

Content-type: text/plain

Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML><HEAD><TITLE>SQL Database Viewer</TITLE>
</HEAD><BODY>
...
</BODY></HTML>
This script took 0 wallclock secs ( 0.32 usr + 0.00 sys = 0.32 CPU) to run.

Again, very little information was actually given about the total runtime of the program, even though the CPU time taken by this program appears to be about 30 times greater than that of the previous program. It would seem that even a program that does real work should take so little time as to be practically instantaneous, so the idea that Perl CGI is slow would seem to have no merit.

No Internal Benchmarks for Compile Time

Benchmarks like this only tell half the story when dealing with a Perl program. The idea of benchmarking a compiled program from within the program itself would make some sense; there isn't much that happens before the program is run, and the runtime of the program (or parts of the program) is important only if it's noticeable to the single user interacting with it.

In Perl, however, steps that occur before the benchmarking section of the program is ever reached account for much of the effective runtime and processor load. Notably, the time taken to load the Perl compiler and compile the program is ignored completely by benchmarks of this type. Because the Perl compiler also compiles the benchmark code, there's no way to start the timer earlier and catch the time spent before the core program is reached. The beginning of the program in terms of source code is executed only after it is compiled at runtime. Before that point, no Perl can be executed because it hasn't yet been compiled.

It's not possible to get around this problem by embedding the entire program inside a separate timing program, as was the case with Listing 8.1. Any code embedded in this way is considered a part of the main Perl program, and it is compiled at the same time. As a result, the Perl compiler has already been loaded and the program has already been compiled before any embedded code is reached and executed, so the timing results still would exclude compile time. In fact, there is no way to get a benchmark of the total runtime of a Perl program from within itself or an encompassing Perl program. In all cases, the time spent loading the Perl compiler and compiling the program falls outside the scope of the Perl-based timing code.
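To see the cost that in-program timers miss, the timer has to sit outside the process being measured. One hedged sketch is a separate driver script that launches a fresh perl process for each run and times it from the outside. The driver here happens to be written in Perl, but any external timer would serve; the time_command helper is a hypothetical name, and Time::HiRes is assumed to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Time one complete run of an external command, start to finish.
# Because the command runs in its own process, the figure includes
# loading the interpreter and compiling the program.
sub time_command {
    my @command = @_;
    my $start   = [gettimeofday];
    my $status  = system(@command);
    my $elapsed = tv_interval($start);
    die "Command failed: @command\n" if $status != 0;
    return $elapsed;
}

# $^X holds the path of the perl binary running this driver.
# First run: an empty program. Second run: an empty program that
# also has to load and compile the Benchmark module.
my $bare   = time_command($^X, '-e', '1');
my $loaded = time_command($^X, '-MBenchmark', '-e', '1');

printf "Empty program:                 %.4f seconds\n", $bare;
printf "Empty program after 'use Benchmark': %.4f seconds\n", $loaded;
```

Neither child program does any work of its own, so whatever time is reported is pure startup and compilation overhead, which is exactly the part a timer inside the program cannot observe.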

There are ways to trick the Perl compiler into compiling the timed code after the rest of the program is running, but they result in benchmarks that are so far removed from normal execution as to be useless. You can compile and evaluate the code at runtime using eval, for instance, by loading the program off disk, assigning it to a string variable, and processing it with eval. Tricks like these are made possible because Perl enables the program to take a finer grain of control over sections of code, if necessary. (Applications like this are explored in Chapter 9, "The Power of Persistence.") However, there's no guarantee that the process used to compile and execute the code in this modified way will have any relationship to the process used by Perl to compile and execute the program independently; thus, any benchmarks given by this process would not be indicative of real performance. This problem is compounded further by the fact that the time taken to load the Perl compiler is still being ignored.
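The eval trick can be sketched as follows. To keep the example self-contained, the source code is held in a here-document rather than loaded off disk, and Time::HiRes is assumed to be available for the timing; both choices are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Stand-in for a program read from disk: the source is just a string
# until eval compiles it.
my $source = <<'END_SOURCE';
my $total = 0;
$total += $_ * $_ for 1 .. 99;
$total;
END_SOURCE

# The timer starts before the string is compiled, so the measurement
# covers compiling and running the embedded code.
my $start   = [gettimeofday];
my $result  = eval $source;
my $elapsed = tv_interval($start);
die "eval failed: $@" if $@;

print "Result: $result\n";
printf "Compile and run via eval: %.6f seconds\n", $elapsed;
```

The figure does include compilation of the embedded code, but the interpreter was already loaded and running when the timer started, so it still does not reflect what a fresh CGI request would cost.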

The Effect of Connect Time

An additional factor in application performance that shouldn't be ignored is the speed of visitors' Internet connections and the time it takes to connect to the site and retrieve the results of a page. Connection speed can affect site performance in more ways than one, and it's even possible for visitors with slow connections to disrupt the performance of an otherwise-fast site for visitors with more bandwidth. It's very difficult to test a site based on connection types, but considering the effects in a general sense might avoid some surprises.

When benchmarking a Web application written in Perl, it's easy to forget that the application will be accessed through a network connection that doesn't provide instantaneous access to the information being provided. A Web application might respond in a matter of milliseconds, but high latency between the visitor's computer and the Web server might add seconds to the time it takes to establish a connection. Independent of latency, available bandwidth might be limited to the point where page contents returned by the Web application take even more precious seconds to be transferred downstream to the visitor. Although there isn't much that can be done within Perl to improve the performance of an application over a slow network connection, it's still important to keep this kind of overhead in mind when determining whether an application is providing reasonable performance at a particular rate. (This idea is discussed in more detail in Chapter 15, "Testing Site Performance.")

A slow connection can affect other visitors' experience as well. Depending on the Web server and the Web application, it's possible that an application will have to stay connected and running as long as it's processing the request, which would include the time necessary to transmit the results back to the visitor's computer. This means that an application that technically takes only a few milliseconds to process a request can remain open hundreds of times longer while transferring the results. This prevents that server process from answering any other requests in the interim. If enough of these slow requests come in simultaneously to clog the pipes, it's possible that visitors with fast connections could be kept waiting for a connection to open up before their requests can even start to be processed. Situations like these are more common than they should be, unfortunately, because network congestion can create an environment in which otherwise fast connections can become slow enough to create the same effect.
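The arithmetic behind this effect is simple enough to sketch. All of the figures below (processing time, page size, link speed, and the number of server processes) are hypothetical values chosen only for illustration, and the model assumes the server process stays occupied until the last byte is sent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Seconds needed to send $bytes over a link of $bits_per_second.
sub transfer_seconds {
    my ($bytes, $bits_per_second) = @_;
    return $bytes * 8 / $bits_per_second;
}

# Hypothetical figures, not measurements.
my $processing_seconds = 0.020;       # application finishes in 20 ms
my $page_bytes         = 50 * 1024;   # 50 KB response
my $link_bps           = 56_000;      # 56 kbps dial-up link
my $server_processes   = 50;          # processes available to answer requests

my $transfer = transfer_seconds($page_bytes, $link_bps);
my $held     = $processing_seconds + $transfer;

printf "Transfer time:    %.2f seconds\n", $transfer;
printf "Process held for: %.2f seconds (%.0f times the processing time)\n",
    $held, $held / $processing_seconds;
printf "Best case for the whole pool: %.1f requests per second\n",
    $server_processes / $held;
```

With these assumed numbers the transfer takes roughly 7.3 seconds, several hundred times longer than the processing itself, so a pool of 50 processes serving only such visitors could complete fewer than 7 requests per second.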

Slow upstream connections can become as big a performance drain as slow downstream connections. A Web application that requires large files or other data streams to be sent from the client to the Web server for processing can suffer doubly from a slow connection speed. A forum application that accepts long text submissions, for instance, will have to wait until the entire request is transmitted to the server before it's possible to start processing the request. Because the Web server process is occupied while this is happening, and then occupied while the information is being processed and the result is being returned through the same slow connection, it's possible to have a Web application that posts decent benchmarks in local testing take long enough to time out the visitor's Web browser connection.

This kind of upstream lag is very common due to the asymmetric nature of most DSL and cable modem connections; a connection that has a decent downstream transfer rate might have only a tenth of that bandwidth open for an upstream response. Because upstream and downstream connections can interfere with each other on both the client and the server, it's doubly important to check the performance of a Web application under slow network circumstances.

Unfortunately, it's very difficult to test a site based on connection speeds. Most benchmarking applications assume that the site should be tested by overloading it with as many requests as possible as quickly as possible. With these, the goal is to saturate all available bandwidth with simultaneous requests and see how many requests the Web application can process before losing performance or shutting down entirely. In many cases, it's not even possible to set preferences on the benchmarking application to test connections that are slower than the one used for testing. Generally, the test connection is a LAN with hundreds of times more bandwidth than a site visitor would have. On top of that, it's very difficult to tell the average speed of visitor connections, even after the fact. Chapter 15 discusses a few ways to simulate slower connections while maintaining a reasonable server load.

Database Benchmarks Expect Different Circumstances

Despite the lip service paid to testing the performance of Web applications, in practice they aren't likely to be tested or benchmarked frequently. Database servers, on the other hand, are some of the most aggressively benchmarked applications available. As a result, it would seem that the task of performance testing the database-enabled aspects of a Web application has already been done to satisfaction. However, most database benchmarks take for granted circumstances that are very different from the kind of usage patterns a Web application would impose on a database server; thus, benchmarks produced in regular server testing, even those from third-party groups, are likely to be inadequate when testing the performance of a database in terms of the Web applications it's supporting.

Databases are likely to be benchmarked in terms of transactions per minute. Few databases are likely to be compared in terms of the number of seconds it takes to connect, however, so the connection time from a CGI application still has to be factored in when testing the performance of a database-backed Web application.

The usage patterns of a Web application are unlike most database transactions. Most database front-end applications are likely to access the database in a straightforward fashion; they connect to the server, log in, and start processing a stream of transactions in response to interactive commands. When testing a database in this context, the most important aspects would be the number of concurrent connections, the maximum rate at which transactions can be processed, and the total bandwidth available for returning results. These would correspond respectively to the number of users who could access the system at the same time, the complexity of the programs they could use to access the database, and the amount of time they would have to wait for a complete response. This makes it more reasonable to test the database by having a group of identical applications access the database simultaneously, process as many transactions as possible to determine the maximum transaction rate, and retrieve as much data as quickly as possible to determine the maximum transfer rate.

With a Web application, however, the usage pattern would be very different. A CGI Web application is likely to connect to a server, log in, process a single transaction, retrieve a subset of the results, and disconnect, only to reconnect again a moment later. As a result, testing in a Web context would need to concentrate more on the total time taken by connecting to and disconnecting from the database, the overhead due to logging in and processing a transaction, and the elapsed time needed to retrieve the subset of data the application requires. (Chapter 14 has more detail about how to make Web applications access a database in a more continuous manner.) As it turns out, the best way to simulate such odd request patterns is by running the Web application itself in production testing.
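The difference between the two usage patterns can be illustrated with a simple cost model. The per-step costs below are hypothetical placeholders; real values would have to be measured against an actual database server, and the model ignores contention between simultaneous requests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cost of each step, in seconds. These are not measurements.
my %cost = (
    connect    => 0.200,
    login      => 0.050,
    query      => 0.010,
    disconnect => 0.020,
);

# CGI pattern: every request pays for the full
# connect / log in / query / disconnect cycle.
sub cgi_total {
    my ($requests) = @_;
    return $requests
        * ($cost{connect} + $cost{login} + $cost{query} + $cost{disconnect});
}

# Front-end pattern: connect and log in once, process a stream of
# queries, and disconnect once at the end.
sub persistent_total {
    my ($requests) = @_;
    return $cost{connect} + $cost{login}
        + $requests * $cost{query}
        + $cost{disconnect};
}

my $requests = 100;
printf "CGI pattern, %d requests:       %.2f seconds\n",
    $requests, cgi_total($requests);
printf "Persistent pattern, %d requests: %.2f seconds\n",
    $requests, persistent_total($requests);
```

Under these assumed costs, the query itself is a small fraction of each CGI request, which is why a benchmark that reports only transaction rates says little about how a CGI application will behave.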

***begin sidebar

The Real Cost of Optimization

When I first went to a meeting of the San Diego Perl Mongers, a local Perl user group, another attendee explained a problem he was having with the performance of a Web application. Bill was working for a university in San Diego as a Web programmer, and one of his duties was the care and feeding of an application used to process class registrations through the school's Web site. At the start of every quarter, the site would get pounded by traffic as the students all logged in at the same time to search through and register for classes. Unfortunately, the application wasn't able to handle the traffic, and it responded to the flood of requests slowly, if at all. This made the students furious with the site, and it caught the attention of Bill's superiors.

He had inherited this application from a previous Webmaster. It was written in Perl for use with CGI, and the application itself was full of twists, convoluted functions, and hard-to-follow pathways. Bill was having a devil of a time just understanding the program enough to fix errors or add features. So, he had no idea how to optimize it to make it react more efficiently to the temporary surges of traffic it was likely to see. Previous attempts had been made to improve the speed of the program by optimizing the code in sections of it, but none had improved the application's performance to the point where it was usable under load. (Unfortunately, they didn't realize this until the next registration period started and more students got angry at the system.)

At some point, the decision was made to rewrite the entire application in a compiled language such as Java. As a result, Bill's job wasn't to fix the application completely, but to patch parts of it wherever possible to improve performance while the new application was being written. With this in mind, he latched on to an idea that the database accessed by his Web application was itself slow, so if it was possible to increase the performance of the database, it might be possible to improve the performance of the application to the point where it was usable. Bill came to the Perl Mongers meeting with this goal in mind, but he ended up getting a tutorial on the mechanics of CGI instead.

Bill's problem was not that the application itself was slow, but that it was being overwhelmed by overhead whenever a large group of users accessed it simultaneously. Having read this book up to this point, you should recognize that Bill's problem wasn't going to be solved by tuning the database or optimizing sections of code, and it wasn't going to be solved by translating the whole mess into Java, either. The crux of the problem was that Bill's program was running as a CGI process, which by now should send shivers of apprehension down your spine. The overhead of starting, compiling, and instantiating his complex program and connecting to an already-overloaded database was completely overshadowing the main task of the program: processing the students' requests. At that meeting, Bill was told what you've read here: tuning the application itself wouldn't help more than incrementally, but running the application in a persistent environment would give dramatic performance improvements, especially in the circumstances he was seeing.

I don't know if Bill was ever able to act on that knowledge, or whether the university eventually did rewrite the entire application in Java. Hopefully clear heads and reasonable performance testing prevailed. I do know that I've seen the same situation many times since then, which is the main reason why I've written this book.

***end sidebar

Summary

When it comes to Perl CGI, performance myths abound. Most people place the blame for poor performance on Perl because it's not precompiled or hand-optimized, but in truth, neither of these factors has much effect on CGI performance. Optimizing the runtime of Perl programs doesn't help (aside from a few exceptions), and rewriting Web applications in C doesn't make a difference if the application is still being called through CGI. When determining the cause of a performance problem, internal Perl benchmarks are likely to give misleading data because there's no way to add in the compilation and instantiation time required to get a complete picture of how long a CGI Web application really takes from start to finish.