Architecture-Based Performance Loss
part of Perl for the Web
When considering the overall performance of a Web application, the system architecture is the most important factor on which to focus. Both hardware and software architecture create the environment under which a Web application operates, and both play a role in determining the maximum efficiency possible for a Web application. Inefficient architecture can create bottlenecks that are the limiting factor in Web application performance, regardless of the individual cause.
Performance loss due to an inefficient architecture affects every aspect of a Web application. A sluggish application causes dependent applications and other system processes to slow as well, which can cascade into every part of a server environment. If each application is affected by the same architectural performance limits, the effects can accumulate to the point at which the entire system is unusable, even when individual applications are supposed to perform with much greater efficiency.
With Perl CGI applications, the major contributor to an inefficient architecture is the Common Gateway Interface (CGI) itself. The CGI protocol was designed for an environment in which persistent applications weren't a consideration. Thus, it inherited an execution model that was well suited to early Web programming, but that is inefficient when placed under load. Whether written in Perl or other languages, CGI programs start and stop with each Web request, regardless of whether the requests are coming in once every few days or hundreds of times a second. Performance is reasonable in the former case, but as soon as requests start coming in faster than the CGI applications can process them, the overhead of starting and stopping the applications overwhelms the system.
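The cost of this execution model is easy to see in miniature. The following standalone sketch (not from any real application; the request count and the trivial handler are invented for illustration) times twenty "requests" served the CGI way, by launching a fresh perl interpreter for each one, against twenty served by a routine that stays resident in memory:

```perl
#!/usr/bin/perl
# Compare process-per-request handling (the CGI model) with an
# in-memory handler (the persistent model).
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $requests = 20;

# CGI style: every request pays for interpreter startup and teardown.
my $t0 = [gettimeofday];
system( $^X, '-e', '1' ) for 1 .. $requests;
my $cgi_style = tv_interval($t0);

# Persistent style: the handler is loaded once and simply called.
sub handle_request { return 1 }
$t0 = [gettimeofday];
handle_request() for 1 .. $requests;
my $persistent_style = tv_interval($t0);

printf "Process per request: %.3fs for %d requests\n", $cgi_style, $requests;
printf "Resident handler:    %.6fs for %d requests\n", $persistent_style, $requests;
```

Even this trivial case shows startup overhead dominating, and a real CGI application also recompiles its own code and reinitializes its resources on every request.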
The performance characteristics of a CGI architecture can affect the performance of related Web application systems as well. Database servers, for instance, assume that an application connects once to the system and then continues to process transactions interactively, disconnecting only when all transactions are complete. Systems like these respond poorly to the stop-and-go pattern of CGI applications, which forces the database server to process the client connection repeatedly over the course of its interaction with the Web application. The performance of both the Web application and the database suffers as a result, creating a theoretical limit to the number of database transactions any Web application can process in a given time period.
The only way to really find architecture bottlenecks is by stressing the system and checking the outcome. With a complex system, this process can be difficult even with the best diagnostic tools. Unfortunately, the nature of most Web applications makes load testing a low priority for the development team. The low priority results in shaky testing methods with ineffective tools that shed no light on potential trouble spots. Developing the right testing practices isn't impossible, but it requires a clear understanding of how to really test Web applications in a form that's representative of the actual load they'll have to deal with in production.
The Nature of CGI
It might seem that for a book about efficient Web applications, we are spending a lot of time discussing a technology (CGI) that isn't efficient at all. The reason is that most Web application developers equate Perl with CGI and vice versa, and they assume that the performance problems caused by CGI require switching to an application language such as Java or C. However, this "solution" isn't a solution at all; it merely replaces the application development language with a new one without changing the architecture used to access the application. A Java application accessed through CGI (as many servlets are) has all the performance bottlenecks of an equivalent Perl CGI program, as does a CGI program written in C, COBOL, or shell scripting.
In short, CGI has performance problems. In general, the performance problems are solely due to the architecture of CGI processes in relation to the Web server. More specifically, CGI is slow because of the assumptions made when using it. These assumptions all are based on the Web server performance model rather than on the application performance model. CGI processes are managed poorly, accessed inefficiently, and restarted without need.
In comparison, most interactive applications would fail miserably under such a model. If a common application, such as Microsoft Word or the GNU Image Manipulation Program (GIMP), were implemented using a stateless request-and-response model, it would be nearly impossible to use because the application would be constantly exiting and restarting to handle even the most basic interactions with the user. Instead, these applications apply a more reasonable model to handle interactions as events while keeping the program in memory and running.
One Process Per Request
CGI is based on the same model that Web servers use for every other file: a file is requested, the file is accessed, and the file is delivered. For static HTML and image files, which are the mainstay of the Web, this model works well because it simplifies the relationship between the browser and the Web server. The only information a browser needs to pass along to a Web server is the location of the file to be accessed, which is encoded in a uniform resource locator (URL). The URL gives basic information about the server from which the file should be accessed, the path to the file on that server, and the name of the file. For instance, a URL to the file that contains an HTML representation of this chapter might look like this:

http://www.globalspin.com/thebook/chapter05.html
This URL indicates that the file chapter05.html can be retrieved through the HTTP protocol from the computer called www in the domain globalspin.com, in a directory called thebook. This idea of a single generic way to locate any file on the Web is the Web's greatest strength, even though URLs won't necessarily correspond to an actual file of that name in a file system directory with the specified name. Because each interaction between the browser and the server can be given a distinct URL, it's possible to start the interaction at any point. You then can continue through hyperlinks to any other point of interaction in any order. This behavior is called stateless, and it's the backbone of the World Wide Web (WWW).
When you need to provide dynamic responses to Web requests, the CGI protocol can access server-side applications in a fashion similar to static files. Each application would still have a URL of the same form as static files. However, the Web server understands that files in certain directories should be accessed as executable programs instead of as files to be transferred. The CGI application would then be responsible for returning a result to the Web server, which would be passed back to the client browser. (See Figure 5.1.)
Figure 5.1 Accessing a CGI application (bottom) is analogous to accessing an HTML file (top).
As Figure 5.1 shows, the interaction between the client browser and the Web server is essentially identical even though the Web server is handling the two requests differently. Thus, a URL for the second interaction would look like this:

http://www.globalspin.com/cgi-bin/chapter05.cgi
Like the URL for a static file, this URL would indicate that the file could be accessed from the computer www.globalspin.com by executing the program chapter05.cgi from the cgi-bin directory. Again, the interaction specified by this URL should provide a consistent result, regardless of which URLs the browser has accessed previously.
Just as the HTTP protocol and the requested URL define the transaction between the client and the server, the CGI protocol specifies the interaction that takes place between the Web server and the executed application. For each request, CGI requires the Web server to start the application and provide a predefined set of information about the environment and the request itself. In return, the application is expected to emit a Multipurpose Internet Mail Extensions (MIME) type indicator (such as text/html for an HTML document) followed by the document to be returned to the client. (CGI also defines ways for the Web server and CGI application to communicate more robust information, such as form variables and headers, but the core interaction remains the same.)
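A minimal sketch of the application's half of that contract follows. The REQUEST_METHOD and QUERY_STRING environment variables and the content-type header are part of the CGI specification; the script itself is an invented example, not production code:

```perl
#!/usr/bin/perl
# sketch.cgi -- the application's side of the CGI contract: read the
# request details the Web server placed in the environment, then emit
# a content-type indicator, a blank line, and the document itself.
use strict;
use warnings;

my $method = $ENV{REQUEST_METHOD} || 'GET';   # set by the Web server
my $query  = $ENV{QUERY_STRING}   || '';      # raw form data, if any

my $response = "Content-type: text/plain\r\n\r\n"
             . "Method: $method\nQuery: $query\n";
print $response;
```

The Web server relays everything after the blank line back to the client browser unchanged.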
Comparison with Word or GIMP
Although the CGI model of request processing might look reasonable, it falls apart when used to create the kind of interactive applications most Web visitors are used to. Web users don't just use the Web; they also use a host of applications on their local machines to perform a wide variety of tasks. The Web browser itself is running on the client machine to provide responsive behavior when using interface elements, such as the Back button or drop-down menus. Because this is the environment with which most users are familiar, it's not surprising that so many people expect the same kind of response when using Web applications. With CGI, however, the Web application itself isn't designed to produce this kind of behavior, so performance suffers and users are left unsatisfied.
As an analogy, consider the way a desktop application works. A word processing program, such as Microsoft Word, or a graphics manipulation program, such as GIMP, works differently from a CGI application. When a user runs Word or GIMP, an application is started that then takes requests in the form of keyboard and mouse input. The application doesn't stop and restart every time a new menu choice is made; if it did, the continuous stopping and starting would make the program unusable (and this book would take even longer to write). Instead, each new request is handled as a part of the overarching program, whether it's a minor aspect of the program (displaying a particular screen) or a core function (checking the spelling in the document). These applications spend a relatively long time initializing a monolithic application because the understanding is that the initialization happens only once each session.
This underscores the reason why neither application is likely to be replaced by a Web application in the near future. If Word were reimplemented in HTML with a CGI back end, for instance, every request to make a word bold or add a line of text would require another call to the Web server. The CGI architecture would require that the Word executable be restarted for each request, and the result would be an application that took minutes to perform even the simplest action. If this situation sounds familiar, it's because most Web applications (such as online stores, Web news services, and Web mail) operate under the same conditions and end up being as painfully slow as Word would be.
The Nature of Databases
Databases are optimized differently from CGI applications. Standard database access software is likely to support a fixed (if large) number of users who access the database simultaneously over continuous connections, which are optimized for faster transaction processing. The server is optimized to provide the best performance possible in this environment, with an emphasis on providing a smooth performance curve for client applications and their users.
CGI applications access databases differently than custom clients do, however. A CGI Web application can hold a database connection only for as long as the application is active, which means that it connects to the database anew with each request. This runs contrary to the continuous way a database is normally used, so the aspects of the database server that haven't been optimized receive a disproportionate share of the load. As a result, database performance from a Web application is much slower than the performance the database would otherwise provide.
It is possible to derive a fixed number that represents the theoretical limit of CGI-to-database processing speed. Because databases are limited in the number of simultaneous connections they support, and the time spent connecting to the database is known, it's possible to calculate the maximum number of requests the database supports each second. This upper limit is a telling reminder that the performance limits due to CGI are restrictive regardless of the capabilities of the hardware or database server.
Many Users, Many Connections, and Continuous Access
Database servers are optimized to support connections from many clients at the same time, but there are basic differences in the way traditional clients and CGI processes access a database. These differences aren't a function of Perl style or the database server, but of the architecture of the CGI interaction itself.
A database server is likely to be optimized for access by individual named users who connect to the database, prepare and execute a reasonable number of transactions, and finally disconnect from the database after all transactions have been completed. Because performing the sum total of these steps as quickly as possible is the usual goal when optimizing a database, database server vendors are likely to optimize the steps that are performed most often: transactions. As a result, the time taken by a standard transaction, such as a SELECT on a million-record table, is likely to be optimized while the time required to connect to a database is not. After all, time saved in performing transactions is likely to be multiplied dozens or hundreds of times in a single database session, while a connection happens only once. Half a second shaved off the overhead of performing each transaction is worth minutes of time per session, for example, while the same half a second is barely missed from a three-second connect time. Database benchmarks reinforce this focus by testing the maximum throughput of transactions in a single long session while ignoring connect time entirely.
The database also is likely to be optimized for human interface speeds. This means that any specific part of the database interaction can take as long as a few seconds if the user is expecting an immediate response, and interactions that the user expects to be complex can take as long as necessary. This is because the client software indicates to the user (by displaying an hourglass icon, for instance) that more complex processes are expected to take time, so the only interactions that have to be quick are those that aren't acknowledged by the client interface as being complex. Still, "quick" is defined as taking less time than the user notices, so the amount of time available can be counted in seconds.
Another assumption made when optimizing the database server is that client connections are created infrequently and held open continuously while the user interacts with the database. For user interactions handled through a custom database client, this assumption is perfectly reasonable: users connect once at the beginning of a session and stay connected throughout the session. In most cases, a connection is used continuously for minutes or hours before being closed by the user. However, CGI applications accessing the database open and close database connections as quickly as the applications run their course, which means that dozens of applications might be opening connections every second. This increases the relative importance of connection time disproportionately and exposes delays that would otherwise go unnoticed.
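A back-of-the-envelope sketch shows how quickly that overhead accumulates. The figures below are illustrative (the three-second connect time matches the example used in the next section), but the arithmetic holds for any per-request connection cost:

```perl
#!/usr/bin/perl
# Total connection overhead over a session's worth of requests, for a
# persistent client versus a CGI application that reconnects each time.
use strict;
use warnings;

my $connect_seconds = 3;       # time to establish one database connection
my $requests        = 1_000;   # requests handled over the session

my $persistent_overhead = $connect_seconds;               # connect once, reuse
my $cgi_overhead        = $connect_seconds * $requests;   # connect every request

printf "Persistent client: %d seconds of connect overhead\n", $persistent_overhead;
printf "CGI application:   %d seconds of connect overhead\n", $cgi_overhead;
```

For the persistent client, connection cost is a rounding error; for the CGI application, it dwarfs the time spent on the transactions themselves.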
The Theoretical Limit of CGI-to-Database Connections
One easy (if disturbing) way to calculate the theoretical limit of CGI access to a database is by taking the total number of simultaneous connections that the database allows (100, for instance) and dividing it by the average connect time for the database (say, 3 seconds). Because a database does not allow a new connection to be made until an old connection is released, the result is the theoretical maximum number of CGI requests per second that the database will support (33, in this case).
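The calculation, with the example figures from the text, can be written out directly:

```perl
#!/usr/bin/perl
# Theoretical ceiling on CGI requests per second: simultaneous
# connections allowed, divided by the average connect time.
use strict;
use warnings;

my $max_connections = 100;   # simultaneous connections the database allows
my $connect_seconds = 3;     # average time to establish a connection

my $ceiling = $max_connections / $connect_seconds;
printf "Theoretical maximum: %d CGI requests per second\n", $ceiling;
# prints "Theoretical maximum: 33 CGI requests per second"
```

Substituting real figures measured from a production database gives the corresponding ceiling for that installation.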
The reason this number is so discouraging from an architectural standpoint is the fact that it's a theoretical maximum. Even if the individual query from a CGI request takes a fraction of a millisecond, overall those requests can't be processed faster than the database makes connections. The numbers get worse fairly quickly for difficult database transactions, which can take ten seconds or more to process. Because even the easiest transactions are weighing down the real response time of the database, it's difficult to compensate for such slow transactions, and overall performance suffers even more.
This limit also can be a cause for great concern if the maximum number of database connections is reached and exceeded. Depending on the database server, results might vary from a fatal error in the CGI process to unusual behavior among all database connections. Data integrity is a prime concern when designing a Web application, so behavior like this can be very distressing if it happens unexpectedly; that is yet another reason to invest time in determining the performance characteristics of Web applications.
The limit can't be raised much by upgrading the database to accept more simultaneous connections. It would seem that upgrading the database to allow ten times the number of connections would increase the limit by the same factor, but the effect isn't so direct. The total number of connections a database allows might be just an issue of licensing, but the number of connections the database server can start each second is a function of the processor power available to the server. Thus, even if a license is purchased to increase simultaneous connections tenfold, the individual connection time might increase by the same factor if the server can't support those connections.
Because the theoretical limit is caused by the way a CGI application accesses the database, the solution to the problem is to change the architecture of the system. Again, a small change to the processing model releases much of the overhead of CGI processing and raises the theoretical limit to rely on factors for which the database is optimized. These factors include transaction speed and total data retrieved. A detailed discussion of how to implement a better database connection architecture can be found in Chapter 14, "Database-Backed Web Sites."
The Nature of Performance Testing
Probably the least mature part of the Web programming process is the poor attempt at performance testing most developers employ before unleashing their applications on the Web. Compared to commercial applications, the amount of testing performed on most Web applications is minimal.
In many cases, performance testing is practically nonexistent. In good cases, performance might be tested by simulating a flood of Web requests. In the worst cases, the simple fact that the site is accessible by one user (usually the developer) is considered proof enough that the application is capable of production-level performance. In any case, the results of such tests are rarely indicative of the true load an application will see in production.
Even in cases in which performance testing is performed with some rigor, the testing utilities themselves are incapable of providing an accurate replication of the production environment.
Ten Employees Click Submit
An embarrassing but recognizable way in which many companies handle "performance testing" is by manually stressing a site with continuous single-user access. The most pitiful case of this is ten employees trying to access a Web application at the same time, supposedly simulating "ten simultaneous users."
As robust as this test might seem at first glance, it really doesn't simulate the kind of load that would truly stress test an architecture. In fact, it's not likely to stress even the least efficient Web applications because it underestimates the expected load on a standard Web application by hundreds of times. "Simultaneous users" are a fiction when applied to Web applications because the term implies user sessions that are continuous from start to finish. The real measure of Web performance is the number of requests that can be answered in a particular time period, usually standardized at one second. Ten developers clicking the Submit button at the same time won't create more than a temporary spike of a few requests per second, which isn't much of a test when the production server might sustain a load of hundreds of requests per second.
Besides, testing the performance of a Web application should not focus on determining if the application supports the load expected in normal use. Rather, performance tests should determine the amount of load that overloads an application, which then can be compared to estimated traffic levels to determine how much of a margin there is between the two. (If an application is capable of supporting 300 requests per second and traffic is expected to average 30 requests per second, for instance, there is a better margin for growth than there would be with a maximum of 35 requests per second.) Determining the point of overload also gives a benchmark against which later performance improvements can be tested.
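The margin described above is simple to compute; the traffic figures are the hypothetical ones from this paragraph:

```perl
#!/usr/bin/perl
# Headroom between the measured point of overload and expected traffic.
use strict;
use warnings;

my $overload_point   = 300;   # requests/second at which the application fails
my $expected_traffic = 30;    # requests/second anticipated in production

my $margin = $overload_point / $expected_traffic;
printf "Margin for growth: %.0fx expected load\n", $margin;   # 10x
```

Rerunning the same comparison after each optimization pass shows whether the margin is actually improving.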
Built-In Performance Tools
Many times, the tools included with an application development environment are inadequate for determining the maximum load a server can handle. In some cases, the performance tools are designed solely to monitor the usage statistics of an application already in production, so there's no way to determine the circumstances under which the application would be overwhelmed with requests. Even if tools are included to produce requests in a test environment, they might be incapable of generating the load that would normally be seen in production. For instance, if it takes a tool longer to generate, process, and record a request than it takes the Web server to respond to it, the tool might reach saturation before the Web application does. Also, in some cases, the tools are designed to run on the same machine as the applications they test, which means that the performance tools compete with Web applications for system resources.
Another downfall of most performance tools is that they rarely pinpoint the cause of a bottleneck, even when one is indicated. A program that stresses an application to the point where it starts to drop requests, for instance, still is unlikely to indicate the conditions under which the requests were lost. Was the network overloaded with requests, which caused request packets to be lost en route to the server? Was the Web server daemon overloaded with file access and request processing, which caused it to queue incoming requests until some were lost? Or did the Web server handle the requests correctly, while an overburdened Web application returned no responses? If a performance tool isn't capable of determining the cause of a failure, the wrong assumptions might be made about the source of the problem. This might lead to an inappropriate solution that doesn't actually address the real problem.
Good performance tools have a few things in common. They take the structure of the Web application into account, or at least enable that structure to be considered when developing tests of the application. They also simulate real-world requests garnered from log or proxy information to prevent artifacts of the testing process from clouding the results or breaking the application's flow. Good performance tools simulate a number of requests that far exceeds the current capabilities of the Web application, even if that means simulating millions of users with high-speed access, all hammering the server at the same time. Of course, all these features would be meaningless if the results of the tests weren't made available in a complete, understandable format that could be used to track down bottlenecks. (Further discussion of performance testing can be found in Chapter 15, "Testing Site Performance.")
Perl CGI applications are notable for poor performance in production environments, but a major source of overhead and instability is the architecture inherent in any CGI application. Just as inefficient hardware architecture can hamper application performance no matter how well the application is written, inefficient system architectures such as CGI can limit the performance that an otherwise-efficient Perl Web application can achieve. The stop-and-start nature of CGI doesn't just affect the application itself, but also any connections to other systems on which it might rely. This causes inefficient interactions that slow overall performance even more. The real solution to these problems is a combination of testing to determine the circumstances causing performance loss and architectural changes to smooth over the differences between the way in which high-performance applications are designed to work and the way in which Web requests are likely to be received.