Prototypes Versus Live Sites
part of Perl for the Web
Web applications are commonly seen as the programs we interface with while using a site that's already in production, but they can just as easily be prototypes of those same sites. Because of the fluid nature of Web development, even sites that don't have a formal prototype stage have prototypes. The Web enables sites to change from day to day. Thus, initial releases of a site can be updated to include new or updated Web applications as the need presents itself. Each revision of the application can be seen as the prototype for the next. Therefore, each iteration is a chance for more development to be done with the application.
With large-scale Web applications, the prototype process is likely to be lengthy. A prototype on the Web with many people evaluating it goes through lots of changes and many revision cycles. During most of the process, features won't be implemented in their final form. Rather, they are implemented in a vague approximation that behaves similarly to the final form. This makes it difficult to determine a detailed structure for the prototype because the underlying functionality might change dramatically, rendering the details obsolete.
Perl is used to prototype many sites because of the speed and ease with which Perl Common Gateway Interface (CGI) programs can be used to mock up a site in a near-complete form. Smaller sites might use Perl CGI exclusively while they are being designed because it's readily available to site programmers. Larger, more established sites still use Perl CGI to create prototypes of new features for the same reason, with the intention of reimplementing the design in another language and environment when the details are finalized.
The switch from prototypes to live sites usually leaves Perl CGI behind, though. CGI doesn't provide the performance that a Web application needs in production, and using CGI for live sites can sometimes cause the sites to be slow or inaccessible because the CGI processes consume every available clock cycle and megabyte of the Web server.
Perl Means Fast Prototyping
One of the most attractive reasons to use Perl for Web site design is the ability to prototype a Web application in great detail with little effort. Perl is designed to be a problem-solving language that glues disparate systems together to form a coherent whole, a description that fits most Web applications precisely.
Perl also benefits from its "interpreted" nature, which eliminates the need for a compile step when developing applications in Perl. (Perl isn't really interpreted at all; programs are actually compiled when run, as is discussed later in this chapter in the section entitled "The Cost of Compilation.") This makes Perl CGI development follow the process used in Web site development, where modifications to source files can be viewed immediately by reloading the resultant page in a client browser. This enables instant feedback to the programmer by presenting intermediate results for modification and exposing errors as soon as they occur.
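The edit-and-reload cycle described above can be illustrated with a very small CGI program. This is only a sketch: the subroutine names and the "name" query parameter are invented for illustration, and it avoids any module so that it runs on a bare Perl installation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the page body. Edit this subroutine, save the file, and reload
# the page in a browser to see the change; there is no build step.
sub render_page {
    my ($visitor) = @_;
    return "<html><body><h1>Hello, $visitor</h1></body></html>\n";
}

# Pull a "name" parameter from the query string, defaulting to "world".
# (Hypothetical parameter; this sketch accepts only word characters
# and does no URL decoding.)
sub visitor_from_query {
    my ($query) = @_;
    return $1 if defined $query && $query =~ /(?:^|&)name=([A-Za-z0-9_]+)/;
    return 'world';
}

# A CGI response is a header block, a blank line, and then the body.
print "Content-Type: text/html\r\n\r\n";
print render_page( visitor_from_query( $ENV{QUERY_STRING} ) );
```

Run from the command line, the same file prints the header and page to standard output, which is often the quickest way to spot an error before involving the Web server.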
Perl's history as a glue language also enables prototypes to be created without having to write much underlying implementation code. Publicly available Perl modules contain interfaces that provide database connections, error-handling routines, network interaction, and other common tasks that Web applications perform; thus, it is easy to find a group of modules that can be combined easily to fit the needs of the prototype application. Programmers also can find examples of Perl CGI on the Web that implement common Web applications such as user forums, news publishing, and data browsers.
The object-oriented features of Perl also provide a way to prototype a Web application one feature at a time, by designing simple interfaces to the parts of the application without needing to know details of the implementation behind the interface. This is how most public Perl modules work, and converting an existing piece of Perl code to an object module is trivial. Few restrictions on variable types or scoping are enforced by default, so disparate program sections can be included and patched together with a minimum of effort.
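As a rough illustration of prototyping one feature behind a simple interface, the following sketch defines a hypothetical NewsStore class. The class name, its methods, and the in-memory storage are all assumptions made for this example; the point is only that the calling code would not need to change if the storage were later replaced with a database.

```perl
use strict;
use warnings;

# Hypothetical prototype class: the interface is settled now,
# and the storage behind it can be replaced later.
package NewsStore;

sub new {
    my ($class) = @_;
    # Prototype storage: an in-memory list instead of a real database.
    return bless { items => [] }, $class;
}

# Store a headline and return how many headlines are now held.
sub add {
    my ($self, $headline) = @_;
    push @{ $self->{items} }, $headline;
    return scalar @{ $self->{items} };
}

# Return up to $count headlines, newest first.
sub latest {
    my ($self, $count) = @_;
    my @items = reverse @{ $self->{items} };
    $count = @items if $count > @items;
    return @items[ 0 .. $count - 1 ];
}

package main;

my $store = NewsStore->new;
$store->add('Site launched');
$store->add('Forum added');
print join(', ', $store->latest(1)), "\n";   # prints the newest headline
```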
Case Study: Amazon.com
1. Houston, Lori [online]. Amazon's Production Software Group Builds Auction Site Prototype. [cited Feb. 1, 2001]. Available from Internet: http://perl.oreilly.com/news/amazon_0100.html.
Perl Modules for Common Tasks
Part of the attraction of writing a Web application prototype in Perl is the number of modules Perl provides to perform common Web-related tasks. Most languages have application program interfaces (APIs) and libraries for integrating disparate systems into an application, but Perl excels in this by bringing the majority of those interfaces together in a single module architecture that can be navigated easily.
The Comprehensive Perl Archive Network (CPAN) lists hundreds of modules for use with Perl. The functions are as varied as database connection (DBI), graphics production (GIMP::), Web browsing (LWP), XML processing (XML::*), and even Telnet access (Net::Telnet). All modules on CPAN are arranged by author or name and are available for immediate download, many with complete documentation for both installation and use. CPAN itself is truly a network with mirrors on dozens of sites around the world. Many interfaces to CPAN exist, including a Web-based portal site with search engines and documentation at http://www.cpan.org.
Each module on CPAN provides a lightweight and general interface to the module's functions, usually with module documentation that emphasizes usage examples first and detailed instructions later. Many of the code samples included in module documentation are designed to run as-is in user programs, which makes incorporating the features of a module into a prototype that much easier. The DBI module, for instance, has extensive documentation for its API and features. It also includes sample code for common functions, and hints on optimizing database access using the module.
If a module that meets the specific needs of a Web application prototype isn't available on CPAN, it's still likely that some existing section of code can be applied to the prototype as a module. Perl modules are just Perl routines in a generically accessible location, so it's very easy to convert any algorithm or code snippet into a Perl module for use in a prototype.
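A minimal sketch of that conversion follows. The package name SiteUtil and the slugify routine are hypothetical; in a real project the package would normally live in its own file (SiteUtil.pm) and be loaded with a use statement, but it is kept inline here so the example runs as a single file.

```perl
use strict;
use warnings;

# In a real project this block would live in its own file, for example
# SiteUtil.pm (hypothetical name), loaded with "use SiteUtil qw(slugify);".
package SiteUtil {
    use Exporter 'import';
    our @EXPORT_OK = qw(slugify);

    # The original snippet, wrapped in a named subroutine.
    sub slugify {
        my ($title) = @_;
        my $slug = lc $title;
        $slug =~ s/[^a-z0-9]+/-/g;   # runs of other characters become one hyphen
        $slug =~ s/^-|-$//g;         # trim leading and trailing hyphens
        return $slug;
    }
}

# Import by hand because the package is defined in this same file.
SiteUtil->import('slugify');

print slugify('Prototypes Versus Live Sites'), "\n";   # prototypes-versus-live-sites
```

The only additions to the original snippet are the package declaration and the export list; the routine itself is unchanged.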
The greatest strength of most Perl modules is the fact that they are released under the Artistic License, one of the open-source licenses under which Perl itself is made available. The Artistic License was created specifically to provide the greatest flexibility to Perl programmers when writing and using Perl modules. Modules released under the Artistic License can be used in any fashion, individual or corporate, public or private. This enables modules to be used in proprietary or open-source projects without fear that a license will be violated in future use, which can be a real concern when developing a Web application that is then sold as an original work. Not all Perl modules are made available under the Artistic License, but nearly all modules on CPAN are. Check the documentation for each module if there is a concern; language such as "This module is released under the same terms as Perl" indicates that the Artistic License is available.
A Profusion of Available Perl CGI
Perl also suits Web application prototyping well because of the Perl CGI scripts that are already available. These scripts address many of the common forms a Web application might take, so it's often easy to find a CGI script that is close to the desired final result and modify it to suit a specific Web application prototype. Scripts have already been written that implement Web forums, user administration, database access, site searching, and a host of other standard Web application tasks.
CGI's ubiquity as a Web programming protocol can't be overestimated. CGI is not only available for every platform and Web server, but also is almost always included as a default application environment. Perl, as well, is present by default in practically every UNIX Web server installation, and it is made freely available to less common Web server platforms, such as Windows or Macintosh. In fact, the combined ubiquity of Perl and CGI causes neither technology to register on lists of popular server add-ons while PHP, Java servlets, and similar technologies rise to the top of such lists with only 50 percent acceptance or less. This gives the impression that Perl and CGI aren't being used in Web server environments at all, even when a quick tour around the Web shows that they're just as common as ever.
As a result, common aspects of Web applications are likely to have been implemented by CGI programmers already. Even if most of these solutions are never offered as examples for download, that still leaves a wealth of available Perl applications that can be used as fodder for new prototypes. Perl's lax style enables code from similar applications to be incorporated into a prototype with little fuss. This makes it possible to take a number of existing Perl CGI programs and combine functions from them into a new prototype using Perl glue to smooth out incompatibilities in the code. After this hybrid is created, it's then much easier to insert the few new functions that might be necessary to make a prototype workable.
There's More Than One Way to Do It
The philosophy behind Perl development is to get the job done first and worry about the implementation later. Because of this, the language has been designed to enable the simplistic programming style (called baby talk by Larry Wall, Perl's creator) to be used alongside more mature and optimized code with no detectable difference at runtime. This makes it easier to develop a Perl application that performs functions that are unfamiliar. Because the application itself can be expressed in its simplest form first, the general structure of the application can be defined in terms of real working code before more complex features, such as advanced error checking or higher-performance algorithms, are added. In fact, in cases where quick, temporary solutions are needed, the complex code might never need to be added at all.
Perl also is designed to reduce the impact of style variations in the operation of code. This is a programmatic version of "Do what I mean, not what I say," which enables programmers conversant with one style of programming (in C, for example) to program in that style instead of a more "Perlish" style. The Perl compiler then uses the most efficient method to implement either style, usually with quite a bit of overlap between the implementations of similar functions implemented in different ways. The value of this style neutrality is discussed in greater detail in Chapter 7, "Perl For the Web." This philosophy makes prototyping a Web application in Perl simple because style isn't an issue when developing the prototype. Languages such as Java, on the other hand, enforce restrictions on programming style and interface design, making the initial architecture of an implementation much more important than it would be in Perl.
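To make the point about style concrete, here is a small sketch of the same task written twice, once the way a C programmer might write it and once in a more "Perlish" form. The subroutine names and sample data are invented for the example; both versions are ordinary Perl and return the same result.

```perl
use strict;
use warnings;

my @sizes = (3, 12, 7, 15);

# C style: explicit index variable and accumulator.
sub total_c_style {
    my @values = @_;
    my $total = 0;
    for (my $i = 0; $i < scalar(@values); $i++) {
        $total = $total + $values[$i];
    }
    return $total;
}

# More "Perlish": iterate over the argument list directly.
sub total_perlish {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

print total_c_style(@sizes), " ", total_perlish(@sizes), "\n";   # 37 37
```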
Perl then allows code to be revised and optimized at a later time by abstracting away features of the code. The object-oriented features of Perl can be used to provide a generic interface to common application functions. The code underlying these functions can then be optimized independently of the main program code and potentially reused in other applications. To assist optimization, Perl has adjustable levels of strictness that give the programmer the ability to define regions of code with strict or loose styles pertaining to variables, subroutines, or references. A Perl application also can be made to emit more stringent warnings in situations in which the program is likely to do something different from what is expected. Because enabling or disabling these warnings is as easy as specifying use warnings or no warnings at any point in the code, it's possible to achieve a very fine grain of control over which ranges of code need strictness and which don't.
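The following sketch shows how that fine-grained control might look in practice. The subroutine names are hypothetical; the pragmas themselves (use strict, no strict 'refs', use warnings, and no warnings 'uninitialized') are standard Perl, and each "no" applies only to the block in which it appears.

```perl
use strict;
use warnings;

sub shout { return uc $_[0] }

# Call a subroutine in package main by its name.
sub call_by_name {
    my ($name, @args) = @_;
    # Loosen only symbolic-reference checking, and only inside this sub.
    no strict 'refs';
    return &{"main::$name"}(@args);
}

# Join values that are allowed to be undefined.
sub quiet_join {
    my @parts = @_;
    # Undefined parts are expected here, so silence just that warning class.
    no warnings 'uninitialized';
    return join '', @parts;
}

print call_by_name('shout', 'prototype'), "\n";   # PROTOTYPE
print quiet_join('a', undef, 'b'), "\n";          # ab
```

Outside those two subroutines, the file-level use strict and use warnings remain fully in effect.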
Perl CGI Means Poor Performance
Perl CGI can be a common way to develop a site initially, but performance problems become apparent when a Perl CGI application is put under load. This situation is encountered frequently when a prototype application is placed in a production environment with the potential for slow response times, unusual behavior, and system failures. After a site encounters performance problems due to one Perl CGI script, it's likely that no others will be allowed in a production environment again.
Perl CGI works fine when testing a Web application initially, but as the site gets more usage and the application is under more load, the CGI process inflicts more and more overhead and eventually outstrips the capabilities of the Web server. The processor gets overloaded, memory fills up, the database runs out of available connections, and the system grinds to a halt much sooner than would be expected.
It's here that Perl's on-the-fly compiler has the biggest drawback; much of the CPU load of executing a Perl CGI process comes from compiling and initializing the application. The obvious conclusion is that Perl is too slow for the task at hand; there certainly aren't any companies lauding the impressive speed of Perl CGI, and there are even fewer examples of speed-optimized Perl programs in everyday use. (Problems with this conclusion are covered in greater detail in Chapter 5, "Architecture-Based Performance Loss.")
One-Time Tasks Under Load
Perl CGI processes work very well in a prototype-testing environment. Only one user is generally accessing the Web application at a time, with all interface lag perceived by that user caused solely by the processing of the user's requests. This is analogous to the way a Perl program is used outside of Web processing, so all the assumptions inherent in that architecture are true; only one request occurs at a time, and after the request is fulfilled, the program and its data are no longer needed.
These assumptions also are correct when a prototype is being demonstrated. The demonstrator is the only user accessing the system, and any delays caused by normal processing are minimal and can be covered by explaining the process while waiting. As a result, any slowness caused by CGI processing is unlikely to be caught during the demonstration or usability testing phases. Even if sluggishness is noticed by individual users, it's likely to be so mild as to be ignored in favor of more pressing concerns.
Moving a prototype into production is a different story. The assumptions of one user, few requests, and no competition for resources are no longer valid. As more visitors use the Web application simultaneously, the load on the application increases exponentially. This is due to the nature of Web requests; they don't come in one at a time the way they would in a single-user application, and they aren't fixed in scope in the way most multiuser applications (such as databases or groupware) are fixed in scope. Web requests can come in simultaneously in ever-increasing quantities, with no simple way of queuing up requests for orderly processing. Instead, a Web server starts up as many processes as there are requests. The load of the additional processes makes the existing processes even slower, so more requests come in while earlier requests still are being processed. Under load, this situation can easily spiral out of control, causing all the Perl CGI processes to grind to a halt. All requests then are lost in the process.
Memory Footprint of Perl Processes
A Perl CGI process can use anywhere from 1MB to 15MB of RAM. Taken individually, processes taking this amount of system memory are easily manageable and unlikely to overwhelm even the least sophisticated Web server. However, the Perl compiler executable used in CGI processing is likely to be linked statically rather than dynamically, resulting in a Perl interpreter that isn't shared between processes. If many CGI processes are running at the same time, the Perl interpreter might be duplicated dozens of times, taking up even more memory in overhead.
If a new process is started before an old process finishes, the new process takes up even more system memory. This is likely to happen repeatedly as a site gets more frequent visitors, which causes more overlap and a greater drain on system resources. For example, if a site gets even 20 requests per second for a CGI application that requires as little as two seconds to process, 40 CGI processes might be starting and stopping continuously, taking up to half a gigabyte of RAM overall. As the rate of requests increases (100 per second is still a very conservative number for a production Web application), the memory requirements increase not just additively, but geometrically. Each process adds CPU load that slows down other requests, increasing the number of processes necessary and feeding the cycle again. The limits of available RAM can quickly be exceeded in even the most expensive Web servers.
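The arithmetic behind that example can be sketched in a few lines. The subroutine names are invented here, and the model is deliberately simple: it assumes the number of overlapping processes is roughly the request rate multiplied by the time each request takes, and it ignores the slowdown that additional load causes, so real figures would be worse.

```perl
use strict;
use warnings;

# Processes running at once: roughly arrival rate times time per request.
sub concurrent_processes {
    my ($requests_per_sec, $seconds_per_request) = @_;
    return $requests_per_sec * $seconds_per_request;
}

# Total memory in MB for a given per-process footprint.
sub memory_mb {
    my ($processes, $mb_per_process) = @_;
    return $processes * $mb_per_process;
}

# The two request rates mentioned in the text, at 2 seconds per request,
# with the 1MB and 15MB per-process bounds given earlier.
for my $rate (20, 100) {
    my $procs = concurrent_processes($rate, 2);
    printf "%3d req/s: %3d processes, %d MB to %d MB\n",
        $rate, $procs, memory_mb($procs, 1), memory_mb($procs, 15);
}
```

At 20 requests per second the upper bound works out to 600MB, which is in line with the rough half-gigabyte figure; at 100 requests per second the same model gives 200 processes and up to 3,000MB before any slowdown is taken into account.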
When available RAM is exceeded by the needs of CGI processes, most servers start using swap space on disk to create virtual memory for the processes. At this point, the processes become phenomenally slow; processing time per request increases tenfold, increasing the number of active processes by the same factor. The overall load of the processes quickly becomes too much for the system and all processing grinds to a halt. Once again, requests are lost and connected systems might be left in an indeterminate state, causing unusual behavior and administrative nightmares.
The Cost of Compilation
The main reason for most of the overhead of Perl CGI is compilation into bytecode and instantiation of program data. Every Perl program is compiled before execution, and any program, no matter what language it's written in, needs to initialize the memory structures and system libraries it uses during runtime. Compiling is necessary for any program that is run directly in a system environment without an interpreter.
Compilation is rarely a step that's seen when running a single-user program. C programs, for instance, are compiled over the course of minutes or hours. The compilation step also is where much of the system-related error checking takes place, so compilation might take even longer if the first compile step isn't successful and numerous tries are necessary. As a result, the compilation step is usually performed before the program is ever shipped to the user. Even open-source programs, which are shipped as source code, generally are compiled only once before being installed on the destination system.
Perl programs, however, are designed to be compiled just before they are executed. This makes Perl more immediately responsive to system environment changes and enables Perl programs to be more portable across platforms than a compiled executable would be. The Perl runtime compiles each program in a matter of seconds (or milliseconds), including all libraries and core utilities. The compile and execute steps are both started when a Perl program is started, which adds to the impression that Perl programs are interpreted rather than compiled. This impression is strengthened by the fact that the Perl runtime (sometimes called the Perl interpreter) needs to be invoked to compile and execute each Perl program.
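The separate compile and execute steps can be observed from inside a program. In the sketch below, the @phases array is a name chosen for the example; BEGIN is a standard Perl block that runs as soon as the compiler has parsed it, before any ordinary statement in the file is executed.

```perl
use strict;
use warnings;

our @phases;

# An ordinary statement: executed at run time, after compilation finishes.
push @phases, 'run';

# A BEGIN block: executed during compilation, even though it appears
# later in the source than the statement above.
BEGIN { push @phases, 'compile' }

print join(' -> ', @phases), "\n";   # compile -> run
```

The output order shows that the whole file is compiled before its first ordinary statement runs, even though both steps are triggered by a single invocation.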
For some Perl programs, compilation can take longer than a few seconds. This becomes more likely the larger a program becomes, with modules adding significant bulk to the program. Instantiating variables, system connections, and large data structures also can add considerably to the overall time a program takes to start. Parsing large XML documents into Perl data structures, for instance, is likely to take a noticeable amount of time. This additional compilation and instantiation time creates an upper limit to the range of Perl CGI solutions because more complex processes would involve prohibitive performance losses.
Compiling a Perl program is a larger CPU drain than simply executing the compiled program. This holds true for all but the most processor-intensive Perl programs because compiling the program requires the same text manipulation, system integration, and disk I/O that a Perl program itself would perform, but on a more complex scale. In fact, in most cases, a Perl CGI process uses up to twenty times more processor time starting up than the same process would use if precompiled. The reasoning used to determine this overhead is discussed in more detail in Chapter 9, "The Power of Persistence."
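One rough way to see the difference between compiling code and running already-compiled code is to time each separately. The sketch below is an assumption-laden stand-in rather than a measurement of real CGI startup: it compiles a tiny piece of source text repeatedly with string eval, then times repeated calls to the compiled result, using the gettimeofday and tv_interval functions from the standard Time::HiRes module. The subroutine names and the sample source are invented, and the numbers it prints vary by machine and are not expected to reproduce the ratio quoted above.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical stand-in for a program's source text.
my $source = 'sub { my $n = shift; my $t = 0; $t += $_ for 1 .. $n; return $t }';

# Time how long it takes to compile the source $rounds times.
sub time_compile {
    my ($rounds) = @_;
    my $t0 = [gettimeofday];
    for (1 .. $rounds) {
        my $code = eval $source or die $@;
    }
    return tv_interval($t0);
}

# Compile once, then time $rounds calls of the compiled code.
sub time_execute {
    my ($rounds) = @_;
    my $code = eval $source or die $@;
    my $t0 = [gettimeofday];
    $code->(10) for 1 .. $rounds;
    return tv_interval($t0);
}

printf "compile x1000: %.4fs, execute x1000: %.4fs\n",
    time_compile(1000), time_execute(1000);
```

A CGI process pays the first cost on every request; an environment that keeps the compiled program in memory pays it once and then only the second.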
Prototypes for Web applications are a great use for Perl, and many Web applications start out as Perl prototypes. Perl's flexibility and powerful system integration modules make prototype design and implementation particularly easy. Perl CGI performance hits the wall in a short time, though: Perl CGI processes take more memory, use more CPU time, and can eventually overload the system and stop Web server processing entirely. This is mostly due to the overhead of compiling a Perl process on the fly, which only takes a second or two, but which requires even more processing power than the application alone.