
Perl for the Web

part of Perl for the Web


Perl is sometimes seen as yesterday's Web application language. It is not as well-advertised as Java or C#, and many new application servers and content management systems are making their names with technologies such as Python, PHP, or Java Server Pages (JSP). Perl, in contrast, is advanced mostly by noncommercial organizations with little interest in buzz and a real focus on solutions that work quickly and quietly.

Of course, within the Perl community, there's a different story. Perl programmers communicate with each other at a frantic pace. Their communication consists primarily of new and updated modules, solutions to new challenges, and changes to Perl itself. To those who listen more to technical explanations and design specifications, Perl is a vibrant language with a strong following and an excellent future. Perhaps this is why Perl programming manuals consistently top the technical book bestseller lists at online stores and brick-and-mortar booksellers alike.

All buzz aside, Perl is an excellent Web development choice for many reasons, including the following:

The "Swiss Army chainsaw" has more tools available than ever.

This chapter provides some basic insights into the advantages that Perl brings to Web development. These insights are written from the standpoint of the technical evaluator and of the programmer who might be new to Perl, Web applications, or both. The chapter also contains concrete examples of the work that Perl does for Web sites at development time and beyond. Perl is useful not only for delivering information over the Web, but also for serving as an excellent systems management and discovery tool.

Any Programmer is a Perl Programmer

One of the most overlooked benefits of building applications in Perl is the abundance of Perl programmers. This reality stands in stark contrast to the myth that most programmers use Java or C, a myth that stems from the use of those languages to teach computer science courses. Perl programmers generally don't come from a classroom. They usually learn the language when performing system administration, integration, or Web development tasks. As a result, Perl has become both an easy-to-learn and a popular programming language.

Many programmers learn Perl as a scripting language for common system tasks. The programmers use Perl to write test harnesses, installation scripts, and command-line utilities for compiled applications. Programmers don't write these programs in a compiled language such as C because the programs change too often. They also don't write them in an abstracted language such as Java because the programs must integrate with too many other system utilities. As a result, many one-time or evolving programs are written and revised in Perl.

Other programmers learn Perl by developing extensions for preexisting compiled programs. In addition, they use Perl to provide application program interfaces (APIs) for applications that allow custom access through plug-in extensions. One such API is the plug-in API for the GIMP, an image editor that provides automation and filter plug-ins through a Perl interface to the internal graphics API. All the GIMP's functions, from graphics file translation to pixel-level editing, are available through the Perl API.

The GIMP's scripting interface was originally Script-Fu, which is based on the Scheme language, but the GIMP's creators later decided that Perl provided a more robust interface and would be easier to develop with. Incidentally, the same GIMP API can be accessed within stand-alone or Web-based Perl applications, which provides a powerful way to create graphics programmatically. Many other program interfaces are added to Perl's tool set this way.

Alternately, some people become programmers by learning Perl first. In Web development, this occurs most often when a graphic artist or content producer learns Perl Common Gateway Interface (CGI) scripting to add functionality to a site. Many of these programmers learn Perl in bits and pieces in response to a particular need, and Perl's shallow learning curve encourages them to develop their skills as new needs arise. As a result, the line between Web designer and Perl programmer is blurring to form a new group of programmer/designers–Web application developers.

Perl's glue language background and flexible syntax mean that experience with C/C++, PHP, Python, JavaScript, and even HTML can be applied immediately to Perl programming. Thus, there are many types of programmers who would be comfortable using Perl to start developing a Perl-based Web application.

Programmers with a C Background

C programmers find the structure of Perl programs familiar because C was a design inspiration during Perl development. Many of Perl's most basic constructs–such as looping, conditionals, and function calls–are duplicated faithfully from C with only minor variations. In fact, C style is one of the dialects of Perl, as shown in Listing 7.1.

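Listing 7.1 C Style in a Perl Program

# $_ holds the current driver name, as in the enclosing loop of Listing 7.7
$num = () = DBI->data_sources($_);   # list assignment in scalar context counts the sources
for ($i = 0; $i < $num; $i++) {
  push @datasources, (DBI->data_sources($_))[$i];
}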

Listing 7.1 is a modified fragment of Listing 7.7, a SQL query processor; the fragment is brought forward here to illustrate the similarities between C and Perl coding styles. See Listing 7.7 for the complete program.

In Listing 7.1, a list of data sources is built from the values returned by a DBI method; Listing 7.7 later uses that list to build an HTML pop-up menu. A for loop is used to step through the returned list, just as it would in a similar C program. The arguments to for are the same as in C: an initialization ($i = 0), a loop condition ($i < $num), and an increment ($i++).

Within the block, the loop variable $i determines which element of the list returned by the DBI->data_sources method is added to the @datasources array.

The for keyword is a synonym for the more Perlish foreach keyword; both accept the C-style arguments, as listed in Listing 7.1, and the customary foreach syntax, as listed in Listing 7.2. The example could have been written either way without so much as a hiccup on the part of the Perl compiler.

Listing 7.2 Perlish Dialect Version of a C-Style Loop

# the argument $_ is the current driver; inside the loop, $_ is each data source
foreach (DBI->data_sources($_))
{
  push @datasources, $_;
}

The main difference between Listings 7.1 and 7.2 is the focus of the loop. The C style uses an explicit loop variable and requires the programmer to extract the appropriate member of the array returned by DBI->data_sources, as needed. The returned array could have been assigned to its own variable, for example, @raw_datasources, but confusion between that and the more important @datasources array would result. Instead, the Perlish version uses the $_ variable, which is set automatically by the foreach (or for) loop to point to the current member of the array specified. Perl provides many of these convenience variables to reduce the amount of explicit variable definition. Nonetheless, it is always acceptable to define variables explicitly.

Because this book's focus is on performance, it is helpful to note that the for and foreach synonyms are identical in terms of speed, and the C or Perlish dialects post similar benchmarks. The variation in Listing 7.1 would require the Perl interpreter to assign more variables and would request the array of data sources from DBI more often. However, the resulting speed difference most likely would be unnoticeable. See Chapter 8, "Performance Myths," for a discussion of Perl benchmarks of loop processing and similar constructs.
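To check the claim about the two loop dialects on a particular machine, the core Benchmark module can time them side by side. The following is a minimal sketch, not the benchmark from Chapter 8: a plain list of hypothetical data-source names stands in for DBI->data_sources() so that it runs without a database, and the timings it prints will vary with hardware and Perl version.

```perl
#!/usr/bin/perl
# Time the C-style and Perlish loop dialects with the core Benchmark module.
# The data-source names are hypothetical stand-ins for DBI->data_sources().
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @sources = map { "dbi:Example:db$_" } 1 .. 100;

# C dialect: explicit index variable and bounds test
sub c_style {
    my @copy;
    for (my $i = 0; $i < @sources; $i++) {
        push @copy, $sources[$i];
    }
    return @copy;
}

# Perlish dialect: foreach aliases $_ to each element in turn
sub perlish {
    my @copy;
    foreach (@sources) {
        push @copy, $_;
    }
    return @copy;
}

# Run each sub 20,000 times and print a table of relative speeds.
cmpthese(20_000, { c_style => \&c_style, perlish => \&perlish });
```

Both subs build the same list; only the loop style differs, so any gap in the printed table reflects the dialect rather than the work done.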

Another benefit of Perl's C heritage is the tight integration between many C libraries and their Perl interfaces. An example of this integration is the set of object classes and methods based on the Document Object Model (DOM). This model is most commonly used for describing and navigating HTML or XML documents. These libraries use the same method calls, object classes, and exception types as their C or Java brethren. So, programmers familiar with the libraries in other languages are right at home using them in Perl. See Chapter 18, "XML as a B2B Interface," for more information on the Perl interface to DOM.

Overall, C programmers have an easy time transitioning to Perl programming. Solution sets from C are generally applicable to Perl, and the idiomatic differences between C and Perl can be discovered while programming, resulting in a shallow learning curve that can be overcome almost immediately.

Programmers with a PHP Background

PHP was developed specifically to address many of the needs presented by this book. These needs include easy database access, embedded programming, and faster prototyping. To a great degree, it has succeeded. The language is based in large part on C syntax, with modifications made to accommodate Web-style programming, to reduce unnecessary restrictions on variable types, and to allow object-oriented programming.

PHP is an interesting case of the influence Perl has on recently developed programming languages. PHP programmers find Perl very familiar, largely because PHP's syntax, like Perl's, draws on C and borrows heavily from Perl itself. In addition, PHP object dereferencing, variable notation, looping, and subroutines are almost identical to Perl's. The rest of Perl's constructs are similar enough to PHP that programmers spend little time adjusting. Last, the project life cycle afforded by an interpreted language, which involves more frequent changes and less structured debugging than compiled applications require, is already familiar to PHP programmers.

The PHP and Perl examples in Listings 7.3 and 7.4 both access a database and send SQL queries through methods on an object. The similarities are striking.

Listing 7.3 Object Method Access in PHP

//connect to the database
$db = new DB;
$db->connect("test","localhost","root","")
     or $error = "Connection failed";

//if the database connection worked, send the query
if (!$error) {
  $db->query($query)
       or $error = "Query failed";
}

Listing 7.4 Object Method Access in Perl

# connect to the database
$dbh = DBI->connect($datasource, $user, $password)
       or $error = "Connection failed: $DBI::errstr";

# if the database connection worked, send the query;
# do() returns the number of rows affected, or undef on error
unless ($error) 
{
  $rows = $dbh->do($query)
          or $error = "Query failed: $DBI::errstr";
}

Aside from the C-like syntax (curly braces to denote a program block, quotes around string arguments, and a semicolon to end each statement), the PHP and Perl examples both use the same variable notation (e.g., $error), the arrow operator (->) to call methods on an object stored in a variable, and short-circuit operators such as or. In fact, apart from its // comments, the PHP example is syntactically valid Perl. The Perl example departs from it only for convenience (unless ($error) reads more like English than if (!$error)) or because of underlying data structures such as the DBI object, which is described in more detail later in this chapter.
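The two idioms that Listings 7.3 and 7.4 share can be tried without a database. In this minimal sketch, fake_connect() is a hypothetical stand-in for a connection call; like DBI->connect with its default settings (RaiseError off), it returns undef on failure. The sketch shows why "or" works as a guard after an assignment and that unless ($error) tests the same condition as if (!$error).

```perl
#!/usr/bin/perl
# The low-precedence "or" guard and "unless" as the inverse of "if",
# shown with a hypothetical fake_connect() instead of a real database.
use strict;
use warnings;

# Hypothetical connection call: a true value on success, undef on failure.
sub fake_connect {
    my ($should_work) = @_;
    return $should_work ? 'handle' : undef;
}

sub connect_status {
    my ($should_work) = @_;
    my $error;
    # "or" binds more loosely than "=", so this parses as
    # (my $handle = fake_connect(...)) or ($error = "Connection failed")
    my $handle = fake_connect($should_work)
        or $error = "Connection failed";

    unless ($error) {                 # same test as: if (!$error)
        return "connected with $handle";
    }
    return $error;
}

print connect_status(1), "\n";
print connect_status(0), "\n";
```

With the higher-precedence || operator instead of or, the assignment would need parentheses, which is why Perl code like Listing 7.4 favors or for this pattern.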

As a result, migrating to Perl can be simple for a programmer with PHP experience. PHP-style idioms that are also valid Perl are accepted by the Perl interpreter without complaint, and programmers can learn underlying modules quickly as applications are designed and developed. Perl's many available interface modules (such as DBI) also help PHP programmers by providing object interfaces that offer a useful layer of abstraction.


Perl was designed with first-time programmers in mind. The forgiving nature and do-what-I-mean philosophy of Perl were big factors in making it the de facto standard for CGI programming.

The object-oriented language constructs that were added in Perl 5 are an example of Perl's flexibility. A Perl program could be written in a completely object-oriented fashion to facilitate reuse and extensibility. On the other hand, a Perl program could be written using only procedural code with no object notation at all, or with a mix of objects and procedural style. Compared to strict object-oriented languages such as Java, Perl is unusually flexible in allowing the programmer to choose the programming style, instead of the language imposing it.
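As a small illustration of that choice, the sketch below writes the same greeting twice, once as a plain subroutine and once as a method on a class, and uses both in one program. The Greeter package and its name attribute are hypothetical, invented for this example.

```perl
#!/usr/bin/perl
# One task in two styles: a procedural subroutine and a small class.
# The Greeter package is hypothetical, made up for this illustration.
use strict;
use warnings;

# Procedural style: a plain subroutine
sub greet {
    my ($name) = @_;
    return "Hello, $name!";
}

# Object-oriented style: a constructor plus a method
package Greeter;

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} };
    return bless $self, $class;       # associate the hash with the class
}

sub greet {
    my ($self) = @_;
    return "Hello, $self->{name}!";
}

package main;

# The two styles mix freely in the same program.
print greet('world'), "\n";
print Greeter->new(name => 'world')->greet, "\n";
```

Nothing in the language forces the second form; bless and the arrow operator are simply available when reuse or extensibility makes a class worthwhile.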

This friendliness and flexibility in the Perl language was no accident. From the beginning, Larry Wall, Perl's creator, intended Perl to be as much like English in structure and syntax as it is like C or awk. Like English, Perl has many ways to state the same functional messages. The comparisons between Perl, C, and PHP syntax in the previous sections of this chapter are examples of the flexibility of programming style built into Perl. This flexibility at the core of the language reflects not only the diversity of Perl's language ancestors, but also the diversity of programming styles that Perl programmers are expected to need. In fact, some programmers take this flexibility to the extreme by choosing to make their Perl seem as much like English as possible. Listing 7.5 is an example.

Listing 7.5 Perl Poetry by Ryan Koppenhaver

01 #!/usr/bin/perl
02 if ($^I =~ m/cool/) {
03     go("clubbing")
04 } else {
05     go('hack', 'Perl')
06 }
08 sub go {
09     print "Guess I'm just another @{[reverse @_]}er.\n";
10 }

This poem is actually a functional Perl program that produces the answer to the poem's question as output. Note the use of keywords such as if, else, and print to fill both the roles of poetic substance and programmatic function. The poem also uses Perl simplicity shortcuts (such as omitting the semicolons that would normally come at the end of the third and fifth lines) to improve readability and to keep the focus on the central message. This poem was listed on the PerlMonks Web site, which is discussed in more detail later in this chapter.

Perl poetry is a common form of creative expression among Perl programmers. Programs in any language can be poetic in their own right to those who understand the function of the program, but Perl is unusual among programming languages in how readily it allows complete English sentences to be used in functional programs. In fact, Perl poetry contests are held yearly to find new ways of expressing the same concepts both poetically and functionally. The poems can get very elaborate; many poems print themselves as output, and others produce additional poems when run. One Perl poem even uses poetic functions to turn a sad poetic input into a happy poetic output.

These poems are perhaps best appreciated with a knowledge of Perl; however, the ability to get a sense of a program's inner workings simply by reading the program as an English text is invaluable to anyone who wants to evaluate a program quickly. By using the English syntax that Perl provides, it's possible to create code that is both aesthetically pleasing and functional.

HTML markup is another skill that's generally seen as more art than programming, but in-depth knowledge of it is essential when creating Web applications. Even when creating a Perl-based Web application, HTML might make up the majority of the code because it is also the majority of the program's output. In fact, a common practice is to embed small sections of Perl code within a standard HTML page to keep the HTML and other content sections readable.

Listing 7.6 is an example of Perl embedded in HTML. The Perl code in this example is used only to enumerate a list in an otherwise HTML-centric page, so the emphasis is placed on the HTML formatting, not Perl.

Listing 7.6 Perl Embedded in HTML with PSP

<template file="page.psp">
<p>Font sizes on this browser are in the following range:</p>
<ul>
<perl>
    foreach $size (1..7) {
        print qq{<li><font size="$size">Size $size</font></li>\n};
    }
</perl>
</ul>
</template>

In this case, readability of the page is improved by embedding the Perl function in an HTML-like <perl> tag similar to the <script> tag used to denote client-side scripting languages, such as JavaScript. The Perl functions are evaluated just as though they were contained in their own program. The output is blended with the rest of the HTML page before the page is delivered. The <template> tag works in a similar fashion, but it references Perl code that is included from another page. By separating Perl functions out and including them separately, you can emphasize the HTML aspects of the page, which are more likely to resemble the final output than stand-alone Perl code. Further discussion of Perl templates and embedding can be found in Chapter 13, "Using Templates with Perl Applications."
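The evaluate-and-blend step can be sketched in a few lines of Perl. This is a toy processor written for illustration only, not the PSP engine, and it ignores the <template> tag entirely: each <perl>...</perl> section is run with the default output handle redirected to a string, and the captured text replaces the section in the page.

```perl
#!/usr/bin/perl
# Toy embedded-Perl processor (illustration only, not the PSP engine):
# each <perl>...</perl> block is replaced by whatever the block prints.
use strict;
use warnings;

# Run one block of embedded code and return its printed output.
sub run_block {
    my ($code) = @_;
    my $output = '';
    open my $capture, '>', \$output or die "Cannot open buffer: $!";
    my $previous = select $capture;   # bare print() now writes to $output
    my $ok;
    {
        no strict 'vars';             # embedded code may use package globals
        $ok = eval "$code; 1";
    }
    my $error = $@;
    select $previous;                 # restore the original default handle
    close $capture;
    die "Embedded Perl failed: $error" unless $ok;
    return $output;
}

# Substitute every embedded block with its output.
sub render {
    my ($page) = @_;
    $page =~ s{<perl>(.*?)</perl>}{run_block($1)}gse;
    return $page;
}

my $page = <<'HTML';
<ul>
<perl>
    foreach $size (1..3) {
        print qq{<li><font size="$size">Size $size</font></li>\n};
    }
</perl>
</ul>
HTML

print render($page);
```

Capturing output through select keeps the embedded code unchanged: it calls print exactly as a stand-alone program would.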

HTML experts won't have a difficult time making the transition to Perl when programming for the Web. Perl's procedural nature makes it easier to learn than JavaScript's object-oriented functions. In addition, the capability to embed Perl into an HTML page or to use HTML-like tags to express program functions enables Perl to be learned one function at a time while Web applications are being developed.

The Perl Programmer Next Door

Although "Perl Developer" is a title rarely seen in corporate America, many computer professionals have enough Perl programming knowledge to develop a Web site. Systems administrators, analysts, database administrators, and many other programmers use Perl to perform their jobs. These programmers might not consider themselves Perl experts, but they are likely to have the skills for developing for the Web using Perl.

Systems Administrators

Systems administrators, especially UNIX administrators, almost certainly have a working knowledge of Perl. The work of administering a complex UNIX system encourages administrators to develop customized tools that help them perform repetitive tasks more quickly and easily. In addition, the core UNIX philosophy of having many simple programs that do one thing and do it well creates the need for a glue language such as Perl. A glue language can combine groups of system programs to perform a task more complex than any of the programs could perform individually. Perl's ease of use and interpreted nature also enable it to be used in developing single-use applications that would be prohibitively costly to develop in a compiled language. As a result, administrators develop applications in Perl even when the applications are used only once, although an application might take on a life that is longer than the developer initially envisioned.
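A typical piece of glue runs an existing utility and post-processes its output. The sketch below, which assumes a UNIX-like system with ls on the PATH, reads a directory listing through a pipe and counts the files by extension; the extension-counting task is simply an example chosen for illustration.

```perl
#!/usr/bin/perl
# Glue-language sketch: run an existing UNIX utility (ls) and process its
# output in Perl. Assumes a UNIX-like system with ls on the PATH.
use strict;
use warnings;

# Count the files in a directory by extension, using ls for the listing.
sub count_by_extension {
    my ($dir) = @_;
    open my $ls, '-|', 'ls', '-1', $dir or die "Cannot run ls: $!";
    my %count;
    while (my $name = <$ls>) {
        chomp $name;
        my ($ext) = $name =~ /\.([^.]+)$/;     # text after the last dot
        $count{ defined $ext ? $ext : '(none)' }++;
    }
    close $ls or die "ls failed for $dir\n";   # close reports ls's exit status
    return \%count;
}

my $counts = count_by_extension(@ARGV ? $ARGV[0] : '.');
printf "%-10s %d\n", $_, $counts->{$_} for sort keys %$counts;
```

The list form of the pipe open passes arguments to ls directly rather than through a shell, so directory names containing spaces need no quoting.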

In addition, UNIX system administrators with no direct Perl knowledge know more about Perl than they realize. Many of Perl's built-in functions share their names and purposes with the utilities and system functions found on UNIX or Windows systems. Functions such as crypt, grep, localtime, and others are either work-alikes of UNIX functions or direct links to system functions. Other common Perl keywords, such as open, close, fork, and exec, refer to system functions that are familiar enough to enable any system programmer to grasp their meaning in Perl and to use them effectively with little additional training.
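The following sketch puts a few of these built-ins side by side: grep filtering a list much as grep(1) filters lines, gmtime (the UTC twin of localtime) returning the fields of C's struct tm, and fork and waitpid behaving like the C system calls. The passwd-style records are invented sample data, and the fork portion assumes a UNIX-like system.

```perl
#!/usr/bin/perl
# Perl built-ins that mirror familiar UNIX utilities and C system calls.
use strict;
use warnings;

$| = 1;   # flush STDOUT at once so the forked child cannot repeat buffered output

# grep: keep the list items that pass a test, much as grep(1) keeps lines.
# These passwd-style records are invented sample data.
my @accounts = ("root:x:0:0", "daemon:x:1:1", "alice:x:1000:1000");
my @system_accounts = grep { (split /:/)[2] < 1000 } @accounts;
print "System accounts: @system_accounts\n";

# gmtime returns the fields of C's struct tm: the year is offset
# from 1900 and the month counts from 0.
my ($sec, $min, $hour, $mday, $mon, $year) = gmtime(0);
printf "The epoch began on %04d-%02d-%02d\n", $year + 1900, $mon + 1, $mday;

# fork and waitpid behave like their C counterparts.
my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    exit 3;                          # child process: exit with status 3
}
waitpid($pid, 0);                    # parent: wait for the child to finish
my $child_status = $? >> 8;          # the exit code is in the high byte, as in C
print "Child exited with status $child_status\n";
```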

Perl regular expressions also should be familiar to the UNIX system administrator. Perl's regular expression syntax builds on the syntax used by awk, the common UNIX pattern-scanning and text-processing language, and by related tools such as sed and grep. Because regular expression syntax is both the most powerful and the most difficult part of Perl to learn, awk knowledge can give you a good head start on Perl programming. Awk also is likely to be used in conjunction with shell script programming, so a system administrator familiar with awk is likely to have used it in a context very similar to the way in which regular expressions are used in Perl.
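To make the correspondence concrete, here is a sketch of the awk command awk '/error/ { print $1, $NF }' rewritten in Perl. The log lines are invented sample data. Perl's split ' ' mimics awk's default field splitting, and the pattern test plays the role of awk's /pattern/ selector.

```perl
#!/usr/bin/perl
# Perl version of:  awk '/error/ { print $1, $NF }' logfile
# The sample log lines are invented for illustration.
use strict;
use warnings;

# Return "first-field last-field" for each line matching $pattern.
sub awk_like {
    my ($pattern, @lines) = @_;
    my @out;
    for my $line (@lines) {
        next unless $line =~ $pattern;   # awk: /pattern/ { ... }
        my @f = split ' ', $line;        # awk: whitespace-separated $1 .. $NF
        push @out, "$f[0] $f[-1]";       # awk: print $1, $NF
    }
    return @out;
}

my @log = (
    "10:01 info  started",
    "10:02 error disk full",
    "10:03 error retrying",
);
print "$_\n" for awk_like(qr/error/, @log);
```

Running it prints the timestamp and last word of the two error lines, just as the awk command would.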

In Web programming, a thorough knowledge of system administration can be a valuable asset. Web applications frequently interact with all levels of the systems on which they reside. These levels include database drivers, system files, or compiled programs. An understanding of the issues that can arise concerning Web server access to these systems–including file permissions, which are a common source of confusing error messages–can make the transition to Web application programming in Perl smoother.


Generally not regarded as systems programmers at all, Quality Assurance (QA) analysts, systems analysts, and data analysts are nonetheless likely to have developed Perl-related skills on the job for many of the same reasons as have systems administrators. Data compilation, batch processing, and system discovery processes are all made more accessible by using Perl programs at the system level.

QA analysts are likely to use Perl to design test harnesses, especially when performing long sets of custom tests against a specific program or system utility. Perl also enables sections of a program to be accessed directly through module interfaces. With such access, QA analysts can test database access, network interaction, or other secondary program functions that might not be directly reachable through the program's own user interface.
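A table-driven harness built on the core Test::More module is one common way to run such custom tests. In this minimal sketch, normalize_phone() is a hypothetical function under test, defined inline so the example runs on its own; in practice it would be loaded from the module being tested.

```perl
#!/usr/bin/perl
# Minimal table-driven test harness using the core Test::More module.
# normalize_phone() is hypothetical and defined inline for illustration;
# a real harness would load it from the module under test.
use strict;
use warnings;
use Test::More;

# Hypothetical function under test: keep only the digits.
sub normalize_phone {
    my ($raw) = @_;
    (my $digits = $raw) =~ s/\D//g;
    return $digits;
}

# Each case pairs an input with its expected output.
my @cases = (
    ['(555) 123-4567', '5551234567'],
    ['555.123.4567',   '5551234567'],
    ['',               ''],
);

for my $case (@cases) {
    my ($input, $expected) = @$case;
    is(normalize_phone($input), $expected, "normalize '$input'");
}

done_testing();
```

Adding a test means adding a row to @cases, which suits the long, evolving test sets described above.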

Systems analysts use Perl to discover information about a system and to compile and analyze data produced by the respective system. Perl's capability to interact with the server environment at all levels enables systems analysts to write programs that first determine whether parts of the environment are available and then test each available part individually to determine its capabilities. Perl also offers the rest of its facilities to system analysts who need to compile the information into text or database repositories, which then can be aggregated by additional Perl programs and presented in a usable form.
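The availability check can be sketched with core Perl alone. The example below reports whether named modules can be loaded (trapping require failures with eval, the same technique Listing 7.7 uses around each DBI driver) and whether named programs exist on the PATH. The module and program names it checks are examples only.

```perl
#!/usr/bin/perl
# System-discovery sketch: report which modules and programs are available
# before testing them further. The names checked below are examples only.
use strict;
use warnings;
use File::Spec;

# True if the module can be loaded; eval traps a failed require.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# True if an executable file with this name is in some PATH directory.
sub have_program {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return 1 if -f $candidate && -x _;   # _ reuses the stat from -f
    }
    return 0;
}

for my $module (qw(strict DBI CGI)) {
    printf "module  %-8s %s\n", $module, have_module($module) ? 'found' : 'missing';
}
for my $program (qw(perl ls)) {
    printf "program %-8s %s\n", $program, have_program($program) ? 'found' : 'missing';
}
```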

Data analysts, including those who perform numeric analysis or data mining on large sets of preexisting data, use Perl to facilitate the analysis of potentially unique data sets without requiring the extended programming that compiled applications would entail. These analysts also use Perl's glue language capabilities to connect directly to the databases and other data set formats that they need to analyze without having to translate into an intermediate format.
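As a small example of analysis without an intermediate format, the sketch below summarizes a numeric column straight from comma-separated text using the core List::Util module. The region,sales layout and the sample figures are invented for illustration; lines read from a file would work the same way, since Perl's numeric conversion ignores a trailing newline.

```perl
#!/usr/bin/perl
# Data-analysis sketch: summarize "region,sales" records read directly as
# text. The column layout and sample figures are invented for illustration.
use strict;
use warnings;
use List::Util qw(sum min max);

# Group the sales figures by region, then compute per-region statistics.
sub summarize {
    my (@lines) = @_;
    my %by_region;
    for my $line (@lines) {
        my ($region, $sales) = split /,/, $line;
        push @{ $by_region{$region} }, $sales;
    }
    my %summary;
    for my $region (keys %by_region) {
        my @values = @{ $by_region{$region} };
        $summary{$region} = {
            count => scalar @values,
            mean  => sum(@values) / @values,   # @values in numeric context is its size
            min   => min(@values),
            max   => max(@values),
        };
    }
    return \%summary;
}

my $report = summarize("east,120", "west,80", "east,100", "west,40");
for my $region (sort keys %$report) {
    my $s = $report->{$region};
    printf "%-5s n=%d mean=%.1f min=%d max=%d\n",
        $region, @{$s}{qw(count mean min max)};
}
```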

Any analyst is likely to have the necessary skills to develop Web applications in Perl, especially when the application involves the analysis and presentation of data derived from a large source. Analysts are likely to be familiar with programming in general (if not with Perl style specifically), and their familiarity with the systems and interfaces used by Perl makes the transition even easier.

Database Administrators

Database administrators, although they might never have interacted with Perl or Perl-like languages, are well-suited to Web application development in Perl. Databases are an essential part of Web application infrastructure; the data provided in most medium- or large-scale applications comes from a database through the Structured Query Language (SQL). Using SQL effectively requires a solid understanding of the underlying structure of the database, and database administrators are likely to have created the databases themselves.

Good database design lends itself to good Web application design because many of the principles are the same. Certainly, a well-designed database is essential to a high-performance Web application that frequently accesses the database to insert or retrieve data. SQL queries themselves can be either written poorly or optimized greatly, and a database administrator usually knows the most efficient way to retrieve data from the database. In addition, database administrators can alter the database itself to enhance the performance of Web applications, as is discussed in Chapter 14, "Database-Backed Web Sites."

Database administrators are likely to have programmed in a Perl-like fashion because of the need for shell scripting when designing a complex database schema. Because SQL is a programming language that makes immediate changes to a database, it's usually a good idea to create complex schema-creation queries in a more structured and repeatable fashion than most SQL interfaces allow. Database administrators create shell scripts or Perl programs to automate the process of testing complex queries in a way that enables them to be undone as a whole rather than in parts. They also might create Perl scripts to perform database maintenance on a regular basis.

Database administrators are also likely to have done procedural programming within the database itself through languages such as PL/SQL. PL/SQL is Oracle's procedural extension to SQL; it is used to write functions and stored procedures that run inside the Oracle database and can be called from other SQL queries. Although the languages provided by database manufacturers are not likely to be as robust as Perl, they bridge the gap between the data-centric nature of SQL and procedural programming enough to give database programmers a taste of what is possible.

The transition from database design to Web application development might be a leap for some database administrators, but in general, they are likely to take to the programming tasks of a database-backed Web site very quickly. The benefits of having a solid grounding in database concepts–and potentially an intimate knowledge of the database schema being accessed–make database administrators a valuable asset to any Web-application design team.

Thousands of Lines of Existing Web Code

The Web contains a large base of developed and tested Perl code. Perl programmers tend to be a gregarious bunch, so the first thing a new Perl hacker is likely to do with his five-line "Hello, world!" program is to share it with the nearest archive site for comments and corrections. This urge gets stronger as programmers become more prolific, so the end result is an ocean of Perl examples, many of them well designed, for almost any situation.

A central repository for the best of this code, called the Comprehensive Perl Archive Network (CPAN), was formed to give more structure to the process of sharing Perl modules and documentation. Modules are considered fit for public consumption after they're published on CPAN. Other types of code are too specialized or too unfinished to qualify for a CPAN listing. They reside on other sites that offer their own archives of Perl code to meet particular needs.

Catalog sites, such as the CGI Resource Index, are devoted to organizing and presenting Perl-based CGI or mod_perl software for use by others. There also are Perl-related news and discussion sites, such as PerlMonks, which include archives of Perl code that has been submitted for discussion. In addition, mainstream sites, such as Slashdot, use Perl as a Web application environment and sometimes post the code that operates the site.

CGI Resource Index

The CGI Resource Index is a Web site that collects links to CGI scripts and programs written in a number of languages, including Perl, Tcl, and C. As might be expected, though, over ninety percent of the CGI programs the site has collected are written in Perl.

The site has dozens of application categories, from auctions to image indexing. Site visitors rate each CGI program listed on the site, and the highest-rated programs in each category appear at the top of the list. Not all of the archived programs are well-written, but they serve as a good starting point when deciding how to approach a Perl-based Web application, regardless of whether it is represented in the archive.

It's interesting to note that not all the programs listed in the CGI Resource Index are free, as contrasted with program archives such as CPAN that assume all archived software is available for free download. Both approaches can be helpful when evaluating Perl programs to use as the basis for a Web application; commercial applications might fill in where public modules aren't available.


PerlMonks is a discussion and archive site devoted to the Perl programming community, with a specific emphasis on helping programmers improve their Perl skills by interacting with more advanced programmers.

The PerlMonks site has a code library on par with that of the CGI Resource Index, but its main attraction is the smaller snippets of code that abound in the development forums. Code might be offered for comments and criticism, or it might be offered as the answer to a query or the result of a challenge. Together, the combination of working code and expert commentary makes a search through the PerlMonks libraries a worthwhile experience.

PerlMonks also shines as an example of the cooperative spirit that pervades the Perl community. The most prominent names in Perl development are the same people that teach and inspire other Perl developers by offering help for beginners, tips for advanced users, and interesting challenges for Perl experts.

The Seekers of Perl Wisdom section, for instance, is a clearinghouse for questions regarding Perl style, technique, and experience. The site also contains sections for Perl poetry, meditations on life as a Perl hacker, and cool uses of Perl in the real world. The last of these is a good place to look for inspiration when starting a new project. PerlMonks also has a special place for the most infamous of Perl styles: obfuscation. The fascination with indecipherable Perl code has given Perl a reputation as a language only for the initiated. In truth, however, it's merely an expression of love for the language. It has little effect on mainstream use.


The programmers at Slashdot, perhaps the most visible of the high-traffic sites driven by Perl, have taken an unusual approach to the Perl programs they created over the years to handle the particular needs of their popular site. Instead of hiding their code or packaging it to sell to other sites in the same situation, they've taken the time to create an open-source distribution of the code, appropriately named Slash.

SlashCode, the site devoted to distributing and discussing the Slash code base, also serves as a working example of how to customize Slash for another site. Discussions of Slash applications, development, and distribution are carried out using the Slash modules and templates.

The SlashCode site also lists dozens of examples of other sites that have applied Slash to their own ends, from the Berkeley High School student newspaper to earthDot, an environmental advocacy news site. Each site gives visual clues to its Slash origins as well as new techniques to explore when adapting the Slash code base to a particular end.

Standard CGI Example

To get an idea of the simplicity that Perl CGI offers a new programmer, an example of Perl connecting to a database is included in the following section. This example is in the simple CGI style, but it is reiterated in Chapter 12, "Environments For Reducing Development Time," within a higher-performance, more sustainable design model.

Universal SQL Query Processor

Accessing a database might seem to be a straightforward process, and the advent of standards such as SQL might reinforce that notion. However, good tools to access databases are still few and far between, so most database users still browse databases through custom programs. Worse yet, many database users are forced to plod through a schema, one query at a time, by using a command-line interface and the formatting nightmare it presents.

The Web, however, provides an excellent venue for exploring and manipulating data. The flexibility of HTML formatting enables data to be presented attractively, regardless of whether the style is specific to the current schema or general to all SQL-accessible data.

Before designing a specific format for Web presentation of database content, you should browse the data in a universal format to get a sense of what is available. The most flexible way to do this is by offering a generic SQL portal to any database accessible by the Perl CGI engine.

Form Access with

Most Perl CGI programs begin with, the Perl module that handles Web server interaction through the CGI. provides a simple interface to headers, form variables, and other architectural elements that comprise the communications layer between a Web server and a CGI application. See Chapter 5, "Architecture-Based Performance Loss," for an architectural overview of CGI.

With Web programming in mind, CGI.pm offers methods for creating forms and other HTML structures, as well as methods for accessing form variables and modifying page attributes. As Listing 7.7 shows, HTML forms can be written entirely using method calls to the CGI object.

Listing 16.7 Database Access Through SQL (

001 #!/usr/bin/perl
003 #-----------------------------------------
004 # 
005 # - CGI application example
006 # 
007 #-----------------------------------------
009 # include libraries
010 require 5.6.0;
011 use strict;
012 use warnings;
013 use CGI;
014 use DBI;
016 # declare some variables
017 my ($q, $dbh, $sth, $query, $datasource, $user, $password, $error,
018     $field, $result, $results);
019 my (@datasources);
021 # initiate CGI parser object
022 $q = CGI->new;
024 # begin the page
025 print $q->header, 
026       $q->start_html('SQL Database Viewer'),
027       $q->h2('SQL Database Viewer');
029 # build a (safe) list of data sources
030 foreach (DBI->available_drivers)
031 {
032   eval {
033     foreach (DBI->data_sources($_))
034     {
035       push @datasources, $_;
036     }
037   };
038 }
041 # display the entry form
042 print $q->start_form;
044 print qq{<p>Choose a datasource:</p>\n};
045 print $q->popup_menu(-name => 'datasource', 
046                      -values => \@datasources);
048 print qq{<p>Specify username/password:</p>\n};
049 print $q->textfield(-name => 'user',
050                     -size => 10);
051 print $q->password_field(-name => 'password',
052                          -size => 10);
054 print qq{<p>Enter a SELECT query:</p>\n};
055 print $q->textarea(-name => 'query',
056                    -rows => '5',
057                    -cols => '40',
058                    -wrap => 'virtual');
060 print $q->p, $q->submit;
061 print $q->end_form;
063 # get form variables
064 $datasource = $q->param('datasource');
065 $user = $q->param('user');
066 $password = $q->param('password');
067 $query = $q->param('query');
069 # check form variables
070 if ($query)
071 {
072   $error = "Improper datasource specified" unless (defined $datasource and $datasource =~ /^dbi/i);
073   $error = "Query should start with SELECT" unless ($query =~ /^select/i);
074 }
076 # if a query is specified and form variables are OK,
077 if ($query and !$error)
078 {
079   # connect to the database
080   $dbh = DBI->connect($datasource, $user, $password)
081          or $error = "Connection failed: $DBI::errstr";
083   # if the database connection worked, send the query
084   unless ($error) 
085   {
086     $sth = $dbh->prepare($query)
087            or $error = "Query failed: $DBI::errstr";
088     $sth->execute or $error = "Query failed: $DBI::errstr" if $sth;
089   }
090 }
092 # if any errors are present, display the error and exit
093 if ($error) {print $q->p('Error: ' . $q->escapeHTML($error)), $q->end_html and exit;}
095 # if the query produced an output,
096 if ($query and $sth->{NAME})
097 {
098   # start a data table
099   print qq{<table border="1">\n};
100   print qq{<tr>\n};
102   # display the fields as table headers
103   foreach $field (@{$sth->{NAME}})
104   {
105     print qq{<th>}, $q->escapeHTML($field), qq{</th>\n};
106   }
107   print qq{</tr>\n};
109   # display the results in a table
110   while ($results = $sth->fetchrow_arrayref)
111   {
112     print qq{<tr>\n};
113     foreach $result (@$results)
114     {
115       print qq{<td>}, (defined $result ? $q->escapeHTML($result) : ''), qq{</td>\n};
116     }
117     print qq{</tr>\n};
118   }
120   # finish the data table
121   print qq{</table>\n};
122 }
124 # finish the page
125 print $q->end_html;
127 # disconnect from the database
128 $dbh->disconnect if $dbh;

After including the necessary libraries and declaring some variables, the program creates a new CGI query object, $q, in line 022. The $q object provides a unified interface to all CGI methods, including form creation and query variable access.

After setting up the environment, lines 030 and 033 call class methods on DBI to build a list of available data sources. Lines 041 through 061 then create a basic HTML form. The form has a drop-down box for choosing the DBI data source, text boxes for the username and password, a textarea for the SQL query, and a Submit button. (DBI access methods are covered in more detail in a later section, "Accessing a Database with DBI.pm.")

The program is designed to be accessed multiple times with different information. Forms generated by the program submit information back to it with more information added each time the user submits. This style of programming is common with CGI, but it is much less common in more modern Web programming styles because it leads to monolithic code bases as programs get larger. Large, single programs become difficult to develop in a Web environment, especially on the time scales that Web programming requires. For this example, however, the multiple-access style is useful in keeping the program portable and compact.

Checking Variables with Regular Expressions

Form variables returned by the browser should almost always be checked for validity. The only time they shouldn't be checked is when any possible input is acceptable. The $user and $password variables are examples: they can be blank or contain any characters. No matter what controls are added to the HTML form itself, the HTTP request can carry any data as form variables. Most of the time, it is enough to check that form variables are not empty and that they do not contain data that will crash the application when they are used.

In the case of a database browser, you should restrict the possible SQL queries to those that won't alter the content of the database. To do this, line 073 uses Perl's regular expression engine to check the $query form variable provided by the $q object. The simple regular expression /^select/ returns a true value only if the query starts with the keyword SELECT. The i modifier relaxes the restriction so that both uppercase and lowercase versions of the keyword are accepted.
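A check anchored only at the start of the string is easy to get around. A query such as "select 1; delete from orders" passes it, and some database drivers will run both statements. The sketch below uses a hypothetical helper named is_select_query to tighten the test. It allows leading whitespace, requires SELECT as a whole word, and rejects any semicolon other than a single trailing terminator. It is still a coarse filter rather than a security boundary. In some databases a SELECT can call functions with side effects, so connecting through a read-only database account remains the sturdier safeguard.

```perl
use strict;
use warnings;

# Hypothetical stricter replacement for the /^select/i test.
# Returns 1 if $query looks like a single SELECT statement, else 0.
sub is_select_query {
    my ($query) = @_;
    return 0 unless defined $query;
    (my $sql = $query) =~ s/;\s*\z//;        # tolerate one trailing semicolon
    return 0 if $sql =~ /;/;                 # but not a second statement
    return $sql =~ /\A\s*select\b/i ? 1 : 0; # must begin with the word SELECT
}
```

The trade-off is that a legitimate query with a semicolon inside a quoted string is also rejected.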

Similar checking is performed on the data source in line 072. It gracefully catches obviously malformed data sources, which might produce an uglier error if passed straight to the database connect method. This kind of checking matters because a fatal error could otherwise keep the page from displaying correctly, or at all.

Accessing a Database with DBI.pm

In this example, any database that is visible to Perl is available to the query processor. The module that provides this functionality is DBI.pm. It unites a number of secondary database drivers (called DBD modules) under a common object structure and method syntax. DBI is a boon to Web application programmers because the same database access code works with many databases with little or no modification, although SQL dialects still differ between systems. This comes in handy when you develop a prototype on an inexpensive database system, such as PostgreSQL, and then move the resulting programs to a production database such as Oracle.

Because DBI offers connections to many databases, the query processor needs a way to indicate which database and data source the query is intended to access. DBI provides convenience methods for enumerating available database drivers and data sources. Lines 030 and 033 use the available_drivers and data_sources methods to build the list of possible data sources. An eval block catches errors from the data_sources method. This is necessary because DBI ships with several drivers of its own, and not every installed driver can be loaded or configured on every system running this program. DBI accepts these data sources as connection strings in the same form in which data_sources returns them, so the sources can be listed without any translation or parsing.

Once a data source is chosen and a query is submitted, lines 079 to 089 pass the query to the specified data source. Lines 098 to 121 then format the output as generically as possible. This relies on the NAME attribute of the statement handle $sth, tested in line 096, which references a list of the field names returned by the query. Lines 103 to 106 list these names as headings in an HTML table. This gives an understandable view of the results without knowing their size or contents in advance.
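Printing field values straight into the table causes two problems. A value containing < or & can break the page layout, and NULL columns, which DBI returns as undef, trigger warnings under use warnings. The following sketch isolates the table-building step in a hypothetical html_table function. It takes a list of field names (shaped like $sth->{NAME}) and a list of row array references, and escapes every cell. It is an illustration under those assumptions, not code from the original program.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;    # NULL columns arrive as undef
    $text =~ s/&/&amp;/g;              # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build an HTML table from field names and row array references.
sub html_table {
    my ($names, $rows) = @_;
    my $html = qq{<table border="1">\n<tr>};
    $html .= '<th>' . escape_html($_) . '</th>' for @$names;
    $html .= "</tr>\n";
    for my $row (@$rows) {
        $html .= '<tr>';
        $html .= '<td>' . escape_html($_) . '</td>' for @$row;
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}
```

When collecting rows from DBI for a function like this, copy each one (for example, push @rows, [@$row]). fetchrow_arrayref reuses the same array reference on every fetch.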

Error Handling

Because something can always go wrong, it is important to check for potential errors and declare them in a format that the Web server and browser understand.

One way to catch error messages is by redirecting them to the Web browser as they appear, perhaps even setting them apart in red text. This method is commonly seen in ASP-style applications when a particular snippet of code fails to run or returns an error. Unfortunately, the error messages produced by Perl are of more interest to the programmer of the application than the user. Thus, you should treat common or foreseeable errors with a little more charm.

Perl errors are produced in plain text and don't necessarily require the program to halt, so common errors can be found and restated in a way that is more understandable to the program's user. For example, a query that is rejected by the specified data source might give a cryptic error message and a useless line number if allowed to halt the program on its own. Catching the error and presenting it in plain English can greatly improve the usability of a program.
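In Perl, errors are usually caught instead of being allowed to halt the program by wrapping the risky code in an eval block. The following minimal sketch uses a hypothetical run_safely helper. It runs a piece of code, traps any fatal error, strips Perl's " at FILE line N." suffix (which means nothing to a user), and returns a plainer message. The wording of the friendly message is an assumption. A real application might also map known database errors to specific advice.

```perl
use strict;
use warnings;

# Run $code inside eval. Returns (1, undef) on success, or (0, $message)
# with a user-facing message if the code died.
sub run_safely {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return (1, undef) if $ok;
    my $raw = $@ || 'unknown error';
    # Drop Perl's " at FILE line N." suffix (plus any ", <FH> line N" part).
    $raw =~ s/ at \S+ line \d+(?:, <\w+> (?:line|chunk) \d+)?\.\n?\z//;
    chomp $raw;
    return (0, "Sorry, the request could not be completed: $raw");
}
```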

It's also important to give the user a chance to correct any typos or other errors without having to type the entire query again. This is where the form creation features of CGI.pm come in handy. The data source and query values the user entered are likely to be present as form variables. When they are, the CGI query object inserts those values into the form's input fields as it creates them, as illustrated in Figure 7.1. The form creation methods don't have to be modified for this case, which reduces both duplication of effort and the complexity of the program.


Figure 7.1

Script output and prefilled fields

System Administration Example

Perl's unrivaled systems administration facilities make it a natural for back-end Web tasks that would otherwise be done by hand (or not at all). Some tasks are usually performed at the command line, such as regular searches on a database or continuous monitoring of a network-accessible resource. These lend themselves to Perl-based automation and, eventually, to a Web application interface for the same functions.

Downtime Logging and Notification

Immediate notification of a server failure can be valuable, especially when the failure occurs after hours and would ordinarily go unnoticed. For public Web services and e-commerce sites, quick notification can mean the difference between a temporary blip and a major outage. Web users do retry a site a few minutes after an initial setback, but few keep trying to reach an inaccessible site for much longer than that. An outage of even half an hour can lose significant business. Yet a Web failure might go unnoticed by site staff for hours, or, in the case of a late-night Saturday fault, for days. Checking the availability of a Web site continuously would be an impossible chore if done manually; a Web developer's time is better spent elsewhere.

Fortunately, Web servers can check on other Web servers and notify staff of inaccessible pages and server faults no matter what time of day the outage occurs. A simple Perl application can perform this job admirably when run as a time-triggered independent process, such as one started every few minutes by cron. Additional notification by e-mail, cell phone, or pager can be added without much fuss, as shown in Listing 7.8.

A log of downtime can come in handy during later analysis as well. For example, uptime as a percentage of total time can be used to evaluate a server for replacement or upgrades. Customers of a Web-based service might also ask for certification of uptime performance.

Listing 16.8 Server Monitor Through LWP (

#!/usr/bin/perl
#-----------------------------------------
# 
# - LWP monitor example
#
#-----------------------------------------
# include libraries
use 5.6.0;
use warnings;
use strict;
use LWP::UserAgent;
use HTTP::Request::Common;
use Net::SMTP;
# set up variables
my ($server_url);
my ($ua, $result);
# set environment: the URL to check is the first argument
$server_url = $ARGV[0] or die "usage: $0 URL\n";
# check the URL
$ua = LWP::UserAgent->new;
$result = $ua->request(GET $server_url);
# was the check successful?
# if so, write the current time to the uptime log
if ($result->is_success)
{
  # not so fast! check the page body for the string "success"
  if ($result->content =~ /success/)
  {
    my $time = localtime;
    open (LOG, '>>/tmp/server-monitor.log') or die "Log file: $!";
    print LOG "$time - 200 - $server_url\n";
    close LOG;
  }
  else
  {
    # call the page fail subroutine with an error code
    page_failed(500, $server_url);
  }
}
# if not, e-mail a notification and log the failure
else
{
  # call the page fail subroutine with the result code
  page_failed($result->code, $server_url);
}
# e-mail a notification and log the failure
sub page_failed 
{
  my $error = shift;
  my $url = shift;
  my $time = localtime;
  # send an e-mail notification; the empty strings stand for the
  # SMTP server, sender, and recipient addresses
  my $smtp = Net::SMTP->new('');
  if ($smtp)
  {
    $smtp->mail('');
    $smtp->to('');
    $smtp->data();
    $smtp->datasend("To: recipient\n");
    $smtp->datasend("Subject: $url not responding\n");
    $smtp->datasend("\n");
    $smtp->datasend("The page at $url is not responding.\n");
    $smtp->datasend("Please check it.\n");
    $smtp->dataend();
    $smtp->quit;
  }
  else
  {
    # keep going so the failure is still logged
    warn "Could not connect to the SMTP server\n";
  }
  # send a text message by posting to the SMS gateway's form action
  $result = $ua->request(POST '',
                         [mobilenum   => '8885551212',
                          callbacknum => '8885551234',
                          message     => 'Server down.']);
  # write a line to the downtime log
  open (LOG, '>>/tmp/server-monitor.log') or die "Log file: $!";
  print LOG "$time - $error - $url\n";
  close LOG;
}

Accessing a URL with LWP

At its core, the downtime logging application simply checks a server URL the way any browser would, by accessing it over the Web and evaluating the result. This is made possible by LWP, a module that provides HTTP, HTTPS, and FTP interfaces to Perl programs. LWP is actually a collection of individual modules that access servers, wait for a response, and parse the result. All of them are offered through a high-level interface that masks many of the individual parts.

There are many ways to use LWP, but the simplest is to combine the HTTP::Request::Common module with the LWP::UserAgent module. HTTP::Request::Common exports short functions such as GET and POST that build complete HTTP::Request objects. These functions fill in sensible defaults, such as the encoding of POST form data, so those details don't have to be defined explicitly. The LWP::UserAgent object's request method then sends the request and returns an HTTP::Response object.

The server monitor uses LWP to access the page as a browser would to determine whether the page is accessible. If not, the error reported by LWP (usually an HTTP error code such as 404 for "Not Found" or 500 for "Internal Server Error") is passed to a subroutine, which logs the error and notifies site personnel that the site is inaccessible.

Finding Success with Regular Expressions

A nonerror result code doesn't necessarily mean that the Web site is working, however. Databases and other secondary systems sometimes produce human-readable errors that don't catch the attention of LWP. These errors are rendered as ordinary HTML, much as the SQL query processor discussed earlier reports its errors in the page. As a result, they do not affect the overall status code of the document returned to LWP. To test that secondary systems are available and responding, it's a good idea to write a small server-side application that returns a trivial string (such as "success") if and only if the system is accessible and functioning normally.

Finding the desired output within the HTML page is an ideal use of regular expressions. Perl can very quickly search through a large string, such as the HTML results of a test page, for text that matches a specified pattern. In this case, the pattern is the simple string /success/, which returns a true value if that string occurs anywhere in the returned HTML document. Note that this includes occurrences inside longer words or tags, so be sure to use a string that is unique to a successful document. The pattern /ok/, for instance, would be a bad choice because it matches inside 'broken' and 'token'. A pattern such as /database success/ is much less likely to appear by accident.
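A few lines of code are enough to show the difference between a loose and a careful success check. This sketch uses a hypothetical page_reports_success function that looks for a marker phrase as whole words. The \b anchors keep the marker from matching inside longer words, and \Q...\E makes Perl treat the marker as literal text rather than as a pattern. The marker text itself is assumed to be whatever the server-side test page is written to print.

```perl
use strict;
use warnings;

# Returns 1 if $html contains $marker as a whole-word phrase, else 0.
# \Q...\E quotes any regex metacharacters in the marker, and the \b
# anchors stop it from matching inside longer words such as "broken".
sub page_reports_success {
    my ($html, $marker) = @_;
    return 0 unless defined $html;
    return $html =~ /\b\Q$marker\E\b/ ? 1 : 0;
}
```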

Regular expressions also can be used to check arbitrary data with a finer grain than success or failure. For example, the update time on a page could be checked against the current time to make sure the page was updated recently. In addition, an aggregate query from a database could be checked to make sure the total always increases with time. Regular expressions can be used to find almost anything in an output page, and the resulting values can be used in a conditional, as shown in the following code:

Listing 16.

my ($total) = $result->content =~ m{<b>Total: (\d+)</b>};
if (!defined $total or $total <= 0) {
  page_failed(500, $server_url);
}

This regular expression captures the series of digits that follows the word Total in a bold string. The conditional statement then checks the total. The perlre section of the Perl documentation contains an in-depth explanation of regular expressions and their syntax.
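The same technique extends to the check described earlier: making sure an aggregate total always increases with time. The following sketch rests on several assumptions:

- The function names extract_total and total_increased are hypothetical.
- The page prints its total as <b>Total: N</b>.
- The previous total is stored between runs somewhere the monitor can read it (not shown).

Using m{} as the delimiter avoids having to escape the slash in </b>.

```perl
use strict;
use warnings;

# Pull the number out of a "<b>Total: N</b>" fragment; undef if absent.
sub extract_total {
    my ($html) = @_;
    my ($total) = $html =~ m{<b>Total: (\d+)</b>};
    return $total;
}

# Returns 1 only if a total was found and it exceeds the previous one.
sub total_increased {
    my ($html, $previous) = @_;
    my $total = extract_total($html);
    return (defined $total and $total > $previous) ? 1 : 0;
}
```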

Notification Through Email or Instant Message

At run time, the most important function of this program is notification. Instant notification of site outages can be crucial, so it's important to notify in multiple ways simultaneously, such as through e-mail and an SMS message to a phone.

Notifying an email address is relatively simple. Net::SMTP is part of the libnet distribution, which has been included with the standard distribution of Perl since version 5.8. It is implemented entirely in Perl, so it relies only on basic network services and does not require Sendmail or an equivalent mail server on the same machine. This matters for both portability and performance, as explained in detail in Chapter 11, "Problems with Persistence."

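In this example, Net::SMTP sends a short message saying that the page is not responding. The program creates a new $smtp object and sets the sender and recipient with the mail() and to() methods. It composes and sends the message with data(), datasend(), and dataend(), and quit() closes the connection.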

Adding the ability to send a text message to a cell phone is made easy by the LWP module and a Web-based SMS gateway (see Figure 7.2). The example uses a gateway supplied by Sprint for its PCS customers, but other cellular carriers provide similar services and the corresponding URL could be inserted in place of the Sprint site. When doing so, it is usually necessary to find a Web-enabled form and determine the necessary form variables and destination, as in Listing 7.9.


Figure 7.2

Sprint SMS gateway form

Listing 16.9 Excerpt of Sprint SMS Gateway Form

<form name="frm" method="post" action="check_message_syntax.html">
<b>Recipient Sprint PCS Number<br></b>
<input name="mobilenum" size="10" maxlength="10" type="TEXT">
<b>Callback Number<br></b>
<input name="callbacknum" size="10" maxlength="10" type="TEXT">
<b>Message<br></b>
<textarea wrap="VIRTUAL" name="message" cols="21" rows="5" onfocus="count_text(this.value)" onchange="count_text2(this.form)" onblur="timer_stop()"></textarea>
<b>100 characters maximum<br></b>
<b>Email Address for Confirmation<br></b>
<input name="ack_add" size="14" type="TEXT">(optional)
<input type="image" src="/images/SendButton3.gif" width="40"
 height="22" border="0">
</form>

In this case, the important pieces of information to find are the form action and the form variables. The form action is used as the destination URL for the LWP query. The form variables are combined with the phone number and message to make up the query string. In Listing 7.9, the form action is check_message_syntax.html, which can be fully qualified using the base URL of the form page. The form variables are mobilenum, callbacknum, and message, which are fed to the LWP user agent with their appropriate values.

When posting form variables to a Web page, you must encode the variables so that disallowed characters are escaped as standard sequences of allowed characters. For example, the message string "The server is not responding" must be reformatted as "The+server+is+not+responding" before it is sent to the Web server. LWP escapes form variables automatically. In some cases, however, you must escape the values manually, such as when sending a Location: header that redirects a user to a URL containing a query string.
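LWP handles this encoding for you, but the rule itself is short enough to sketch. The hypothetical form_escape and form_encode functions below follow the application/x-www-form-urlencoded convention. Spaces become +, and any character outside letters, digits, and - _ . ~ becomes a %XX escape. The sketch assumes byte strings. Text with characters beyond Latin-1 should first be encoded to UTF-8, for example with the core Encode module.

```perl
use strict;
use warnings;

# Encode one value in application/x-www-form-urlencoded style.
sub form_escape {
    my ($value) = @_;
    $value = '' unless defined $value;
    # Percent-encode everything except unreserved characters and spaces...
    $value =~ s/([^A-Za-z0-9\-_.~ ])/sprintf('%%%02X', ord $1)/ge;
    # ...then turn the spaces into plus signs.
    $value =~ tr/ /+/;
    return $value;
}

# Join name/value pairs into a query string: name=value&name=value.
sub form_encode {
    my @pairs = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @parts, form_escape($name) . '=' . form_escape($value);
    }
    return join '&', @parts;
}
```

For real programs, the URI::Escape module's uri_escape function is a well-tested alternative, although it encodes spaces as %20 rather than +.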

Logging Results

In this application, the results are logged in a detailed format, but a more abbreviated form could be made available if necessary. The choice could be gathered from command-line options and defaulted to the abbreviated log format. This choice, along with log file locations and other parameters, could just as easily come from a preferences file. Preferences could be read and parsed at startup and stored for later use by the program.
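The following sketch shows one way to read the log format from the command line, using the core Getopt::Long module. The option names --log-format and --log-file, the default log path, and the read_options helper are all hypothetical. The default of abbreviated follows the suggestion above.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse options from a list of arguments; returns the option hash
# reference followed by any leftover arguments (such as the URL).
sub read_options {
    my @args = @_;
    my %opt = (
        'log-format' => 'abbreviated',            # default format
        'log-file'   => '/tmp/server-monitor.log',
    );
    GetOptionsFromArray(\@args, \%opt, 'log-format=s', 'log-file=s')
        or die "usage: $0 [--log-format=detailed|abbreviated] [--log-file=PATH] URL\n";
    die "unknown log format: $opt{'log-format'}\n"
        unless $opt{'log-format'} =~ /\A(?:detailed|abbreviated)\z/;
    return (\%opt, @args);
}
```

A monitor would call read_options(@ARGV) at startup. The same hash could just as easily be filled from a preferences file before the command-line options override it.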

A detailed log entails writing the time and status of each attempted connection to a log file, which then can be analyzed at a later date to determine the relative percentage of uptime. This might be necessary to provide accountability when certifying system uptime to a third party. An additional benefit to a continuous log is the possibility of checking the uptime monitor itself by making sure there are no gaps in the recorded log.
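To check the monitor itself for gaps, convert each entry's timestamp to seconds and compare neighboring entries. The following sketch rests on these assumptions:

- Each line begins with a localtime() string such as "Mon Dec 11 19:21:57 2000", as in the log written by Listing 7.8.
- The helper names and the maximum allowed gap in seconds are hypothetical.

The sketch uses the core Time::Local module. Treating the timestamps as UTC keeps the arithmetic independent of the machine's time zone setting. A gap that spans a daylight-saving change would therefore be off by an hour.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %month_num = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
                 Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Seconds for the timestamp at the start of a log line; undef if none.
# " +" allows for the extra space localtime() puts before 1-digit days.
sub line_seconds {
    my ($line) = @_;
    my ($mon, $day, $h, $m, $s, $year) =
        $line =~ /^\w{3} (\w{3}) +(\d+) (\d\d):(\d\d):(\d\d) (\d{4})/
        or return;
    return timegm($s, $m, $h, $day, $month_num{$mon}, $year);
}

# Return [earlier_line, later_line] pairs more than $max_gap seconds apart.
sub find_gaps {
    my ($max_gap, @lines) = @_;
    my (@gaps, $prev_time, $prev_line);
    for my $line (@lines) {
        my $time = line_seconds($line);
        next unless defined $time;
        push @gaps, [$prev_line, $line]
            if defined $prev_time and $time - $prev_time > $max_gap;
        ($prev_time, $prev_line) = ($time, $line);
    }
    return @gaps;
}
```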

The abbreviated log format involves overwriting a file with the most recent successful connection and keeping a separate file with the times and circumstances of failed connections. One reason to use the abbreviated format is to save disk space, on the assumption that downtime is much rarer than uptime. With a connection check every minute, for instance, a 30-day month produces 43,200 log lines per URL. At roughly 120 bytes per line, the detailed log would gain about five megabytes per URL per month. With 99 percent uptime overall, the abbreviated log would record only the 432 failed checks, about fifty kilobytes per URL per month. Another compelling reason to use the abbreviated log is to reduce the load on log analyzers that need details on failed requests but only a summary of successful requests.
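The abbreviated format needs only two file operations: overwrite one file on success and append to another on failure. In this sketch, the function name log_abbreviated and the file names last-success.log and failures.log are hypothetical. The line format matches the one written by Listing 7.8.

```perl
use strict;
use warnings;

# Record one check result in abbreviated form under directory $dir.
sub log_abbreviated {
    my ($dir, $time, $code, $url) = @_;
    my ($file, $mode) = $code == 200
        ? ("$dir/last-success.log", '>')    # overwrite: keep only the latest
        : ("$dir/failures.log",     '>>');  # append: keep every failure
    open(my $fh, $mode, $file) or die "$file: $!";
    print $fh "$time - $code - $url\n";
    close $fh or die "$file: $!";
}
```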

Either way, a server monitor produces lots of good data for analysis, as well as the immediate notification that makes it invaluable.

Text-Processing Example

Producing reams of useful data has a downside. Analyzing text-based logs and reports has generally required the work of a dedicated C program or a patient UNIX shell script writer. Perl, however, was written specifically to solve problems like these with a minimum of effort.

Web server logs are the most readily accessible example of this idea. A Web server generates one line in an access log for every page, image, or script accessed by any user at any time. It takes only a short time for a Web server log to present a challenge that is both daunting and tempting. Many other logs and data sources create similar challenges, which usually go unmet due to time and money constraints.

Downtime Log Analysis

A basic analysis of the logs produced by the server monitor might involve a simple calculation of uptime as a percentage of total time logged. This is the calculation that's usually referred to when servers and applications boast of a 99.99 percent uptime.
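The calculation itself is a simple ratio, and it is the same one Listing 7.10 performs. The worked figure below assumes a 30-day month and shows how little downtime a 99.99 percent claim allows.

```latex
\begin{aligned}
\text{uptime (\%)} &= \frac{\text{successful checks}}{\text{total checks}} \times 100 \\
\text{allowed downtime at } 99.99\% &= (1 - 0.9999) \times (30 \times 24 \times 60)\ \text{min} \\
&= 0.0001 \times 43{,}200\ \text{min} \approx 4.3\ \text{min per month}
\end{aligned}
```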

Additional levels of analysis can be performed on the same data, including plots of errors as a function of time or a listing of sites with the most errors in a given time period. The opportunities are as rich as the underlying data; this emphasizes the importance of a rapid development environment in which to try many different data models as soon as they are imagined.

Listing 16.

Mon Dec 11 19:21:57 2000 - 200 -
Mon Dec 11 19:24:40 2000 - 200 -
Mon Dec 11 19:25:12 2000 - 500 -
Mon Dec 11 19:28:34 2000 - 404 -
Mon Dec 11 19:28:46 2000 - 200 -
Mon Dec 11 19:29:17 2000 - 500 -
Mon Dec 11 19:40:49 2000 - 500 -

The log that this program reads is formatted with a date, an error code, and a site URL, as in the preceding example. The format is simple, but it's easy to see how thousands of lines of the same type of data could obscure any meaning the data could potentially provide. The log analyzer in Listing 7.10 provides a simple example of how to read the file a line at a time while building a statistical compilation of the data contained therein.

Listing 16.10 Log Analyzer with HTML Output

#!/usr/bin/perl
#-----------------------------------------
# 
# - text analysis example
# 
#-----------------------------------------
# include libraries
use 5.6.0;
use warnings;
use strict;
# initialize a few variables
my %site;
my $latest;
my $log_file = $ARGV[0] or die "usage: $0 LOGFILE\n";
# open the specified log file
open (LOG, '<', $log_file) or die "Log file: $!";
# check each line
while (my $line = <LOG>)
{
  # extract the month, result code and url
  my ($month, $code, $url) = 
     $line =~ /^... (...) .+?- (\d\d\d) - (.+?)$/;
  # skip blank or malformed lines
  next unless defined $url;
  ($latest) = $line =~ /^(.+?) - /;
  # add one to the appropriate site, result, and date hashes
  $site{$url}->{total}++;
  $site{$url}->{result}->{$code}->{total}++;
  $site{$url}->{result}->{$code}->{date}->{$month}->{total}++;
}
close LOG;
# display the collected results by site
print "<html>\n";
print "<h2>Log Analysis</h2>\n";
print "<h3>Site totals:</h3>\n";
foreach my $url (sort keys %site)
{
  # fall back to 1 so the percentage below never divides by zero
  my $total = $site{$url}->{total} || 1;
  print "<p><b>$url</b>: $total monitor request(s)\n";
  print "<ul>\n";
  # display the site results by code
  foreach my $code (sort keys %{$site{$url}->{result}})
  {
    my $total = $site{$url}->{result}->{$code}->{total};
    print "<li><b>$code</b>: $total monitor request(s)</li>\n";
    print "<ul>\n";
    # display the results by date
    foreach my $month 
       (sort keys %{$site{$url}->{result}->{$code}->{date}})
    {
      my $total = 
         $site{$url}->{result}->{$code}->{date}->{$month}->{total};
      print "<li><b>$month</b>: $total monitor request(s)</li>\n";
    }
    print "</ul>\n";
  }
  print "</ul>\n";
  # determine percent uptime
  my $successes = $site{$url}->{result}->{200}->{total} || 0;
  my $uptime = sprintf("%2.2f", $successes / $total * 100);
  print "Percent uptime: <b>$uptime</b></p>\n\n";
}
# write the summary results to a summary file
my $summary_file = "/tmp/log_analysis_summary.txt";
open (SUMMARY, '>', $summary_file) or die "Summary: $!";
# write the summary data
foreach my $url (sort keys %site)
{
  foreach my $code (sort keys %{$site{$url}->{result}})
  {
    foreach my $month 
      (sort keys %{$site{$url}->{result}->{$code}->{date}})
    {
      my $total = 
         $site{$url}->{result}->{$code}->{date}->{$month}->{total};
      print SUMMARY "$url - $code - $month - $total\n";
    }
  }
}
# write the latest time summarized
print SUMMARY "Latest: $latest\n" if defined $latest;
close SUMMARY;
print "</html>\n";

Line-By-Line Parsing

Perl provides a basic interface to line-by-line parsing of text files, which works well in a case like this. As the log file is read, only one line is kept in memory at a time, so files can be analyzed even when they are much larger than available memory. Because only counts are stored, the resulting data structure grows with the number of distinct sites, result codes, and months, not with the number of lines in the log.

Building a Results Hash

As lines are processed, a regular expression extracts the relevant data into more understandable chunks, namely $month, $code, and $url. These values determine which counters to increment. The nested hashes that hold the counters are created automatically the first time data is stored in them, a Perl feature called autovivification. After aggregation, this single data structure holds a summary of the entire log file, no matter what individual dates, codes, or URLs are present. Because the data itself determines the structure of its summary, the program can be applied to more varied situations without a rewrite.
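Autovivification is easiest to see in isolation. The following is a condensed sketch of the counting loop in Listing 7.10. The tally function is hypothetical, and the example.com URLs in the test data are made up. It starts with an empty hash and lets each increment create whatever nested levels it needs.

```perl
use strict;
use warnings;

# Count log lines by URL, result code, and month. No level of %site is
# created in advance; each ++ autovivifies the hashes it walks through.
sub tally {
    my (@lines) = @_;
    my %site;
    for my $line (@lines) {
        my ($month, $code, $url) =
            $line =~ /^\w{3} (\w{3}) .+? - (\d{3}) - (.+?)$/ or next;
        $site{$url}{total}++;
        $site{$url}{result}{$code}{total}++;
        $site{$url}{result}{$code}{date}{$month}{total}++;
    }
    return \%site;
}
```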

Writing Results as HTML

Printing the results of the aggregated data in a readable format is as easy as producing an HTML list (see Figure 7.3). The lists are nested in the same fashion as the data structure, so a simple set of foreach loops can be used to produce the HTML lists of lists. If a graphic interpretation of the data is desired, you can create images with data points specified by the Perl data structure's values.


Figure 7.3

Log analysis summary in HTML form

Logging Results Summaries

In addition to the human-readable results produced by the log analyzer, you should produce aggregate results that the program can reuse. Summarizing the results of a long log file and using them as a base for further summaries saves time when analyzing subsequent logs. This time saving is especially important when processing Web access or error logs. Such logs can be too large to process in a single run and might be processed repeatedly over the course of months or years.

When recording summaries for use by the log analyzer, you should write the summary logs in a format that matches the internal data structures as closely as possible. The log analyzer separates results by result code, URL, and month, and keeps a simple count of the monitor requests recorded for each combination. Thus, you should record a separate line in the summary log for each combination of the categories.

Note, however, that aggregate data does not need to be recorded if it can be easily arrived at by manipulating the other recorded data. Functions such as the uptime percentage can be generated on the fly, regardless of whether the data comes from a summary log file or the original logs.

The summary log format used by the server log analyzer would produce output as follows:

Listing 16.
 - 200 - Dec - 1
 - 200 - Dec - 1
 - 404 - Dec - 1
 - 500 - Dec - 1
 - 200 - Dec - 14
 - 500 - Dec - 5
 - 500 - Dec - 3
Latest: Mon Dec 11 20:05:01 2000

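To process summary log data, read the summary log before any other log data. Treat it as though it were the original log, but add the listed total for each line instead of counting one. The Latest line records the last time already summarized, so entries up to that time in the original log need not be counted again.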
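The following minimal sketch of that merge assumes the "URL - CODE - MONTH - TOTAL" lines and the closing Latest: line written by Listing 7.10. The hypothetical merge_summary function adds each listed total into the same nested hash the analyzer builds. It also returns the Latest timestamp so that later processing can decide where to resume.

```perl
use strict;
use warnings;

# Fold summary lines into the analyzer's nested hash, adding each
# listed TOTAL instead of 1. Returns the "Latest:" timestamp, if any.
sub merge_summary {
    my ($site, @lines) = @_;
    my $latest;
    for my $line (@lines) {
        if ($line =~ /^Latest: (.+?)\s*$/) { $latest = $1; next; }
        my ($url, $code, $month, $total) =
            $line =~ /^(.+?) - (\d{3}) - (\w{3}) - (\d+)\s*$/ or next;
        $site->{$url}{total}                          += $total;
        $site->{$url}{result}{$code}{total}           += $total;
        $site->{$url}{result}{$code}{date}{$month}{total} += $total;
    }
    return $latest;
}
```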


Perl presents a world of opportunities to Web developers, both in terms of the wealth of programming experience in the Perl community and the sheer number of tools available for the Perl programmer. By digging a little deeper than the buzzwords and headlines, it's possible to find a rich Perl culture that encourages growth and new solutions. Perl is an excellent choice for Web development, and it will remain so for a long time to come.


Page last updated: 15 August 2001