Having done some work developing web applications I was struck by how complex they can be compared to other kinds of software. In particular, I think that using so-called middleware to connect the web server to the database is particularly awkward. In the real world we try to cut out the “middlemen” whenever we can, so why not in our software designs too?

There are few programming languages more powerful or scalable than Structured Query Language (SQL), especially in implementations as advanced as that found in PostgreSQL. Surely, harnessing that power to create the web application which presents the underlying database to the world wide web would be the logical thing to do. Initially, a CGI script was used to connect the Apache web server directly to the database backends. This simple shell script performs the task.


#!/bin/bash

HOST=server
DATABASE=database
SYSTEM="$DATABASE@$HOST"

ADDR=${REMOTE_ADDR:-$1}
VARS=${HTTP_COOKIE:-$2}
ARGS=${QUERY_STRING:-$3}
USER=${HTTP_USER_AGENT:-$4}

read -t 5 -n ${CONTENT_LENGTH:-0} DATA

PSQL="/usr/bin/psql --quiet --pset tuples_only --pset format=unaligned"

FN="CGI('$SYSTEM','$ADDR','$VARS','$ARGS','$DATA','$USER')"

$PSQL --host $HOST --command "SELECT $FN" $DATABASE


This script gathers a few HTTP variables from the environment (or command line) and directly calls a PL/PGSQL function with them. It is gloriously simple, but not very fast. It is not very fast because it has to establish a fresh session for every page request and in PostgreSQL, that means recompiling the functions and replanning the associated queries for each request.

A much faster solution was developed using the Apache 1.3 module API. An Apache module can establish persistent connections with the PostgreSQL databases driving the web application and the database backends. It takes a little more setting up and uses a few more resources than a CGI script, but the results are definitely worth it. Now the server can deliver millions of different dynamic pages at least as fast as if they were stored as static files in an advanced file system.

Download the source code for the module from mod_libpq.c. The source file contains copious notes about installing and using the module. This documentation is also reproduced here for the reader’s convenience. It has been tested on the author’s Debian GNU/Linux “etch” development system with Apache 1.3.34 and PostgreSQL 8.0.3. You will need to install the development libraries for these applications in order to compile the module.

Configure Apache by adding something like the following to your httpd.conf file where the domain name of the database host is ‘server’ and the name of the database is ‘database’. The database host may or may not be the same machine as the web server host. Remember to adjust your pg_hba.conf file to allow the web server to access the database from wherever it is located.


LoadModule libpq_module /usr/lib/apache/1.3/mod_libpq.so

<Location "index.html">
    SetHandler libpq-request
</Location>

# mandatory parameter
libpqConnection "host=server dbname=database"


# optional parameters
libpqConnectWhen Later
libpqHeadersQuery "SELECT cgi.Headers($1,$2,$3,$4,$5,$6,$7)"
libpqContentQuery "SELECT cgi.Content($1)"


Compile, install and start the module with the following commands:


/usr/bin/apxs -c -I /usr/include/postgresql/ -lpq -o mod_libpq.so mod_libpq.c
su -c '/usr/bin/install -s -m 644 mod_libpq.so /usr/lib/apache/1.3/'
su -c '/etc/init.d/apache reload'


You will also need to install two functions in the database and ensure that they can be used by the user under which the web server is running. A marginally less than minimal implementation is shown below, and a complete web application which supports searching and editing of records from a database table can be downloaded from example-modlibpq.sql.


CREATE SCHEMA cgi;

SET SEARCH_PATH TO cgi;


-- preprocess the request and return a unique token for the cookie

CREATE FUNCTION Headers
(
    aSystem TEXT,
    aAddress TEXT,
    aCookie TEXT,
    aQuery TEXT,
    aInput TEXT,
    aBrowser TEXT,
    aReferer TEXT
)
    RETURNS TEXT AS $$
DECLARE
    vToken TEXT;
BEGIN
    --
    -- an 'x=' prefix is required for some browsers to recognise cookies
    --
    vToken := 'token='||TEXT(CURRENT_TIMESTAMP);
    RETURN vToken;
END;
$$ LANGUAGE plpgsql VOLATILE;


-- generate the response for the request identified by the token

CREATE FUNCTION Content
(
    aToken TEXT
)
    RETURNS TEXT AS $$
DECLARE
    vDocument TEXT;
BEGIN
    vDocument := 'Hello world!\nYour cookie is "'||aToken||'"';
    RETURN vDocument;
END;
$$ LANGUAGE plpgsql VOLATILE;


If the reader wishes to host several database driven web applications on the one server, this can be accomplished by defining additional server configurations as virtual hosts. However please be aware that each time Apache spawns a child process, the child may create a persistent connection to every database that you have configured. For example, if there are 3 databases being served and Apache has spawned 10 child processes, there could be 30 database connections open at that time.

To reduce the wastage in situations like this, point all the web applications to one master database, and then use the libpqSystem command in each of the virtual host definitions. The value set by the libpqSystem command is passed to the SQL function which can take the appropriate action. Typically, a master database is used to process the request headers and perform authentication and logging, and then when the web server asks for the document content, the request can be relayed by the master database to another host via a pool of connections that was created with the PostgreSQL dblink library for handling inter-database queries.

Creating and maintaining sophisticated web applications that are written purely in SQL is a very interesting area of application in itself and there will be more code posted here on the subject in the near future.