Last updated Mon May 27 13:39:06 EDT 1996.

By popular request you can now get automatically generated one-page versions of this document. Since I don't have an HTML to PS filter you have to settle for HTML with possibly dysfunctional links or plain text. OS/2 WebExplorer, Mosaic and Netscape should all be able to print out nice copies of the HTML one-page version.

This set of documents was generated by Genrep.

CGI Security Tutorial

1 Overview of the Tutorial

This tutorial is not intended to teach people how to write CGI scripts -- it won't even define the term CGI. The focus is on defensive programming techniques that will prevent the abuse of CGI scripts. People can use poorly written CGI scripts to read files that should remain secret from the general public, get shell access on machines running CGIs or simply make the CGI host unusable. Careful programming can prevent most kinds of harm.

The content is derived from in-person tutorials that used to be given to people that wanted CGI access on calum. The tutorials began to take over one and a half hours so the online version was written to save time. It has grown considerably since its inception.

1.1 Assumptions

It is assumed that the reader has permission to execute CGI scripts on some server. Particular importance will be attached to the case where CGIs run with the same userid as the CGI writer. This is not the case with most httpds but it is important for calum users -- the main audience for this document.

Unix heavily influenced this tutorial. Many things mentioned here aren't important on other platforms. They all have their own problems waiting to trouble you.

This tutorial is mostly concerned with PERL and C programs. There is some coverage of shell scripts, but not much. The author only writes CGIs in PERL; C examples have been included because some people think that it is easier to write CGI scripts in a language that they already know than to write them in PERL. They might very well be wrong....

1.2 Contacting the Author

Sending mail to mlvanbie@csclub.uwaterloo.ca will usually get some sort of response within a day or two. If you know of a good non-interactive HTML to PS converter it is possible that PS versions will be provided.

2 Never Trust Anything

The first mistake that many CGI writers make is to assume that they can trust their input. There is almost nothing that can actually be trusted -- not even the httpd that calls the script.

2.1 Input From Forms

Never trust input from forms. The following things are all false:

2.2 Path Information

This is just an extension of the ideas in the previous section -- namely that the path information could be anything at all.

3 File Names

Most of the things in this section should be fairly obvious, but it is easy to forget the basics when there are many other problems to worry about.

3.1 Opening Files

Presumably, any file name that you code into your CGI is safe. File names from forms, PATH_INFO and other sources are suspect. Sometimes it is practical to keep a list of acceptable file names. Otherwise you may need to disallow /s or perhaps just forbid .. and leading /s. Usually you can be very specific about the locations of acceptable files.
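
As a rough illustration of the "forbid .. and leading /s" rule, here is a minimal C sketch. The function name safe_relative_path and the exact policy are assumptions made for this example; a real CGI would usually be stricter and also pin the name to a known directory.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if name is acceptable as a path relative to a fixed data
 * directory: non-empty, no leading '/', and no ".." component.
 * The name and the policy are illustrative assumptions. */
int safe_relative_path(const char *name)
{
    const char *p;

    assert(name != NULL);        /* a NULL here is a programming error */
    if (*name == '\0' || *name == '/')
        return 0;

    p = name;
    while (*p != '\0') {
        /* Measure the component that ends at the next '/' or at the end. */
        size_t len = strcspn(p, "/");

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 0;            /* ".." would climb out of the directory */
        p += len;
        if (*p == '/')
            p++;                 /* step over the separator */
    }
    return 1;
}
```

A name such as notes/today.txt passes, while /etc/passwd and a/../../b are rejected. A name like a..b is allowed because the dots do not form a whole path component.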

3.2 Creating Files

Usually you want to create files with simple names. Limiting characters to A-Za-z0-9_ is pretty safe. Under Unix, file names shouldn't start with ``.''; a leading ``-'' is also really bad, as are whitespace and shell metacharacters. It is much better to specify a set of valid characters than a list of invalid characters.
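
The "set of valid characters" approach might be sketched in C as follows. The function name simple_file_name and the length limit parameter are assumptions for the example; only the character set comes from the text above.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if name consists only of A-Za-z0-9_ and is between 1 and
 * max_len characters long. The name of this helper is hypothetical. */
int simple_file_name(const char *name, size_t max_len)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789_";
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len > max_len)
        return 0;

    /* strspn counts the leading characters that are in the allowed set;
     * the name is valid only if that count covers the whole string. */
    return strspn(name, allowed) == len;
}
```

Because the check lists what is allowed rather than what is forbidden, a leading ``.'' or ``-'', whitespace and shell metacharacters are all rejected without being named individually.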

CGI writers that are particularly worried about security should avoid writing to publicly writable directories (such as /tmp). Creating a directory in /tmp is good provided that programs can handle the directory disappearing between invocations of the CGI script. It is easy for malicious people to create symbolic links to important files or directories -- always make sure that the file you open is the file that you wanted to modify.

3.2.1 Setting Your umask

The default umask of many httpds is 0 ... any files created by a CGI script will be world-writable by default. The umask should probably be set to 022 (allows others to read the file) or 077 (denies everything to everyone).
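
A minimal C sketch of setting the umask before creating a file is shown below. It also opens the file with O_EXCL, which is one way (an assumption of this example, not something the tutorial prescribes) to avoid writing through a symbolic link that someone else created first. The function name write_private_file is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path and writes text to it. Returns 0 on success, -1 on failure.
 * Hypothetical helper for illustration only. */
int write_private_file(const char *path, const char *text)
{
    size_t len;
    ssize_t written;
    int fd;

    assert(path != NULL && text != NULL);

    umask(077);   /* new files get no group or other permissions */

    /* O_EXCL makes open() fail if anything already exists at path,
     * including a symbolic link planted by someone else. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0)
        return -1;

    len = strlen(text);
    written = write(fd, text, len);
    if (close(fd) != 0 || written < 0 || (size_t)written != len)
        return -1;
    return 0;
}
```

With a umask of 077 the requested mode 0666 becomes 0600. Using 022 instead would leave the file readable by others.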

4 Calling Programs

Many useful CGI programs call other programs, either custom written or standard unix utilities. Consider how easy it would be to implement a quote searching program with fortune. Unfortunately, most CGI security problems result from calling other programs.

What follows is a tour of the problems that face CGI programmers and the techniques for preventing each type of abuse. Each sample is shown with PERL and C versions. Frequently neither example applies to shell programming.

4.1 The Basic Problem

We will assume that the CGI intends to call grep on a text database and that a form provides the regular expression. Note that in the case of PERL it might actually be simpler to implement grep through regular expressions (and certainly safer). The naïve approach

system("grep $exp database"); or
sprintf(tmp, "grep %s database", exp); system(tmp);

has a number of problems. Consider exp with the value ``root /etc/passwd;rm''. Not only does it read the wrong file, it deletes the real database! The simplest solution is to add quotation marks.

4.2 Quotation Marks Aren't Good Enough

system("grep \"$exp\" database"); or
sprintf(tmp, "grep \"%s\" database", exp); system(tmp);

Neither double nor single quotes actually solve the problem. With double quotes exp could be ```rm -rf /`'', for example. Single quotes avoid this but both suffer from problems like ``'root /etc/passwd;rm'''. The quotation marks match with the ones that will enclose the variable, completely negating their effect.

4.3 Escaping Individual Characters Is Much Better

It is fairly easy to put a ``\'' in front of all the special characters:

$exp =~ s/[^\w]/\\$&/g; system("grep \"$exp\" database"); or

for(i=0,p=tmp2;exp[i];i++){ if( !normal(exp[i]) ) *(p++)='\\'; *(p++)=exp[i]; }
*p=0; sprintf(tmp, "grep \"%s\" database", tmp2); system(tmp);

This solution handles all the problems discussed so far, with one caveat: inside double quotes the shell removes a backslash only before a dollar sign, backquote, double quote, backslash or newline, so the other backslashes reach grep as part of the pattern. If exp were ``-i'' we would still run into a problem. ``grep'' would try to find the string ``database'' in its standard input (without case sensitivity). Using the ``-e'' option to grep would prevent this. In general, never call a program that cannot be told that an argument is not a switch, unless you can restrict the possible values of exp. GNU utilities are really good this way since they accept ``--'' as an end of switch marker.

4.4 There Are Better Ways

It is unnecessary to escape characters if you invoke programs in a different way:

system("grep", "-e", $exp, "database"); or
[C version not available yet -- uses fork and exec so it needs testing]

Calling grep in this manner will prevent a shell from ever being called. It isn't very convenient when shell features (such as globbing) are required, though.
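
For readers who want a starting point for the missing C version, the following is an untested-in-production sketch using fork and exec. The function name run_grep is an assumption, and execlp searches PATH for grep, which is only reasonable if the CGI controls its own PATH.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "grep -e exp file" without a shell. Returns grep's exit status
 * (0 = match found, 1 = no match, greater than 1 = error), 127 if grep
 * could not be executed, or -1 if the child could not be created or
 * died abnormally. Hypothetical helper for illustration only. */
int run_grep(const char *exp, const char *file)
{
    pid_t pid;
    int status;

    assert(exp != NULL && file != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: each argument is passed as-is, so no quoting is needed
         * and "-e" keeps exp from being read as a switch. */
        execlp("grep", "grep", "-e", exp, file, (char *)NULL);
        _exit(127);              /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

grep writes its matches to the standard output it inherits, which for a CGI is the response sent to the client. A value of exp such as ``root /etc/passwd;rm'' is simply searched for as a pattern.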

In cases like that other approaches can be useful. This one takes advantage of a nice feature of shells:

$ENV{'FOO'} = $exp; system 'grep -i -e "$FOO" *.c'; or
sprintf(tmp, "FOO=%s", exp); putenv(tmp); system("grep -i -e \"$FOO\" *.c");

The C version has some hidden traps. It is possible for putenv to fail (it might be a good idea to check its return status). Also, putenv keeps a pointer to tmp rather than copying the string, so tmp should not be a local (automatic) variable.
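
A sketch of the environment variable technique that avoids these traps might look like this. The function name shell_with_foo and the buffer size are assumptions for the example; the command to run is passed in so that the same helper can be used for more than one shell command.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Puts exp into the environment variable FOO and runs command through the
 * shell. Returns the value of system(), or -1 if FOO could not be set.
 * The function name and buffer size are illustrative assumptions. */
int shell_with_foo(const char *exp, const char *command)
{
    /* putenv() keeps this pointer, so the buffer must outlive the call:
     * static storage rather than a local array. */
    static char env_buf[1024];
    int n;

    assert(exp != NULL && command != NULL);

    n = snprintf(env_buf, sizeof env_buf, "FOO=%s", exp);
    if (n < 0 || (size_t)n >= sizeof env_buf)
        return -1;               /* too long: refuse instead of truncating */
    if (putenv(env_buf) != 0)
        return -1;
    return system(command);
}
```

It could be called as shell_with_foo(exp, "grep -i -e \"$FOO\" *.c"). The shell expands "$FOO" once and does not interpret the result again, so backquotes and semicolons inside exp are passed to grep as ordinary characters.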

5 Server-side Includes

This document is only accurate for the NCSA httpd; I don't know of any other httpd that handles server-side includes.

Server-side includes allow all sorts of neat tricks. In general they are easy to set up and safe to run. Unfortunately they are hazardous when combined with CGI scripts that modify HTML.

5.1 The Problem

Consider the case of a guestbook. Many people have them although few actually serve a useful purpose. Most guestbook CGIs don't check their input for HTML tags. This allows people to include inlined images and anchors -- neither of which is a problem (except for HTML integrity). If server-side includes are enabled for the guestbook then there is potential for abuse.

Any of the following HTML comments would be a security hole:

<!--#exec cmd="rm -rf /"-->
<!--#include file="secretfile"-->

The second command is not as general as the first (and less likely to be a security hole since the NCSA httpd restricts the content of the file name) but it is included since some servers might have exec disabled.

5.2 The Solutions

There are several different ways of handling this problem. The simplest is to make sure that your server will not attempt to parse the document for server-side includes.

Disallowing < and > will also work; the input can be rejected or the characters can be escaped. Removing all comments isn't very difficult either. A careful program that checks HTML validity would be even better, though.
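
Escaping the characters could be done along these lines in C. This is a minimal sketch: the function name escape_html is an assumption, and escaping ``&'' as well is a choice made for this example so that the output stays valid HTML.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst, replacing <, > and & with HTML entities.
 * Returns 0 on success, -1 if dst (of dst_size bytes) is too small.
 * Hypothetical helper for illustration only. */
int escape_html(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        const char *rep;
        char single[2];
        size_t len;

        switch (*src) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:
            single[0] = *src;    /* ordinary character: copy unchanged */
            single[1] = '\0';
            rep = single;
            break;
        }
        len = strlen(rep);
        if (used + len + 1 > dst_size)
            return -1;           /* leave room for the terminating 0 */
        memcpy(dst + used, rep, len);
        used += len;
    }
    if (used + 1 > dst_size)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

After escaping, a guestbook entry containing an exec directive is displayed as text instead of being parsed by the server as a comment.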

6 Shell Scripts

People frequently attempt to write CGIs in sh, bash, csh or tcsh. This leads to problems most of the time, but is sometimes worthwhile.

6.1 Basic Problems

Order of evaluation is a serious problem. If you don't know just how your shell will interpret variable substitution, backticks and other fun things you are in danger of having your program behave in unexpected ways. As a brief example consider the program
#!/bin/csh -f

set foo='*'
set bar='`echo hi`'

echo $foo $bar

or the equivalent sh program. It will output a list of all files in your current directory followed by ```echo hi`''. Playing with the choice of quotation gets interesting.

The other difficulty that CGI writers will face is that there isn't an easy way to convert URL-encoded text into usable variables. Shells and even sed aren't up to handling this in the general case.

There is an advantage to using shell scripts, however. It can simplify calling programs. The method for evaluating variables and so forth is usually amenable to securely calling other programs.

7 Yet More Silly Things

Axiomatically there is always one more stupid thing that can go wrong....

7.1 Mail

Many people write CGI scripts that send email containing user input. Sending arbitrary input through a mail program can be dangerous! The Unix program mail specially interprets lines that begin with the character ``~'' (tilde). This can be used to run programs (amongst other things). In some versions of mail this feature can be turned off. A better program to use is sendmail. Simpler mailers such as elm (briefly checked) and PINE (unchecked) may also do the job safely.

Be careful to send email only to ``safe'' email addresses. If you start an email address with a ``|'' (pipe) character then it might be interpreted as a command to be run. You must carefully read the documentation of any program that you are going to call with your CGI script -- as it says at the start of this section, ``there's always one more stupid thing that can go wrong''.

7.2 Redirecting HTTP Requests

Occasionally one wants to write a program that accepts a URL and fetches the contents of that URL. Ka-Ping Yee's Shodouka program is an excellent example. Even assuming that you code a good web library (or borrow one -- both the CERN/W3O libwww and the libwww-perl are quite good) there are still potential problems.

Redirecting HTTP requests will allow people to get around access control rules. Two potential problems at the University of Waterloo are the Oxford English Dictionary (a copyrighted text) and newsbin (think gigabytes of file transfers).

A less likely problem is redirecting the FILE protocol. It is unlikely since few people would think to implement it. It allows any file readable by the CGI to be accessed ... such as your plans to take over the world or /etc/passwd (most passwords are easily cracked).

To continue the possibilities beyond reason don't forget PUT and DELETE requests ... fortunately most servers aren't configured to accept these methods. Some mechanisms for redirecting HTTP requests that handle both GET and POST requests might allow PUT and DELETE.
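
One way to guard against the FILE protocol and unexpected methods is to check both before doing anything else. The sketch below is only an assumption about how such a check might look; the function names are hypothetical, and it does nothing about the access control problem described earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if url starts with "http://" (in any letter case). */
int is_http_url(const char *url)
{
    static const char prefix[] = "http://";
    size_t i;

    assert(url != NULL);
    for (i = 0; prefix[i] != '\0'; i++) {
        /* A short url hits its terminating 0 here and fails the test,
         * so the loop never reads past the end of the string. */
        if (tolower((unsigned char)url[i]) != prefix[i])
            return 0;
    }
    return 1;
}

/* Returns 1 if method is GET or POST; everything else is refused. */
int is_allowed_method(const char *method)
{
    assert(method != NULL);
    return strcmp(method, "GET") == 0 || strcmp(method, "POST") == 0;
}
```

Listing the accepted scheme and methods, rather than the forbidden ones, means that PUT, DELETE and anything invented later are refused by default.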

7.3 Limitations of C

Most C programs tend to have arbitrary limits on array sizes. Programming carelessly will probably just lead to seg faults. However, one should remember that the security holes in NCSA httpd resulted from code that didn't remember array bounds. Clever crackers can corrupt your program's stack so that it executes functions such as system instead of crashing.

Terminating strings with 0s can lead to some interesting problems. Remember that a %00 in the QUERY_STRING will be turned into the string termination character. This can have bizarre side-effects. PERL programs will only suffer from this problem when making system calls (such as open, or stat).
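
Both problems can be handled while decoding the input. The following is a minimal sketch, assuming a hypothetical function url_decode that is told the size of the destination array and that treats a decoded 0 byte as an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller has already checked isxdigit(). */
static int hex_value(int c)
{
    if (isdigit(c))
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes URL-encoded src into dst (dst_size bytes, including the final 0).
 * Returns 0 on success, or -1 for a malformed escape, a decoded 0 byte, or
 * output that does not fit. Hypothetical helper for illustration only. */
int url_decode(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        int c = (unsigned char)*src++;

        if (c == '+') {
            c = ' ';
        } else if (c == '%') {
            if (!isxdigit((unsigned char)src[0]) ||
                !isxdigit((unsigned char)src[1]))
                return -1;
            c = hex_value((unsigned char)src[0]) * 16 +
                hex_value((unsigned char)src[1]);
            src += 2;
            if (c == 0)
                return -1;       /* %00 would cut the string short */
        }
        if (used + 1 >= dst_size)
            return -1;           /* never write past the end of dst */
        dst[used++] = (char)c;
    }
    if (used >= dst_size)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

Rejecting the whole input is the cautious choice here; silently dropping the 0 byte or truncating the output would leave the program working with a string the user did not send.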

7.4 Lack of Limitations in PERL

PERL gives the CGI programmer just about everything that she needs ... including a rope long enough to hang herself with.

In a previous section we considered the problem of calling the utility grep. This is a bit silly in PERL since we can easily use the regular expression facility in PERL:

while( <FILE> ){ print if /$exp/; }

This code will not cause anything nasty to be executed ... PERL was designed to handle this safely. The problem with that code is that an error in exp will cause the CGI script to get a compilation error (which the httpd will probably report as a server configuration error). This is a poor way to handle incorrect input. Rather than manually check the syntax of a PERL regular expression we can have PERL safely check it for us.

&complain("Illegal regexp.") if !defined eval {if("a" =~ /$exp/){}0;};

The eval was used as an exception handling mechanism. There are several different ways of invoking eval. That was a secure one. Summarizing from the PERL 5 man pages:
eval $x or eval "$x"
    The contents of x are interpreted as a string of PERL code and executed. Very unsafe! All compilation for the eval must be done at eval time.
eval { ... $x ... } or eval '... $x ...'
    This is safe ... x is used as a string/number/whatever inside the code in the curly braces or single quotes. The code can be compiled at run time.

Using taintperl you can catch many problems (but not all of them!).

7.5 SUID CGI Scripts and CGIwrap

This section is the last one in the tutorial, but it is still important. Most httpds do not change user ID to a CGI script's owner. Instead they run the program as ``nobody'' or use a program like CGIwrap to change user ID. CGI scripts available on the net (guest books, counters and less trivial programs) assume that the CGI script will be run as nobody so they require either files to be world-writable or CGIs to be SUID.

Note that you (almost) never need files to be world-writable. Usually a directory can be made world-writable so that the CGI can create a file owned by nobody. Directory permission can be restored afterwards. Figuring out how this relates to file systems with disk quotas is left as an exercise to the reader.

Making scripts SUID is dangerous if you can't trust people that have access to the machine that the script is running on. If you are using a university machine with many users or a commercial internet service provider's machine you definitely don't want to trust the other users. SUID scripts have many more potential security holes than normal CGI scripts.

On some operating systems it is impossible to have a secure SUID shell script. The simplest methods for attacking SUID scripts rely on setting environment variables maliciously. If you have an old version of an operating system then you should research your system to make sure that there are no known security problems. Almost all versions of csh are completely unsafe. (PERL calls csh to evaluate ``<*.h>'' so never use that construct in a SUID PERL program -- taint checks won't catch this problem). Old versions of sh have serious security holes but most sites have upgraded to safer versions.

The program CGIwrap is a good way to allow users to run CGIs under their own UID. Make sure that you are using a recent version since earlier versions of the program lack the latest features and may contain security holes that have been fixed.