By popular request you can now get automatically generated one-page versions of this document. Since I don't have an HTML to PS filter you have to settle for HTML with possibly dysfunctional links or plain text. OS/2 WebExplorer, Mosaic and Netscape should all be able to print out nice copies of the HTML one-page version.
This set of documents was generated by Genrep.
The content is derived from in-person tutorials that used to be given to people that wanted CGI access on calum. The tutorials began to take over one and a half hours so the online version was written to save time. It has grown considerably since its inception.
Unix heavily influenced this tutorial. Many things mentioned here aren't important on other platforms. They all have their own problems waiting to trouble you.
This tutorial is mostly concerned with PERL and C programs. There is some coverage of shell scripts, but not much. The author only writes CGIs in PERL; C examples have been included because some people think that it is easier to write CGI scripts in a language that they already know than to write them in PERL. They might very well be wrong....
CGI writers that are particularly worried about security should avoid writing to publicly writable directories (such as /tmp). Creating a directory in /tmp is good provided that programs can handle the directory disappearing between invocations of the CGI script. It is easy for malicious people to create symbolic links to important files or directories -- always make sure that the file you open is the file that you wanted to modify.
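For C programs, one way to act on this advice is to refuse to follow a symbolic link when opening a file in a shared directory. The sketch below assumes a POSIX.1-2008 system on which open accepts O_NOFOLLOW; the helper name open_private_file is made up for illustration. It also uses fstat to confirm that the descriptor refers to a regular file with a single link.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for appending without following a
 * symbolic link in the last path component.
 * Returns a file descriptor, or -1 on any failure. */
int open_private_file(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);

    /* With O_NOFOLLOW, open() fails if `path` itself is a symlink. */
    fd = open(path, O_WRONLY | O_APPEND | O_CREAT | O_NOFOLLOW, 0600);
    if (fd < 0)
        return -1;

    /* Check the object we actually opened: it must be an ordinary file
     * and must not be a hard link to some other file. */
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_nlink != 1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

This only protects the final path component; a directory that you created yourself inside /tmp, with restrictive permissions, is still needed to keep the rest of the path trustworthy.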
What follows is a tour of the problems that face CGI programmers and the techniques for preventing each type of abuse. Each sample is shown with PERL and C versions. Frequently neither example applies to shell programming.
A call such as

    system("grep $exp database");

or

    sprintf(tmp, "grep %s database", exp);
    system(tmp);

has a number of problems. Consider exp with the value ``root /etc/passwd;rm''. Not only does it read the wrong file, it deletes the real database! The simplest fix to try is to add quotation marks.
    system("grep \"$exp\" database");

or

    sprintf(tmp, "grep \"%s\" database", exp);
    system(tmp);

Neither double nor single quotes actually solve the problem. With double quotes exp could be ```rm -rf /`'', for example. Single quotes avoid this but both suffer from problems like ``'root /etc/passwd;rm'''. The quotation marks in the value match the ones that enclose the variable, completely negating their effect.
    $exp =~ s/([^\w])/\\$1/g;
    system("grep \"$exp\" database");

or

    for(i=0,p=tmp2;exp[i];i++){
        if( !normal(exp[i]) ) *(p++)='\\';
        *(p++)=exp[i];
    }
    *p=0;
    sprintf(tmp, "grep \"%s\" database", tmp2);
    system(tmp);

This solution handles all the problems discussed so far. If exp were ``-i'' we would still run into a problem. ``grep'' would try to find the string ``database'' in its standard input (without case sensitivity). Using the ``-e'' option to grep would prevent this. In general you never want to call a program that cannot tell that an argument isn't a switch unless you can restrict the possible values for exp. GNU utilities are really good this way since they accept ``--'' as an end of switch marker.
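The C fragment above leaves the buffer sizes and the normal() test unspecified. A fuller sketch might look like the following, under the assumption that ``normal'' means what \w means in PERL (letters, digits and underscore). The name escape_nonword and the length checking are illustrative additions, not part of the original example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy `in` to `out`, putting a backslash in front
 * of every character that is not a letter, digit or underscore.
 * Returns 0 on success, -1 if `out` (of size `outlen`) is too small. */
int escape_nonword(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);

    for (; *in; in++) {
        int plain = isalnum((unsigned char)*in) || *in == '_';
        size_t need = plain ? 1 : 2;

        if (used + need + 1 > outlen)   /* +1 for the terminating 0 */
            return -1;
        if (!plain)
            out[used++] = '\\';
        out[used++] = *in;
    }
    if (outlen == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

A caller would size the output buffer at twice the input length plus one, and treat a return value of -1 as a reason to reject the request rather than truncate it.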
    system("grep", "-e", $exp, "database");

or

    [C version not available yet -- uses fork and exec so it needs testing]

Calling grep in this manner will prevent a shell from ever being called. It isn't very convenient when shell features (such as globbing) are required, though.
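The C version is marked as not yet available. As a stand-in, here is a minimal sketch of what a fork and exec version could look like on a POSIX system. The function name run_grep and its return convention are assumptions made for this illustration; it is not the author's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "grep -e <exp> <file>" without a shell.
 * Returns grep's exit status (0 = match, 1 = no match, >1 = error),
 * 127 if grep could not be executed, or -1 if fork/waitpid failed. */
int run_grep(const char *exp, const char *file)
{
    pid_t pid;
    int status;

    assert(exp != NULL && file != NULL);

    pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: every argument reaches grep verbatim, so shell
         * metacharacters in exp have no special meaning. */
        execlp("grep", "grep", "-e", exp, file, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    /* Parent: wait for the child and report how it ended. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

execlp searches PATH for grep. A CGI program may prefer execl with the full path to grep on its system, so that a manipulated PATH cannot substitute a different program.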
In cases like that, other approaches can be useful. This one takes advantage of a nice feature of shells:
    $ENV{'FOO'} = $exp;
    system 'grep -ie "$FOO" *.c';

or

    sprintf(tmp, "FOO=%s", exp);
    putenv(tmp);
    system("grep -ie \"$FOO\" *.c");

The C version has some hidden traps. It is possible for putenv to fail (it might be a good idea to check its return status) and tmp should not be a local variable, since putenv keeps a pointer to the string instead of copying it.
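The two traps in the C version can be avoided along these lines. This is a sketch only: the helper name export_foo and the 1024-byte limit are arbitrary choices made for the illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* putenv() stores a pointer to this buffer in the environment, so the
 * buffer has static storage and outlives the function call. */
static char foo_env[1024];

/* Hypothetical helper: export `exp` as the environment variable FOO.
 * Returns 0 on success, -1 if exp is too long or putenv() fails. */
int export_foo(const char *exp)
{
    assert(exp != NULL);

    /* sizeof "FOO=" counts the terminating 0 as well. */
    if (strlen(exp) + sizeof "FOO=" > sizeof foo_env)
        return -1;

    strcpy(foo_env, "FOO=");
    strcat(foo_env, exp);

    return putenv(foo_env) == 0 ? 0 : -1;
}
```

After export_foo(exp) succeeds, a call such as system("grep -ie \"$FOO\" *.c") lets the shell expand $FOO inside double quotes. The shell does not re-parse the expanded value, so metacharacters in exp are passed to grep as ordinary text.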
Server-side includes allow all sorts of neat tricks. In general they are easy to set up and safe to run. Unfortunately they are hazardous when combined with CGI scripts that modify HTML.
Any of the following HTML comments would be a security hole:
    <!--#exec cmd="rm -rf /"-->
    <!--#include file="secretfile"-->

The second command is not as general as the first (and less likely to be a security hole since the NCSA httpd restricts the content of the file name) but it is included since some servers might have exec disabled.
Disallowing < and > will also work; the input can be rejected or the characters can be escaped. Removing all comments isn't very difficult either. A careful program that checks HTML validity would be even better, though.
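Escaping the characters could be done in C roughly as follows. The function name escape_html is made up for this sketch, and escaping & in addition to < and > is an extra precaution that goes beyond what the text asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing <, > and & with
 * HTML entities so that user input cannot open a comment or a tag.
 * Returns 0 on success, -1 if `out` (of size `outlen`) is too small. */
int escape_html(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);

    for (; *in; in++) {
        char one[2];
        const char *rep;
        size_t len;

        switch (*in) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:                /* ordinary character: copy unchanged */
            one[0] = *in;
            one[1] = '\0';
            rep = one;
            break;
        }
        len = strlen(rep);
        if (used + len + 1 > outlen)    /* +1 for the terminating 0 */
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }
    if (outlen == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With this in place the first comment shown earlier would be written to the page as visible text, starting with &lt;!--#exec, and the server would not treat it as a directive.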
CGI scripts can also be written for a shell such as sh, bash, csh or tcsh. This leads to problems most of the time, but is sometimes worthwhile.
Consider the script

    #!/bin/csh -f
    set foo='*'
    set bar='`echo hi`'
    echo $foo $bar

or the equivalent sh program. It will output a list of all files in your current directory followed by ```echo hi`''.
Playing with the choice of quotation gets interesting.
The other difficulty that CGI writers will face is that there isn't an easy way to convert URL-encoded text into usable variables. Shells and even sed aren't up to handling this in the general case.
There is an advantage to using shell scripts, however. It can simplify calling programs. The method for evaluating variables and so forth is usually amenable to securely calling other programs.
Some mail programs, notably mail, treat a line that begins with ``~'' (tilde) as an escape. This can be used to run programs (amongst other things). In some versions of mail this feature can be turned off. A better program to use is sendmail. Simpler mailers such as elm (briefly checked) and PINE (unchecked) may also do the job safely.
Be careful to send email only to ``safe'' email addresses. If you start an email address with a ``|'' (pipe) character then it might be interpreted as a command to be run. You must carefully read the documentation of any program that you are going to call with your CGI script -- as it says at the start of this section, ``there's always one more stupid thing that can go wrong''.
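One conservative way to decide whether an address is ``safe'' is to accept only a small set of characters and to reject anything that starts with a pipe or a dash. The sketch below does that in C. The function name is_safe_address and the exact character set are assumptions made for this illustration; the check is deliberately stricter than the mail standards and will turn away some legitimate addresses.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return 1 if `addr` looks like a plain
 * user@host address made only of harmless characters, else 0. */
int is_safe_address(const char *addr)
{
    const char *p;
    int at_signs = 0;

    assert(addr != NULL);

    /* A leading '|' may be run as a command by the mailer; a leading
     * '-' may be taken as a command-line switch. */
    if (addr[0] == '\0' || addr[0] == '|' || addr[0] == '-')
        return 0;

    for (p = addr; *p; p++) {
        if (*p == '@')
            at_signs++;
        else if (!isalnum((unsigned char)*p) && strchr("._-+", *p) == NULL)
            return 0;           /* anything else is refused outright */
    }
    return at_signs == 1;
}
```

Rejecting the input is usually a better response than trying to repair it, since a repaired address is unlikely to be the one the user meant.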
Redirecting HTTP requests will allow people to get around access control rules. Two potential problems at the University of Waterloo are the Oxford English Dictionary (a copyrighted text) and newsbin (think gigabytes of file transfers).
A less likely problem is redirecting the FILE protocol. It is unlikely since few people would think to implement it. It allows any file readable by the CGI to be accessed ... such as your plans to take over the world or /etc/passwd (most passwords are easily cracked).
To continue the possibilities beyond reason don't forget PUT and DELETE requests ... fortunately most servers aren't configured to accept these methods. Some mechanisms for redirecting HTTP requests that handle both GET and POST requests might allow PUT and DELETE.
Terminating strings with 0s can lead to some interesting problems. Remember that a %00 in the QUERY_STRING will be turned into the string termination character. This can have bizarre side-effects. PERL programs will only suffer from this problem when making system calls (such as open, or stat).
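In a C program the simplest defence is to refuse a %00 at the moment the query string is decoded. The decoder below is a sketch of that idea; the names hexval and url_decode are made up for the illustration, and treating + as a space follows the usual form-encoding convention.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode a URL-encoded string in place.
 * Returns 0 on success, or -1 for a malformed escape or a %00.
 * On failure the buffer contents are unspecified and must not be used. */
int url_decode(char *s)
{
    char *out = s;

    assert(s != NULL);

    while (*s) {
        if (*s == '%') {
            int hi = hexval((unsigned char)s[1]);
            /* Only look at s[2] once s[1] is known to be a digit. */
            int lo = hi < 0 ? -1 : hexval((unsigned char)s[2]);

            if (hi < 0 || lo < 0)
                return -1;              /* malformed escape */
            if (hi == 0 && lo == 0)
                return -1;              /* would embed a string terminator */
            *out++ = (char)(hi * 16 + lo);
            s += 3;
        } else if (*s == '+') {
            *out++ = ' ';
            s++;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
    return 0;
}
```

Decoding in place is safe because the decoded text is never longer than the encoded text.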
In a previous section we considered the problem of calling the utility grep. This is a bit silly in PERL since we can easily use the regular expression facility in PERL:
    while( <FILE> ){
        print if /$exp/;
    }

This code will not cause anything nasty to be executed ... PERL was designed to handle this safely. The problem with that code is that an error in exp will cause the CGI script to get a compilation error (which the httpd will probably report as a server configuration error). This is a poor way to handle incorrect input.
Rather than manually check the syntax of a PERL regular expression we can have PERL safely check it for us.

    &complain("Illegal regexp.") if !defined eval {if("a" =~ /$exp/){}0;};

The eval was used as an exception handling mechanism. There are several different ways of invoking eval. That was a secure one. Summarizing from the PERL 5 man pages:
eval $x or eval "$x"
    The contents of x are interpreted as a string of PERL code and executed. Very unsafe! All compilation for the eval must be done at eval time.

eval { ... $x ... } or eval '... $x ...'
    x is used as a string/number/whatever inside the code in the curly braces or single quotes; its contents are never compiled as code. The block form is compiled once, along with the rest of the program; the single-quoted form is compiled when the eval runs.
Using taintperl you can catch many problems (but not all of them!).
Note that you (almost) never need files to be world-writable. Usually a directory can be made world-writable so that the CGI can create a file owned by nobody. Directory permission can be restored afterwards. Figuring out how this relates to file systems with disk quotas is left as an exercise to the reader.
Making scripts SUID is dangerous if you can't trust people that have access to the machine that the script is running on. If you are using a university machine with many users or a commercial internet service provider's machine you definitely don't want to trust the other users. SUID scripts have many more potential security holes than normal CGI scripts.
On some operating systems it is impossible to have a secure SUID shell script. The simplest methods for attacking SUID scripts rely on setting environment variables maliciously. If you have an old version of an operating system then you should research your system to make sure that there are no known security problems. Almost all versions of csh are completely unsafe. (PERL calls csh to evaluate ``<*.h>'' so never use that construct in a SUID PERL program -- taint checks won't catch this problem.) Old versions of sh have serious security holes but most sites have upgraded to safer versions.
The program CGIwrap is a good way to allow users to run CGIs under their own UID. Make sure that you are using a recent version since earlier versions of the program lack the latest features and may contain security holes that have been fixed.