Department of Electronic and
Computer Engineering

Spam Filtering Proxy Server

About

This page describes a project to implement a Spam filtering proxy server for use with SMTP mail servers.

Many existing Spam filters work on messages after they have been accepted, and either delete suspected Spam or highlight it. The problem with deletion is that when a false positive occurs the message simply disappears, and neither the sender nor the receiver is any the wiser. With highlighting, the receiver still has to download and sort all of the Spam.

This project uses a specialised SMTP proxy server to filter mail at the SMTP level before it is accepted from the client. As a result, the sender knows whether or not the message has been delivered and is informed of any problems. The project also uses a variety of fast DNS and content checks, which detect Spam more quickly than slower content filters.

Evaluation of the system was done using a specially collected corpus of Spam messages. Since most of the available Spam research corpora do not include SMTP exchange information, but only message contents, these are not suitable for evaluating the system. Therefore, a "honey pot" was set up to generate a Spam corpus that also includes SMTP exchange data. These data are available for download in the "Download" section below.

The system demonstrates performance that is comparable to or better than that of widely used Spam filters, such as Spam Assassin.

Reliability graph

The graphs show coverage (percentage of Spam messages detected out of all Spam messages, aka "recall") and reliability (percentage of actual Spam messages out of all messages identified as Spam, aka "precision") depending on the checks used. For details of the checks used, see the "Plugins" section below.
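The two measures defined above can be sketched in a few lines of Python. This is only an illustration of the definitions: the function name and the counts are made up for the example and are not results from the project.

```python
def coverage_and_reliability(true_positives, false_positives, false_negatives):
    """Return (coverage, reliability) as percentages.

    coverage    = Spam detected / all Spam messages          (recall)
    reliability = actual Spam / all messages flagged as Spam (precision)
    """
    all_spam = true_positives + false_negatives
    all_flagged = true_positives + false_positives
    coverage = 100.0 * true_positives / all_spam if all_spam else 0.0
    reliability = 100.0 * true_positives / all_flagged if all_flagged else 0.0
    return coverage, reliability


# Made-up counts, for illustration only (not figures from the evaluation).
coverage, reliability = coverage_and_reliability(90, 5, 10)
print(f"coverage={coverage:.1f}% reliability={reliability:.1f}%")
```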

Coverage graph

This level of performance, however, is achieved while spending much less time checking each message than Spam Assassin does. Not only does the system match or exceed Spam Assassin's detection performance, it is also faster. For full evaluation results, please see the report in the "Download" section.

People

Developed by Tom Parrott, supervised by Rinat Khoussainov
Department of Electronic and Computer Engineering, University of Portsmouth, United Kingdom

Installation

This program is written in Perl and has been developed on a Linux platform (Debian and Fedora). The filter requires a working CPAN setup and MySQL to run. The front end also requires Apache/PHP with MySQL and GD enabled. However, setting these up is outside the scope of this document.

Required Perl Modules

The program uses a number of third-party Perl modules, which need to be installed before it will run. These can be installed from CPAN using the commands:

perl -MCPAN -e shell
install module-name

Core Modules

Extra Modules (for plugins)

Creating the database structure

The proxy depends on a MySQL database for logging purposes. The required database structure is contained in an SQL file called database.sql in the download.

To import the database execute the following command:

mysql -u root -p < spamproxy/database.sql

Now create an initial config file at /etc/spamproxy.conf:

db_host=localhost
db_name=project
db_user=root
db_pass=nospam
db_port=3306

Change the value of each parameter to match your database. This allows the program to connect to the database and load the rest of the configuration settings.
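To make the file format concrete, here is a minimal Python sketch of reading `key=value` settings like the ones above. It is not the proxy's own loader (which is written in Perl). The function name `parse_config` is hypothetical, and skipping blank lines and `#` comment lines is an assumption based on how the config file is shown on this page.

```python
def parse_config(text):
    """Parse 'key=value' lines; skip blank lines and '#' comments (assumed)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that have no '='
        settings[key.strip()] = value.strip()
    return settings


sample = """\
db_host=localhost
db_name=project
db_user=root
db_pass=nospam
db_port=3306
"""
config = parse_config(sample)
print(config["db_host"], int(config["db_port"]))
```

Values are kept as strings, so numeric settings such as `db_port` need converting by the caller.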

Installing the Front End

Copy the frontend directory to your web server's document root. E.g.

cp -R spamproxy/frontend /opt/apache/htdocs/frontend

Note: You should also use Apache's password protection feature to secure this page from unauthorised access.

Modify the db.php script in frontend with your database details. E.g.

$db_user = "root";
$db_pass = "nospam";
$db_name = "project";
$db_host = "localhost";

Installing the Spam Proxy

To install the Spam Proxy, download the spamproxy.tar.bz2 file into /opt. Then untar the file.

cd /opt
tar jxvf spamproxy.tar.bz2

Now create a system user and group to run the proxy as:

groupadd spamproxy
useradd spamproxy -g spamproxy

Look in /etc/passwd to get the UID and GID of the spamproxy user. Modify /etc/spamproxy.conf and add the UID, GID and home directory:

uid = 500
gid = 500
home_dir = /home/spamproxy

Next add information about your mail server to the /etc/spamproxy.conf file:

#Hostname of proxy
hostname=filter.domain.com

#IP to listen on (0 for all interfaces)
listen_host=0

#Port to listen on
listen_port=25

#Mail server host
mail_host=localhost

#Mail server port
mail_port=1025

You should now be able to run the Spam Proxy. To do this run:

/opt/spamproxy/spamproxy

Check there are no errors as it loads up.

To test that the proxy is working as expected, test that you can connect to it:

telnet [server Host] [server Port]

Also check that you can send an E-Mail through the proxy and that it reaches the mail server.
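One way to send such a test message is with Python's standard `smtplib`. This is a sketch under assumptions: the addresses are placeholders, the function names are hypothetical, and the host and port are the example `hostname` and `listen_port` values from the config above. The message includes Date and Message-ID headers because some of the plugins check for them.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_test_message(sender, recipient):
    """Build a small, well-formed test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Spam proxy test"
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg.set_content("Test message sent through the Spam Proxy.")
    return msg


def send_test_message(host, port, msg):
    """Send msg via the proxy; smtplib raises an SMTP exception on rejection."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.send_message(msg)


message = build_test_message("sender@example.com", "recipient@example.com")
print(message["Message-ID"])
# To actually send it (needs a running proxy), uncomment:
# send_test_message("filter.domain.com", 25, message)
```

If the proxy rejects the message at the SMTP level, `send_message` raises an exception carrying the server's reply, which is the behaviour the proxy is designed to give sending clients.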


Running the Spam Daemon

The program comes with a control daemon which allows the front end to control the filter service.

To run the spam daemon, run the following command:

/opt/spamproxy/spamdaemon

Configuring the Proxy via the front end

Now that the basic proxy is operational you can enable the plugins. This can be done via the config file, or by using the front end config page.

Each plugin can take a number of arguments; please see the Plugins section below.

Note: Any config options added to the front end config page will override those in the config file.

Here is an example configuration:

#Enabled Plugins
msg_id=count
ptr_check=count
ptr_ip=count
ptr_dynamic=count
ptr_forward=count
ip_blacklist=count,zen.spamhaus.org,bl.spamcop.net
helo_ip=count
helo_dns=count
mail_dns=count
mail_from_check=count
spf_check=count
inline_img=count
spam_assassin=count

#Check date is within 3 hours of now
date_header=count,3

#Grey list messages, or block messages over 4 points
hybrid_grey_list=4

This configuration enables every plugin check except url_blacklist, together with the grey list option.
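The ip_blacklist line in the example names two DNS blacklist zones. As background, such zones are conventionally queried by reversing the octets of the client's IPv4 address and prepending them to the zone name; an answer means the address is listed. The Python sketch below shows that convention only. It is not the plugin's actual code, and the function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers with an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # no such name: the address is not listed


print(dnsbl_query_name("192.0.2.99", "zen.spamhaus.org"))
# prints 99.2.0.192.zen.spamhaus.org
```

`is_listed` performs a live DNS lookup, so it is defined here but not called. A production check would usually also inspect the returned address, since blacklist operators use it to encode the listing type.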

Once you are happy with your config, click "Save and Restart". After a few seconds the browser will return to the home page and the status should be "Online".

If this is not the case, consult the system mail log (usually /var/log/mail.log).

Plugins

All of the Spam filtering functionality is provided by plugins, each of which checks one aspect of the SMTP process for errors or indicators of Spam. Each plugin takes different arguments when configuring the Spam Proxy; these are documented below.

For each plugin that takes an $action argument, the value can be one of count, delay, block or close, depending on the behaviour you want when that check fails.
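As an illustration of the value format, the Python sketch below splits a plugin setting into its action and any further arguments, and rejects an action outside the four listed above. The names `VALID_ACTIONS` and `parse_plugin_setting` are hypothetical; the proxy's own Perl code may handle this differently.

```python
VALID_ACTIONS = {"count", "delay", "block", "close"}


def parse_plugin_setting(value):
    """Split 'action,arg1,arg2' into (action, [args]) and validate the action."""
    parts = [part.strip() for part in value.split(",")]
    action, args = parts[0], parts[1:]
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return action, args


print(parse_plugin_setting("count,zen.spamhaus.org,bl.spamcop.net"))
print(parse_plugin_setting("count,3"))
```

Settings that do not start with an action, such as hybrid_grey_list=4, would need separate handling.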

| Plugin | Options | Info |
| --- | --- | --- |
| msg_id | $action | Checks that the E-Mail contains a Message-ID header. E.g. msg_id=count |
| ptr_check | $action | Checks for a valid PTR record. E.g. ptr_check=block |
| ptr_ip | $action | Finds PTR records that contain IP addresses. E.g. ptr_ip=count |
| ptr_dynamic | $action | Finds PTR records that appear dynamic. E.g. ptr_dynamic=count |
| ptr_forward | $action | Checks that the PTR record has matching forward records. E.g. ptr_forward=count |
| ip_blacklist | $action,$servers | Checks the IP against multiple blacklist servers. Takes a comma-delimited list of blacklist servers to check against. E.g. ip_blacklist=delay,blacklist.com |
| url_blacklist | $action,$ccFile | Extracts URL domains from the E-Mail and checks them against SURBL. Takes the path to a file containing country domain codes (provided in the plugins directory). E.g. url_blacklist=count,/opt/spamproxy/plugins/cc.txt |
| helo_ip | $action | Checks for an IP in the HELO string. E.g. helo_ip=count |
| helo_dns | $action | Verifies the DNS of the HELO string. E.g. helo_dns=count |
| mail_dns | $action | Verifies the DNS of the MAIL domain. E.g. mail_dns=block |
| mail_from_check | $action | Checks that the MAIL address and From header match and are valid. E.g. mail_from_check=count |
| spf_check | $action | Checks the MAIL domain for a valid Sender Policy Framework (SPF) record. E.g. spf_check=block |
| inline_img | $action | Finds embedded images in the E-Mail message. E.g. inline_img=count |
| spam_assassin | $action | Passes the message through the SpamAssassin filter. E.g. spam_assassin=block |
| date_header | $action,$limit | Checks for a valid Date header. Takes a $limit value giving the maximum difference in hours. E.g. date_header=count,3 |
| hybrid_grey_list | $blockScore | Performs grey listing on messages based on error score. Blocks messages that score $blockScore or above. E.g. hybrid_grey_list=4 |
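To show what a check such as date_header involves, here is a minimal Python sketch that accepts a Date header only if it parses and lies within a given number of hours of the current time. It is an illustration, not the plugin's code: the function name is hypothetical, and treating a header without a usable time zone as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def date_within_limit(date_header, limit_hours, now=None):
    """True if the Date header parses and is within limit_hours of now."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # missing or malformed header
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # assumption: treat as UTC
    now = now or datetime.now(timezone.utc)
    return abs(now - sent) <= timedelta(hours=limit_hours)


# A fixed reference time keeps the example reproducible.
reference = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(date_within_limit("Mon, 01 Jan 2024 10:30:00 +0000", 3, reference))  # True
print(date_within_limit("Mon, 01 Jan 2024 02:00:00 +0000", 3, reference))  # False
```

The comparison uses the absolute difference, so dates too far in the future fail as well as dates too far in the past.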

Download

Use the link below to download a copy of the Spam proxy, data sets and documentation: