This page describes a project to implement a Spam filtering proxy server for use with SMTP mail servers.
Many existing Spam filters work on messages after they have been accepted. This often results in Spam messages being deleted or highlighted as Spam. The problem with this course of action is that, if any false positives occur, messages can disappear with neither the sender nor the receiver being any the wiser. Where messages are only highlighted, the receiver still has to download and sort all of the Spam.
This project uses a specialised SMTP proxy server to filter mail at
the SMTP level before it is accepted from the client. By doing this the
sender knows whether or not the message has been delivered, and
will be informed of any problems. This project also uses a variety of
fast DNS and content checks to detect Spam more quickly than
slower content filters.
Evaluation of the system was done using a specially collected corpus of Spam
messages. Since most of the available Spam research corpora do not
include SMTP exchange information, but only message contents, these are
not suitable for evaluating the system. Therefore, a "honey pot" was
set up to generate a Spam corpus that also includes SMTP exchange data.
These data are available for download in the "Download" section below.
The system demonstrates performance which is comparable to or better
than that of widely used Spam filters, such as SpamAssassin.
The graphs show coverage (percentage of Spam messages detected out
of all Spam messages, aka "recall") and reliability (percentage of
actual Spam messages out of all messages identified as Spam, aka
"precision") depending on the checks used. For details of the checks
used, see the "Plugins" section below.
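As a concrete reading of these two measures, the following minimal Python sketch computes coverage and reliability from raw counts. The function name and the example counts are illustrative assumptions, not figures taken from the evaluation.

```python
def coverage_and_reliability(true_pos, false_pos, false_neg):
    """Return (coverage, reliability) as percentages.

    true_pos:  Spam messages correctly identified as Spam
    false_pos: legitimate messages wrongly identified as Spam
    false_neg: Spam messages that were not detected
    """
    total_spam = true_pos + false_neg      # all Spam messages
    total_flagged = true_pos + false_pos   # all messages identified as Spam
    coverage = 100.0 * true_pos / total_spam if total_spam else 0.0
    reliability = 100.0 * true_pos / total_flagged if total_flagged else 0.0
    return coverage, reliability


# Hypothetical counts, for illustration only.
cov, rel = coverage_and_reliability(true_pos=90, false_pos=5, false_neg=10)
print(f"coverage={cov:.1f}% reliability={rel:.1f}%")
```

A filter that flags almost everything scores high on coverage but low on reliability, which is why the two figures are reported together.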
This level of performance, however, is achieved while spending much
less time checking each message than SpamAssassin does. Not only does
the system do better than SpamAssassin, it also works faster. For full
evaluation results, please see the report in the "Download" section.
Developed by Tom Parrott, supervised by Rinat Khoussainov
Department of Electronic and Computer Engineering, University of
Portsmouth, United Kingdom
This program is written in Perl and has been developed on a Linux
platform (Debian and Fedora). The filter requires a working CPAN setup
and MySQL to run. The front end also requires Apache/PHP with MySQL and
GD enabled. However, setting these up is outside the scope of this
document.
The program uses a number of 3rd party Perl modules, which need to be installed before it will run. These can be installed from CPAN by starting the CPAN shell and then running install for each module:
perl -MCPAN -e shell
install module-name
Core Modules
Extra Modules (for plugins)
The proxy depends on a MySQL database for logging purposes. The required database structure is contained in an SQL file called database.sql in the download.
To import the database execute the following command:
mysql -u root -p < spamproxy/database.sql
Now create an initial config file at /etc/spamproxy.conf:
db_host=localhost
db_name=project
db_user=root
db_pass=nospam
db_port=3306
Change the value of each parameter to the correct value for your database. This will allow the program to connect to the database to load the rest of the configuration settings.
Copy the frontend directory to your web server's document root. E.g.
cp -R spamproxy/frontend /opt/apache/htdocs/frontend
Note: You should also use Apache's password protection feature to secure this page from unauthorised access.
Modify the db.php script in frontend with your database details. E.g.
$db_user = "root";
$db_pass = "nospam";
$db_name = "project";
$db_host = "localhost";
To install the Spam Proxy, download the spamproxy.tar.bz2 file into /opt. Then untar the file.
cd /opt
tar jxvf spamproxy.tar.bz2
Now create a system user and group to run the proxy as:
groupadd spamproxy
useradd -g spamproxy spamproxy
Look in /etc/passwd to get the UID and GID of the spamproxy user. Modify /etc/spamproxy.conf and add the UID, GID and home directory:
uid = 500
gid = 500
home_dir = /home/spamproxy
Next add information about your mail server to the /etc/spamproxy.conf file:
#Hostname of proxy
hostname=filter.domain.com
#IP to listen on (0 for all interfaces)
listen_host=0
#Port to listen on
listen_port=25
#Mail server host
mail_host=localhost
#Mail server port
mail_port=1025
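The settings above follow a simple key=value layout with # comment lines. The minimal Python sketch below shows one way such a file could be read. The proxy itself is written in Perl and its actual parser may behave differently; parse_config is a hypothetical helper, and tolerating spaces around the equals sign is an assumption based on the uid and gid lines shown earlier.

```python
def parse_config(text):
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings


sample = """\
#Port to listen on
listen_port=25
uid = 500
mail_host=localhost
"""
config = parse_config(sample)
print(config)
```

All values are kept as strings here; converting ports or IDs to integers is left to the caller.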
You should now be able to run the Spam Proxy. To do this run:
/opt/spamproxy/spamproxy
Check there are no errors as it loads up.
To check that the proxy is working as expected, confirm that you can connect to it:
telnet [server Host] [server Port]
Also check that you can send an E-Mail through the proxy.
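The connection check can also be scripted. The following is a minimal Python sketch that opens a TCP connection and reads the first line sent by the server. It assumes the proxy presents the standard SMTP greeting, which begins with reply code 220; the function names are hypothetical, and the host and port should be whatever you set for listen_host and listen_port.

```python
import socket


def read_smtp_banner(host, port, timeout=10.0):
    """Connect to host:port and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as stream:
            line = stream.readline()
    return line.decode("ascii", errors="replace").rstrip("\r\n")


def looks_ready(banner):
    """A standard SMTP greeting starts with the 220 reply code."""
    return banner.startswith("220")
```

For example, looks_ready(read_smtp_banner("localhost", 25)) should be true once the proxy is listening on port 25 of the local machine.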
The program comes with a control daemon which allows the front end to control the filter service.
To run the spam daemon, run the following command:
/opt/spamproxy/spamdaemon
Now that the basic proxy is operational you can enable the plugins. This can be done via the config file, or by using the front end config page.
Each plugin can take a number of arguments; please see the Plugins section below.
Note: Any config options added to the front end config page will override those in the config file.
Here is an example configuration:
#Enabled Plugins
msg_id=count
ptr_check=count
ptr_ip=count
ptr_dynamic=count
ptr_forward=count
ip_blacklist=count,zen.spamhaus.org,bl.spamcop.net
helo_ip=count
helo_dns=count
mail_dns=count
mail_from_check=count
spf_check=count
inline_image=count
spam_assassin=count
#Check date is within 3 hours of now
date_header=count,3
#Grey list messages, or block messages over 4 points
hybrid_grey_list=4
This configuration enables every plugin check described in the Plugins section except url_blacklist, together with the grey list option.
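To illustrate how the values in this example break down, here is a minimal Python sketch that splits a plugin setting into its action and any further arguments. It is not the proxy's own Perl code: parse_plugin_setting is a hypothetical helper, and the special case for hybrid_grey_list follows the Plugins section, where that plugin takes a block score instead of an action.

```python
def parse_plugin_setting(name, value):
    """Split a plugin setting such as 'count,zen.spamhaus.org' into parts."""
    parts = [p.strip() for p in value.split(",")]
    if name == "hybrid_grey_list":
        # This plugin takes a numeric block score, not an action.
        return {"plugin": name, "block_score": int(parts[0])}
    return {"plugin": name, "action": parts[0], "args": parts[1:]}


print(parse_plugin_setting("ip_blacklist", "count,zen.spamhaus.org,bl.spamcop.net"))
print(parse_plugin_setting("date_header", "count,3"))
print(parse_plugin_setting("hybrid_grey_list", "4"))
```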
Once you are happy with your config, click "Save and Restart". After a few seconds the browser will return to the home page and the status should be "Online".
If this is not the case, consult the system mail log (usually /var/log/mail.log).
All of the Spam filtering functionality is provided by a number of plugins, each of which checks a certain aspect of the SMTP process for errors or indicators of Spam. When configuring the Spam Proxy, each plugin takes different arguments; these are documented below.
For each plugin that takes an $action argument, the value can be count, delay, block or close, depending on what behaviour you would like to occur if that check fails.
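To make the role of the action value concrete, here is a minimal Python sketch (the proxy itself is written in Perl, so this is not its code). It assumes that every failed check configured with count adds one point to a message's error score, and that the hybrid_grey_list block score then decides between blocking and grey listing. Both points are assumptions inferred from the example configuration, and the function names are hypothetical. The delay, block and close actions are validated but otherwise left out of the sketch, and how the real plugin treats a score of zero is not specified here.

```python
VALID_ACTIONS = ("count", "delay", "block", "close")


def error_score(plugin_actions, failed_plugins):
    """Count failed checks whose configured action is 'count'.

    Assumption: each such failure adds exactly one point.
    """
    for name, action in plugin_actions.items():
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action for {name}: {action}")
    return sum(1 for name in failed_plugins if plugin_actions.get(name) == "count")


def hybrid_grey_list(score, block_score):
    """Block at or above block_score; otherwise grey list (assumed)."""
    return "block" if score >= block_score else "grey_list"


# Hypothetical message that failed two of the configured checks.
actions = {"msg_id": "count", "ptr_check": "count", "helo_ip": "count"}
score = error_score(actions, failed_plugins=["msg_id", "helo_ip"])
print(score, hybrid_grey_list(score, block_score=4))
```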
| Plugin | Options | Info |
|---|---|---|
| msg_id | $action | Checks the E-Mail contains a Message-ID header. E.g. msg_id=count |
| ptr_check | $action | Checks for a valid PTR record. E.g. ptr_check=block |
| ptr_ip | $action | Finds PTR records that contain IP addresses. E.g. ptr_ip=count |
| ptr_dynamic | $action | Finds PTR records that appear dynamic. E.g. ptr_dynamic=count |
| ptr_forward | $action | Checks PTR record has matching forward records. E.g. ptr_forward=count |
| ip_blacklist | $action,$servers | Checks IP against multiple blacklist servers. Takes a comma-delimited list of blacklist servers to check against. E.g. ip_blacklist=delay,blacklist.com |
| url_blacklist | $action,$ccFile | Extracts URL domains from E-Mail and checks them against SURBL. Takes the path to a file containing country domain codes (provided in the plugins directory). E.g. url_blacklist=count,/opt/spamproxy/plugins/cc.txt |
| helo_ip | $action | Checks for IP in HELO string. E.g. helo_ip=count |
| helo_dns | $action | Verifies DNS of HELO string. E.g. helo_dns=count |
| mail_dns | $action | Verifies DNS of MAIL domain. E.g. mail_dns=block |
| mail_from_check | $action | Checks MAIL and From header match and are valid. E.g. mail_from_check=count |
| spf_check | $action | Checks MAIL domain for valid Sender Policy Framework (SPF) record. E.g. spf_check=block |
| inline_img | $action | Finds embedded images in E-Mail message. E.g. inline_img=count |
| spam_assassin | $action | Passes message through SpamAssassin filter. E.g. spam_assassin=block |
| date_header | $action,$limit | Checks for valid Date header. Takes $limit value of maximum difference in hours. E.g. date_header=count,3 |
| hybrid_grey_list | $blockScore | Performs grey listing on messages based on error score. Will block messages if they score $blockScore or above. E.g. hybrid_grey_list=4 |
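As an illustration of the kind of test two of these plugins perform, the Python sketch below builds the DNS name conventionally queried for an IPv4 address in a DNS blacklist (reversed octets followed by the blacklist zone) and checks whether a Date header lies within a given number of hours of the current time. The plugins themselves are written in Perl and their internals are not shown on this page, so the function names and details here are assumptions.

```python
import ipaddress
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def dnsbl_query_name(ip, zone):
    """Build the name to look up for an IPv4 address in a DNS blacklist."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def date_within_limit(date_header, limit_hours, now=None):
    """Return True if the Date header is within limit_hours of now."""
    now = now or datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # missing or unparseable Date header
    if sent.tzinfo is None:
        # Assumption: treat a date without a usable zone as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return abs(now - sent) <= timedelta(hours=limit_hours)


print(dnsbl_query_name("127.0.0.2", "zen.spamhaus.org"))
```

A blacklist normally answers such a query with an address record when the IP is listed and with a "no such name" response when it is not. The lookup itself is omitted so the sketch runs without network access.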
Use the link below to download a copy of the Spam proxy, data sets and documentation: