Blocking Spam By Separating End-User Machines from Legitimate Mail Server Machines

Various studies have shown that spam sent from botnets has accounted for more than 80% of all spam on the Internet in recent years. For example, the MessageLabs Intelligence 2010 annual security report showed that approximately 88.2% of all spam in 2010 was sent from botnets. Spamming botnets present a significant challenge in the control of spam because of the sheer number and wide distribution of their members. These two characteristics render many anti-spam techniques less effective, such as reactive DNS-based blacklists (DNSBLs) and reputation-based spam filtering schemes. More importantly, the majority of existing anti-spam schemes perpetuate the arms race between spammers and the anti-spam community: they encourage spammers to recruit large numbers of spamming bots and to explore novel uses of these bots.

Rather than perpetuating the arms race between spammers and the anti-spam community in the war on botnet-based spamming, in this project we develop a novel scheme that targets the root cause of the problem to discourage (and ideally, to prohibit) spammers from using botnets to send spam messages. A key observation that motivates this approach is that the majority of spamming bots are end-user (EU) machines rather than legitimate mail server (LMS) machines, given that legitimate mail server machines are normally well protected and less likely to be compromised. A legitimate email message is normally composed on an EU machine and then delivered to the local mail server of the sender's network domain, from where it is further delivered to the recipient mail server. In contrast, spam messages originating from spamming botnets are normally delivered directly from the EU machines (on which customized mail server software is installed) to the recipient mail servers. Figures 1 and 2 show the difference in the message delivery path of normal emails vs. spam messages. By blocking messages delivered directly from remote EU machines, we can effectively prohibit spammers from using compromised machines to send spam messages (see Figure 1).

Figure 1. Blocking messages from remote end-user machines

In this project we aim to develop a lightweight yet effective scheme to distinguish EU machines from LMS machines so that messages delivered from remote EU machines can be blocked. Many features can be used to determine whether a sending machine is an EU machine or an LMS machine. However, in this project we focus on features of a sending machine that cannot be easily manipulated by a spammer and that are already available at, or can easily be obtained by, a recipient mail server. In particular, we consider two types of features associated with a sending machine: its operating system (OS) and the lexical structure of its hostname. Based on these OS and hostname lexical features, we develop a lightweight Support Vector Machine (SVM) based classifier to separate EU machines from LMS machines.
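As an illustration only, hostname lexical features of the kind mentioned above might be computed as in the following Python sketch. The keyword lists, the feature names, and the function `hostname_features` are assumptions made for this example; they are not the feature set used in the project, and the OS features are not covered here.

```python
import re

# Hypothetical keyword lists chosen for illustration; the project's
# actual lexical features are not specified in this summary.
EU_KEYWORDS = ("dsl", "dhcp", "dyn", "dial", "cable", "pool", "ppp")
LMS_KEYWORDS = ("mail", "smtp", "mx", "relay", "mta")

# Four numeric groups joined by '-' or '.', e.g. 192-0-2-17.
EMBEDDED_IP = re.compile(r"(?<!\d)\d{1,3}(?:[-.]\d{1,3}){3}(?!\d)")


def hostname_features(hostname: str) -> dict:
    """Extract simple lexical features from a sending machine's hostname."""
    name = hostname.lower().rstrip(".")
    labels = name.split(".") if name else []
    first_label = labels[0] if labels else ""
    digit_count = sum(ch.isdigit() for ch in name)
    return {
        "num_labels": len(labels),
        "digit_ratio": digit_count / len(name) if name else 0.0,
        # Hostnames of end-user machines often embed the IP address.
        "embeds_ip": bool(EMBEDDED_IP.search(name)),
        "has_eu_keyword": any(k in name for k in EU_KEYWORDS),
        # Mail servers tend to be named in the leftmost label.
        "has_lms_keyword": any(k in first_label for k in LMS_KEYWORDS),
    }
```

For a name such as `dsl-192-0-2-17.dyn.example.net` this sketch flags an embedded IP address and an end-user keyword, whereas `mail.example.org` only matches a mail-server keyword. A feature vector of this kind, together with OS features, is the sort of input an SVM could be trained on. Simple substring matching like this will misfire on some names, so it should be read as a starting point rather than a working filter.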

We evaluate the effectiveness of the classifier using real-world data sets, and compare its performance with that of eight commonly used DNSBL systems. The evaluation shows that our SVM-based classifier has a very high detection accuracy (the percentage of machines that are classified correctly) with very low false positive and false negative rates. For example, on an aggregated data set containing both EU machines and LMS machines, the SVM-based classifier achieves, on average, a detection accuracy of 99.25%, with a false positive rate of 0.35% and a false negative rate of 1.27%, significantly outperforming all eight DNSBLs considered in the study.
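For readers unfamiliar with these metrics, the following Python sketch shows how detection accuracy, false positive rate, and false negative rate are conventionally derived from classification counts. It assumes that the positive class is "EU machine", so that a false positive is an LMS machine classified as an EU machine; this summary does not state which convention the study uses. The function name `classification_rates` is hypothetical.

```python
def classification_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard rates from confusion-matrix counts.

    Assumed convention (not confirmed by the source):
      tp: EU machines classified as EU
      fp: LMS machines classified as EU
      tn: LMS machines classified as LMS
      fn: EU machines classified as LMS
    """
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("both classes must contain at least one machine")
    total = tp + fp + tn + fn
    return {
        # Fraction of all machines that are classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of LMS machines wrongly classified as EU.
        "false_positive_rate": fp / (fp + tn),
        # Fraction of EU machines wrongly classified as LMS.
        "false_negative_rate": fn / (fn + tp),
    }
```

Calling `classification_rates(tp=90, fp=2, tn=98, fn=10)` with these made-up counts (they are not data from the study) gives an accuracy of 188/200, a false positive rate of 2/100, and a false negative rate of 10/100.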

The following table shows the performance comparison between the SVM-based classifier and eight DNSBLs in identifying spamming machines.

Performance comparison