COP4342 - Fall 2008

Assignment #3: Email message filter

Objectives:

Instructions: In this assignment, you are asked to write a simple email filter in Perl.

Simple email messages have a straightforward format: they start with a set of headers, then a blank line, and then a body. Headers generally follow the form /^[-a-zA-Z0-9_]+: .*$/, followed by zero or more continuation lines that start with whitespace (usually a tab), i.e., /^[ \t].*$/.
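One possible way to apply the header rules above is sketched below. It assumes "\n" line endings and treats the first empty line as the end of the header block. The helper name split_message is hypothetical: it returns a reference to a list of headers, each with its continuation lines unfolded (the newline dropped, the leading whitespace kept), plus the body as a single string.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw message into (\@headers, $body).
# Each element of @headers is one header with its continuation
# lines unfolded (newline removed, leading whitespace kept).
# Assumes "\n" line endings and that the first empty line ends
# the header block.
sub split_message {
    my ($text) = @_;
    my ($head, $body) = split /\n\n/, $text, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;

    my @headers;
    for my $line (split /\n/, $head) {
        if ($line =~ /^[ \t]/ && @headers) {
            # Continuation line: glue it onto the previous header.
            $headers[-1] .= $line;
        }
        elsif ($line =~ /^[-a-zA-Z0-9_]+: .*$/) {
            # Start of a new header.
            push @headers, $line;
        }
        # Lines matching neither pattern are skipped in this sketch.
    }
    return (\@headers, $body);
}
```

Because split is given a limit of 2, everything after the first blank line, including any later blank lines, stays in the body unchanged.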

For instance, here's an email message:

Received: from mail.cs.fsu.edu (mail.cs.fsu.edu [128.186.120.4])
	by newmail.cs.fsu.edu (Postfix) with ESMTP id 06476175D4C
Received: by mail.cs.fsu.edu (Postfix)
	id 95D01F2DC4; Sat,  7 Jun 2008 03:54:40 -0400 (EDT)
Delivered-To: langley
Message-ID: <484A3E21.4090704@fsu.edu>
Date: Sat, 07 Jun 2008 03:52:01 -0400
From: Tom Kitterman
To: nolenet,
	OTC Help Desk Staff
Subject: [Nolenet] Mailman listserv website down
X-fsucs-MailScanner-SpamCheck: not spam, SpamAssassin (cached, score=-2.599,
	required 5, autolearn=not spam, BAYES_00 -2.60)
X-Spam-Status: No

Hi,
There's something wrong with the mailman listserv website on lists.fsu.edu.
This happened  when we moved it to the new hardware.  It's almost
4AM and I've run out of ideas on how to fix it at the moment so I'm 
going home
to get some sleep and try again tomorrow.

So for now that website is non-functional.  The mailman list software is 
processing
messages so this should mostly affect list owners.  Until we get it 
fixed list owners should
open a ticket through the help desk in the normal manner for any 
critical issues.

Sorry for the inconvenience.

Tom K.
_______________________________________________
https://lists.fsu.edu/mailman/listinfo/nolenet

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Your task is to write a filter that reads one email message from standard input. Please do not use the diamond operator (<>): it also reads from any files named on the command line, and that is not what this filter should do.
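A minimal sketch of reading the whole message from standard input without the diamond operator, assuming the message fits comfortably in memory. The helper name read_message is hypothetical; it accepts an optional filehandle (defaulting to STDIN) so it can also be tried on an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp one whole message from a filehandle.
# Defaults to STDIN; unlike <>, this never looks at @ARGV.
sub read_message {
    my ($fh) = @_;
    $fh = \*STDIN unless defined $fh;
    local $/;                 # undefined record separator = slurp mode
    my $text = <$fh>;
    return defined $text ? $text : '';
}

# Example use in the filter:
#   my $message = read_message();
```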

The filter then checks whether the message has a Subject: header and, if so, whether that header contains either the marker [SPAM] or the marker {SPAM}. If any Subject: header in the message contains either marker, no further processing happens (i.e., the filter "drops the message").
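The check above could look roughly like the following sketch, assuming the headers have already been separated from the body and unfolded into one string per header. The helper name is_spam is hypothetical, and matching the markers case-sensitively is an assumption, since the handout does not say either way.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any Subject: header contains the
# marker [SPAM] or {SPAM}. Expects a reference to a list of header
# strings that have already been unfolded (one string per header).
sub is_spam {
    my ($headers) = @_;
    for my $h (@$headers) {
        # Header field names are case-insensitive, hence /i here.
        next unless $h =~ /^Subject:/i;
        # The markers themselves are matched case-sensitively
        # (an assumption; the handout does not specify).
        return 1 if $h =~ /\[SPAM\]|\{SPAM\}/;
    }
    return 0;
}
```

Checking every header, rather than stopping at the first Subject:, covers messages that carry more than one Subject: header. When is_spam returns true, the filter would simply stop without creating any output file.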

If the message is not spam, your program should create a file in /tmp with a filename of the form filter-HOSTNAME-TIMESTAMP, where HOSTNAME is the output of running the program hostname and TIMESTAMP is the time the filter runs, in the format YYYYMMDDHHMMSS. The file should contain only the body of the message (i.e., the message body with all headers removed; see the examples below).
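A minimal sketch of building the output path and writing the body, using POSIX::strftime for the YYYYMMDDHHMMSS stamp and assuming local time (the handout does not name a time zone). The helpers output_path and write_body are hypothetical; output_path takes the time fields in the order localtime returns them, so it can be checked against a fixed date.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build /tmp/filter-HOSTNAME-TIMESTAMP from a
# host name and time fields in localtime() order
# (sec, min, hour, mday, mon, year, ...).
sub output_path {
    my ($host, @tm) = @_;
    my $stamp = strftime('%Y%m%d%H%M%S', @tm);
    return "/tmp/filter-$host-$stamp";
}

# Hypothetical helper: write the message body to the given path.
sub write_body {
    my ($path, $body) = @_;
    open(my $out, '>', $path) or die "cannot open $path: $!\n";
    print {$out} $body;
    close($out) or die "cannot close $path: $!\n";
}

# Example use in the filter (not run here):
#   chomp(my $host = `hostname`);
#   write_body(output_path($host, localtime), $body);
```

The chomp matters: the output of hostname ends with a newline, which would otherwise end up inside the filename.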

Thus, for the example message above, if the filter were run on sophie.cs.fsu.edu at 11:49:06 on October 16, 2008, the output file would be named /tmp/filter-sophie.cs.fsu.edu-20081016114906 and would contain:

Hi,
There's something wrong with the mailman listserv website on lists.fsu.edu.
This happened  when we moved it to the new hardware.  It's almost
4AM and I've run out of ideas on how to fix it at the moment so I'm 
going home
to get some sleep and try again tomorrow.

So for now that website is non-functional.  The mailman list software is 
processing
messages so this should mostly affect list owners.  Until we get it 
fixed list owners should
open a ticket through the help desk in the normal manner for any 
critical issues.

Sorry for the inconvenience.

Tom K.
_______________________________________________
https://lists.fsu.edu/mailman/listinfo/nolenet

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Your Perl program should be named filter-20081016.pl.

Here are a number of example input and output files that you can use for testing:

Sample Input File    Corresponding Sample Output File
1                    1
2                    NO OUTPUT
3                    3

Homework submission

Please email your script filter-20081016.pl as an attachment to langley@cs.fsu.edu no later than the beginning of class on Tuesday, October 21.