bmf

bmf (Bayesian Mail Filter) 0.9.4 fork + patches
git clone git://git.codemadness.org/bmf

		bmf -- Bayesian Mail Filter

About bmf
=========

This is a mail filter which uses the Bayesian technique described in Paul
Graham's article "A Plan for Spam".  It aims to be faster, smaller, and more
versatile than similar applications.  It is written in ANSI C and uses POSIX
functions, so in theory it runs on any POSIX system.  Support for win32 is
undecided.
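
The scoring described in Graham's article can be sketched in C as follows.
This is a minimal sketch of the formulas as published in "A Plan for Spam",
not bmf's own code (its core comes from bogofilter and may differ in cutoffs
and details); the function names are hypothetical.  Each token gets a spam
probability from its training counts, and the probabilities of the most
interesting tokens are then combined into a single score.

```c
#include <stddef.h>

/* Per-token spam probability following Graham's "A Plan for Spam".
 * good/bad:   times the token appeared in non-spam/spam training mail.
 * ngood/nbad: number of non-spam/spam messages trained so far.
 * Rare tokens (and, as a guard added for this sketch, an empty corpus)
 * get Graham's neutral-ish default of 0.4. */
double graham_token_prob(unsigned good, unsigned bad,
                         unsigned ngood, unsigned nbad)
{
    double g = 2.0 * good;  /* doubled to bias against false positives */
    double b = bad;
    double pg, pb, p;

    if (g + b < 5.0 || ngood == 0 || nbad == 0)
        return 0.4;
    pg = g / ngood;
    if (pg > 1.0)
        pg = 1.0;
    pb = b / nbad;
    if (pb > 1.0)
        pb = 1.0;
    p = pb / (pg + pb);
    if (p < 0.01)           /* clamp so no single token is decisive */
        p = 0.01;
    if (p > 0.99)
        p = 0.99;
    return p;
}

/* Combine per-token probabilities:
 *   P = prod(p[i]) / (prod(p[i]) + prod(1 - p[i]))
 * With no tokens this yields 0.5. */
double graham_combine(const double *p, size_t n)
{
    double prod = 1.0, inv = 1.0;
    size_t i;

    for (i = 0; i < n; i++) {
        prod *= p[i];
        inv  *= 1.0 - p[i];
    }
    return prod / (prod + inv);
}
```

Graham combines the 15 tokens whose probabilities lie farthest from 0.5 and
treats a message as spam when the combined value exceeds 0.9.  For example,
two tokens at 0.99 combine to about 0.9999.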

This project provides features which are not available in other filters:

(1) Independence from external programs and libraries.  Tokens are stored in
memory using simple vectors which require no heavyweight external data
structure libraries.  Multiple token database formats are supported,
including flat files, libdb, and mysql.  Conversion between formats will
always be possible with the included import/export utility, and flat files
will always remain an option.
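
One way to keep tokens in "simple vectors" without any external library is a
plain array sorted by word and searched with the standard qsort() and
bsearch() functions.  The struct layout and function names below are
hypothetical; bmf's actual in-memory format may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token record: a word and its per-class counts. */
struct token {
    const char *word;
    unsigned    good;   /* occurrences in non-spam */
    unsigned    bad;    /* occurrences in spam */
};

/* Order tokens by word so the vector can be searched with bsearch(). */
static int token_cmp(const void *a, const void *b)
{
    const struct token *ta = a, *tb = b;
    return strcmp(ta->word, tb->word);
}

/* Sort once after loading, e.g. from a flat-file database. */
void token_vec_sort(struct token *vec, size_t n)
{
    qsort(vec, n, sizeof *vec, token_cmp);
}

/* O(log n) lookup; returns NULL if the word is not in the vector. */
struct token *token_vec_find(struct token *vec, size_t n, const char *word)
{
    struct token key;

    key.word = word;
    key.good = key.bad = 0;
    return bsearch(&key, vec, n, sizeof *vec, token_cmp);
}
```

A hash table, which the README mentions is being considered, would trade
this logarithmic search for roughly constant-time lookup.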

(2) Efficient processing.  Input data is parsed by a handcrafted parser
whose code is less than 3% of the size of the equivalent scanner generated
by flex.  No portion of the input is ever copied, and all I/O and memory
allocation is done in large chunks.  Updated token lists are merged and
written in one step.  Hashing is being considered for the next version to
improve lookup speed.
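
The "no portion of the input is ever copied" approach can be illustrated with
a scanner that returns each token as a pointer/length view into the input
buffer.  This is a minimal sketch, not bmf's parser; the token character set
(letters, digits, dashes, apostrophes, dollar signs) is the one Graham's
article describes and is assumed here for illustration, and the names are
hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* A token is a view into the input buffer: nothing is copied. */
struct span {
    const char *ptr;
    size_t      len;
};

/* Token constituents as listed in "A Plan for Spam" (assumed rule). */
static int is_tok_char(unsigned char c)
{
    return isalnum(c) || c == '-' || c == '\'' || c == '$';
}

/* Find the next token at or after *pos in buf[0..len).
 * Returns 1 and fills *out, or 0 at end of input (*out untouched). */
int next_token(const char *buf, size_t len, size_t *pos, struct span *out)
{
    size_t i = *pos, start;

    while (i < len && !is_tok_char((unsigned char)buf[i]))
        i++;                        /* skip separators */
    if (i == len) {
        *pos = i;
        return 0;
    }
    start = i;
    while (i < len && is_tok_char((unsigned char)buf[i]))
        i++;                        /* extend the token in place */
    out->ptr = buf + start;
    out->len = i - start;
    *pos = i;
    return 1;
}
```

Because each token is only a pointer and a length, it can be compared
against stored words with memcmp() without building a NUL-terminated copy.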

(3) Simple and elegant implementation.  No heavyweight, copy-intensive MIME
decoding routines are used.  Decoding of quoted-printable text for selected
MIME types is being considered for the next version.

Note: the core filter function is from esr's bogofilter v0.6 (available at
http://sourceforge.net/projects/bogofilter/) with bugfix updates.

For the most recent version of this software, see:

	http://sourceforge.net/projects/bmf/

How to integrate bmf
====================

The following procmail recipes invoke bmf for each incoming email and
place spam into $MAILDIR/spam.  The first sample runs bmf in its normal
mode and uses its exit status (0 means spam); the second runs bmf as a
filter (-p), which passes the message through with an added X-Spam-Status
header.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	:0HB
	* ? bmf
	| formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

	### begin sample two ###
	# Invoke bmf as a filter
	:0 fw
	| bmf -p

	# Filter spam
	:0:
	* ^X-Spam-Status: Yes
	$MAILDIR/spam
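
The first recipe depends only on bmf's exit status: procmail's "?" condition
is true only when the program exits with status 0.  A minimal POSIX sketch of
that convention in C follows; run_filter() and is_spam() are hypothetical
helper names, and the command is passed in so any program can stand in for
bmf.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with msg_fd as its standard input and return its exit
 * status, or -1 if it could not be started or did not exit normally.
 * A failed exec shows up as status 127, as in the shell. */
int run_filter(char *const argv[], int msg_fd)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* child: the message becomes stdin, then exec the filter */
        if (dup2(msg_fd, STDIN_FILENO) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Mirrors "* ? bmf": only exit status 0 counts as spam. */
int is_spam(char *const argv[], int msg_fd)
{
    return run_filter(argv, msg_fd) == 0;
}
```

To test a message stored in a file, open it with open(path, O_RDONLY) and
pass the descriptor together with an argument vector such as {"bmf", NULL}.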

The following maildrop equivalents were suggested by Christian Kurz.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	exception {
		`bmf`
		if ( $RETURNCODE == 0 )
		{
			to $MAILDIR/spam
		}
	}

	### begin sample two ###
	# Invoke bmf as a filter
	exception {
		xfilter "bmf -p"
		if (/^X-Spam-Status: Yes/)
		{
			to $MAILDIR/spam
		}
	}
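
The second sample in each set relies on the header that "bmf -p" adds rather
than on the exit status.  The "^X-Spam-Status: Yes" test can be sketched in C
as a scan of the header section only.  header_has() is a hypothetical name,
and the sketch assumes LF line endings and ignores folded header lines.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the header section of msg (everything before the first
 * empty line) contains a line starting with prefix, else 0. */
int header_has(const char *msg, const char *prefix)
{
    size_t plen = strlen(prefix);
    const char *line = msg;

    while (*line != '\0' && *line != '\n') {    /* empty line ends headers */
        const char *eol = strchr(line, '\n');

        if (strncmp(line, prefix, plen) == 0)
            return 1;
        if (eol == NULL)
            break;                              /* last line, no newline */
        line = eol + 1;
    }
    return 0;
}
```

Stopping at the first empty line keeps a quoted "X-Spam-Status: Yes" inside
a message body from being mistaken for the real header.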

If you put bmf in your procmail or maildrop scripts as suggested above, it
will always register an email as either spam or non-spam.  To correct a
wrong registration and train bmf, the following mutt macros may be useful:

  macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<enter-command>set wait_key\n<save-message>=spam\n"
  macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<enter-command>set wait_key\n"
  macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<enter-command>set wait_key\n<save-message>=inbox\n"

These macros bind the following keys in the index, overriding any existing
bindings:

  <Esc>d = de-register as non-spam, register as spam, and move to spam folder.
  <Esc>t = test for spamicity.
  <Esc>u = de-register as spam, register as non-spam, and move to inbox folder.
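
Re-registering a message with -S or -N amounts to moving its counts from one
class to the other.  The sketch below assumes a hypothetical layout of
per-word good/bad counts plus per-class message totals; bmf's real storage
may differ, and the function name is made up for illustration.  -N would be
the mirror image.

```c
#include <stddef.h>

/* Hypothetical per-word counts, as in a flat-file token database. */
struct count {
    unsigned good;  /* times seen in mail registered as non-spam */
    unsigned bad;   /* times seen in mail registered as spam */
};

/* Hypothetical totals: number of messages registered in each class. */
struct totals {
    unsigned ngood;
    unsigned nbad;
};

/* Move one message from the non-spam class to the spam class, i.e.
 * "de-register as non-spam, register as spam".  tok[i] points at the
 * counts for the i-th token of the message, listed exactly as they were
 * counted when the message was first registered. */
void reregister_as_spam(struct count *const tok[], size_t n, struct totals *t)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tok[i]->good > 0)   /* never underflow an unsigned count */
            tok[i]->good--;
        tok[i]->bad++;
    }
    if (t->ngood > 0)
        t->ngood--;
    t->nbad++;
}
```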

Alternatively, if you use Gnus, you could add the following lines to your
.gnus to accomplish a similar result:

(defun spam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -S")
  (gnus-summary-move-article 1 "nnml:Spam"))

(defun notspam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -N")
  (gnus-summary-move-article 1 "nnml:inbox"))

(add-hook
  'gnus-sum-load-hook
  (lambda nil
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-o") 'spam)
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-p") 'notspam)))

How to train bmf
================

First, keep in mind that bmf "learns" how to recognize spam from the input
that you give it.  It works best if you give it exactly the email that you
receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your current
    and/or saved emails.  It is fairly easy to create a small shell script
    that will pass all of your normal email to "bmf -n" and all of your spam
    to "bmf -s".  Note that if you do not use the mbox storage format, you
    MUST invoke bmf exactly once per email.  Using "cat * | bmf -n" will NOT
    work properly because bmf sees the entire input as one big email.

  - If you already use SpamAssassin, you can use it to train bmf for a
    couple of days or weeks.  If SpamAssassin tags a message as spam, run it
    through "bmf -s"; if not, run it through "bmf -n".  This can be
    automated with procmail or maildrop recipes.

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word counts.

Final words
===========

Thanks for trying bmf.  If you have any problems, comments, or suggestions,
please direct them to the bmf mailing list, bmf-user@lists.sourceforge.net.

							Tom Marshall
							20 Oct 2002