bmf

bmf (Bayesian Mail Filter) 0.9.4 fork + patches
git clone git://git.codemadness.org/bmf

		bmf -- Bayesian Mail Filter

About bmf
=========

This is a mail filter which uses the Bayes algorithm as explained in Paul
Graham's article "A Plan for Spam".  It aims to be faster, smaller, and more
versatile than similar applications.  It is implemented in ANSI C and uses
POSIX functions, so it should (in theory) run on all POSIX systems.
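
For orientation, the scoring scheme in "A Plan for Spam" can be summarized
as below.  This is a sketch of the article's formulas, not a transcription
of bmf's source: the symbol names g, b, n_good, and n_bad are chosen here
for illustration, and bmf's exact constants may differ.

```latex
% Per-token spam probability, where g = 2 x (occurrences of w in good mail),
% b = occurrences of w in spam, and n_good, n_bad are the message counts
% of the two corpora.
p(w) = \max\!\left(0.01,\ \min\!\left(0.99,\
       \frac{\min(1,\, b/n_{\mathrm{bad}})}
            {\min(1,\, g/n_{\mathrm{good}}) + \min(1,\, b/n_{\mathrm{bad}})}
       \right)\right)

% Combined probability over the n most "interesting" tokens,
% i.e. those whose p(w_i) lies farthest from 0.5.
P = \frac{\prod_{i=1}^{n} p(w_i)}
         {\prod_{i=1}^{n} p(w_i) + \prod_{i=1}^{n} \bigl(1 - p(w_i)\bigr)}
```

In the article, n = 15 and a message is treated as spam when P > 0.9.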

This project provides features which are not available in other filters:

(1) Independence from external programs and libraries.  Tokens are stored in
memory using simple vectors which require no heavyweight external data
structure libraries.  On disk, the tokens are stored in plain-text "flat"
files.

(2) Efficient processing.  Input data is parsed by a handcrafted parser
which weighs in at under 3% of the size of the equivalent code generated by
flex.  No portion of the input is ever copied, and all I/O and memory
allocation are done in large chunks.  Updated token lists are merged and
written in one step.  Hashing is being considered for the next version to
improve lookup speed.

(3) Simple and elegant implementation.  No heavyweight, copy-intensive MIME
decoding routines are used.  Decoding of quoted-printable text for selected
MIME types is being considered for the next version.

Note: the core filter function is from esr's bogofilter v0.6 (available at
http://sourceforge.net/projects/bogofilter/) with bugfix updates.

For the most recent version of this software, see:

	http://sourceforge.net/projects/bmf/

How to integrate bmf
====================

The following procmail recipes will invoke bmf for each incoming email and
place spam into $MAILDIR/spam.  The first sample invokes bmf in its normal
mode of operation and the second invokes bmf as a filter.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	:0HB
	* ? bmf
	| formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

	### begin sample two ###
	# Invoke bmf as a filter
	:0 fw
	| bmf -p

	# Filter spam
	:0:
	* ^X-Spam-Status: Yes
	$MAILDIR/spam

The following maildrop equivalents are suggested by Christian Kurz.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	exception {
		`bmf`
		if ( $RETURNCODE == 0 )
			to $MAILDIR/spam
	}

	### begin sample two ###
	# Invoke bmf as a filter
	exception {
		xfilter "bmf -p"
		if (/^X-Spam-Status: Yes/)
			to $MAILDIR/spam
	}
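
Both styles above come down to two checks that are easy to try by hand from
a shell.  The sketch below assumes, as the recipes do, that bmf exits with
status 0 when it judges a message to be spam, and that "bmf -p" writes the
message back out with an X-Spam-Status header added.  The BMF variable and
the function names are illustrative and not part of bmf.  Keep in mind that
bmf in these modes also registers the message, so repeated runs on the same
message affect training.

```sh
#!/bin/sh
# Command used to run bmf; override BMF to point at another binary.
BMF=${BMF:-bmf}

# is_spam_by_status FILE
# Normal mode: bmf reads the message on stdin and reports via its exit status.
# Assumption (taken from the recipes above): exit status 0 means spam.
is_spam_by_status() {
    "$BMF" < "$1" > /dev/null
}

# is_spam_by_header FILE
# Filter mode: look for the X-Spam-Status header in the output of "bmf -p".
# awk stops at the first blank line, so text in the body is never matched.
is_spam_by_header() {
    "$BMF" -p < "$1" |
        awk '/^$/ { exit }
             /^X-Spam-Status: Yes/ { found = 1 }
             END { exit !found }'
}
```

For example, "is_spam_by_status message.eml && echo spam" prints "spam" only
when bmf exits with status 0 for the (hypothetical) file message.eml.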

If you put bmf in your procmail or maildrop scripts as suggested above, it
will always register an email as either spam or non-spam.  To reverse this
registration and train bmf, the following mutt macros may be useful:

  macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<enter-command>set wait_key\n<save-message>=spam\n"
  macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<enter-command>set wait_key\n"
  macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<enter-command>set wait_key\n<save-message>=inbox\n"

These macros override any existing bindings for the following keys:

  <Esc>d = de-register as non-spam, register as spam, and move to spam folder.
  <Esc>t = test for spamicity.
  <Esc>u = de-register as spam, register as non-spam, and move to inbox folder.

Alternatively, if you use gnus you could add the following lines to your
.gnus to accomplish a similar result:

(defun spam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -S")
  (gnus-summary-move-article 1 "nnml:Spam"))

(defun notspam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -N")
  (gnus-summary-move-article 1 "nnml:inbox"))

(add-hook
  'gnus-sum-load-hook
  (lambda nil
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-o") 'spam)
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-p") 'notspam)))

How to train bmf
================

First, please keep in mind that bmf "learns" how to recognize spam from the
input that you give it.  It works best if you give it exactly the email that
you receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your current
    and/or saved emails.  It is fairly easy to create a small shell script
    that will pass all of your normal email to "bmf -n" and all of your spam
    to "bmf -s".  Note that if you do not use the mbox storage format, you
    MUST invoke bmf exactly once per email.  Using "cat * | bmf -n" will NOT
    work properly because bmf sees the entire input as one big email.

  - If you already use spamassassin, you can use it to train bmf for a
    couple of days or weeks.  If spamassassin tags a message as spam, run
    that message through "bmf -s".  If not, run it through "bmf -n".  This
    can be automated with procmail or maildrop recipes.

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word counts.

Final words
===========

Thanks for trying bmf.  If you have any problems, comments, or suggestions,
please direct them to the bmf mailing list, bmf-user@lists.sourceforge.net.

							Tom Marshall
							20 Oct 2002