bmf

bmf (Bayesian Mail Filter) 0.9.4 fork + patches
git clone git://git.codemadness.org/bmf


		bmf -- Bayesian Mail Filter

About bmf
=========

This is a mail filter which uses the Bayes algorithm as explained in Paul
Graham's article "A Plan for Spam".  It aims to be faster, smaller, and more
versatile than similar applications.  Implementation is ANSI C and uses POSIX
functions.  Supported platforms are (in theory) all POSIX systems.  Support
for win32 is undecided.
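
For orientation, the scoring scheme described in "A Plan for Spam" can be
summarized as below.  This restates the article, not the bmf source, so the
constants that bmf actually uses may differ.

```latex
% g(w), b(w): occurrences of token w in the non-spam and spam corpora
% n_good, n_bad: number of messages in each corpus
p(w) = \max\!\left(0.01,\ \min\!\left(0.99,\
  \frac{\min(1,\ b(w)/n_{bad})}
       {\min(1,\ 2g(w)/n_{good}) + \min(1,\ b(w)/n_{bad})}\right)\right)

% Combined over the 15 tokens whose p(w) lies furthest from 0.5
P(\mathrm{spam}) = \frac{\prod_{i=1}^{15} p_i}
                        {\prod_{i=1}^{15} p_i + \prod_{i=1}^{15} (1 - p_i)}
```

As an example, a token seen 10 times in 100 spam messages and once in 100
non-spam messages gets p(w) = 0.1 / (0.02 + 0.1), or about 0.83.  In the
article, tokens never seen before are assigned 0.4 and a message is treated
as spam when the combined value exceeds 0.9.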

This project provides features which are not available in other filters:

(1) Independence from external programs and libraries.  Tokens are stored in
memory using simple vectors which require no heavyweight external data
structure libraries.  The tokens are stored in plain-text "flat" files.

(2) Efficient processing.  Input data is parsed by a handcrafted parser
which weighs in under 3% of the equivalent code generated by flex.  No
portion of the input is ever copied and all i/o and memory allocation are
done in large chunks.  Updated token lists are merged and written in one
step.  Hashing is being considered for the next version to improve lookup
speed.

(3) Simple and elegant implementation.  No heavyweight, copy-intensive mime
decoding routines are used.  Decoding of quoted-printable text for selected
mime types is being considered for the next version.

Note: the core filter function is from esr's bogofilter v0.6 (available at
http://sourceforge.net/projects/bogofilter/) with bugfix updates.

For the most recent version of this software, see:

	http://sourceforge.net/projects/bmf/

How to integrate bmf
====================

The following procmail recipes will invoke bmf for each incoming email and
place spam into $MAILDIR/spam.  The first sample invokes bmf in its normal
mode of operation and the second invokes bmf as a filter.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	:0HB
	* ? bmf
	| formail -A"X-Spam-Status: Yes, tests=bmf" >>$MAILDIR/spam

	### begin sample two ###
	# Invoke bmf as a filter
	:0 fw
	| bmf -p

	# Filter spam
	:0:
	* ^X-Spam-Status: Yes
	$MAILDIR/spam

The following maildrop equivalents are suggested by Christian Kurz.

	### begin sample one ###
	# Invoke bmf and use return code to filter spam in one step
	exception {
		`bmf`
		if ( $RETURNCODE == 0 )
			to $MAILDIR/spam
	}

	### begin sample two ###
	# Invoke bmf as a filter
	exception {
		xfilter "bmf -p"
		if (/^X-Spam-Status: Yes/)
			to $MAILDIR/spam
	}
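
Both "sample one" recipes rely on the exit status of bmf, where 0 means the
message was judged to be spam.  The same check can be made by hand from a
shell, as in the sketch below.  It assumes a POSIX sh.  The function name
bmf_classify and the BMF variable are hypothetical helpers and not part of
bmf.  A nonzero status is reported as "not spam" here, although it could
also mean that bmf failed.

```shell
#!/bin/sh
# BMF is the command to run; override it to point at a specific binary.
# (Hypothetical variable, introduced only for this sketch.)
BMF=${BMF:-bmf}

# bmf_classify FILE
# Feed one message to bmf on stdin and print "spam" or "not spam",
# based on the exit status (0 = spam, as in the recipes above).
bmf_classify() {
	if $BMF < "$1"; then
		echo "spam"
	else
		echo "not spam"
	fi
}

# Usage (hypothetical file name):
#   bmf_classify "$HOME/Mail/inbox/cur/some-message"
```

Keep in mind that in its normal mode bmf also registers the message, so
checking a message this way updates the token lists as a side effect.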

If you put bmf in your procmail or maildrop scripts as suggested above, it
will always register an email as either spam or non-spam.  To reverse this
registration and train bmf, the following mutt macros may be useful:

  macro index \ed "<enter-command>unset wait_key\n<pipe-entry>bmf -S\n<enter-command>set wait_key\n<save-message>=spam\n"
  macro index \et "<enter-command>unset wait_key\n<pipe-entry>bmf -t\n<enter-command>set wait_key\n"
  macro index \eu "<enter-command>unset wait_key\n<pipe-entry>bmf -N\n<enter-command>set wait_key\n<save-message>=inbox\n"

These macros bind the following keys, overriding any existing bindings:

  <Esc>d = de-register as non-spam, register as spam, and move to spam folder.
  <Esc>t = test for spamicity.
  <Esc>u = de-register as spam, register as non-spam, and move to inbox folder.

Alternatively, if you use gnus you could add the following lines to your
.gnus to accomplish a similar result:

(defun spam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -S")
  (gnus-summary-move-article 1 "nnml:Spam"))

(defun notspam ()
  (interactive)
  (pipe-message "/usr/local/bin/bmf -N")
  (gnus-summary-move-article 1 "nnml:inbox"))

(add-hook
  'gnus-sum-load-hook
  (lambda nil
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-o") 'spam)
    (define-key gnus-summary-mode-map (read-kbd-macro "C-c C-p") 'notspam)))

How to train bmf
================

First, please keep in mind that bmf "learns" how to recognize spam from the
input that you give it.  It works best if you give it exactly the email that
you receive, or have received in the recent past.

Here are some good techniques for training bmf:

  - If you keep a history of email that you have received, use your current
    and/or saved emails.  It is fairly easy to create a small shell script
    that will pass all of your normal email to "bmf -n" and all of your spam
    to "bmf -s".  Note that if you do not use the mbox storage format, you
    MUST invoke bmf exactly once per email.  Using "cat * | bmf -n" will NOT
    work properly because bmf sees the entire input as one big email.

  - If you already use spamassassin, you can use it to train bmf for a
    couple of days or weeks.  If spamassassin tags it as spam, run it
    through "bmf -s".  If not, run it through "bmf -n".  This can be
    automated with procmail or maildrop recipes.

Here are some things that you should NOT do:

  - Get impatient with the training process and repeatedly pass one email
    through "bmf -s".

  - Manually move words around between lists and/or adjust the word counts.

Final words
===========

Thanks for trying bmf.  If you have any problems, comments, or suggestions,
please direct them to the bmf mailing list, bmf-user@lists.sourceforge.net.

							Tom Marshall
							20 Oct 2002