SCANMAIL(8)                 System Manager's Manual                SCANMAIL(8)



NAME
       scanmail, testscan -  spam filters

SYNOPSIS
       upas/scanmail  [  options  ] [ qer-args ] root mail sender system rcpt-
       list

       upas/testscan [ -avd ] [ -p patfile ] [ filename ]

DESCRIPTION
       Scanmail accepts a mail message supplied on standard input,  applies  a
       file  of  patterns to a portion of it, and dispatches the message based
       on the results.  It exactly replaces the generic queuing command qer(8)
       that is executed from the rc(1) script /mail/lib/qmail in the mail pro‐
       cessing pipeline.  Associated with each pattern is an action  in  order
       of decreasing priority:

       dump      the  message  is  deleted  and  a  log  entry  is  written to
                 /sys/log/smtpd

       hold      the message is placed in a queue for human inspection

       log       a line containing the matching  portion  of  the  message  is
                 written to a log

       If no pattern matches or only patterns with an action of log match, the
       message is accepted and  scanmail  queues  the  message  for  delivery.
       Scanmail  meshes  with  the  blocking facilities of smtpd(6) to provide
       several layers of filtering on  gateway  systems.   In  all  cases  the
       sender  is  notified  that the message has been successfully delivered,
       leaving the sender unaware that the message has  been  potentially  de‐
       layed or deleted.

       Scanmail accepts the arguments of qer(8) as well as the following:

       -c     Save  a  copy of each message in a randomly-named file in direc‐
              tory /mail/copy.

       -d     Write debugging information to standard error.

       -h     Queue held messages by sending domain name.  The -q option  must
              specify  a root directory; messages are queued in subdirectories
              of this directory.  If the -h option is not specified,  messages
              are  accumulated in a subdirectory of /mail/queue.hold named for
              the contents of /dev/user, usually none.

       -n     Messages are never held for inspection, but are delivered.  Also
              known as vacation mode.

       -p filename
              Read the patterns from filename rather than /mail/lib/patterns.

       -q holdroot
              Queue  deliverable messages in subdirectories of holdroot.  This
              option is the same as the  -q  option  of  qer(8)  and  must  be
              present if the -h option is given.

       -s     Save  deleted messages.   Messages are stored, one per randomly-
              named file, in subdirectories of /mail/queue.dump named with the
              date.

       -t     Test  mode.   The  pattern matcher is applied but the message is
              discarded and the result is not logged.

       -v     Print the highest priority match.  This is useful  with  the  -t
              option  for testing the pattern matcher without actually sending
              a message.

       Testscan is the command line version of scanmail.  If filename is miss‐
       ing,  it applies the pattern set to the message on standard input.  Un‐
       like scanmail, which finds the highest priority match, testscan  prints
       all matches in the portion of the message under test.  It is useful for
       testing a pattern set or  implementing  a  personal  filter  using  the
       pipeto file in a user's mail directory.  Testscan accepts the following
       options:

       -a     Print matches in the complete input message

       -d     Enable debug mode

       -v     Print the message after conversion to canonical form (q.v.).

       -p filename
              Read the patterns from filename rather than /mail/lib/patterns.

   Canonicalization
       Before pattern matching, both programs convert a portion of the message
       header  and  the  beginning  of  the  message to a canonical form.  The
       amounts of the header and message body processed are set by compile-time
       parameters  in the source files.  The canonicalization process converts
       letters to lower-case and replaces consecutive spaces, tabs and newline
       characters  with  a single space.  HTML commands are deleted except for
       the parameters following A HREF, IMG SRC, and  IMG  BORDER  directives.
       Additionally, the following MIME escape sequences are replaced by their
       ASCII equivalents:

                  Escape Seq   ASCII
                  ----------   -----
                       =2e       .
                       =2f       /
                       =20    <space>
                       =3d       =
       and the sequence =<newline> is elided.  Scanmail assembles the  sender,
       destination  domain  and  recipient  fields  of the command line into a
       string that is subjected to the same canonical  processing.   Following
       canonicalization,  the command line and the two long strings containing
       the header and the message body are passed to the matching  engine  for
       analysis.
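
       The text transformations above can be approximated in a few lines.
       The following Python sketch is illustrative only and is not taken from
       the scanmail source.  The function name canonicalize, the order of the
       steps, and the acceptance of upper-case hex digits are assumptions.
       HTML deletion is omitted.

```python
import re

# MIME escape sequences from the table above and their ASCII equivalents.
MIME_ESCAPES = {"=2e": ".", "=2f": "/", "=20": " ", "=3d": "="}


def canonicalize(text: str) -> str:
    """Approximate the canonical form described above (HTML handling omitted)."""
    # Convert letters to lower case. Assumption: doing this first also
    # lets upper-case escapes such as =2E be recognized.
    text = text.lower()
    # Elide the sequence '=' immediately followed by a newline.
    text = text.replace("=\n", "")
    # Replace the listed escapes in a single pass, so that the '='
    # produced by =3d is never re-read as the start of another escape.
    text = re.sub(r"=(?:2e|2f|20|3d)", lambda m: MIME_ESCAPES[m.group(0)], text)
    # Replace consecutive spaces, tabs and newlines with a single space.
    return re.sub(r"[ \t\n]+", " ", text)
```

       Applied to the hypothetical input "Buy=20NOW" followed by a newline
       and "www=2Eexample=2Ecom", the sketch yields a single lower-case line
       with the escapes expanded and the newline replaced by a space.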

   Pattern Syntax
       The  matching  engine  compiles  the pattern set and matches it to each
       canonicalized input string.  Patterns are specified  one  per  line  as
       follows:

            {*}action: pattern-spec {~~override...~~override}

       On  all lines, a # introduces a comment; there is no way to escape this
       character.

       Lines beginning with * contain a pattern-spec that is a string;
       otherwise, the pattern-spec is a regular expression in the style of
       regexp(6).  Regular expression matching is many times less efficient than
       string  matching,  so  it is wiser to enumerate several similar strings
       than to combine them into a regular expression.  The action is  a  key‐
       word  terminated  by  a  :  and  separated from the pattern by optional
       white-space.  It must be one of the following:

       dump      if the pattern matches, the message is deleted.   If  the  -s
                 command line option is set, the message is saved.

       hold      if  the pattern matches, the message is queued in a subdirec‐
                 tory of /mail/queue.hold for manual  inspection.   After  in‐
                 spection,  the  queue  can  be swept manually using runq (see
                 qer(8)) to deliver messages that were inadvertently matched.

       header    this is the same as the hold action, except  the  pattern  is
                 only  applied  to  the  message header.  This optimization is
                 useful for patterns that match header  fields  that  are  un‐
                 likely to be present in the body of the message.

       line      the  sender and a section of the message around the match are
                 written to the file /sys/log/lines.  The  message  is  always
                 delivered.

       loff      patterns  of  this type are applied only to the canonicalized
                 command line.  When a match occurs, all  patterns  with  line
                 actions  are  disabled.  This is useful for limiting the size
                 of the log file by excluding  repetitive  messages,  such  as
                 those from mailing lists.

       Patterns  are  accumulated  into  pattern sets sharing the same action.
       The matching engine applies the dump pattern set first, then the header
       and  hold pattern sets, and finally the line pattern set.  Each pattern
       set is applied three times: to the canonicalized command line,  to  the
       message  header, and finally to the message body.  The ordering of pat‐
       terns in the pattern file is insignificant.
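
       The order of evaluation described above can be sketched as follows.
       This Python fragment is a simplified illustration, not the actual
       matching engine.  It handles only literal string patterns, ignores
       overrides, and assumes that a dumped or held message is not also
       logged by line patterns, which the text does not state.  The name
       classify and its return values are hypothetical.

```python
from typing import Dict, List, Tuple


def classify(patterns: Dict[str, List[str]],
             cmdline: str, header: str, body: str) -> Tuple[str, bool]:
    """Return (disposition, logged) for already-canonicalized inputs.

    patterns maps an action keyword to its set of literal strings.
    disposition is 'dump', 'hold' or 'deliver'; logged reports whether
    a 'line' pattern would have written a log entry.
    """
    def hits(action: str, texts: List[str]) -> bool:
        # True if any pattern of this action occurs in any of the texts.
        return any(p in t for p in patterns.get(action, []) for t in texts)

    inputs = [cmdline, header, body]  # each set is tried on all three
    if hits("dump", inputs):
        return ("dump", False)
    # 'header' acts like 'hold' but is applied to the message header only.
    if hits("header", [header]) or hits("hold", inputs):
        return ("hold", False)
    # 'loff' patterns see only the command line and disable 'line' patterns.
    logged = not hits("loff", [cmdline]) and hits("line", inputs)
    return ("deliver", logged)
```

       Because the action sets are consulted in a fixed order, the position
       of a pattern in the file has no effect on the result in this sketch,
       which mirrors the statement above.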

       The pattern-spec is a string of characters terminated by a  newline,  #
       or  override  indicator,  ~~.  Trailing white-space is deleted but pat‐
       terns containing leading or trailing white-space  can  be  enclosed  in
       double-quote characters.  A pattern containing a double-quote must be
       enclosed in double-quote characters, and each embedded double-quote
       must be preceded by a backslash.  For
       example, the pattern

            "this is not \"spam\""

       matches the string this is not "spam".  The pattern-spec is followed by
       zero or more override strings.  When the specific pattern matches, each
       override  is  applied  and if one matches, it cancels the effect of the
       pattern.  Overrides must be strings; regular expressions are  not  sup‐
       ported.  Each override is introduced by the string ~~ and continues un‐
       til a subsequent ~~, # or newline, white-space included.  A ~~  immedi‐
       ately  followed  by a newline indicates a line continuation and further
       overrides continue on the following line.  Leading white-space  on  the
       continuation line is ignored.  For example,

               *hold:   sex.com~~essex.com~~sussex.com~~sysex.com~~
                        lasex.com~~cse.psu.edu!owner-9fans

       matches all input containing the string sex.com except for messages
       that also contain any of the strings in the override list.  Often it
       is desirable to override a pattern based on the name of the sender or
       recipient.  For this reason, each override pattern is applied to the header
       and  the command line as well as the section of the canonicalized input
       containing the matching data.  Thus a pattern matching the command line
       or  the  header searches both the command line and the header for over‐
       rides while a match in the body searches the body, header  and  command
       line for overrides.
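
       The line syntax and the scope of overrides can be illustrated with a
       small parser.  This Python sketch is an approximation under stated
       assumptions, not the scanmail implementation.  The names Pattern,
       join_continuations, parse_line and overridden are hypothetical, and
       comments are stripped before the continuation check.

```python
from typing import List, NamedTuple, Optional


class Pattern(NamedTuple):
    is_string: bool        # True when the line began with '*'
    action: str            # keyword before the ':'
    spec: str              # pattern-spec with quoting removed
    overrides: List[str]   # literal override strings


def join_continuations(text: str) -> List[str]:
    """Strip '#' comments and merge lines ending in '~~' with the next."""
    logical: List[str] = []
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]          # '#' cannot be escaped
        if pending:
            line = pending + line.lstrip()   # leading white-space ignored
        if line.endswith("~~"):
            pending = line
        else:
            logical.append(line)
            pending = ""
    if pending:
        logical.append(pending)
    return logical


def parse_line(line: str) -> Optional[Pattern]:
    """Parse one logical line of the form {*}action: spec {~~override...}."""
    line = line.lstrip()
    if not line.strip():
        return None
    is_string = line.startswith("*")
    if is_string:
        line = line[1:]
    action, _, rest = line.partition(":")
    spec, *overrides = rest.split("~~")
    spec = spec.strip()
    # A quoted spec keeps its inner white-space; \" stands for a quote.
    if len(spec) >= 2 and spec[0] == '"' and spec[-1] == '"':
        spec = spec[1:-1].replace('\\"', '"')
    return Pattern(is_string, action.strip(), spec, [o for o in overrides if o])


def overridden(p: Pattern, where: str,
               cmdline: str, header: str, body: str) -> bool:
    """where names the input that matched: 'cmdline', 'header' or 'body'."""
    scope = [cmdline, header]
    if where == "body":
        scope.append(body)   # body matches also search the body
    return any(o in text for o in p.overrides for text in scope)
```

       Feeding the two-line sex.com example above through
       join_continuations and parse_line gives one hold pattern with five
       overrides.  A body match is then cancelled when any override occurs
       in the body, header or command line, whereas a header or command
       line match consults only those two inputs.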

       The structure of the pattern file and the matching algorithm define the
       strategy for detecting and filtering  unwanted  messages.   Ideally,  a
       hold  pattern  selects a message for inspection and if it is determined
       to be undesirable, a specific dump pattern is added to  delete  further
       instances  of  the  message.  Additionally, it is often useful to block
       the sender by updating the smtpd control file.

       In this regime, patterns with a dump action generally match phrases
       that are likely to be unique.  Patterns that hold a message for inspec‐
       tion match phrases commonly found in undesirable material and occasion‐
       ally  in  legitimate messages.  Patterns that log matches are less spe‐
       cific yet.  In all cases the ability to override a pattern by  matching
       another string allows repetitive messages that trigger the pattern,
       such as mailing lists, to pass the filter after the first one  is  pro‐
       cessed  manually.  The -s option allows deleted messages to be salvaged
       by either manual or semi-automatic review, supporting the specification
       of  more  aggressive  patterns.   Finally,  the  utility of the pattern
       matcher is not confined to filtering spam; it is a generally useful ad‐
       ministrative  tool for deleting inadvertently harmful messages, for ex‐
       ample, mail loops, stuck senders or viruses.  It  is  also  useful  for
       collecting or counting messages matching certain criteria.

FILES
       /mail/lib/patterns
              default pattern file

       /sys/log/smtpd
              log of deleted messages

       /mail/log/lines
              file where log matches are logged

       /mail/queue/*
              directories where legitimate messages are queued for delivery

       /mail/queue.hold
              directory where held messages are queued for inspection

       /mail/queue.dump/*
              directory  where  dumped messages are stored when the -s command
              line option is specified.

       /mail/copy/*
              directory where copies of all incoming messages are stored.

SOURCE
       /sys/src/cmd/upas/scanmail

SEE ALSO
       mail(1), qer(8), smtpd(6)

BUGS
       Testscan does not report a match when the body of  a  message  contains
       exactly one line.



                                                                   SCANMAIL(8)