SCANMAIL(8)                 System Manager's Manual                SCANMAIL(8)

NAME
       scanmail, testscan -  spam filters

SYNOPSIS
       upas/scanmail  [  options  ] [ qer-args ] root mail sender system rcpt-
       list

       upas/testscan [ -avd ] [ -p patfile ] [ filename ]

DESCRIPTION
       Scanmail accepts a mail message supplied on standard input,  applies  a
       file  of  patterns to a portion of it, and dispatches the message based
       on the results.  It exactly replaces the generic queuing command qer(8)
       that is executed from the rc(1) script /mail/lib/qmail in the mail
       processing pipeline.  Each pattern has an associated action.  In order
       of decreasing priority, the actions are:

       dump      the  message  is  deleted  and  a  log  entry  is  written to
                 /sys/log/smtpd

       hold      the message is placed in a queue for human inspection

       log       a line containing the matching  portion  of  the  message  is
                 written to a log

       If no pattern matches or only patterns with an action of log match, the
       message  is  accepted  and  scanmail  queues  the message for delivery.
       Scanmail meshes with the blocking facilities  of  smtpd(6)  to  provide
       several  layers  of  filtering  on  gateway  systems.  In all cases the
       sender is notified that the message has  been  successfully  delivered,
       leaving  the  sender  unaware that the message has been potentially de‐
       layed or deleted.

       Scanmail accepts the arguments of qer(8) as well as the following:

       -c     Save a copy of each message in a randomly-named file  in  direc‐
              tory /mail/copy.

       -d     Write debugging information to standard error.

       -h     Queue  held messages by sending domain name.  The -q option must
              specify a root directory; messages are queued in  subdirectories
              of  this directory.  If the -h option is not specified, messages
              are accumulated in a subdirectory of /mail/queue.hold named  for
              the contents of /dev/user, usually none.

       -n     Messages are never held for inspection, but are delivered.  Also
              known as vacation mode.

       -p filename
              Read the patterns from filename rather than /mail/lib/patterns.

       -q holdroot
              Queue  deliverable messages in subdirectories of holdroot.  This
              option is the same as the  -q  option  of  qer(8)  and  must  be
              present if the -h option is given.

       -s     Save  deleted messages.   Messages are stored, one per randomly-
              named file, in subdirectories of /mail/queue.dump named with the
              date.

       -t     Test mode.  The pattern matcher is applied but  the  message  is
              discarded and the result is not logged.

       -v     Print  the  highest  priority match.  This is useful with the -t
              option for testing the pattern matcher without actually  sending
              a message.

       Testscan is the command line version of scanmail.  If filename is miss‐
       ing,  it applies the pattern set to the message on standard input.  Un‐
       like scanmail, which finds the highest priority match, testscan  prints
       all matches in the portion of the message under test.  It is useful for
       testing  a  pattern  set  or  implementing  a personal filter using the
       pipeto file in a user's mail directory.  Testscan accepts the following
       options:

       -a     Print matches in the complete input message.

       -d     Enable debug mode.

       -v     Print the message after conversion to canonical form (q.v.).

       -p filename
              Read the patterns from filename rather than /mail/lib/patterns.

   Canonicalization
       Before pattern matching, both programs convert a portion of the message
       header and the beginning of the  message  to  a  canonical  form.   The
       amount of the header and message body processed is set by compile-time
       parameters  in the source files.  The canonicalization process converts
       letters to lower-case and replaces consecutive spaces, tabs and newline
       characters with a single space.  HTML commands are deleted  except  for
       the  parameters  following  A HREF, IMG SRC, and IMG BORDER directives.
       Additionally, the following MIME escape sequences are replaced by their
       ASCII equivalents:

                  Escape Seq   ASCII
                  ----------   -----
                       =2e       .
                       =2f       /
                       =20    <space>
                       =3d       =
       and the sequence =<newline> is elided.  Scanmail assembles the  sender,
       destination  domain  and  recipient  fields  of the command line into a
       string that is subjected to the same canonical  processing.   Following
       canonicalization,  the command line and the two long strings containing
       the header and the message body are passed to the matching  engine  for
       analysis.
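
       The text-level steps above might be sketched as follows.  This is an
       illustration in Python, not the code in /sys/src/cmd/upas/scanmail: the
       function name canonicalize and the order of the steps are assumptions,
       and the HTML handling is omitted.

```python
import re

# The four MIME escape sequences listed in the table, mapped to ASCII.
MIME_ESCAPES = {"=2e": ".", "=2f": "/", "=20": " ", "=3d": "="}


def canonicalize(text: str) -> str:
    """Approximate the canonical form (HTML handling omitted)."""
    # Letters to lower case; this also lets "=2E" match the "=2e" entry.
    text = text.lower()
    # Elide the sequence =<newline>.
    text = text.replace("=\n", "")
    # Replace the listed escapes in a single pass, so the "=" produced by
    # "=3d" is not re-read as the start of another escape.
    text = re.sub(r"=(?:2e|2f|20|3d)", lambda m: MIME_ESCAPES[m.group(0)], text)
    # Collapse consecutive spaces, tabs and newlines into a single space.
    return re.sub(r"[ \t\n]+", " ", text)
```

       For example, canonicalize("Buy=20NOW=2E") yields "buy now.".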

   Pattern Syntax
       The  matching  engine  compiles  the pattern set and matches it to each
       canonicalized input string.  Patterns are specified  one  per  line  as
       follows:

            {*}action: pattern-spec {~~override...~~override}

       On  all lines, a # introduces a comment; there is no way to escape this
       character.

       Lines beginning with * contain a pattern-spec that is a string;
       otherwise, the pattern-spec is a regular expression in the style of
       regexp(6).  Regular expression matching is many times less efficient than
       string matching, so it is wiser to enumerate  several  similar  strings
       than  to  combine them into a regular expression.  The action is a key‐
       word terminated by a : and  separated  from  the  pattern  by  optional
       white-space.  It must be one of the following:

       dump      if  the  pattern  matches, the message is deleted.  If the -s
                 command line option is set, the message is saved.

       hold      if the pattern matches, the message is queued in a  subdirec‐
                 tory  of  /mail/queue.hold  for manual inspection.  After in‐
                 spection, the queue can be swept  manually  using  runq  (see
                 qer(8)) to deliver messages that were inadvertently matched.

       header    this  is  the  same as the hold action, except the pattern is
                 only applied to the message  header.   This  optimization  is
                 useful  for  patterns  that  match header fields that are un‐
                 likely to be present in the body of the message.

       line      the sender and a section of the message around the match  are
                 written  to  the  file /sys/log/lines.  The message is always
                 delivered.

       loff      patterns of this type are applied only to  the  canonicalized
                 command  line.   When  a match occurs, all patterns with line
                 actions are disabled.  This is useful for limiting  the  size
                 of  the  log  file  by excluding repetitive messages, such as
                 those from mailing lists.

       Patterns are accumulated into pattern sets  sharing  the  same  action.
       The matching engine applies the dump pattern set first, then the header
       and  hold pattern sets, and finally the line pattern set.  Each pattern
       set is applied three times: to the canonicalized command line,  to  the
       message  header, and finally to the message body.  The ordering of pat‐
       terns in the pattern file is insignificant.
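
       One reading of this ordering can be sketched in Python.  The names
       PRIORITY and best_action are hypothetical, and plain substring tests
       stand in for the real string and regular expression matchers.
       Restricting header patterns to the message header and loff patterns to
       the command line follows the action descriptions above.

```python
# Pattern sets in the order the matching engine applies them.
PRIORITY = ["dump", "header", "hold", "line"]


def best_action(patterns, cmdline, header, body):
    """Return the highest-priority action that matches, or None.

    `patterns` is a list of (action, string) pairs in any order.
    """
    # A loff match on the command line disables all line patterns.
    loff = any(p in cmdline for a, p in patterns if a == "loff")
    for action in PRIORITY:
        if action == "line" and loff:
            continue
        needles = [p for a, p in patterns if a == action]
        # Each set is tried on the command line, the header, then the body;
        # header patterns are tried on the message header only.
        inputs = [header] if action == "header" else [cmdline, header, body]
        for text in inputs:
            if any(n in text for n in needles):
                return action
    return None
```

       Because the sets are walked in a fixed order, shuffling the list passed
       as patterns does not change the result, which mirrors the statement
       that the ordering of patterns in the file is insignificant.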

       The pattern-spec is a string of characters terminated by a  newline,  #
       or  override  indicator,  ~~.  Trailing white-space is deleted but pat‐
       terns containing leading or trailing white-space  can  be  enclosed  in
       double-quote  characters.   A pattern containing a double-quote must be
       enclosed in double-quote characters, and each embedded double-quote must
       be preceded by a backslash.  For example, the pattern

            "this is not \"spam\""

       matches the string this is not "spam".  The pattern-spec is followed by
       zero or more override strings.  When the specific pattern matches, each
       override  is  applied  and if one matches, it cancels the effect of the
       pattern.  Overrides must be strings; regular expressions are  not  sup‐
       ported.  Each override is introduced by the string ~~ and continues un‐
       til  a subsequent ~~, # or newline, white-space included.  A ~~ immedi‐
       ately followed by a newline indicates a line continuation  and  further
       overrides  continue  on the following line.  Leading white-space on the
       continuation line is ignored.  For example,

               *hold:   sex.com~~essex.com~~sussex.com~~sysex.com~~
                        lasex.com~~cse.psu.edu!owner-9fans

       matches all input containing the string  sex.com  except  for  messages
       that also contain the strings in the override list.  Often it is desir‐
       able  to  override a pattern based on the name of the sender or recipi‐
       ent.  For this reason, each override pattern is applied to  the  header
       and  the command line as well as the section of the canonicalized input
       containing the matching data.  Thus a pattern matching the command line
       or the header searches both the command line and the header  for  over‐
       rides  while  a match in the body searches the body, header and command
       line for overrides.
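
       The line format and the override search described above might be
       sketched in Python as follows.  The names parse_line and overridden are
       hypothetical, and the sketch handles neither double-quoted patterns nor
       ~~ line continuation; it also assumes every line it is given contains
       an action.

```python
def parse_line(line):
    """Split one pattern line into (is_string, action, spec, overrides)."""
    # A '#' always starts a comment; it cannot be escaped.
    line = line.rstrip("\n").split("#", 1)[0]
    # A leading * marks a string pattern; otherwise it is a regular expression.
    is_string = line.startswith("*")
    action, rest = line.lstrip("*").split(":", 1)
    spec, *overrides = rest.split("~~")
    # White space is trimmed from the pattern-spec but kept in overrides.
    return is_string, action.strip(), spec.strip(), [o for o in overrides if o]


def overridden(overrides, where, cmdline, header, body):
    """Report whether any override cancels a match found in `where`.

    `where` is "cmdline", "header" or "body".  A body match searches the
    body, header and command line; other matches search only the header
    and command line.
    """
    scope = [cmdline, header] + ([body] if where == "body" else [])
    return any(o in text for o in overrides for text in scope)
```

       With the example above, a message whose body contains essex.com matches
       the sex.com pattern, but the essex.com override is also found in the
       body, so overridden returns True and the hold is cancelled.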

       The structure of the pattern file and the matching algorithm define the
       strategy for detecting and filtering  unwanted  messages.   Ideally,  a
       hold  pattern  selects a message for inspection and if it is determined
       to be undesirable, a specific dump pattern is added to  delete  further
       instances  of  the  message.  Additionally, it is often useful to block
       the sender by updating the smtpd control file.

       In this regime, patterns with a dump action generally match phrases
       that are likely to be unique.  Patterns that hold a message for inspec‐
       tion match phrases commonly found in undesirable material and occasion‐
       ally  in  legitimate messages.  Patterns that log matches are less spe‐
       cific yet.  In all cases the ability to override a pattern by  matching
       another string allows repetitive messages that trigger the pattern,
       such as mailing lists, to pass  the  filter  after  the  first  one  is
       processed  manually.   The -s option allows deleted messages to be sal‐
       vaged by either manual or semi-automatic review, supporting the  speci‐
       fication of more aggressive patterns.  Finally, the utility of the pat‐
       tern  matcher is not confined to filtering spam; it is a generally use‐
       ful administrative tool for deleting  inadvertently  harmful  messages,
       for  example,  mail loops, stuck senders or viruses.  It is also useful
       for collecting or counting messages matching certain criteria.

FILES
       /mail/lib/patterns
              default pattern file

       /sys/log/smtpd
              log of deleted messages

       /mail/log/lines
              file where log matches are logged

       /mail/queue/*
              directories where legitimate messages are queued for delivery

       /mail/queue.hold
              directory where held messages are queued for inspection

       /mail/queue.dump/*
              directory where dumped messages are stored when the  -s  command
              line option is specified.

       /mail/copy/*
              directory where copies of all incoming messages are stored.

SOURCE
       /sys/src/cmd/upas/scanmail

SEE ALSO
       mail(1), qer(8), smtpd(6)

BUGS
       Testscan  does  not  report a match when the body of a message contains
       exactly one line.

                                                                   SCANMAIL(8)