DOC2TXT(1)                  General Commands Manual                 DOC2TXT(1)



NAME
       doc2txt, doc2ps, wdoc2txt, xls2txt, olefs, mswordstrings, msexceltables
       - extract printable text from Microsoft documents

SYNOPSIS
       doc2txt [ file.doc ]
       doc2ps [ file.doc ]
       wdoc2txt [ file.doc ]
       xls2txt [ file.xls ]
       aux/olefs [ -m mtpt ] file.doc
       aux/mswordstrings mtpt/WordDocument
       aux/msexceltables [ -qaDnt ] [ -d delim ] [  -c  column-range  ]  [  -w
       worksheet-range ] mtpt/Workbook

DESCRIPTION
       Doc2txt is an rc(1) script that uses olefs and mswordstrings to extract
       the printable text from the body of a Microsoft Word document and write
       it  on  the  standard  output.  Doc2ps is similar, but emits PostScript
       corresponding to the document.  Wdoc2txt is  similar  to  doc2txt,  but
       uses  plumb(1)  to  send  the  output  to a new acme(1) window instead.
       Xls2txt performs a similar function for Microsoft Excel documents.

       Microsoft Office documents are stored in OLE (Object Linking and Embed‐
       ding)  format,  which  is a scaled-down version of Microsoft's FAT file
       system.  Olefs presents the contents of an MS Office document as a file
       system  on  mtpt,  which  defaults to /mnt/doc.  Mswordstrings or msex‐
       celtables may then be used to parse the files inside, extracting a text
       stream.   Msexceltables  may be given options to control the formatting
       of its output.

       -a     Attempt  conversion  of  non-tabular  sheets  in  the   workbook
              (charts).

       -d delim
              Sets the inter-field delimiter to the string delim, by default a
              single space.

       -D     Enables debugging output.

       -c range
              Range is a comma-separated list of column numbers and ranges;  a
              range  is two column numbers separated by a dash.  Processing is
              limited to the columns named; by default all columns are output.

       -n     Disables field padding to column width.

       -q     Disables quoting of textual fields (see quote(2)).

       -t     Truncate fields to the column width.

       -w range
              Range is a comma-separated list of worksheet numbers and ranges,
              in  the same syntax as the -c option above; output is limited to
              the sheets named.  Suppressed chart pages are always included in
              the sheet count.
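
       As an illustration of the range syntax accepted by -c and  -w,  the
       following  Python  sketch expands a specification such as 1,7,9-14
       into the individual numbers it names.  It is not taken from msexcel-
       tables: the function name expand_ranges is hypothetical, and treat-
       ing both ends of a range as inclusive is an assumption.

```python
def expand_ranges(spec):
    """Expand a range list such as '1,7,9-14' into a sorted list of ints.

    Illustrative only: assumes ranges are inclusive at both ends, which
    the manual page does not state explicitly.
    """
    selected = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        if "-" in item:
            # A range: two numbers separated by a dash.
            low_text, high_text = item.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError("descending range: " + item)
            selected.update(range(low, high + 1))
        else:
            # A single column or worksheet number.
            selected.add(int(item))
    return sorted(selected)
```

       Under these assumptions, expand_ranges("1,7,9-14") returns the list
       1, 7, 9, 10, 11, 12, 13, 14, and expand_ranges("3-5") returns 3, 4,
       5.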

EXAMPLE
       Extract pieces of an MS Excel spreadsheet.
              aux/olefs report.xls
              aux/msexceltables -q -w 1,7,9-14 -c 3-5 -n -d '@' /mnt/doc/Workbook > rpt.txt
              unmount /mnt/doc

SOURCE
       /rc/bin
              doc2txt, doc2ps, wdoc2txt, and xls2txt

       /sys/src/cmd/aux
              the others

SEE ALSO
       strings(1)
       ``Microsoft  Word  97  Binary  File  Format'', at Microsoft's developer
       (MSDN) home page.
       ``LAOLA Binary Structures'', http://user.cs.tu-berlin.de/~schwartz/pmh
       ``OpenOffice.Org's Excel Documentation'',
       http://sc.openoffice.org/excelfileformat.pdf



                                                                    DOC2TXT(1)