UTF(6)                       File Formats Manual                        UTF(6)

NAME
       UTF, Unicode, ASCII, rune - character set and format

DESCRIPTION
       The  Plan  9  character set and representation are based on the Unicode
       Standard and on the ISO multibyte UTF-8 encoding  (Universal  Character
       Set  Transformation  Format, 8 bits wide).  The Unicode Standard repre‐
       sents its characters in 16 bits; UTF-8 represents  such  values  in  an
       8-bit byte stream.  Throughout this manual, UTF-8 is shortened to UTF.

       In  Plan  9, a rune is a 16-bit quantity representing a Unicode charac‐
       ter.  Internally, programs may store characters as runes.  However, any
       external manifestation of textual information, in files or at  the  in‐
       terface  between  programs, uses a machine-independent, byte-stream en‐
       coding called UTF.

       UTF is designed so that characters in the 7-bit ASCII set (values
       hexadecimal 00 to 7F) appear only as themselves in the encoding.
       Runes with values above 7F appear as sequences of two or more bytes
       with values only from 80 to FF.
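
       One consequence of this design can be sketched in C: a byte-wise
       search for an ASCII character cannot match inside a multibyte
       sequence, because every byte of such a sequence lies in 80 to FF.
       The helper name count_ascii_byte is hypothetical and is not part of
       any Plan 9 library.

```c
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration): count how many
 * times the ASCII byte c (00 to 7F) occurs in the NUL-terminated UTF
 * string s.  No decoding is needed, since bytes of multibyte sequences
 * are always in the range 80 to FF and so never compare equal to c.
 */
size_t
count_ascii_byte(const char *s, char c)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (*s == c)
			n++;
	return n;
}
```

       The same reasoning suggests why a path separator such as / can be
       located in UTF text without converting to runes first.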

       The  UTF  encoding  of the Unicode Standard is backward compatible with
       ASCII: programs presented only with ASCII work on Plan 9  even  if  not
       written  to  deal with UTF, as do programs that deal with uninterpreted
       byte streams.  However, programs that perform  semantic  processing  on
       ASCII  graphic  characters  must  convert from UTF to runes in order to
       work properly with non-ASCII input.  See rune(2).

       Letting numbers be binary, a rune x is converted to a multibyte UTF se‐
       quence as follows:

       01.   x in [00000000.0bbbbbbb] → 0bbbbbbb
       10.   x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
       11.   x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb

       Conversion 01 provides a one-byte sequence that spans the ASCII charac‐
       ter set in a compatible way.  Conversions 10 and 11  represent  higher-
       valued  characters as sequences of two or three bytes with the high bit
       set.  Plan 9 does not support the 4-, 5-, and 6-byte sequences proposed
       by X/Open.  When there are multiple ways to encode a value, for example
       rune 0, the shortest encoding is used.
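
       The three conversions can be sketched in C as follows.  This is a
       minimal illustration, assuming a 16-bit rune held in a uint16_t; the
       name utf_encode is hypothetical and is not the rune(2) interface.
       Picking the first range that contains x yields the shortest encoding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration): encode the
 * 16-bit rune x into buf, which must hold at least 3 bytes, and return
 * the number of bytes written.
 */
size_t
utf_encode(uint16_t x, unsigned char *buf)
{
	if (x <= 0x7F) {
		/* conversion 01: 0bbbbbbb */
		buf[0] = (unsigned char)x;
		return 1;
	}
	if (x <= 0x7FF) {
		/* conversion 10: 110bbbbb, 10bbbbbb */
		buf[0] = (unsigned char)(0xC0 | (x >> 6));
		buf[1] = (unsigned char)(0x80 | (x & 0x3F));
		return 2;
	}
	/* conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb */
	buf[0] = (unsigned char)(0xE0 | (x >> 12));
	buf[1] = (unsigned char)(0x80 | ((x >> 6) & 0x3F));
	buf[2] = (unsigned char)(0x80 | (x & 0x3F));
	return 3;
}
```

       Under these rules rune hexadecimal 20AC, for example, becomes the
       three bytes E2 82 AC, and rune E9 becomes the two bytes C3 A9.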

       In  the  inverse  mapping, any sequence except those described above is
       incorrect and is converted to rune hexadecimal FFFD.
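
       A decoder following this rule might look like the sketch below.  The
       name utf_decode is hypothetical and is not the rune(2) interface.
       Two choices here are assumptions rather than statements from this
       page: an incorrect sequence consumes exactly one byte, and encodings
       longer than the shortest form are treated as incorrect.

```c
#include <stddef.h>
#include <stdint.h>

#define RUNE_ERROR 0xFFFD	/* rune produced for incorrect sequences */

/*
 * Hypothetical helper (an assumption for illustration): decode one rune
 * from the n bytes at s (n must be at least 1), store it in *r, and
 * return the number of bytes consumed.
 */
size_t
utf_decode(const unsigned char *s, size_t n, uint16_t *r)
{
	unsigned c = s[0];
	unsigned x;

	if (c < 0x80) {
		/* inverse of conversion 01 */
		*r = (uint16_t)c;
		return 1;
	}
	if ((c & 0xE0) == 0xC0 && n >= 2 && (s[1] & 0xC0) == 0x80) {
		/* inverse of conversion 10 */
		x = ((c & 0x1F) << 6) | (s[1] & 0x3F);
		if (x >= 0x80) {	/* reject non-shortest forms */
			*r = (uint16_t)x;
			return 2;
		}
	} else if ((c & 0xF0) == 0xE0 && n >= 3 &&
	    (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
		/* inverse of conversion 11 */
		x = ((c & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		if (x >= 0x800) {	/* reject non-shortest forms */
			*r = (uint16_t)x;
			return 3;
		}
	}
	/* anything else, including truncated input, is incorrect */
	*r = RUNE_ERROR;
	return 1;
}
```

       Consuming a single byte after an error lets a caller resynchronize
       at the next byte that begins a correct sequence.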

FILES
       /lib/unicode
              table of characters and descriptions, suitable for look(1).

SEE ALSO
       ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.

                                                                        UTF(6)