UTF(6)                           Games Manual                           UTF(6)

NAME
       UTF, Unicode, ASCII, rune - character set and format

DESCRIPTION
       The  Plan  9  character set and representation are based on the Unicode
       Standard and on the ISO multibyte UTF-8 encoding  (Universal  Character
       Set  Transformation  Format, 8 bits wide).  The Unicode Standard repre‐
       sents its characters in 21 bits; UTF-8 represents  such  values  in  an
       8-bit byte stream.  Throughout this manual, UTF-8 is shortened to UTF.

       In  Plan  9, a rune is a 21-bit quantity representing a Unicode charac‐
       ter.  Internally, programs may store characters as runes.  However, any
       external manifestation of textual information, in files or at  the  in‐
       terface  between  programs, uses a machine-independent, byte-stream en‐
       coding called UTF.

       UTF is designed so that the 7-bit ASCII set (values hexadecimal 00  to
       7F)  appears  only as itself in the encoding.  Runes with values above
       7F appear as sequences of two or more bytes with values only from  80
       to FF.
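       Because bytes 00 to 7F never occur inside a multibyte sequence, a pro‐
       gram can search UTF text for an ASCII byte without decoding it.   The
       C  sketch  below illustrates this; count_ascii_byte is a hypothetical
       helper, not a Plan 9 library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count occurrences of an ASCII byte in a
 * NUL-terminated UTF string by scanning raw bytes.  No decoding is
 * needed: every byte of a multibyte sequence lies in 80 to FF, so it
 * can never equal a byte in 00 to 7F. */
static size_t count_ascii_byte(const char *s, char ascii)
{
	size_t n = 0;

	assert((unsigned char)ascii < 0x80);	/* only ASCII targets are safe */
	for (; *s != '\0'; s++)
		if (*s == ascii)
			n++;
	return n;
}
```

       For  the  same  reason, a standard C byte search such as strchr finds
       ASCII delimiters in UTF text correctly.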

       The  UTF  encoding  of the Unicode Standard is backward compatible with
       ASCII: programs presented only with ASCII work on Plan 9  even  if  not
       written  to  deal with UTF, as do programs that deal with uninterpreted
       byte streams.  However, programs that perform  semantic  processing  on
       ASCII  graphic  characters  must  convert from UTF to runes in order to
       work properly with non-ASCII input.  See rune(2).
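       As  an  illustration of the difference between bytes and characters,
       the C sketch below counts characters rather than bytes in a UTF string.
       It assumes well-formed input and relies on each character having  ex‐
       actly  one  byte  that  is  not of the form 10bbbbbb.  utf_count_runes
       is a hypothetical name; rune(2) describes the routines Plan 9 actually
       provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of characters in a well-formed,
 * NUL-terminated UTF string.  Each character contributes exactly one
 * byte that is not a continuation byte (10bbbbbb), so counting those
 * bytes counts characters.  Malformed input needs a real decoder. */
static size_t utf_count_runes(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

       For "café" (bytes 63 61 66 C3 A9) the byte length is 5, but the sketch
       reports 4 characters.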

       Letting numbers be binary, a rune x is converted to a multibyte UTF se‐
       quence as follows:

       01.   x in [000000.00000000.0bbbbbbb] → 0bbbbbbb
       10.   x in [000000.00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
       11.   x in [000000.bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
       100.  x in [0bbbbb.bbbbbbbb.bbbbbbbb] → 11110bbb, 10bbbbbb, 10bbbbbb,
             10bbbbbb
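       The conversions can be sketched in C as follows.  This is an illustra‐
       tion, not Plan 9's runetochar (see rune(2)); rune_encode is a hypothet‐
       ical  name,  the  four-byte  conversion uses the lead byte 11110bbb, and
       mapping values above 10FFFF (the largest Unicode code point)  to  FFFD
       is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoder: writes the UTF sequence for rune r into buf,
 * which must hold at least 4 bytes, and returns the byte count.
 * Assumption: values above 10FFFF are replaced by FFFD. */
static size_t rune_encode(char *buf, unsigned long r)
{
	unsigned char *p = (unsigned char *)buf;

	assert(buf != NULL);
	if (r > 0x10FFFF)
		r = 0xFFFD;
	if (r <= 0x7F) {			/* 01: 0bbbbbbb */
		p[0] = r;
		return 1;
	}
	if (r <= 0x7FF) {			/* 10: 110bbbbb 10bbbbbb */
		p[0] = 0xC0 | (r >> 6);
		p[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	if (r <= 0xFFFF) {			/* 11: 1110bbbb 10bbbbbb 10bbbbbb */
		p[0] = 0xE0 | (r >> 12);
		p[1] = 0x80 | ((r >> 6) & 0x3F);
		p[2] = 0x80 | (r & 0x3F);
		return 3;
	}
	/* 100: 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb */
	p[0] = 0xF0 | (r >> 18);
	p[1] = 0x80 | ((r >> 12) & 0x3F);
	p[2] = 0x80 | ((r >> 6) & 0x3F);
	p[3] = 0x80 | (r & 0x3F);
	return 4;
}
```

       For example, rune 2192 (→) encodes as the three bytes E2 86 92.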

       Conversion 01 provides a one-byte sequence that spans the ASCII charac‐
       ter set in a compatible way.  Conversions 10, 11 and 100 represent
       higher-valued characters as sequences of two, three or four bytes with
       the high bit set.  Plan 9 does not support the 5- and 6-byte sequences
       proposed by X/Open.  When there are multiple ways to encode a value,
       for example rune 0, the shortest encoding is used.
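       The  shortest-encoding  rule  amounts to choosing the first conversion
       whose range contains the value.  The hypothetical C helpers below, sim‐
       ilar  in  spirit to runelen in rune(2) but not taken from it, compute
       that length and check whether a sequence used it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes of the shortest UTF encoding of
 * rune r, i.e. the first conversion whose range contains r.  Returns 0
 * for values above 10FFFF, the largest Unicode code point, since there
 * are no 5- or 6-byte forms. */
static size_t rune_len(unsigned long r)
{
	if (r <= 0x7F)
		return 1;	/* conversion 01 */
	if (r <= 0x7FF)
		return 2;	/* conversion 10 */
	if (r <= 0xFFFF)
		return 3;	/* conversion 11 */
	if (r <= 0x10FFFF)
		return 4;	/* conversion 100 */
	return 0;
}

/* Nonzero if a len-byte sequence decoding to r is the shortest form;
 * for example C0 80, a two-byte spelling of rune 0, is not. */
static int is_shortest(unsigned long r, size_t len)
{
	assert(len >= 1 && len <= 4);
	return rune_len(r) == len;
}
```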

       In  the  inverse  mapping, any sequence except those described above is
       incorrect and is converted to rune hexadecimal FFFD.
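       A decoder for the inverse mapping must therefore reject stray continu‐
       ation  bytes,  truncated sequences, over-long forms such as C0 80 for
       rune 0, and lead bytes of 5- and 6-byte sequences.  The C sketch below
       does so.  It is not Plan 9's chartorune; rune_decode and RUNE_ERROR are
       hypothetical  names,  and  rejecting  values above 10FFFF and consuming
       exactly one byte on error are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define RUNE_ERROR 0xFFFDUL	/* hypothetical name for rune FFFD */

/* Hypothetical decoder: decodes one rune from the n bytes at s into *r
 * and returns the number of bytes consumed (0 only when n is 0).  Any
 * sequence not produced by the conversions yields RUNE_ERROR and, by
 * assumption, consumes one byte so the caller resynchronizes. */
static size_t rune_decode(unsigned long *r, const char *s, size_t n)
{
	/* Smallest value each length may carry; anything below is over-long. */
	static const unsigned long minval[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	const unsigned char *p = (const unsigned char *)s;
	size_t len, i;
	unsigned long v;

	assert(r != NULL);
	if (n == 0)
		return 0;
	if (p[0] < 0x80) {			/* 01: plain ASCII */
		*r = p[0];
		return 1;
	}
	if ((p[0] & 0xE0) == 0xC0) {		/* 10: 110bbbbb */
		len = 2;
		v = p[0] & 0x1F;
	} else if ((p[0] & 0xF0) == 0xE0) {	/* 11: 1110bbbb */
		len = 3;
		v = p[0] & 0x0F;
	} else if ((p[0] & 0xF8) == 0xF0) {	/* 100: 11110bbb */
		len = 4;
		v = p[0] & 0x07;
	} else {
		goto bad;	/* continuation byte or 5/6-byte lead */
	}
	if (n < len)
		goto bad;	/* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			goto bad;	/* missing 10bbbbbb continuation */
		v = (v << 6) | (p[i] & 0x3F);
	}
	if (v < minval[len] || v > 0x10FFFF)
		goto bad;	/* over-long or out of range */
	*r = v;
	return len;
bad:
	*r = RUNE_ERROR;
	return 1;
}
```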

FILES
       /lib/unicode
              table of characters and descriptions, suitable for look(1).

SEE ALSO
       ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.
