CONVCS man page on Inferno

CONVCS man page on Inferno
Man page or keyword search:
man Server 579 pages
apropos Keyword Search (all sections)
Output format
CONVCS(2)							     CONVCS(2)

NAME
       Convcs,	Btos, Stob - character set conversion suite

SYNOPSIS
       include "convcs.m";
       convcs := load Convcs Convcs->PATH;

       Btos: module {
	    init: fn(arg: string): string;
	    btos: fn(s: Convcs->State, b: array of byte, nchars: int)
		      : (Convcs->State, string, int);
       };

       Stob: module {
	    init: fn (arg: string): string;
	    stob: fn(s: Convcs->State, str: string)
		      : (Convcs->State, array of byte);
       };

       Convcs: module {
	    State: type string;
	    Startstate: con "";

	    init: fn(csfile: string): string;
	    getbtos: fn(cs: string): (Btos, string);
	    getstob: fn(cs: string): (Stob, string);
	    enumcs: fn(): list of (string, string, int);
	    aliases: fn(cs: string): (string, list of string);
       };

DESCRIPTION
       The  Convcs  suite  is  a  collection of modules for converting various
       standard coded character sets and character  encoding  schemes  to  and
       from the Limbo strings.

       The Convcs module provides an entry point to the suite, mapping charac‐
       ter set names and aliases to their associated converter implementation.

   The Convcs module
       init(csfile)
	      Init should be called once to initialise the internal  state  of
	      Convcs.  The csfile argument specifies the path of the converter
	      mapping file.  If this argument is nil, the default mapping file
	      /lib/convcs/charsets is used.

       getbtos(cs)
	      Getbtos  returns	an initialised Btos module for converting from
	      the requested encoding.

	      The return value is a tuple, holding the module reference and an
	      error string.  If any errors were encountered in locating, load‐
	      ing or initialising the requested converter, the	module	refer‐
	      ence will be nil and the string will contain an explanation.

	      The  character set name, cs, is normalised by mapping all upper-
	      case latin1 characters  to  lower-case  before  comparison  with
	      character set names and aliases from the charsets file.

       getstob(cs)
	      Getstob  returns	an initialised Stob module for converting from
	      strings to the requested encoding.   Apart  from	the  different
	      module  type, the return value and semantics are the same as for
	      getbtos().

       enumcs()
	      Returns a list describing the set of available converters.   The
	      list is comprised of (name, desc, mode) tuples.

	      Name is the standard name of the character set.

	      Desc is a more friendly description taken directly from the desc
	      attribute given in the charsets file.  If the attribute  is  not
	      given then desc is set to name.

	      Mode  is	a bitmap that details the converters available for the
	      given character set.  Valid mode bits are Convcs->BTOS and  Con‐
	      vcs->STOB.  For convenience, Convcs->BOTH is also defined.

       aliases(cs)
	      Returns  the  aliases for the character set name cs.  The return
	      value is the tuple (desc, aliases).

	      Desc is the descriptive text for the character set, as  returned
	      by enumcs().

	      Aliases  lists  all the known aliases for cs.  The first name in
	      the list is the default character set name - the name  that  all
	      the  other  aliases  refer  to.  If the aliases list is nil then
	      there was an error in looking up the character set name and desc
	      will detail the error.

   Using a Btos converter
       The  Btos  module  returned  by getbtos() is already initialised and is
       ready to start the conversion.  Conversions can be made on a individual
       basis, or in a `streamed' mode.

       converter->btos(s, b, nchars)
	      Converts raw byte codes of the character set encoding to a Limbo
	      string.

	      The argument s is a converter state as returned from the	previ‐
	      ous  call	 to  btos on the same input stream.  The first call to
	      btos on a particular input stream should give Convcs->Startstate
	      (or  nil) as the value for s.  The argument b is the bytes to be
	      converted.  The argument nchars is the  maximal  length  of  the
	      string to be returned.  If this argument is -1 then as much of b
	      will be consumed as possible.  A value of	 0  indicates  to  the
	      converter	 that there is no more data and that any pending state
	      should be flushed.

	      The return value of btos is the tuple (state, str, nbytes) where
	      state  is	 the  new state of the converter, str is the converted
	      string, and nbytes is the number of bytes from b consumed by the
	      conversion.

	      The  same	 converter  module can be used for multiple conversion
	      streams by  maintaining  a  separate  state  variable  for  each
	      stream.

   Using an Stob converter
       The  Stob  module  returned  by getstob() is already initialised and is
       ready to start the conversion.

       converter->stob(s, str)
	      Converts the limbo string str to the raw byte codes of the char‐
	      acter  set  encoding.   The  argument s represents the converter
	      state and is treated in the same way  as	for  converter->btos()
	      described	 above.	  To  terminate	 a  conversion	stob should be
	      called with an empty string in  order  to	 flush	the  converter
	      state.

	      The return value of stob is the tuple (state, bytes) where state
	      is the new state of the converter and bytes is the result of the
	      conversion.

   Conversion errors
       When using converter->btos() to convert data to Limbo strings, any byte
       sequences that are not valid for the specific character encoding scheme
       will be converted to the Unicode error character 16rFFFD.

       When  using  converter->stob()  to  convert  Limbo strings, any Unicode
       characters that can not be mapped into the character set will  normally
       be  substituted	by  the	 US-ASCII code for `?'.	 Note that this may be
       inappropriate for certain conversions, such converters will use a suit‐
       able  error  character  for their particular character set and encoding
       scheme.

   Charset file format
       The file /lib/convcs/charsets provides the  mapping  between  character
       set  names  and their implementation modules.  The file format conforms
       to that supported by cfg(2).  The following description relies on terms
       defined in the cfg(2) manual page.

       Each record name defines a character set name.  If the primary value of
       the record is non-empty then the name is an alias, the value being  the
       real  name.   An alias record must point to an actual converter record,
       not to another alias, as Convcs only follows one level of aliasing.

       Each converter record consists of a set of tuples  with	the  following
       primary attributes:

       desc   A	 more  descriptive  name for the character set encoding.  This
	      attribute is used for the description returned by enumcs().

       btos   The path to the Btos module implementation of this converter.

       stob   The path to the Stob module implementation of this converter.

       Both the btos and stob tuples can have an optional arg attribute	 which
       is  passed  to the init() function of the converter when initialised by
       Convcs.	If a converter record has neither an stob nor  a  btos	tuple,
       then it is ignored.

       The  following example is an extract from the standard Inferno charsets
       file:

       cp866=ibm866
       866=ibm866
       ibm866=
	   desc='Russian MS-DOS CP 866'
	   stob=/dis/lib/convcs/cp_stob.dis    arg=/lib/convcs/ibm866.cp
	   btos=/dis/lib/convcs/cp_btos.dis    arg=/lib/convcs/ibm866.cp

       This entry defines Stob and  Btos  converters  for  the	character  set
       called  ibm866.	 The converters are actually the generic codepage con‐
       verters cp_stob and cp_btos paramaterized with a	 codepage  file.   The
       entry also defines the aliases cp866 and 866 for the name ibm866.

FILES
       /lib/convcs/charsets
	      The default mapping between character set names and their imple‐
	      mentation modules.

       /lib/convcs/csname.cp
	      Codepage files  for  use	by  the	 generic  codepage  converters
	      cp_stob  and  cp_btos.   Each file consists of 256 unicode runes
	      mapping codepage byte values to unicode  by  their  index.   The
	      runes are stored in the utf-8 encoding.

SOURCE
       /appl/lib/convcs/convcs.b
	      Implementation of the Convcs module.

       /appl/lib/convcs/cp_btos.b
	      Generic Btos codepage converter.

       /appl/lib/convcs/cp_stob.b
	      Generic Stob codepage converter.

SEE ALSO
       tcs(1), cfg(2)

								     CONVCS(2)
[top]

List of man pages available for Inferno

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome