lex man page on DigitalUNIX

lex(1)									lex(1)

NAME
       lex - Generates programs for lexical tasks

SYNOPSIS
       lex [-ct] [-n  | -v] [file...]

       [Tru64 UNIX]  The following syntax applies when the CMD_ENV environment
       variable is set to svr4:

       lex [-crt] [-n | -v] [-V] [-Qy | -Qn] [file...]

STANDARDS
       Interfaces  documented on this reference page conform to industry stan‐
       dards as follows:

       lex:  XPG4, XPG4-UNIX

       Refer to the standards(5) reference page	 for  more  information	 about
       industry standards and associated tags.

OPTIONS
       -c     Writes C code to the file lex.yy.c. This is the default.

       -n     Suppresses the statistics summary. When you set your own table
              sizes for the finite state machine, lex automatically produces
              this summary if you do not select this flag.

       -r     [Tru64 UNIX]  Writes RATFOR code to the file lex.yy.r. (There is
              no RATFOR compiler for Tru64 UNIX.)

       -t     Writes to standard output instead of writing to a file.

       -v     Provides a summary of the generated finite state machine
              statistics.

       -V     [Tru64 UNIX]  Outputs the lex version number to standard error.
              Requires the environment variable CMD_ENV to be set to svr4.

       -Qy | -Qn
              [Tru64 UNIX]  Determines whether the lex version number is
              written to the output file. The -Qn option does not write it
              and is the default. Requires the environment variable CMD_ENV
              to be set to svr4.

DESCRIPTION
       The lex command uses the rules and actions contained in file to	gener‐
       ate  a  program,	 lex.yy.c,  which can be compiled with the cc command.
       That program can then receive input, break the input into  the  logical
       pieces  defined	by  the	 rules in file, and run program fragments con‐
       tained in the actions in file.

       The generated program is a C Language function called yylex(). The  lex
       command	stores	yylex() in a file named lex.yy.c.  You can use yylex()
       alone to recognize simple, 1-word input, or you can use it with other C
       Language	 programs  to perform more difficult input analysis functions.
       For example, you can use lex to generate a program  that	 tokenizes  an
       input  stream  before  sending  it to a parser program generated by the
       yacc command.

       The yylex() function analyzes the input stream using a  program	struc‐
       ture  called  a finite state machine. This structure allows the program
       to exist in only one state (or condition) at a time.  A	finite	number
       of  states  are	allowed.  The  rules in file determine how the program
       moves from one state to another in response to the input that the  pro‐
       gram receives.

       The  lex	 command reads its skeleton finite state machine from the file
       /usr/ccs/lib/ncpform or /usr/ccs/lib/ncform. Use the environment	 vari‐
       able LEXER to specify another location for lex to read from.

       If  you do not specify a file, lex reads standard input. It treats mul‐
       tiple files as a single file.

   Input File Format
       The input file can contain three	 sections:   definitions,  rules,  and
       user  subroutines.  Each section must be separated from the others by a
       line containing only the delimiter, %%.  The format is as follows:

              definitions
              %%
              rules
              %%
              user_subroutines

       The purpose and format of each of these sections	 are  described	 under
       the headings that follow.

   Definitions Section
       If you want to use variables in rules, you must define them in the
       definitions section. The variables make up the left column, and their
       definitions make up the right column.  For example, to define D as a
       numerical digit, enter:

              D    [0-9]

       You can use a defined variable in the rules section  by	enclosing  the
       variable name in braces, {D}.

       In the definitions section, you can set either of the following two
       mutually exclusive declarations:

       %array      Declare the type of yytext to be a null-terminated
                   character array.

       %pointer    Declare the type of yytext to be a pointer to a
                   null-terminated character string. Use of the %pointer
                   definition selects the /usr/ccs/lib/ncpform skeleton.

       In the definitions section, you can also set table sizes for the
       resulting finite state machine. The default sizes are large enough for
       small programs.  You may want to set larger sizes for more complex
       programs:

       %p number   Number of positions is number (default 5000)
       %n number   Number of states is number (default 2500)
       %e number   Number of parse tree nodes is number (default 2000)
       %a number   Number of transitions is number (default 5000)
       %k number   Number of packed character classes is number (default 2000)
       %o number   Number of output slots is number (default 5000)

       If extended characters appear in regular expression  strings,  you  may
       need  to reset the output array size with the %o parameter (possibly to
       array sizes in the range 10,000 to 20,000).  This  reset	 reflects  the
       much  larger  number  of	 extended characters relative to the number of
       ASCII characters.

   Rules Section
       The rules section is required, and it must be preceded by the %% delim‐
       iter,  even  if	you do not have a definitions section. The lex command
       does not recognize rules without the delimiter.

       In this section, the left column contains the pattern to be  recognized
       in  an  input file to yylex().  The right column contains the C program
       fragment executed when that pattern is recognized.

       Patterns can include extended characters with one  exception:  extended
       characters  may	not  appear  in	 range specifications within character
       class expressions surrounded by brackets.

       The columns are separated by a tab. For example, to search files for
       the word LEAD and replace it with GOLD, perform the following steps:

       Create a file called transmute.l containing the lines:

              %%
              (LEAD)  printf("GOLD");

       Then issue the following commands to the shell:

              lex transmute.l
              cc -o transmute lex.yy.c -ll

       You can test the resulting program with the command:

              transmute <transmute.l

       This  command  echoes the contents of transmute.l, with the occurrences
       of LEAD changed to GOLD.
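
       The effect of the transmute rules can be approximated in plain C.  The
       following is only a sketch of the behavior, not the code that lex
       generates; the function name transmute() and its buffer convention are
       assumptions made for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of lex: copies in to out, replacing each
 * occurrence of "LEAD" with "GOLD" and echoing every other character.
 * The caller must supply an out buffer of at least strlen(in) + 1 bytes. */
void transmute(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (strncmp(in, "LEAD", 4) == 0) {
            memcpy(out, "GOLD", 4);   /* action for the (LEAD) rule */
            in += 4;
            out += 4;
        } else {
            *out++ = *in++;           /* unmatched input is echoed as is */
        }
    }
    *out = '\0';
}
```

       Calling transmute("LEAD pipe", buf) leaves the string GOLD pipe in
       buf.  Like the lex rule, the sketch also rewrites LEAD where it occurs
       inside a longer word.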

       Each pattern may have a corresponding action, that is, a fragment of  C
       source  code  to	 execute  when the pattern is matched.	Each statement
       must end with a ; (semicolon).  If you use more than one	 statement  in
       an action, you must enclose all of them in {} (braces). A second delim‐
       iter, %%, must follow the rules section if you have a  user  subroutine
       section.

       When  yylex()  matches  a  string  in  the  input stream, it copies the
       matched text to an external character array, yytext, before it executes
       any actions in the rules section.

       You can use the following operators to form patterns that you want to
       match:

       x, y     Matches the characters written.

       [ ]      Matches any one character in the enclosed range ([.-.]) or
                the enclosed list ([...]).  [abcx-z] matches a, b, c, x, y,
                or z.

       " "      Matches the enclosed character or string even if it is an
                operator.  "$" prevents lex from interpreting the $ character
                as an operator.

       \        Acts the same as double quotes.  \$ prevents lex from
                interpreting the $ character as an operator.

       *        Matches zero or more occurrences of the single-character
                regular expression immediately preceding it.  x* matches zero
                or more repeated literal characters x.

       +        Matches one or more occurrences of the single-character
                regular expression immediately preceding it.

       ?        Matches either zero or one occurrence of the single-character
                regular expression immediately preceding it.

       ^        Matches the character only at the beginning of a line.  ^x
                matches an x at the beginning of a line.

       [^]      Matches any character except for the characters following
                the ^.  [^xyz] matches any character but x, y, or z.

       .        Matches any character except the newline character.

       $        Matches the end of a line.

       |        Matches either of two characters.  x|y matches either x or y.

       /        Matches one extended regular expression (ERE) only when
                followed by a second ERE. It reads only the first ERE into
                yytext.  Given the regular expression a*b/cc and the input
                aaabcc, yytext would contain the string aaab on this match.

       ( )      Matches the pattern in the ( ) (parentheses). This is used
                for grouping. It reads the whole pattern into yytext. A group
                in parentheses can be used in place of any single character
                in any other pattern.  (xyz123) matches the pattern xyz123
                and reads the whole string into yytext.

       { }      Matches the character as defined in the definitions section.
                If D is defined as numeric digits, {D} matches all numeric
                digits.

       {m,n}    Matches m-to-n occurrences of the specified character.
                x{2,4} matches 2, 3, or 4 occurrences of x.
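
       The trailing-context operator (/) is the least obvious of these
       operators, because the second expression must be present but is not
       copied to yytext.  The following C sketch emulates the pattern a*b/cc
       by hand; the function name match_ab_before_cc() is an assumption made
       for this illustration and is not something lex provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: emulates the pattern a*b/cc at the start of s.
 * Returns the number of characters that would be placed in yytext
 * (the a*b part only), or 0 if the pattern does not match. */
size_t match_ab_before_cc(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] == 'a')                     /* a*  : zero or more a */
        i++;
    if (s[i] != 'b')                        /* b   : exactly one b */
        return 0;
    i++;
    if (s[i] != 'c' || s[i + 1] != 'c')     /* /cc : required trailing context */
        return 0;
    return i;                               /* cc stays in the input stream */
}
```

       For the input aaabcc the function returns 4, the length of aaab.  The
       two c characters are checked but remain available to the next match.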

       If a line begins with only a space, lex copies it to the lex.yy.c  out‐
       put file. If the line is in the definitions section of file, lex copies
       it to the declarations section of lex.yy.c. If the line is in the rules
       section, lex copies it to the program code section of lex.yy.c.

   User Subroutines Section
       The lex library has three subroutines defined as macros that you can
       use in the rules:

       input()     Reads a character from yyin.

       unput()     Replaces a character after it is read.

       output()    Writes a character to yyout.

       You  can override these three macros by writing your own code for these
       routines in the user subroutines section. But if	 you  write  your  own
       routines,  you must undefine these macros in the definitions section as
       follows:

              %{
              #undef input
              #undef unput
              #undef output
              %}

       When you are using lex as a simple transformer/recognizer for stdin  to
       stdout piping, you can avoid writing the framework by using libl.a (the
       lex library). It has a main routine that calls yylex() for you.

       External names generated by lex all begin with the  prefix  yy,	as  in
       yyin, yyout, yylex, and yytext.

   Putting Spaces in an Expression
       Normally, spaces or tabs end a rule and, therefore, the expression that
       defines a rule.	However, you can enclose the spaces or tab  characters
       in  ""  (double	quotes)	 to include them in the expression. Use quotes
       around all spaces in expressions that are not already within sets of  [
       ] (brackets).

   Other Special Characters
       The  lex program recognizes many of the normal C language special char‐
       acters.	These character sequences are as follows:

       Sequence	  Meaning
       \n	  Newline
       \t	  Tab
       \b	  Backspace
       \\	  Backslash
       \digits	  The character whose encoding is represented
		  by the three-digit octal number
       \xdigits	  The character whose encoding is represented
		  by the hexadecimal integer

       Do not use the actual newline character in an expression.

       When using these special characters in an expression, you do  not  need
       to enclose them in quotes.  Every character, except these special char‐
       acters and the previously described operator symbols, is always a  text
       character.

   Matching Rules
       When  more than one expression can match the current input, lex chooses
       the longest match first.	 Among rules that match	 the  same  number  of
       characters, the rule that occurs first is chosen.  For example:

              integer   keyword action...;
              [a-z]+    identifier action...;

       If  the	preceding  rules  are  given in that order and integers is the
       input word, lex matches the  input  as  an  identifier  because	[a-z]+
       matches	eight  characters, while integer matches only seven.  However,
       if the input is integer, both rules match seven characters. The keyword
       rule is selected because it occurs first. A shorter input, such as int,
       does not match the expression rule integer and causes lex to select the
       rule identifier.
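
       The selection logic described above can be sketched in C for these two
       rules.  This is an illustration only: the function names are
       hypothetical, and a real lex scanner evaluates all rules at once with
       its finite state machine instead of testing them one by one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length matched by the literal rule "integer" at the start of s, else 0. */
static size_t match_keyword(const char *s)
{
    return strncmp(s, "integer", 7) == 0 ? 7 : 0;
}

/* Length matched by [a-z]+ at the start of s, else 0.
 * Assumes a character set in which a through z are contiguous. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;

    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Hypothetical helper: names the rule that would be selected for s. */
const char *select_rule(const char *s)
{
    size_t kw, id;

    assert(s != NULL);
    kw = match_keyword(s);
    id = match_identifier(s);
    if (kw == 0 && id == 0)
        return "none";
    /* Longest match wins; on a tie the rule listed first (keyword) wins. */
    return kw >= id ? "keyword" : "identifier";
}
```

       With these definitions, select_rule("integers") and select_rule("int")
       both return identifier, while select_rule("integer") returns keyword.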

   Matching a String with Wildcard Characters
       Because lex chooses the longest match first, do not use rules
       containing expressions like .* in cases such as the following:

              '.*'

       The preceding rule might seem like a good way to recognize a string  in
       single  quotes.	However, the lexical analyzer reads far ahead, looking
       for a distant single quote to complete the long match.	If  a  lexical
       analyzer	 with  such  a	rule  gets the following input, it matches the
       whole string:

       'first' quoted string here, 'second' here

       To find the smaller strings, first and second, use the following rule:

       '[^'\n]*'

       This rule stops after matching 'first'.
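
       The difference between the two rules can be seen by emulating each of
       them in C.  The function names below are assumptions made for this
       sketch; each function returns the length of the match at the start of
       its argument, or 0 if there is no match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length matched by '.*' at the start of s, or 0.
 * The dot does not match a newline, so the scan stops at end of line. */
size_t match_greedy(const char *s)
{
    size_t i, last = 0;

    assert(s != NULL);
    if (s[0] != '\'')
        return 0;
    for (i = 1; s[i] != '\0' && s[i] != '\n'; i++)
        if (s[i] == '\'')
            last = i;              /* remember the most distant quote */
    return last ? last + 1 : 0;
}

/* Hypothetical helper: length matched by '[^'\n]*' at the start of s, or 0. */
size_t match_careful(const char *s)
{
    size_t i = 1;

    assert(s != NULL);
    if (s[0] != '\'')
        return 0;
    while (s[i] != '\0' && s[i] != '\'' && s[i] != '\n')
        i++;                       /* stop at the first closing quote */
    return s[i] == '\'' ? i + 1 : 0;
}
```

       On the sample input line, match_careful() returns 7, the length of
       'first', while match_greedy() runs on to the closing quote of 'second'.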

       Errors of this type are not far-reaching because the . (dot) operator
       does not match a newline character.  Therefore, expressions like .*
       stop on the current line.  Do not try to defeat this with expressions
       like [.\n]+. The lexical analyzer tries to read the entire input file,
       and an internal buffer overflow occurs.

   Finding Strings within Strings
       The lex program partitions the input stream and does not search for all
       possible	 matches  of each expression.  Each character is accounted for
       once and only once.  For example, to count occurrences of both she  and
       he in an input text, try the following rules:

              she   s++;
              he    h++;
              \n    |
              .     ;

       The  last  two  rules  ignore  everything  besides he and she. However,
       because she includes he, lex does not recognize	the  instances	of  he
       that are included in she.

       To  override  this choice, use the REJECT action.  This directive tells
       lex to go to the next rule.  The lex command then adjusts the  position
       of  the	input  pointer	to where it was before the first rule was exe‐
       cuted, and executes the second choice rule. For example, to  count  the
       included instances of he, use the following rules:

              she   {s++; REJECT;}
              he    {h++; REJECT;}
              \n    |
              .     ;

       After counting the occurrences of she, lex rejects the input stream and
       then counts the occurrences of he. In  this  case,  you	can  omit  the
       REJECT  action  on  he  because	she includes he but not vice versa. In
       other cases, it may be difficult to determine  which  input  characters
       are in both classes.

       In general, REJECT is useful whenever the purpose of lex is not to par‐
       tition the input stream but to detect all examples of some items in the
       input,  and  the	 instances  of these items may overlap or include each
       other.
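
       The two counting strategies can be compared with a C sketch.  The
       function names are hypothetical, and the second function reproduces
       only the net effect of REJECT for these particular rules (every input
       position is tested against both words); it does not model how lex
       backs up its input pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts as the rules without REJECT do. Each
 * character belongs to exactly one match, so the he inside she is missed. */
void count_partitioned(const char *text, int *s, int *h)
{
    assert(text != NULL && s != NULL && h != NULL);
    *s = *h = 0;
    while (*text != '\0') {
        if (strncmp(text, "she", 3) == 0) {        /* longest match first */
            (*s)++;
            text += 3;
        } else if (strncmp(text, "he", 2) == 0) {
            (*h)++;
            text += 2;
        } else {
            text++;                                /* the \n and . rules */
        }
    }
}

/* Hypothetical helper: counts as the rules with REJECT do. After each
 * match the next rule is tried at the same position, so overlapping
 * occurrences are all counted. */
void count_with_reject(const char *text, int *s, int *h)
{
    assert(text != NULL && s != NULL && h != NULL);
    *s = *h = 0;
    for (; *text != '\0'; text++) {
        if (strncmp(text, "she", 3) == 0)
            (*s)++;
        if (strncmp(text, "he", 2) == 0)
            (*h)++;
    }
}
```

       For the text she said he, count_partitioned() reports one she and one
       he, while count_with_reject() reports one she and two occurrences of
       he.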

NOTES
       Because lex uses fixed names for intermediate and output files, you can
       have  only  one	lex-generated  program in a given directory. If the -t
       option is not specified, informational, error, and warning messages are
       written to stdout. If the -t option is specified, informational, error,
       and warning messages are written to stderr.

       [Tru64 UNIX]  The yytext array has a default  dimension	of  200,  con‐
       trolled	by  the	 constant  YYLMAX.  If the programmer needs to allow a
       larger array, the YYLMAX constant may  be  redefined  as	 follows  from
       within the lex command file:

              %{
              #undef YYLMAX
              #define YYLMAX 8192
              %}

       Two other arrays, yysbuf and yylstate, also use YYLMAX.

       The  lex	 program  can  be compiled as a C program with -std0, -std, or
       -std1 mode. It can also be compiled as a C++ program. If YY_NOPROTO  is
       defined	on  the	 compilation command line, function prototypes are not
       generated.

EXAMPLES
       The following command draws lex instructions from the file lexcommands
       and places the output in lex.yy.c:

              lex lexcommands

       The file lexcommands contains an example of a lex program that would
       be put into a lex command file.  The following program converts
       uppercase to lowercase, removes spaces at the end of a line, and
       replaces multiple spaces with single spaces:

              %%
              [A-Z]   putchar(tolower(yytext[0]));
              [ ]+$   ;
              [ ]+    putchar(' ');
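
       For comparison, the same transformation can be written by hand in C.
       This is a sketch under stated assumptions: the function name
       normalize() and its buffer convention are hypothetical, and the code
       treats a run of spaces as being at the end of a line only when a
       newline follows it.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: applies the three example rules to in and writes
 * the result to out, which must hold at least strlen(in) + 1 bytes. */
void normalize(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (*in == ' ') {
            in += strspn(in, " ");      /* [ ]+  : consume the whole run */
            if (*in != '\n')            /* [ ]+$ : drop a run before newline */
                *out++ = ' ';           /* otherwise emit a single space */
        } else if (*in >= 'A' && *in <= 'Z') {
            *out++ = (char)tolower((unsigned char)*in);   /* [A-Z] rule */
            in++;
        } else {
            *out++ = *in++;             /* unmatched input is echoed as is */
        }
    }
    *out = '\0';
}
```

       Given the line Hello, three spaces, World, two spaces, and a newline,
       normalize() produces hello world followed by the newline.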

ENVIRONMENT VARIABLES
       The following environment variables affect the behavior of lex:

       LANG        Provides a default value for the locale category variables
                   that are not set or null.

       LC_ALL      If set, overrides the values of all other locale
                   variables.

       LC_COLLATE  Determines the order in which output is sorted for the -x
                   option.

       LC_CTYPE    Determines the locale for the interpretation of byte
                   sequences as characters (single-byte or multi-byte) in
                   input parameters and files.

       LC_MESSAGES Determines the locale used to affect the format and
                   contents of diagnostic messages displayed by the command.

       NLSPATH     Determines the location of message catalogs for the
                   processing of LC_MESSAGES.

FILES
       libl.a
              Run-time library.

       /usr/ccs/lib/ncform
              Default C language skeleton finite state machine for lex.

       /usr/ccs/lib/ncpform
              Default C language skeleton finite state machine for lex,
              implemented with the pointer definition of yytext.

       Default RATFOR language skeleton finite state machine for lex.

SEE ALSO
       Commands:  yacc(1)

       Standards:  standards(5)

       Programming Support Tools

									lex(1)

Copyright (c) for man pages and the logo by the respective OS vendor.