glimpse man page on IRIX

     GLIMPSE(l)	     UNIX System V (October 11, 1995)	    GLIMPSE(l)

     NAME
	  glimpse 3.0 - search quickly through entire file systems

     OVERVIEW
	  Glimpse (which stands for GLobal IMPlicit SEarch) is an
	  indexing and query system that allows you to search through
	  all your files very quickly.	Glimpse supports most of
	  agrep's options (agrep is our powerful version of grep)
	  including approximate matching (e.g., finding misspelled
	  words), Boolean queries, and even some limited forms of
	  regular expressions. It is used in the same way, except that
	  you don't have to specify file names.	 So, if you are
	  looking for a needle anywhere in your file system, all you
	  have to do is say glimpse needle and all lines containing
	  needle will appear preceded by the file name.

	  To use glimpse you first need to index your files with
	  glimpseindex, which is typically run every night.
	  glimpseindex -o ~  will index everything at or below your
	  home directory.  See man glimpseindex for more details.

	  Glimpse is also available for HTTP servers, to provide
	  search of local data, as a set of tools called GlimpseHTTP.
	  See http://glimpse.cs.arizona.edu:1994/ghttp/ for more
	  information.

	  Glimpse includes all of agrep and can be used instead of
	  agrep by giving one or more file names at the end of the
	  command.  This will cause glimpse to ignore the index and
	  run agrep as usual.  For example, glimpse -1 pattern file is
	  the same as agrep -1 pattern file.  We added a new option to
	  agrep:  -r recursively searches the directory and everything
	  below it (see agrep options below); it is used only when
	  glimpse reverts to agrep.

	  Mail glimpse-request@cs.arizona.edu to be added to the
	  glimpse mailing list.	 Mail glimpse@cs.arizona.edu to report
	  bugs, ask questions, discuss tricks for using glimpse, etc.
	  (this is a moderated mailing list with very little traffic,
	  mostly announcements).  An HTML version of these manual
	  pages can be found at
	  http://glimpse.cs.arizona.edu:1994/glimpsehelp.html
	  and the glimpse developers' home page is at
	  http://glimpse.cs.arizona.edu:1994/

     SYNOPSIS
	  glimpse [ -(agrep's options) -C -F file_pattern -H directory
	  -J host_name -K port_number -L x -N -T directory -V -W -z ]
	  pattern

     INTRODUCTION
	  We start with simple ways to use glimpse and describe all
	  the options in detail later on.  Once an index is built,
	  using glimpseindex, searching for pattern is as easy as
	  saying

	  glimpse pattern

	  The output of glimpse is similar to that of agrep (or any
	  other grep), except that the name of the file containing the
	  match appears at the beginning of the line by default.  The
	  pattern can be any agrep legal pattern including a regular
	  expression or a Boolean query (e.g., searching for Tucson
	  AND Arizona is done by glimpse 'Tucson;Arizona').

	  The speed of glimpse depends mainly on the number and sizes
	  of the files that contain a match and only secondarily
	  on the total size of all indexed files.  If the
	  pattern is reasonably uncommon, then all matches will be
	  reported in a few seconds even if the indexed files total
	  500MB or more.  Some information on how glimpse works and a
	  reference to a detailed article are given below.

	  Most of agrep's (and other greps') options are supported,
	  including approximate matching.  For example,

	  glimpse -1 'Tuson;Arezona'

	  will output all lines containing both patterns allowing one
	  spelling error in any of the patterns (either insertion,
	  deletion, or substitution), which in this case is definitely
	  needed.
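
	  As a rough illustration of what "one spelling error" means
	  here, the Python sketch below counts insertions, deletions,
	  and substitutions between each pattern and each word on a
	  line.  It is not the algorithm used by glimpse or agrep
	  (which match substrings and use a much faster method); the
	  names edit_distance and line_matches are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def line_matches(line, patterns, max_errors):
    """Assumption: a pattern matches if some whitespace-separated word
    on the line is within max_errors of it. All patterns must match,
    mirroring the ';' (AND) operator."""
    words = line.split()
    return all(
        any(edit_distance(p, w) <= max_errors for w in words)
        for p in patterns
    )


print(line_matches("Tucson Arizona weather", ["Tuson", "Arezona"], 1))
print(line_matches("Tucson Arizona weather", ["Tuson", "Arezona"], 0))
```

	  With one error allowed the line is accepted; with zero
	  errors it is rejected, which is why -1 is needed for the
	  misspelled query above.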

	  glimpse -w -i 'parent'

	  specifies case insensitive (-i) and match on complete words
	  (-w).	 So 'Parent' and 'PARENT' will match, 'parent/child'
	  will match, but 'parenthesis' or 'parents' will not match.
	  (Starting at version 3.0, glimpse can be much faster when
	  these two options are specified, especially for very large
	  indexes.  You may want to set an alias especially for
	  "glimpse -w -i".)

	  The -F option provides a pattern that must match the file
	  name.	 For example,

	  glimpse -F '\.c$' needle

	  will find the pattern needle in all files whose name ends
	  with .c.  (Glimpse will first check its index to determine
	  which files may contain the pattern and then run agrep on
	  the file names to further limit the search.)	The -F option
	  should not be put at the end after the main pattern (e.g.,
	  "glimpse needle -F hay" is incorrect).

     DETAILED DESCRIPTION OF GLIMPSE
	  The use of glimpse is similar to that of agrep (or any other
	  grep), except that there is no need to specify file names.
	  Most of agrep's (and other greps') options are supported.  It
	  is important to keep in mind that the search is over many
	  files.  Using very common patterns may lead to a huge number
	  of matches.  Running glimpse a will work, but will take a
	  long time and will probably output all of the indexed files.
	  We start with the new options, and then list all of agrep's
	  original options (with some additional comments when
	  relevant).

     The New Options of Glimpse
	  -a   prints attribute names.	This option applies only to
	       structured data (used with glimpseindex -s); this
	       option was added to support the Harvest project.	 See
	       STRUCTURED QUERIES below for more information and also
	       http://harvest.cs.colorado.edu for more information
	       about the Harvest project.

	  -C   tells glimpse to send its queries to glimpseserver.
	       See man glimpseserver for more details.

	  -E   prints the lines in the index (as they appear in the
	       index) which match the pattern.	Used mostly for
	       debugging and maintenance of the index.

	  -F file_pattern
	       limits the search to those files whose name (including
	       the whole path) matches file_pattern.  If file_pattern
	       matches a directory, then all files with this directory
	       on their path will be considered.  To limit the search
	       to actual file names, use $ at the end of the pattern.
	       file_pattern can be a regular expression and even a
	       Boolean pattern.	 (Glimpse simply runs agrep
	       file_pattern on the list of file names obtained from
	       the index to filter the list.)  For example,

	       glimpse -F 'src#\.c$' needle

	       will search for needle in all .c files with src
	       somewhere along the path.  The -F file_pattern must
	       appear before the search pattern (e.g., glimpse needle
	       -F '\.c$' will not work).  It is possible to use some
	       of agrep's options when matching file names.  In this
	       case all options as well as the file_pattern should be
	       in quotes.  (-B and -v do not work very well as part of
	       a file_pattern.)	 For example,

	       glimpse -F '-1 gopherc' pattern

	       will allow one spelling error when matching gopherc to
	       the file names (so "gopherrc" and "gopher" will be
	       considered as well).

	       glimpse -F '-v \.c$' counter

	       will search for 'counter' in all files except for .c
	       files.
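
	       The filtering step described for -F can be pictured
	       with the following Python sketch.  It only models the
	       idea of running a pattern over the list of indexed file
	       names; the path list and the function filter_files are
	       hypothetical, and the glimpse wild card # is written as
	       the equivalent .* because Python's re module is used in
	       place of agrep.

```python
import re


def filter_files(paths, file_regex, invert=False):
    """Keep the paths whose full name matches file_regex.
    invert=True keeps the non-matching paths instead, like -v."""
    rx = re.compile(file_regex)
    return [p for p in paths if (rx.search(p) is not None) != invert]


# Hypothetical list of indexed file names.
paths = [
    "/home/u/src/main.c",
    "/home/u/src/notes.txt",
    "/home/u/doc/util.c",
]

# Counterpart of -F 'src#\.c$': .c files with src somewhere on the path.
print(filter_files(paths, r"src.*\.c$"))

# Counterpart of -F '-v \.c$': everything except .c files.
print(filter_files(paths, r"\.c$", invert=True))
```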

	  -H directory_name
	       searches for the index and the other .glimpse files in
	       directory_name.	The default is the home directory.
	       This option is useful, for example, if several
	       different indexes are maintained for different archives
	       (e.g., one for mail messages, one for source code, one
	       for articles).

	  -J host_name
	       used in conjunction with glimpseserver (-C) to connect
	       to one particular server.  See man glimpseserver for
	       more details.

	  -K port_number
	       used in conjunction with glimpseserver (-C) to connect
	       to one particular server at the specified TCP port
	       number.	See man glimpseserver for more details.

	  -L x | x:y | x:y:z
	       if one number is given, it is a limit on the total
	       number of matches.  Glimpse outputs only the first x
	       matches. If -l is used (i.e., only file names are
	       sought), then the limit is on the number of files;
	       otherwise, the limit is on the number of records.  If
	       two numbers are given (x:y), then y is an added limit
	       on the total number of files.  If three numbers are
	       given (x:y:z), then z is an added limit on the number
	       of matches per file.  If any of the x, y, or z is set
	       to 0, it means to ignore it (in other words 0 =
	       infinity in this case);	for example, -L 0:10 will
	       output all matches to the first 10 files that contain a
	       match.
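
	       The interaction of the three limits can be sketched in
	       Python as follows.  This is only a model of the rules
	       stated above for record output (the -l case is not
	       covered); parse_limit and apply_limit are hypothetical
	       names, and matches are assumed to arrive as
	       (file name, record) pairs in output order.

```python
def parse_limit(spec):
    """Parse 'x', 'x:y', or 'x:y:z' into (x, y, z).
    A missing or zero field means 'no limit'."""
    parts = [int(p) for p in spec.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected x, x:y, or x:y:z")
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def apply_limit(matches, spec):
    """Return the matches that survive the limits in spec."""
    total_limit, file_limit, per_file_limit = parse_limit(spec)
    out = []
    per_file = {}  # file name -> number of records emitted so far
    for name, record in matches:
        if total_limit and len(out) >= total_limit:
            break  # x: total number of matches reached
        if name not in per_file:
            if file_limit and len(per_file) >= file_limit:
                continue  # y: already emitting from enough files
            per_file[name] = 0
        if per_file_limit and per_file[name] >= per_file_limit:
            continue  # z: enough matches from this file
        per_file[name] += 1
        out.append((name, record))
    return out


matches = [("a.c", 1), ("a.c", 2), ("b.c", 1), ("c.c", 1), ("c.c", 2)]
print(apply_limit(matches, "3"))
print(apply_limit(matches, "0:2"))
print(apply_limit(matches, "0:0:1"))
```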

	  -N   searches only the index (so the search is faster).  If
	       the index was built with glimpseindex -o or -b, then
	       the result is the number of files that have a potential
	       match plus a prompt to ask
	       if you want to see the file names.  (If -y is used,
	       then there is no prompt and the names of the files will
	       be shown.)  This could be a way to get the matching
	       file names without even having access to the files
	       themselves.  However, because only the index is
	       searched, some potential matches may not be real
	       matches.	 In other words, with -N you will not miss any
	       file but you may get extra files.  For example, since
	       the index stores everything in lower case, a case-
	       sensitive query may match a file that has only a case-
	       insensitive match.  Boolean queries may match a file
	       that has all the keywords but not in the same line
	       (indexing with -b allows glimpse to figure out whether
	       the keywords are close, but it cannot figure out from
	       the index whether they are exactly on the same line or
	       in the same record without looking at the file).	 If
	       the index was not built with -o or -b, then this option
	       outputs the number of blocks matching the pattern. This
	       is useful as an indication of how long the search will
	       take.  All files are partitioned into usually 200-250
	       blocks.	The file .glimpse_statistics contains the
	       total number of blocks (or glimpse -N a will give a
	       pretty good estimate; only blocks with no occurrences
	       of 'a' will be missed).

	  -Q   an extension to -N that not only displays the filename
	       where the match occurs, but the exact occurrences
	       (offsets) as seen in the index.

	  -T directory
	       Use directory as a place where temporary files are
	       built.  (Glimpse produces some small temporary files
	       usually in /tmp.)  This option is useful mainly in the
	       context of structured queries for the Harvest project,
	       where the temporary files may be non-trivial.

	  -V   prints the current version of glimpse.

	  -W   The default for Boolean AND queries is that they cover
	       one record (the default for a record is one line) at a
	       time. For example, glimpse 'good;bad' will output all
	       lines containing both 'good' and 'bad'.	The -W option
	       changes the scope of Booleans to be the whole file.
	       Within a file glimpse will output all matches to any of
	       the patterns.  So, glimpse -W 'good;bad' will output
	       all lines containing 'good' or 'bad', but only in files
	       that contain both patterns.  For structured queries,
	       the scope is always the whole attribute or file.
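
	       The difference between the two scopes can be modeled
	       with a short Python sketch.  It assumes plain substring
	       matching and one record per line; the function names
	       are hypothetical and this is not glimpse's code.

```python
def and_by_record(lines, words):
    """Default scope: a line is output only if it contains
    every word."""
    return [ln for ln in lines if all(w in ln for w in words)]


def and_by_file(lines, words):
    """-W scope: the file qualifies only if every word occurs
    somewhere in it; then every line containing any of the
    words is output."""
    if not all(any(w in ln for ln in lines) for w in words):
        return []
    return [ln for ln in lines if any(w in ln for w in words)]


lines = ["good day", "bad luck", "good and bad", "neutral"]
print(and_by_record(lines, ["good", "bad"]))
print(and_by_file(lines, ["good", "bad"]))
```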

	  -z   Allow customizable filtering, using the file
	       .glimpse_filters to perform the programs listed there
	       for each match.	The best example is
	       compress/decompress.  If .glimpse_filters include the
	       line
	       *.Z   uncompress <
	       (separated by tabs) then before indexing any file that
	       matches the pattern "*.Z" (same syntax as the one for
	       .glimpse_exclude) the command listed is executed first
	       (assuming input is from stdin, which is why uncompress
	       needs <) and its output (assuming it goes to stdout) is
	       indexed.	 The file itself is not changed (i.e., it
	       stays compressed).  Then if glimpse -z is used, the
	       same program is used on these files on the fly.	Any
	       program can be used (we run 'exec').  For example, one
	       can filter out parts of files that should not be
	       indexed.	 Glimpseindex tries to apply all filters in
	       .glimpse_filters in the order they are given.  For
	       example, if you want to uncompress a file and then
	       extract some part of it, put the decompression command
	       (the example above) first and then another line that
	       specifies the extraction.  Note that this can slow down
	       the search because the filters need to be run before
	       files are searched.  (See also glimpseindex.)

     The Options of Agrep Supported by Glimpse
	  -#   # is an integer between 1 and 8 specifying the maximum
	       number of errors permitted in finding the approximate
	       matches (the default is zero).  Generally, each
	       insertion, deletion, or substitution counts as one
	       error.  It is possible to adjust the relative cost of
	       insertions, deletions and substitutions (see -I -D and
	       -S options).  Since the index stores only lower case
	       characters, errors of substituting upper case with
	       lower case may be missed (see LIMITATIONS).

	  -c   Display only the count of matching records.  Only files
	       with count > 0 are displayed.

	  -d 'delim'
	       Define delim to be the separator between two records.
	       The default value is '$', namely a record is by default
	       a line.	delim can be a string of size at most 8 (with
	       possible use of ^ and $), but not a regular expression.
	       Text between two delim's, before the first delim, and
	       after the last delim is considered as one record.  For
	       example, -d '$$' defines paragraphs as records and -d
	       '^From ' defines mail messages as records.  glimpse
	       matches each record separately.	This option does not
	       currently work with regular expressions.	 The -d option
	       is especially useful for Boolean AND queries, because
	       the patterns need not appear in the same line but in
	       the same record. For example, glimpse -F mail -d
	       '^From ' 'glimpse;arizona;announcement' will output all
	       mail messages (in their entirety) that have the 3
	       patterns anywhere in the message (or the header),
	       assuming that files with 'mail' in their name contain
	       mail messages.  If you want to output a whole file that
	       matches a Boolean pattern, you can use -d 'O9g1Xs' (or
	       another garbage pattern).  If the delimiter doesn't
	       appear anywhere, the whole file is one record (there is
	       a limit, however, to the size of records, see
	       LIMITATIONS).  Glimpse warning:  Use this option with
	       care.  If the delimiter is set to match mail messages,
	       for example, and glimpse finds the pattern in a regular
	       file, it may not find the delimiter and will therefore
	       output the whole file.  (The -t option - see below -
	       can be used to put the delim at the end of the record.)
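
	       To make the record idea concrete, the Python sketch
	       below splits a mailbox-like text into records at lines
	       beginning with "From " and applies a Boolean AND to
	       each record.  It is an approximation under stated
	       assumptions: the delimiter is treated as a plain line
	       prefix, matching is by substring, and the function
	       names are hypothetical.

```python
def split_records(text, delim_prefix="From "):
    """Split text into records. A new record starts at every line
    that begins with delim_prefix; text before the first delimiter
    forms its own record."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(delim_prefix) and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return records


def matching_records(text, words):
    """Return whole records that contain every word (AND)."""
    return [r for r in split_records(text) if all(w in r for w in words)]


mailbox = (
    "From alice\n"
    "glimpse announcement\n"
    "sent from arizona\n"
    "From bob\n"
    "glimpse only\n"
)
for record in matching_records(mailbox, ["glimpse", "arizona", "announcement"]):
    print(record, end="")
```

	       Only the first message is printed, in its entirety,
	       because the three words occur in the same record even
	       though they are on different lines.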

	  -e pattern
	       Same as a simple pattern argument, but useful when the
	       pattern begins with a `-'.

	  -h   Do not display filenames.

	  -i   Case-insensitive search - e.g., "A" and "a" are
	       considered equivalent.  Glimpse's index stores all
	       patterns in lower case (see LIMITATIONS below).

	  -k   No symbol in the pattern is treated as a meta
	       character. For example, glimpse -k 'a(b|c)*d' will find
	       the occurrences of a(b|c)*d whereas glimpse 'a(b|c)*d'
	       will find substrings that match the regular expression
	       'a(b|c)*d'.  (The only exception is ^ at the beginning
	       of the pattern and $ at the end of the pattern, which
	       are still interpreted in the usual way. Use \^ or \$ if
	       you need them verbatim.)

	  -l   Output only the file names that contain a match.

	  -n   Each matching record (line) is prefixed by its record
	       (line) number in the file.

	  -r   (This option is valid only when a file name is given
	       and glimpse is used as agrep; it is a new agrep
	       option.)	 If the file name is a directory name, glimpse
	       will search (recursively) the whole directory and
	       everything below it.  Glimpse will not use its index.

	  -s   Work silently, that is, display nothing except error
	       messages.  This is useful for checking the error
	       status.

	  -t   Output the record starting from the end of delim to
	       (and including) the next delim. This is useful for
	       cases where delim should come at the end of the record.
	       (See warning for the -d option.)

	  -w   Search for the pattern as a word - i.e., surrounded by
	       non-alphanumeric characters.  For example, glimpse -w
	       -1 car will match cars, but not characters and not
	       car10.  The non-alphanumeric must surround the match;
	       they cannot be counted as errors.  This option does not
	       work with regular expressions.
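
	       The word rule (a match must be surrounded by non-
	       alphanumeric characters) can be approximated in Python
	       as shown below, here combined with case insensitivity
	       as in the "glimpse -w -i 'parent'" example from the
	       INTRODUCTION.  The sketch covers exact matching only
	       (no errors) and word_match is a hypothetical name.

```python
import re


def word_match(pattern, text):
    """True if pattern occurs in text with no letter or digit
    immediately before or after it, ignoring case."""
    regex = (
        r"(?<![A-Za-z0-9])" + re.escape(pattern) + r"(?![A-Za-z0-9])"
    )
    return re.search(regex, text, re.IGNORECASE) is not None


for text in ["Parent", "PARENT", "parent/child", "parenthesis", "parents"]:
    print(text, word_match("parent", text))
```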

	  -x   The pattern must match the whole line.  (This option is
	       translated to -w when the index is searched and it is
	       used only when the actual text is searched.  It is of
	       limited use in glimpse.)

	  -y   Do not prompt.  Proceed with the match as if the answer
	       to any prompt is y.

	  -B   Best match mode.	 (Warning: -B sometimes misses
	       matches.	 It is safer to specify the number of errors
	       explicitly.)  When -B is specified and no exact matches
	       are found, glimpse will continue to search until the
	       closest matches (i.e., the ones with minimum number of
	       errors) are found, at which point the following message
	       will be shown:  "the best match contains x errors,
	       there are y matches, output them? (y/n)" This message
	       refers to the number of matches found in the index.
	       There may be many more matches in the actual text (or
	       there may be none if -F is used to filter files).  When
	       the -#, -c, or -l options are specified, the -B option
	       is ignored.  In general, -B may be slower than -#, but
	       not by very much.  Since the index stores only lower
	       case characters, errors of substituting upper case with
	       lower case may be missed (see LIMITATIONS).

	  -Dk  Set the cost of a deletion to k (k is a positive
	       integer).  This option does not currently work with
	       regular expressions.

	  -G   Output the (whole) files that contain a match.

	  -Ik  Set the cost of an insertion to k (k is a positive
	       integer).  This option does not currently work with
	       regular expressions.

	  -Sk  Set the cost of a substitution to k (k is a positive
	       integer).  This option does not currently work with
	       regular expressions.
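
	       The effect of the -D, -I, and -S costs can be
	       illustrated with a weighted edit distance.  The Python
	       sketch below is a textbook dynamic program, not agrep's
	       implementation; which side counts as an "insertion"
	       versus a "deletion" is an assumption made here, and
	       weighted_distance is a hypothetical name.

```python
def weighted_distance(pattern, text, ins=1, dele=1, sub=1):
    """Cheapest way to turn pattern into text when each insertion,
    deletion, and substitution has its own cost."""
    prev = [j * ins for j in range(len(text) + 1)]
    for i, cp in enumerate(pattern, start=1):
        curr = [i * dele]
        for j, ct in enumerate(text, start=1):
            curr.append(min(
                prev[j] + dele,      # delete cp from the pattern
                curr[j - 1] + ins,   # insert ct into the pattern
                prev[j - 1] + (0 if cp == ct else sub),
            ))
        prev = curr
    return prev[-1]


print(weighted_distance("cat", "cut"))          # one substitution
print(weighted_distance("cat", "cut", sub=3))   # delete + insert is cheaper
print(weighted_distance("cat", "cart", ins=2))  # one insertion at cost 2
```

	       Raising the substitution cost above the combined cost
	       of a deletion and an insertion makes the cheaper pair
	       of operations win, so the second call reports 2 rather
	       than 3.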

	  The characters `$', `^', `*', `[', `]', `|', `(', `)',
	  `!', and `\' can cause unexpected results when included in
	  the pattern, as these characters are also meaningful to the
	  shell.  To avoid these problems, enclose the entire pattern
	  in single quotes, i.e., 'pattern'.  Do not use double quotes
	  (").

     PATTERNS
	  glimpse supports a large variety of patterns, including
	  simple strings, strings with classes of characters, sets of
	  strings, wild cards, and regular expressions (see
	  LIMITATIONS).

	  Strings
	       Strings are any sequence of characters, including the
	       special symbols `^' for beginning of line and `$' for
	       end of line.  The following special characters ( `$',
	       `^', `*', `[', `|', `(', `)', `!', and `\' ) as
	       well as the following meta characters special to
	       glimpse (and agrep):  `;', `,', `#', `<', `>', `-', and
	       `.', should be preceded by `\' if they are to be
	       matched as regular characters.  For example, \^abc\\
	       corresponds to the string ^abc\, whereas ^abc
	       corresponds to the string abc at the beginning of a
	       line.

	  Classes of characters
	       A list of characters inside [] (in order) corresponds
	       to any character from the list.  For example, [a-ho-z]
	       is any character between a and h or between o and z.
	       The symbol `^' inside [] complements the list.  For
	       example, [^i-n] denotes any character in the character
	       set except 'i' through 'n'.  The symbol `^' thus
	       has two meanings, but this is consistent with egrep.
	       The symbol `.' (don't care) stands for any symbol
	       (except for the newline symbol).

	  Boolean operations
	       Glimpse supports an `AND' operation denoted by the
	       symbol `;', an `OR' operation denoted by the symbol `,',
	       and any combination of the two. For example, glimpse
	       'pizza;cheeseburger' will output all lines containing
	       both patterns.  glimpse -F 'gnu;\.c$' 'define;DEFAULT'
	       will output all lines containing both 'define' and
	       'DEFAULT' (anywhere in the line, not necessarily in
	       order) in files whose name contains 'gnu' and ends with
	       .c.  glimpse '{political,computer};science' will match
	       'political science' or 'science of computers'.
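
	       A small Python evaluator makes the combination rules
	       easier to see.  It is a sketch under assumptions that
	       the text above does not spell out: terms are matched as
	       plain substrings, {} is used only for grouping, and `;'
	       binds more loosely than `,'.  The function names are
	       hypothetical.

```python
def split_top_level(expr, sep):
    """Split expr on sep, ignoring separators inside {...}."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return parts


def matches(expr, line):
    """Evaluate a Boolean pattern against one line."""
    and_terms = split_top_level(expr, ";")
    if len(and_terms) > 1:
        return all(matches(t, line) for t in and_terms)
    or_terms = split_top_level(expr, ",")
    if len(or_terms) > 1:
        return any(matches(t, line) for t in or_terms)
    if expr.startswith("{") and expr.endswith("}"):
        return matches(expr[1:-1], line)  # strip one grouping level
    return expr in line


query = "{political,computer};science"
for line in ["political science", "science of computers", "political theory"]:
    print(line, matches(query, line))
```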

	  Wild cards
	       The symbol '#' is used to denote a sequence of any
	       number (including 0) of arbitrary characters (see
	       LIMITATIONS). The symbol # is equivalent to .* in
	       egrep.  In fact, .* will work too, because it is a
	       valid regular expression (see below), but unless this
	       is part of an actual regular expression, # will work
	       faster. (Currently glimpse is experiencing some
	       problems with #.)

	  Combination of exact and approximate matching
	       Any pattern inside angle brackets <> must match the
	       text exactly even if the match is with errors.  For
	       example, <mathemat>ics matches mathematical with one
	       error (replacing the last s with an a), but
	       mathe<matics> does not match mathematical no matter how
	       many errors are allowed.	 (This option is buggy at the
	       moment.)

	  Regular expressions
	       Since the index is word based, a regular expression
	       must match words that appear in the index for glimpse
	       to find it.  Glimpse first strips all non-alphabetic
	       characters from the regular expression, and
	       searches the index for all remaining words.  It then
	       applies the regular expression matching algorithm to
	       the files found in the index.  For example, glimpse
	       'abc.*xyz' will search the index for all files that
	       contain both 'abc' and 'xyz', and then search directly
	       for 'abc.*xyz' in those files.  (If you use glimpse -w
	       'abc.*xyz', then 'abcxyz' will not be found, because
	       glimpse will think that abc and xyz need to match
	       whole words.)  The syntax of regular expressions in
	       glimpse is in general the same as that for agrep.  The
	       union operation `|', Kleene closure `*', and
	       parentheses () are all supported.  Currently '+' is not
	       supported.  Regular expressions are currently limited
	       to approximately 30 characters (generally excluding
	       meta characters).  Some options (-d, -w, -t, -x, -D,
	       -I, -S) do not currently work with regular expressions.
	       The maximal number of errors for regular expressions
	       that use '*' or '|' is 4. (See LIMITATIONS.)
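
	       The two-phase procedure described above (consult the
	       word index, then run the regular expression on the
	       candidate files) can be sketched in Python.  This is a
	       toy model: the in-memory files dictionary, the index
	       layout, and the substring lookup of index words are all
	       assumptions, and Python's re module stands in for
	       agrep.

```python
import re


def build_index(files):
    """Map each lower-cased alphabetic word to the files containing it."""
    index = {}
    for name, text in files.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index


def candidate_files(files, index, regex):
    """Phase 1: keep files whose indexed words contain every
    alphabetic fragment of the regular expression."""
    candidates = set(files)
    for fragment in re.findall(r"[A-Za-z]+", regex):
        found = set()
        for word, names in index.items():
            if fragment.lower() in word:
                found |= names
        candidates &= found
    return candidates


def search(files, index, regex):
    """Phase 2: run the real regular expression on the candidates."""
    rx = re.compile(regex)
    return sorted(
        name for name in candidate_files(files, index, regex)
        if any(rx.search(line) for line in files[name].splitlines())
    )


files = {
    "a.txt": "abc then xyz\n",
    "b.txt": "abc only\n",
    "c.txt": "xyz before abc\n",
    "d.txt": "abcxyz\n",
}
index = build_index(files)
print(sorted(candidate_files(files, index, "abc.*xyz")))
print(search(files, index, "abc.*xyz"))
```

	       Here c.txt survives the index phase because it contains
	       both words, but it is dropped in the second phase
	       because the words appear in the wrong order.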

	  Structured queries
	       Glimpse supports some form of structured queries using
	       Harvest's SOIF format.  See STRUCTURED QUERIES below
	       for details.

     EXAMPLES
	  (Run "glimpse '^glimpse' this-file" to get a list of all
	  examples, some of which were given earlier.)

	  glimpse -F 'haystack.h$' needle
	       finds all needles in all haystack.h files.

	  glimpse -2 -F html Anestesiology
	       outputs all occurrences of Anestesiology with two
	       errors in files with html somewhere in their full name.

	  glimpse -l -F '.c$' variablename
	       lists the names of all .c files that contain
	       variablename (the -l option lists file names rather
	       than output the matched lines).

	  glimpse -F 'mail;1993' 'windsurfing;Arizona'
	       finds all lines containing windsurfing and Arizona in
	       all files having `mail' and '1993' somewhere in their
	       full name.

	  glimpse -F mail 't.j@#uk'
	       finds all mail addresses (search only files with mail
	       somewhere in their name) from the uk, where the login
	       name ends with t.j, where the . stands for any one
	       character. (This is very useful to find a login name of
	       someone whose middle name you don't know.)

	  glimpse -F mbox -h -G	 . > MBOX
	       concatenates all files whose name matches `mbox' into
	       one big one.

     SEARCHING IN COMPRESSED FILES
	  Glimpse includes an optional new compression program, called
	  cast, which allows glimpse (and agrep) to search the
	  compressed files without having to decompress them.  The
	  search is actually significantly faster when the files are
	  compressed.  However, we have not tested cast as thoroughly
	  as we would have liked, and a mishap in a compression
	  algorithm can cause loss of data, so at this point we
	  recommend using cast very carefully.  (Unless you specifically
	  use cast, the default is to ignore it.)

     GLIMPSEINDEX FILES
	  All files used by glimpse are located in the directory (or
	  directories) where the indexes are stored and have .glimpse_
	  as a prefix.  The first two files (.glimpse_exclude and
	  .glimpse_include) are optionally supplied by the user.  The
	  other files are built and read by glimpse.

	  .glimpse_exclude
	       contains a list of files that glimpseindex is
	       explicitly told to ignore. In general, the syntax of
	       .glimpse_exclude/include is the same as that of agrep
	       (or any other grep).  The lines in the .glimpse_exclude
	       file are matched to the file names, and if they match,
	       the files are excluded.	Notice that agrep matches to
	       parts of the string!  e.g., agrep /ftp/pub will match
	       /home/ftp/pub and /ftp/pub/whatever.  So, if you want
	       to exclude /ftp/pub/core, you just list it, as is, in
	       the .glimpse_exclude file.  If you put
	       "/home/ftp/pub/cdrom" in .glimpse_exclude, every file
	       name that matches that string will be excluded, meaning
	       all files below it.  You can use ^ to indicate the
	       beginning of a file name, and $ to indicate the end of
	       one, and you can use * and ? in the usual way.  For
	       example /ftp/*html will exclude /ftp/pub/foo.html, but
	       will also exclude /home/ftp/pub/html/whatever;  if you
	       want to exclude files that start with /ftp and end with
	       html use ^/ftp*html$ Notice that putting a * at the
	       beginning or at the end is redundant (in fact, in this
	       case glimpseindex will remove the * when it does the
	       indexing).  No other meta characters are allowed in
	       .glimpse_exclude (e.g., don't use .* or # or |).	 Lines
	       with * or ? must have no more than 30 characters.
	       Notice that, although the index itself will not be
	       indexed, the list of file names (.glimpse_filenames)
	       will be indexed unless it is explicitly listed in
	       .glimpse_exclude.
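
	       The matching rules for .glimpse_exclude lines can be
	       approximated by translating each line into a regular
	       expression and searching for it anywhere in the file
	       name.  The Python sketch below does this under the
	       rules stated above (^ and $ as anchors, * and ? as wild
	       cards, everything else literal); exclude_regex and
	       is_excluded are hypothetical names, not part of
	       glimpseindex.

```python
import re


def exclude_regex(line):
    """Translate one exclude line into a compiled regular expression."""
    anchored_start = line.startswith("^")
    anchored_end = line.endswith("$")
    core = line
    if anchored_start:
        core = core[1:]
    if anchored_end:
        core = core[:-1]
    body = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in core
    )
    return re.compile(
        ("^" if anchored_start else "") + body + ("$" if anchored_end else "")
    )


def is_excluded(path, exclude_lines):
    """True if any exclude line matches some part of the path."""
    return any(exclude_regex(line).search(path) for line in exclude_lines)


print(is_excluded("/ftp/pub/foo.html", ["/ftp/*html"]))
print(is_excluded("/home/ftp/pub/html/whatever", ["/ftp/*html"]))
print(is_excluded("/home/ftp/pub/html/whatever", ["^/ftp*html$"]))
```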

	  .glimpse_filters
	       See the description above for the -z option.

	  .glimpse_include
	       contains a list of files that glimpseindex is
	       explicitly told to include in the index even though
	       they may look like non-text files.  Symbolic links are
	       followed by glimpseindex only if they are specifically
	       included here.  If a file is in both .glimpse_exclude
	       and .glimpse_include it will be excluded.

	  .glimpse_filenames
	       contains the list of all indexed file names, one per
	       line.  This is an ASCII file that can also be used with
	       agrep to search for a file name leading to a fast find
	       command.	 For example,
	       glimpse 'count#\.c$' ~/.glimpse_filenames
	       will output the names of all (indexed) .c files that
	       have 'count' in their name (including anywhere on the
	       path from the index).  Setting the following alias in
	       the .login file may be useful:
	       alias findfile 'glimpse -h \!:1 ~/.glimpse_filenames'

	  .glimpse_index
	       contains the index.  The index consists of lines, each
	       starting with a word followed by a list of block
	       numbers (unless the -o or -b options are used, in which
	       case each word is followed by an offset into the file
	       .glimpse_partitions where all pointers are kept).  The
	       block/file numbers are stored in binary form, so this
	       is not an ASCII file.

	  .glimpse_messages
	       contains the output of the -w option (see above).

	  .glimpse_partitions
	       contains the partition of the indexed space into blocks
	       and, when the index is built with the -o or -b options,
	       some part of the index.	This file is used internally
	       by glimpse and it is a non-ASCII file.

	  .glimpse_statistics
	       contains some statistics about the makeup of the index.
	       Useful for some advanced applications and customization
	       of glimpse.

	  .glimpse_turbo
	       An added data structure (used under glimpseindex -o or
	       -b only) that helps to speed up queries significantly
	       for large indexes.  Its size is 0.25MB.	Glimpse will
	       work without it if needed.

     STRUCTURED QUERIES
	  Glimpse can search for Boolean combinations of
	  "attribute=value" terms by using the Harvest SOIF parser
	  library (in glimpse/libtemplate). To search this way, the
	  index must be made by using the -s option of glimpseindex
	  (this can be used in conjunction with other glimpseindex
	  options). For glimpse and glimpseindex to recognize
	  "structured" files, they must be in SOIF format. In this
	  format, each value is prefixed by an attribute-name with the
	  size of the value (in bytes) present in "{}" after the name
	  of the attribute. For example, the following lines are part
	  of an SOIF file:
	  type{17}:	  Directory-Listing
	  md5{32}:	  3858c73d68616df0ed58a44d306b12ba
	  Any string can serve as an attribute name.  The query glimpse
	  "pattern;type=Directory-Listing" will search for "pattern"
	  only in files whose type is "Directory-Listing".  The file
	  itself is considered to be one "object" and its name/url
	  appears as the first attribute with an "@" prefix; e.g.,
	  @FILE { http://xxx... }.  The scope of Boolean operations
	  changes from records (lines) to whole files when structured
	  queries are used in glimpse (since individual query terms
	  can look at different attributes and they may not be
	  "covered" by the record/line).  Note that glimpse can only
	  search for patterns in the value parts of the SOIF file:
	  there are some attributes (like the TTL, MD5, etc.) that are
	  interpreted by Harvest's internal routines.  See
	  http://harvest.cs.colorado.edu/harvest/user-manual/ for more
	  detailed information on the SOIF format.
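
	  The attribute layout described above can be sketched in a
	  few lines of Python.  This is an illustration only, not the
	  Harvest parser library: it assumes each attribute is written
	  as name, byte size in braces, a colon, and one tab before
	  the value, and the function name parse_soif_attributes is
	  hypothetical.

```python
import re

# Matches one attribute header such as b"type{17}:\t".
# Assumption: name, byte count in braces, colon, then a single tab.
ATTR_RE = re.compile(rb"([^\s{}]+)\{(\d+)\}:\t")


def parse_soif_attributes(data: bytes) -> dict:
    """Return {attribute name: value}, trusting the declared byte sizes."""
    attrs = {}
    pos = 0
    while pos < len(data):
        m = ATTR_RE.match(data, pos)
        if m is None:
            # Not an attribute header: skip ahead to the next line.
            nl = data.find(b"\n", pos)
            if nl == -1:
                break
            pos = nl + 1
            continue
        size = int(m.group(2))
        start = m.end()
        # The value is exactly `size` bytes, so it may span several lines.
        value = data[start:start + size]
        name = m.group(1).decode("utf-8", "replace")
        attrs[name] = value.decode("utf-8", "replace")
        pos = start + size
    return attrs


sample = (
    b"type{17}:\tDirectory-Listing\n"
    b"md5{32}:\t3858c73d68616df0ed58a44d306b12ba\n"
)
parsed = parse_soif_attributes(sample)
```

	  Reading the value by its declared size, rather than up to
	  the end of the line, is what lets a value contain newlines.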

     REFERENCES
	  1.   U. Manber and S. Wu, "GLIMPSE: A Tool to Search Through
	       Entire File Systems," Usenix Winter 1994 Technical
	       Conference, San Francisco (January 1994), pp. 23-32.
	       Also, Technical Report #TR 93-34, Dept. of Computer
	       Science, University of Arizona, October 1993 (a
	       postscript file is available by anonymous ftp at
	       cs.arizona.edu:reports/1993/TR93-34.ps).

	  2.   S. Wu and U. Manber, "Fast Text Searching Allowing
	       Errors," Communications of the ACM 35 (October 1992),
	       pp. 83-91.

     SEE ALSO
	  agrep(1), ed(1), ex(1), glimpseindex(1), glimpseserver(1),
	  grep(1), sh(1), csh(1).

     LIMITATIONS
	  The index of glimpse is word based.  A pattern that contains
	  more than one word cannot be found in the index.  The way
	  glimpse overcomes this weakness is by splitting any multi-
	  word pattern into its set of words and looking for all of
	  them in the index.  For example, glimpse 'linear
	  programming' will first consult the index to find all files
	  containing both linear and programming, and then apply agrep
	  to find the combined pattern.	 This is usually an effective
	  solution, but it can be slow for cases where both words are
	  very common, but their combination is not.
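
	  The two-phase strategy described above can be sketched in
	  Python as follows.  This is a toy model under stated
	  assumptions, not glimpse's implementation: the index is a
	  plain in-memory dictionary, words are runs of letters and
	  digits, and the names build_word_index and search_phrase
	  are hypothetical.

```python
import re
from collections import defaultdict


def build_word_index(files: dict) -> dict:
    """Map each lower-cased word to the set of file names containing it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in re.findall(r"[A-Za-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search_phrase(phrase: str, files: dict, index: dict) -> list:
    """Phase 1: intersect the per-word file sets from the index.
    Phase 2: scan only those candidate files for the full phrase."""
    words = re.findall(r"[A-Za-z0-9]+", phrase.lower())
    if not words:
        return []
    candidates = set(files)
    for word in words:
        candidates &= index.get(word, set())
    hits = []
    for name in sorted(candidates):
        for line in files[name].splitlines():
            if phrase in line:
                hits.append((name, line))
    return hits


files = {
    "a.txt": "linear programming is a classic topic\n",
    "b.txt": "linear algebra first\nthen functional programming\n",
    "c.txt": "nothing relevant here\n",
}
index = build_word_index(files)
hits = search_phrase("linear programming", files, index)
```

	  Here b.txt contains both words but not the phrase, so it is
	  scanned in the second phase without producing a match.  When
	  many files look like b.txt, the second phase dominates the
	  running time.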

	  As was mentioned in the section on PATTERNS above, some
	  characters serve as meta characters for glimpse and need to
	  be preceded by '\' to search for them.  The most common
	  examples are the characters '.' (which stands for a wild
	  card), and '*' (the Kleene closure).	So, "glimpse ab.de"
	  will match abcde, but "glimpse ab\.de" will not, and
	  "glimpse ab*de" will not match ab*de, but "glimpse ab\*de"
	  will.	 The meta character - is translated automatically to a
	  hyphen unless it appears between [] (in which case it denotes
	  a range of characters).
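
	  The four cases above can be reproduced with Python's re
	  module, used here only as a stand-in because it treats '.'
	  and '*' the same way in these examples.  It is not glimpse's
	  matcher, and other meta characters differ between the two.

```python
import re

# '.' is a wild card, so it matches the 'c' in "abcde".
wildcard_hit = re.search(r"ab.de", "abcde") is not None

# An escaped '.' only matches a literal dot, which "abcde" lacks.
literal_dot_hit = re.search(r"ab\.de", "abcde") is not None

# 'b*' means zero or more b's, so the literal '*' in the text blocks a match.
closure_hit = re.search(r"ab*de", "ab*de") is not None

# An escaped '*' matches the literal asterisk.
literal_star_hit = re.search(r"ab\*de", "ab*de") is not None
```

	  When such a pattern is typed at a shell prompt, it generally
	  needs single quotes so that the shell passes the backslash
	  through unchanged.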

	  The index of glimpse stores all patterns in lower case.
	  When glimpse searches the index it first converts all
	  patterns to lower case, finds the appropriate files, and
	  then searches the actual files using the original patterns.
	  So, for example, glimpse ABCXYZ will first find all files
	  containing abcxyz in any combination of lower and upper
	  cases, and then search these files directly, so only
	  matches in the right case will be found.  One problem with
	  this approach is discovering misspellings that are caused
	  by wrong cases.
	  For example, glimpse -B abcXYZ will first search the index
	  for the best match to abcxyz (because the pattern is
	  converted to lower case); it will find that there are
	  matches with no errors, and will go to those files to search
	  them directly, this time with the original upper cases. If
	  the closest match is, say AbcXYZ, glimpse may miss it,
	  because it doesn't expect an error.  Another problem is
	  speed.  If you search for "ATT", it will look at the index
	  for "att".  Unless you use -w to match the whole word,
	  glimpse may have to search all files containing, for
	  example, "Seattle" which has "att" in it.
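
	  The "ATT" example can be modeled with a small Python sketch.
	  It assumes the index is a dictionary from lower-cased words
	  to file names and that a non-whole-word lookup selects every
	  indexed word containing the pattern.  The names
	  candidate_files and scan are hypothetical and do not reflect
	  glimpse's internals.

```python
def candidate_files(pattern: str, index: dict, whole_word: bool = False) -> set:
    """Select files to scan, using a lower-cased word index."""
    key = pattern.lower()
    found = set()
    for word, names in index.items():
        matched = (word == key) if whole_word else (key in word)
        if matched:
            found |= names
    return found


def scan(pattern: str, files: dict, names: set) -> list:
    """Case-sensitive scan of the candidates with the original pattern."""
    return sorted(n for n in names if pattern in files[n])


files = {
    "phone.txt": "ATT raised its rates",
    "travel.txt": "We flew to Seattle",
    "misc.txt": "Pay attention",
}
index = {
    "att": {"phone.txt"},
    "rates": {"phone.txt"},
    "seattle": {"travel.txt"},
    "attention": {"misc.txt"},
}
loose = candidate_files("ATT", index)
strict = candidate_files("ATT", index, whole_word=True)
matches = scan("ATT", files, loose)
```

	  The loose lookup selects all three files although only
	  phone.txt contains "ATT"; the whole-word lookup selects
	  phone.txt alone.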

	  There is no size limit for simple patterns and simple
	  patterns within Boolean expressions.	More complicated
	  patterns, such as regular expressions, are currently limited
	  to approximately 30 characters.  Lines are limited to 1024
	  characters.  Records are limited to 48K, and may be
	  truncated if they are larger than that.  The limit of record
	  length can be changed by modifying the parameter Max_record
	  in agrep.h.

	  Glimpseindex does not index words of size > 64.

     BUGS
	  A Boolean AND query that includes two patterns, one of which
	  is a prefix of the other (or equal to the other) may not
	  work correctly.  Essentially glimpse will find the smallest
	  pattern first, but will not backtrack to try to check again
	  if it matches another pattern.  (We are not sure whether
	  this is a bug or a feature, because there is no apparent
	  reason to have patterns like that.)

	  A Boolean query with a pattern of length 1 (i.e., one
	  character only) may miss matches.

	  In some rare cases, regular expressions using * or # may not
	  match correctly.

	  A query that contains no alphanumeric characters is not
	  recommended (unless glimpse is used as agrep and the file
	  names are provided).	This is an understatement.
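
	  A script that builds AND queries could screen its terms for
	  the first two problems above before calling glimpse.  The
	  following Python sketch is one possible check; the function
	  name risky_and_terms is hypothetical, and the terms are
	  assumed to be simple patterns already split apart.

```python
def risky_and_terms(terms: list) -> list:
    """Return warnings for AND terms that may trigger the bugs above."""
    warnings = []
    # A pattern of length 1 may miss matches in a Boolean query.
    for term in terms:
        if len(term) == 1:
            warnings.append(f"{term!r} is a single character")
    # A pattern that is a prefix of (or equal to) another may not work.
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            short, long_ = sorted((a, b), key=len)
            if long_.startswith(short):
                warnings.append(
                    f"{short!r} is a prefix of (or equal to) {long_!r}"
                )
    return warnings


warnings = risky_and_terms(["index", "indexing", "x"])
```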

	  Please send bug reports or comments to
	  glimpse@cs.arizona.edu.

     DIAGNOSTICS
	  Exit status is 0 if any matches are found, 1 if none, 2 for
	  syntax errors or inaccessible files.
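
	  A calling program can branch on these exit statuses.  The
	  Python sketch below assumes a glimpse binary is on the PATH
	  and that an index already exists; the helper names are
	  hypothetical.

```python
import subprocess

# Exit statuses as documented above.
STATUS_MEANING = {
    0: "matches found",
    1: "no matches",
    2: "syntax error or inaccessible file",
}


def interpret_exit_status(code: int) -> str:
    """Translate a glimpse exit status into a short description."""
    return STATUS_MEANING.get(code, "unknown status")


def run_glimpse(pattern: str) -> str:
    """Run glimpse (assumed to be installed) and classify the result."""
    result = subprocess.run(
        ["glimpse", pattern], capture_output=True, text=True
    )
    return interpret_exit_status(result.returncode)
```

	  Passing the pattern as a separate list element avoids shell
	  quoting issues with meta characters.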

     AUTHORS
	  Udi Manber and Burra Gopal, Department of Computer Science,
	  University of Arizona, and Sun Wu, the National Chung-Cheng
	  University, Taiwan. (Email:  glimpse@cs.arizona.edu)
