pstotext man page on IRIX

Man page or keyword search:  
man Server   31559 pages
apropos Keyword Search (all sections)
Output format
IRIX logo
[printable version]

pstotext(1)					      pstotext(1)

NAME
       pstotext	 -  extract  ASCII  text from a PostScript or PDF
       file

SYNTAX
       pstotext [option|pathname]...

       where option includes:

       -cork
       -landscape
       -landscapeOther
       -portrait
       -
       -output file
       -gs command
       -debug
       -bboxes


DESCRIPTION
       pstotext reads one or more PostScript or	 PDF  files,  and
       writes  to  standard  output a representation of the plain
       text that would be displayed if the PostScript  file  were
       printed.	  As  is  described in the DETAILS section below,
       this representation is only an  approximation.	Neverthe
       less,  it is often useful for information retrieval (e.g.,
       running grep(1) or  building  a	full-text  index)  or  to
       recover	the  text from a PostScript file whose source you
       have lost.

       pstotext	  calls	  Ghostscript,	 and   requires	  Aladdin
       Ghostscript  version  3.51  or newer.  Ghostscript must be
       invokable on the current	 search	 path  as  gs.	 Alterna
       tively,	you can use the -gs option to specify the command
       (pathname and options) to run Ghostscript.   For	 example,
       on   Windows   you   might   use	 -gs  "c:\gs\gswin32c.exe
       -Ic:\gs;c:\gs\fonts".

       pstotext reads and processes its command line from left to
       right, ignoring the case of options.  When it encounters a
       pathname,  it  opens  the  file	and  expects  to  find	a
       PostScript  job	or PDF document to process.  The option -
       means to read and process a PostScript job  from	 standard
       input.	If  no	-  or pathname arguments are encountered,
       pstotext reads a PostScript job from standard input.  (PDF
       documents require random access, hence cannot be read from
       standard input.) You can use the -output option to specify
       an  output  file	 (remember  to invoke it before the input
       file); otherwise pstotext writes to standard output.

       The option -cork is only	 relevant  for	PostScript  files
       produced	 by  dvips  from TeX or LaTeX documents; it tells
       pstotext to use the Cork encoding (known as T1  in  LaTeX)
       rather  than  the  old  TeX text encoding (known as OT1 in
       LaTeX). Unfortunately files produced by dvips  don't  dis
       tinguish which font encodings were used.

       The  options -landscape and -landscapeOther should be used
       for documents that must be rotated 90 degrees clockwise or
       counterclockwise, respectively, in order to be readable.

       The  options  -debug and -bboxes are mostly of use for the
       maintainers of pstotext.	 -debug shows Ghostscript  output
       and error messages. -bboxes outputs one word per line with
       bounding box information.

DETAILS
       pstotext does its work by telling Ghostscript  to  load	a
       PostScript library that causes it to write to its standard
       output  information  about  each	 string	 rendered  by	a
       PostScript job or PDF document.	This information includes
       the characters of the string, and enough additional infor
       mation  to  approximate	the  string's bounding rectangle.
       pstotext post-processes this  information  and  outputs	a
       sequence	 of  words delimited by space, newline, and form
       feed.

       pstotext outputs words in the same sequence  as	they  are
       rendered	 by  the document.  This usually, but not always,
       follows the order that a human would read the words  on	a
       page.  Within this sequence, words are separated by either
       space or newline depending on whether or not they fall  on
       the  same  line.	 Each page is terminated with a formfeed.
       If you use the incorrect option from the	 set  {-portrait,
       -landscape,  -landscapeOther},  pstotext is likely to sub
       stitute newline for space.

       A PostScript job or PDF document often renders one word as
       several	strings	 in  order to get correct spacing between
       particular pairs of characters.	pstotext does its best to
       assemble	 these	strings	 back  into words, using a simple
       heuristic: strings separated by a distance  of  less  than
       0.3  times  the minimum of the average character widths in
       the two strings are considered to  be  part  of	the  same
       word.   Note that this typically causes leading and trail
       ing punctuation characters to be included with a word.

       The  PostScript	language  provides  a  flexible	 encoding
       scheme by which character codes in strings select specific
       characters (symbols), so a PostScript job is free  to  use
       any  character  code.   On the other hand, pstotext always
       translates to the ISO  8859-1  (Latin-1)	 character  code,
       which  is an extension to ASCII covering most of the West
       ern European languages.	When a character isn't present in
       ISO  8859-1, pstotext uses a sequence of characters, e.g.,
       "---" for em dash or "A\226" for Abreve.	 pstotext can  be
       fooled  by  a  font  whose  Encoding vector doesn't follow
       Adobe's conventions, but it contains  heuristics	 allowing
       it to handle a wide variety of misbehaving fonts.

       (pstotext  no  longer  translates  hyphen  (\255) to minus
       (\055).)

AUTHOR
       Andrew  Birrell	(PostScript  libraries),   Paul	  McJones
       (application), Russell Lang (Windows and OS/2 adaptation),
       and Hunter Goatley (VMS adaptation).

SEE ALSO
       pstotext incorporates technology originally developed  for
       the     Virtual	   Paper     project	at    SRC;    see
       http://www.research.digital.com/SRC/virtualpaper/.

       As mentioned above,  pstotext  invokes  Ghostscript.   See
       gs(1) or http://www.cs.wisc.edu/~ghost/.

COPYRIGHT
       Copyright 1995-8 Digital Equipment Corporation.
       Distributed only by permission.
       See file pstotext.txt for details.

       Last modified on Sat Feb	 5 21:00:00 AEST 2000 by rjl
	    modified on Fri Jun	 5 14:02:37 PDT 1998 by mcjones
	    modified on Wed Jun	 7 17:47:56 PDT 1995 by birrell

       This  file  was	generated automatically by mtex software;
       see  the	 mtex  home  page  at	http://www.research.digi
       tal.com/SRC/mtex/.

						      pstotext(1)
[top]

List of man pages available for IRIX

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net