Parse(3) perl/Tk Documentation Parse(3)
NAME
Pod::Parse - Parse perl's pod files.
SYNOPSIS
THIS TK SNAPSHOT SHOULD BE REPLACED BY A CPAN MODULE
DESCRIPTION
A module designed to simplify the job of parsing and
formatting ``pods'', the documentation format used by
perl5. This consists of several different functions to
present and modify predigested pod files.
GUESSES
This is a work in progress, so I may have some stuff
wrong, perhaps badly. Some of my more reaching guesses:
o An =index paragraph should be split into lines, and
each line placed inside an `X' formatting command
which is then prepended to the next paragraph, like
this:
=index foo
foo2
foo3
foo2!subfoo

Foo!
Will become:
X<foo>X<foo2>X<foo3>X<foo2!subfoo>Foo!
o A related change: that an `X' command is to be used
for indexing data. This implies that all formatters
need to at least ignore the `X' command.
o Inside an =command, no special significance is to be
placed on the first line of the argument. Thus the
following two lines should be parsed identically:
=item 1. ABC

=item 1.
ABC

Note that neither of these is identical to this:

=item 1.

ABC

which puts the "ABC" in a separate paragraph.
25/Aug/1997 Tk400.202 1
o I actually violate this rule twice: in parsing =index
commands, and in passing through the =pragma commands.
I hope this makes sense.
o I added the =comment command, which simply ignores the
next paragraph.
o I also added =pragma, which also ignores the next
paragraph, but this time it gives the formatter a
chance at doing something sinister with it.
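As a rough illustration of the =index guess above, the following Perl sketch wraps each line of an =index argument in an `X' command and prepends the result to the next paragraph. The function name index_to_x is hypothetical and is not part of Pod::Parse; the module's real handling may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Pod::Parse: wrap each non-empty line
# of an =index argument in X<...> and prepend the result to the text of
# the following paragraph.
sub index_to_x {
    my ($index_arg, $next_para) = @_;
    my @entries;
    for my $line (split /\n/, $index_arg) {
        $line =~ s/^\s+//;    # trim leading whitespace
        $line =~ s/\s+$//;    # trim trailing whitespace
        push @entries, $line if length $line;
    }
    return join('', map { "X<$_>" } @entries) . $next_para;
}

my $out = index_to_x("foo\nfoo2\nfoo3\nfoo2!subfoo", "Foo!");
print "$out\n";    # X<foo>X<foo2>X<foo3>X<foo2!subfoo>Foo!
```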
POD CONVENTIONS
This module has two goals: first, to simplify the usage of
the pod format, and secondly the codification of the pod
format. While perlpod contains some information, it hardly
gives the entire story. Here I present "the rules", or at
least the rules as far as I've managed to work them out.
Paragraphs: The basic element
The fundamental "atom" of a pod file is the paragraph,
where a paragraph is defined as the text up to the
next completely blank line ("\n\n"). Any pod parser
will read in paragraphs sequentially, deciding what to
do with each based solely on the current state and on
the text at the _beginning_ of the paragraph.
Commands: The method of communication
A paragraph that starts with the `=' symbol is assumed
to be a special command. All of the alphanumeric
characters directly after the `=' are assumed to be
part of the name of the command, up to the first
whitespace. Anything past that whitespace is
considered "the argument", and the argument continues
up till the end of the paragraph, regardless of
newlines or other whitespace.
Text: Commands that aren't Commands
A paragraph that doesn't start with `=' is treated as
either of two types of text. If it starts with a space
or tab, it is considered a verbatim paragraph, which
will be printed out... verbatim. No formatting changes
whatsoever may be done. (Actually, this isn't quite
true, but I'll get back to that at a later date.)
A paragraph that doesn't start with whitespace or `='
is assumed to consist of formatted text that can be
molded as the formatter sees fit. Reformatting to fit
margins, whatever, it's fair game. These paragraphs
also can contain a number of different formatting
codes, which verbatim paragraphs can't. These
formatting codes are covered later.
=cut: The uncommand
There is one command that needs special mention: =cut.
Anything after a paragraph starting with =cut is
simply ignored by the formatter. In addition, any text
before a valid command is equally ignored. Any valid
`=' command will reenable formatting. This fact is used
to great benefit by Perl, which is glad to ignore
anything between an `=' command and `=cut', so you can
embed a pod document right inside a perl program, and
neither will bother the other.
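The rules above can be sketched in a few lines of Perl. This is only an illustration of the conventions as described here, not the code Pod::Parse uses; the names split_paragraphs, classify and visible_paragraphs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split pod source into paragraphs on completely
# blank lines, dropping empty pieces.
sub split_paragraphs {
    my ($source) = @_;
    return grep { /\S/ } split /\n{2,}/, $source;
}

# Hypothetical helper: decide what a paragraph is from its beginning.
# Returns ('command', $name, $argument), ('verbatim', $text) or
# ('text', $text).
sub classify {
    my ($para) = @_;
    if ($para =~ /^=([A-Za-z0-9]+)\s*(.*)\z/s) {
        return ('command', $1, $2);    # argument runs to end of paragraph
    }
    return ('verbatim', $para) if $para =~ /^[ \t]/;
    return ('text', $para);
}

# Keep only what a formatter should see: nothing before the first
# command, and nothing between =cut and the next command.
sub visible_paragraphs {
    my ($source) = @_;
    my $in_pod = 0;
    my @kept;
    for my $para (split_paragraphs($source)) {
        my ($kind, @rest) = classify($para);
        if ($kind eq 'command') {
            $in_pod = ($rest[0] ne 'cut');
            push @kept, [$kind, @rest] if $in_pod;
        }
        elsif ($in_pod) {
            push @kept, [$kind, @rest];
        }
    }
    return @kept;
}

my $source = "print 1;\n\n=head1 NAME\n\nSome I<text>.\n\n"
           . "    code here\n\n=cut\n\nprint 2;\n";
my @paras = visible_paragraphs($source);
print join(' ', map { $_->[0] } @paras), "\n";    # command text verbatim
```

The two print statements in the sample source play the part of Perl code surrounding an embedded pod document; neither reaches the formatter.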
Reference to paragraph commands
=cut Ignore anything till the next paragraph starting
with `='.
=head1 A top-level heading. Anything after the command
(either on the same line or on further lines) is
included in the heading, up until the end of the
paragraph.
=head2 Secondary heading. Same as =head1, but different.
No, there isn't a head3, head4, etc.
=over [N]
Start a list. The N is the number of characters to
indent by. Not all formatters will listen to this,
though. A good number to use is 4.
While =over sounds like it should just be
indentation, it's more complex than that. It
actually starts a nested environment, specifically
for the use of =item's. As this command recurses
properly, you can use more than one, you just have
to make sure they are closed off properly by =back
commands.
=back Ends the last =over block. Resets the indentation
to whatever it was previously. Closes off the list
of =item's.
=item The point behind =over and =back. This command
should only be used between them. The argument
supplied should be consistent (within a list) to
one of three types: enumeration, itemization, or
description. To exemplify:
An itemized list
=over 4

=item *

A bulleted item

=item *

Another bulleted item

=back
An enumerated list
=over 4

=item 1.

First item.

=item 2.

Second item.

=back
A described list
=over 4

=item Item #1

First item

=item Item #2 (which isn't really like #1, but is the second).

Second item

=back
If you aren't consistent about the arguments to =item, Pod::Parse will
complain.
=comment
Ignore this paragraph
=pragma Ignore this paragraph, as well, unless you know
what you are doing.
=index Undecided at this time, but probably magic
involving X<>.
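To make the =item consistency rule concrete, here is a small Perl sketch that sorts =item arguments into the three kinds and warns when a list mixes them. The type numbers (1 bulleted, 2 enumerated, 3 text, 0 for no items) follow the list types described under USAGE; the function names and the exact matching rules are assumptions, and Pod::Parse may decide differently.

```perl
use strict;
use warnings;

# Hypothetical classifier for a single =item argument:
# 1 = bulleted ("*"), 2 = enumerated ("1." or "1"), 3 = descriptive text.
sub item_type {
    my ($arg) = @_;
    return 1 if $arg =~ /^\*\s*$/;
    return 2 if $arg =~ /^\d+\.?\s*$/;
    return 3;
}

# Return the type of a list from its =item arguments (0 if it has no
# items) and warn when the items are not all of the same type.
sub list_type {
    my @types = map { item_type($_) } @_;
    return 0 unless @types;
    my %seen = map { $_ => 1 } @types;
    warn "inconsistent =item arguments\n" if keys(%seen) > 1;
    return $types[0];
}

print list_type('*', '*'), "\n";                # 1
print list_type('1.', '2.'), "\n";              # 2
print list_type('Item #1', 'Item #2'), "\n";    # 3
```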
Reference to formatting directives
B<...> Format text inside the brackets as bold.
I<...> Format text inside the brackets as italics.
Z<> Replace with a zero-width character. You'll
probably figure out some uses for this.
And yet more that I haven't described yet...
USAGE
Parse
This function takes a list of files as an argument. If no
argument is given, it defaults to the contents of @ARGV.
Parse then reads through each file and returns the data as
a list. Each element of this list will be a nested list
containing data from a paragraph of the pod file. Elements
pertaining to "=over" paragraphs will themselves contain
the nested entries for all of the paragraphs within that
list. Thus, it's easier to parse the output of Parse using
a recursive parser. (Um, did that parse?)
It is highly recommended that you use the output of
Simplify, not Parse, as it's simpler.
The output will consist of a list, where each element in
the list matches one of these prototypes:
[0,0,0,0,$filename]
This is produced at the beginning of each file parsed,
where $filename is the name of that file.
[-1,0,0,0,$filename]
End of same.
[1,$line,$pos,0,$verbatim]
This is produced for each paragraph of verbatim text.
$verbatim is the text, $line is the line offset of the
paragraph within the file, and $pos is the byte
offset. (In all of the following elements, $pos and
$line have identical meanings, so I'll skip explaining
them each time.)
[2,$line,$pos,$level,$heading]
Produced by a =head1 or =head2 command. $level is
either 1 or 2, and $heading is the argument.
[3,$line,$pos,0,$item]
$item is the argument from an =item paragraph.
[4,$line,$pos,0,$index]
$index is the argument from an =index paragraph.
[6,$line,$pos,0,$text]
Normal formatted text paragraph. $text is the text.
[7,$line,$pos,0,$pragma]
$pragma is the argument from a =pragma paragraph.
[8,$line,$pos,$indentation,$type,...]
This item is produced for each matching =over/=back
pair. $indentation is the argument to =over, $type is
1 if the embedded =item's are bulleted, 2 if they are
enumerated, 3 if they are text, and 0 if there are no
items.
The "..." indicates an unlimited number of further
elements which are themselves nested arrays in exactly
the format being described. In other words, a list
item includes all the paragraphs inside the list
inside itself. (Clear? No? Nevermind.)
[9,$line,$pos,0,$cut]
$cut contains the text from a =cut paragraph. You
shouldn't need to use this, but I _suppose_ it might
be necessary to do special breaks on a cut. I doubt it
though. This one is "depreciated", as Larry put it. Or
perhaps disappreciated.
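A recursive walk over this structure might look like the following Perl sketch. It runs on a hand-built sample rather than on real Parse output: the sample data (including its line and byte offsets) is invented, the walk function and %kind table are hypothetical, and it assumes that each element is an array reference.

```perl
use strict;
use warnings;

# Invented sample in the shape described above; not real Parse output.
my @parsed = (
    [0, 0, 0, 0, 'demo.pod'],
    [2, 1, 0, 1, 'NAME'],
    [6, 3, 12, 0, 'Some text.'],
    [8, 5, 24, 4, 2,
        [3, 7, 32, 0, '1.'],
        [6, 9, 41, 0, 'First item.'],
    ],
    [-1, 0, 0, 0, 'demo.pod'],
);

# Labels for the numeric codes; the labels themselves are made up.
my %kind = (
    0 => 'file', -1 => 'endfile', 1 => 'verbatim', 2 => 'head',
    3 => 'item',  4 => 'index',   6 => 'text',     7 => 'pragma',
    9 => 'cut',
);

# Hypothetical walker: returns one indented line of text per element,
# recursing into the children of =over/=back elements (code 8).
sub walk {
    my ($depth, @elements) = @_;
    my @lines;
    for my $el (@elements) {
        my ($code, $line, $pos, $extra, @rest) = @$el;
        my $pad = '  ' x $depth;
        if ($code == 8) {
            my ($type, @children) = @rest;
            push @lines, "${pad}list indent=$extra type=$type";
            push @lines, walk($depth + 1, @children);
        }
        else {
            push @lines, "$pad$kind{$code} $rest[0]";
        }
    }
    return @lines;
}

my @lines = walk(0, @parsed);
print "$_\n" for @lines;
```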
Simplify
This procedure takes as its input the convoluted output
from Parse(), and outputs a much simpler array consisting
of pairs of commands and arguments, designed to be easy
(easier?) to parse in your pod formatting code.
It is used very simply by saying something like:
@Pod = Simplify(Parse());
while($cmd = shift @Pod) {
    $arg = shift @Pod;
    #...
}
Where #... is the code that responds to any of the
commands from the following list. Note that you are
welcome to ignore any of the commands that you want to.
Many contain duplicate information, or at least
information that will go unused. A formatter based on this
data can be quite simple indeed. (See pod2text for
entirely too simple an example.)
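One way to fill in the #... part is a table of handlers keyed by command name, so that unwanted commands are skipped. The Perl sketch below runs on a hand-built list standing in for the output of Simplify(Parse()); the arguments are invented, and the handler table is just one possible design, not something Pod::Parse provides.

```perl
use strict;
use warnings;

# Hand-built stand-in for the output of Simplify(Parse()): a flat list
# of command/argument pairs. The arguments are invented.
my @Pod = (
    filename => 'demo.pod',
    head1    => 'NAME',
    text     => 'Some text.',
    verbatim => '    code here',
    endfile  => 'demo.pod',
);

my @out;

# One handler per command we care about; everything else is ignored.
my %handler = (
    head1    => sub { push @out, uc $_[0] },
    head2    => sub { push @out, $_[0] },
    text     => sub { push @out, $_[0] },
    verbatim => sub { push @out, $_[0] },
);

while (defined(my $cmd = shift @Pod)) {
    my $arg = shift @Pod;
    my $h = $handler{$cmd} or next;    # skip commands without a handler
    $h->($arg);
}

print "$_\n" for @out;
```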
Reference to Simplify commands
"filename"
The argument contains the name of the pod file that is
being parsed. These will be present at the start of
each file. You should open an output file, output
headers, etc., based on this, and not when you start
parsing.
"endfile"
The end of the file. Each file will be ended before
the next one begins, and after all files are done
with. You can do end processing here. The argument is
the same name as in "filename".
"setline"
This gives you a chance to record the "current" input
line, probably for debugging purposes. In this case,
"current" means that the next command you see that was
derived from an input paragraph will start at the
argument's line in the file.
"setloc"
Same as setline, but the byte offset in the input,
instead of the line offset.
"pragma"
The argument contains the text of a pragma command.
"text"
The argument contains a paragraph of formatted text.
"verbatim"
The argument contains a paragraph of verbatim text.
"cut"
A =cut command was hit. You shouldn't really need to
listen for this one.
"index"
The argument contains an =index paragraph. (Note:
Current =index commands are not fed through, but
turned into X<> commands.)
"head1"
"head2"
The argument contains the argument from a header
command.
"setindent"
If you are tracking indentation, use the argument to
set the indentation level.
"listbegin"
Start a list environment. The argument is the type of
list (1,2,3 or 0).
"listend"
Ends a list environment. Same argument as listbegin.
"listtype"
The argument is the type of list. You can just record
the argument when you see one of these, instead of
paying attention to listbegin & listend.
"over"
The argument is the indentation. It's probably better
to listen to the "list..." commands.
"back"
Ends an "over" list. The argument is the original
indentation.
"item"
The argument is the text of the =item command.
Note that all of these various commands you've seen are
synchronized properly so you don't have to pay attention to
all of them at once, but they are all output for your benefit.
Consider the following example:
listtype 2
listbegin 2
setindent 4
over 4
item 1.
text Item #1
item 2.
text Item #2
setindent 0
listend 2
back 0
listtype 0
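Because the commands are redundant, a formatter can listen to a small subset of them. The Perl sketch below feeds the stream from the example above into a loop that pays attention only to setindent, item and text; the rendering it produces is an arbitrary choice made for illustration.

```perl
use strict;
use warnings;

# The command/argument stream from the example above.
my @Pod = (
    listtype  => 2,    listbegin => 2, setindent => 4, over => 4,
    item      => '1.', text      => 'Item #1',
    item      => '2.', text      => 'Item #2',
    setindent => 0,    listend   => 2, back => 0, listtype => 0,
);

my ($indent, $label, @out) = (0, '');

while (defined(my $cmd = shift @Pod)) {
    my $arg = shift @Pod;
    if ($cmd eq 'setindent') {
        $indent = $arg;        # track the current indentation
    }
    elsif ($cmd eq 'item') {
        $label = $arg;         # remember the label for the next text
    }
    elsif ($cmd eq 'text') {
        my $prefix = length($label) ? "$label " : '';
        push @out, (' ' x $indent) . $prefix . $arg;
        $label = '';
    }
    # every other command is deliberately ignored
}

print "$_\n" for @out;
```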
Normalize
This command is normally invoked by Parse, so you
shouldn't need to deal with it. It just cleans up text a
little, turning spare '<', '>', and '&' characters into
HTML escapes (&lt;, etc.) as well as generating warnings for
some pod formatting mistakes.
Normalize2
A little more aggressive formatting based on heuristics. Not
applied by default, as it might confuse your own
heuristics.
%Escapes
This hash is exported from Pod::Parse, and contains
default ASCII translations for some common HTML escape
sequences. You might like to use this as a basis for an
%HTML_Escapes hash in your own formatter.
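A formatter that produces plain text might apply such a table as in the Perl sketch below. The hash %My_Escapes and its four entries are assumptions standing in for %Escapes, whose actual contents are not listed here, and the unescape function is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %Escapes: escape name => ASCII translation.
my %My_Escapes = (
    lt   => '<',
    gt   => '>',
    amp  => '&',
    quot => '"',
);

# Replace each known &name; sequence with its ASCII translation and
# leave unknown sequences untouched.
sub unescape {
    my ($text) = @_;
    $text =~ s/&(\w+);/exists $My_Escapes{$1} ? $My_Escapes{$1} : "&$1;"/ge;
    return $text;
}

print unescape('1 &lt; 2 &amp;&amp; 3 &gt; 2 &unknown;'), "\n";
# prints: 1 < 2 && 3 > 2 &unknown;
```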