package HTML::GenToc;
BEGIN {
  $HTML::GenToc::VERSION = '3.20';
}
use strict;

=head1 NAME

HTML::GenToc - Generate a Table of Contents for HTML documents.

=head1 VERSION

version 3.20

=head1 SYNOPSIS

  use HTML::GenToc;

  # create a new object
  my $toc = new HTML::GenToc();

  my $toc = new HTML::GenToc(title=>"Table of Contents",
                             toc_entry=>{
                                 H1=>1,
                                 H2=>2
                             },
                             toc_end=>{
                                 H1=>'/H1',
                                 H2=>'/H2'
                             }
      );

  # generate a ToC from a file
  $toc->generate_toc(input=>$html_file,
                     footer=>$footer_file,
                     header=>$header_file
      );

=head1 DESCRIPTION

HTML::GenToc generates anchors and a table of contents for
HTML documents.  Depending on the arguments, it will insert the
information it generates, or output to a string, a separate file
or STDOUT.

While it defaults to taking H1 and H2 elements as the significant
elements to put into the table of contents, any tag can be defined
as a significant element.  The input need not be complete, pure HTML:
one can input pseudo-HTML or page fragments, which makes it suitable
for use on templates and HTML meta-languages such as WML.

Also included in the distribution is hypertoc, a script which uses the
module so that one can process files on the command line in a
user-friendly manner.

=head1 DETAILS

The ToC generated is a multi-level list containing links to the
significant elements.  HTML::GenToc places each link in the ToC at the
level the user specifies for its significant element.

B<Example:>

If H1s are specified as level 1, then they appear in the first
level list of the ToC.  If H2s are specified as level 2, then
they appear in a second level list in the ToC.

Information on the significant elements and the level at which they
should occur is passed in to the methods of this object, or one can use
the defaults.

There are two phases to the ToC generation.  The first phase is to
put suitable anchors into the HTML documents, and the second phase
is to generate the ToC from HTML documents which have anchors
in them for the ToC to link to.
For more information on controlling the contents of the created ToC, see
L</Formatting the ToC>.

HTML::GenToc also supports the ability to incorporate the ToC into the HTML
document itself via the B<inline> option.  See L</Inlining the ToC> for more
information.

In order to support linking to significant elements, HTML::GenToc inserts
anchors into the significant elements.  One can use HTML::GenToc as a
filter, outputting the result to another file, or one can overwrite the
original file, with the original backed up with a suffix (default: "org")
appended to the filename.  One can also output the result to a string.

=head1 METHODS

Default arguments can be set when the object is created, and overridden
by setting arguments when the generate_toc method is called.
Arguments are given as a hash.

=cut

use Data::Dumper;
use HTML::SimpleParse;
use HTML::Entities;
use HTML::LinkList;

#################################################################

#---------------------------------------------------------------#
# Object interface
#---------------------------------------------------------------#

=head2 Method -- new

    $toc = new HTML::GenToc();

    $toc = new HTML::GenToc(toc_entry=>\%my_toc_entry,
        toc_end=>\%my_toc_end,
        bak=>'bak',
        ...
        );

Creates a new HTML::GenToc object.

These arguments will be used as defaults in invocations of other methods.

See L</generate_toc> for possible arguments.

=cut
sub new {
    my $invocant = shift;

    my $class = ref($invocant) || $invocant; # Object or class name
    my $self = {
        debug => 0,
        bak => 'org',
        entrysep => ', ',
        footer => '',
        inline => 0,
        header => '',
        input => '',
        notoc_match => 'class="notoc"',
        ol => 0,
        ol_num_levels => 1,
        overwrite => 0,
        outfile => '-',
        quiet => 0,
        textonly => 0,
        title => 'Table of Contents',
        toclabel => '<h1>Table of Contents</h1>',
        toc_tag => '^BODY',
        toc_tag_replace => 0,
        toc_only => 0,
        # define TOC entry elements
        toc_entry => {
            'H1'=>1,
            'H2'=>2,
        },
        # TOC entry element terminators
        toc_end => {
            'H1'=>'/H1',
            'H2'=>'/H2',
        },
        useorg => 0,
        @_
    };
    # bless self
    bless($self, $class);

    if ($self->{debug}) {
        print STDERR Dumper($self);
    }

    return $self;
} # new

=head2 generate_toc

    $toc->generate_toc(outfile=>"index2.html");

    my $result_str = $toc->generate_toc(to_string=>1);

Generates a table of contents for the significant elements in the HTML
documents, optionally generating anchors for them first.

B<Options>

=over

=item bak

bak => I<string>

If the input file/files is/are being overwritten (B<overwrite> is on), copy
the original file to "I<filename>.I<string>".  If the value is empty, B<no>
backup file will be created.
(default:org)

=item debug

debug => 1

Enable verbose debugging output.  Used for debugging this module;
in other words, don't bother.
(default:off)

=item entrysep

entrysep => I<string>

Separator string for non-<li> item entries
(default: ", ")

=item filenames

filenames => \@filenames

The filenames to use when creating table-of-contents links.
This overrides the filenames given in the B<input> option,
and is expected to have exactly the same number of elements.
This can also be used when passing in string-content to the B<input>
option, to give a (fake) filename to use for the links relating
to that content.

=item footer

footer => I<file_or_string>

Either the filename of the file containing footer text for ToC;
or a string containing the footer text.

=item header

header => I<file_or_string>

Either the filename of the file containing header text for ToC;
or a string containing the header text.

=item ignore_only_one

ignore_only_one => 1

If there would be only one item in the ToC, don't make a ToC.

=item ignore_sole_first

ignore_sole_first => 1

If the first item in the ToC is of the highest level,
AND it is the only one of that level, ignore it.
This is useful in web-pages where there is only one H1 header
but one doesn't know beforehand whether there will be only one.

=item inline

inline => 1

Put ToC in document at a given point.
See L</Inlining the ToC> for more information.

=item input

input => \@filenames

input => $content

This is expected to be either a reference to an array of filenames,
or a string containing content to process.

The three main uses would be:

=over

=item (a) you have more than one file to process, so pass in multiple filenames

=item (b) you have one file to process, so pass in its filename as the only array item

=item (c) you have HTML content to process, so pass in just the content as a string

=back

(default:undefined)

=item notoc_match

notoc_match => I<string>

Some individual tags may match the "significant elements" yet not belong
in the table of contents.  If this pattern matches the contents inside
such a tag (not its body), the tag is left out both when generating
anchors and when generating the ToC.
(default: C<class="notoc">)

=item ol

ol => 1

Use an ordered list for level 1 ToC entries.

=item ol_num_levels

ol_num_levels => 2

The number of levels deep the OL listing will go if B<ol> is true.
If set to zero, will use an ordered list for all levels.
(default:1)

=item overwrite

overwrite => 1

Overwrite the input file with the output.
(default:off)

=item outfile

outfile => I<file>

File to write the output to.  This is where the modified HTML
output goes to.  Note that it doesn't make sense to use this option if you
are processing more than one file.  If you give '-' as the filename, then
output will go to STDOUT.
(default: STDOUT)

=item quiet

quiet => 1

Suppress informative messages. (default: off)

=item textonly

textonly => 1

Use only text content in significant elements.

=item title

title => I<string>

Title for ToC page (if not using B<header>
or B<inline> or B<toc_only>)
(default: "Table of Contents")

=item toc_after

toc_after => \%toc_after_data

%toc_after_data = ( I<tag1> => I<suffix1>,
    I<tag2> => I<suffix2>
    );

toc_after => { H2=>'</em>' }

For defining the layout of significant elements in the ToC.

This expects a reference to a hash of tag=>suffix pairs.

The I<tag> is the HTML tag which marks the start of the element.  The
I<suffix> is what is required to be appended to the Table of Contents
entry generated for that tag.

(default: undefined)

=item toc_before

toc_before => \%toc_before_data

%toc_before_data = ( I<tag1> => I<prefix1>,
    I<tag2> => I<prefix2>
    );

toc_before=>{ H2=>'<em>' }

For defining the layout of significant elements in the ToC.  The I<tag>
is the HTML tag which marks the start of the element.  The I<prefix> is
what is required to be prepended to the Table of Contents entry
generated for that tag.

(default: undefined)

=item toc_end

toc_end => \%toc_end_data

%toc_end_data = ( I<tag1> => I<endtag1>,
    I<tag2> => I<endtag2>
    );

toc_end => { H1 => '/H1', H2 => '/H2' }

For defining significant elements.  The I<tag> is the HTML tag which
marks the start of the element.  The I<endtag> is the HTML tag which
marks the end of the element.  When matching in the input file, case is
ignored (but make sure that all your I<tag> options referring to the
same tag are exactly the same!).

=item toc_entry

toc_entry => \%toc_entry_data

%toc_entry_data = ( I<tag1> => I<level1>,
    I<tag2> => I<level2>
    );

toc_entry => { H1 => 1, H2 => 2 }

For defining significant elements.  The I<tag> is the HTML tag which marks
the start of the element.  The I<level> is what level the tag is considered
to be.  The value of I<level> must be numeric, and non-zero.  If the value
is negative, consecutive entries represented by the significant element
will be separated by the value set by the B<entrysep> option.

=item toclabel

toclabel => I<htmlstr>

HTML text that labels the ToC.  Always used.
(default: "<h1>Table of Contents</h1>")

=item toc_tag

toc_tag => I<string>

If a ToC is to be included inline, this is the pattern which is used to
match the tag where the ToC should be put.  This can be a start-tag, an
end-tag or a comment, but the E<lt> should be left out; that is, if you
want the ToC to be placed after the BODY tag, then give "BODY".  If you
want a special comment tag to mark where the ToC should go, then include
the comment marks, for example: "!--toc--"
(default: "^BODY")

=item toc_tag_replace

toc_tag_replace => 1

In conjunction with B<toc_tag>, this is a flag to say whether the given tag
should be replaced, or if the ToC should be put after the tag.
This can be useful if your toc_tag is a comment and you don't need it
after you have the ToC in place.
(default:false)

=item toc_only

toc_only => 1

Output only the Table of Contents, that is, the Table of Contents plus
the toclabel.  If there is a B<header> or a B<footer>