<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<!--
 HTMLParser Library $Name: v1_6_20060319 $ - A java-based parser for HTML
 http://sourceforge.org/projects/htmlparser
 Copyright (C) 2004 Somik Raha

 Revision Control Information

 $Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/tags/package.html,v $
 $Author: derrickoswald $
 $Date: 2005/04/24 17:48:27 $
 $Revision: 1.21 $

 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 2.1 of the License, or (at your option) any later version.

 This library is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 Lesser General Public License for more details.

 You should have received a copy of the GNU Lesser General Public
 License along with this library; if not, write to the Free Software
 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-->
</head>
<body bgcolor="white">
The tags package contains implementations of specific HTML tags.
<p>This package holds implementations of tags with functionality beyond the
capability of a generic tag. For example, the {@.html <META>} tag has methods
to get the {@link org.htmlparser.tags.MetaTag#getMetaContent CONTENT} and
{@link org.htmlparser.tags.MetaTag#getMetaTagName NAME}
attributes (although this could also be done with generic attribute
manipulation), and an implementation of
{@link org.htmlparser.tags.MetaTag#doSemanticAction doSemanticAction}
that alters the lexer's character encoding.</p>
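<p>The encoding change can be pictured with the following self-contained sketch. It is a simplified, hypothetical model rather than the MetaTag source: the class <code>MetaCharsetModel</code> and its method are invented for illustration, and the sketch assumes the charset is carried in a CONTENT value such as <code>text/html; charset=ISO-8859-1</code> on a META tag whose HTTP-EQUIV is <code>Content-Type</code>. The extracted name is what a semantic action would hand to the lexer as its new encoding.</p>

```java
import java.util.Locale;

// Hypothetical model (not htmlparser code): find the charset that a
// Content-Type META tag declares in its CONTENT attribute.
final class MetaCharsetModel {

    private MetaCharsetModel() {
    }

    // Returns the declared charset, or fallback when none is declared.
    static String charsetFromContent(String httpEquiv, String content, String fallback) {
        // Only an HTTP-EQUIV="Content-Type" META tag declares a charset here.
        if (!"Content-Type".equalsIgnoreCase(httpEquiv) || content == null) {
            return fallback;
        }
        // CONTENT looks like "text/html; charset=ISO-8859-1": check each part.
        for (String part : content.split(";")) {
            String trimmed = part.trim();
            if (trimmed.toLowerCase(Locale.ENGLISH).startsWith("charset=")) {
                // Drop the "charset=" prefix and any surrounding quotes.
                String value = trimmed.substring("charset=".length()).trim()
                        .replace("\"", "").replace("'", "");
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

<p>For instance, <code>charsetFromContent("Content-Type", "text/html; charset=ISO-8859-1", "UTF-8")</code> returns <code>"ISO-8859-1"</code>, while a META tag that declares no charset falls back to the supplied default.</p>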
<p>The classes in this package were added in an ad hoc fashion: the most useful
ones have existed for a long time, while some obvious ones are fairly new.
Feel free to add your own custom tags; once registered with the
{@link org.htmlparser.PrototypicalNodeFactory PrototypicalNodeFactory},
they are treated like any other built-in tag. Tags do not need to reside in
this package.</p>
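<p>As a rough illustration of what registration means, the sketch below models a prototype-based factory in plain Java. Everything in it is hypothetical: <code>ModelTag</code>, <code>ModelFactory</code> and the example <code>BLINK</code> tag stand in for the real Tag interface and PrototypicalNodeFactory. The model assumes that a registered tag is looked up by its UPPERCASE ids and, in keeping with the factory's name, that each match yields a clone of the registered prototype; the latter is an assumption of this sketch. The custom tag class could live in any package.</p>

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical base type for this model; the real library uses org.htmlparser.Tag.
abstract class ModelTag implements Cloneable {
    String rawName = "";                  // the name as written in the page

    abstract String[] getIds();           // UPPERCASE names handled by this tag

    // Assumption of this model: each match is a shallow clone of the prototype.
    ModelTag copy() {
        try {
            return (ModelTag) clone();
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Used when no prototype matches, like a generic tag.
final class GenericModelTag extends ModelTag {
    String[] getIds() {
        return new String[0];
    }
}

// A hypothetical custom tag; nothing ties it to a particular package.
final class BlinkModelTag extends ModelTag {
    String[] getIds() {
        return new String[] {"BLINK"};
    }
}

final class ModelFactory {
    private String[] ids = new String[0];
    private ModelTag[] prototypes = new ModelTag[0];

    // Records the prototype under every id it reports.
    void registerTag(ModelTag prototype) {
        for (String id : prototype.getIds()) {
            ids = Arrays.copyOf(ids, ids.length + 1);
            prototypes = Arrays.copyOf(prototypes, prototypes.length + 1);
            ids[ids.length - 1] = id;
            prototypes[prototypes.length - 1] = prototype;
        }
    }

    // Looks the name up in UPPERCASE; the newest registration for an id wins.
    ModelTag createTag(String name) {
        String key = name.toUpperCase(Locale.ENGLISH);
        for (int i = ids.length - 1; i >= 0; i--) {
            if (ids[i].equals(key)) {
                ModelTag tag = prototypes[i].copy();
                tag.rawName = name;
                return tag;
            }
        }
        ModelTag generic = new GenericModelTag();
        generic.rawName = name;
        return generic;
    }
}
```

<p>In this model, after <code>registerTag(new BlinkModelTag())</code> a call to <code>createTag("blink")</code> returns a new <code>BlinkModelTag</code>, while unregistered names fall back to <code>GenericModelTag</code>.</p>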
<br><b>Custom Tags</b>
<p>Creating custom tags is fairly straightforward. Copy one of the
simpler tags in this package and alter it as follows.</p>
<p>If the tag can contain other nodes, e.g. {@.html <h1>My Heading</h1>}, then
it should derive from (i.e. be a subclass of) {@link org.htmlparser.tags.CompositeTag}.
In this way it inherits the
{@link org.htmlparser.scanners.CompositeTagScanner CompositeTagScanner},
and the nodes between the start and end tags are gathered into its list of
children. Most of the tags in this package derive from CompositeTag, which
is why the nodes returned by the Parser are nested.</p>
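<p>The effect of deriving from CompositeTag can be sketched with a toy model. This is not the CompositeTagScanner algorithm: the class <code>CompositeModel</code> is hypothetical, and to keep the example free of markup it represents a start tag as <code>+NAME</code>, an end tag as <code>-NAME</code>, and anything else as text. Tags declared composite collect the nodes up to their end tag as children (shown in brackets); other tags stay flat.</p>

```java
// Hypothetical model of composite scanning; not the CompositeTagScanner code.
// Tokens: "+NAME" is a start tag, "-NAME" an end tag, anything else is text.
final class CompositeModel {
    private final String[] tokens;
    private final String[] compositeNames;
    private int pos;

    private CompositeModel(String[] tokens, String[] compositeNames) {
        this.tokens = tokens;
        this.compositeNames = compositeNames;
    }

    // Returns an outline such as "H1[My Heading]", where brackets hold children.
    static String outline(String[] tokens, String... compositeNames) {
        return new CompositeModel(tokens, compositeNames).children(null);
    }

    private boolean isComposite(String name) {
        for (String c : compositeNames) {
            if (c.equals(name)) {
                return true;
            }
        }
        return false;
    }

    // Collects sibling nodes until the end tag named endName (or end of input).
    private String children(String endName) {
        StringBuilder sb = new StringBuilder();
        while (pos != tokens.length) {
            String tok = tokens[pos++];
            if (tok.startsWith("-")) {
                if (tok.substring(1).equals(endName)) {
                    break;              // our own end tag closes this level
                }
                continue;               // this model simply drops other end tags
            }
            String piece = tok;         // a text node
            if (tok.startsWith("+")) {
                String name = tok.substring(1);
                // A composite tag recurses, so enclosed nodes become its children.
                piece = isComposite(name) ? name + "[" + children(name) + "]" : name;
            }
            if (sb.length() != 0) {
                sb.append(", ");
            }
            sb.append(piece);
        }
        return sb.toString();
    }
}
```

<p>With H1 declared composite, the tokens <code>+H1</code>, <code>My Heading</code>, <code>-H1</code> produce <code>H1[My Heading]</code>; without it, the same tokens produce the flat list <code>H1, My Heading</code>.</p>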
<p>If it is a simple tag, e.g. {@.html <br>}, then it should derive from
{@link org.htmlparser.nodes.TagNode TagNode}. See, for example,
{@link org.htmlparser.tags.MetaTag}
or {@link org.htmlparser.tags.ImageTag}.</p>
<p>To be registered with {@link org.htmlparser.PrototypicalNodeFactory#registerTag},
and especially if it is a composite tag, the tag needs to implement
<code>getIds</code>, which returns the UPPERCASE list of names for the tag
(usually only one), for example "HTML". If the tag knows which other tags
cannot be contained within it, it should also implement
{@link org.htmlparser.nodes.TagNode#getEnders getEnders()}, which returns the
list of other tags that should cause this tag to close itself, and
{@link org.htmlparser.nodes.TagNode#getEndTagEnders getEndTagEnders()}, which
returns the list of end tags (i.e. {@.html </xxx>}), other than its own name, that
should cause this tag to close itself. When these 'ender' lists cause a tag to
end before its own end tag is seen, a virtual end tag is created and 'inserted'
at the location where the end tag should have been. These end tags can be
distinguished because their {@link org.htmlparser.Node#getStartPosition starting}
and {@link org.htmlparser.Node#getEndPosition ending} locations are the same
(i.e. they take up no character length in the HTML stream).</p>
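<p>To make the 'ender' rules and the zero-length virtual end tag concrete, here is a hypothetical, self-contained model; it is not how htmlparser implements them. To avoid angle brackets it writes tags in square brackets (<code>[LI]</code>, <code>[/UL]</code>), takes the two ender lists as plain arrays, and reports the start and end offsets of whatever closes each occurrence of the tag. A real end tag spans its characters; a virtual one is reported at the position of the tag that forced the close, so its start and end are equal. The model also closes a still-open tag at the end of input, which is an assumption of the sketch.</p>

```java
// Hypothetical model, not htmlparser code. Tags are written in square
// brackets (e.g. "[LI]", "[/UL]") so the example contains no angle brackets.
final class VirtualEndTagModel {

    private VirtualEndTagModel() {
    }

    // Reports what closes each occurrence of the tag named own, as
    // space-separated "start-end" offsets. A virtual end tag has start == end.
    static String endTags(String html, String own, String[] enders, String[] endTagEnders) {
        StringBuilder report = new StringBuilder();
        boolean open = false;
        int from = 0;
        while (true) {
            int start = html.indexOf('[', from);
            int close = start == -1 ? -1 : html.indexOf(']', start);
            if (close == -1) {
                break;                                  // no more complete tags
            }
            int end = close + 1;                        // offset just past the tag
            String tag = html.substring(start + 1, close);
            boolean isEnd = tag.startsWith("/");
            String name = isEnd ? tag.substring(1) : tag;
            if (open) {
                if (tag.equals("/" + own)) {
                    add(report, start, end);            // real end tag: spans its characters
                    open = false;
                } else if (contains(isEnd ? endTagEnders : enders, name)) {
                    add(report, start, start);          // virtual end tag: zero length
                    open = false;
                }
            }
            if (tag.equals(own)) {
                open = true;                            // a new occurrence starts
            }
            from = end;
        }
        if (open) {
            // Closed by end of input (an assumption of this model).
            add(report, html.length(), html.length());
        }
        return report.toString();
    }

    private static boolean contains(String[] names, String name) {
        for (String n : names) {
            if (n.equals(name)) {
                return true;
            }
        }
        return false;
    }

    private static void add(StringBuilder report, int start, int end) {
        if (report.length() != 0) {
            report.append(' ');
        }
        report.append(start).append('-').append(end);
    }
}
```

<p>For example, <code>endTags("[LI]a[LI]b[/UL]", "LI", new String[] {"LI"}, new String[] {"UL"})</code> yields <code>"5-5 10-10"</code>: both list items are closed by zero-length virtual end tags, whereas <code>"[LI]a[/LI]"</code> yields <code>"5-10"</code> for the real end tag.</p>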
<p>For example, the {@.html <OPTION>} tag in a form can be prematurely ended by
any of {@.html <INPUT>}, {@.html <TEXTAREA>}, {@.html <SELECT>},
or another {@.html <OPTION>} tag. These are the tags in its getEnders() list.
It can also be prematurely ended by {@.html </SELECT>}, {@.html </FORM>},
{@.html </BODY>}, or {@.html </HTML>}. These are the tags in its
getEndTagEnders() list.</p>
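<p>Written out as code, the two lists for OPTION might be declared as below. This is a hypothetical stand-alone class, <code>OptionTagSketch</code>, that mirrors the method names used above; a real implementation would instead extend CompositeTag, override <code>getIds()</code>, <code>getEnders()</code> and <code>getEndTagEnders()</code>, and leave the closing logic to the scanner. The helper <code>closesEarly</code> is invented here only to show how the two lists are consulted, one for start tags and one for end tags.</p>

```java
import java.util.Locale;

// Hypothetical sketch mirroring the OPTION lists described above. A real tag
// would extend org.htmlparser.tags.CompositeTag and override these methods.
final class OptionTagSketch {

    // Names are UPPERCASE, as getIds() requires.
    private static final String[] IDS = {"OPTION"};
    // Start tags that close an open OPTION early.
    private static final String[] ENDERS = {"INPUT", "TEXTAREA", "SELECT", "OPTION"};
    // End tags (other than the tag's own) that close an open OPTION early.
    private static final String[] END_TAG_ENDERS = {"SELECT", "FORM", "BODY", "HTML"};

    public String[] getIds() {
        return IDS.clone();
    }

    public String[] getEnders() {
        return ENDERS.clone();
    }

    public String[] getEndTagEnders() {
        return END_TAG_ENDERS.clone();
    }

    // Invented helper: would this start tag (or end tag, if isEndTag) end an
    // open OPTION before its own end tag is seen?
    boolean closesEarly(String tagName, boolean isEndTag) {
        String upper = tagName.toUpperCase(Locale.ENGLISH);
        for (String name : isEndTag ? END_TAG_ENDERS : ENDERS) {
            if (name.equals(upper)) {
                return true;
            }
        }
        return false;
    }
}
```

<p>So in this sketch a SELECT start tag or a FORM end tag closes an open OPTION early, whereas a FORM start tag does not, because FORM appears only in the end-tag list.</p>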
<p>Beyond that, any functionality is up to you. Note that
{@link org.htmlparser.Node#doSemanticAction doSemanticAction()} is called after
the tag has been completely scanned (it has its children and end tag), but before
its siblings further downstream have been scanned. If transformation is your purpose,
this is the opportunity to modify the content, for example to set the link URL
or to lowercase the tag name.</p>
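<p>The ordering can be illustrated with a small hypothetical model (<code>SemanticActionModel</code> and its nested <code>Tag</code> class are invented for this sketch and are not part of htmlparser). It walks a pre-built tree in document order, logs when each tag starts and ends, and calls a stand-in <code>doSemanticAction</code> after a tag's children and end tag but before its next sibling. The action performs two example transformations: lowercasing the tag name and rewriting an <code>http:</code> link to <code>https:</code>, the latter chosen purely for illustration.</p>

```java
import java.util.Locale;

// Hypothetical model of when a semantic action runs; not htmlparser code.
final class SemanticActionModel {

    private SemanticActionModel() {
    }

    // Minimal stand-in for a tag: a name, an optional link URL and children.
    static final class Tag {
        String name;
        String link;
        final Tag[] children;

        Tag(String name, String link, Tag... children) {
            this.name = name;
            this.link = link;
            this.children = children;
        }

        // Plays the role of doSemanticAction(): the tag is complete here.
        void doSemanticAction(StringBuilder log) {
            log.append("action:").append(name).append(' ');
            name = name.toLowerCase(Locale.ENGLISH);            // e.g. lowercase the name
            if (link != null) {
                link = link.replaceFirst("^http:", "https:");   // e.g. set the link URL (illustrative)
            }
        }
    }

    // Visits siblings in document order, as a scanner would encounter them.
    static void scan(Tag[] siblings, StringBuilder log) {
        for (Tag tag : siblings) {
            log.append("start:").append(tag.name).append(' ');
            scan(tag.children, log);                  // children are scanned first
            log.append("end:").append(tag.name).append(' ');
            tag.doSemanticAction(log);                // then the action, before the next sibling
        }
    }
}
```

<p>Scanning a tag A that contains B, followed by a sibling C, logs <code>start:A start:B end:B action:B end:A action:A start:C end:C action:C</code>, and afterwards the name of A is <code>a</code>.</p>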
<!-- Put @see and @since tags down here. -->

</body>
</html>