package.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<!--
HTMLParser Library $Name: v1_6_20060319 $ - A java-based parser for HTML
http://sourceforge.org/projects/htmlparser
Copyright (C) 2004 Derrick Oswald

Revision Control Information

$Source: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/http/package.html,v $
$Author: derrickoswald $
$Date: 2005/06/19 12:01:14 $
$Revision: 1.3 $

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
-->
</head>
<body>
The http package is responsible for HTTP connections to servers.
The Lexer and Parser provide many ways to supply text to be parsed,
but this package only deals with cases where a URL is supplied as a
string, with the expectation that the Lexer or Parser will perform
the HTTP connection.
<p>The {@link org.htmlparser.http.ConnectionManager} class adds
<ul>
<li>cookie</li>
<li>proxy</li>
<li>password protected URL</li>
</ul>
capabilities when accessing the internet via the
<a href="http://www.ietf.org/rfc/rfc2616.txt">HTTP protocol</a>.
Each of these capabilities requires conditioning the HTTP connection.
An HTTP header utility class is also included.
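<p>To illustrate what "conditioning the HTTP connection" can mean in practice,
the following is a minimal sketch that uses only <code>java.net</code> and
<code>java.util</code>. It is not the ConnectionManager implementation: the class
<code>ConnectionConditioningSketch</code> and all of its method names are
hypothetical, and it assumes HTTP Basic authentication for both the proxy and the
protected URL. It only shows the request headers that typically carry each of
the three capabilities.

```java
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of the HTMLParser library. */
final class ConnectionConditioningSketch {

    private ConnectionConditioningSketch() {
    }

    /** Encodes "user:password" as an HTTP Basic credential (assumed scheme). */
    static String basicCredentials(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.ISO_8859_1);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Joins name/value pairs into one Cookie header value, e.g. "a=1; b=2". */
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }

    /**
     * Collects the request headers for the three capabilities.
     * A null credential or an empty cookie map skips that capability.
     */
    static Map<String, String> requestHeaders(String proxyCredentials,
            String siteCredentials, Map<String, String> cookies) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (proxyCredentials != null) {
            headers.put("Proxy-Authorization", proxyCredentials);
        }
        if (siteCredentials != null) {
            headers.put("Authorization", siteCredentials);
        }
        if (cookies != null && !cookies.isEmpty()) {
            headers.put("Cookie", cookieHeader(cookies));
        }
        return headers;
    }

    /** Applies the headers; must be called before the connection is opened. */
    static void condition(URLConnection connection, Map<String, String> headers) {
        for (Map.Entry<String, String> header : headers.entrySet()) {
            connection.setRequestProperty(header.getKey(), header.getValue());
        }
    }
}
```

In this sketch, <code>condition()</code> would be called on the object returned by
<code>URL.openConnection()</code> before <code>connect()</code>, since request
properties cannot be changed once the connection has been made.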
<p>The {@link org.htmlparser.http.ConnectionMonitor} interface is a callback
mechanism for the ConnectionManager to notify an interested application
when an HTTP connection is made. Example uses may include conditioning the
connection further, accessing HTTP header information, or providing reporting
or statistical functions. Callbacks are not performed for FileURLConnections,
which are also handled by the connection manager.
<p>The {@link org.htmlparser.http.Cookie} class is a container for
cookie data received and sent in HTTP requests and responses. It may be
necessary to prime the ConnectionManager with cookies received via a
login procedure in order to access protected HTML content.
<p>
A typical use of this package might look something like this:
<pre>
    // imports needed by this fragment
    import java.net.HttpURLConnection;
    import org.htmlparser.Parser;
    import org.htmlparser.http.ConnectionManager;
    import org.htmlparser.http.ConnectionMonitor;
    import org.htmlparser.http.Cookie;
    import org.htmlparser.http.HttpHeader;

    ConnectionManager manager = Parser.getConnectionManager ();
    // set up proxying
    manager.setProxyHost ("proxyhost.mycompany.com");
    manager.setProxyPort (8888);
    manager.setProxyUser ("FredBarnes");
    manager.setProxyPassword ("secret");
    // set up cookies
    Cookie cookie = new Cookie ("USER", "FreddyBaby");
    manager.setCookie (cookie, "www.freshmeat.net");
    cookie = new Cookie ("PHPSESSID", "e5dbeb6152e70d99427f2458d8969f8b");
    cookie.setDomain (".freshmeat.net");
    manager.setCookie (cookie, null);
    // set up security to access a password protected URL
    manager.setUser ("FredB");
    manager.setPassword ("holy$cow");
    // set up (an inner class) for callbacks
    ConnectionMonitor monitor = new ConnectionMonitor ()
    {
        // called before the connection is made: print the request header
        public void preConnect (HttpURLConnection connection)
        {
            System.out.println (HttpHeader.getRequestHeader (connection));
        }
        // called after the connection is made: print the response header
        public void postConnect (HttpURLConnection connection)
        {
            System.out.println (HttpHeader.getResponseHeader (connection));
        }
    };
    manager.setMonitor (monitor);
    // perform the connection
    Parser parser = new Parser ("http://freshmeat.net");
</pre>
The ConnectionManager used by
the Parser class is actually held by the
{@link org.htmlparser.lexer.Page#mConnectionManager Page} class.
It is accessible from the Parser (or the Page class) via
{@link org.htmlparser.Parser#getConnectionManager getConnectionManager()}.
It is a static (singleton) instance so that subsequent connections made by the
parser will use the contents of the cookie jar from previous connections.
By default, cookie processing is not enabled. It can be enabled by either
setting a cookie or using
{@link org.htmlparser.http.ConnectionManager#setCookieProcessingEnabled setCookieProcessingEnabled()}.
</body>
</html>