You're much better off using the -dump option of lynx
( http://lynx.browser.org/ ), really.
But here is an attempt at a pure Java solution, using HTMLEditorKit.ParserCallback.
Code:
import java.io.*;
import java.net.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class deTag extends HTMLEditorKit.ParserCallback {
    StringBuffer txt;
    Reader reader;

    // empty default constructor
    public deTag() {}

    // more convenient constructor
    public deTag(Reader r) {
        setReader(r);
    }

    public void setReader(Reader r) { reader = r; }

    // run the parser over the reader; text is collected via handleText()
    public void parse() throws IOException {
        txt = new StringBuffer();
        ParserDelegator parserDelegator = new ParserDelegator();
        // third argument true = ignore charset declarations in the document
        parserDelegator.parse(reader, this, true);
    }

    // called by the parser for each run of text between tags
    public void handleText(char[] text, int pos) {
        txt.append(text);
    }

    // only meaningful after parse() has been called
    public String toString() {
        return txt.toString();
    }

    public static void main (String[] argv) {
        try {
            // the HTML to convert
            URL toRead;
            if(argv.length==1)
                toRead = new URL(argv[0]);
            else
                toRead = new URL("http://p2p.wrox.com");
            BufferedReader in = new BufferedReader(
                new InputStreamReader(toRead.openStream()));
            deTag d = new deTag(in);
            d.parse();
            in.close();
            System.out.println(d.toString());
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
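If you want to try the same idea without hitting the network, something like the following should work. It is only a sketch: the class name DeTagStringDemo and its strip() helper are made up for illustration and are not part of the class above. It feeds an in-memory string to the same ParserDelegator and collects whatever handleText() receives.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class (an assumption for illustration, not from the post).
class DeTagStringDemo {

    // Strip tags from an HTML string using the same callback approach as deTag.
    static String strip(String html) throws IOException {
        final StringBuilder txt = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] text, int pos) {
                // append each run of text the parser finds between tags
                txt.append(text);
            }
        };
        // try-with-resources closes the reader even if parsing fails
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        }
        return txt.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(strip("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

Any Reader will do here, so the same helper could be pointed at a FileReader for a saved page.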
Example usage:
Code:
charlie@charlie:~/maui/src/java$ java deTag
p2p.wrox.com Forums View Cart | My AccountSupport | Contact Us
Search P2P for Advanced Search Members:Participate in
discussions or edit your profile. Login:Password: Remember MeForgot
Your Password?New Users: Register NowForum ToolsView All ForumsView
Active TopicsArchivesFAQTerms of UseNew Titles for ASP.NETASP.NET
AJAX Programmer's Reference: with ASP.NET 2.0 or ASP.NET
3.5Professional ASP.NET 2.0 Design: CSS, Themes, and Master Pages
> P2P Forum> p2p Community ForumsNeed to download code? View our
list of code downloads. ForumTopicsPostsLast PostModerator(s) > Wrox
Announcements and Feedback > Books > ASP and ASP.NET >
C#/C++ > Database > .NET > General > Java > Mac >
Microsoft Office > Microsoft Servers > Open Source >
PHP/MySQL > SQL Server > Visual Basic > Web >
XML Statistics 32139 of 68324 Members have made 199919 posts in
344 forums, with the last post on 10/01/2007 11:57:36 AM by:
shipero.There are currently 62429 topics.Please welcome our newest
member: shipero.> Contains new posts since last visit. > No new
posts since the last visit.>p2p.wrox.com ForumsTerms of Service©
2007 Wiley Publishing, Inc.>This page was generated in 0.22
seconds.Server time: 10/01/2007 12:12:34 PM (EST)>TopicIndexDynamic
Topic ListCopyright © 2000-2007 by John Wiley & Sons, Inc. or related
companies. All rights reserved. Please read our Privacy Policy.
charlie@charlie:~/maui/src/java$
Or specify a URL on the command line:
Code:
charlie@charlie:~/maui/src/java$ java deTag http://perlmonks.com
PerlMonks - The Monastery Gates> >> > > > >laziness, impatience, and
hubris > >PerlMonks The Monastery Gates | Log in | Create a new
user | The Monastery Gates | Super Search | > | Seekers of Perl
Wisdom | Meditations | PerlMonks Discussion | Snippets | > |
Obfuscation | Reviews | Cool Uses For Perl | Perl News | Q&A |
Tutorials | > | Code | Poetry | Recent Threads | Newest Nodes |
Donate | What's New | >( #131=superdoc: print w/ replies, xml )Need
Help??Donations gladly acceptedIf you're new here please read
PerlMonks FAQ> and Create a new user.>Want Mega XP? Prepare to have
your hopes dashed, join in on the: poll ideas quest 2007 (10702 days
remain)New QuestionsHighest scalar = ???> on Oct 01, 2007 at 11:561
replyby Anonymous MonkHowdy Monks! If I'm not using bignum, bigint,
or biganything; what kind of limits should I expect my scalars to
have? Do they have limits? EXAMPLE!! (insert your own
impatience)#!/usr/bin/perl -w
use strict;
---------------------8<---------------
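You will have noticed that the output runs words together ("Remember MeForgot Your Password?New Users"), because handleText() just appends each chunk with nothing in between. One possible tweak, sketched below, is to start a new line whenever the parser reports a tag that breaks the text flow (p, br, td, li and so on). The class name SpacedDeTag and its strip() helper are my own invention for this sketch; HTML.Tag.breaksFlow() is the standard Swing method it relies on.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of deTag (an assumption for illustration).
class SpacedDeTag extends HTMLEditorKit.ParserCallback {
    private final StringBuilder txt = new StringBuilder();

    // add a newline unless the buffer is empty or already ends with one
    private void lineBreak() {
        int n = txt.length();
        if (n > 0 && txt.charAt(n - 1) != '\n') {
            txt.append('\n');
        }
    }

    @Override
    public void handleText(char[] text, int pos) {
        txt.append(text);
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t.breaksFlow()) lineBreak();
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t.breaksFlow()) lineBreak();
    }

    // tags without a separate end tag, such as <br>, arrive here
    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t.breaksFlow()) lineBreak();
    }

    // convenience helper: parse a string and return the collected text
    static String strip(String html) throws IOException {
        SpacedDeTag d = new SpacedDeTag();
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, d, true);
        }
        return d.txt.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(strip("<p>Remember Me</p><p>Forgot Your Password?</p>"));
    }
}
```

This still won't lay a page out the way lynx does, but it should at least keep neighbouring paragraphs and table cells from being glued together.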
Cheers,
Charlie
--
Charlie Harvey's website - linux, perl, java, anarchism and punk rock:
http://charlieharvey.org.uk