The best way to do this is probably to use the Lynx browser, which
displays pages as text anyway, and redirect its output to a file.
The command line
lynx -dump URL > myfile.txt
will do what you want. If you don't want the URLs listed at the end,
you can add "-nolist" after "-dump".
<URL:http://lynx.browser.org/>
--
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct,
and much more neat / is to see that he is living somewhere in the middle /
of the twentieth century, and leave him there." -- Edwin Brock
http://www.ifi.uio.no/~larsga/ http://birk105.studby.uio.no/
>In message <3466765b....@news.ican.net>, Jean Bigras <t...@ican.net>
>wrote:
>|If you are on a Windows platform or a Mac, the lynx solution might not
>|be the best...
>
>Not sure about the Mac but there *is* a Windows 95 version of
>Lynx. It's labeled as a development version but it appeared pretty
>stable when I tried it out.
MacLynx is available from http://www.lirmm.fr/~gutkneco/maclynx/
Andrew McCormick
opinions expressed in this post are not necessarily my own
>Right now I am working on a web page which is basically all in HTML
>format with a couple of .gifs. What I would like to know is if there is
>a way to have a text version of the page which will automatically change
>the HTML to text, without going in, copying the file, and making the
>modifications by hand.
>
> thanks for your help
If you are on a Windows platform or a Mac, the lynx solution might not
be the best...
One thing you can do is:
1) view the page with Netscape or MSIE.
2) press Ctrl+A to select all the text, then Ctrl+C to copy it.
3) paste it into your favorite text editor...
The other is to get a copy of HomeSite 2.5
http://www.allaire.com/
which has a "strip all HTML tags" feature.
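If you don't have HomeSite, a crude sed one-liner can approximate that
tag-stripping feature. A rough sketch only: it breaks on tags split across
lines and leaves character entities like &amp; untouched.

```shell
# Crude approximation of a "strip all HTML tags" feature using sed:
# delete anything between a "<" and the next ">".
echo '<p>Hello, <b>world</b>!</p>' | sed 's/<[^>]*>//g'
```

For real pages you are still better off with lynx -dump, which also
handles entities and line wrapping.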
JLB
Not sure about the Mac but there *is* a Windows 95 version of
Lynx. It's labeled as a development version but it appeared pretty
stable when I tried it out.
|One thing you can do is:
|1) view the page with Netscape or MSIE.
|2) press Ctrl+A to select all the text, then Ctrl+C to copy it.
|3) paste it into your favorite text editor...
And how do you do that in an automated fashion? You can't. The text
version quickly becomes out of sync with the page itself.
--
Shawn K. Quinn - skq...@brokersys.com - visit my home page at
http://www.brokersys.com/~skquinn/ and visit a bunch of bogus e-mail addresses
at http://www.brokersys.com/~skquinn/spamsucks.html (latter to foil robots)
If you're running Apache as your Web server, you can get automatic
text versions of HTML pages quite easily with a hack I thought up
a while ago.
Just put this:
ErrorDocument 404 /cgi-bin/404error
in your httpd's conf files, then include something like this in the
"404error" CGI script:
#!/usr/local/bin/perl
#
# 404error: a cool 404 error handler
#
# Gerald Oskoboiny, 30 Jan 1997

$htdocs   = "/www/htdocs";
$logfile  = "/usr/log/404_error_log";
$html2txt = "/usr/local/bin/lynx -cfg=/usr/local/lib/lynx.cfg -validate -dump";

# REDIRECT_URL holds the URL that triggered the 404; split it into
# its extension and its path without the extension.
$extension = $ENV{REDIRECT_URL}; $extension =~ s/.*\.//g;
$basename  = $ENV{REDIRECT_URL}; $basename  =~ s/\.[^\.]*$//g;
$basename  =~ s|^/||g;

#####
# Check if they were looking for a ".txt" file; if so, generate one for
# them by running the corresponding ".html" file through lynx.
if ( ( $extension eq "txt" ) && ( -f "$htdocs/${basename}.html" ) ) {
    print "Content-Type: text/plain\n\n";
    open( HTML2TXT, "$html2txt http://www.hwg.org/${basename}.html |" ) ||
        die "couldn't run $html2txt with http://www.hwg.org/${basename}.html! $!";
    while (<HTML2TXT>) {
        print;
    }
    close( HTML2TXT ) || die "couldn't close $html2txt! $!";
    exit;
}

#####
# do other stuff here (log the 404, print a friendly error page, ...)
Et voilà! Instant .txt versions of all your HTML pages.
For example:
http://www.hwg.org/resources/html/validation.html (HTML)
http://www.hwg.org/resources/html/validation.txt (plain text)
http://www.hwg.org/index.html
http://www.hwg.org/index.txt
This isn't especially efficient, but it gets decent results with extremely
little effort.
Better would be to make it an Apache module triggered by a .txt Handler
that caches the automatically-generated plain text versions somewhere
after they're generated.
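As a rough illustration of that caching idea (a sketch only: the
"convert" function is a stand-in for the lynx -dump pipeline above, and
the docroot is a temporary directory made up for the demo):

```shell
#!/bin/sh
# Sketch: keep a generated .txt beside each .html and rebuild it only
# when the .html is newer than the cached copy.
# "convert" stands in for the real "lynx -dump" converter.
convert() { sed 's/<[^>]*>//g' "$1"; }

htdocs=$(mktemp -d)                           # hypothetical docroot
printf '<html><body>hello</body></html>\n' > "$htdocs/index.html"

page="$htdocs/index.html"
cache="${page%.html}.txt"

# Regenerate only if there is no cached copy, or the page is newer.
if [ ! -f "$cache" ] || [ "$page" -nt "$cache" ]; then
    convert "$page" > "$cache"
fi
cat "$cache"
```

A proper Apache module would hang this off a .txt handler instead of the
404 trick, but the staleness check is the same.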
Gerald
--
Gerald Oskoboiny
<ge...@impressive.net>
I keep my HTML pages in sync automatically by converting them from text
using a program, and using that program to automatically split and
cross-reference the files where appropriate. I use it to maintain about
300 FAQ pages. It takes about 5 minutes to regenerate the lot from 2 text
files.
more info at http://www.scot.demon.co.uk/q-html.html
--
Craig Cockburn ("coburn"), Dùn Èideann, Alba. (Edinburgh, Scotland)
http://www.scot.demon.co.uk/ E-mail: cr...@scot.demon.co.uk
Sgrìobh thugam 'sa Ghàidhlig ma 'se do thoil e. (Write to me in Gaelic, please.)