search-headlines india
will scan the web pages www.cnn.com, www.cbsnews.com,
etc. (you can add your own) and display all lines
containing "india" (not case sensitive). Neat, huh?
Krishna Myneni
========================================================
[..]
> \ Use a definition of SHELL appropriate to your system
>
> : shell \ c-addr u -- n | execute a shell command in kForth
> strpck system ;
[..]
\ lib4th, f8
: shell -compile sh ; immediate
hi-level defns of "sh" and, "sh|" for piped output, in "lib4th" archive.
best,
hp
--
>>> pse, reply to : clf -at- lxhp -dot- in-berlin -dot- de <<<
Linux,Assembly,Forth: http://www.lxhp.in-berlin.de/index-lx.shtml en/de
Krishna Myneni
> If you have lynx (a text based web browser) and
> the famous grep (pattern matching utility) on
> your system, attached is a useful script written
> in Forth to automatically search and print lines
> containing the search term from a list of specified
> web pages. The example searches news webpages
> for headlines on a specified topic, e.g.
Here is a useful Forth utility that uses the w3m netbrowser.
( I'd like to thank Marc Olschok <sa7...@l1-hrz.uni-duisburg.de>
for drawing my attention to this useful program ).
Communication with this browser is through a pipe.
The idea is to check if any of the URLs in the "saved-urls.dat"
file have changed since last time we checked. This is done
by computing the 32-bit CRC of these pages.
You can check URLs in parallel with browsing the 'net in the
"normal" way. This saves lots of time while still keeping
you informed :-)
There is an infinite timeout when a page is off-line or not
accessible. I don't see an easy fix for that. (It may be
possible to change ACCUMULATE to have a time-out.)
-marcel
-- -----------------------------------------------------------------------
(*
* LANGUAGE : ANS Forth
* PROJECT : Forth Environments
* DESCRIPTION : check list of URLs for new data
* CATEGORY : Tool
* AUTHOR : Marcel Hendrix
* LAST CHANGE : August 18, 2002, Marcel Hendrix
*)
NEEDS -miscutil
NEEDS -pipes
NEEDS -assemble
REVISION -scooter "ÄÄÄ Scooter - URL fetch Version 1.00 ÄÄÄ"
PRIVATES
DOC
(*
WHAT
----
This program uses the w3m netbrowser. Communication is through a pipe.
The idea is to check if any of the URLs in the "saved-urls.dat" file have
changed since last time we checked. This is done by computing the 32-bit
CRC of these pages.
You can check URLs in parallel with browsing the 'net in the "normal" way.
BUGS
----
There is an infinite timeout when a page is off-line or not accessible.
I don't see an easy fix for that. (It may be possible to change
ACCUMULATE to have a time-out.)
EXAMPLE OUTPUT
--------------
FORTH> update-crcs
0 stable :: http://www.tim-mann.org/chess.html/index.html
1 stable :: http://home.iae.nl/users/mhx/index.html
2 stable :: http://www.bagley.org/~doug/shootout/index.html
3 stable :: http://www.cs.utk.edu/~ghenry/distrib/archive.htm
4 stable :: http://www98.phys.virginia.edu/classes/551.jvn.fall01/index.html
5 stable :: http://www.azillionmonkeys.com/qed/asm.html
6 stable :: http://www.cs.bell-labs.com/cm/cs/pearls/index.html
7 stable :: http://algoart.com/index.html
8 stable :: http://www-cs-faculty.stanford.edu/~knuth/index.html
9 stable :: http://home.hccnet.nl/a.w.m.van.der.horst/index.html
10 stable :: http://pweb.de.uu.net/schwalm.hb/index.html
11 stable :: http://www.jwdt.com/~paysan/bigforth.html
12 stable :: http://www.sunweb.ch/custom/epprecht/index.html
13 stable :: http://www.colorforth.com/index.html
14 stable :: http://www.fig-uk.org/codeindex/index.html
15 stable :: http://www.fig-uk.org/index.html
16 stable :: http://www.geocities.com/forthlinks/index.html
17 stable :: http://home.earthlink.net/~neilbawd/index.html
18 stable :: http://www.quartus.net/discus/index.html
19 stable :: http://www.forth.org.ru/index.html
20 stable :: http://dec.bournemouth.ac.uk/forth/index.html ok
*)
ENDDOC
-- Tools -----------------------------------------------------------------
-- I forgot how to do this in high-level
CODE crc32 ( n1 char -- n2 )
rpush,
ebx pop,
edx -> [esp] xchg, \ pop crc to edx
8 b# -> ecx mov, \ loop count
@@1:
edx shr, \ shift crc
bh rcr,
bl ror, \ shift character
ebx -> eax mov, \ save character
bh -> bl xor,
@@2 offset SHORT jns, \ skip if equal
$EDB88320 d# -> edx xor, \ crc-32 polymial 1 04C1 1DB7
@@2:
eax -> ebx mov, \ restore character
@@1 loop, \ next bit
edx -> [esp] xchg, \ crc to tos
rpop, ebx jmp,
END-CODE
\ calculate crc-32 of string
: crc-32 ( c-addr u -- n2 )
-1 -ROT BOUNDS ?DO I C@ crc32 LOOP INVERT ;
: ACCUMULATE ( c-addr umax handle -- c-addr u )
0 3 PICK LOCALS| start sz handle umax addr |
BEGIN umax DUP
WHILE addr SWAP handle READ-FILE ?FILE DUP
WHILE DUP +TO sz
addr umax ROT /STRING TO umax TO addr
REPEATED
DROP start sz ;P
-- http access -----------------------------------------------------------
: GET-URL ( c-addr1 u1 -- c-addr2 u2 )
S" c:/w3m/w3m.exe -dump " 2SWAP $+ R/O OPEN-PIPE ?FILE >S
PAD #100000 S ACCUMULATE ( -- c-addr u )
S> CLOSE-PIPE ?FILE ;
: .URL ( c-addr u -- )
GET-URL CR TYPE CR ;
: URL->CRC ( c-addr1 u1 -- u2 )
GET-URL crc-32 ;
-- A small URL database --------------------------------------------------
0 VALUE rec#
CREATE URL-DBASE HERE #100 CELLS DUP ALLOT ERASE
: cleanup ( u -- )
DROP
URL-DBASE rec# 0 ?DO @+ FREE ?ALLOCATE LOOP DROP
CLEAR rec#
URL-DBASE #100 CELLS ERASE ;P
: ADD-URL ( c-addr u -- )
rec# 100 U>= ABORT" URL-DBASE full"
DUP 2 CELLS + ALLOCATE ?ALLOCATE ( -- c-addr u addr )
CELL+ CELLPACK CELL- URL-DBASE rec# CELL[] !
1 +TO rec# ;
' cleanup IS-FORGET ADD-URL
: ADD-CRC URL-DBASE []CELL @ ! ; ( u rec# -- )
: ADD-DATA rec# >S ADD-URL S> ADD-CRC ; ( crc c-addr u -- )
: URL@ URL-DBASE []CELL @ CELL+ @+ ; ( ix -- c-addr u )
: CRC@ URL-DBASE []CELL @ @ ; ( ix -- u )
: job URL@ .URL ; ( ix -- )
: RECORD->FILE ( crc c-addr u1 handle -- )
>S ROT (.) S~ S" ~ $+
2SWAP $+
S~ " ADD-DATA~ $+
S> WRITE-LINE ?FILE ;P
: SAVE-URLS ( -- )
S" saved-urls.dat" W/O BIN CREATE-FILE ?FILE LOCAL handle
URL-DBASE rec# 0 ?DO @+ @+ SWAP @+ handle RECORD->FILE
LOOP DROP
handle CLOSE-FILE ?FILE ;
: RESTORE-URLS S" saved-urls.dat" INCLUDED ; ( -- )
RESTORE-URLS
: .URLS ( -- )
URL-DBASE
rec# 0 ?DO @+ @+ SWAP @+ CR I 3 .R ." :: " TYPE
C/L #20 - HTAB ." -- CRC = " H.
LOOP
DROP ;
-- Vector this to what you want done when a page changes -----------------
DEFER DO-SOMETHING ( index -- ) ' job IS DO-SOMETHING
: UPDATE-CRCs ( -- )
URL-DBASE
rec# 0 ?DO
@+ @+ >S @+ URL->CRC DUP >S I ADD-CRC
S> S> <> IF CR I 3 .R ." :: URL contents have changed"
I DO-SOMETHING
ELSE CR I 3 .R ." stable :: " I URL@ TYPE
ENDIF
LOOP
DROP
SAVE-URLS ;
:ABOUT CR .~ Try: S" http://some_url.html" GET-URL ( c-addr u ) -- get URL content in a string~
CR .~ S" http://some_url.html" .URL -- print page content~
CR .~ S" http://some_url.html" URL->CRC ( -- u) -- compute 32-bit CRC of page content~ ;
.ABOUT -scooter CR
DEPRIVE
(* End of Source *)
>Here is a useful Forth utility that uses the w3m netbrowser.
>The idea is to check if any of the URLs in the "saved-urls.dat"
>file have changed since last time we checked. This is done
>by computing the 32-bit CRC of these pages.
I've been using the free service at
http://www.changedetection.com/monitor.html
for this but it's not as flexible as doing it yourself.
Instead, I've been meaning to program this for a while (more strictly,
to learn enough to try to do this). You and Krishna have saved me a
job. Many thanks.
Bye for now ____/ / __ / / / / /
/ / / _/ / / / /
Chris Jakeman __/ / / __ / / /_/
/ / / / / / / \
[To reply, please __/ __/ ____/ ___/ __/ _\
unspam my address]
Forth Interest Group United Kingdom
Voice +44 (0)1733 753489 chapter at http://www.fig-uk.org
> I've been using the free service at
> http://www.changedetection.com/monitor.html for this but it's not as
> flexible as doing it yourself.
I used one too, but they stopped the service and wanted money after a while.
Very inflexible, and I suspect them of just wanting to build one's "interest
profile" so it can be sold to the spam society with a bonus.
> Instead, I've been meaning to program this for a while (more strictly, to
> learn enough to try to do this). You and Krishna have saved me a job.
I had to patch a single byte in w3m (it wants its options directory to start
with a "." which is too much asked from win32), Also, I use pipes, which are
not so easy to do under windows. Hopefully at least gForth will have got
that right.
Please let me know if you need help translating.
-marcel