[flaxcode] r1328 committed - Updated README

1 view

Skip to first unread message

codesite...@google.com

unread,

Jul 30, 2010, 9:17:10 AM7/30/10

to flax-c...@googlegroups.com

Revision: 1328
Author: to...@flax.co.uk
Date: Fri Jul 30 06:15:58 2010
Log: Updated README

http://code.google.com/p/flaxcode/source/detail?r=1328

Modified:
/trunk/flax/crawler/README

=======================================
--- /trunk/flax/crawler/README Wed Jul 28 12:29:13 2010
+++ /trunk/flax/crawler/README Fri Jul 30 06:15:58 2010
@@ -21,10 +21,22 @@
===============

To get a working crawler, you need to specify to the crawler a set of
objects
-satisfying the crawler API. For example::
+satisfying the crawler API. Default, in-memory implementations are
provided by
+the crawler.py module which may be suitable for some applications without
+modification. The minimum required to get a useful crawler is to provide a
+suitable content dumper, add some URLs to the URL pool and start the
crawler::

import crawler

+ crawler.dump = MyContentDumperImplementation()
+ crawler.pool.add_url(StdURL("http://test/"))
+ crawler.pool.add_url(StdURL("http://anothertest/"))
+ crawler.start()
+
+The call to start() blocks until no more URLs are left in the URL pool. In
+general, it is more likely that application specific implementations will
be
+required for all of the objects satifying the crawler API::
+
crawler.dump = MyContentDumperImplementation()
crawler.pool = MyURLPoolImplementation()
crawler.follow = MyFollowDeciderImplementation()
@@ -34,16 +46,8 @@
crawler.robots = MyRobotManagerImplementation()
crawler.error = MyErrorHandler()

-Default, in-memory implementations are provided by the crawler.py module
which
-may be suitable for some applications without modification. Also see the
module
-sql_crawler.py which contains SQL database implementations, as well as a
-command line interface.
-
-Next, add some URLs to the URL pool and begin crawling::
-
- crawler.pool.add_url(StdURL("http://test/"))
- crawler.pool.add_url(StdURL("http://anothertest/"))
- crawler.start()
+The module sql_crawler.py contains an SQL database implementation, as well
as a
+command line interface, and is a useful starting example for an
application.