Hounder-2.0.1 - Issues - Crawler not stopping, RMI Registry Error, Web Search shows search box twice on search

vlab@work

Aug 25, 2009, 4:52:32 AM
to hounder
Hi Jorge

A short history, FYI:
I have been trying the bin installer for quite some time, but it behaves differently on different Linux flavors. I finally tried it on SUSE 10 32-bit, then took the same extracted installation and deployed it on SUSE 10.2 64-bit, manually changing the base dir paths in the config files. With that kind of deployment it started working.

But so far I'm only through the installation and starting all the services. All nodes report GOOD status, but I could not get to the next step and see the crawler, indexer and searcher actually working. Maybe that is all because of the following issue. Can you provide pointers for resolving it and testing the installation?

Crawler - hounder.log
FIRST TIME I INITIALIZED THE SERVICE, the errors I got were as follows:
------------------------------------------------------------------------------------------------------------
ERROR [main] com.flaptor.util.remote.RmiServer - [Couldn't register service default service on registry running on port 47020] 2009-08-25 11:53:06,988
ERROR [com.flaptor.util.remote.RmiServer-StopperThread] com.flaptor.util.remote.RmiUtil - [Couldn't unregister service default service at port 47020: java.lang.IllegalArgumentException: There is no registry running on port 47020] 2009-08-25 11:53:06,989
java.lang.IllegalArgumentException: There is no registry running on port 47020
        at com.flaptor.util.remote.RmiUtil.unregisterLocalService(RmiUtil.java:113)
        at com.flaptor.util.remote.RmiServer.requestStopServer(RmiServer.java:150)
        at com.flaptor.util.remote.AServer$1.run(AServer.java:94)
ERROR [main] com.flaptor.hounder.crawler.Crawler - [The new PageDB is empty, will stop the crawler before replacing the old PageDB. Please check the hotspots, modules and other settings before restarting.] 2009-08-25 12:06:28,223
ERROR [main] com.flaptor.util.remote.RmiUtil - [Couldn't export service default service at port 47020 as com.flaptor.util.cache.FileCache@940f82] 2009-08-25 12:06:31,479
------------------------------------------------------------------------------------------------------------

I tried stopping the services using stop-all, but it failed to stop the processes and hung, so I hard-killed them. After killing them I started all the services again.
SECOND TIME I INITIALIZED THE SERVICE, the errors I got were as follows:
------------------------------------------------------------------------------------------------------------
ERROR [main] com.flaptor.hounder.crawler.Crawler - [The new PageDB is empty, will stop the crawler before replacing the old PageDB. Please check the hotspots, modules and other settings before restarting.] 2009-08-25 12:45:16,668
ERROR [main] com.flaptor.util.remote.RmiUtil - [Couldn't export service default service at port 47020 as com.flaptor.util.cache.FileCache@1e91a4d] 2009-08-25 12:45:19,813
java.rmi.server.ExportException: Port already in use: 47020; nested exception is:
        java.net.BindException: Address already in use
        at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:310)
        at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:218)
        at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:393)
        at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:129)
        at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:190)
        at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:92)
        at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:78)
        at java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:186)
        at com.flaptor.util.remote.RmiUtil.registerLocalService(RmiUtil.java:80)
        at com.flaptor.util.remote.RmiServer.registerHandler(RmiServer.java:113)
        at com.flaptor.util.remote.RmiServer.startServer(RmiServer.java:107)
        at com.flaptor.util.remote.AServer.start(AServer.java:74)
        at com.flaptor.hounder.crawler.modules.CacheModule.<init>(CacheModule.java:56)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at com.flaptor.hounder.crawler.modules.ModulesManager.<init>(ModulesManager.java:90)
        at com.flaptor.hounder.crawler.modules.ModulesManager.getInstance(ModulesManager.java:43)
        at com.flaptor.hounder.crawler.Crawler.stopCrawler(Crawler.java:141)
        at com.flaptor.hounder.crawler.Crawler.crawl(Crawler.java:548)
        at com.flaptor.hounder.crawler.Crawler.main(Crawler.java:753)
Caused by: java.net.BindException: Address already in use
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:359)
        at java.net.ServerSocket.bind(ServerSocket.java:319)
        at java.net.ServerSocket.<init>(ServerSocket.java:185)
        at java.net.ServerSocket.<init>(ServerSocket.java:97)
        at sun.rmi.transport.proxy.RMIDirectSocketFactory.createServerSocket(RMIDirectSocketFactory.java:27)
        at sun.rmi.transport.proxy.RMIMasterSocketFactory.createServerSocket(RMIMasterSocketFactory.java:333)
        at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:649)
        at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:299)
        ... 21 more
ERROR [main] com.flaptor.util.remote.RmiServer - [Couldn't register service default service on registry running on port 47020] 2009-08-25 12:45:19,815
ERROR [com.flaptor.util.remote.RmiServer-StopperThread] com.flaptor.util.remote.RmiUtil - [Couldn't unregister service default service at port 47020: java.lang.IllegalArgumentException: There is no registry running on port 47020] 2009-08-25 12:45:19,816
java.lang.IllegalArgumentException: There is no registry running on port 47020
        at com.flaptor.util.remote.RmiUtil.unregisterLocalService(RmiUtil.java:113)
        at com.flaptor.util.remote.RmiServer.requestStopServer(RmiServer.java:150)
        at com.flaptor.util.remote.AServer$1.run(AServer.java:94)
ERROR [main] com.flaptor.hounder.crawler.Crawler - [The new PageDB is empty, will stop the crawler before replacing the old PageDB. Please check the hotspots, modules and other settings before restarting.] 2009-08-25 12:54:57,926
ERROR [main] com.flaptor.hounder.crawler.Crawler - [The new PageDB is empty, will stop the crawler before replacing the old PageDB. Please check the hotspots, modules and other settings before restarting.] 2009-08-25 13:14:17,545
------------------------------------------------------------------------------------------------------------

Advice please.

Regards
Vlab@work

vlab@work

Aug 25, 2009, 5:10:36 AM
to hounder
In addition to the above issue:

In the web frontend the crawler shows as not started, and when I click on it to start it, I get the following message:

"Problem starting node: could not start remote host. error code: 1 -
The crawler is already running - Pseudo-terminal will not be allocated
because stdin is not a terminal."

Regards,
Vlab@work

Jorge Handl

Aug 25, 2009, 7:21:05 AM
to hou...@googlegroups.com
Vlab@work, we have installed and run Hounder on several Linux flavors, both 32- and 64-bit, and have never had a problem. We have never tried CentOS, though. I will try Hounder on a CentOS 32-bit setup and let you know if I find anything.
- Jorge

vlab@work

Aug 25, 2009, 8:40:09 AM
to hounder
Hi Jorge,

Thanks for the update, but I'm still stuck. Can you tell me under what circumstances this error occurs?

ERROR [main] com.flaptor.hounder.crawler.Crawler - [The new PageDB is empty, will stop the crawler before replacing the old PageDB. Please check the hotspots, modules and other settings before restarting.] 2009-08-25 18:03:57,842
ERROR [main] com.flaptor.util.remote.RmiUtil - [Couldn't export service default service at port 47020 as com.flaptor.util.cache.FileCache@10a5c21] 2009-08-25 18:04:00,042

Regards,
Vlab@work

Jorge Handl

Aug 25, 2009, 8:49:28 AM
to hou...@googlegroups.com
That is a safeguard put there to avoid losing the whole URL database in case something goes wrong and the crawler produces a new version of the database without any data in it. You can safely remove the pagedb.new directory and the message will go away.
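
For example, something along these lines should do it (a sketch only; I'm assuming the default layout with the crawler living in a crawler/ directory next to start-all.sh, so adjust the path to your setup):
------------------------------------------------
# with the crawler stopped, drop the half-built new PageDB;
# the old pagedb directory is left untouched
cd crawler
rm -rf pagedb.new
------------------------------------------------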
- Jorge

vlab@work

Aug 25, 2009, 9:19:45 AM
to hounder
Jorge, sorry to stick to the same issue for so long, but I'm trying to get to the bottom of it.

I just did a clean start with all logs cleared and the system rebooted. The findings are below. Port 47020 never gets opened. Am I missing anything?
Step 1: Started a tail command to watch the log trace (Console 2)
Step 2: Started all services using start-all.sh (Console 1)
Step 3: Checked netstat for port usage (Console 3)
Step 4: Checked the status of port 47020 using telnet (Console 1)

Trace FYR.

Console 1:
------------------------------------------------
vlabwork@vibpd153:~/setup/hounder> ./start-all.sh
Starting the crawler...
The indexer is not running
Starting the indexer...
The searcher is not running
Starting the searcher...
Starting the cache server...
Starting the clustering web...

Access the admin webapp at http://localhost:47050/

vlabwork@vibpd153:~/setup/hounder> telnet localhost 47020
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying ::1...
telnet: connect to address ::1: Connection refused
vlabwork@vibpd153:~/setup/hounder> telnet localhost 47040
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^[^]
telnet> quit
Connection closed.
vlabwork@vibpd153:~/setup/hounder> telnet localhost 47040
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^[^]
telnet> quit
Connection closed.
vlabwork@vibpd153:~/setup/hounder> telnet localhost 47020
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying ::1...
telnet: connect to address ::1: Connection refused
vlabwork@vibpd153:~/setup/hounder>
-----------------------------------------------------------
Console 2 :
----------------------------------------------------------
vlabwork@vibpd153:~> tail -f crawler/logs/hounder.log
tail: cannot open `crawler/logs/hounder.log' for reading: No such file
or directory
tail: no files remaining
vlabwork@vibpd153:~> tail -f setup/hounder/crawler/logs/hounder.log
ERROR [main] com.flaptor.hounder.crawler.Crawler - [The new PageDB is empty, will stop the crawler before replacing the old PageDB. Please check the hotspots, modules and other settings before restarting.] 2009-08-25 18:30:21,303
ERROR [com.flaptor.util.remote.RmiServer-StopperThread] com.flaptor.util.remote.RmiUtil - [Couldn't unregister service default service at port 47020: java.lang.IllegalArgumentException: There is no registry running on port 47020] 2009-08-25 18:30:22,950
java.lang.IllegalArgumentException: There is no registry running on port 47020
        at com.flaptor.util.remote.RmiUtil.unregisterLocalService(RmiUtil.java:113)
        at com.flaptor.util.remote.RmiServer.requestStopServer(RmiServer.java:150)
        at com.flaptor.util.remote.AServer$1.run(AServer.java:94)
-----------------------------------------------------------
Console 3 :
----------------------------------------------------------
vibpd153:~ # netstat -nap |less
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State        PID/Program name
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN       2961/portmap
tcp        0      0 127.0.0.1:2544          0.0.0.0:*               LISTEN       3201/zmd
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN       3358/cupsd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN       3440/master
tcp        0      0 0.0.0.0:58875           0.0.0.0:*               LISTEN       3783/skype
tcp        0      0 192.168.0.153:45597     220.150.208.220:36341   ESTABLISHED  3783/skype
tcp        0      0 :::47040                :::*                    LISTEN       4301/java
tcp        0      0 :::47041                :::*                    LISTEN       4301/java
tcp        0      0 :::47010                :::*                    LISTEN       4301/java
tcp        0      0 :::47042                :::*                    LISTEN       4258/java
tcp        0      0 :::47011                :::*                    LISTEN       4301/java
tcp        0      0 :::47012                :::*                    LISTEN       4301/java
tcp        0      0 :::47044                :::*                    LISTEN       4322/java
tcp        0      0 :::50053                :::*                    LISTEN       4225/java
tcp        0      0 :::47050                :::*                    LISTEN       4358/java
tcp        0      0 :::53578                :::*                    LISTEN       4301/java
tcp        0      0 :::54610                :::*                    LISTEN       4258/java
tcp        0      0 :::47030                :::*                    LISTEN       4322/java
tcp        0      0 :::22                   :::*                    LISTEN       3204/sshd
tcp        0      0 ::1:631                 :::*                    LISTEN       3358/cupsd
tcp        0      0 :::47000                :::*                    LISTEN       4258/java
tcp        0      0 :::47001                :::*                    LISTEN       4258/java
tcp        0      0 ::1:25                  :::*                    LISTEN       3440/master
---------------------------------------------------

Would you mind shedding some light on this?

Regards,
Vlab@work

Jorge Handl

Aug 25, 2009, 6:04:36 PM
to hou...@googlegroups.com
Hey Vlab@work,

I tested Hounder on a clean CentOS 32 bit distro (2.6.18-128.el5), and it is working flawlessly. I had no problem installing and running the system. I really don't know what the problem on your side might be.

Some tips, though:
1. Use the start / stop / status commands; don't rely on the web admin interface.
2. Start by making a clean Hounder install. If the installer doesn't work, there is something wrong that should be fixed before going any further; there is no point trying to run Hounder at that stage.
3. Try installing only a crawler, an indexer and a searcher. If that goes ok, start the searcher first and check that its log shows no errors, then start the indexer, and then the crawler (see the sketch below). If all goes well you should see no errors in the logs, after a while a growing indexer/indexes/index directory should appear, and then you should be able to do searches through the websearch interface.
4. When checking the logs, ignore the hounder.* files, as they are cumulative and may show old errors. Instead, rely on the specific log files: crawler.*, indexer.* and searcher.*.
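
The sketch I mentioned in point 3 looks roughly like this (only a sketch; I'm assuming each component sits in its own directory with its own start.sh, the way the installer lays things out, and the exact log file names under logs/ may differ in your install):
------------------------------------------------
# start the components one at a time, checking each log before moving on
cd searcher && ./start.sh && tail -n 50 logs/searcher.log && cd ..
cd indexer  && ./start.sh && tail -n 50 logs/indexer.log  && cd ..
cd crawler  && ./start.sh && tail -n 50 logs/crawler.log  && cd ..

# once the crawler has been running for a while, the index should start growing
du -sh indexer/indexes/index
------------------------------------------------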

I hope this helps.

- Jorge

vlab@work

Aug 26, 2009, 7:29:07 AM
to hounder
Thanks, Jorge. I'll try a clean CentOS install and let you know the outcome.

Regards,
Vlab@work

vlab@work

Aug 26, 2009, 11:11:54 AM
to hounder
Bingo!! It works, pal.

On a clean CentOS, Hounder installs fine. No errors detected in any of the log files.

Just 2 things to complete my post on this issue trace:

1. The crawler does not stop at all when using stop.sh.
2. Web search is not working: it shows 2 search boxes when I do a search. I did the search 15 minutes after starting Hounder. Is anything missing?

I'm sure you can use your magic wand on this too. :)

Regards,
Vlab@work

vlab@work

Aug 26, 2009, 11:46:36 AM
to hounder
Hey Jorge,

I tried killing the crawler and changed pagedb.seed to the sample given in the tutorial, and it started working; I get search results now. But the crawler stop problem persists. It can be managed, I suppose.

Thanks a lot, pal. Keep the good work going.

I will trouble you with lots of queries later.

Regards,
Vlab@work

Jorge Handl

Aug 26, 2009, 12:04:21 PM
to hou...@googlegroups.com
Vlab@work,

The websearch interface is nothing but a very simple demo, only meant to verify the installation. The search results page has one search box at the top and another at the bottom, so if you get no results you will see the two search boxes with nothing in between. We should probably eliminate the bottom box, as you're not the first one to be confused by this.

As to the crawler stopping problem, I am confident that it will stop if you give it enough time. The reason it takes so long is that it waits to complete a fetchlist before flushing and shutting down. By default the crawler fetches 500 pages per fetchlist, and that can take some time to finish. If you edit the conf/crawler.properties file and change the fetchlist.size parameter to a smaller value (10, for example), it will stop much faster.
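
Concretely, something like this (a sketch; I'm assuming the usual key=value syntax in the properties file and that the fetchlist.size line is already there, otherwise just add it):
------------------------------------------------
# in the crawler directory: shrink the fetchlist so the crawler can shut down sooner
sed -i 's/^fetchlist.size=.*/fetchlist.size=10/' conf/crawler.properties
grep fetchlist.size conf/crawler.properties   # should now show fetchlist.size=10
------------------------------------------------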

- Jorge

Amit Kumar Verma

Aug 28, 2009, 2:49:58 AM
to hounder
Hi Jorge,

All modules are running smoothly. We started crawling with "http://www.wikipedia.org/" as given in the 5-minute tutorial, and it was crawled, indexed and searched properly.

But then we changed the 'pagedb.seeds' entry to another URL (a clean URL without any regex) and restarted all modules, and the crawler keeps going to the old wiki URL only. We have written a urlnormalizer plugin in which we log where the crawler is going; it is working properly and producing logs (which is how we know it is still crawling the wiki site).

I don't know whether I need to change any other property to crawl the domain we want. Please advise.


Thanks,
-amit.

Jorge Handl

Aug 28, 2009, 7:41:24 AM
to hou...@googlegroups.com
Amit, after you change the pagedb.seeds file you need to run the createPageDB.sh command.
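
Roughly like this (just a sketch; I'm writing the seed file path and script locations from memory, so check where pagedb.seeds and createPageDB.sh live in your install):
------------------------------------------------
cd crawler
vi conf/pagedb.seeds    # put the new seed URLs here; the path may differ in your install
./createPageDB.sh       # rebuild the PageDB from the new seeds
./start.sh              # then start the crawler again (however you normally start it)
------------------------------------------------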
- Jorge

Amit Kumar Verma

Aug 28, 2009, 1:43:54 PM
to hounder
Hi Jorge,

Thanks for your support; the problem is resolved with the solution you gave.


Thanks,
-amit
