=ERROR REPORT==== 30-Dec-2013::13:04:20 ===
** Generic server <0.94.0> terminating
** Last message in was {tcp_closed,#Port<0.4182>}
** When Server state == {state,"www.zalora.sg",80,5000,#Ref<0.0.0.586>,false,
undefined,[],false,#Port<0.4182>,false,[],
{[],[]},
undefined,idle,undefined,<<>>,0,0,[],undefined,
undefined,true,undefined,false,undefined,
undefined,<<>>,0,false,147471,1,undefined}
** Reason for termination ==
** connection_closed
=ERROR REPORT==== 30-Dec-2013::13:04:21 ===
** Generic server <0.218.0> terminating
** Last message in was {'$gen_cast',
{crawl,
** When Server state == []
** Reason for termination ==
** {{'Elixir.HTTPotion.HTTPError','__exception__',<<"retry_later">>},
[{'Elixir.HTTPotion',request,5,
[{file,
"/Users/rambo/code/elixir-code/exrachnid/deps/httpotion/lib/httpotion.ex"},
{line,134}]},
{'Elixir.Exrachnid.Worker',handle_cast,2,
[{file,
"/Users/rambo/code/elixir-code/exrachnid/lib/exrachnid/worker.ex"},
{line,38}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
=ERROR REPORT==== 30-Dec-2013::13:04:21 ===
** Generic server <0.221.0> terminating
** Last message in was {'$gen_cast',
{crawl,
The offending piece of code apparently comes from the Worker:
def handle_cast({ :crawl, url }, _state) do
  # Response here is the HTTPotion.Response record (pre-1.0 record syntax).
  case HTTPotion.get(url, @user_agent, []) do
    Response[body: body, status_code: status, headers: _headers] when status in 200..299 ->
      Exrachnid.add_fetched_url(url)
      host = URI.parse(url).host
      # Add extracted links
      body
      |> extract_links(host)
      |> Exrachnid.add_new_urls
    _ ->
      # TODO: Do nothing yet.
      nil
  end
  { :stop, :normal, [] }
end
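For reference, here is a minimal sketch of one way to keep the worker alive when HTTPotion raises. fetch_and_extract is a hypothetical helper standing in for the case expression above, and whether silently dropping the URL is the right policy is part of my question:

def handle_cast({ :crawl, url }, _state) do
  try do
    fetch_and_extract(url)   # hypothetical helper wrapping the case expression above
  rescue
    error in [HTTPotion.HTTPError] ->
      # e.g. "retry_later"; log it and let the worker stop normally below
      IO.puts "Fetch failed for #{url}: #{error.message}"
  end
  { :stop, :normal, [] }
end

The Worker is started from the top-level Exrachnid module, which hands new URLs out to workers: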
defmodule Exrachnid do
  use Application.Behaviour

  # other code omitted.

  def crawl(url) do
    Exrachnid.Worker.crawl(url)
  end

  def add_new_urls(urls) do
    urls
    |> Exrachnid.DbServer.add_new_urls
    |> Enum.each(fn(url) -> crawl(url) end)
  end
end
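add_new_urls/1 relies on Exrachnid.DbServer.add_new_urls returning only the URLs it hasn't seen before. That module isn't shown here; as a rough, illustrative sketch (not the actual project code, written in the same pre-1.0 style with GenServer.Behaviour and raw :gen_server calls), it could look like this:

defmodule Exrachnid.DbServer do
  use GenServer.Behaviour

  def start_link do
    :gen_server.start_link({ :local, :db_server }, __MODULE__, [], [])
  end

  # Hands back only the URLs that were not already known, and remembers them.
  def add_new_urls(urls) do
    :gen_server.call(:db_server, { :add_new_urls, urls })
  end

  def init(_) do
    { :ok, [] }    # state: list of URLs seen so far
  end

  def handle_call({ :add_new_urls, urls }, _from, seen) do
    new_urls = Enum.reject(urls, fn(url) -> Enum.member?(seen, url) end)
    { :reply, new_urls, new_urls ++ seen }
  end
end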
Here's what this does:
a) the URLs are handed to the DbServer, which returns only the URLs it hasn't seen before.
b) each of those new URLs is then passed to the crawl function.
c) the crawl function starts a worker child and attaches it to the supervision tree (a rough sketch of how that could look follows below).
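For (c), here is a rough, illustrative sketch of how crawl/1 could start a child under the supervision tree, assuming a simple_one_for_one supervisor. The names (:worker_supervisor, Exrachnid.WorkerSupervisor) and the exact wiring are made up for illustration, not the actual project code:

defmodule Exrachnid.WorkerSupervisor do
  use Supervisor.Behaviour

  def start_link do
    :supervisor.start_link({ :local, :worker_supervisor }, __MODULE__, [])
  end

  def init(_) do
    # Each child is a one-shot crawler; :temporary means a finished or
    # crashed worker is not restarted automatically.
    children = [ worker(Exrachnid.Worker, [], restart: :temporary) ]
    supervise(children, strategy: :simple_one_for_one)
  end
end

# In Exrachnid.Worker:
def crawl(url) do
  # start a fresh worker and hand it the URL to crawl
  { :ok, pid } = :supervisor.start_child(:worker_supervisor, [])
  :gen_server.cast(pid, { :crawl, url })
end

def start_link do
  :gen_server.start_link(__MODULE__, [], [])
end

def init(state) do
  { :ok, state }   # state starts as [], matching the error report above
end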
Is there anything wrong with this approach?