Hello,
I am new to scrubyt.
I've done some work with hpricot and know xpath and css selectors well
enough to usually find what I'm looking for.
I'm trying to wrap my head around this scrubyt syntax:
google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?hl=en&q=ruby'
  link_title "//a[@class='l']", :write_text => true do
    link_url
  end
end
Comment 1:
- The first 2 lines make sense to me, I guess.
This makes sense, I think:
- link_title("//a[@class='l']")
I think it returns an enumerable type of object.
I looked at its methods using this syntax:
google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?hl=en&q=ruby'
  (link_title("//a[@class='l']").methods - 1.methods).each { |m| p m }
end
I see:
"parse_child_patterns"
"children"
"result_indexer"
"children="
"extractor"
"write_text"
"evaluate"
"extractor="
"indices_to_extract"
"constraints"
"options"
"generalize"
"constraints="
"indices_to_extract="
"parent"
"generate_relative_XPaths"
"options="
"parent="
"name"
"output_type"
"name="
"limit"
"except"
"method_missing"
"to_sexp"
"parent_of_leaf"
"resolve"
"referenced_extractor"
"referenced_pattern"
"referenced_extractor="
"referenced_pattern="
"check_if_shortcut_pattern"
"default"
"filters"
"modifier_calls"
"example_type"
"filters="
"filter_count"
"modifier_calls="
"check_if_detail_page"
"next_page_url"
None of these methods jump out at me as useful.
Is there anyone out there using any of these methods?
If yes, what for?
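For anyone unfamiliar with the `methods - 1.methods` trick used above, here is a minimal sketch of it on a plain Ruby object. FakePattern is a made-up stand-in of mine, not a scrubyt class; the point is only that subtracting the methods of an unrelated object strips out everything inherited from Object and Kernel.

```ruby
# Hypothetical stand-in for a scrubyt pattern object (not a real scrubyt class).
class FakePattern
  attr_accessor :name, :children

  def evaluate
  end
end

# Subtracting the methods of an unrelated object (the Integer 1) removes
# everything both objects inherit from Object and Kernel, leaving only
# the methods specific to FakePattern.
own_methods = (FakePattern.new.methods - 1.methods).map { |m| m.to_s }.sort

own_methods.each { |m| p m }
```

The `to_s` call is only there because Ruby 1.8 returns method names as strings while later versions return symbols.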
Next I looked at this syntax:
link_title "//a[@class='l']", :write_text => true do
  link_url
end
How did the author "know" that link_url would give him the value of
the href attribute?
I tried this:
audrey_title "//a[@class='l']", :write_text => true do
  audrey_url
end
The result surprised me.
I got this:
[{:audrey_url=>"http://www.ruby-lang.org/", :audrey_title=>"Ruby Programming Language"}, ...
So audrey_title is both a label and a method!
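My guess at how this could work, given that method_missing and check_if_shortcut_pattern both show up in the method list: unknown names are caught by method_missing and used as labels, and a name ending in _url is treated as a shortcut for the href attribute. The toy below is my own sketch of that idea in plain Ruby. TinyExtractor, its data format, and the _url suffix rule are all assumptions of mine, not scrubyt's actual code.

```ruby
# Hypothetical mini-DSL, NOT scrubyt's implementation.
class TinyExtractor
  attr_reader :rows

  # links: array of hashes such as { :text => "...", :href => "..." }
  def initialize(links)
    @links = links
    @rows = []
  end

  def self.define(links, &block)
    extractor = new(links)
    extractor.instance_eval(&block)
    extractor.rows
  end

  # Every name the class does not define ends up here.
  def method_missing(name, *args, &block)
    if block
      # A name given a block acts as a parent pattern: one row per link,
      # with the name itself used as the label for the link text.
      @links.each do |link|
        @current = link
        @row = { name => link[:text] }
        instance_eval(&block)
        @rows << @row
      end
    elsif name.to_s =~ /_url\z/
      # Assumed shortcut rule: "<anything>_url" means "take the href".
      @row[name] = @current[:href]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s =~ /_url\z/ ? true : super
  end
end

links = [{ :text => 'Ruby Programming Language',
           :href => 'http://www.ruby-lang.org/' }]

rows = TinyExtractor.define(links) do
  audrey_title "//a[@class='l']", :write_text => true do
    audrey_url
  end
end

p rows
```

In this toy the XPath argument is ignored and the prefix (audrey_, link_, anything) carries no meaning; only the _url suffix does.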
Next, I tried this:
peter_title "//a[@class='l']", :write_text => true do
  peter_class
end
And I got this:
Fri Sep 11 02:38 /pt/b/scrubyt/sefg maco$ ruby t.rb
/pt/r1/lib/ruby/gems/1.8/gems/scrubyt-0.4.06/lib/scrubyt/core/scraping/filters/tree_filter.rb:96:in `generate_XPath_for_example': undefined method `parent' for nil:NilClass (NoMethodError)
        from /pt/r1/lib/ruby/gems/1.8/gems/scrubyt-0.4.06/lib/scrubyt/core/scraping/filters/tree_filter.rb:89:in `loop'
        from /pt/r1/lib/ruby/gems/1.8/gems/scrubyt-0.4.06/lib/scrubyt/core/scraping/filters/tree_filter.rb:89:in `generate_XPath_for_example'
So, I must not understand what is going on here.
I assumed that class is an attribute just like href, but I guess it's
not, at least not as far as scrubyt is concerned.
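In plain XPath terms class and href are both ordinary attributes, so the difference must lie in how scrubyt interprets the name. A minimal check outside scrubyt, using REXML from the Ruby distribution; the sample markup is made up and only stands in for the Google result page:

```ruby
require 'rexml/document'

# Made-up sample markup standing in for a search result page.
html = <<-HTML
<div>
  <a class="l" href="http://www.ruby-lang.org/">Ruby Programming Language</a>
  <a class="other" href="http://example.com/">Something else</a>
</div>
HTML

doc = REXML::Document.new(html)

# Same XPath as in the extractor: every <a> whose class attribute is 'l'.
links = REXML::XPath.match(doc, "//a[@class='l']")

# href and class are read the same way, as plain attributes of the element.
results = links.map do |a|
  { :text  => a.text,
    :href  => a.attributes['href'],
    :class => a.attributes['class'] }
end

results.each { |r| p r }
```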
Okay, I'll end this with a simple question.
Assume I have this:
audrey_title "//a[@class='l']", :write_text => true do
  audrey_url
end
Q:
- What are some other functions that will work in place of
audrey_url here?
audrey_title "//a[@class='l']", :write_text => true do
  ## besides audrey_url,
  ## what can I put here?
end
Thanks,
--Audrey