Audrey is new to scrubyt

Audrey A Lee

Sep 11, 2009, 6:05:09 AM
to forumgrouper
Hello,

I am new to scrubyt.

I've done some work with hpricot and know xpath and css selectors well
enough to usually find what I'm looking for.

I'm trying to wrap my mind around this scrubyt syntax:


google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?hl=en&q=ruby'

  link_title "//a[@class='l']", :write_text => true do
    link_url
  end
end

Comment 1:
- The first 2 lines make sense to me, I guess.

This makes sense I think:
- link_title("//a[@class='l']")

I think it returns an enumerable object.

I looked at its methods using this syntax:

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?hl=en&q=ruby'
  (link_title("//a[@class='l']").methods - 1.methods).each { |m| p m }
end

I see:

"parse_child_patterns"
"children"
"result_indexer"
"children="
"extractor"
"write_text"
"evaluate"
"extractor="
"indices_to_extract"
"constraints"
"options"
"generalize"
"constraints="
"indices_to_extract="
"parent"
"generate_relative_XPaths"
"options="
"parent="
"name"
"output_type"
"name="
"limit"
"except"
"method_missing"
"to_sexp"
"parent_of_leaf"
"resolve"
"referenced_extractor"
"referenced_pattern"
"referenced_extractor="
"referenced_pattern="
"check_if_shortcut_pattern"
"default"
"filters"
"modifier_calls"
"example_type"
"filters="
"filter_count"
"modifier_calls="
"check_if_detail_page"
"next_page_url"
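
For anyone who has not seen the "methods - 1.methods" trick before, here is a minimal sketch of it in plain Ruby, outside scrubyt. DemoPattern is a made-up stand-in class (an assumption for illustration, not a scrubyt class). Subtracting the methods of an unrelated object such as the Integer 1 strips out everything inherited from Object and Kernel, so only the interesting methods remain. On Ruby 1.8 the names come back as strings (hence the quoted list above); on 1.9 and later they are symbols, so the sketch converts them.

```ruby
# Hypothetical stand-in for a pattern object; NOT part of scrubyt.
class DemoPattern
  attr_accessor :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  def evaluate
    "evaluating #{name}"
  end
end

pattern = DemoPattern.new("link_title")

# Remove every method that an unrelated object (the Integer 1) also has.
# What is left is specific to DemoPattern.
own_methods = (pattern.methods - 1.methods).map { |m| m.to_s }.sort
own_methods.each { |m| p m }

# A tighter filter: only methods defined directly in the class body,
# excluding anything mixed in or inherited.
defined_here = DemoPattern.instance_methods(false).map { |m| m.to_s }.sort
```

The second form, instance_methods(false), is usually the less noisy of the two when a class mixes in several modules.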


None of these methods jump out at me as useful.

Is there anyone out there using any of these methods?
If yes, what for?

Next I looked at this syntax:

  link_title "//a[@class='l']", :write_text => true do
    link_url
  end

How did the author know that link_url would give him the value of
the href attribute?

I tried this:

  audrey_title "//a[@class='l']", :write_text => true do
    audrey_url
  end

The result surprised me.

I got this:

[{:audrey_url=>"http://www.ruby-lang.org/", :audrey_title=>"Ruby Programming Language"}, ...

So audrey_title is both a label and a method!
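
One way a Ruby DSL can make an arbitrary name act as both a label and a method is the method_missing hook, which also shows up in the method list above. The sketch below is a guess at the general technique, not scrubyt's actual implementation. TinyExtractor and its hash layout are names I made up for illustration.

```ruby
# Hypothetical mini-DSL; TinyExtractor is NOT a scrubyt class.
class TinyExtractor
  attr_reader :patterns

  def self.define(&block)
    extractor = new
    # Run the block with the extractor as self, so bare names
    # inside the block are sent to the extractor as method calls.
    extractor.instance_eval(&block)
    extractor
  end

  def initialize
    @patterns = []
  end

  # Any method name the class does not define lands here.
  # The name itself is stored and becomes the label of the result.
  def method_missing(name, *args, &block)
    node = { :name => name, :args => args, :children => [] }
    @patterns << node
    if block
      # Temporarily collect nested calls as children of this node.
      outer = @patterns
      @patterns = node[:children]
      begin
        instance_eval(&block)
      ensure
        @patterns = outer
      end
    end
    node
  end
end

result = TinyExtractor.define do
  audrey_title "//a[@class='l']", :write_text => true do
    audrey_url
  end
end

p result.patterns
```

Under this assumption the name audrey_title carries no meaning of its own. It is simply recorded and reused as the key in the output.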

Next, I tried this:

  peter_title "//a[@class='l']", :write_text => true do
    peter_class
  end

And I got this:


Fri Sep 11 02:38 /pt/b/scrubyt/sefg maco$ ruby t.rb
/pt/r1/lib/ruby/gems/1.8/gems/scrubyt-0.4.06/lib/scrubyt/core/scraping/filters/tree_filter.rb:96:in `generate_XPath_for_example': undefined method `parent' for nil:NilClass (NoMethodError)
        from /pt/r1/lib/ruby/gems/1.8/gems/scrubyt-0.4.06/lib/scrubyt/core/scraping/filters/tree_filter.rb:89:in `loop'
        from /pt/r1/lib/ruby/gems/1.8/gems/scrubyt-0.4.06/lib/scrubyt/core/scraping/filters/tree_filter.rb:89:in `generate_XPath_for_example'

So, I must not understand what is going on here.

I assumed that class is an attribute just like href, but I guess it's
not.
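
At the plain XPath level, class and href really are the same kind of thing, so whatever differs must come from how scrubyt treats the child pattern name. Here is a small check outside scrubyt, assuming REXML (which ships with Ruby) in place of hpricot, and a hand-written HTML fragment that stands in for the Google result page.

```ruby
require 'rexml/document'

# Made-up fragment imitating one search result link; not real Google markup.
html = '<div><a class="l" href="http://www.ruby-lang.org/">' \
       'Ruby Programming Language</a></div>'

doc = REXML::Document.new(html)

# Select the same nodes as the scrubyt example, then read both
# attributes from each matched element in exactly the same way.
rows = REXML::XPath.match(doc, "//a[@class='l']").map do |a|
  {
    :title => a.text,
    :url   => a.attributes['href'],
    :klass => a.attributes['class']
  }
end

p rows
```

Both lookups go through the same attributes accessor, so the XPath side makes no distinction between the two.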

Okay, I'll end this with a simple question.

Assume I have this:

  audrey_title "//a[@class='l']", :write_text => true do
    audrey_url
  end

Q:
- What other functions will work in the spot where
audrey_url is:

  audrey_title "//a[@class='l']", :write_text => true do
    ## besides audrey_url,
    ## What can I put here?
  end


Thanks,
--Audrey

Audrey A Lee

Sep 11, 2009, 2:33:05 PM
to forumgrouper