How do I scrape dynamic content from Struts framework with Ruby?


fugee ohu

Dec 25, 2018, 6:16:13 PM
to Ruby on Rails: Talk
How do I scrape dynamic content from Struts framework with Ruby?

Hassan Schroeder

Dec 25, 2018, 6:40:23 PM
to rubyonrails-talk
On Tue, Dec 25, 2018 at 3:16 PM fugee ohu <fuge...@gmail.com> wrote:
>
> How do i scrape dynamic content from Struts framework with Ruby

Same as any web source: send a request, parse the response. Is
there some particular issue you're encountering?

--
Hassan Schroeder ------------------------ hassan.s...@gmail.com
twitter: @hassan
Consulting Availability : Silicon Valley or remote

fugee ohu

Dec 26, 2018, 1:31:29 AM
to Ruby on Rails: Talk


On Tuesday, December 25, 2018 at 6:40:23 PM UTC-5, Hassan Schroeder wrote:
> Same as any web source: send a request, parse the response. Is
> there some particular issue you're encountering?

So I just use browser.execute_script and pass in the full URL (https scheme and query string) just as it appears in the Name column in Chrome DevTools?

fugee ohu

Dec 26, 2018, 1:51:39 AM
to Ruby on Rails: Talk


On Tuesday, December 25, 2018 at 6:40:23 PM UTC-5, Hassan Schroeder wrote:

Selenium::WebDriver::Error::UnknownError: unknown error: Runtime.evaluate threw exception: SyntaxError: Unexpected end of input
  (Session info: chrome=71.0.3578.80)
  (Driver info: chromedriver=2.42.591071 (0b695ff80972cc1a65a5cd643186d2ae582cd4ac),platform=Linux 4.15.0-43-generic x86_64)

Maybe I need to run it without .do extension and also mayb

Rafael Belo

Dec 26, 2018, 8:19:47 AM
to Ruby on Rails: Talk
You are passing a URL instead of a script.
The "execute_script" function is for executing JavaScript code, not for loading a page.
You have to visit the page through the browser driver first.
If you're using Capybara, you should use the "visit" function, passing the URL that you want to go to.
If not, you have to look up which command your driver provides for visiting URLs.
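
To make the distinction concrete, here is a minimal sketch of the two calls side by side. It assumes a Selenium::WebDriver driver, where `get` takes the URL and `execute_script` takes JavaScript source. The helper name `page_title` is made up for illustration, and the driver is passed in as an argument so the snippet itself does not need a browser to load. Passing a URL to `execute_script` is likely what produced the `SyntaxError: Unexpected end of input` above: as JavaScript, `https:` reads as a label and everything after `//` as a comment.

```ruby
# `driver` is assumed to be a Selenium::WebDriver driver; any object that
# responds to #get and #execute_script works for this sketch.
def page_title(driver, url)
  driver.get(url)                                  # the URL goes here: navigate first
  driver.execute_script('return document.title;')  # JavaScript goes here: runs in the loaded page
end

# Usage with a real browser (not run here):
#   require 'selenium-webdriver'
#   driver = Selenium::WebDriver.for :chrome
#   puts page_title(driver, 'https://example.com/')
#   driver.quit
```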

fugee ohu

Dec 26, 2018, 9:21:32 AM
to Ruby on Rails: Talk
Must I use the URL?
 

Rafael Belo

Dec 26, 2018, 9:49:28 AM
to rubyonra...@googlegroups.com
Yes, if you are using capybara, you may use `visit 'http://myurl.com/goes-here'`

--
You received this message because you are subscribed to a topic in the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rubyonrails-talk/CpOPHz-zFsc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rubyonrails-ta...@googlegroups.com.
To post to this group, send email to rubyonra...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/acaef991-80d3-4c6c-bccc-fdc017ef4734%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Rafael Belo
Web Developer
Skype: rafaelrpbelo
Twitter: @rafaelrpbelo
Linkedin: rafaelrpbelo

fugee ohu

Dec 26, 2018, 10:08:11 AM
to Ruby on Rails: Talk
This is the first time I'm hearing Capybara recommended for web scraping. Is this the preferred method for what I'm trying to do?

Rafael Belo

Dec 26, 2018, 10:25:33 AM
to rubyonra...@googlegroups.com
Capybara provides a friendly interface over web drivers; you can integrate it with Selenium, WebKit, Poltergeist, and others.
Try to use it, I think you will like it.




fugee ohu

Dec 26, 2018, 11:20:42 AM
to Ruby on Rails: Talk
I need to work in the Rails console, and when I run `visit ...` Rails complains that no route matches.

Rafael Belo

Dec 26, 2018, 11:58:35 AM
to rubyonra...@googlegroups.com
You have to include Capybara::DSL.

```
include Capybara::DSL
```



fugee ohu

Dec 26, 2018, 2:11:25 PM
to Ruby on Rails: Talk
require 'capybara/rails'
include Capybara::DSL

No change. I still get the same routing error:
ActionController::RoutingError (No route matches [GET] "/getI2iRecommendingResults.do")

Walter Lee Davis

Dec 26, 2018, 2:15:37 PM
to rubyonra...@googlegroups.com
You'll probably get better answers if you show your work. Try writing a single script that demonstrates what you want to do, and post it as a Gist. Link it here, show what the output looks like, and see where that leads you. Oftentimes, working within the constraints of making the example work in a single script forces you to reconsider the problem, or shows you a simple error you made while configuring something more complex.

Walter

Rafael Belo

Dec 26, 2018, 2:15:57 PM
to rubyonra...@googlegroups.com
Which params are you using for `visit`?



fugee ohu

Dec 26, 2018, 2:38:12 PM
to Ruby on Rails: Talk

fugee ohu

Dec 26, 2018, 5:32:07 PM
to Ruby on Rails: Talk
I'd like to write a script after I get my commands down; for now I'm working in the Rails console. It may not be the intended use of Capybara to visit URLs not defined in routes.rb. Should I be using something else to make the request?

Walter Lee Davis

Dec 26, 2018, 6:14:19 PM
to rubyonra...@googlegroups.com
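Try using an HTTP client library, like Faraday.

# In your Gemfile:
gem 'faraday'

# In your script or console:
require 'faraday'
response = Faraday.get('https://entire.url.of/your/api/data.json')

# Hand the body to whatever parser fits the response format.
whatever_parsing_tool_you_want.parse(response.body)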

Walter


fugee ohu

Dec 27, 2018, 9:25:58 AM
to Ruby on Rails: Talk
Thanks, that works. The returned data is delimited as \"name\":value,...
 #(Text "/**/jQuery18307882644047005491_1545806199753({\"success\":true,\"code\":0,\"results\":[{\"productId\":32617749905,\"sellerId\":228628782,\"oriMinPrice\":\"US $363.00\",\"oriMaxPrice\"...
Am I gonna have to regex my way through it?
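
The response quoted above looks like JSONP: a JSON document wrapped in a JavaScript callback call (`jQuery...(...)`). If that reading is right, one small regex to peel off the wrapper is enough, and the rest can go through Ruby's standard `JSON` library. A minimal sketch, assuming the whole body has that shape; the helper name `parse_jsonp` and the shortened sample payload are made up for illustration:

```ruby
require 'json'

# Strip a JSONP wrapper such as `/**/callbackName({...})` and parse the JSON inside.
def parse_jsonp(body)
  # Capture everything between the first "(" and the last ")".
  match = body.match(/\A[^(]*\((.*)\)\s*;?\s*\z/m)
  raise ArgumentError, 'body does not look like JSONP' unless match

  JSON.parse(match[1])
end

# Hypothetical, shortened payload modeled on the response quoted above.
sample = '/**/jQuery18307882644047005491_1545806199753({"success":true,"code":0,' \
         '"results":[{"productId":32617749905,"sellerId":228628782,"oriMinPrice":"US $363.00"}]})'

data = parse_jsonp(sample)
puts data['results'].first['oriMinPrice']
```

With a Faraday response this would be `parse_jsonp(response.body)`, and the result is an ordinary Hash with nested Arrays, so no HTML parser is involved.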

fugee ohu

Dec 27, 2018, 9:29:40 AM
to Ruby on Rails: Talk
I used Nokogiri::HTML.parse(response.body). It isn't converting the JavaScript response to something more friendly.

Rafael Belo

Dec 27, 2018, 9:38:18 AM
to rubyonra...@googlegroups.com
You won't get a friendly parsed JavaScript response. The JavaScript is not the information itself; it's a set of commands that manipulate the browser's DOM. That's why we're using a driver to get this information.

If you make a GET request and parse the result, you'll get the raw HTML with the JavaScript code in it, but if you use a driver, it will get the response and execute the loaded JavaScript. This is the key.



fugee ohu

Dec 27, 2018, 10:57:57 AM
to Ruby on Rails: Talk
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :chrome
driver.get("https://gpsfront.website.com/getI2iRecommendingResults.do?callback=jQuery18307882644045605491_1545806199753&currentItemList=32819755026&categoryId=200121521&shopId=2339135&companyId=238423932&recommendType=&scenario=pcDetailLeftTopSell&limit=6&offset=0&_=1545800704149")
The result:
=> nil
The path I posted here is fictitious, but the real path also returned => nil
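
On the `=> nil` result: as far as I know, `get` on a Selenium driver only navigates and does not return the page, so `nil` is expected there. The loaded document is read afterwards, for example with `page_source`. A minimal sketch, assuming a Selenium::WebDriver driver; the helper name `rendered_source` is made up for illustration, and the driver is passed in so the snippet does not need a browser to load:

```ruby
# Navigate to a URL, then return the page as the browser currently holds it.
# `driver` is assumed to be a Selenium::WebDriver driver; any object that
# responds to #get and #page_source works for this sketch.
def rendered_source(driver, url)
  driver.get(url)     # navigation only; the return value is not the page
  driver.page_source  # String with the source of the current document
end

# Usage with a real browser (not run here):
#   require 'selenium-webdriver'
#   driver = Selenium::WebDriver.for :chrome
#   puts rendered_source(driver, 'https://example.com/')
#   driver.quit
```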

fugee ohu

Dec 27, 2018, 11:36:43 AM
to Ruby on Rails: Talk
Can you please read my continuing discussion with Rafael?