text visible in browser but not in source

fugee ohu

Nov 7, 2018, 10:35:01 AM
to Ruby on Rails: Talk
I'm not very good with the consoles in Chrome and Firefox, but I couldn't find the text I was looking for in the source even though it's displayed as text: the cursor changes to a vertical line on mouse-over. I found the HTML below in the source. How does this HTML create the text that displays?

   <div class="ui-box product-description-main" id="j-product-description">
        <div class="ui-box-title">Product Description</div>
        <div class="ui-box-body">

            <div class="description-content" data-role="description" data-spm="1000023">
            <div class="loading32"></div>
            </div>

        </div>
    </div>

Colin Law

Nov 7, 2018, 11:01:32 AM
to rubyonra...@googlegroups.com
I should think that javascript is involved. I am sure you asked a
similar question before when you were trying to scrape a website and
couldn't find the text in the html.

Colin
> --
> You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-ta...@googlegroups.com.
> To post to this group, send email to rubyonra...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/8e0eb26a-517a-4216-bb9c-8bd05e4412a5%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

fugee ohu

Nov 7, 2018, 11:17:02 AM
to Ruby on Rails: Talk
Yes, within that context (JavaScript), how does it happen that the text I'm viewing in the browser isn't visible in the source?

Colin Law

Nov 7, 2018, 11:30:58 AM
to rubyonra...@googlegroups.com
On Wed, 7 Nov 2018 at 16:17, fugee ohu <fuge...@gmail.com> wrote:
>
> Yes, within that context, javascript, how does it happen that the text I'm viewing in the browser isn't visible in source?

It isn't in the source, the DOM is updated using javascript. You
should see it in the DOM inspector but not in the source.

Colin

Jake Niemiec

Nov 7, 2018, 12:28:05 PM
to rubyonra...@googlegroups.com
The ui-box class would indicate that it is a React component: https://github.com/segmentio/ui-box

React components are run client-side, meaning the text you are looking for is inserted into the document after the page runs its <script> tags. I would take a look at the Sources tab in Chrome; you can find all the loaded scripts there.

fugee ohu

Nov 8, 2018, 1:09:28 AM
to Ruby on Rails: Talk
Thanks. Can you point me to a brief tutorial that shows how to get React to render the content?

Colin Law

Nov 8, 2018, 3:41:14 AM
to rubyonra...@googlegroups.com
On Thu, 8 Nov 2018 at 06:09, fugee ohu <fuge...@gmail.com> wrote:
> ...
> Thanks Can you point me to a brief tutorial to show me how to get react to render the content

Open it in a browser, that's what browsers do.

Note there may well be successive requests back to the server to get
the data you are looking for. Look at the Network tab in the browser
developer tools and you may see the call that fetches it.

Colin

fugee ohu

Nov 8, 2018, 5:53:08 PM
to Ruby on Rails: Talk


On Wednesday, November 7, 2018 at 12:28:05 PM UTC-5, Jake Niemiec wrote:
I was able to find the text that wasn't shown in the source by opening the console and expanding the ui-box div.

fugee ohu

Nov 9, 2018, 6:22:45 PM
to Ruby on Rails: Talk


On Wednesday, November 7, 2018 at 12:28:05 PM UTC-5, Jake Niemiec wrote:
So far I'm trying to get as far as the table, the last element shown below. doc.at_css("div#j-product-description div.ui-box-body div.description-content") returns the div class="description-content" element, but doc.at_css("div#j-product-description div.ui-box-body div.description-content div.origin-part") returns nil. There's a lot inside kse:widget that I'm not including here.

<div class="ui-box product-description-main" id="j-product-description" data-widget-cid="widget-27">
        <div class="ui-box-title">Product Description</div>
        <div class="ui-box-body">
<div class="description-content" data-role="description" data-spm="1000023"><div class="origin-part"><p> <br> <br> <br> &nbsp; </p> 
<kse:widget data-widget-type="relatedProduct" id="24226336" title="TOP" type="relation">...</kse:widget> 
<table border="2">

Walter Lee Davis

Nov 10, 2018, 10:35:03 AM
to rubyonra...@googlegroups.com
It seems to me that you are going to have to identify the data source that the in-page JavaScript is using to generate the dynamic table data, and query that rather than trying to work everything out from the HTML (which is just a template for the in-page script to fill). There's probably a JSON URL somewhere that is being loaded into the page, and the script is building from that. This entire approach is pretty fraught with peril, though, because (like any scraping project, only more so) any change to the scheme that the site's developer chooses to implement will break your scraper immediately.

Following this path is going to force you to learn about how the site is working on a code level -- and to figure out how they go from data to presentation.

Another approach might be to use a headless browser on the server to construct a "real" DOM of the page, and query that. To be clear -- I do not recommend you follow this path -- I am noting it here to illustrate how ridiculous this effort will be.

One way to visualize this difference is to use the Web Inspector in Safari or Chrome to look at the differences between the raw HTML (Safari labels this tab "Resources") and the DOM (Safari calls this "Elements"). There is likely very little in common outside of the overall outline, if the page is changing as dramatically as you describe. If you hunt through the Resources tab (in Safari) you may find a link to a JSON file that is being required into the page. Loading that URL, rather than the HTML, may give you a much cleaner set of data (which you can parse directly using Ruby) rather than trying to execute JS on your server in order to construct an HTML DOM that you can parse with Nokogiri.

Walter

fugee ohu

Nov 10, 2018, 12:22:52 PM
to Ruby on Rails: Talk
It wasn't shown in the source, but when I expanded the element recursively in Chrome developer tools I saw the text I was looking for. So, what's that gonna be worth?

Colin Law

Nov 10, 2018, 12:26:40 PM
to rubyonra...@googlegroups.com
On Sat, 10 Nov 2018 at 17:23, fugee ohu <fuge...@gmail.com> wrote:
> ...
> It wasn't shown in source but when I expanded the element recursively in chrome developer tools I saw the text I was looking for So, what's that gonna be worth?

As has been said a number of times that will be because it was filled
in by javascript, probably as a result of further calls to the server.

Colin

fugee ohu

Nov 10, 2018, 4:28:22 PM
to Ruby on Rails: Talk
Using a headless browser would be cheating?

Colin Law

Nov 10, 2018, 4:57:54 PM
to rubyonra...@googlegroups.com
Have you done what I suggested and looked in the browser developer
tools at the Network tab? Then you will see if it fetches any further
data after the initial page fetch. Very often you will find it
fetching some json which will very likely contain the data you are
looking for.

Colin


fugee ohu

Dec 25, 2018, 3:19:25 PM
to Ruby on Rails: Talk
Yes, there are scripts running, and when I click Response I see the data I'm looking for. The script names are HTTPS URLs ending in .do? with a lot of query string data, so what should I do?

fugee ohu

Dec 25, 2018, 5:37:43 PM
to Ruby on Rails: Talk
 The ".do" extension may also be a URL mapping scheme for a web application and not a file extension. For example, the Struts framework often uses the ".do" string for mapping Java servlet actions in the web.xml configuration file

Colin Law

Dec 26, 2018, 4:41:53 AM
to Ruby on Rails: Talk
On Tue, 25 Dec 2018 at 20:19, fugee ohu <fuge...@gmail.com> wrote:
>
> yes, there's scripts running and when i click response i see the data i'm looking for The script names are https urls ending in .do? with a lot of query string data, so what should I do?

Don't worry about scripts for the moment, look for urls that provide
data, probably xml or json. Surely you have used this yourself in
your rails apps using AJAX.

Colin

fugee ohu

Dec 26, 2018, 5:12:13 PM
to Ruby on Rails: Talk
I'm now trying to use Capybara::DSL, but when I run visit '<url>' from within the Rails console, Rails complains that there is no such route.

Walter Lee Davis

Dec 27, 2018, 1:38:34 PM
to rubyonra...@googlegroups.com
You are missing the entire point of what Colin is telling you. From what you describe, you are trying to do the following:

1. Download the JS from a data source.
2. Reconstruct the DOM using a JS driver like Chrome or PhantomJS.
3. Parse the DOM with Nokogiri or similar
4. Use the data you gather

Colin is recommending that you download the data itself and parse it directly. This will not require a driver of any kind; you are simply reading the data as JSON, an interchange format that Ruby can read directly using the standard library.

Walter

fugee ohu

Dec 27, 2018, 3:24:04 PM
to Ruby on Rails: Talk
So then how 

Hassan Schroeder

Dec 27, 2018, 4:04:55 PM
to rubyonrails-talk
On Thu, Dec 27, 2018 at 12:24 PM fugee ohu <fuge...@gmail.com> wrote:

> So then how

1) choose HTTP client
2) send request
3) parse response

--
Hassan Schroeder ------------------------ hassan.s...@gmail.com
twitter: @hassan
Consulting Availability : Silicon Valley or remote
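
A minimal sketch of those three steps, assuming Ruby's standard-library Net::HTTP as the client and an endpoint that returns plain JSON. The method names and the URL in the usage comment are hypothetical, not taken from this thread.

```ruby
require "net/http"
require "json"
require "uri"

# Steps 1 and 2: use Net::HTTP as the client and send a GET request.
# Returns the response body as a String.
def fetch_body(url)
  Net::HTTP.get(URI(url))
end

# Step 3: turn the body into Ruby hashes and arrays.
# Raises JSON::ParserError if the body is not valid JSON.
def parse_body(body)
  JSON.parse(body)
end

# Usage (hypothetical URL, not from this thread):
#   data = parse_body(fetch_body("https://example.com/products.json"))
#   data["results"].each { |item| puts item["productId"] }
```

Keeping the fetch and the parse in separate methods makes it easy to see which of the two steps fails.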

fugee ohu

Dec 27, 2018, 4:17:35 PM
to Ruby on Rails: Talk
How do I download the JS and parse it directly for the data I require?

fugee ohu

Dec 27, 2018, 4:39:09 PM
to Ruby on Rails: Talk
I tried the code below, but when I get to JSON.parse I get JSON::ParserError: 765: unexpected token at 'jQuery18307882644047005491_1545855559753({"success":true,"code":0,"results":[{"productId":32815555905, ...

require 'net/http'
require 'json'
url = "<fqdn url path with query string data>"
uri = URI(url)
response = Net::HTTP.get(uri)
data = JSON.parse(response)

fugee ohu

Dec 27, 2018, 5:00:07 PM
to Ruby on Rails: Talk
Net::HTTP.get(<url>) returns a string 
data = JSON.parse(response)
JSON::ParserError: 765: unexpected token at 'jQuery18300000644047005491_1545822229753({"success":true,"code":0,"results":[{"productId"
 

Colin Law

Dec 27, 2018, 5:17:05 PM
to Ruby on Rails: Talk
Forget about how to parse it for the moment. Is the data you want in
the response?

Colin

fugee ohu

Dec 27, 2018, 5:28:23 PM
to Ruby on Rails: Talk


On Thursday, December 27, 2018 at 5:17:05 PM UTC-5, Colin Law wrote:
Forget about how to parse it for the moment.  Is the data you want in
the response?

Colin

yes

Hassan Schroeder

Dec 27, 2018, 5:46:59 PM
to rubyonrails-talk
On Thu, Dec 27, 2018 at 2:00 PM fugee ohu <fuge...@gmail.com> wrote:

> Net::HTTP.get(<url>) returns a string
> data = JSON.parse(response)
> JSON::ParserError: 765: unexpected token at 'jQuery18300000644047005491_1545822229753({"success":true,"code":0,"results":[{"productId"

Did you ask for JSON data via the Accept request header ?
What is the Content-Type response header you get back?
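
For reference, a hedged sketch of setting the Accept request header and reading the Content-Type response header with Net::HTTP. The helper names and the URL in the usage comment are made up for illustration, and a server is free to ignore the Accept header.

```ruby
require "net/http"
require "uri"

# Build a GET request that asks for JSON via the Accept header.
def build_json_request(uri)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/json"
  request
end

# Send the request and return the response object.
def send_request(uri, request)
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end

# Usage (hypothetical URL):
#   uri = URI("https://example.com/data.do?id=1")
#   response = send_request(uri, build_json_request(uri))
#   puts response["Content-Type"]  # what the server actually sent back
```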

fugee ohu

Dec 27, 2018, 7:10:12 PM
to Ruby on Rails: Talk
So I need to pass an Accept request header for JSON as the first parameter to Net::HTTP?

Hassan Schroeder

Dec 27, 2018, 8:07:19 PM
to rubyonrails-talk
On Thu, Dec 27, 2018 at 4:10 PM fugee ohu <fuge...@gmail.com> wrote:

> So I need to pass in accept request header json as the first parameter to NET::HTTP ?

Is that what the docs say?

fugee ohu

Dec 28, 2018, 2:26:53 AM
to Ruby on Rails: Talk
'application/json'

fugee ohu

Dec 28, 2018, 2:29:35 AM
to Ruby on Rails: Talk
What docs? You asked if I passed in an `accept request header`, so I asked if I need to. You're obscuring this thread.

fugee ohu

Dec 28, 2018, 2:31:56 AM
to Ruby on Rails: Talk


On Thursday, December 27, 2018 at 1:38:34 PM UTC-5, Walter Lee Davis wrote:
The JSON returned is a string. How do I parse it into an object?

fugee ohu

Dec 28, 2018, 7:18:55 AM
to Ruby on Rails: Talk
What docs are you talking about? 

fugee ohu

Dec 28, 2018, 7:33:13 AM
to Ruby on Rails: Talk


On Wednesday, November 7, 2018 at 12:28:05 PM UTC-5, Jake Niemiec wrote:
The ui-box class would indicate that it is a react component: https://github.com/segmentio/ui-box

React components are run client-side, meaning the text you are looking for is inserted into the document after the page runs <script> tags. I would take a look at the Sources tab in chrome, you can find all the loaded scripts there.

How do I scrape it?

fugee ohu

Dec 28, 2018, 7:58:29 AM
to Ruby on Rails: Talk
Yes, the Sources tab showed the script and its output, a string, but how do I parse it into an object?

Colin Law

Dec 28, 2018, 8:38:14 AM
to Ruby on Rails: Talk
On Fri, 28 Dec 2018 at 12:58, fugee ohu <fuge...@gmail.com> wrote:
> ...
> Yes the sources tab showed the script and it's output, a string, but how do I parse it into an object?

If it is a JSON string then JSON.parse should do it. If it is not
then perhaps you need to strip something off to get valid JSON.
Not sure what any of this has got to do with Rails.

Colin

fugee ohu

Dec 28, 2018, 9:19:26 AM
to Ruby on Rails: Talk
It does of course have to do with Rails; that's why I asked here. The reason is that I'm working within the Rails console of my app, which I'm trying to scrape data for. JSON.parse reports an invalid token.

Colin Law

Dec 28, 2018, 9:38:38 AM
to Ruby on Rails: Talk
Does the string look like valid JSON to you?

Colin


fugee ohu

Dec 28, 2018, 10:53:04 AM
to Ruby on Rails: Talk
I don't know what a valid JSON response is supposed to look like. Also, I don't know if it makes a difference that they're using the Struts framework.

Colin Law

Dec 28, 2018, 11:04:37 AM
to Ruby on Rails: Talk
On Fri, 28 Dec 2018 at 15:53, fugee ohu <fuge...@gmail.com> wrote:
> ...
> I don't know what a valid json response is supposed to look like Also I don't know if it makes a difference they're using Struts framework

Once again I have googled for a link for you
https://www.w3schools.com/whatis/whatis_json.asp

Colin

Hassan Schroeder

Dec 28, 2018, 3:37:12 PM
to rubyonrails-talk
On Thu, Dec 27, 2018 at 11:29 PM fugee ohu <fuge...@gmail.com> wrote:

>> On Thu, Dec 27, 2018 at 4:10 PM fugee ohu <fuge...@gmail.com> wrote:

> What docs? You asked if I passed in an `accept request header` so I asked if I need to You're obscuring this thread

>> > So I need to pass in accept request header json as the first parameter to NET::HTTP ?

You asked the question above ^^ and without looking at the docs
I wouldn't know the order of arguments ("parameters") to that library.

And I didn't say you *needed* to do anything, I just asked if you were
specifying the content-type you wanted in the request. There's still no
guarantee that's what you'll get, but there's a better chance 😀

fugee ohu

Dec 28, 2018, 3:48:18 PM
to Ruby on Rails: Talk
Thanks. The response I'm trying to parse is JSONP, which is JavaScript (not JSON) and has a prefix like /**/jQuery18307882644047003491_1545786199753, so I have to strip it out of the response, or maybe make my call to the script without the callback?
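
One way to handle the padding, sketched under the assumption that the body has the shape shown in this thread: an optional /**/ comment, a callback name, and a single pair of parentheses around the JSON. The method name and the sample string are illustrative, not the real response.

```ruby
require "json"

# Strip a JSONP wrapper such as /**/callbackName( ... ); and parse the JSON inside.
# Raises ArgumentError if the text does not look like JSONP.
def parse_jsonp(text)
  match = text.match(%r{\A\s*(?:/\*.*?\*/)?\s*[\w$.]+\s*\((.*)\)\s*;?\s*\z}m)
  raise ArgumentError, "not a JSONP response" unless match
  JSON.parse(match[1])
end

# Shortened, made-up sample in the same shape as the response quoted above.
sample = '/**/jQuery123_456({"success":true,"code":0,"results":[{"productId":32817749905}]});'
data = parse_jsonp(sample)
puts data["results"].first["productId"]
```

The pattern is anchored at both ends, so anything that is not wrapped in a callback is rejected rather than silently mangled.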

fugee ohu

Dec 28, 2018, 3:51:33 PM
to Ruby on Rails: Talk

fugee ohu

Dec 28, 2018, 4:40:36 PM
to Ruby on Rails: Talk


I think I just need to strip the leading /**/jQuery18307882633047005491_1545805999753 from the result /**/jQuery18307882633047005491_1545805999753({"success":true,"code":0,"results":[{"productId":32817749905,
Can anyone help with the regex?

Colin Law

Dec 28, 2018, 5:00:46 PM
to Ruby on Rails: Talk
On Fri, 28 Dec 2018 at 21:40, fugee ohu <fuge...@gmail.com> wrote:
> ...
> I think I just need to strip the leading /**/jQuery18307882633047005491_1545805999753 from the result /**/jQuery18307882633047005491_1545805999753({"success":true,"code":0,"results":[{"productId":32817749905,
> Anyone can help with the regex?

If you can't work out the regex just find the position of the first {
and strip to there.

Colin
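
The no-regex approach can be sketched as follows, assuming the response contains exactly one JSON object inside the callback. It also trims after the last }, which goes slightly beyond the suggestion above; the method name and sample string are hypothetical.

```ruby
require "json"

# Keep everything from the first "{" through the last "}".
# Returns nil if the text contains no such pair.
def extract_json_object(text)
  first = text.index("{")
  last = text.rindex("}")
  return nil if first.nil? || last.nil? || last < first
  text[first..last]
end

sample = '/**/jQuery123_456({"success":true,"code":0});'
json_text = extract_json_object(sample)
data = JSON.parse(json_text)
puts data["code"]
```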

fugee ohu

Dec 28, 2018, 10:42:49 PM
to Ruby on Rails: Talk
I gsub'd for the first  ({

 json_data = res.body.gsub(/^.+\(\{/, "\(\{")
json = JSON.parse(json_data)
JSON::ParserError: 765: unexpected token at '({"success":true,"code":0,"results":[{"productId":32817

Colin Law

Dec 29, 2018, 2:12:33 AM
to Ruby on Rails: Talk
Look at the JSON spec again. What should the first char be?

Colin
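
A quick way to check a candidate string before going further is to let the parser decide. This sketch uses only the standard library; the helper name is made up for illustration.

```ruby
require "json"

# True if the whole string parses as JSON, false otherwise.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts valid_json?('({"success":true})') # the leading "(" is JavaScript, not JSON
puts valid_json?('{"success":true}')
```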


fugee ohu

Dec 29, 2018, 4:37:50 AM
to Ruby on Rails: Talk
The first character should be a curly bracket 
json_data = res.body.gsub(/^.+\{/, "\{").chop
json = JSON.parse(json_data)

JSON::ParserError: 765: unexpected token at '{\"pvid\":\"805ba4ee-6446-4148-ab0a-ff5e51c0ab24\",\"

Colin Law

Dec 29, 2018, 5:01:03 AM
to Ruby on Rails: Talk
On Sat, 29 Dec 2018 at 09:37, fugee ohu <fuge...@gmail.com> wrote:
> ..
>>> JSON::ParserError: 765: unexpected token at '({"success":true,"code":0,"results":[{"productId":32817
>>
> ...
> JSON::ParserError: 765: unexpected token at '{\"pvid\":\"805ba4ee-6446-4148-ab0a-ff5e51c0ab24\",\"

You have not only taken out the bracket, but you have also changed all
the " to \". Did you not notice that yourself?

Colin

fugee ohu

Dec 29, 2018, 5:20:21 AM
to Ruby on Rails: Talk
No I didn't notice, thanks

fugee ohu

Dec 29, 2018, 5:36:23 AM
to Ruby on Rails: Talk

json_data = res.body.gsub(/^.+{/, "{").chomp(");")
json = JSON.parse(json_data)
JSON::ParserError: 765: unexpected token at '{\"pvid\":\"ff43f2dc-3f86-42a3-bfa8-49c117e9da6a\",\"scm-cnt\":\"1007.13482.95643.0\", ...
 

Colin Law

Dec 29, 2018, 5:48:31 AM
to Ruby on Rails: Talk
On Sat, 29 Dec 2018 at 10:36, fugee ohu <fuge...@gmail.com> wrote:
>
> json_data = res.body.gsub(/^.+{/, "{").chomp(");")
> json = JSON.parse(json_data)
> JSON::ParserError: 765: unexpected token at '{\"pvid\":\"ff43f2dc-3f86-42a3-bfa8-49c117e9da6a\",\"scm-cnt\":\"1007.13482.95643.0\",

Why have you posted this? You still have \" instead of "
As I said, look again at the spec of what a JSON string should look
like. Until you have that there is no point feeding into parse()

Colin

fugee ohu

Dec 29, 2018, 5:55:20 AM
to Ruby on Rails: Talk
Well, it seems my response shouldn't contain the \" characters if it's JSON, so is it not JSON but JS? I already read that the string response is JSONP, which actually contains JS, not JSON.

fugee ohu

Dec 29, 2018, 6:16:17 AM
to Ruby on Rails: Talk
It's not clear to me what the response is. Can the problem be that my request is the full URL with the callback and other query string data? Should I just be calling the script without invoking the Struts framework callback, but still provide the item number as query string data?

Colin Law

Dec 29, 2018, 6:48:22 AM
to Ruby on Rails: Talk
On Sat, 29 Dec 2018 at 11:16, fugee ohu <fuge...@gmail.com> wrote:
> ...
> It's not clear to me what the response is Can the problem be that my request is the full url with callback and other query string data Should I just be calling the script without invoking the Struts framework callback but still provide the item number as query string data

Earlier on you had something that looked like JSON to me, except for
the stuff on the front, and you posted

> I think I just need to strip the leading /**/jQuery18307882633047005491_1545805999753 from the result
> /**/jQuery18307882633047005491_1545805999753({"success":true,"code":0,"results":[{"productId":32817749905,
>

Since then you have done something that is producing \" instead of ".
I suggest you go back to whatever you had then.

As an alternative you could try Hassan's advice. Whatever you do, make
sure you keep sufficient notes that you can always get back to
where you were when you find yourself going backwards.

Colin

fugee ohu

Dec 29, 2018, 7:13:46 AM
to Ruby on Rails: Talk
That was from the beginning of the response. A little further on in the response some quotes are escaped; I don't know why.
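
Escaped quotes are legal inside a JSON string value, for example when one JSON document is stored as a string inside another, so they are not by themselves a sign of bad data. The sketch below uses a made-up payload (the key names and callback name are assumptions, not the real response) to show that, and to show one possible cause of the error above: `.+` in `/^.+\{/` is greedy, so it cuts at the last { on the line rather than the first.

```ruby
require "json"

# Hypothetical payload: "trace" holds a JSON document stored as a string,
# so its quotes appear escaped (\") in the outer JSON text.
inner = JSON.generate({ "pvid" => "abc-123" })
payload = JSON.generate({ "success" => true, "trace" => inner })
jsonp = "/**/cb_1(#{payload});"

# Greedy match: everything up to the LAST "{" is replaced, which lands
# inside the escaped inner document.
greedy = jsonp.gsub(/^.+\{/, "{")
puts greedy

# The untouched payload parses fine, escaped quotes and all.
data = JSON.parse(payload)
puts JSON.parse(data["trace"])["pvid"]
```

If this is what is happening, the fix is to cut at the first { instead of the last one.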

Walter Lee Davis

Dec 29, 2018, 10:02:07 AM
to rubyonra...@googlegroups.com
I would recommend that you put the text you get back from the URL in a file (just now, for testing purposes), so you can experiment with various transforms on it. When you have the raw data in a text file, you can take advantage of your editor's color-coding to see what is what in the raw string.

You're going to have to read through the data and figure out how it works, so you can understand what parts are data and what parts are the callback functions. Read about JSONP and what it is and how and why you would use it. Until you truly understand what the developers have been doing in order to construct their page, you won't know how to pare back the page-building parts and get just the raw data out of the payload.

Once you have done this, though, the very last step (parsing the data with the standard library JSON function) will be a trivial maraschino cherry on top of the sundae.

Until you actually understand the data, you cannot scrape it or parse it. So eat your vegetables first!

Walter
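
Saving the response to a file for experimenting could look like the sketch below. The file name, helper names, and stand-in body are assumptions; in practice the body would come from the HTTP request.

```ruby
require "tmpdir"

# Save a response body to disk so it can be inspected and re-read without re-fetching.
def save_response(body, path)
  File.write(path, body)
  path
end

# Load a previously saved body.
def load_response(path)
  File.read(path)
end

# Usage with a stand-in body instead of a real response.
path = File.join(Dir.tmpdir, "response_sample.txt")
save_response('/**/cb({"success":true});', path)
body = load_response(path)
puts body[0, 40] # peek at the start to see what wraps the data
```

Working from the saved file also avoids sending a new request to the site for every experiment.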

fugee ohu

Dec 30, 2018, 6:04:36 AM
to Ruby on Rails: Talk
All my requests are now getting 'Errno::ECONNRESET: Connection reset by peer' and I don't know why.

fugee ohu

Dec 30, 2018, 6:37:26 AM
to Ruby on Rails: Talk
Oh, OK, sorry. You mean the Net::HTTP docs? I don't know; I haven't thought to read them. I found this page http://po-ru.com/diary/turning-jsonp-callbacks-into-a-ruby-api-with-johnson/ but don't understand it.

fugee ohu

Dec 30, 2018, 7:06:39 AM
to Ruby on Rails: Talk


On Saturday, December 29, 2018 at 10:02:07 AM UTC-5, Walter Lee Davis wrote:
It's not JSON, it's JavaScript, so I don't have to run JSON.parse.

Colin Law

Dec 30, 2018, 8:36:40 AM
to Ruby on Rails: Talk
On Sun, 30 Dec 2018 at 12:06, fugee ohu <fuge...@gmail.com> wrote:
> ...
> It's not json it's javascript so I don't have to run JSON.parse

{"success":true,"code":0,"results":[{"productId":32815555905, ...
Looks like JSON to me (embedded in js admittedly). You said that the
data you want is in that string. If that is correct then all you have
to do is to extract it and parse it as JSON.

Colin

fugee ohu

Dec 30, 2018, 10:45:19 AM
to Ruby on Rails: Talk

Colin Law

Dec 30, 2018, 10:56:30 AM
to Ruby on Rails: Talk
What is it about the string
{"success":true,"code":0,"results":[{"productId":32815555905, ...
that makes it not JSON?

Colin

Hassan Schroeder

Dec 30, 2018, 3:18:20 PM
to rubyonrails-talk
On Sun, Dec 30, 2018 at 3:37 AM fugee ohu <fuge...@gmail.com> wrote:
>>
> You mean the Net::HTTP docs? I don't know I haven't thought to read them

🙄

> I found this page http://po-ru.com/diary/turning-jsonp-callbacks-into-a-ruby-api-with-johnson/ but don't understand it

That's an interesting approach though `johnson` is abandoned; you
could try using `therubyracer` instead.

fugee ohu

Dec 31, 2018, 12:23:28 AM
to Ruby on Rails: Talk
I don't know; I'm repeating what I read somewhere else about JSONP, actually.

fugee ohu

Dec 31, 2018, 12:26:38 AM
to Ruby on Rails: Talk
The escaped quotes are further down the string. You see, I left it ending in ...; there's more to it that I didn't include, because it's a long string. When the parser reaches the first backslash it returns an error.

fugee ohu

Dec 31, 2018, 7:12:47 AM
to Ruby on Rails: Talk
Now I only get connection reset by peer when I try to make the request 

Colin Law

Dec 31, 2018, 7:24:10 AM
to Ruby on Rails: Talk
That probably means you have dropped off the end of the json onto
something else, examine that area to see. Is all the data you need in
the bit before that?

Colin


Colin Law

Dec 31, 2018, 7:25:13 AM
to Ruby on Rails: Talk
Can't help you there, presumably either the website has changed or you
have changed the way you are fetching it. Try the url in a browser.

Colin

fugee ohu

Dec 31, 2018, 7:46:04 AM
to Ruby on Rails: Talk
Yes, the requests still work in a browser. Maybe I just need to kill the Net::HTTP process. Let me try rebooting. Please stand by.

Jake Niemiec

Jan 2, 2019, 10:33:09 AM
to rubyonra...@googlegroups.com
If you get “Connection reset by peer” while scraping a website, it is very likely that your scraping attempts were detected and automatically blocked for a while. They might have also been alerted by all of these malformed requests.

Consider pacing out your requests so that it doesn’t look like you are scraping.
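
Pacing could be sketched like this, with a randomized pause between requests so the server is not hit in a tight loop. The method name, the default delays, and the injectable fetcher and sleeper are all assumptions made for illustration.

```ruby
# Fetch each URL in turn, pausing a randomized interval between requests.
# fetcher and sleeper are injectable so the pacing logic can be tried without a network.
def fetch_paced(urls, fetcher:, min_delay: 2.0, max_delay: 5.0, sleeper: Kernel.method(:sleep))
  urls.each_with_index.map do |url, index|
    # No pause before the first request, a random pause before each later one.
    sleeper.call(rand(min_delay..max_delay)) if index.positive?
    fetcher.call(url)
  end
end

# Usage (hypothetical list of URLs):
#   require "net/http"
#   bodies = fetch_paced(urls, fetcher: ->(u) { Net::HTTP.get(URI(u)) })
```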
On Mon, Dec 31, 2018 at 7:58 AM fugee ohu <fuge...@gmail.com> wrote:
I changed my code and am now getting a text/html content-type response. I'm not sure what I'm doing. I commented out my previous creation of the http object and used an inline syntax that's part of the creation of the res object.

require "net/http"
require "uri"
url = URI.parse("https://www.ali<notshown>.com/item/Robotic-Vacuum-Cleaner-Proscenic-790T-Vacuum-Mop-Sweep-3-in-1-Cleaner-for-Pet-Hair-Wifi/32840149410.html?spm=2114.search0104.3.1.24d566b6GAD2uI&ws_ab_test=searchweb0_0,searchweb201602_1_10065_10068_10130_10890_10547_319_10546_317_10548_5730311_10545_10696_453_10084_454_10083_5729211_10618_10307_538_537_536_10059_10884_10887_100031_321_322_10103-10890,searchweb201603_51,ppcSwitch_0&algo_expid=99dc32b9-d1ce-4020-8bec-624c18225f44-0&algo_pvid=99dc32b9-d1ce-4020-8bec-624c18225f44")
#http = Net::HTTP.new(url.host, url.port)
#http.use_ssl = true
req = Net::HTTP::Get.new url 
res = Net::HTTP.start(url.host, url.port, :use_ssl => url.scheme == 'https') {|http| http.request req}
puts res.body


fugee ohu

Jan 28, 2019, 10:20:01 PM
to Ruby on Rails: Talk
Can you give me an idea how to extract it 

Colin Law

Jan 29, 2019, 4:21:43 AM
to Ruby on Rails: Talk
On Tue, 29 Jan 2019 at 03:20, fugee ohu <fuge...@gmail.com> wrote:
> ...
> Can you give me an idea how to extract it

Are you seriously asking, after the years that you have been using
Ruby, that you don't know how to extract a particular section from a
string? Apart from anything else, since I haven't got access to the
full string (nor do I want it), I have no idea what the details
surrounding the JSON section are so can't tell you how to do it.

Colin

Naveen Goud

Jan 29, 2019, 4:43:24 AM
to rubyonra...@googlegroups.com
Hi,

Can anyone help me with how to work on this project?


fugee ohu

Jan 29, 2019, 10:51:51 PM
to Ruby on Rails: Talk
It looks like this:
 /**/jQuery18308525223902976695_1548819829968({"success":true,"code":0,"results":[{"productId":32755997022,"sellerId":201591356,"oriMinPrice":"US $275.00 ... });

JSON.parse and cxt.eval both return errors, even if I remove everything up to the first { and then chomp the trailing ).
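
For a response wrapped like this (a JSONP callback call around a JSON object), one possible approach is to cut from the first ( to the last ) and parse only what lies in between. This is a minimal sketch that assumes the body holds exactly one callback call; parse_jsonp and the sample string are made up for illustration.

```ruby
require "json"

# Hypothetical helper: strips a JSONP wrapper such as callbackName({...});
# and parses the JSON between the first "(" and the last ")".
def parse_jsonp(body)
  open_idx  = body.index("(")
  close_idx = body.rindex(")")
  if open_idx.nil? || close_idx.nil? || close_idx < open_idx
    raise ArgumentError, "no callback wrapper found"
  end
  JSON.parse(body[(open_idx + 1)...close_idx])
end

# Invented sample with the same overall shape as the response shown above.
sample = '/**/jQuery123_456({"success":true,"code":0,"results":[{"productId":1,"title":"Mop (3 in 1)"}]});'
data = parse_jsonp(sample)
puts data["success"]
puts data["results"].class
```

Using the first ( and the last ) keeps the cut in the right place even when a value inside the JSON, such as a product title, contains parentheses of its own.
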

Colin Law

Jan 30, 2019, 2:02:06 AM
to Ruby on Rails: Talk
Have you checked that the trailing } you show there is the one that matches the { you show?  Perhaps there is more than one JSON string there.
If you have checked that then show us where the error occurs. Presumably you have already looked at that though and have not found what I suggest.

Colin

fugee ohu

Jan 30, 2019, 4:36:29 AM
to Ruby on Rails: Talk
parsed_obj = JSON.parse(res.body.gsub(/^.+\(/, "").chomp(");"))
puts parsed_obj

That gets me => true


fugee ohu

Jan 30, 2019, 4:55:19 AM
to Ruby on Rails: Talk
I can print the outer values of parsed_obj with an iteration loop:
parsed_obj.each do |key, val|
Can you tell me how I select individual keys within the nested hash 'results'?

Colin Law

Jan 30, 2019, 5:06:13 AM
to Ruby on Rails: Talk
On Wed, 30 Jan 2019 at 09:55, fugee ohu <fuge...@gmail.com> wrote:
> ...
> I can print the outer values of parsed_obj with an iteration loop:
> parsed_obj.each do |key, val|
> Can you tell me how I select individual keys within the nested hash 'results'?

Does that mean you have parsed it ok?

Isn't it obvious? You have to use each do |key, val| on the outer-level
results to get their contents, with different names for key and val,
obviously.

Colin

fugee ohu

Jan 30, 2019, 5:09:10 AM
to Ruby on Rails: Talk
I found this extended discussion of different ways of accessing hash values here: https://stackoverflow.com/questions/5544858/accessing-elements-of-nested-hashes-in-ruby
How would I get results->productId?

Colin Law

Jan 30, 2019, 5:15:54 AM
to Ruby on Rails: Talk
On Wed, 30 Jan 2019 at 10:09, fugee ohu <fuge...@gmail.com> wrote:
>
> How would I get results->productId

results["productId"]

Colin

fugee ohu

Jan 30, 2019, 6:19:21 AM
to Ruby on Rails: Talk
puts parsed_obj["results"] shows the entire results, but puts parsed_obj["results"]["productId"] gets me the error: no implicit conversion of String into Integer

Colin Law

Jan 30, 2019, 6:54:36 AM
to Ruby on Rails: Talk
On Wed, 30 Jan 2019 at 11:19, fugee ohu <fuge...@gmail.com> wrote:
> ...
> puts parsed_obj["results"] shows the entire results, but puts parsed_obj["results"]["productId"] gets me the error: no implicit conversion of String into Integer

Show us what puts parsed_obj["results"] gives. If it is long then
from the start up to where productId occurs.

Colin

fugee ohu

Jan 30, 2019, 8:48:47 AM
to Ruby on Rails: Talk
 {"productId"=>32970292001, "sellerId"=>235696817, "oriMinPrice"=>"US $50.00", "oriMaxPrice"=>"US $50.00", "productTitle"=>"Paid  function", "minPrice"=>"US $50.00", "maxPrice"=>"US $50.00", "orders"=>"1", "productImage"=>"//ae01.alicdn.com/kf/HTB10AX7aPLuK1Rjy0Fhq6xpdFXa5.jpg", "productDetailUrl"=>" ...

Colin Law

Jan 30, 2019, 9:45:03 AM
to Ruby on Rails: Talk
On Wed, 30 Jan 2019 at 13:48, fugee ohu <fuge...@gmail.com> wrote:
>
>>
>> Show us what puts parsed_obj["results"] gives. If it is long then
>> from the start up to where productId occurs.
>>
>> Colin
>
>
> {"productId"=>32970292001, "sellerId"=>235696817, "oriMinPrice"=>"US $50.00", "oriMaxPrice"=>"US $50.00", "productTitle"=>"Paid function", "minPrice"=>"US $50.00", "maxPrice"=>"US $50.00", "orders"=>"1", "productImage"=>"//ae01.alicdn.com/kf/HTB10AX7aPLuK1Rjy0Fhq6xpdFXa5.jpg", "productDetailUrl"=>" ...

Are you absolutely sure that is what parsed_obj["results"] is?
Frankly I think you are mistaken.
Looking at the original source you posted which had
"results":[{"productId":32755997022,"...
suggests that actually parsed_obj["results"] should be
[ {"productId"=>32970292001,...},{...}]
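
A quick way to check is to print the class at each level. The sketch below uses an invented sample with the same shape as the source you posted; indexing an Array with a String is what raises the "no implicit conversion of String into Integer" error.

```ruby
# Invented sample: "results" is an Array of Hashes, as in the posted source.
parsed_obj = { "success" => true, "results" => [{ "productId" => 101 }, { "productId" => 102 }] }

results = parsed_obj["results"]
puts results.class        # Array, not Hash
puts results.first.class  # Hash

begin
  results["productId"]    # Array#[] expects an Integer index, not a String key
rescue TypeError => e
  error_message = e.message
  puts error_message
end
```
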

Colin

fugee ohu

Jan 30, 2019, 11:41:20 AM
to Ruby on Rails: Talk
Unparsed response looks like this
 /**/myscript.js({"success":true,"code":0,"results":[{"productId":32962770119, ... ,"itemEvalTotalNum":0}],"finished":false,"page":1,"pageSize":20,"postback":"9954eca0-4297-4d1f-bada-a5c3b131214c","pin":"gps-id=pcDetailLeftTrendProduct&scm=1007.13438.100207.0&scm_id=1007.13438.100207.0&scm-url=1007.13438.100207.0&pvid=778c79a8-9092-483e-92cb-f393856b0565"});
So if I'm gonna use this approach, I have to gsub out everything up to the first [{ and everything after the last }].
As you can see, I was trying another approach as well (I need to learn both): by substituting myscript.js for the original callback name, I can write a function to handle the data instead of using JSON.parse, but I wanna be able to do it both ways.

Colin Law

Jan 30, 2019, 12:44:53 PM
to Ruby on Rails: Talk
Am I right in saying that parsed_obj["results"] is actually an array, as I
suggested? If so, then do you not know how to access the elements of
the array?

Colin

fugee ohu

Jan 30, 2019, 4:56:01 PM
to Ruby on Rails: Talk
Everything in the unparsed response body that I want is between [ and ], so I have to gsub it out.

Colin Law

Jan 30, 2019, 5:02:17 PM
to Ruby on Rails: Talk
On Wed, 30 Jan 2019 at 21:56, fugee ohu <fuge...@gmail.com> wrote:
> ...
> Everything in the unparsed response body that I want is between [ and ], so I have to gsub it out.


No you don't. Once you have parsed_obj["results"] (which is an array,
that's what the [] mean), you can get the first product's ID with
parsed_obj["results"][0]["productId"]
It is just an array. You have met Ruby arrays, haven't you?
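
For instance, with an invented sample shaped like your response (the values are made up), the first ID and the full list of IDs could be read roughly like this:

```ruby
# Invented sample: same structure as the posted response, made-up values.
parsed_obj = {
  "success" => true,
  "results" => [
    { "productId" => 101, "minPrice" => "US $1.00" },
    { "productId" => 102, "minPrice" => "US $2.00" }
  ]
}

# Index into the array first, then look up the key in that element's hash.
first_id = parsed_obj["results"][0]["productId"]

# Hash#dig walks the same path and returns nil if a level is missing.
same_id = parsed_obj.dig("results", 0, "productId")

# Collect one field from every product in the array.
all_ids = parsed_obj["results"].map { |product| product["productId"] }

puts first_id
puts all_ids.inspect
```
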

I am rapidly losing the will to live.

Colin

fugee ohu

Jan 30, 2019, 5:05:43 PM
to Ruby on Rails: Talk
How do I chomp everything at the end of the string, starting with ]?

fugee ohu

Jan 30, 2019, 5:09:26 PM
to Ruby on Rails: Talk
The response body isn't parsable by JSON.parse as is; it has to be gsub'd and chomped first before I can run JSON.parse. My original gsub wasn't right: it wasn't removing the end that follows ].
JSON::ParserError: 784: unexpected token at 'myscript.js({"success":true,"code