Can anyone point me to a JsonPath Example?


Enrico Kunz

May 13, 2016, 3:52:29 AM
to django-dynamic-scraper
Hey everybody,

I would be grateful for a concrete example of how to use JsonPath with DDS.

Specifically, I have the following questions:

Should I put the JsonPath into the XPath field of my scraper in the Django admin section when creating the scraper?
Do I have to change anything code-wise in my spider?
Python JsonPath RW was evidently installed along with DDS, but do I have to change anything in my backend?

I tried simply putting $..* into the XPath field of my scraper, and it didn't work.


I cannot find any tutorials on this, or even a Stack Overflow question.

Thanks a lot!


All the best,
Enrico

Holger Drewes

May 13, 2016, 5:21:37 AM
to django-dyna...@googlegroups.com
Hi Enrico,
no, no code change is needed in the spider. Just put the JsonPath expressions (examples on https://github.com/jayway/JsonPath) in the XPath attribute fields.

Not sure about the $ sign. It's definitely possible to leave it out, but I think it should also work with it as the root element.

E.g.

XPath: title

in some JSON

{
  "title": "Test 1"
}

gives: "Test 1"

Greetings

Holger




--
You received this message because you are subscribed to the Google Groups "django-dynamic-scraper" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-dynamic-sc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Enrico Kunz

May 16, 2016, 12:48:59 PM
to django-dynamic-scraper
Hi Holger,

thanks again for your time and for building DDS! 

However, I'm still struggling to get JsonPath to work.
One question I have, apart from my specific problem, is how to define the base element when dealing with JsonPath.


I'm trying to scrape this site where, as you can see, the JSON objects are simply put in an array.

My guess for the base element is that I should use one of the following:

1) Xpath: $..* 
2) Xpath: .*
3) Xpath: $[*] 
4) Xpath: $[1:400]

Every one of these expressions returns all elements inside the JSON objects I want to scrape when using this JsonPath evaluator, so they should be valid JsonPath expressions.

My guess, though, is that this way I'm not selecting the JSON objects themselves as base elements, but all elements inside them (the objects are in an array, as mentioned in this SO question). Also, these expressions probably won't work as base elements because they match everything on the page, so the matches don't serve as one encapsulating element.
But I couldn't think of a better approach.

The following errors are related to the Jsonpath expressions above:

For 1): 
I'm getting a TypeError
TypeError: object of type 'int' has no len() 

File "/usr/local/lib/python2.7/dist-packages/dynamic_scraper/spiders/django_spider.py", line 421, in parse

    if(len(base_objects) == 0):


For 2): 
I'm getting an Exception

Exception: Parse error at 1:0 near token . (.)

File "/usr/local/lib/python2.7/dist-packages/jsonpath_rw/parser.py", line 69, in p_error

    raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type))


For 3):
I'm getting an AttributeError
AttributeError: 'NoneType' object has no attribute 'lineno'

File "/usr/local/lib/python2.7/dist-packages/jsonpath_rw/parser.py", line 69, in p_error

    raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type)) 

For 4):
I'm getting an AttributeError

AttributeError: 'NoneType' object has no attribute 'lineno'

File "/usr/local/lib/python2.7/dist-packages/jsonpath_rw/parser.py", line 69, in p_error

    raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type))


Now, if I only put in the name of the object I'm trying to scrape, without a base element, like

Xpath: company or
Xpath: $..company (which also works in the JsonPath evaluator mentioned above)

I'm getting a DoesNotExist: ScraperElem matching query does not exist.

I believe a base element is not necessary in my case. But since it's required in DDS, I'm running out of ideas for what to try next. I would be very thankful for any further tips on how to implement this.


Best,
Enrico

Holger Drewes

May 20, 2016, 6:09:55 AM
to django-dyna...@googlegroups.com
Hi Enrico,
I just added a new unit test for your example to the library! :-) This is a bit of a special case since the JSON is coming directly as an array. I actually had to google whether this is valid JSON, and it seems that it is.

The solution seems to be a lot simpler than one would anticipate: just taking "$" without any additions should do the trick (at least this worked in the unit test; I didn't try it via the admin, but it should be the same).

With the base element one is always going down to the list of items to actually scrape. Since the items in this case are directly located at the root element, you can directly reference the root element (actually using the root element otherwise is in many cases optional, e.g. "$.title" and "title" are equivalent).

Hope this works/helps!

Cheers
Holger

Enrico Kunz

May 21, 2016, 8:38:30 AM
to django-dynamic-scraper
Thank you Holger! It does kind of work for me. The thing is, it only selects the first elements in the first JSON object, doesn't go through the array, and outputs an error.
Not sure how to make this work. But I'm getting there.

Holger Drewes

Jun 1, 2016, 7:30:52 AM
to django-dyna...@googlegroups.com
Did you solve your issue?


Enrico Kunz

Jun 1, 2016, 8:51:23 AM
to django-dynamic-scraper
Yeah! I was just a little embarrassed about it. It turned out I had the mandatory checkbox checked for an item I didn't see on my screen.
So for everyone struggling with


AttributeError: 'NoneType' object has no attribute 'lineno'

File "/usr/local/lib/python2.7/dist-packages/jsonpath_rw/parser.py", line 69, in p_error

    raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type))



Obviously it's complaining about a NoneType, which might mean that you have the mandatory checkbox checked somewhere in your scraper.


Thanks again for your help and for DDS!
