Selecting the JSON objects within an unnamed array, not the elements inside the objects


Enrico Kunz

May 17, 2016, 2:05:54 PM
to JsonPath
Hi everyone,

I'm having a very hard time figuring out how to scrape an unnamed array that contains JSON objects. The implementation I'm using is built around Django Dynamic Scraper, which relies on Scrapy and JSONPath.
My question is: how do I write an encapsulating JSONPath that matches the JSON objects in the array themselves, rather than the elements inside each object?

The structure I'm faced with is the following:

[
  {"value one": "value one",
   "value two": "value two"
  },
  {"value one": "value one",
   "value two": "value two"
  }
]

I'm looking for a JSONPath that selects only the JSON objects and not the elements within them.
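To make the distinction concrete, here is a minimal sketch in plain Python. It uses only the standard json module, not jsonpath_rw or Django Dynamic Scraper, and the sample values ("a1", "b1", and so on) are made up for illustration. It only shows the result I'm after (one match per object) next to the result I want to avoid (the leaf values of every object).

```python
import json

# Hypothetical sample data mirroring the structure above; the values are
# invented so the two objects can be told apart.
raw = """
[
  {"value one": "a1", "value two": "a2"},
  {"value one": "b1", "value two": "b2"}
]
"""

data = json.loads(raw)

# Desired result: one match per object in the top-level (unnamed) array.
objects = [item for item in data if isinstance(item, dict)]

# Undesired result: every element of every object, flattened into one list.
leaf_values = [value for item in objects for value in item.values()]

print(len(objects))      # number of objects in the array
print(len(leaf_values))  # number of elements across all objects
```

With this sample, objects holds two dictionaries, while leaf_values holds four strings. The first shape is what I'd expect a base element to give me.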

The site I'm trying to scrape is this one.

All four JSONPaths below give me every element of every JSON object, but I can't think of one that returns the objects themselves.
The background is that Django Dynamic Scraper relies on a concept called the base element, which in XPath terms is the element shared by all the elements one is trying to scrape.
So I'm trying to figure out what the JSON equivalent is, but I'm not getting anywhere.

Any ideas would be great!

1) Xpath: $..* 
2) Xpath: .*
3) Xpath: $[*] 
4) Xpath: $[1:400]



For 1): 
I'm getting a TypeError
TypeError: object of type 'int' has no len() 

File "/usr/local/lib/python2.7/dist-packages/dynamic_scraper/spiders/django_spider.py", line 421, in parse

    if(len(base_objects) == 0):


For 2): 
I'm getting an Exception

Exception: Parse error at 1:0 near token . (.)

File "/usr/local/lib/python2.7/dist-packages/jsonpath_rw/parser.py", line 69, in p_error

    raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type))


For 3):
I'm getting an AttributeError
AttributeError: 'NoneType' object has no attribute 'lineno'

File "/usr/local/lib/python2.7/dist-packages/jsonpath_rw/parser.py", line 69, in p_error

    raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type)) 

For 4):
I'm getting an AttributeError

AttributeError: 'NoneType' object has no attribute 'lineno'

File "/usr/local/lib/python2.7/dist-packages/jsonpath_rw/parser.py", line 69, in p_error

    raise Exception('Parse error at %s:%s near token %s (%s)' % (t.lineno, t.col, t.value, t.type))



Thanks,

Enrico
