How to avoid alphabetical order when storing elements in a dictionary in Scrapy (Python)


shiva krishna

May 22, 2012, 8:51:55 AM
to scrapy...@googlegroups.com

I am working with Scrapy. I fetched some items from a website and am storing them in JSON files.

My items.py code is:

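from scrapy.item import Item, Field

class JobItem(Item):  # class name is hypothetical; the post shows only the fields
    job_title = Field()
    full_or_part_Time = Field()
    location_affiliates = Field()
    department = Field()
    requisition_number = Field()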

After fetching, the items are stored in the JSON file in the following format:

{"full_or_part_Time": ["Full Time"],
 "department": ["808 - Spons Prj Accounting"],
 "requisition_number": ["12-1407456"],
 "job_title": ["Accountant"],
 "location_affiliates": ["Mount Sinai Medical Center (Manhattan)"]}

But I want the fields saved in the order I declared them in items.py. Can anyone please tell me how to keep that declared order?

Thanks in advance.

Norman Rosner

May 22, 2012, 12:26:37 PM
to scrapy...@googlegroups.com
Hi,

As I remember, an Item in Scrapy is basically a dictionary
(https://github.com/scrapy/scrapy/blob/0.14.4/scrapy/item.py).
The default Python dict, and UserDict/DictMixin in this case, don't
track the order of insertion.
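To make the distinction concrete: on the Python 2 releases of that time a plain dict made no promise about key order, while collections.OrderedDict (available since Python 2.7) remembers the order in which keys were inserted. Since Python 3.7, a plain dict preserves insertion order as well. A minimal check, using the field names from the question:

```python
from collections import OrderedDict

# Field names in the order they are declared in items.py (from the question)
declared = ["job_title", "full_or_part_Time", "location_affiliates",
            "department", "requisition_number"]

row = OrderedDict()
for name in declared:
    row[name] = None  # the value is irrelevant here; only the key order matters

# An OrderedDict iterates over its keys in insertion order
print(list(row))
```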

A quick look at the JsonItemExporter
(https://github.com/scrapy/scrapy/blob/0.14.4/scrapy/contrib/exporter/__init__.py):

When exporting to JSON, the exporter creates a plain Python dict from
the item's serialized fields and uses the standard Python json
encoder to serialize it to JSON.
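The exporter step described above can be mimicked in plain Python to see where the order gets lost. This is only an illustration, not Scrapy's exporter code: a list of (name, value) pairs stands in for the item's serialized fields (values taken from the sample output in the question). json.dumps writes keys in the mapping's iteration order unless sort_keys=True, so building an OrderedDict instead of a plain dict carries the declared order through to the JSON text.

```python
import json
from collections import OrderedDict

# Stand-in for the item's serialized fields, listed in declared order
# (values taken from the sample output in the question)
serialized_fields = [
    ("job_title", ["Accountant"]),
    ("full_or_part_Time", ["Full Time"]),
    ("location_affiliates", ["Mount Sinai Medical Center (Manhattan)"]),
    ("department", ["808 - Spons Prj Accounting"]),
    ("requisition_number", ["12-1407456"]),
]

# OrderedDict keeps the pairs in the order given; json.dumps emits keys
# in the mapping's iteration order because sort_keys defaults to False
encoded = json.dumps(OrderedDict(serialized_fields))
print(encoded)
```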

So there are several pieces involved, and I guess you would have
to do something like this (with OrderedDict in mind):

a) Write a custom Item class based on Scrapy's own Item class and use
it throughout your scraping.
b) Write a custom exporter that behaves the way you want, and figure
out how to use that exporter in the Scrapy settings.
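Point b) could look roughly like the following sketch. It is deliberately Scrapy-free: OrderedJsonLinesWriter and its field_order argument are hypothetical names, not part of Scrapy's exporter API, and hooking a real exporter into the project settings is not shown. The writer accepts plain dicts (a scraped item would first be copied into one) and writes one JSON object per line, with keys in the given order.

```python
import io
import json
from collections import OrderedDict


class OrderedJsonLinesWriter(object):
    """Hypothetical writer: one JSON object per line, keys in field_order."""

    def __init__(self, file, field_order):
        self.file = file
        self.field_order = list(field_order)

    def write_item(self, item):
        # Declared fields first, in declared order; fields missing from the
        # item are skipped rather than written as null
        row = OrderedDict((name, item[name]) for name in self.field_order
                          if name in item)
        # Any extra keys not listed in field_order go last, sorted for stability
        for name in sorted(set(item) - set(self.field_order)):
            row[name] = item[name]
        self.file.write(json.dumps(row) + "\n")


# Usage with the sample item from the question (keys deliberately shuffled)
out = io.StringIO()
writer = OrderedJsonLinesWriter(out, ["job_title", "full_or_part_Time",
                                      "location_affiliates", "department",
                                      "requisition_number"])
writer.write_item({"full_or_part_Time": ["Full Time"],
                   "department": ["808 - Spons Prj Accounting"],
                   "requisition_number": ["12-1407456"],
                   "job_title": ["Accountant"],
                   "location_affiliates": ["Mount Sinai Medical Center (Manhattan)"]})
print(out.getvalue())
```

Writing one object per line keeps each item independent, so the file can be appended to without rewriting an enclosing JSON list.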

Nonetheless, I don't quite understand why you need the order. You
export your items as JSON, which means you can easily decode them
into dicts and then access them with the desired keys:

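import json

# items_as_json_string: the contents of the file written by the JSON exporter
items = json.loads(items_as_json_string)  # a list with one dict per item
first_item = items[0]
first_item['department']  # lookup by key; the key order in the file doesn't matter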
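Along the same lines, if the order only matters when the data is read back or displayed, the decoded items can be walked in whatever order is wanted. A small sketch using the record from the question; the field_order list is an assumption that simply mirrors items.py, and the record is wrapped in a list to match what items[0] above expects.

```python
import json

# One exported record, as shown in the question (keys in arbitrary order)
items_as_json_string = '''[{"full_or_part_Time": ["Full Time"],
  "department": ["808 - Spons Prj Accounting"],
  "requisition_number": ["12-1407456"],
  "job_title": ["Accountant"],
  "location_affiliates": ["Mount Sinai Medical Center (Manhattan)"]}]'''

# Order in which the fields are declared in items.py
field_order = ["job_title", "full_or_part_Time", "location_affiliates",
               "department", "requisition_number"]

items = json.loads(items_as_json_string)
# Look each field up by key, so the key order in the file does not matter
rows = [[(name, item[name]) for name in field_order] for item in items]
for name, value in rows[0]:
    print(name, value)
```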

Hope that helps,

norman

shiva krishna

May 23, 2012, 3:48:02 AM
to scrapy...@googlegroups.com
Thanks very much for your reply.