Update saved item continuously if it exists in database

Boy Sandy Gladies Arriezona

May 4, 2017, 11:42:18 PM
to django-dynamic-scraper
Hi, I scrape e-commerce products for research.
Sometimes, though, those products are updated.
Can DDS update the scraped data if it already exists in the database?

* I couldn't find anything about this in the documentation.

k bez

May 5, 2017, 1:26:23 AM
to django-dynamic-scraper
I do what you want in the pipeline module, with this logic:
if the key already exists, update the data; otherwise create a new record.

Boy Sandy Gladies Arriezona

May 5, 2017, 2:33:43 AM
to django-dynamic-scraper
Can you tell me how to check whether the key already exists, and how to update the existing data?
I would really appreciate it if you could give me an example.

These are my pipelines:
ITEM_PIPELINES = {
    'dynamic_scraper.pipelines.DjangoImagesPipeline': 200,
    'open_product.scraper.pipelines.PriceConverterPipeline': 300,
    'dynamic_scraper.pipelines.ValidationPipeline': 400,
    'open_product.scraper.pipelines.DjangoWriterPipeline': 800,
}

This is the pipeline that saves my data to the DB:
import logging

from django.db.utils import IntegrityError
from dynamic_scraper.models import SchedulerRuntime
from scrapy.exceptions import DropItem


class DjangoWriterPipeline(object):
    def process_item(self, item, spider):
        if spider.conf['DO_ACTION']:
            try:
                item['source_detail'] = spider.ref_object
                checker_rt = SchedulerRuntime(runtime_type='C')
                checker_rt.save()
                item['checker_runtime'] = checker_rt

                item.save()
                spider.action_successful = True
                spider.log("Item saved to Django DB.", logging.INFO)

            except IntegrityError as e:
                spider.log(str(e), logging.ERROR)
                raise DropItem("Missing attribute.")

        return item

k bez

May 5, 2017, 11:44:00 AM
to django-dynamic-scraper
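if spider.conf['DO_ACTION']:
    # MyDBTable, MyDBField and ScrapedField are placeholders for your own names.
    try:
        obj = MyDBTable.objects.get(url=spider.ref_object.url)
    except MyDBTable.DoesNotExist:
        obj = None  # not in the DB yet: continue with your normal item.save() path
    if obj is not None:
        if getattr(obj, 'MyDBField') != item['ScrapedField']:
            # Set the DB field values from the scraped item, then save.
            setattr(obj, 'MyDBField', item['ScrapedField'])
            obj.save()
            spider.action_successful = True
            spider.log("Item updated.", logging.INFO)
        else:
            spider.action_successful = False
            raise DropItem("Item duplicated.")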
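
If you want to try the same look-up, compare, then update-or-skip logic outside of Django first, here is a minimal sketch using only the standard library's sqlite3 module. The `product` table, its `url` and `price` columns, and the `save_or_update` helper are all made up for illustration; they are not part of DDS or of the pipeline above.

```python
import sqlite3


def save_or_update(conn, url, price):
    """Insert a new row, update a changed one, or report a duplicate.

    Returns "created", "updated" or "duplicate".
    The table and column names are hypothetical.
    """
    row = conn.execute(
        "SELECT price FROM product WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        # Key not found: create a new record.
        conn.execute(
            "INSERT INTO product (url, price) VALUES (?, ?)", (url, price)
        )
        status = "created"
    elif row[0] != price:
        # Key found and the scraped value differs: update in place.
        conn.execute(
            "UPDATE product SET price = ? WHERE url = ?", (price, url)
        )
        status = "updated"
    else:
        # Key found and nothing changed: nothing to write.
        status = "duplicate"
    conn.commit()
    return status


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (url TEXT PRIMARY KEY, price REAL)")
    for price in (10.0, 10.0, 12.5):
        print(save_or_update(conn, "http://example.com/p/1", price))
```

In a Scrapy pipeline the "duplicate" branch would be where you raise DropItem, as in the snippet above.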

Holger Drewes

May 5, 2017, 12:16:00 PM
to django-dyna...@googlegroups.com
You can define the fields you want to get updated as STANDARD (UPDATE) fields for your SCRAPED_OBJECT_CLASS in the Django admin. For objects with the same scraping identity (all ID_FIELDS from SCRAPED_OBJECT_CLASS have the same values), these fields will then be updated.

Cheers
Holger
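
To make the rule above concrete, here is a minimal sketch in plain Python. It is not DDS's actual implementation, only an illustration of the described behaviour: an item whose ID fields all match a stored object updates that object's update fields, and any other item is stored as new. The function name, the list-of-dicts "database" and the field names are assumptions made up for this example.

```python
def apply_scraped_item(stored, item, id_fields, update_fields):
    """Update the stored object sharing the item's identity, or add the item.

    stored        -- list of dicts standing in for database rows (hypothetical)
    id_fields     -- field names that together define the scraping identity
    update_fields -- field names that may be overwritten on a match
    Returns (object, changed), where changed is the list of updated field
    names, or None if the item was newly added.
    """
    identity = tuple(item[f] for f in id_fields)
    for obj in stored:
        if tuple(obj[f] for f in id_fields) == identity:
            # Same identity: overwrite only the designated update fields.
            changed = [f for f in update_fields if obj.get(f) != item.get(f)]
            for f in changed:
                obj[f] = item[f]
            return obj, changed
    # No object with this identity yet: store a copy of the item.
    new_obj = dict(item)
    stored.append(new_obj)
    return new_obj, None


if __name__ == "__main__":
    db = []
    apply_scraped_item(db, {"url": "u1", "title": "A", "price": 10}, ["url"], ["price"])
    obj, changed = apply_scraped_item(
        db, {"url": "u1", "title": "B", "price": 12}, ["url"], ["price"]
    )
    print(obj, changed)
```

In this sketch, fields that are neither ID fields nor update fields (here `title`) keep their first scraped value.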


Boy Sandy Gladies Arriezona

May 7, 2017, 2:04:43 AM
to django-dynamic-scraper
I see, I missed that part when I read the documentation.
Thank you for your explanation, Holger.