scrape name and address from the website


Jerry Wu

Jun 10, 2014, 7:07:35 AM
to scrapy...@googlegroups.com
Hello,

I am new to Scrapy (and have little programming background). I want to learn Scrapy quickly and efficiently, and I believe starting a project is the best way to learn. English is not my native language, so reading is sometimes difficult for me, but I am trying my best to understand what I read in the tutorial and on Stack Overflow. I hope to become more Pythonic and to think the way most Scrapy users do, so if you have any suggestions, please feel free to let me know. If you come to Shanghai someday, I would be very glad to buy you a cup of coffee and show you around.

Here is the project I am working on. I want to scrape name and address pairs from http://www.lawson.com.cn/store/ . Here is my roadmap:
1. Use rules to collect the links to be scraped later, for example http://www.lawson.com.cn/store/shanghai/west/west01/
2. For each link collected, call parse_shop to process it. "店名" means the name and "地址" means the address. I also use a regular expression for the address.
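
The regular-expression part of step 2 could look roughly like the sketch below. It uses only Python's standard `re` module and works on plain text. The function name `extract_shop`, the two pattern names, and the assumption that each label is followed by a colon (ASCII or full-width) with the value on the same line are all hypothetical, since the page markup is not shown in this thread.

```python
import re

# Assumed label format: "店名" (name) and "地址" (address), each followed by
# an ASCII or full-width colon, with the value on the same line.
# The real page may be laid out differently.
NAME_RE = re.compile(r"店名\s*[:：]\s*(.+)")
ADDRESS_RE = re.compile(r"地址\s*[:：]\s*(.+)")


def extract_shop(text):
    """Return a dict with 'name' and 'address', or None if either is missing."""
    name = NAME_RE.search(text)
    address = ADDRESS_RE.search(text)
    if not (name and address):
        return None
    return {
        "name": name.group(1).strip(),
        "address": address.group(1).strip(),
    }
```

Inside a spider callback such as parse_shop, the text passed to this helper would be whatever string was extracted from the response; returning None for incomplete matches makes it easy to skip pages that do not contain both fields.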

The two steps above work fine. However, when I export the results to a CSV file, I find quite a few duplicates. Following the tutorial, I added a class in pipeline.py and activated it, but it doesn't work. What I get is: exceptions.KeyError: 'id'. I have no idea what to do about it.

My code is below. Any thoughts are welcome.

Tiago Natel de Moura

Jun 10, 2014, 7:28:46 AM
to scrapy...@googlegroups.com
Hi Jerry,

Your problem is in line 11 of file pipeline.py. You're using self.ids_seens when it should be self.ids_seen.

Cheers!


--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.

Jerry Wu

Jun 10, 2014, 8:17:12 AM
to scrapy...@googlegroups.com
Hi Tiago,

I fixed the typo, but it still doesn't work :(
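
A possible explanation, offered as a guess because the original code is not shown in this thread: the duplicates-filter example in the Scrapy documentation looks up item['id'], so an item that only carries a name and an address would raise KeyError: 'id' even after the ids_seen typo is fixed. A minimal sketch of a filter keyed on the name and address pair instead is below. The field names 'name' and 'address' and the attribute name `seen` are assumptions, and the fallback DropItem class exists only so the sketch can be run without Scrapy installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so this sketch runs without Scrapy installed.
    class DropItem(Exception):
        pass


class DuplicatesPipeline(object):
    """Drop items whose (name, address) pair has already been seen."""

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        # 'name' and 'address' are assumed field names; no 'id' field is used.
        key = (item["name"], item["address"])
        if key in self.seen:
            raise DropItem("Duplicate shop: %s" % (key,))
        self.seen.add(key)
        return item
```

As with any item pipeline, the class still has to be listed in the ITEM_PIPELINES setting of the project to take effect.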





--
Best Regards.

Jerry Wu

Life is short. Change is possible : )

