scrapy crawl NewsBlog -o Items_File.csv -t csv
This web page is refreshed every few hours with new articles (blog posts). If I run the above command repeatedly, I get duplicate items in the CSV, because each run appends the entire set of scraped items to the file. I don't want duplicates in the CSV. I am not using a pipeline to filter duplicates, since there are no duplicates within a single crawl of the blog; they only arise across runs. I also can't simply delete the CSV before each run, because old articles disappear from the blog page and I would lose them. It is fair to assume that the combination of the date and heading fields is unique per item.
Please suggest a viable solution for storing only the unique items in the CSV.
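To make the requirement concrete, below is a minimal sketch of the kind of cross-run deduplication I have in mind: a pipeline that persists the (date, heading) keys of previously exported items to a side file and drops any item whose key was already recorded, so the appending feed export only ever receives new items. The class name CrossRunDedupPipeline, the file name seen_keys.txt, and the field names 'date' and 'heading' are placeholders, not something already in my project.

# dedup_pipeline.py -- a sketch, untested; names and fields are assumptions.
import os
from scrapy.exceptions import DropItem

class CrossRunDedupPipeline(object):
    SEEN_FILE = 'seen_keys.txt'  # placeholder path for the persisted keys

    def open_spider(self, spider):
        # Load the (date, heading) keys recorded by previous runs.
        self.seen = set()
        if os.path.exists(self.SEEN_FILE):
            with open(self.SEEN_FILE) as f:
                self.seen = set(line.rstrip('\n') for line in f)
        self.new_keys = []

    def process_item(self, item, spider):
        # 'date' and 'heading' are assumed item field names.
        key = '%s|%s' % (item['date'], item['heading'])
        if key in self.seen:
            raise DropItem('Already exported: %s' % key)
        self.seen.add(key)
        self.new_keys.append(key)
        return item

    def close_spider(self, spider):
        # Persist the keys of newly exported items for the next run.
        with open(self.SEEN_FILE, 'a') as f:
            for key in self.new_keys:
                f.write(key + '\n')

Enabled in settings.py with something like (the module path is a placeholder):

ITEM_PIPELINES = {'myproject.dedup_pipeline.CrossRunDedupPipeline': 300}

Since dropped items never reach the feed exporter, running the same "scrapy crawl NewsBlog -o Items_File.csv" command would then append only the rows that were not seen before. Is this a reasonable approach, or is there a more standard way to do it?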