TakeFirst vs Returning multiple elements found


Rakan Alhneiti

Oct 9, 2014, 11:06:08 AM
to django-dyna...@googlegroups.com
Hello,

I previously posted on this group about scraping multiple images using DDS. However, I discovered that if you're capturing, for example, an article whose body is spread across multiple paragraph tags, you'll only get the first one. Digging into this, I found that DjangoSpider defines:
1) _get_processors
procs = [TakeFirst(), processors.string_strip,] 
as the default processors
and
2) self.loader.default_output_processor = TakeFirst()

Do you think this was the right design decision, given that it restricts every XPath or regex to a single element? I believe DDS should have the flexibility to return multiple elements, and users could apply TakeFirst themselves when they only need the first one.
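
To illustrate the difference, here is a rough sketch in plain Python. The functions are stand-ins I wrote to mimic the documented behaviour of Scrapy's TakeFirst (return the first value that is neither None nor an empty string); they are not the actual Scrapy or DDS classes, and the sample list is a made-up XPath result.

```python
# Plain-Python stand-ins for illustration only; not Scrapy or DDS code.

def take_first(values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def strip_all(values):
    """Strip whitespace from every value and drop the ones left empty."""
    return [v.strip() for v in values if v and v.strip()]


def join_paragraphs(values, separator="\n\n"):
    """Keep every extracted value and join them into one body string."""
    return separator.join(strip_all(values))


# Hypothetical result of an XPath matching several <p> tags in an article.
extracted = ["  First paragraph. ", "Second paragraph.", "  ", "Third paragraph.  "]

# Roughly what happens today: only the first match survives.
first_only = take_first(extracted).strip()

# What I would expect to be possible: keep everything that matched.
all_paragraphs = strip_all(extracted)
joined_body = join_paragraphs(extracted)
```

With the first approach the second and third paragraphs are silently dropped; with the other two the whole body is kept, either as a list or as a single string.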

Thanks,
Rakan

Naran Khetani

Aug 2, 2018, 5:03:35 AM
to django-dynamic-scraper
Hi Rakan,

I know this was posted a long time ago, but I am in real need of help. I have an XPath that returns multiple URLs, but DDS only takes the first one. Any idea how I can get around this?