TakeFirst vs Returning multiple elements found

Rakan Alhneiti

Oct 9, 2014, 11:06:08 AM

to django-dyna...@googlegroups.com

Hello,

I previously posted on this group about scraping multiple images with DDS. However, I have since discovered that if you're scraping, for example, an article whose body is spread across multiple paragraph tags, you only get the first paragraph. Digging into this, I found that DjangoSpider defines:

1) _get_processors

procs = [TakeFirst(), processors.string_strip,]

as the default processors

and

2) self.loader.default_output_processor = TakeFirst()

Do you think this was the right design decision, given that it restricts every XPath or regex to a single result? I believe DDS should allow multiple elements to be returned, and users should add TakeFirst themselves when they need only the first element.
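
To make the difference concrete, here is a minimal sketch in plain Python. The `take_first` and `join_all` helpers are hypothetical stand-ins that mimic, in simplified form, what Scrapy's `TakeFirst` and `Join` processors do; they are not DDS code, and the sample paragraphs are made up for illustration.

```python
def take_first(values):
    """Simplified stand-in for TakeFirst: return the first value
    that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def join_all(values, separator=" "):
    """Simplified stand-in for Join: concatenate every extracted value."""
    return separator.join(values)


# Hypothetical result of an XPath that matches three <p> tags.
paragraphs = ["  First paragraph. ", "Second paragraph.", "Third paragraph."]

# Strip whitespace from each match, as a string-strip processor would.
stripped = [p.strip() for p in paragraphs]

# With a take-first default, only one paragraph survives.
first_only = take_first(stripped)

# Joining instead keeps the whole article body.
full_body = join_all(stripped, separator="\n")

print(repr(first_only))
print(repr(full_body))
```

With a take-first step in the default chain, everything after the first match is dropped before any later processor sees it, which is why the article body ends up truncated.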

Thanks,

Rakan

Naran Khetani

Aug 2, 2018, 5:03:35 AM

to django-dynamic-scraper

Hi Rakan,

I know this was posted a long time ago, but I am in real need of help. I have an XPath that returns multiple URLs, but DDS is only taking the first one. Any idea how I can get around this?
