How to get text from an img alt attribute?


kareld...@gmail.com

Dec 11, 2015, 10:56:49 AM
to Web Scraper
Hi all,

With this great scraper tool, I am trying to extract the text stored in the img's alt attribute. The HTML looks as follows:

<div class="jum-promotion jum-mediumbadge"><img alt="THIS IS WHAT I NEED" src="http://www.blahblah.com/blah"></div>

I assume this has to be done with an Element Attribute selector, but I am not sure what to fill in where (selector, etc.). It is also a bit hard to test, because only a few products on the site have this extra text and it takes a long time before the scraper reaches them.

Thanks in advance for your suggestions.

Regards,
Karel

dsav

Dec 14, 2015, 7:31:56 AM
to Web Scraper
Hello!

Use a selector of type Element Attribute, and enter alt in the Attribute name field.

Here is an example sitemap that extracts the height, width, and src attributes of the Google logo image (load it via Create new sitemap / Import sitemap):
{"startUrl":"https://groups.google.com/forum/#!topic/web-scraper/fq18Dp2vjGM","selectors":[{"parentSelectors":["_root"],"type":"SelectorElementAttribute","multiple":false,"id":"image_attribute_height","selector":"a.google-logo img","extractAttribute":"height","delay":""},{"parentSelectors":["_root"],"type":"SelectorElementAttribute","multiple":false,"id":"image_attribute_width","selector":"a.google-logo img","extractAttribute":"width","delay":""},{"parentSelectors":["_root"],"type":"SelectorElementAttribute","multiple":false,"id":"image_attribute_src","selector":"a.google-logo img","extractAttribute":"src","delay":""}],"_id":"webscraper_helper"}
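Applied to the markup in the question, the same format might look roughly like the sketch below: a small Python script that prints a sitemap you could paste into Import sitemap. The field names are copied from the Google-logo sitemap above; the start URL and both ids are hypothetical placeholders, and the selector string div.jum-promotion.jum-mediumbadge img (the img inside the badge div) is an assumption based on the HTML in the question.

```python
import json

# Hypothetical placeholder: replace with the product page you are scraping.
START_URL = "https://www.example.com/products"

# One Element Attribute selector, using the same field names as the
# Google-logo example. The ids "promotion_alt" and "promotion_alt_text"
# are made up for illustration.
sitemap = {
    "startUrl": START_URL,
    "selectors": [
        {
            "parentSelectors": ["_root"],
            "type": "SelectorElementAttribute",
            "multiple": False,
            "id": "promotion_alt",
            # CSS selector: the <img> nested inside the badge div
            "selector": "div.jum-promotion.jum-mediumbadge img",
            # Attribute whose value is extracted
            "extractAttribute": "alt",
            "delay": "",
        }
    ],
    "_id": "promotion_alt_text",
}

if __name__ == "__main__":
    # Prints compact JSON (False becomes false) for Import sitemap.
    print(json.dumps(sitemap))
```

If one page lists several products with badges, setting "multiple" to true is probably what you want so that each badge yields its own row; treat that as a guess to verify in the extension.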

On Friday, December 11, 2015 at 18:56:49 UTC+3, kareld...@gmail.com wrote:

kareld...@gmail.com

Dec 14, 2015, 1:55:34 PM
to Web Scraper
Thanks! This helped me find the right settings. I ended up using:

selector: div.jum-promotion.jum-mediumbadge
attribute name: img.alt

(although I also remember seeing img at the end of the selector field...)

Karel


On Monday, December 14, 2015 at 13:31:56 UTC+1, dsav wrote:

kareld...@gmail.com

Dec 15, 2015, 2:10:25 AM
to Web Scraper
I double-checked: the selector indeed has img at the end (no spaces or periods) and the attribute name only has alt. Just in case someone wondered!
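Because the badge only appears on a few products, one way to check the selector without waiting for a full scrape is to run the same logic against a saved copy of the HTML. The Python sketch below uses only the standard-library html.parser to approximate what an Element Attribute selector with div.jum-promotion.jum-mediumbadge img and attribute alt would return: the alt text of every img inside a div carrying both classes. It is a rough offline stand-in, not how Web Scraper works internally, and AltTextCollector and extract_alts are hypothetical names.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect the alt of <img> tags nested inside a div that has both the
    jum-promotion and jum-mediumbadge classes (approximating the CSS
    selector 'div.jum-promotion.jum-mediumbadge img')."""

    REQUIRED = {"jum-promotion", "jum-mediumbadge"}

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one bool per open <div>: did it match?
        self.badge_depth = 0  # number of open matching divs
        self.alts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = set((attrs.get("class") or "").split())
            matched = self.REQUIRED <= classes
            self.div_stack.append(matched)
            if matched:
                self.badge_depth += 1
        elif tag == "img" and self.badge_depth > 0:
            alt = attrs.get("alt")
            if alt is not None:
                self.alts.append(alt)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            if self.div_stack.pop():
                self.badge_depth -= 1


def extract_alts(html):
    """Return the list of alt texts found inside matching badge divs."""
    parser = AltTextCollector()
    parser.feed(html)
    parser.close()
    return parser.alts


if __name__ == "__main__":
    sample = ('<div class="jum-promotion jum-mediumbadge">'
              '<img alt="THIS IS WHAT I NEED" src="http://www.blahblah.com/blah"></div>')
    print(extract_alts(sample))  # ['THIS IS WHAT I NEED']
```

Saving one product page that has the badge and feeding it to extract_alts should show quickly whether the selector and attribute name pick up the text.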

Karel


On Monday, December 14, 2015 at 19:55:34 UTC+1, kareld...@gmail.com wrote: