scraping book information on Amazon from ISBN list

Chris Brien

May 8, 2023, 1:46:11 AM

to Web Scraping

Hi. I'm trying to scrape Lexile data for the books in our school's library to add to the database. I have a list of ISBNs, but I can't figure out how to set up ParseHub to look at more than one book's page at a time. Is there a way to input the ISBN list into ParseHub and get the Lexile data for each book in the list? Any help would be greatly appreciated.

Andrew11

May 8, 2023, 2:02:36 AM

Sure. In Project Settings (gear icon at top left), set # workers to 1, and in Start Value put something like {"ISBNs": ["234432", "4315231423X", ...]}. In the first template, add a Loop command for each item in ISBNs. Inside the Loop, put an Extract named ISBN with the word item as its value, followed by a Go To Template that points to https://hub.lexile.com/find-a-book/search and opens a 2nd template with Ignore duplicates unchecked (use the 3 dots icon inside the template tab). In the 2nd template, put a Select CSS with Wait for and rooted selection checked, and this selector:

[name=quickSearch]

In the Select, put an Input command, make sure Expression is chosen in the Input options, and use the word ISBN as its value. Let me know if you run into any problems...
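
For a long ISBN list, typing the Start Value by hand gets tedious. Here is a minimal Python sketch that builds the same {"ISBNs": [...]} shape from a list of strings. The function name build_start_value and the hyphen stripping are my own assumptions for illustration, not something ParseHub provides or requires.

```python
import json


def build_start_value(isbns):
    """Return a JSON string shaped like the Start Value example: {"ISBNs": [...]}."""
    cleaned = []
    for raw in isbns:
        # Keep each ISBN as a string so a leading zero or a trailing "X" survives.
        # Removing hyphens and spaces is an assumption about what the search box accepts.
        isbn = raw.replace("-", "").replace(" ", "").upper()
        if isbn:  # skip blank entries
            cleaned.append(isbn)
    return json.dumps({"ISBNs": cleaned})


if __name__ == "__main__":
    # Paste the printed line into the Start Value box in Project Settings.
    print(build_start_value(["978-0-385-37905-2", "9780316201544"]))
```

The printed line can be pasted into the Start Value box as-is.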

Chris Brien

May 8, 2023, 4:10:44 AM

Thanks, Andrew. I'd originally planned to scrape this data from amazon.com. It seems you are suggesting lexile.com instead, which is a good idea. That site has all the data I need. In that case, do I need to create a new project on https://hub.lexile.com/find-a-book/search and then follow your steps?

Andrew11

May 8, 2023, 1:50:17 PM

I guess so... you can start a project on about:blank and build the main template loop from there. Lexile is probably more reliable to scrape than Amazon anyway, which doesn't like being scraped very much. If the search box is still there on the page that loads after a search, you might save some page loads by restructuring the steps a bit.

Chris Brien

May 8, 2023, 8:56:07 PM

Ok. When adding the Go To Template, I'm getting an "unexpected end of input" error on the URL. I've attached a screenshot.

Andrew11

May 8, 2023, 10:16:00 PM

Sorry about that -- the URL needs double quotes around it. That's because you can also Extract a variable and put its name in the Go to URL box, and in that case the variable name goes in without quotes.

Chris Brien

May 8, 2023, 11:13:17 PM

No problem. I appreciate all the help as I work through the learning curve. This is clearly a powerful and intuitive tool. I just need to get better at using it.

In the last part of your original instructions, you wrote, "In the 2nd template put a Select CSS...." From what I've read, a Select CSS is made by editing an existing Select or Relative Select command (https://help.parsehub.com/hc/en-us/articles/226142548-Using-CSS-to-select-elements). Which element should I select first? The search page does not have the Lexile score anywhere on it. To see that, you have to be on the page for a particular book, but I don't know how to tell ParseHub to select from that result page. I also ran into this problem when following the tutorial on entering a list of keywords into a search box (https://help.parsehub.com/hc/en-us/articles/217736068-Enter-a-list-of-keywords-into-a-search-box). This is what I have so far:

Andrew11

May 8, 2023, 11:37:29 PM

It doesn't really matter which element you select with your mouse before switching to CSS mode; it'll be discarded anyway. If a download icon appears on the Select after an element is clicked, I like to click it so that it expands whatever's there, because doing that once seemed to work around a bug. Anyway, once you have the Input inside the quickSearch CSS Select, highlight that CSS Select, press CTRL-C to copy, then highlight the Select page and press CTRL-V to paste. In the new Select command, remove the Input and change the CSS to:

.search[role=button]

Inside that Select, put a Click command and have it open a new page and go to a new template. That template should open on the book's own page, the same one you see when you enter the ISBN 9780385379052 in your web browser and click the magnifying glass.

Chris Brien

May 9, 2023, 12:40:49 AM

Thanks again, Andrew. I'm getting closer. On a Test Run, it now successfully cycles through each book's detail page, but I'm still not able to extract the Lexile from the detail page. I tried manually navigating to a detail page (https://hub.lexile.com/find-a-book/book-details/9780316201544) while on the Book Detail template and selecting the Lexile, but it didn't extract and doesn't show in the CSV. How do I tell ParseHub to extract the Lexile from the detail page?
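
The detail page address quoted above ends in the book's ISBN. If that holds for every book (an assumption based on this one URL only, not something confirmed in this thread), the detail addresses could be built straight from the ISBN list, for example to spot-check a few books by hand in a browser. A minimal Python sketch, where DETAIL_BASE and detail_url are hypothetical names chosen for illustration:

```python
# Base path copied from the one detail URL quoted in this thread.
# Assumption: other books follow the same pattern.
DETAIL_BASE = "https://hub.lexile.com/find-a-book/book-details/"


def detail_url(isbn):
    """Return the guessed detail-page URL for an ISBN given as a string."""
    # Strip hyphens and surrounding whitespace before appending to the base path.
    return DETAIL_BASE + isbn.replace("-", "").strip()


if __name__ == "__main__":
    for isbn in ["9780316201544", "9780385379052"]:
        print(detail_url(isbn))
```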

Andrew11

May 9, 2023, 9:15:48 AM

Are you just scraping the 760L part? For that one, this CSS might work:

.header-info span:first
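
Once a value like 760L lands in the CSV, it may be handy to split it into a number before loading it into the library database. This is a minimal Python sketch and not part of ParseHub. The function name parse_lexile is hypothetical, and the pattern assumes the scraped text is an optional two-letter code, then digits, then the letter L.

```python
import re

# Assumed shape of the scraped text: optional two capital letters, digits, then "L".
_MEASURE = re.compile(r"^\s*([A-Z]{2})?(\d+)L\s*$")


def parse_lexile(text):
    """Return (prefix, number) for text like '760L', or None if it does not match."""
    match = _MEASURE.match(text)
    if match is None:
        return None
    prefix, number = match.groups()
    # An absent prefix is returned as an empty string.
    return (prefix or "", int(number))


if __name__ == "__main__":
    print(parse_lexile("760L"))
```

Anything that does not fit the assumed shape comes back as None, so unexpected values can be reviewed by hand instead of being silently converted.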

Chris Brien

May 10, 2023, 4:29:11 AM

I was hoping to also scrape other data, such as age range and categories, but I'd be happy if I could get it to work for the 760L part. Where would I put that CSS?

Melissa Edwards

May 10, 2023, 8:34:09 AM

Hi Chris,

For this particular website, you can use the ParseHub selector to select the elements you want to scrape. There is no need to use CSS.

Andrew11

May 10, 2023, 9:18:38 AM

That might not work -- the site's class attributes contain strings of words made of random letters. Just put the CSS Select in the template that opens when you click on the magnifying glass. You might also want to add a Begin new entry command in the 1st template, with the Go To Template for the 2nd template inside it. The Extract ISBN should go inside the Begin new entry as well.
