I recently purchased the book "Scraping for Journalists" by Paul Bradshaw, the second edition. I was following the chapter on using OpenRefine to extract data and I came into an error. I've attached pictures with my results following along and included an image from the book of what I should see compared to what I actually get.
Summary: The first thing the book had us to was create a google spreadsheet of a series of URLs and then we created the project in OpenRefine by importing that file. We basically put together two values to create a single URL, and those urls did go to working pages. For instance:
http://www.nhs.uk/Services/hospitals/Overview/DefaultView.aspx?id=2205 is the first URL in the series and it loads fine. The next step was to edit the new url column using the "add column by fetching URLs" section. The book instructs to name the column (in this case "HTML") but leave the expression box as just "value" (which is the default setting). We are supposed to click OK, and the result should be a new column of the raw HTML of each link. Instead, I get a column of blank cells. I searched some online discussions and other tutorials to see if there was any answer, but I didn't find anything. It either wasn't the same situation, or I wasn't understanding it properly (fairly new to coding and all of this). I decided to try the next step, which was edit the new HTML column with "add column based on this column" (the same process we used to make the column of URLs in the first place). The image "What the book says I should have.jpg" shows I should already see columns filled with the html code already there and the formula is meant to narrow that down to what we're looking for in the book's example. However, what I have is a column of still empty cells.
What am I doing wrong?