In the event that anyone else has a similar issue, this is how I approached it.
Firstly, make a histogram of the number of pixels with each intensity (so an array of 256 numbers).
When you inspect this you get results like the below.
This is after a little smoothing and taking the log of the values.
You can see that the properly blank pages show few or no very dark (black) pixels, whereas the pages with some text, even a small amount, have a fair number.
I simply set a cutoff level (in this case 1) and a cutoff intensity (in my case 80): if the log of the smoothed histogram first reaches 1 at an intensity below 80, the page has text; otherwise it is blank.
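For anyone who wants to try this, here is a rough sketch of the test in Python with NumPy. It is not my original code. The function name `is_blank_page`, the 5-bin moving-average smoothing, the use of log10, and the +1 added before the log (to avoid log(0)) are all assumptions made for illustration; only the cutoff level of 1 and cutoff intensity of 80 come from the description above, and they will need tuning for other scans.

```python
import numpy as np

def is_blank_page(gray, cutoff_level=1.0, cutoff_intensity=80, smooth_width=5):
    """Return True if a greyscale page (2-D array of 0..255 values) looks blank.

    Assumptions (not from the original post): moving-average smoothing over
    `smooth_width` bins, log10, and +1 before the log to avoid log(0).
    """
    # Histogram of pixel counts per intensity: an array of 256 numbers.
    hist = np.bincount(np.asarray(gray).ravel(), minlength=256).astype(float)

    # A little smoothing, then the log of the values.
    kernel = np.ones(smooth_width) / smooth_width
    smoothed = np.convolve(hist, kernel, mode="same")
    log_hist = np.log10(smoothed + 1.0)

    # Lowest intensity at which the log-smoothed histogram reaches the cutoff level.
    reached = np.nonzero(log_hist >= cutoff_level)[0]
    if reached.size == 0:
        return True  # the histogram never reaches the cutoff level at all

    # Dark pixels present below the cutoff intensity means text; otherwise blank.
    return int(reached[0]) >= cutoff_intensity
```

To use it, load the page as an 8-bit greyscale array (with whatever image library you prefer) and call `is_blank_page(page)`.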
You can also see the problem that Tesseract has (with its default binarisation): the intensity distribution is distinctly bimodal. I think this is due to bleed-through from the reverse of the page. Of course, that bimodality is essentially what Otsu's method uses to pick out 'black' from 'white'.
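To illustrate why a bimodal histogram matters here, below is a minimal textbook-style sketch of Otsu's threshold computed from a 256-bin histogram. It is not Tesseract's actual code, and the function name `otsu_threshold` is made up for this example. It picks the intensity that maximises the between-class variance, so two humps (for example paper and bleed-through) get split even when neither of them is real text.

```python
import numpy as np

def otsu_threshold(hist):
    """Return the level t that best splits hist into classes <= t and > t."""
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(hist.size)

    w0 = np.cumsum(hist)              # pixel count at or below each level
    w1 = hist.sum() - w0              # pixel count above each level
    sum0 = np.cumsum(hist * levels)   # intensity sum at or below each level
    sum_all = sum0[-1]

    # Between-class variance (up to a constant factor), only where both
    # classes are non-empty.
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(hist.size)
    mu0 = sum0[valid] / w0[valid]
    mu1 = (sum_all - sum0[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2

    return int(np.argmax(between))
```

On a page whose histogram has two clear humps, the returned threshold falls between them, which is exactly what goes wrong when the darker hump is bleed-through rather than ink.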
Iain