From a quick look at that example, there's a direct analog for everything they did. There are a lot of ways to do document extraction, and how easy it is and how well it works depend largely on how many assumptions you can make. For example, if you can assume the background is roughly one uniform color, you could estimate that color and then use a color histogram to filter the background out. I'm actually surprised the example worked as well as it did, considering how noisy the line extraction step was.
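To make the background-filtering idea concrete, here is a rough sketch in plain NumPy. It is not what the linked example does; it just assumes the image is an `H x W x 3` `uint8` array and that the background is the single most common color. The function names, the `bins` value, and the `tolerance` threshold are all made up for illustration and would need tuning on real photos.

```python
import numpy as np


def estimate_background_color(image, bins=8):
    """Guess the background as the centre of the fullest coarse colour bin.

    Assumes `image` is H x W x 3 uint8 and that `bins` divides 256 evenly.
    """
    step = 256 // bins
    # Quantize each channel to `bins` levels so similar shades share a bin.
    quantized = (image.reshape(-1, 3) // step).astype(np.int64)
    # Fold the three per-channel indices into one histogram index.
    flat = (quantized[:, 0] * bins + quantized[:, 1]) * bins + quantized[:, 2]
    counts = np.bincount(flat, minlength=bins ** 3)
    mode = int(np.argmax(counts))
    # Unfold the winning index back into per-channel bin indices.
    c0, rem = divmod(mode, bins * bins)
    c1, c2 = divmod(rem, bins)
    return np.array([c0, c1, c2]) * step + step // 2


def foreground_mask(image, tolerance=40.0):
    """Boolean mask that is True where a pixel is far from the background colour."""
    background = estimate_background_color(image)
    distance = np.linalg.norm(image.astype(np.float64) - background, axis=-1)
    return distance > tolerance
```

Something like `mask = foreground_mask(img)` then gives you a rough document mask to clean up (for instance with morphological operations) before looking for the page outline. This only holds up when the uniform-background assumption does; a textured desk or strong shadows will break it.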