Hi,
The purpose of a captcha is to keep out automated scraping.
By defeating that purpose, we trigger a nuclear arms race which will end in one of these outcomes:
- The institution will (at great cost) find a way to make scraping effectively impossible.
- The institution will shut down its public information portal and tell people to use offline methods instead.
- The institution will push the government to hunt down and prosecute those who scrape it, and will "make an example" by destroying some people's lives.
Just sharing the logical conclusions for your kind consideration.
In one scraping activity in the past I followed a hybrid approach: through automated keystrokes, I could automate everything up to the captcha stage, then type in the captcha manually, then start off another set of automated keystrokes. It sped up the process by a lot.
I was using "AutoHotKey" on Windows years ago; there have been a lot of innovations since then in this kind of automation. This approach is also subject to its own arms race of course, but there is less chance of it going nuclear, because with this you're not crashing the institution's servers.
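
For anyone who wants to try the hybrid approach today, here is a rough sketch of the idea in Python. It is not what I ran back then (that was an AutoHotKey script), and every name in it (run_hybrid, before_captcha, after_captcha, ask_human) is made up for illustration. The two scripted steps are left as plain callbacks so you can plug in whatever keystroke or browser automation tool you prefer; only the captcha itself waits for a person.

```python
from typing import Callable, Iterable, List


def run_hybrid(
    records: Iterable[str],
    before_captcha: Callable[[str], None],
    after_captcha: Callable[[str, str], str],
    ask_human: Callable[[str], str] = input,
) -> List[str]:
    """Automate both sides of the captcha; leave the captcha itself to a person.

    All parameter names here are hypothetical, chosen for this sketch only.
    """
    results = []
    for record in records:
        # Scripted part 1: e.g. open the form and fill in the fields.
        before_captcha(record)

        # Manual part: a human reads the captcha and types it in.
        answer = ask_human(f"Type the captcha shown for {record!r}: ").strip()
        if not answer:
            # An empty answer means the operator wants to skip this record.
            continue

        # Scripted part 2: e.g. submit the form and save the result.
        results.append(after_captcha(record, answer))
    return results
```

Since a person has to answer every captcha, the request rate stays at human speed, which is the reason this is gentler on the institution's servers.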
Regards
Nikhil VJ