Hi,
I am trying to crawl a secured site's content behind basic authentication. If I understand correctly, I can either use Selenium with a custom navigation filter to log in first, or perform a curl POST on the login URL to generate a cookie and transfer it to outlinks.
I am trying the second option. I wrote a simple bat/sh file that generates cookie.txt, and I have the following config to set metadata.transfer. My question is: how do I read the cookie from this file and pass it along? Do I have to pass it as the value for the set-cookie key? If so, where do I do that? Should I create my own custom Protocol, or can I just run the curl command and then start the crawl?
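For reference, here is a minimal sketch of what such a script could look like. The login URL, form field names, and cookie name are all hypothetical placeholders; curl writes cookie.txt in its Netscape cookie-jar format, and the awk line joins the name/value columns into a single "Cookie" header string you could hand to the crawler:

```shell
#!/bin/sh
# Hypothetical login endpoint and form fields -- replace with your site's values:
# curl -s -c cookie.txt -d "username=user" -d "password=pass" "https://example.com/login"

# For illustration, a cookie.txt as curl would write it (Netscape cookie-jar format).
# Columns: domain, include-subdomains flag, path, secure, expiry, name, value.
cat > cookie.txt <<'EOF'
# Netscape HTTP Cookie File
example.com FALSE / TRUE 0 JSESSIONID abc123
EOF

# Skip comment lines and join each name=value pair into one Cookie header value.
COOKIE=$(awk '!/^#/ && NF >= 7 { printf "%s%s=%s", sep, $6, $7; sep="; " }' cookie.txt)
echo "$COOKIE"
```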
I am new to StormCrawler and I really like the power of the crawler combined with Elasticsearch. Any help is appreciated.
metadata.transfer:
- set-cookie
http.use.cookies: true
http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
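One way to feed the cookie in, assuming the seeds are injected from a file (this is a sketch of my understanding, not confirmed): StormCrawler seed files accept tab-separated key=value metadata after each URL, so the cookie value extracted by the script could be attached as set-cookie metadata on the seed. With http.use.cookies: true the okhttp protocol should then send it, and metadata.transfer would propagate it to outlinks. A hypothetical seed line (tab-separated, with a made-up cookie value):

https://example.com/secured/page	set-cookie=JSESSIONID=abc123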
thanks
ravi