Operation of Cookies


ghislain borremans

Jun 20, 2021, 3:19:20 AM
to Abot Web Crawler
When crawling, I notice a cookie request on every page. With "IsSendingCookiesEnabled = true", I thought this would disappear.
However, since the cookies are never confirmed (this is normally done by the user in a browser window, not in a headless crawl), how can the crawler save the cookies in the first place?
Can I access the site with Firefox first, save the cookies, and then let the site be crawled so that the cookie question no longer appears?
Is the user agent important for the recognition of the cookies?


Jun 23, 2021, 8:10:21 PM
to ghislain borremans, Abot Web Crawler
Can you give me a more concrete example or, even better, a very easy case to reproduce, please? For example:

I'm trying to crawl http://blah.com and...
Through a browser I see this...
AbotX is only showing me this...



Jul 1, 2021, 7:29:48 PM
to ghislain borremans, Abot Web Crawler

IsSendingCookiesEnabled = true tells Abot/AbotX to resend any cookies that are returned in the HTTP response from a crawled page, specifically those in the Set-Cookie header. The banner that you are talking about is shown until you click it (which then sets another cookie, I suspect). If you would like to not see the banner on the first request, you would need to find out which cookie gets set when you click the button (in a regular browser) and add it manually by overriding the PageRequester.BuildHttpClientHandler method (adding the cookie to the CookieContainer). I don't see a way to do this automatically (without digging deep into it).
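
A rough sketch of that override might look like the following. It assumes Abot 2.x, where PageRequester is believed to expose a protected virtual BuildHttpClientHandler(Uri rootUri) and a (CrawlConfiguration, IWebContentExtractor) constructor; check both against the version you have installed. The class name ConsentCookiePageRequester and the cookie name/value "cookie_consent" / "accepted" are made up for illustration. The real name and value have to be read from the browser's developer tools after clicking the banner.

```csharp
using System;
using System.Net;
using System.Net.Http;
using Abot2.Core;
using Abot2.Poco;

// Hypothetical subclass: pre-seeds the consent cookie so that the first
// request already carries it. Signatures are assumed from Abot 2.x.
public class ConsentCookiePageRequester : PageRequester
{
    public ConsentCookiePageRequester(CrawlConfiguration config, IWebContentExtractor contentExtractor)
        : base(config, contentExtractor)
    {
    }

    protected override HttpClientHandler BuildHttpClientHandler(Uri rootUri)
    {
        // Let Abot build its usual handler (redirects, decompression, cookies...).
        var handler = base.BuildHttpClientHandler(rootUri);

        // Make sure the handler actually sends cookies.
        handler.UseCookies = true;

        // Placeholder name and value: replace with the cookie the site sets
        // when the banner is accepted in a regular browser.
        var consentCookie = new Cookie("cookie_consent", "accepted");

        // CookieContainer.Add(Uri, Cookie) scopes the cookie to the root URI's host.
        handler.CookieContainer.Add(rootUri, consentCookie);

        return handler;
    }
}
```

The custom requester would then be handed to the crawler in place of the default one, through whichever crawler constructor accepts an IPageRequester. Keep IsSendingCookiesEnabled = true so that cookies returned by the site are still resent on later requests.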

Hope that helps
