Cannot scan a particular site - every URL throws timeout / no content


S Wilkinson

Mar 21, 2022, 11:12:08 PM
to Abot Web Crawler
Hi, I'm using Abot to crawl various sites and the vast majority work. For one site, however, every request throws a cancellation/timeout error, so I get no content. I have logging turned up to Debug and still have no clue why this happens.

Things I've tried so far, all with no luck on this one site: changing my user agent string (a typical Firefox browser string, my own string, and a Googlebot string), 25 second and 45 second timeouts, more threads, fewer threads, and the service point connection limits. Hitting the site in a browser, or even in the VS Code Thunder Client extension (a Postman clone), returns a nice 200 result with content.

These exceptions happen deep in the crawler. How can I detect them and/or get more info to debug this? Worst case, I could at least log a warning that the site isn't parsing. I can DM a more detailed log or my setup, but I'm confident the setup is OK since it works for 80+ other sites. Thanks for any help :)


[20:52:59 DBG] Error occurred requesting url [] {"TargetSite": null, "StackTrace": null, "Message": "Request timeout occurred", "Data": [], "InnerException": {"Task": null, "CancellationToken": {"IsCancellationRequested": true, "CanBeCanceled": true, "WaitHandle": "The property accessor threw an exception: ObjectDisposedException", "$type": "CancellationToken"}, "TargetSite": "Void MoveNext()", "StackTrace": "   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)\r\n   at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)\r\n   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)\r\n   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)\r\n   at Abot2.Core.PageRequester.MakeRequestAsync(Uri uri, Func`2 shouldDownloadContent)", "Message": "The operation was canceled.", "Data": [], "InnerException": {"TargetSite": "Void ThrowException(System.Net.Sockets.SocketError, System.Threading.CancellationToken)", "StackTrace": "   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)\r\n   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)\r\n   at System.Net.Security.SslStream.<FillBufferAsync>g__InternalFillBufferAsync|215_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)\r\n   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)\r\n   at System.Net.Http.HttpConnection.FillAsync()\r\n   at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed)\r\n   at 
System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)", "Message": "Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..", "Data": [], "InnerException": {"Message": "The I/O operation has been aborted because of either a thread exit or an application request.", "SocketErrorCode": "OperationAborted", "ErrorCode": 995, "NativeErrorCode": 995, "TargetSite": null, "StackTrace": null, "Data": [], "InnerException": null, "HelpLink": null, "Source": null, "HResult": -2147467259, "$type": "SocketException"}, "HelpLink": null, "Source": "System.Net.Sockets", "HResult": -2146232800, "$type": "IOException"}, "HelpLink": null, "Source": "System.Net.Http", "HResult": -2146233029, "$type": "TaskCanceledException"}, "HelpLink": null, "Source": null, "HResult": -2146233029, "$type": "HttpRequestException"}
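
The nested exception in that log line is easier to read once flattened. Below is a minimal Python sketch, not part of Abot, that walks the `InnerException` chain of the structured payload from outermost to innermost. The `exception_chain` helper and the `logged` dictionary are hypothetical names for illustration, and `logged` is an abbreviated hand copy of the payload above that keeps only the `$type`, `Message`, and `InnerException` fields.

```python
def exception_chain(exc):
    """Return [(type, message), ...] from the outermost to the innermost exception."""
    chain = []
    while exc is not None:
        chain.append((exc.get("$type"), exc.get("Message")))
        exc = exc.get("InnerException")
    return chain


# Abbreviated copy of the logged payload (assumption: all other fields dropped).
logged = {
    "$type": "HttpRequestException",
    "Message": "Request timeout occurred",
    "InnerException": {
        "$type": "TaskCanceledException",
        "Message": "The operation was canceled.",
        "InnerException": {
            "$type": "IOException",
            "Message": "Unable to read data from the transport connection: "
                       "The I/O operation has been aborted because of either "
                       "a thread exit or an application request..",
            "InnerException": {
                "$type": "SocketException",
                "Message": "The I/O operation has been aborted because of "
                           "either a thread exit or an application request.",
                "InnerException": None,
            },
        },
    },
}

# Print one line per level, indented by depth.
for depth, (exc_type, message) in enumerate(exception_chain(logged)):
    print("  " * depth + f"{exc_type}: {message}")
```

Read this way, the chain bottoms out in a `SocketException` for an aborted I/O operation, and the stack frames in the log (`SslStream`, then `ReadNextResponseHeaderLineAsync`) suggest the request was cancelled while waiting for response headers over TLS. That is only a reading of the trace, not a confirmed diagnosis.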