getting binary info. How do I get the html content?
Also, I'm trying to use the advanced options for the 30-day demo, but I'm getting an error that I don't have a license. How do I run the demo without the license?
Thanks
David
sjdirect
Jan 1, 2021, 4:07:18 PM
to Abot Web Crawler
Hi,
Do you have a simple code snippet or unit test that demonstrates your issue? Also, can you verify whether you get the same issue when you put another website URL into the same code? I suspect Amazon is blocking your requests, since crawling their pages is against their Usage Policy. You are likely being blocked by a WAF. As for the advanced features, you likely just need to make sure your license file ends up in the correct location and is available during execution. What you are describing is almost certainly a side effect of this.
David Hansen
Jan 1, 2021, 4:28:59 PM
to Abot Web Crawler
Hi, I pulled Abot2 (2.0.67) and AbotX (2.1.8) from NuGet. I copied your examples, just plugged in the amazon.com URL above, and was getting binary content. I copied your code exactly, adding only www.amazon.com as the URL.
As for the trial license, where do I find it? I searched everywhere and can't locate it.
Thanks
David
David Hansen
Jan 1, 2021, 5:01:57 PM
to Abot Web Crawler
One other update: when I use WebClient.DownloadString on the same URL, it retrieves the page content as text (not binary). Does this have anything to do with page encoding?
Thanks
David
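For context on the encoding question above: an HTTP response body always arrives as bytes, and it only becomes readable text once any compression is undone and the bytes are decoded with the page's charset. Whether that is what is happening with Abot in this case is not established in this thread. The sketch below only illustrates the general distinction, in Python and independent of Abot; the HTML string and the helper names `is_gzip` and `body_to_text` are made up for illustration.

```python
import gzip

# Hypothetical page body, used only for illustration.
html = "<html><body><h1>Café</h1></body></html>"


def is_gzip(raw: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number 0x1f 0x8b."""
    return raw[:2] == b"\x1f\x8b"


def body_to_text(raw: bytes, content_encoding: str = "", charset: str = "utf-8") -> str:
    """Turn raw response bytes into text: undo compression first, then decode."""
    if content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# The same page as it might travel over the wire, uncompressed and compressed.
plain_body = html.encode("utf-8")
gzipped_body = gzip.compress(plain_body)

print(is_gzip(plain_body))    # uncompressed bytes decode directly to text
print(is_gzip(gzipped_body))  # compressed bytes look like "binary" until decompressed
print(body_to_text(gzipped_body, content_encoding="gzip") == html)
```

If one client returns readable text and another returns bytes that look binary for the same URL, comparing the response's Content-Encoding and Content-Type headers in both cases is one way to narrow down where the difference comes from.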
sjdi...@gmail.com
Jan 2, 2021, 11:18:48 AM
to David Hansen, Abot Web Crawler
Hi David,
What example code are you using specifically (give me a link to the page or send a snippet)? I can't make assumptions here, as there could be 1000 variations. If you use the same code but change the URL to a few other sites, does it have the same issue? Abot/AbotX doesn't use WebClient internally, so that test isn't helpful in pointing out the problem. The trial license is sent through email after signing up here.
to David Hansen, Abot Web Crawler
Hi David,
Glad you were able to work through most of the issues. I'm not having any issues with the URL for the Amazon shredder. When you say "binary data", are you looking somewhere other than args.CrawledPage.ContentText shown below? Maybe you haven't set a JavascriptRenderingWaitTimeInMilliseconds? Here is a unit test that demonstrates it; your phantomjs exe might be slightly different.