Error I'm getting using the Cheerio module. Is there a workaround?

Marc Juretus

Jun 3, 2014, 8:10:01 AM
to nod...@googlegroups.com


First off, I love the module; it works great for the most part. I have one question: one of the sites I'm trying to read some data from gives me this error.

"Anonymous Browser error #80040156"

Have you seen this before and is there a workaround for it? 

Thanks in advance


MaRc

Jake Wolpert

Jun 3, 2014, 12:13:18 PM
to nod...@googlegroups.com
Cheerio just uses what you pass it. Are you sure the page that you passed to cheerio doesn't have that text in it?

Googling for the error, it seems to come up on porn sites that were scraped by other "porn collector" sites. I'll guess that the producers of the original content don't want their content scraped. They're checking headers!
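
A quick way to test this, as a rough sketch: look for the error string in the raw HTML before it ever reaches cheerio. The helper name `containsBlockMessage` and the two sample pages are made up for illustration; only the error text comes from the original post.

```javascript
// Hypothetical helper: returns true if the fetched HTML already contains
// the error message. If it does, the server sent that text and cheerio is
// only parsing what it was given.
function containsBlockMessage(html) {
  return html.indexOf('Anonymous Browser error #80040156') !== -1;
}

// Made-up sample bodies, standing in for what the site might return.
var blockedPage = '<html><body>Anonymous Browser error #80040156</body></html>';
var normalPage = '<html><body><h1>Hello</h1></body></html>';

console.log(containsBlockMessage(blockedPage)); // true
console.log(containsBlockMessage(normalPage));  // false
```

If the check returns true for the body you downloaded, the problem is in the request (most likely its headers), not in cheerio.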


Warren Lindsey

Jun 3, 2014, 12:49:24 PM
to nod...@googlegroups.com
Set your own User-Agent header. If you're using the "request" module it's very easy. Some sites also check the Referer header to ensure you're not deep linking.

Marc Juretus

Jun 3, 2014, 8:37:37 PM
to nod...@googlegroups.com
Do you have a link to an example of how to set the User-Agent header?

Jake Wolpert

Jun 3, 2014, 10:42:46 PM
to nod...@googlegroups.com
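var http = require('http');

// Request options; the 'user-agent' header is what the server checks.
var options = {
    host: 'www.someserver.com',
    headers: {'user-agent': 'Mozilla/5.0'},
    path: '/test.cgi'
};

http.get(options, function(res) {
    // handle the response here, e.g. collect the body and pass it to cheerio
});

This passes the simple user agent, Mozilla/5.0. You can choose any other from the list of user agent strings at UserAgentString.com.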
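
Since a Referer check was also mentioned earlier in the thread, here is a minimal sketch that sets both headers and collects the response body, using only Node's built-in `http` module. The helper names `buildOptions` and `fetchPage` are hypothetical, the host is the same placeholder as above, and the Referer value is just an assumed example.

```javascript
var http = require('http');

// Hypothetical helper: builds http.get options with browser-like headers.
function buildOptions(targetUrl, userAgent, referer) {
  var parsed = new URL(targetUrl);
  var headers = { 'user-agent': userAgent };
  if (referer) {
    headers['referer'] = referer; // HTTP spells this header "Referer"
  }
  return {
    host: parsed.hostname,
    port: parsed.port || 80, // empty string when the URL uses the default port
    path: parsed.pathname + parsed.search,
    headers: headers
  };
}

// Hypothetical helper: fetches the page and hands the full body to callback.
function fetchPage(options, callback) {
  http.get(options, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  }).on('error', callback);
}

var options = buildOptions(
  'http://www.someserver.com/test.cgi',
  'Mozilla/5.0',
  'http://www.someserver.com/' // assumed referer, for illustration only
);
console.log(options.headers);
```

To use it, call `fetchPage(options, function (err, body) { ... })` and pass `body` to `cheerio.load(body)` inside the callback.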

Jake
