What to Do about AppSpot App that is stealing copyrighted material?


Norm Deplume

Apr 13, 2012, 11:18:05 AM
to Google App Engine
This is especially frustrating. Someone created a proxy server on
Google's App Engine platform here: wapfree-ec.appspot.com

Unfortunately, they did not configure this proxy server correctly with
<META ROBOTS> tags to inform search engines NOT TO INDEX the proxied
content.

As a result, this site is unintentionally STEALING content and getting
it indexed in Google Search. Not just my site either...93,000 pages
of other people's content.
Go to Google and search for site:wapfree-ec.appspot.com and you'll see
thousands of pages of OTHER PEOPLE'S CONTENT. ON A GOOGLE HOSTED
SITE!

What's frustrating is that I tried to solve this "the right way", by
filing a DMCA removal request. Unbelievably, the response from
Google was this:

"The DMCA notice sent below is for an application that is a web proxy.
The content in question is not stored on the application, but instead is
simply being pulled from the original source and forwarded to the
user."

That's completely stupid. How is the manner in which the content was
stolen, published, and INDEXED by Google search of any consequence?
Am I allowed to create a web proxy that copies content from anyone I
want, and get it all indexed in Google?

Tell me there's someone at Google that understands this problem!

Brandon Wirtz

Apr 17, 2012, 9:29:07 PM
to google-a...@googlegroups.com
You can block URL Fetches from GAE by checking the request headers which
will identify as both GAE and the Application ID that is making the request.


Jeff Schnitzer

Apr 17, 2012, 9:39:12 PM
to google-a...@googlegroups.com
On Fri, Apr 13, 2012 at 11:18 AM, Norm Deplume <kerry....@gmail.com> wrote:
>
> Tell me there's someone at Google that understands this problem!

In this case, you're the one that misunderstands the problem. Proxies
are part of the fabric of the internet, and aren't stealing your
content any more than the routers carrying the packets of this email
message.

It's unfortunate (or funny) that somehow the proxy ranks higher in
search engines than your site does, but you have the complete ability
to fix this yourself. You don't need Google's help.

The simple solution is to block requests from wapfree-ec. All
urlfetches from GAE include the appid in the User-Agent header.
Voilà, problem solved. If you want to be heavy-handed, you can block
all requests from GAE.
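
A minimal Python sketch of the User-Agent check described above. It assumes URL Fetch requests carry a User-Agent fragment of the form `AppEngine-Google; (+http://code.google.com/appengine; appid: <id>)`, possibly with a partition prefix such as `s~` before the ID; verify this against your own access logs first. The function names and the blocklist are hypothetical.

```python
import re

# Assumed shape of the fragment App Engine's URL Fetch adds to User-Agent.
# An optional one-letter partition prefix such as "s~" is skipped.
_APPID_RE = re.compile(
    r"AppEngine-Google;.*?appid:\s*(?:[a-z]~)?([a-z0-9-]+)", re.IGNORECASE
)

# Hypothetical blocklist; the app ID comes from the thread.
BLOCKED_APP_IDS = {"wapfree-ec"}


def gae_app_id(user_agent):
    """Return the App Engine app ID named in a User-Agent string, or None."""
    match = _APPID_RE.search(user_agent or "")
    return match.group(1).lower() if match else None


def should_block(user_agent, block_all_gae=False):
    """Decide whether a request should be refused based on its User-Agent."""
    ua = user_agent or ""
    # Heavy-handed option: refuse anything that identifies as App Engine.
    if block_all_gae and "appengine-google" in ua.lower():
        return True
    return gae_app_id(ua) in BLOCKED_APP_IDS
```

A request handler would call `should_block(request_user_agent)` and answer with a 403 when it returns True.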

If you want to be really clever about it, serve different content to
wapfree-ec. Like, say, a blank page with a big link pointing at your
website. You might improve your own SEO juice that way.
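
One way to sketch the "serve different content" idea is a small WSGI wrapper. This is only an illustration: `CANONICAL_SITE` and `stub_for_proxy` are hypothetical names, and the User-Agent match assumes the proxy's fetches contain both `AppEngine-Google` and the app ID.

```python
from html import escape

# Hypothetical address of the real site; replace with your own.
CANONICAL_SITE = "https://www.example.com/"


def stub_for_proxy(app, proxy_app_id="wapfree-ec"):
    """Wrap a WSGI app so the named proxy gets a one-link page instead."""

    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if "appengine-google" in ua and proxy_app_id in ua:
            # Nearly blank page whose only content is a link to the real site.
            link = escape(CANONICAL_SITE, quote=True)
            body = (
                '<html><body><a href="{0}">{0}</a></body></html>'.format(link)
            ).encode("utf-8")
            start_response(
                "200 OK",
                [
                    ("Content-Type", "text/html; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everyone else gets the normal site.
        return app(environ, start_response)

    return middleware
```

Wrapping the application once at startup (`application = stub_for_proxy(application)`) keeps the check out of individual handlers.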

Last of all, stop blaming other people for your mistakes.

Jeff

Jeff Schnitzer

Apr 17, 2012, 9:41:32 PM
to google-a...@googlegroups.com
Hey, who said you get to play good cop this time??

Jeff

On Tue, Apr 17, 2012 at 9:29 PM, Brandon Wirtz <dra...@digerat.com> wrote:
> You can block URL Fetches from GAE by checking the request headers which
> will identify as both GAE and the Application ID that is making the request.
>
> --
> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>

Brandon Wirtz

Apr 17, 2012, 10:35:21 PM
to google-a...@googlegroups.com

> Hey, who said you get to play good cop this time??

I always play the good cop. I beat the dumb people, they get smarter long
term. It's all about how long it takes to see that I'm right. ;-)

Oh you meant the "Nice" cop. Yeah well, I figure this guy is just passing
by. He won't be here in a week...


Joshua Smith

Apr 18, 2012, 10:10:33 AM
to google-a...@googlegroups.com
Another good idea would be to put canonical links at the top of each of your pages:

http://en.wikipedia.org/wiki/Canonical_link_element
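
For illustration, a canonical link element is a single `<link rel="canonical">` tag placed in the page's `<head>`, with an absolute URL. A minimal sketch of generating one in Python; the helper name and the example site root are assumptions, not anything from the thread:

```python
from html import escape
from urllib.parse import urljoin


def canonical_link(site_root, path):
    """Build a canonical link element with an absolute, HTML-escaped URL."""
    url = urljoin(site_root, path)
    return '<link rel="canonical" href="{}">'.format(escape(url, quote=True))
```

For example, `canonical_link("https://www.example.com/", "/articles/42")` yields a tag pointing at `https://www.example.com/articles/42`. An absolute URL matters here, because a relative one would resolve against whatever host serves the copy.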

Barry Hunter

Apr 18, 2012, 10:39:45 AM
to google-a...@googlegroups.com
On Wed, Apr 18, 2012 at 3:10 PM, Joshua Smith <Joshua...@charter.net> wrote:
> Another good idea would be to put canonical links at the top of each of your pages:
>
> http://en.wikipedia.org/wiki/Canonical_link_element

You have to be careful there, though, as many proxies will blindly rewrite
all links in a page to point to themselves (usually as relative links).

... I just checked. wapfree-ec alters the canonical tag.

So you would have to find a way to defeat their link parser, but still
write it in such a way that search engines would understand!
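
A rough way to test whether a given proxy leaves the canonical tag alone is to fetch your page through it and inspect the result. The sketch below only does the inspection step, using Python's standard HTML parser; `canonical_survives` is a hypothetical helper, and fetching the proxied HTML is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.href is None:
            self.href = attr.get("href")


def canonical_survives(proxied_html, original_host):
    """True if the proxied copy's canonical URL still names the original host."""
    finder = _CanonicalFinder()
    finder.feed(proxied_html)
    if not finder.href:
        return False
    # A rewritten relative link has no hostname, so it fails this comparison.
    return urlparse(finder.href).hostname == original_host.lower()
```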

Joshua Smith

Apr 18, 2012, 11:06:35 AM
to google-a...@googlegroups.com
If it rewrites the links then, I'm sorry, it *isn't* a proxy server.

The theory that makes proxy servers OK under the DMCA is that they fall under the § 512(a) Transitory Network Communications safe harbor.

One of the requirements for that safe harbor is that the content "is not modified in any way".

So I would recommend that the OP alert Google that the app in question is modifying their content, creating a derivative work without authorization, and must be removed under the DMCA.

-Joshua

Jeff Schnitzer

Apr 18, 2012, 12:35:14 PM
to google-a...@googlegroups.com
On Wed, Apr 18, 2012 at 11:06 AM, Joshua Smith <Joshua...@charter.net> wrote:
>
> So I would recommend that the OP alert Google that the app in question is modifying their content, creating a derivative work without authorization, and must be removed under the DMCA.

He could argue about it for weeks with lawyers, or he could fix it
himself today. Only one of these approaches will actually solve the
problem.

Jeff

Joshua Smith

Apr 18, 2012, 12:58:49 PM
to google-a...@googlegroups.com
I doubt very much that he would need to argue about it for weeks with lawyers.

And although he can certainly defend himself now, that does nothing to help others who are being copied by the same not-really-a-proxy-server, and it doesn't do anything about his search results, which could take weeks or months to recover.

-Joshua

Jeff Schnitzer

Apr 18, 2012, 1:44:51 PM
to google-a...@googlegroups.com
On Wed, Apr 18, 2012 at 12:58 PM, Joshua Smith <Joshua...@charter.net> wrote:
> I doubt very much that he would need to argue about it for weeks with lawyers.

Good luck with that.

Jeff

Geoffrey Spear

May 7, 2012, 10:23:20 AM
to google-a...@googlegroups.com


On Wednesday, April 18, 2012 11:06:35 AM UTC-4, Joshua Smith wrote:
> If it rewrites the links then, I'm sorry, it *isn't* a proxy server.

A proxy server that doesn't rewrite links would be completely unusable; you wouldn't be able to navigate at all (or, e.g., load images, CSS, or javascript) without it being unproxied (and, if you need to be using a proxy server, thus probably blocked entirely).

Joshua Smith

May 7, 2012, 10:56:27 AM
to google-a...@googlegroups.com
A proxy server is a computer that you configure your browser to tunnel HTTP requests through. It does not change the content it is forwarding. It will often cache requests. Proxy servers are within the DMCA "safe harbor" provisions, because they don't change anything.

A web site that reproduces another site is not a proxy server. It's something else. A cache, perhaps? Under § 512(b)(2)(A), a cache only qualifies for the safe harbor if it passes data "without modification to its content". What does that mean, exactly? Well, one judge decided Google's cache was OK, but didn't explain why. IMHO the judge played a little fast and loose with the law, because the way it is written, what Google does is really not OK.

You can read more about these issues here: http://www.plagiarismtoday.com/2007/01/16/debunking-the-dmca-caching-loophole/

Jeff Schnitzer

May 7, 2012, 2:45:14 PM
to google-a...@googlegroups.com
On Mon, May 7, 2012 at 10:56 AM, Joshua Smith <Joshua...@charter.net> wrote:
>
> You can read more about these issues here: http://www.plagiarismtoday.com/2007/01/16/debunking-the-dmca-caching-loophole/

That article strongly suggests that a proxy which rewrites URLs so
that content is faithfully reproduced to the end-user would qualify.
Just as Google Cache does. And let's be honest, a ton of "content" is
rewritten in any transmission in the form of network packet headers,
HTTP headers, etc.

It should be plainly obvious that the issues here are not clear-cut and
there will be nothing gained by waiting for a political solution. Use
the readily available technical solution or give up.

Jeff

Joshua Smith

May 7, 2012, 3:10:22 PM
to google-a...@googlegroups.com
The article also quite explicitly says that if you set up your (not really a) proxy server to serve a particular site, then you do not have any protection from copyright violation under DMCA.

My reading of the law and the EFF stuff is consistent with this interpretation.

The original poster wants google to take action, which they are required to do to maintain their "safe harbor" status. I think he's right.

-Joshua

Jeff Schnitzer

May 7, 2012, 9:49:19 PM
to google-a...@googlegroups.com
On Mon, May 7, 2012 at 3:10 PM, Joshua Smith <Joshua...@charter.net> wrote:
> The article also quite explicitly says that if you set up your (not really a) proxy server to serve a particular site, then you do not have any protection from copyright violation under DMCA.
>
> My reading of the law and the EFF stuff is consistent with this interpretation.
>
> The original poster wants google to take action, which they are required to do to maintain their "safe harbor" status. I think he's right.

It's been a long time since I looked at the domain in question
(wapfree-ec.appspot.com) and it's over quota right now, but IIRC it
was not specific to the OP's site. The OP even mentions this. The
proxy just happened to rank higher in Google than the original
content, probably due to the same technical ineptitude that caused him
to rant here on this list.

Jeff

Joshua Smith

May 8, 2012, 9:38:39 AM
to google-a...@googlegroups.com
Oh, well that does make a difference.

However, I do think Google needs to look carefully at this trend of people setting up website clones on App Engine. The Field v. Google case (Blake Field's suit over Google's cache) was a very narrow decision by a single judge. It didn't set any particular precedent, and it really is not OK to set up scraper sites under US law.

-Joshua

Norm Deplume

Mar 19, 2013, 11:19:27 AM
to google-a...@googlegroups.com, je...@infohazard.org
Jeff Schnitzer: "Last of all, stop blaming other people for your mistakes."
Jeff Schnitzer: "The proxy just happened to rank higher in Google than the original content, probably due to the same technical ineptitude that caused him to rant here on this list."

Your site is happily cached by these misconfigured proxies as well.  Guess you're just as inept as I am.


As for the site ranking higher than mine: it turns out it was a coordinated negative SEO attack. Somebody found lots of proxies that didn't insert a <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> tag. App Engine was just one platform used. The attacker also created backlinks to help the proxied content get indexed. Google seems to assign appspot a fairly high level of trust and crawls its subdomains very quickly, so it would find new articles that I was publishing on the proxy first.
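
As a sketch of how one might audit a proxied copy for the robots tag mentioned above, the following Python helper reports whether a page declares `noindex` in a robots meta tag. The helper name is hypothetical, and it only looks at the meta tag, not at HTTP response headers.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def declares_noindex(html_text):
    """True if the page asks search engines not to index it."""
    parser = _RobotsMeta()
    parser.feed(html_text)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives
```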

Am I "inept" because I didn't see that coming?  

Why should owners of misconfigured proxies that rewrite canonical tags and don't bother instructing search engines to NOINDEX cached content be immune from DMCA requests?