DMCA doesn't look like it's applicable: Google can store things in its cache without claims about copyright, and the content is not actually stored on the proxying site.
But what to do if a proxy stores other sites in the Google cache, as if they were its own?
Can we have a simple reporting form for this? The symptoms are all too easy to recognize!
Blocking the IP of the proxy is the only definitive defense. You cannot detect the use of a proxy in any definitive manner.
What's worse, even when you do detect it (which could also be a false detection), you cannot do anything about it other than perhaps not serve content or serve a page of nonsense. I opt to serve a big splash div with huge text saying I believe you are viewing this through a proxy and I don't want to cater to proxies. You can't redirect, you can't use anything. Even JavaScript is usually neutralized.
Any headers you may try to send will not make it.
Only Google can actively get rid of this, by not indexing any URL that has even a hint of proxy in it. This can take care of nph-proxy and cgi-proxy and a few other known proxy types. But then there are lots more out there, disguised as proper website URLs. Incidentally, a base href for your domain very typically gets neutralized by the proxy, which will replace it with its own. All links expressed as absolute paths will also be converted to its own. The proxy operators don't care whether it works past the first page.
Here's one where I detect enough to warn but not enough to do anything about it: 72.232.94.93/perl/nph-proxy.pl/010110A/687474703a2f2f7777772e77656261646f2e6e65742f61626f75742e706870
In this one I am able to use JavaScript to redirect to my site: generalhaberdashery.com/cgi-bin/nph-blah.cgi/000110A/http/www.webado.net/services.php?lang=FR
Here's one where I can actually state with more conviction that I believe it's a scraper site: toastedgamers.com/poxy/index.php?q=aHR0cDovL3dlYmFkby5jb20v
I don't dare suppress the content completely in case I am mistaken about my detection.
I am only able to do these things by having fairly extensive scripting (php and js) added to all my pages. The typical website that is just html will be at a disadvantage.
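The target page can often be read straight out of a proxy URL like the ones above. What follows is only a minimal sketch in Python, assuming the proxy hides the target as plain hex or base64 (as the first and third examples appear to); the function name `decode_embedded_target` is hypothetical, and proxies that scramble the URL some other way will simply yield no result.

```python
import base64
import binascii
from typing import Optional
from urllib.parse import unquote


def decode_embedded_target(token: str) -> Optional[str]:
    """Try to recover a target URL hidden in one piece of a proxy URL.

    Tries hex first, then base64. Returns None unless the decoded text
    looks like an http(s) URL.
    """
    token = unquote(token.strip())  # undo %3D and similar escapes
    candidates = []
    try:
        candidates.append(bytes.fromhex(token))
    except ValueError:
        pass  # not a hex string
    try:
        padded = token + "=" * (-len(token) % 4)  # restore stripped padding
        candidates.append(base64.b64decode(padded, validate=True))
    except (binascii.Error, ValueError):
        pass  # not base64
    for raw in candidates:
        try:
            text = raw.decode("ascii")
        except UnicodeDecodeError:
            continue
        if text.startswith(("http://", "https://")):
            return text
    return None
```

Fed the last path segment of the first URL above and the `q` value of the third, it returns http://www.webado.net/about.php and http://webado.com/ respectively.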
> On Jun 29, 3:27 pm, Burt wrote:
> > I am already busy with a site owner and his host; they have willing ears, but things don't get solved so easily.
Trying to whack every single new proxy is a losing proposition.
One thing we do have a strong interest in keeping an eye on is situations in which proxies outrank the original sites for reasonable query terms.
For instance, if you have a site about lefthanded smokeshifters and a search for "buy lefthanded smokeshifters" turns up a proxy site before yours, then there's a potential issue there.
In contrast, simple site queries in this context don't really tell us much or cause severe concern. Such results aren't negatively impacting typical Google searches.
That is pretty much a bombproof method for what Google suggests it be used for, search engine bot verification, but many people outside of Google seem to indicate its applicability in preventing proxies from doing what Burt is talking about here.
What I don't understand is that using the bot verification method for something like this, proxy hijack prevention, would seem to assume that proxies are going to spoof the client identification as coming from a search engine.
What is to stop a given proxy from just identifying itself as some random non-search-engine bot, or even just a normal browser client? That would seem to make the bot verification process not very applicable. Am I missing something here?
I'd like to make it clear that so far I have NOT seen anyone from Google suggest applying the bot verification method to proxy hijack prevention, but I am wondering if anyone knows why so many people seem to think it IS the answer to proxy prevention.
If it is the answer to proxy hijack prevention, implementing it would not be that hard, but it seems to me that it could simply be bypassed by a given proxy NOT spoofing its client identification.
Anyone got any thoughts on this?
Is one's only recourse, as is most often the case, "simply" monitoring one's site and search results and reporting inconsistencies as they come up?
Craig
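The limitation raised above can be made concrete with a small sketch. It is Python for illustration only; `classify_request` and its labels are hypothetical names, and `verify_ip` stands in for whatever bot verification check is in use. The point it shows is that verification only has something to say about requests that claim to be Googlebot; anything else falls through untouched.

```python
from typing import Callable


def classify_request(user_agent: str, ip: str,
                     verify_ip: Callable[[str], bool]) -> str:
    """Sort a request into one of three buckets.

    "no-claim": the client does not say it is Googlebot, so bot
                verification cannot tell us anything about it.
    "verified": claims Googlebot and the IP check passes.
    "spoofed":  claims Googlebot but the IP check fails.
    """
    if "googlebot" not in user_agent.lower():
        return "no-claim"
    return "verified" if verify_ip(ip) else "spoofed"
```

A proxy that announces itself as an ordinary browser always lands in "no-claim", which is exactly the bypass described in this post.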
OK, I'm missing something here. Like maybe the whole point LOL
It seems to me the issue isn't identifying which bot is doing what, only the simple (but, oh so complex, to me at least) issue of determining whether the web page is being funnelled through a proxy or not. Any robot implicated would be once removed. And the only way to know that there may be one involved is if you already know your page is in a proxy's hold. Maybe in the rather rare cases when I know my page is being seen through a proxy and I also know which IP was accessing the proxy, it would be that IP that interests me: if it's a good robot, I may want to serve no robot food. Why? Because I don't want the proxy scraper's SERPs to ride on my content. The trouble is that most of the time the tell-tale signs of proxy usage aren't so clear.
Adam, I personally get very incensed at proxy usage in general. As far as I and my websites are concerned, I see no legitimate use for them. Therefore I've been on a crusade to annihilate them since before seeing and knowing anything about scrapers. I started this when I had been almost defrauded a couple of times by would-be clients ordering services from my site through proxies. That's when I decided I had to find them out and prevent them from using my order forms at least.
On Jun 30, 4:27 pm, Burt wrote:
> OK, I have a sample where a proxy outranks a page of my own site for a reasonable search term.
> And my own page is NOWHERE in the SERPs.
> Proxies can take sites out, can take pages out.
> Adam, I think it's a serious problem.
> This is the search term, without quotes: "Bali-Portal is my first site about Bali"
> The proxy is no. 2, on page one, and supplemental; my page is not listed at all!
> Use the phrase with quotes and you find the original, NOT supplemental.
> ????
> It's the first term I tried; I guess there are many more...
> On Jun 30, 10:09 am, Adam Lasnik wrote:
> > Trying to whack every single new proxy is a losing proposition.
> > One thing we do have a strong interest in keeping an eye on is situations in which proxies outrank the original sites for reasonable query terms.
> > For instance, if you have a site about lefthanded smokeshifters and a search for "buy lefthanded smokeshifters" turns up a proxy site before yours, then there's a potential issue there.
> > In contrast, simple site queries in this context don't really tell us much or cause severe concern. Such results aren't negatively impacting typical Google searches.
On Jun 30, 4:37 pm, ivb wrote:
> Did you ever wonder why the proxy sites' results rank higher than the original results?
Do a search on Google for the following term: video broadcast standards
The top result comes back as viaweb.info/index.php?q=aHR0cDovL3d3dy5hbGtlbm1ycy5jb20vdmlkZW8vc3RhbmRhcmRzLmh0bWw%3D
which is the proxy that hijacked my complete site, www.alkenmrs.com, and got it totally deindexed.
If you actually click on the returned link, it goes to another proxy, wacast.com, with the search term already entered in the box. It won't work any further from there, as I have blocked both proxies and the IP that ultimately accesses my site.
The search term above has been well ranked for my site for around ten years, and yet today, although my page still comes in at the top slot, it is ranked under the proxy's URL, with that particular page on my site, http://www.alkenmrs.com/video/standards.html, having been totally deindexed along with the rest of the site.
So, as well as having a devastating impact on my business, I would say that it is also impacting negatively on Google searches.
BTW, I have filed a spam report and also a reinclusion request in respect of the proxies above.
Alan
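For anyone wanting to do the same kind of blocking in application code rather than in the server config, here is a minimal Python sketch. The addresses are placeholders from the reserved documentation ranges, not the real proxy IPs, and `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Placeholder entries: one whole range and one single address.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blocked(ip: str, networks=BLOCKED_NETWORKS) -> bool:
    """Return True if the visitor IP falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr.version == net.version and addr in net
        for net in networks
    )
```

A request handler would call `is_blocked(remote_ip)` first and refuse to serve content when it returns True. It only helps against proxies whose fetching IP is already known.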
Blocking IP addresses is a fruitless exercise, as there are thousands of these sites, some using spoofed or non-static IPs. http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-goog... suggests a double reverse-DNS lookup on the IP address of a request claiming to be Googlebot: if the IP address points to a Google hostname, and looking up that hostname then returns the original IP address, then it is a legitimate Googlebot request. Since reverse DNS can also be spoofed, it's my understanding that you have to implement a reverse-forward DNS spider validation, i.e. IP -> reverse DNS -> forward DNS = original IP. Once you have installed the reverse-forward DNS checking code, you can make it return a page with text such as "this page has prevented thieving scumbags from proxy hijacking" when the check fails.
As far as I can see, the issue of proxy hijacks mushroomed because versions of Apache prior to 1.3 had reverse DNS lookups switched on by default (or at least all the hosts we had did). From 1.3 onward it was switched off by default, to save network traffic for those sites that did not need the reverse lookups done. It was also thought better for end users, because they don't have to suffer the delay that a lookup entails, particularly on large sites. Our problem is that DNS lookups can take up too much time and it's not easy persuading some of our hosts to install the reverse-forward DNS lookup, which brings us back to Webado's point that the major search engines ought to throw out the proxy sites (or any site that has something before http://www in its URL).
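The reverse-forward check described above could look roughly like this. It is a Python sketch, not anyone's production code: the trusted suffix list is an assumption, the function names are made up for illustration, and the forward lookup shown here only handles IPv4. The resolvers are passed in as parameters so the logic can be tried without touching real DNS, and a small cache is included because of the lookup cost mentioned above.

```python
import socket
from functools import lru_cache
from typing import Callable, List

# Assumption: hostname suffixes accepted as belonging to the crawler.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_dns(ip: str) -> str:
    """IP -> hostname (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(host: str) -> List[str]:
    """Hostname -> list of IPv4 addresses (A lookup)."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = reverse_dns,
    forward: Callable[[str], List[str]] = forward_dns,
) -> bool:
    """IP -> reverse DNS -> forward DNS must lead back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(TRUSTED_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:
        # No PTR record or a failed lookup: treat as unverified.
        return False


@lru_cache(maxsize=4096)
def is_verified_crawler_cached(ip: str) -> bool:
    """Remember the answer per IP so repeat visits cost no new lookups."""
    return is_verified_crawler(ip)
```

The check is only worth running for requests whose user agent claims to be the crawler. Note that the cache here never expires, so a long-running process would want some form of time limit on entries.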
Burt, if your website is weaker than a proxy, you are hijacking this group with your complaint. Yes, many proxies are evil and should be removed from the Net, but whose job is it to police them? Google's?
Google should not show the proxy results in its index, and I am sure it does not once the algorithm determines that a result is a proxy result and not the original author's.
And Burt, if you want to start a thread where only selected users can reply, please do it in a private group!
Thank you,
Igor
I am convinced that the Google algorithm cannot distinguish between the proxy and the original result. In my post above, doing a search for 'video broadcast standards' returns one of my pages as the top ranking result but with a url of -
That page was one of the original pages on my site from 1996 and yet, when some idiot decides to hijack the site, the proxy gets favoured and my site gets totally deindexed. How can that be?!
It would appear to me that there is nothing in the algorithm to detect this sort of abuse.
On top of that, although this viaweb.info proxy has a number of other pages from my site indexed, how come my whole site (some 7000+ pages) got deindexed rather than just the duplicated pages? Unless, of course, my site has been continually crawled through the proxy for longer than I know about.
I say that because the Google cache date for the index of the proxy goes back to 19 March in some instances. My site got deindexed on 26 June, with no sign of it getting back in despite spam reports and a reinclusion request.
Alan
> On Jun 30, 5:41 pm, Burt wrote:
> > Igor, don't hijack this thread.
is the shared secured server that orders are routed through.
I noticed that showing up recently although in theory, it shouldn't be getting indexed. On the other hand, thinking about it, it is possible to navigate the whole site through the secure server.
> > I am convinced that the Google algorithm cannot distinguish the difference between the proxy and the original result. In my post above, doing a search for 'video broadcast standards' returns one of my pages as the top ranking result but with a url of -
> > That page was one of the original pages on my site from 1996 and yet, when some idiot decides to hijack the site, the proxy gets favoured and my site gets totally deindexed - how can that be!!
> > It would appear to me that there is nothing in the algorithm to detect this sort of abuse.
> > On top of that, although this viaweb.info proxy has a number of other pages from my site indexed, how come my whole site (some 7000+ pages) got deindexed rather than just the duplicated pages unless of course my site has been continually crawled through the proxy for longer than I know about.
> > I say that because the Google cache date for the index of the proxy goes back to 19 March in some instances. My site got deindexed on 26 June with no sign of it getting back in despite spam reports and a reinclusion request.
> > Alan
> > On Jun 30, 2:22 pm, ivb wrote:
> > > Burt, if your website is weaker than a proxy, you are hijacking this group with your complaint. Yes, many proxies are evil and should be removed from the Net, but whose job is it to police them? Google?
> > > Google should not show the proxy results in its index, and I am sure it does not once the algorithm determines that it is a proxy result, not the original author's result.
> > > And Burt, if you want to start a thread where only select users can reply, please do it in a private group!
> > > Thank you,
> > > Igor
> > > On Jun 30, 5:41 pm, Burt wrote:
> > > > Igor, don't hijack this thread.
> > > > On Jun 30, 4:37 pm, ivb wrote:
> > > > > You ever wonder why the proxy sites' results rank higher than the original results?
> > > > > On Jun 30, 4:27 pm, Burt wrote:
> > > > > > Oh, I have a sample where a proxy outranks a page of my own site for a reasonable search term.
> > > > > > And my own page IS NOWHERE in the SERPs.
> > > > > > Proxies can take sites out, can take pages out.
> > > > > > Adam, I think it's a serious problem.
> > > > > > This is the search term, without quotes: "Bali-Portal is my first site about Bali"
> > > > > > Proxy is no. 2, on page one, and supplemental; my page is not listed at all!
> > > > > > Use the phrase with quotes and you find the original, NOT supplemental
> > > > > > ????
Interesting about viaweb.info, as I emailed this guy yesterday trying to figure out what was going on. Here is a copy of the reply (he/she is Italian, hence the poor English):
QUOTE
Hey
the domain vaweb.info is not hosted anywhere it is just registered at tierra.net
it was a proxy website and the pages you see indexed by google is has not beend created from me, but it is hotlinks created from the google chage directly to the prixified pages, and it was a problem for mee too becaose it steals bandwidth from me , i do not know if you het the point.
actually there is not anything at viaweb.com , it is not hosted at tierra.net just parked for free .
So i do not know how can i remove the copyrighted material you say, if i actually have not a website , but just a domain!!!
ENDQUOTE
Then, when I emailed and asked him about wacast.com (viaweb.info URLs get redirected through wacast.com), this is what I got back:
QUOTE
I do not know if you understad how a proxy works?
exampe somebody wandts to open your site but that site from their location is blocked,so this user access to your site throught a proxy (viaweb.info for example) then the pages this user visits throught this proxy are the pages you find at google ,i do not know why google indexes this pages ,because it is not real pages .
ENDQUOTE
Maybe I didn't know much about proxies (although I have learnt a lot in the past couple of days), but this guy knows more than he is letting on, especially as he is boasting about 54 pages indexed on Google. He is right in that respect, but whose pages are they!!
> You can only block those spiders that actually honor robots.txt, and unfortunately I doubt wacast and uk250 will take any notice of robots.txt.
I am only really interested in Google at present but I know what you are saying.
That UK250 site is some sort of UK directory which shows different sites, such as one section of mine, but they do it by framing particular pages. They do the same to every site they have in their directory. Strange way of doing things but it is a live snapshot with all links going back to the original site so I wouldn't think it does any damage - at least I hope not!!
Since they got indexed (which is why they were found by Copyscape), they are doing some damage, in the sense that they are competing with your original site for the same content.
You can jump out of frames (using javascript) so at least visitors don't get fooled into staying on sites that frame yours. It does nothing for robots though.
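For anyone who wants to try the frame breakout, here is a minimal sketch of the idea, written in TypeScript. The `FrameContext` interface and the `breakOutOfFrame` name are my own, made up for illustration so the check can be exercised outside a browser; the only real browser pieces are `window.self`, `window.top` and `location.href`. It assumes the framing site lets your script run at all, which (as noted above for proxies) is often not the case.

```typescript
// Hypothetical wrapper around the few window properties the check needs.
interface FrameContext {
  self: unknown;                     // the window this page runs in
  top: unknown;                      // the outermost window
  currentUrl: string;                // this page's own address
  navigateTop(url: string): void;    // sends the outermost window to a URL
}

// Returns true when the page was framed and a breakout was attempted,
// false when the page is already the top-level window.
function breakOutOfFrame(ctx: FrameContext): boolean {
  if (ctx.self === ctx.top) {
    return false; // not framed: nothing to do
  }
  ctx.navigateTop(ctx.currentUrl);
  return true;
}

// Browser wiring. The guard skips this part where no browser window exists.
const browserWindow = (globalThis as any).window;
if (browserWindow && browserWindow.top && browserWindow.location) {
  breakOutOfFrame({
    self: browserWindow.self,
    top: browserWindow.top,
    currentUrl: browserWindow.location.href,
    navigateTop: (url) => {
      browserWindow.top.location.href = url;
    },
  });
}
```

This only helps human visitors whose browsers execute the script. A crawler that does not run JavaScript still sees the framing page, so it does nothing about what gets indexed.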