> On the other hand, if there isn't sufficient PageRank to keep every page out of the Supplemental index, which would you rather have in the Supplemental index, a page which mainly contains duplicate content or a page with unique content on it?
Totally agree, Craig. But I guess that goes back to the 'what percentage is the cut-off' question Daamsie has asked. I guess the real solution for everyone would be if Google started treating blockquote as a hint that the content may be duplicate, and that they should not index that portion of the page, or added "section targeting" hint tags like AdSense does.
Kerry - re your reiteration of what Adam said re: not getting into in-depth discussions - thanks for reminding us.
Does removing a URL's file extension via Apache's 'content negotiation' have any impact on the flow of PR to that URL? Currently we have inbound links to '.../silver/rings.html'. Will '.../silver/rings' retain the PR if Apache is automatically delivering the best variant of the same resource (i.e. 'rings') to browsers? In the long-term future we may need to change technologies (who knows, maybe even 'html' itself) and I want to avoid using yet more 301s. I know that Google can handle almost any file extension, but I am unclear on how it handles no file extension.
You also need to 301 redirect your original .html-suffixed pages to their unsuffixed counterparts, which internally get rewritten back to the .html-suffixed ones. Yes, it's tricky, since you may end up with a loop if you're not careful.
If you don't do the 301 redirection, it's bye-bye value flow.
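A minimal sketch of that redirect-plus-internal-rewrite flow, written in Python purely to model the decision logic. It is not Apache configuration, and the function name `handle_request` and its `internal` flag are made up for illustration. The loop is avoided by applying the 301 only to requests that come from the client, never to the internally rewritten one:

```python
from typing import Tuple

def handle_request(path: str, internal: bool = False) -> Tuple[int, str]:
    """Model the server's decision for a single request.

    Returns (status, target). For a 301 the target is the redirect
    location; for a 200 it is the file that ends up being served.
    """
    if path.endswith(".html"):
        if internal:
            # Internally rewritten request: serve the file, never redirect again.
            return 200, path
        # External request for the old suffixed URL: permanent redirect.
        return 301, path[: -len(".html")]
    # Extensionless URL: rewrite internally back to the .html file.
    return handle_request(path + ".html", internal=True)

print(handle_request("/silver/rings.html"))  # the old inbound link
print(handle_request("/silver/rings"))       # the new canonical URL
```

If the `internal` check were dropped, the rewritten request for the .html file would be redirected again, and that is the loop mentioned above.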
"I do not believe you are Matt Cutts because you do not have a cool G icon next to your name! =P"
Hmm. I'm going to have to do something about that--thanks for pointing that out, Admin Aaron. Maybe I'll tackle your nofollow question to say thanks for mentioning it. :)
"You also need to 301 redirect your original .html sufficed pages to their unsuffixed counterparts"
Are you sure? The original page still exists and keeps its html extension, but Apache refers to the web resource without it. If, say, I changed rings.html in due course to rings.w2 or whatever new format is required in the future, all references and inbound links which do have the old html extension will still work, because Apache is serving up the file by reference only to 'rings': http://www.w3.org/Provider/Style/URI#remove It can also save migration headaches: http://www.websiteoptimization.com/speed/tweak/rewrite/ I would not need to implement a 301 for the browsers; however, will it be needed to keep Google's PR?
Obviously Google doesn't (always) follow the HTTP protocol when handling 301, 302 and 307 redirects. At least with 302 redirects there is a lot of magic involved. The voodoo was implemented to fight 302 hijacking. I'd like to know what Google actually does with 301, 302 and 307 redirects. Something along the lines of "When a 302ing URL and its target are on the same domain and both have inbound links, our canonicalization routine chooses the URL with the highest PR value as the canonical URL; 302 and 307 redirects don't pass PageRank or anchor text, but 301 redirects do; we stop spidering the source of a 301 redirect when we don't find a single link pointing to it anymore; when a 302 redirect leaves its name space we'll index the redirect target instead of the 302ing URL, except when it's an affiliate link, then we don't care much and consolidate the given location with the best matching landing page on the merchant's site; we follow exactly 2 redirects in a row, and when we spot the 3rd we'll list the source URL under 'URLs not followed' and deindex every linkless URL involved in the redirect chain; also, every redirect in a chain takes away n% from the remaining power of the initial link; we interpret 307 redirects within the current server like 302s except when the destination is feedburner.com; with 307 redirects pointing to another server we index the source URL with the content of the redirect target if the redirect target has no (valuable) inbound links itself ..." (all factless, provoking speculation, made up to sample the expected granularity, of course).
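What can be checked without any guesswork is what your own server sends back at each hop. Here is a minimal Python sketch for tracing a redirect chain. It says nothing about how Google treats the chain; the `fetch` callback, the `max_hops` limit of 5 and the canned `FAKE_SITE` responses are all assumptions made up for illustration (a real fetcher would issue a HEAD request and read the Location header):

```python
from typing import Callable, List, Optional, Tuple

# A fetcher takes a URL and returns (status code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

def trace_redirects(url: str, fetch: Fetcher, max_hops: int = 5) -> List[Tuple[str, int]]:
    """Follow redirects hop by hop, recording (url, status) for each request."""
    hops: List[Tuple[str, int]] = []
    seen = set()
    # Stop on a repeated URL (a loop) or once max_hops requests have been made.
    while len(hops) < max_hops and url not in seen:
        seen.add(url)
        status, location = fetch(url)
        hops.append((url, status))
        if status not in (301, 302, 307) or location is None:
            break  # final response, nothing more to follow
        url = location
    return hops

# Canned responses standing in for a real site (hypothetical URLs).
FAKE_SITE = {
    "/old.html": (302, "/interim"),
    "/interim": (301, "/new"),
    "/new": (200, None),
}

def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return FAKE_SITE[url]

print(trace_redirects("/old.html", fake_fetch))
# [('/old.html', 302), ('/interim', 301), ('/new', 200)]
```

Running something like this over the URLs with inbound links at least shows which hops are 301s and which are 302s or 307s before anyone starts speculating about how they are weighted.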
> Things change over time and what was said yesterday may no longer be true today. I think all questions posed in this thread are valid, just because some regular read an answer to a question back in 2004 doesn't mean it still isn't valid as a question to pose today and want an answer to. If anything, that just shows to Google that perhaps they need to update their official response area (webmaster help center) with that answer so that all webmasters, not just the few hundred that post here, have a fair shot at reading it.
> With regards to the actual thread at hand, we're about 4 days away from the two week mark at which time Google has committed to answer 5 questions posed here. Since Google is making this unusual gesture to communicate with us, perhaps we should do the unusual too, and all come together and agree on the 5 questions we'd like to see answered by Google. We've got a few days to sieve out all the real questions here and then see for ourselves if they've been answered or not elsewhere. That will leave a few questions that have never been answered and that we REALLY want an answer to. We then come together as a group and post those 5 questions here together. Of course Google might not choose to answer those 5 questions, but that would be going against us, the users, and from Google's own goals we know that the users are their main priority. So either they shoot their company objectives in the foot, or they see this is a great opportunity to really help out.
> What do you say? I'm willing to put some time into going through this thread and picking out the questions, without the fluff and then starting a new thread where we get together to decide on the final 5 before putting to the G'plex. Who's on board - it's 4 days to d-day, or should I say G-day? :)
> On Oct 3, 9:55 am, cass-hacks wrote:
> > > Well, that's your opinion. You state it as fact, but it is only your opinion. There are enough others who claim that there is such a penalty though.
> > No, it is not only my opinion. If it were an opinion I would have said so.
> > What I wrote is based on what has been said, one way or another, by numerous Google employees and although it might have just been their opinion, I would value their opinion at least slightly higher than someone on the outside that is just guessing.
> > Are any of the others who make the claim that a "duplicate content penalty" exists Google employees?
> > Thought not. ;-)
> > On the other hand, Google employees, "Vanessa Fox" to name one, have actually stated that there is no such thing as a "duplicate content penalty".
> > Hint, there is some quoted text in the previous sentence that you can use in a Google search to get it straight from the horse's mouth, so to speak. "Seek and ye shall find." ;-)
> > On that note, why should Googlers here in this forum answer questions that have been answered by Googlers numerous times and in numerous ways?
Here is an idea for a thread - one that probably doesn't have an easy answer.
What would you suggest is the best mechanism for a site that generates a lot of original content to get discovered and increase its PageRank? I know the standard answer is get more links, but we all know how problematic that can be. One shouldn't buy links, or trade links. We can hire SEO experts to optimize our pages for discovery. But it takes a really, really long time to generate natural links without shameless hustling and posting to lots of blogs that are not really that interested in posting self-serving links to other sites.
The nut of the question is, as long as the algorithm is based on link popularity, new sites will always be at a disadvantage to sites that have been around longer. That is life, but are there any other strategies that can help move sites with lots of content up in the rankings?