I have a hard time getting my mind around some of what is regarded as "duplicate content." Can anyone explain?
Okay, if http://www.yourdomain.com and http://yourdomain.com go to the same page, I don't see how that's duplicate content. If two streets come to a house, it ain't a duplicate house. And yet, that's what Google calls it. It seems to me that the truth is that Googlebot isn't programmed well enough to know that all of the following point to the same file:
Hi Kathy,
The trailing slash does not make a difference on the domain name itself. That's the easy part.
Everything else is much harder.
www.domain.com vs domain.com: Technically, those are two different host names. It might as well be www.domain.com vs just.bananas.edu. Technically, you could have different things online on those two host names. From the name alone, it is not possible to know that they are the same, you can only do it from the content.
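To make the host-name point concrete, here is a minimal Python sketch. It is only an illustration, not how Googlebot works, and the helper name `normalize` is made up for this example. It can tell that a missing path and "/" mean the same thing, but nothing in the URLs themselves says that www.domain.com and domain.com are the same site:

```python
from urllib.parse import urlsplit


def normalize(url: str) -> tuple:
    """Reduce a URL to (host, path), treating an empty path as "/"."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return host, path


# The trailing slash on the bare domain makes no difference.
print(normalize("http://domain.com") == normalize("http://domain.com/"))  # True

# The www and non-www versions are simply two different host names.
print(normalize("http://www.domain.com/") == normalize("http://domain.com/"))  # False
```

Deciding that the last two are "the same" takes extra information, such as the content served or an explicit redirect.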
You might be called "Kathy", but your friends might call you "Kat". To someone who does not know you, the names "Kathy" and "Kat" are unique and could be for different people. Even the address might not give it away, if you had both names listed in the phone book. Smart people might guess and assume that it's only one "physical" person, but is that really something you can do with certainty? What if you're wrong? The only safe bet is to assume that the names are for different people.
Now, the real question is what happens after Google recognizes that it might be the same. Imagine the situation where you have a dynamic element on your site, even just something as simple as the date/time being displayed. The pages pulled from the different host names are not fully identical. Do you see how this can turn into something complicated? It's like someone meeting you as Kat at the gym first and then later meeting Kathy in a business meeting. Hmm, looks similar... is it identical? :-)
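The date/time problem can be sketched in a few lines of Python. This is purely illustrative: the two page bodies are invented, and the comparison uses the standard library's `difflib`, which is a stand-in and not a claim about what any search engine uses. An exact fingerprint says the pages differ, while a similarity score says they are almost the same, so some threshold and some judgment are needed:

```python
import hashlib
from difflib import SequenceMatcher


def fingerprint(body: str) -> str:
    """Exact fingerprint: any one-character change gives a different value."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def similarity(a: str, b: str) -> float:
    """Rough similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical pages fetched a few seconds apart from the two host names.
page_www = "<h1>Welcome</h1><p>Generated at 10:15:02</p>"
page_bare = "<h1>Welcome</h1><p>Generated at 10:15:07</p>"

print(fingerprint(page_www) == fingerprint(page_bare))  # False
print(similarity(page_www, page_bare) > 0.9)            # True
```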
Thank you for the explanation. I appreciate it. But when I step back and ask what's the point, I see that in the vast majority of cases there is no "duplicate content." There is just a variable URL.
"Duplicate content" is usually then a misnomer and implies something the web builder did wrong.
My feeling is that the Internet shouldn't be made the province of geeks only by expecting us web builders to solve these technical problems FOR Google. Google firmly believes that a machine can do this. Okay, then make it do it.
Expecting us to do all this fiddling with .htaccess files and stuff is essentially asking us to solve their problem. Just yesterday I pasted some code into my .htaccess for this (from an authoritative blog), and sure enough, I then needed support to give me access back to my website statistics. And I'm pretty good at figuring this stuff out. I can imagine the problems people without my time and science background have. It's getting way, way, way too complicated.
Google is like a vast billboard selling ad space along the Internet highway. It created the monster of black hat SEO. Not us. But we're the ones jumping through hoops constantly to avoid being trashed right along with the cheaters these new rules are attempting to get rid of. So much for the "little guy."
Would DirectoryIndex filename.html (or PHP, or whatever) accomplish this?
Or is this, as I read it, just telling the server which file to use as the default, and not a redirect?
I suppose it is pointless to suggest that, since we can choose a preferred domain, Google does in fact have a way to determine that this "stuff" is not duplicated.
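On the DirectoryIndex question, the difference between "default file" and "redirect" can be sketched with two toy Python functions. They only mimic the behaviour as I understand it and are not Apache code; the function names and the page body are invented for the example. With a default file, both URLs answer 200 with the same content; with a redirect, one URL sends the client on to the other:

```python
INDEX_BODY = "<h1>Home</h1>"  # invented page content


def serve_with_directory_index(path: str):
    """Mimic a default-file setting: "/" is answered with filename.html's content."""
    if path in ("/", "/filename.html"):
        return 200, {}, INDEX_BODY
    return 404, {}, ""


def serve_with_redirect(path: str):
    """Mimic an explicit redirect: "/filename.html" points the client to "/"."""
    if path == "/filename.html":
        return 301, {"Location": "/"}, ""
    if path == "/":
        return 200, {}, INDEX_BODY
    return 404, {}, ""


# Default file only: two URLs, same content, both status 200.
print(serve_with_directory_index("/")[0], serve_with_directory_index("/filename.html")[0])

# Redirect: only one URL returns the content, the other returns 301.
print(serve_with_redirect("/")[0], serve_with_redirect("/filename.html")[0])
```

So a default-file setting on its own leaves two URLs showing the same page; only the second variant folds them into one.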
You're right - ideally the web should not have high entry barriers. Ideally, anyone should be able to express their opinion, voice their thoughts and have it published online, picked up by search engines world wide.
However, things are never that simple. There will always be an entry barrier, if only that of finding out where you can easily get things done right.
The problem is not that you have to do extra work to get things done right, the problem is that all those services and web-hosters are doing extra work to make things easy AND technically incorrect. Think of the www/non-www issue: the problem is not that *you* have to redirect, the problem is that your web-hoster is automatically answering to both versions. Technically, they shouldn't. The problem is that you have to fix something that they broke - and they broke it in the hope of making it easier for you. What a paradox. :-)
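The redirect itself is a small piece of logic. Here is a rough Python sketch of it, under some loud assumptions: www.domain.com is picked as the preferred host, plain http is used, and the function name `canonical_redirect` is invented for this post. It is not a drop-in for any particular server:

```python
from typing import Optional, Tuple

CANONICAL_HOST = "www.domain.com"  # assumption: the host name you prefer


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target URL) if the request used a non-preferred host.

    Returns None when the request already used the preferred host,
    meaning the page should be served normally.
    """
    host = host.lower().split(":")[0]  # ignore case and any port number
    if host == CANONICAL_HOST:
        return None
    return 301, "http://" + CANONICAL_HOST + (path or "/")


print(canonical_redirect("domain.com", "/about"))  # (301, 'http://www.domain.com/about')
print(canonical_redirect("www.domain.com", "/"))   # None
```

Note that this sends every other host name answered by the same server to the preferred one, which is the point: only one version ever returns the content.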
The web has long been a place where you can easily put content online, in some form or other. Grab a copy of Frontpage from '97 and you can publish a template-based site within minutes, no big deal, and that's the fun part. However, to get things right, to get things optimal, you do have to do a bit more. It's always like that. You can grab a $5 digicam and snap shots with it, but you can't take those shots and turn them into a billboard (at least not unless you've got a big name that you can count on :-)). Similarly, you can put content online with Frontpage and have Google find it and send you visitors, no big deal. But if you want to get the most out of it, you'll have to put in some extra effort.
And I think that's the difference that will always remain: those who are satisfied with what they achieve with their limited resources and those who want to squeeze the last drop out of what they can achieve. A stupid example, I hate it myself: people like pretty sites -- if you slap together a site with a default template from Frontpage, most people are going to browse away. If you have a professional designer set up a really slick site, with the same content, you'll easily do much better. The content might be the same, but the pretty site will win just about every time. Which site should Google send visitors to? The one which was created by an amateur (and looks like it) or the one that has the same content but has a professional finish? Small things matter, and sometimes the money you spend on getting something done right, by an expert, matters. The same goes for website optimization for search engines: small things sometimes matter, and sometimes getting things technically 100% correct can be worth the trouble -- and sometimes you don't even need an expert to do that, sometimes you can find the important information on a forum.
So where does that leave the little guy? Don't try to compete with someone out of your league. Find your own niche and dominate it. If you're the only one selling walrus-skin laptop covers then you can have as much duplicate content as you want; Google will still list your site on top. If, however, you try to sell cheap laptops, then don't expect a technically incorrect site to beat professionally generated websites, even if all the other factors were identical (which they won't be either). To be #1, you just have to be better than the old #1 - if the old #1 is "worthless junk", you just have to be "junk" to beat them. ;-) If the old #1 is a Fortune 100 company spending $5 million / year on the website, you're going to need a lot of help to beat them - that is where you'll want to make sure that your site is technically perfect. Of course, if you're just aiming to beat "worthless junk", making your site technically perfect will help you to cement your #1 status - why be content with a #1 as "junk" when you can be a really strong #1 with "perfect"?
Originally, I pointed out that the vast majority of the time the so-called "duplicate content" is just variable URLs. But you didn't address that issue. You just explained to me how this is so. I also brought up that Google is expecting others to solve ITS problems so it can apply its duplicate content filter without the necessary programming to deal with the Web as it is. Now, implicit in that is an issue of fairness and responsibility. But you made nothing of that. Instead you offered me business advice. As for the issue of making success on the web so difficult that only big business has a chance, you saw nothing wrong with that either. You made nothing of it and talked only about other things.
So, it's been great but I don't think any communication took place.
> Originally, I pointed out that the vast majority of the time the so-called "duplicate content" is just variable URLs.
Yes, and that is a technical error on the side of the website operator. It is duplicate content, even if the same file sits behind every URL. It's something that can be fixed.
> I also > brought up that Google is expecting others to solve ITS problems so it > can apply its duplicate content filter without the necessary > programming to deal with the Web as it is.
It's a technical error that the website operator has committed. It is not Google's duty to fix those errors. Google will work around those errors the best it can, but it will always be a work-around and not an optimal solution. By letting Google work around your errors, you are accepting that it might make a decision that is sub-optimal to you. If you do not want Google to make such a decision, you need to fix your errors and make the decisions yourself.
> But you made nothing of that. Instead you offered me business advice.
It applies to websites just as well. If you are fine with Google making a decision about your website, then let Google handle it. You do not have to do anything. However, if your business depends on Google doing it right (in other words: you make the decision yourself because you know better than Google what you want to have), if your niche is competitive enough to require your website to be more than just "average", then you have to make sure that it is more than just average. You can't rely on Google to fix your errors if you have to depend on being perfect.
> As for the issue of making > success on the web so difficult that only big business has a chance, > you saw nothing wrong with that either. You made nothing of it and > talked only about other things.
Find a niche and dominate it. You cannot take Dell's niche and try to dominate it with $100/week on advertising. This is true online and offline. If the big business is active in your niche and you cannot match their resources, then you need to find a different niche - online or not. As a small company, you can be agile, you can react to changes in the business environment, you can talk to all your customers, you can offer services that a big company could never offer. Take that advantage and use it. Don't try to be a big business on a shoestring budget, it won't work - online or not. If you have your niche - online or not - you can do a lot without having to be perfect, without having to deal with duplicate content.
Duplicate content is something that sometimes doesn't "make sense" to the average person with a website. However, it is a technical issue that can be cleaned up. In most cases it doesn't matter if you clean it up or not, Google will handle it for you, more or less. In some cases, it can make a difference. If you want to make sure to take advantage of all the small things involved, then it does make sense to clean it up. If you have the knowhow and resources available to clean it up and can justify the effort, go ahead and do it -- but it's not a requirement for getting listed in Google.
So to sum it all up:
- it's duplicate content when you have multiple URLs returning the same content.
- you can let Google handle it for you, they'll usually do a pretty good job at it
- you can fix it yourself, if you feel your decisions are better than Google's automated ones
Does that make more sense? Sometimes I don't have the time to keep things shorter :-)
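If you want to check your own site against the first point of the summary, something like the following Python sketch could serve as a starting point. It assumes you have already fetched the pages yourself; the function name `group_duplicates` and the sample URLs and bodies are made up. It only catches exact copies, not pages that differ by a timestamp:

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose bodies are byte-for-byte identical.

    `pages` maps each URL to the content fetched from it.
    Only groups with more than one URL are returned.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results.
pages = {
    "http://www.domain.com/": "<h1>Home</h1>",
    "http://domain.com/": "<h1>Home</h1>",
    "http://www.domain.com/about": "<h1>About</h1>",
}
print(group_duplicates(pages))
# [['http://domain.com/', 'http://www.domain.com/']]
```

Each group it prints is a set of URLs where you may want to pick one and redirect the rest.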