What Data Should I Keep


BC

Jun 30, 2009, 12:48:33 PM
to SOFTplus GSiteCrawler
I have a few real newbie questions for you.

My site gspirit.com consists of 316 products and 50 categories/subcategories.
I ran a GSiteCrawler crawl and came up with 4027 URLs: 4026 to be included,
3337 to be crawled, and 65 aborted.

Now this seems a little extreme because if I do the math, 316 products
+ 50 categories + 316 img files = 682 links.

So, my question is:
What data do I really want to keep in my sitemap? For example, I've noticed
the following.
-----------------------------------------------------------------------------------------
1.) The following 3 links all go to the same page...

a) http://www.gspirit.com/Merchant5/merchant.mvc?Screen=PROD&Store_Code=GS&Product_Code=APTAJFR

b) http://www.gspirit.com/Merchant5/merchant.mvc?Screen=PROD&Store_Code=GS&Product_Code=APTAJFR&Action=ADPR&Attributes=Yes&Quantity=1

c) http://www.gspirit.com/Merchant5/merchant.mvc?Screen=PROD&Store_Code=GS&Product_Code=APTAJFR&FootSteps=1

One of them is a "direct link", another was created by my "footstep
links", and the third appears to be created by the "attribute link".
Should I remove links b and c and simply keep link a, the direct
link?
----------------------------------------------------------------------------------------------

2.) In another example, I have one category, "Christian T Shirts", that
has 70 products, which are set up to display 10 products per page or
"view all". When crawled I end up with 8 links. Would it be best
from an SEO perspective to eliminate all but the "view all" link, or
should I keep all 8 since they do theoretically point to different
pages?
--------------------------------------------------------------------------------------------------

3.) Should I remove all of the references to image files in the
sitemap, e.g. gif, jpg, etc.?
Do they offer any value?

----------------------------------------------------------------------------------------------
4.) Also, would it be best to remove all references to pages like
"basket", "search", affiliate links, "account"?

------------------------------------------------------------------------------------------------

5.) And what would you recommend for links such as "contact us",
"customer service", "about us", and "FAQ's"?
-------------------------------------------------------------------------------------------------

I am guessing that when it is all said and done, that I should end up
with (per my example above) around 682 links?

Thank you very much for the support, believe me, it is much
appreciated.
Bill Cannon, BC
http://www.gspirit.com

webado2

Jul 1, 2009, 12:12:38 PM
to SOFTplus GSiteCrawler
Hi,

The problem is not so much the sitemap but rather what any robot gets
to crawl and find on the site.

You can and should disallow URIs or URI prefixes for basket, search,
account, etc. in your robots.txt file.
You can also disallow some URIs or URI prefixes by pattern matching,
e.g. when the URI includes Attributes or Action or FootSteps, etc.
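
To make the duplicate problem concrete, here is a rough Python sketch (not something GSiteCrawler does for you) that drops those extra parameters and shows that the three product URLs from question 1 collapse to a single page. The parameter list is taken from the example URLs; the function name canonical_url is just a name picked for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only carry cart or navigation state (from the example URLs).
NOISE_PARAMS = {"Action", "Attributes", "Quantity", "FootSteps"}

def canonical_url(url):
    """Return the URL with the noise parameters removed, other parameters kept in order."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in NOISE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

base = ("http://www.gspirit.com/Merchant5/merchant.mvc"
        "?Screen=PROD&Store_Code=GS&Product_Code=APTAJFR")
urls = [
    base,                                              # a) direct link
    base + "&Action=ADPR&Attributes=Yes&Quantity=1",   # b) attribute link
    base + "&FootSteps=1",                             # c) footstep link
]

unique = {canonical_url(u) for u in urls}
print(len(unique))  # prints 1: all three are the same page
```

Only the direct link (a) is worth keeping in the sitemap; the other two are the ones the robots.txt rules are meant to keep robots away from.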

You seem to be using 2 types of URLs: without query strings (as a
subfolder structure) and with query strings, based on
/Merchant5/merchant.mvc.
Are there any that overlap?

Let's see what needs to be disallowed (using prefix and pattern
matching, where * matches any string of characters). The leading *
lets each rule match its parameter wherever it appears in the query string:

User-agent: *
Disallow: /Merchant5/merchant.mvc?*Screen=BASK
Disallow: /Merchant5/merchant.mvc?*Screen=LOGN
Disallow: /Merchant5/merchant.mvc?*Screen=SRCA
Disallow: /Merchant5/merchant.mvc?*Action
Disallow: /Merchant5/merchant.mvc?*Attributes
Disallow: /Merchant5/merchant.mvc?*Quantity
Disallow: /Merchant5/merchant.mvc?*FootSteps
Disallow: /Merchant5/merchant.mvc?*Screen=CTGY


There may be others that need disallowing, so you can tweak it after
you have run through it once.
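
If you want to check which URLs those rules would catch before uploading the file, here is a rough Python sketch. The * wildcard is an extension honored by Google and the other major crawlers rather than by every robots.txt parser, so this uses a small hand-written matcher instead of a library. The helper names (rule_to_regex, is_disallowed) and the basket URL in the test list are made up for the illustration; the rules and the three product URLs are the ones from this thread.

```python
import re

# The Disallow rules listed above, copied as-is.
RULES = [
    "/Merchant5/merchant.mvc?*Screen=BASK",
    "/Merchant5/merchant.mvc?*Screen=LOGN",
    "/Merchant5/merchant.mvc?*Screen=SRCA",
    "/Merchant5/merchant.mvc?*Action",
    "/Merchant5/merchant.mvc?*Attributes",
    "/Merchant5/merchant.mvc?*Quantity",
    "/Merchant5/merchant.mvc?*FootSteps",
    "/Merchant5/merchant.mvc?*Screen=CTGY",
]

def rule_to_regex(rule):
    """Translate a rule: '*' is any run of characters, a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(path_and_query, rules=RULES):
    """True if any rule matches from the start of the path (prefix matching)."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)

product = "/Merchant5/merchant.mvc?Screen=PROD&Store_Code=GS&Product_Code=APTAJFR"
tests = [
    product,                                             # a) direct link
    product + "&Action=ADPR&Attributes=Yes&Quantity=1",  # b) attribute link
    product + "&FootSteps=1",                            # c) footstep link
    "/Merchant5/merchant.mvc?Screen=BASK&Store_Code=GS", # hypothetical basket URL
]
for url in tests:
    print("blocked" if is_disallowed(url) else "allowed", url)
```

Matching is case-sensitive here, as it is for paths in robots.txt, so the parameter names have to be spelled exactly as the cart writes them.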

I would not disallow the pages mentioned in #5, at least not right
off. They add needed text content, I feel. Maybe disallow Contact Us if
you don't want to be indexed on that. In that case, add this to the series of
disallowed items in robots.txt:

Disallow: /SPTS


To avoid getting penalized by Google and other search
engines, you should revise the page http://www.gspirit.com/LINK.html
and remove hints of link exchanges and reciprocal links.
You can add rel="nofollow" to the links you list there and
elsewhere on the site which are part of any link exchange or link
swapping initiative.
Or you can disallow that page altogether (Disallow: /LINK ). That of
course means your page doesn't pass any link value to any outgoing
links. However, that is the idea behind Google discouraging link
exchanges anyway.

And don't let GSiteCrawler include images and media files (audio,
video, Flash). You can include text-based files
like .pdf, .doc, .xls if applicable.
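
If you ever end up filtering an exported URL list by hand, the same rule can be sketched in a few lines of Python. This is only an illustration: the extension list is my assumption about what counts as image or media on a typical site, the function name keep_in_sitemap is invented, and the sample file paths are made up.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of image/audio/video/Flash extensions to leave out of the sitemap.
EXCLUDED_EXTENSIONS = {
    ".gif", ".jpg", ".jpeg", ".png",   # images
    ".mp3", ".wav",                    # audio
    ".avi", ".mpg", ".mov",            # video
    ".swf",                            # Flash
}

def keep_in_sitemap(url):
    """Keep a URL unless its path ends in one of the excluded extensions."""
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return extension not in EXCLUDED_EXTENSIONS

crawled = [
    "http://www.gspirit.com/LINK.html",
    "http://www.gspirit.com/images/shirt.JPG",   # made-up image path
    "http://www.gspirit.com/docs/sizes.pdf",     # made-up PDF path
]
sitemap_urls = [u for u in crawled if keep_in_sitemap(u)]
print(sitemap_urls)
```

The extension is read from the path only, so a query string after the file name does not confuse the check.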








BC

Jul 2, 2009, 6:24:04 PM
to SOFTplus GSiteCrawler
WOW!!!
Thank you Very Very much for the detailed explanation.
Looks like you took some time to crawl my site.

I am going to spend the entire day tomorrow deciphering your
message.
Having looked over it briefly (pressed for time) I can see that I will
have a couple questions for you.

Thanks again and have a blessed day.
BC

BC

Jul 5, 2009, 10:29:51 AM
to SOFTplus GSiteCrawler
Hello Mr. Webado

Again I thank you for the very in-depth reply to my questions.
It has shed a little light on things for me.

I do need some clarification on a couple of things.

In your reply you made note of the following:

1) Your Statement:
“You seem to be using 2 types of URLs: without query strings (as a
subfolder structure) and with query strings, based on
/Merchant5/merchant.mvc.
Are there any that overlap?”

Reply:
Yes, they are overlapping and I am not sure why I am getting them. It
is a function of my Miva Merchant shopping cart. I recently changed
all of my store links to short links which eliminates the query string
and creates a standard html link. I am going to follow-up with the
boys at Miva as to why this is happening and/or if this is a normal
function.

2) I set up the robots.txt file, disallowing all the parameters you
mentioned. Thank you very much for that. I had no clue how to do
it. I was thinking it had something to do with my .htaccess file
(LOL). So anyway, I set it up and it worked like a charm.

3) Your Statement:
“To avoid getting penalized by Google and other search
engines, you should revise the page http://www.gspirit.com/LINK.html
and remove hints of link exchanges and reciprocal links.
You can add rel="nofollow" to the links you list there and
elsewhere on the site which are part of any link exchange or link
swapping initiative.
Or you can disallow that page altogether (Disallow: /LINK ). That of
course means your page doesn't pass any link value to any outgoing
links. However, that is the idea behind Google discouraging link
exchanges anyway.”

Reply:
Now this has got me really confused, because it sounds like you are
telling me that I should not be using link exchanges and reciprocal
links at all.
I do know (and here’s a chance for me to show off my intellect, lol)
from the research I have done that Reciprocal links are a vital part
of any website promotion effort. And that they help you increase your
web site traffic in two ways, (1) from people clicking on the links,
and (2) they play a major role in boosting your rankings in search
engines. I know that Link Farms are a bad idea and a site could be
penalized for using them, but quality links to quality sites are one
of the most important aspects in achieving a high search engine
ranking.

I am sure that I must be missing something in your response. Could you
please help clarify this for me?

Well I guess that’s about it.
Looking forward to your reply, thanks for the support.

Have a blessed day.
BC

Christina S

Jul 5, 2009, 11:50:50 AM
to gsitec...@googlegroups.com
Hi,


Naturally acquired incoming links from related and relevant sites in good
standing are good for increasing your pages' PR (PageRank). Naturally acquired
means you did not put them there yourself or participate in a linking scheme
to get them. It means webmasters of other sites found your site good and
relevant to theirs and decided to link to yours because it's good for their own
visitors.

Outgoing links to sites that are related to yours and relevant to the text
you have are good for your own site's visitors. Again, such outgoing links
were not put on your site by others, only by you, because you found those
other sites good and relevant and they add value to your own visitors'
experience.

This precludes bought/sold, exchanged, or swapped links and any from 2-way,
3-way, or n-way linking schemes. Those are not natural.

You cannot control which sites link to yours, so if Google decides some
incoming links are not natural, they will be ignored.
You are in full control of your outgoing links.

If any of your outgoing links are not natural, then you should either remove
them or add rel="nofollow" so they will not pass PR. Google has massive
resources and can quite easily detect linking patterns. In blatant cases,
Google can penalize sites that were found to participate in linking
schemes.


If any links are just for traffic, that's OK, but they have to be
robot-proofed: add rel="nofollow" or serve them through JavaScript (like
AdSense ads) or a redirector page which disallows robot access. This way
they are not going to pass PR.

All this is well described in Google's Webmaster Guidelines.

All the best.

Christina
www.webado.net