Hi,
The problem is not so much the sitemap as what any robot gets
to crawl and find on the site.
You can and should disallow uris or uri prefixes for basket, search,
account etc. in your robots.txt file.
You can also disallow some uris or uri prefixes by pattern matching,
e.g. when the uri includes Attributes or Action or FootSteps, etc.
You seem to be using 2 types of urls: without query strings (as a
subfolder structure) and with query strings, based on /Merchant5/
merchant.mvc.
Are there any that overlap?
Let's see what needs to be disallowed (using prefix and pattern
matching, where * matches any string of characters). The * after the ?
is there so each rule matches wherever the parameter appears in the
query string:
User-agent: *
Disallow: /Merchant5/merchant.mvc?*Screen=BASK
Disallow: /Merchant5/merchant.mvc?*Screen=LOGN
Disallow: /Merchant5/merchant.mvc?*Screen=SRCA
Disallow: /Merchant5/merchant.mvc?*Action
Disallow: /Merchant5/merchant.mvc?*Attributes
Disallow: /Merchant5/merchant.mvc?*Quantity
Disallow: /Merchant5/merchant.mvc?*FootSteps
Disallow: /Merchant5/merchant.mvc?*Screen=CTGY
There may be others that need disallowing, so you can tweak it after
you have run through it once.
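
Before uploading, you can check which urls the rules above would catch. Here is a minimal Python sketch, assuming Google-style matching (* is any run of characters, a trailing $ anchors the end, everything else is a literal prefix match). As far as I know Python's built-in urllib.robotparser does not handle the * wildcard, so the matcher is written by hand. The function names and the sample urls (Store_Code=X, Product_Code=ABC) are made up for illustration.

```python
import re

# Rules copied from the robots.txt above.
RULES = [
    "/Merchant5/merchant.mvc?*Screen=BASK",
    "/Merchant5/merchant.mvc?*Screen=LOGN",
    "/Merchant5/merchant.mvc?*Screen=SRCA",
    "/Merchant5/merchant.mvc?*Action",
    "/Merchant5/merchant.mvc?*Attributes",
    "/Merchant5/merchant.mvc?*Quantity",
    "/Merchant5/merchant.mvc?*FootSteps",
    "/Merchant5/merchant.mvc?*Screen=CTGY",
]


def rule_to_regex(rule):
    """Turn one Disallow value into a regex (assumed Google-style semantics)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal pieces ('?' and '.' are literals here), '*' becomes '.*'.
    pieces = [re.escape(piece) for piece in rule.split("*")]
    return re.compile("^" + ".*".join(pieces) + ("$" if anchored else ""))


def is_disallowed(path_and_query, rules=RULES):
    """True if any rule matches the url path plus query string."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)


if __name__ == "__main__":
    # Hypothetical urls, just to exercise the rules.
    for url in (
        "/Merchant5/merchant.mvc?Screen=BASK&Store_Code=X",
        "/Merchant5/merchant.mvc?Store_Code=X&Screen=BASK",
        "/Merchant5/merchant.mvc?Screen=PROD&Store_Code=X&Product_Code=ABC",
    ):
        print(url, "->", "blocked" if is_disallowed(url) else "allowed")
```

Run your crawl list through is_disallowed and see what is left. Matching is case sensitive, so Screen=bask would not be caught by the Screen=BASK rule.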
I would not disallow the pages mentioned in #5, at least not right
away. I feel they add needed text content. Maybe disallow Contact Us
if you don't want to be indexed on that. In that case, add this to the
series of disallowed items in robots.txt:
Disallow: /SPTS
For the sake of not getting penalized by Google and other search
engines, you should revise the page
http://www.gspirit.com/LINK.html
and remove hints of link exchanges and reciprocal links.
You can add rel="nofollow" to various links you list there and
elsewhere on the site which are part of any link exchange or link
swapping initiative.
Or you can disallow that page altogether (Disallow: /LINK ). That of
course means your page doesn't pass any link value to any outgoing
links. However, that is the idea behind Google discouraging link
exchanges anyway.
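
If you go the rel="nofollow" route, it helps to see which outgoing links on the page are still followed. A rough Python sketch using only the standard library; the class and function names are my own, and it assumes you have already saved or fetched the page's HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FollowedLinkFinder(HTMLParser):
    """Collects links to other hosts that do not carry rel="nofollow"."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        host = urlparse(href).netloc
        rel_values = (attributes.get("rel") or "").lower().split()
        # Relative links have no host and are treated as internal.
        if host and host != self.own_host and "nofollow" not in rel_values:
            self.followed.append(href)


def followed_external_links(html, own_host="www.gspirit.com"):
    finder = FollowedLinkFinder(own_host)
    finder.feed(html)
    return finder.followed


if __name__ == "__main__":
    # Made-up sample markup, not taken from the real page.
    sample = (
        '<a href="http://example.com/a">A</a>'
        '<a rel="nofollow" href="http://example.org/b">B</a>'
        '<a href="/LINK.html">C</a>'
    )
    print(followed_external_links(sample))
```

Anything it prints is an outgoing link that still passes link value, so those are the ones to review.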
And don't let GSiteCrawler include images and media files (audio,
video, flash). You can include text-based files
like .pdf, .doc, .xls if applicable.
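
If you end up cleaning an exported url list by hand instead, the same filter is a few lines of Python. This is only a sketch: the extension list is my guess at what counts as image and media files on your site, and the function name is made up.

```python
import posixpath
from urllib.parse import urlparse

# Assumed list of image/audio/video/flash extensions; adjust to your site.
MEDIA_EXTENSIONS = {
    ".gif", ".jpg", ".jpeg", ".png",
    ".mp3", ".wav", ".avi", ".mpg", ".swf",
}


def keep_in_sitemap(url):
    """False for image and media files, True for pages and text documents."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in MEDIA_EXTENSIONS


if __name__ == "__main__":
    # The image and pdf urls below are hypothetical examples.
    urls = [
        "http://www.gspirit.com/LINK.html",
        "http://www.gspirit.com/images/shirt.JPG",
        "http://www.gspirit.com/docs/sizes.pdf",
    ]
    print([u for u in urls if keep_in_sitemap(u)])
```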
On Jun 30, 12:48 pm, BC <bcan...@GSpirit.com> wrote:
> I have for you a few REAL NEW-BEE questions.
>
> My site gspirit.com consist of 316 products and 50 Categories/Sub
> Categories.
> I did a GSiteCrawl and came up with 4027 URLS, 4026 to be included,
> 3337 to be crawled, and 65 aborted.
>
> Now this seems a little extreme because if I do the math, 316 products
> + 50 categories + 316 img files = 682 links.
>
> So, My Question Is.....
> What data do I really want to keep in my site map? i.e. (I've noticed
> the following)
> -----------------------------------------------------------------------------------------
> 1.) The follow 3 links all go to the same page...
>
> a) http://www.gspirit.com/Merchant5/merchant.mvc?Screen=PROD&Store_Code=...
>
> b) http://www.gspirit.com/Merchant5/merchant.mvc?Screen=PROD&Store_Code=...
>
> c) http://www.gspirit.com/Merchant5/merchant.mvc?Screen=PROD&Store_Code=...
>
> One of them is a "direct link", another was created by my "footstep
> links" and the other appears to be created by the "attribute link".
> Should I remove links "b & c" and simple keep link "a", the direct
> link? ?????
> ----------------------------------------------------------------------------------------------
>
> 2.) In another example I have one category "Christian T Shirts" that
> has 70 products, which are setup to display "10 products" per page or
> "view all". When crawled I end up with 8 links. Would it be best
> from an SEO perspective to eliminate all but the "view all" link or
> should I keep all 8 since they do "Theoretically" point to different
> pages?
> --------------------------------------------------------------------------------------------------
>
> 3.) Should I remove all of the references to image files in the site
> map, i.e. gif, jpg, etc.?
> Do they offer any value?
>
> ----------------------------------------------------------------------------------------------
> 4.) Also, would it be best to remove all references to pages like
> "basket", "search", affiliate links, "account"?
>
> ------------------------------------------------------------------------------------------------
>
> 5.) And what would you recommend for links such as "contact us",
> "customer service", "about us", and "FAQ's"?
> -------------------------------------------------------------------------------------------------