The Aborted URLs statistics list a few 404 Not Found errors, some of
which are correct: those pages are not there and I will remove the links to
them.
<pre><code>
<div id='navigation'>
  <ul class='level1'>
    <li><a href='home.html'>Home</a></li>
    <li><a href='about.html'>About us</a></li>
    <li class='submenu'>Services
      <ul class='level2' id='sub1'>
        <li><a href='design.html'>Web site design</a></li>
        <li><a href='hosting.html'>Web site hosting</a></li>
        <li><a href='search.html'>Search engine submission</a></li>
      </ul>
    </li>
    <li><a href='contact.html'>Contact us</a></li>
  </ul>
</div>
</code></pre>
So, the above is part of a tutorial. The URLs do not go anywhere, and
they are not active links in any way. Your program, I guess, crawls for
URLs and ignores the fact that they may be code and not actual URLs.
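To make that guess concrete, here is a minimal Python sketch of a crawler that finds links by plain text matching instead of parsing the HTML. The pattern, the function name and the sample string are assumptions for illustration only, not GSiteCrawler's actual code. A scan like this would report the URL even though the markup is escaped and only displayed as code.

```python
import re

# Hypothetical pattern: a naive text scan for href='...' or href="..."
HREF_PATTERN = re.compile(r"""href\s*=\s*['"]([^'"]+)['"]""", re.IGNORECASE)

def naive_links(page_source):
    """Return every href value found by plain text matching."""
    return HREF_PATTERN.findall(page_source)

# A code sample as it might be stored in the page source, with < and > escaped.
sample = (
    "<pre><code>&lt;li&gt;&lt;a href='home.html'&gt;Home&lt;/a&gt;&lt;/li&gt;"
    "</code></pre>"
)

print(naive_links(sample))  # ['home.html']
```

The escaped sample contains no real anchor tag, yet the text scan still returns home.html, which would then be requested and fail with a 404.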
Two questions: first, can you sort it? :) Second, does the actual
Google crawler act in this way? (I would hope not.)
Thanks
Gav...
http://www.minitutorials.com/firewalls/%5C%5Cwww.zonelog.com
error code: 404 (not found), linked from page(s):
http://www.minitutorials.com/firewalls/utb_part3_4.shtml
http://www.minitutorials.com/forums/favicon.ico
error code: 404 (not found), linked from page(s):
http://www.minitutorials.com/forums/index.php
http://www.minitutorials.com/issue_7/tutorials/PHPTut2.zip
error code: 404 (not found), linked from page(s):
http://www.minitutorials.com/webdesign/php/introtophp_2.php
http://www.minitutorials.com/issue_7/tutorials/simple_2.php
error code: 404 (not found), linked from page(s):
http://www.minitutorials.com/webdesign/php/introtophp_2.php
http://www.minitutorials.com/rss/feeds/itgfeed.xml
error code: 404 (not found), linked from page(s):
http://www.minitutorials.com/rss_xml/rss_display.php
5 broken link(s) reported
Also a whole slew of broken named anchors.
You are also mixing www and non-www links on your site. Or perhaps some
are stated absolutely as http://minitutorials.com...... whereas the
others are relative, so they end up being www.minitutorials.com/.......
when starting from www.minitutorials.com/ .
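A small Python sketch of how that mix can arise, using the standard urljoin function. The page URL is taken from the report above; the absolute non-www link is a made-up example, since I am only guessing at how the links are written on the site.

```python
from urllib.parse import urljoin, urlsplit

def resolved_host(page_url, link):
    """Resolve a link against the page it appears on and return the host."""
    return urlsplit(urljoin(page_url, link)).netloc

page = "http://www.minitutorials.com/forums/index.php"

# A relative link inherits the host of the page it is on.
print(resolved_host(page, "favicon.ico"))                 # www.minitutorials.com

# An absolute link keeps whatever host was typed into it (hypothetical link).
print(resolved_host(page, "http://minitutorials.com/"))   # minitutorials.com
```

A link checker that starts from the www address therefore sees both hosts whenever absolute non-www links are mixed with relative ones.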
We can assume that whatever Xenu sees, Google sees the same way.
However, I think I need to rephrase my original question, as you never
answered what I was asking. My fault.
GSiteCrawler is reporting broken links in the 'Aborted URLs' list.
These links are:
Failed at 12/12/2006 18:25:
URL:
http://www.minitutorials.com/webdesign/css/home.html
Error: HTTP-Error 404 Not Found
Linked from:
http://www.minitutorials.com/webdesign/css/create_css_menu_system2.php
Failed at 12/12/2006 18:25:
URL:
http://www.minitutorials.com/webdesign/css/about.html
Error: HTTP-Error 404 Not Found
Linked from:
http://www.minitutorials.com/webdesign/css/create_css_menu_system2.php
Failed at 12/12/2006 18:25:
URL:
http://www.minitutorials.com/webdesign/css/design.html
Error: HTTP-Error 404 Not Found
Linked from:
http://www.minitutorials.com/webdesign/css/create_css_menu_system2.php
Failed at 12/12/2006 18:25:
URL:
http://www.minitutorials.com/webdesign/css/hosting.html
Error: HTTP-Error 404 Not Found
Linked from:
http://www.minitutorials.com/webdesign/css/create_css_menu_system2.php
Failed at 12/12/2006 18:25:
URL:
http://www.minitutorials.com/webdesign/css/contact.html
Error: HTTP-Error 404 Not Found
Linked from:
http://www.minitutorials.com/webdesign/css/create_css_menu_system2.php
Failed at 12/12/2006 18:25:
URL:
http://www.minitutorials.com/webdesign/css/search.html
Error: HTTP-Error 404 Not Found
Linked from:
http://www.minitutorials.com/webdesign/css/create_css_menu_system2.php
These are all within the 'content' of the 'create_css_menu_system2.php'
page.
They are not actually links. They are code placed within <code> tags,
and as such use &lt; and &gt; in place of < and > when quoting a URL,
as in &lt;a href='home.html'&gt;.
These are not real links. They don't go anywhere and, yes, the pages do
not actually exist. They're not meant to; it's code samples.
So these, IMO, should not be treated as live links, but should instead be
treated as code data and as such ignored by GSiteCrawler.
It seems Xenu and Google are interpreting these correctly, as neither
is reporting these as broken links.
Thanks
Gav...
When I include code on a page I also use <pre> ....</pre> but no
<code> ... </code> tags.
Perhaps if you use " instead of the apostrophe ' it might "fool"
the GSiteCrawler into ignoring the contents? I realize that's not the
answer you expect though.
I'd wait until John sees this however, as he'd know best what's
happening and why and might be able to fix it.
One thing you might be able to do is obfuscate a part of that text.
Instead of
href='url'
you could try
hre&#102;='url'
(I hope I got it right :-))
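The idea is to write one character of the attribute name as a numeric character reference, so that a browser still displays the same text but a plain text scan for href= no longer finds it. A rough Python sketch, assuming the crawler matches the literal text (I have not confirmed that); the helper name and the choice of which character to encode are arbitrary.

```python
import html
import re

def obfuscate(text, index):
    """Replace the character at `index` with its decimal character reference."""
    return text[:index] + "&#%d;" % ord(text[index]) + text[index + 1:]

plain = "href='url'"
hidden = obfuscate(plain, 3)           # encode the "f"

print(hidden)                           # hre&#102;='url'
print(html.unescape(hidden) == plain)   # True: the displayed text is unchanged
print(re.search(r"href\s*=", hidden))   # None: a literal text scan no longer matches
```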
John
Yes, I did the obfuscation and it works fine.
Then I did the same for code snippets of "img src..."
links using &#99; etc., but this did not work for these.
Google, Xenu and other link checkers do not have
this problem, so I am basically using this
workaround just to please GSiteCrawler, which is not
the ideal solution. I don't think I am prepared to do this,
so I think I will just ignore this part of GSiteCrawler.
Perhaps you could enhance this by using the DOM and
checking the parent container against the namespace
or similar. By checking this you'll be able to ignore
anything inside <code> tags etc.
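Something along these lines, as a rough Python sketch rather than a proposal for GSiteCrawler's own code (I don't know how it is written). It uses the standard html.parser module instead of a full DOM, and the class name, the set of skipped tags and the sample page are my own assumptions. Escaped markup arrives as text and is never treated as a tag, and real tags inside <code> or <pre> are skipped by tracking the enclosing container.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href/src values from real tags, skipping <code> and <pre> blocks."""

    SKIPPED = {"code", "pre"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of skipped containers currently open
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.depth += 1
        elif self.depth == 0 and tag in ("a", "img"):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.depth > 0:
            self.depth -= 1

def extract_links(page_source):
    parser = LinkCollector()
    parser.feed(page_source)
    parser.close()
    return parser.links

# Hypothetical page: one live link, one escaped sample, one real tag inside
# <code>, and one image.
page = (
    "<p><a href='index.php'>Forum</a></p>"
    "<pre><code>&lt;a href='home.html'&gt;Home&lt;/a&gt;"
    "<a href='about.html'>About us</a></code></pre>"
    "<img src='logo.png'>"
)

print(extract_links(page))  # ['index.php', 'logo.png']
```

With this approach neither the href nor the "img src" samples need obfuscating, because nothing inside the code container is ever queued for checking.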
Gav...