Session IDs are a problem for robots. If you can configure the
Gallery2 software to get rid of them for robots and other non-
authenticated visitors, this will be a vast improvement.
You should also disallow robots from various types of uris for
functions such as login, print, enlarge, comment, different sorting
methods, etc.
Without your website url I am guessing.
> > forum but not sure just what I can leave out of it.
For instance a url like this:
http://www.godragracing.org/gallery/main.php?g2_view=keyalbum.KeywordAlbum&g2_keyword=Joe+Newsham&g2_highlightId=6036
gets 302 redirected to
http://www.godragracing.org/gallery/main.php?g2_view=keyalbum.KeywordAlbum&g2_keyword=Joe+Newsham
Basically the last parameter in the query string, g2_highlightId=6036, gets
removed through that redirection.
But a 302 redirection has the result of preserving both urls, albeit both
with the content of the destination url.
So you have instant content duplication across 2 urls.
Multiply this by however many urls the site has.
It also makes it harder for robots to crawl if at every step they get
redirected (and that is regardless of which kind of redirection is used).
Basically for smooth crawling and indexing there should not be any kinds of
redirections in navigation.
While you certainly have many "keywords", you don't actually have that much
text content to justify all those keywords.
But that's beside the point.
The login uri should be disallowed in robots.txt. Easiest would be with a
prefix:
Disallow: /gallery/main.php?g2_view=core.UserAdmin
All images wherever they may be should have alt text.
I'm crawling your site now with GSC.
It is set to include only web page urls and nothing else - so no image urls
or any media files. They don't belong in a general web sitemap.
So far I've had to ban urls containing these items, as they produce duplicate
pages and some produce virtually empty pages (no text, just an image that's
already been shown on another page).
/gallery/main.php?g2_view=core.DownloadItem
/gallery/main.php?g2_view=core.UserAdmin
/gallery/main.php?g2_view=keyalbum.KeywordAlbum
The above simulates the corresponding robots.txt directives.
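If you want to check such directives before uploading them, here is a rough Python sketch using the standard library's urllib.robotparser. The "User-agent: *" line, the g2_subView login url and the g2_itemId url are my assumptions for illustration only; the three Disallow prefixes are the ones listed above.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt built from the three banned prefixes above.
# The "User-agent: *" line is an assumption (apply to all robots).
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/main.php?g2_view=core.DownloadItem
Disallow: /gallery/main.php?g2_view=core.UserAdmin
Disallow: /gallery/main.php?g2_view=keyalbum.KeywordAlbum
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


BASE = "http://www.godragracing.org"
parser = build_parser(ROBOTS_TXT)

# A keyword-album url taken from this thread.
blocked = BASE + "/gallery/main.php?g2_view=keyalbum.KeywordAlbum&g2_keyword=Joe+Newsham"
# Hypothetical urls, only for the test: a login url and an ordinary item page.
login = BASE + "/gallery/main.php?g2_view=core.UserAdmin&g2_subView=core.UserLogin"
allowed = BASE + "/gallery/main.php?g2_itemId=123"

for url in (blocked, login, allowed):
    # can_fetch() returns False when a Disallow prefix matches the url.
    print(parser.can_fetch("*", url), url)
```

Keep in mind these are plain prefix matches, so they only catch urls where g2_view is the first parameter in the query string.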
Also I used Remove Parameter for:
g2_highlightId
You will need to fix the software somehow to get rid of that and of the 302
redirection from urls containing that to others. No idea how you can manage
this in Gallery.
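To show what removing that parameter amounts to, here is a minimal Python sketch. It is not how GSC or Gallery does it internally; strip_params is a made-up helper name, and the sample urls are the two from earlier in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, drop=("g2_highlightId",)):
    """Return url with the named query parameters removed (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


with_highlight = (
    "http://www.godragracing.org/gallery/main.php"
    "?g2_view=keyalbum.KeywordAlbum&g2_keyword=Joe+Newsham&g2_highlightId=6036"
)
without_highlight = (
    "http://www.godragracing.org/gallery/main.php"
    "?g2_view=keyalbum.KeywordAlbum&g2_keyword=Joe+Newsham"
)

# Both urls collapse to a single entry once the parameter is dropped,
# which is the duplication the 302 redirection currently creates.
unique_urls = {strip_params(u) for u in (with_highlight, without_highlight)}
print(len(unique_urls), unique_urls)
```

This only cleans up a list of urls on your side. The site itself would still serve both urls until the software stops generating the parameter.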
Crawling is excruciatingly slow due to all the redirections.
Also even after filtering out a lot of useless urls, it's found over 5000
and still going.
Just how many pages do you think you have? Just because you have over 10000
images doesn't mean there should be that many distinct pages indexed.
--
You received this message because you are subscribed to the Google Groups
"SOFTplus GSiteCrawler" group.
To post to this group, send email to gsitec...@googlegroups.com.
To unsubscribe from this group, send email to
gsitecrawler...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/gsitecrawler?hl=en.
Christina
www.webado.net
This was exactly what I was looking for and very detailed also, thank
you. I finished the crawl after an excruciating 30 plus hrs and it
sent me 3 xml sitemaps in the standard and gzipped form. I believe it
found 467,000+ and so far indexed a little over 4,700. I know this is
wrong and felt it is hurting more than helping, just like the 302 that
G2 adds immediately for some reason, which I know Google hates.
I thought the keywords would be needed for searching the types of cars
people were looking for, and I don't think I can add any type of "alt"
to the images unless it's by descriptions only. I feel you have
definitely found the right way to go about this for me and I will
recrawl with your added parameters involved after I see what Gallery 2
has to say about the redirect.
On Dec 20, 8:32 pm, "Christina S" <web...@gmail.com> wrote:
> I don't advise rewriting urls now until and unless you implement
> simultaneously 301 redirections from old (current) urls to the new ones.
> An issue I see right away is the overabundance of 302 redirections.
>
> For instance a url like this: http://www.godragracing.org/gallery/main.php?g2_view=keyalbum.Keyword...
> gets 302 redirected to http://www.godragracing.org/gallery/main.php?g2_view=keyalbum.Keyword...
> The url is http://www.godragracing.org/gallery/main.php which in
> Christina www.webado.net
In the end there are about 10000 which I found. Seems like a lot to me even
then.
The crawl took maybe an hour or two, but I don't really remember.
Dear Christina,
--
Sincerely Mark
> > Christina www.webado.net
Good luck.
Personally my early attempts at using Gallery for anything have failed. Too
many options for me, can't see the forest for the trees LOL
I gave up long ago on that.
Sincerely Mark
--