
how to prevent duplicate entries


neat...@yahoo.com

Jul 7, 2005, 5:24:11 PM
to gsitec...@googlegroups.com
Ok, I have a web store with products. The products can be found with
many different search strings, and the crawler is finding each product
under all the possible search strings, so I may get 10 listings for each
product.

Is there a way of adding a wildcard after a certain DROP string?
For example, all searches begin with &page=,
but from there the URL is dynamic and will be listed differently depending
on the search the customer used.
example:
&page=1&from=v&srchby=title&keyword=&section=sv&category=All

If the crawler stopped at the &page= then it would recognize when it
found duplicates and, I think, not list them.
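[Editor's note: the idea described above can be sketched in a few lines of Python. This is only an illustration of the truncate-and-compare approach, not a GSiteCrawler feature; the URLs are made up.]

```python
# Treat everything before the "&page=" marker as the canonical key
# and keep only the first URL seen for each key.
def dedupe_at_marker(urls, marker="&page="):
    seen = set()
    unique = []
    for url in urls:
        key = url.split(marker, 1)[0]  # everything before the marker
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

urls = [
    "http://example.com/background_info.php?id=683&page=1&srchby=title",
    "http://example.com/background_info.php?id=683&page=1&srchby=label",
    "http://example.com/background_info.php?id=684&page=1&srchby=title",
]
# Only one URL per product id survives.
print(dedupe_at_marker(urls))
```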

Any ideas?

JohnnyM

Jul 7, 2005, 6:44:22 PM
to gsitec...@googlegroups.com
I'm not sure if I understand you correctly - are you saying that you
want to drop all URLs with "&page=" in them? If so, you can do this
quite easily with the program - just add "&page=" to the table in the
tab "ban URLs". That way it won't pick up any links containing that string.

If you do want certain queries to work (i.e. in order to find all your
products), you can add these specifically to the URL-List (un-check
"Include" if you don't want them in the sitemap-file, but keep "Crawl"
checked).

Is that about what you were looking for? (If not, it would be easiest
if you could send me your URL, then I can see how to optimize the
settings for it). :-)

Best regards
John

neat...@yahoo.com

Jul 8, 2005, 8:01:20 AM
to gsitec...@googlegroups.com
Thanks JohnnyM, that did the trick. I just did not know how to use that
feature.

J.M.H.

Jul 7, 2005, 7:03:04 PM
to gsitec...@googlegroups.com
We have around 500 products, so checking them one at a
time or unchecking the duplicates is not going to
work.

All of the product listings have the "&page=" so
banning the urls defeats the purpose.

I will email you the URLs and maybe that will be
clearer than I am.

Well here let's try this:

http://XXXXXXXXX.com/background_info.php?id=683&ptitle=Celebration+-+Loopable&page=1&from=v&srchby=title&keyword=C%2A&section=sv&category=All

and

http://XXXXXXXXX.com/background_info.php?id=683&ptitle=Celebration+-+Loopable&page=1&from=v&srchby=title&keyword=&section=sv&category=All

and

http://XXXXXXX.com/background_info.php?id=683&ptitle=Celebration+-+Loopable&page=1&from=v&srchby=label&keyword=celebration&section=sv&category=All

These are all the same product. Up to the &page= parameter the
URL is the same, and if Google lists it that way
there should be no duplicates.

JohnnyM

Jul 7, 2005, 7:10:16 PM
to gsitec...@googlegroups.com
Ah, ok, I see (not everything - the URL would still be nice; if you
could email it to me I can confirm). What the program can also do is
remove the trailing parameters (from, srchby, keyword, section,
category). You can do that by putting these into the table in the tab
"Remove Parameters". That way, when the crawler gets to one of these
links, it will instead go to
"http://XXXXXXXXX.com/background_info.php?id=683&ptitle=Celebration+-+Loopable&page=1"
- which, if I understand you correctly, will go to the product page.
This will also be the URL saved in the sitemap file.
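[Editor's note: the parameter-removal behavior described above can be sketched with Python's standard library. This illustrates the general technique of stripping named query parameters, not GSiteCrawler's actual implementation; the example URL is simplified.]

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters to drop, mirroring the "Remove Parameters" table.
REMOVE = {"from", "srchby", "keyword", "section", "category"}

def strip_params(url, remove=REMOVE):
    parts = urlsplit(url)
    # keep_blank_values=True so empty parameters like "keyword=" are
    # seen (and then filtered out) rather than silently dropped.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in remove]
    return urlunsplit(parts._replace(query=urlencode(kept)))

url = ("http://example.com/background_info.php?id=683"
       "&ptitle=Celebration&page=1&from=v&srchby=title"
       "&keyword=&section=sv&category=All")
print(strip_params(url))
# http://example.com/background_info.php?id=683&ptitle=Celebration&page=1
```

All three search-result variants of a product then collapse to the same URL, which is what makes the sitemap duplicate-free.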

Does that help? Otherwise feel free to send me the link by mail
(softplus at gmail.com), I'm certain we can find a way :-)

John