Newsgroups: alt.internet.search-engines
From: "Martin Jensen" <ryt...@sleipner.dk>
Date: Wed, 31 Jan 2001 11:10:55 +0100
Local: Wed, Jan 31 2001 5:10 am
Subject: Alt.internet.search-engines faq
___ __ __ ___ ___ ____ _______________ ____ ___ ____ __ _ _ _ _
___ __ __ _|                                     |__ __ __ __ _
___ __ __ _|    ALT.INTERNET.SEARCH-ENGINES      |__ __ _ _ _ _
___ __ __ _|            CHARTER & FAQ            |__ __ __ __ _
___ __ __ _|_ ___ ____ _______________ ____ ___ _|__ __ __ __ _

TABLE OF CONTENTS

I. What is alt.internet.search-engines?
        1. Officially appropriate topics.
        2. Unofficially appropriate topics.
        3. Statement on advertising.
        4. Basic behaviour in alt.internet.search-engines.
        5. Quoting techniques.

II. Frequently asked questions.
        1. What is a portal/directory?
        2. What is a search engine?
        3. What is cloaking?
        4. What is search engine optimization?
        5. Why is search engine ranking important?
        6. What keywords or phrases should I optimize my
           web site for?
        7. Will a search engine spider my frames page?
        8. What is "robots.txt"?
        9. How can I start my own search-engine?
       10. Virtual Hosts / individual IP addresses
       11. What is this Dmoz/Open Directory Project (ODP)
           everyone rants about?

III. Other resources on the net
        1. List of Search engines.
        2. Cloaking Tutorial + FAQ
        3. Keyword Research
        4. Meta Tags
        5. Search Engine Newsletters
        6. Search Engine Optimization Newsletters
        7. Discussion Forums
        8. General Tips + Tricks
        9. Search engine spider verification service

IV. More info on this FaQ
        1. Current version and posting-frequency.
        2. Suggestions and changes.
        3. Changes in versions
        4. Contributors

Appendix 1: The original charter.

---------------------------------------------------
- - - - - - - - - - PART  I - - - - - - - - - - - -
- - - -WHAT IS ALT.INTERNET.SEARCH-ENGINES? - - - -
---------------------------------------------------
The newsgroup alt.internet.search-engines was first created on
23 Feb, 1999 by Imran Ghory as "A group for discussing search
engines".
_________________________________________________________________
I.1. Officially appropriate topics

This is the definition from the original charter of the group.

The following will be on-topic:
- Announcements relating to search engines.
- Discussion on how to use search engines efficiently.
- Discussion of getting URLs added to search engines.
- Comparisons of search engines.
- Analysing of webpages in reference to search engines.
- Questions(and answers) on search engine use.
- General discussion of search engines.

The following will be off-topic:

- Adverts
- Development of search engines. (Use comp.infosystems.search)

The following are not allowed in the group even if they are
on-topic,
- Binaries.
- Excessive cross posts(ECP).
- Posts containing or in HTML.

_________________________________________________________________
I.2. Unofficially appropriate topics.

The definition "discussion of search-engines" has unofficially
been adjusted by the group and is recommended to be read as
follows:

The purpose of this group is to discuss and debate all subjects
related to search engines and the technology associated with
them. This wide subject ranges from search engine optimization
to 'where is the submit button in AltaVista's new layout'. This
group is not intended for the posting of ads and it is suggested
that you instead use the newsgroups intended for them.

A general rule for deciding whether your post will be well
received is that it must at least include a question, or an
answer to a question, that is in some way connected to search
engines. Do not post messages like "fast search-engines
www.something.com". If you have found a good search engine and
only want to share it with others (rather than discuss it), see
the contact information in section IV.2 below; it will then be
included in the FaQ or added to one of the websites under
"Other resources on the net".

_________________________________________________________________
I.3. Statement on advertising.

There is a general consensus that off-topic advertising (i.e.
not relating to the purpose of the group) should not be allowed.

1. Nobody wants to read advertisements for products in which
   they are probably not interested.
2. People PAY to use newsgroups. Your advert wastes their money
   (and that will most likely keep them from buying your stuff
   anyway).

Because of this - please understand that alt.internet.search-
engines is NOT the right place to advertise your products. If
you are publishing a new book related to search engines, have
created a site with quality content related to search engines,
or have written an article concerning search engines, you are
welcome to post an announcement about it. Just don't pose as
someone who has 'found a great new site', and make sure the
information you're offering is indeed relevant. Also,
please don't post the same announcement several times. We'll all
see it the first time and will read it if we feel like doing so.
Posting the same announcement again will just upset people who
have already read it.

Some users' servers do not store messages for more than a
month. If you have a site or program that you think is _VERY_
relevant to people using the group - send it to the maintainer
of this FaQ and it will be included in the next version. This
FaQ is posted 3-4 times a month to the group.

_________________________________________________________________
I.4. Basic behaviour in alt.internet.search-engines

Much of the following will be common sense to most people, but
sometimes someone forgets, and people who are new to the net
need an introduction. If you are familiar with basic
netiquette, you can skip the following.

What is netiquette? Netiquette is a set of norms that you should
follow when acting online. You can choose not to follow these
rules, but ignoring them is like farting in a restaurant or
scratching your bottom before shaking hands.

- Spam and inflammatory messages: Don't join a group just to
post inflammatory messages - this upsets most system
administrators and you could lose access to the net.

- Keeping the group clear: Try to keep your questions and
comments relevant to the focus of the discussion group.

- When someone posts an off-subject note, and someone else
criticizes that posting, you should NOT submit a gratuitous note
saying "well, I liked it and lots of people probably did as well
and you guys ought to lighten up and not tell us to stick to the
subject".

You can read more about USENET rules at:
http://www.faqs.org/faqs/usenet/

_________________________________________________________________
I.5. Quoting techniques

When quoting another person, edit out whatever isn't directly
applicable to your reply. Don't let your mailing or Usenet
software automatically quote the entire body of messages you are
replying to when it's not necessary.

Take the time to edit any quotations down to the minimum
necessary to provide context for your reply. Nobody likes
reading a long message in quotes for the third or fourth time,
only to be followed by a one line response: "Yeah, me too."

When quoting part of a message (part by part) use [..] or <snip>
or similar tags where you have cut sections and parts of the
message.

Keep your message short and free of redundancy. Imagine what
happens if you do not edit your message: say it contains 3 KB of
unnecessary text. That is probably nothing, you may think - but
if 1500 people download your message, that adds up to 4500 KB!
Wasted downloads aside - it can be rather annoying trying to
follow a badly quoted discussion.

---------------------------------------------------
- - - - - - - - - - PART II - - - - - - - - - - - -
- - - - - - FREQUENTLY ASKED QUESTIONS- - - - - - -
---------------------------------------------------
At the current time the FaQ only contains some basic info on the
terms used in search engine discussions.

_________________________________________________________________
II.1. What is a portal/directory?

Portals/directories are "search engines" that require you to
submit your site before it is included in their database. They
normally have quite a number of editors who review sites before
they are accepted. Many of these are business-to-business or
otherwise focussed on a certain topic.

One of the most popular directories is http://www.yahoo.com.
Many portals are linked with search engines that provide search
results if the directory does not have any sites matching a
specific search term.

_________________________________________________________________
II.2. What is a search engine?

A search engine is a huge database that is constantly updated by
"spiders" - robots or automated programs that constantly crawl
the WWW by following links on web pages. The spiders capture
the text on the pages, and the engine uses its own algorithms to
output results when people search it. Some very popular search
engines are http://www.google.com, http://www.altavista.com and
http://www.webcrawler.com. Many search engines cooperate with
portals/directories to give their users an alternative way of
finding information instead of relying on search by keyword.

_________________________________________________________________
II.3. What is cloaking?

Cloaking is a technique used by some web sites to feed different
content to search engine spiders (see above) and to human
visitors.

This may be employed to improve ranking for a site as the output
to the search engines will usually be optimized, targeting their
specific ranking algorithms. Another major use of cloaking is to
protect web page code from being stolen by competitors. Finally,
cloaking may be required to work around browser incompatibility
issues, non-spiderable page code (e.g. graphics rich sites,
splash pages, Flash, Java, JavaScript, etc.), dynamic page
delivery, etc.

Please note that many search engines do _not_ approve of this
practice, while a few others encourage it. The disapproval
applies mainly when cloaking is employed in a misleading
("spammy") way, e.g. by redirecting surfers to content they did
not expect when clicking the displayed search result URL.
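
To make the mechanism concrete, here is a minimal Python sketch
of cloaking keyed on the User-Agent header. It is only an
illustration under stated assumptions: the list of spider
signatures, the function names and the two page bodies are all
hypothetical, and real cloaking setups may well key on IP
addresses instead.

```python
# Hypothetical substrings used to recognise spiders by User-Agent.
# Which signatures matter in practice is an assumption of this sketch.
SPIDER_SIGNATURES = ("googlebot", "scooter", "slurp")

# Placeholder page bodies, invented for illustration only.
SPIDER_PAGE = "Plain text page written for the ranking algorithms."
HUMAN_PAGE = "Graphics-rich page intended for human visitors."


def is_spider(user_agent):
    """Return True if the User-Agent string looks like a known spider."""
    ua = user_agent.lower()
    return any(signature in ua for signature in SPIDER_SIGNATURES)


def select_page(user_agent):
    """Pick the content to serve: one page for spiders, another for people."""
    if is_spider(user_agent):
        return SPIDER_PAGE
    return HUMAN_PAGE
```

A request announcing itself as "Googlebot/2.1" would receive
SPIDER_PAGE here, while an ordinary browser would receive
HUMAN_PAGE.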

_________________________________________________________________
II.4. What is search engine optimization?

Search engine optimization or search engine positioning is the
art and the science of constructing or organizing web pages in a
way to help them achieve good rankings with the search engines.
All search engines follow their own, proprietary ranking
algorithms which are continuously tweaked and improved upon.
These algorithms being treated as trade secrets, the search
engines will obviously not divulge their details. This makes
professional search engine optimization very similar to reverse
engineering: some experts will run test pages and even whole
test domains for the sole purpose of determining individual
search engines' ranking behavior. This may involve questions
like which engine values meta tags, titles, alt tags, link
popularity, click-through frequency, etc.

Hence, efficient optimization can turn into a very involved
affair requiring lots of specialist knowledge, up-to-date
information, statistical analysis, etc. The more competitive the
WWW becomes, the harder it gets to achieve decent rankings in
those areas where many sites are vying for attention.

_________________________________________________________________
II.5. Why is search engine ranking important?

Surveys and studies have shown that surfers searching the engines
for keywords or phrases will typically click through to the
sites featured highest. Rankings on pages one to three account
for approx. 90% of all search engine generated user traffic.
What this boils down to is that your web site will not generate
any traffic worth mentioning if it ranks lower than (typically)
the top 30. So if you want your site to be known and to draw
lots of visitors, a good ranking with the major search engines
is crucial.

_________________________________________________________________
II.6. What keywords or phrases should I optimize my web site for?

Regardless of whether you have a commercial, non-profit or
amateur web site, picking the keywords or search phrases to
optimize your site for is crucial. A frequent mistake among
webmasters is to gauge the popularity of keywords through their
own tunnel vision of what people should be interested in.
Luckily, many search engines (major and minor) offer real-time
search monitoring on special pages (so-called "voyeur" functions
or pages). There is also an abundance of databases of real-life
search phrases (both free and commercial) available on the net.
Finally, you can make use of special software which can help you
automate the process. For a fairly extensive overview of
real-life keyword research resources see "Keyword Research" in
the resources section below.

_________________________________________________________________
II.7. Will a search engine spider my frames page?

They will if you link all your subpages from the text within the
noframes tag. However, the engine will not index your frameset,
only each single page. This means that users entering your site
from a search result will most likely NOT load the frameset.

You can use JavaScript to check that the frameset is loaded.
However, that presents two problems:

1. Most such scripts do not work very well.
2. The client-side redirection might get your page banned from
   the search engine.

From an SEO point of view (not a page design one), it is
recommended that you do not employ frames. If you choose to do
so, it is highly recommended that you also put navigation within
each framed page so the user can navigate without the frameset.

_________________________________________________________________
II.8. What is "robots.txt"?

The Robots Exclusion Protocol is a method that allows you to
tell visiting spiders what to index and what to leave alone. You
can exclude a particular spider or all spiders (that follow the
standard) from your entire site, from particular directories, or
from particular files.

- Should I create a robots.txt file?
  Only if you want crawlers to stay away from your site (or parts
  of it such as password restricted areas, graphics directories,
  etc.)

- Can I leave the robots.txt blank?
  Yes, but that will cause some spiders to leave without indexing.

- What should my robots.txt look like?
  Check here:
  http://info.webcrawler.com/mak/projects/robots/exclusion.html
  as this page features links to relevant sites.

- Can I prevent indexing by other means than robots.txt?
  Yes, you can use: <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
  in your header. However, not all robots respect this.
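
As an illustration of the exclusion rules described above, the
following Python sketch feeds a small robots.txt to the standard
library's urllib.robotparser and asks which URLs a spider may
fetch. The spider names ("BadBot", "SomeSpider"), the directory
names and the example.com URLs are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one spider is shut out of the whole site,
# all other spiders are kept out of two directories only.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /images/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)


def may_fetch(spider, url):
    """Return True if the named spider is allowed to fetch the URL."""
    return PARSER.can_fetch(spider, url)
```

With these rules, may_fetch("SomeSpider", ...) is true for the
site's front page but false for anything under /private/, and
"BadBot" is refused everywhere. Note that this only describes
what a well-behaved spider should do; nothing forces a spider to
honour the file.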

_________________________________________________________________
II.9. How can I start my own search-engine?

Robots (also known as spiders, wanderers, worms, crawlers and gatherers)
follow links from one web page to another. They work with indexing code
to store data for later searching.

There is a good deal of free open source code available -- you don't have
to start from scratch. You can find a wide range of search engines in the
programming language best suited for your needs at:

http://www.searchtools.com/robots/robot-code.html
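
To show how a robot and its indexing code fit together, here is a toy
sketch in Python. It is not taken from any of the packages listed at the
URL above: the class and function names, the in-memory "web" of two pages
and the example.com URLs are all invented for illustration, and a real
spider would also need politeness delays, robots.txt handling and error
handling.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(fetch, start_url, max_pages=100):
    """Follow links breadth-first and build a word -> set-of-URLs index.

    `fetch` is any function mapping a URL to its HTML text (or None).
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched += 1
        # Indexing step: strip tags crudely, then record every word.
        text = re.sub(r"<[^>]+>", " ", html)
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
        # Robot step: queue every link not seen before.
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical two-page web held in memory instead of on the network.
PAGES = {
    "http://www.example.com/":
        '<html><body>Welcome <a href="about.html">About us</a></body></html>',
    "http://www.example.com/about.html":
        '<html><body>Spiders follow links. <a href="/">Home</a></body></html>',
}

INDEX = crawl(PAGES.get, "http://www.example.com/")
```

Searching is then a dictionary lookup: INDEX.get("spiders") returns the
set of pages on which that word was found.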

_________________________________________________________________
II.10. Virtual Hosts / individual IP addresses

It is a common problem that search engines will occasionally index one site
and redirect to another. Usually this issue relates to problems with the
HTTP/1.1 standard.

The World-Wide Web Consortium strongly recommends that web servers use
virtual hosts, so as not to waste additional IP addresses simply for Web
hosting. This means that hundreds of domains can reside on the same IP
address.

The problem results from the fact that not all Search Engines honor the
HTTP/1.1 standard which allows for this particular implementation, or, in
rare instances, that the web hosting services have misconfigured their
servers.

Avi Rappoport has done research showing that AltaVista, Excite, FAST,
Google, Northern Light, Go (the engine formerly known as Infoseek), etc. do
not have any problems with this at all.

The only spiders that failed to send the proper headers were MOMspider/1.10
libwww-perl/0.40 and PerlMan Surf; additionally, similar problems with the
French search engine Voila have been reported.

How can you resolve this problem? Simply put, leave your current web
hosting service if they fail to address the issue, or get an individual IP
address. Contacting the search engines directly might also produce results.

This problem should evaporate in 2001 at the very latest, as individual IP
addresses are getting ever more rare and the Search Engines simply cannot
avoid adapting to the standards of the World-Wide Web Consortium.
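
The mechanism behind the problem can be sketched in a few lines of Python.
With virtual hosts the server sees only the shared IP address, so it must
rely on the Host header of the request to decide which site is wanted. The
site names, page bodies and function names below are hypothetical, and the
"server" is just a function working on request text, not a real web server.

```python
# Two hypothetical sites sharing one IP address.
SITES = {
    "www.site-a.example": "Front page of site A",
    "www.site-b.example": "Front page of site B",
}

# Assumption: the server falls back to this site when no Host is given.
DEFAULT_SITE = "www.site-a.example"


def build_request(host, path="/", send_host_header=True):
    """Build the text of a GET request, with or without a Host header."""
    if send_host_header:
        lines = ["GET %s HTTP/1.1" % path, "Host: %s" % host]
    else:
        # An old-style spider: no Host header, so the name is lost.
        lines = ["GET %s HTTP/1.0" % path]
    return "\r\n".join(lines) + "\r\n\r\n"


def pick_site(request):
    """Return the page for the site named in the Host header, if any."""
    for line in request.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" suffix before the lookup.
            host = value.strip().lower().split(":")[0]
            return SITES.get(host, SITES[DEFAULT_SITE])
    return SITES[DEFAULT_SITE]
```

A spider that asks for site B without sending the Host header is handed
site A's page, which it then indexes under site B's URL. That is the
"index one site and redirect to another" effect described above.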

_________________________________________________________________
II.11. What is this Dmoz/Open Directory Project (ODP) everyone rants about?

Throughout the last year there has been a lot of discussion about the
Open Directory Project (dmoz.org) that delivers directory data to some of
the major search engines. It is currently the largest index on the web,
with more than 2 million unique sites and almost no dead links.

There are basically two camps participating in this discussion:

1. People defending the ODP.
2. People that hate the ODP.

There are lots of problems under discussion - the following is intended as a
short non-partisan summary.

1.
The directory is owned by AOL/Warner/Netscape, one of the major players in
the internet market, but the directory is driven entirely by volunteers that
do not get paid for their efforts. Some people consider this an "abuse" of
volunteers for commercial purposes.

2.
The Dmoz/ODP people are apparently constantly seeking new editors, but
rejecting 90% of all applications. It is known that people who are very
qualified for a particular category (that does not have any editor yet) are
being rejected with the same standard reply. Many people have experienced
their applications not getting any response at all.

3.
Dmoz fires many editors without giving reasons. This contention should be
taken with a grain of salt as representing only one of two sides to this
story. Editors have been known to abuse their privileges and have been
fired for this, while others seem to have been fired without apparent
reason.

4.
ODP is quite inaccessible. They don't publish any phone numbers, their
address seems to be treated as a state secret and is virtually impossible to
obtain. Some people consider this a bad thing, while others disagree. It is
a common occurrence that you won't get any feedback from editors explaining
why your submissions have been turned down and what you could do to rectify
the situation - even if you have been indexed in all the other major
directories.

5.
The Editors have incredible power. If they manage to get in charge of a
category in which they themselves have a vested interest (typically a web
site of their own), they can "cool" their site, change the descriptions
of the other sites in that category and keep competitors from getting their
pages indexed at all. It is an established fact that this happens, but it is
also well known that the Dmoz administration are working on preventing this.

Needless to say, many people nurture hard feelings towards Dmoz, either for
not getting any feedback on their site submissions (e.g. an explanation for
not being indexed) or on their applications for editorship, or for getting
the same boilerplate reply that everyone gets.

On the other side there is the "pro-Dmoz" front consisting either of
established editors or people who are simply in favour of the index.

-------------------------------------------------------------

---------------------------------------------------
- - - - - - - - - - PART III. - - - - - - - - - - -
- - - - - - OTHER RESOURCES ON THE NET- - - - - - -
---------------------------------------------------
This section contains links to other resources on the net.

___________________________________________
III.1. Search engines.

Search Engines:
- http://www.altavista.com/
- http://www.alltheweb.com/
- http://www.directhit.com/
- http://www.excite.com/
- http://search.go.com/
- http://www.google.com/
- http://www.goto.com/
- http://www.hotbot.com/
- http://www.lycos.com/
- http://www.northernlight.com/
- http://raging.com/

Portals/directories
- http://www.dmoz.com/
- http://www.looksmart.com/
- http://www.snap.com/
- http://www.yahoo.com/

Submission URLs
- http://searchenginebase.com/sbsumissions.html

You can find a comprehensive list of all major and many minor
search engines at: http://www.searchenginebase.com/

III.2. Cloaking Tutorial + FAQ
- http://fantomaster.com/fafaqcloak1.html
- http://www.spiderhunter.com/

III.3. Keyword Research
- http://fantomaster.com/fasmbres03.html#voyeur

III.4. Meta Tags
- The Definitive Resource: http://vancouver-webpages.com/META/

III.5. Search Engine Newsletters
Newsletters Featuring Search Engine News (in alphabetical order)
- Actu Moteurs (in French): http://www.abondance.com/
- Google Friends Newsletter: http://www.google.com/
- Pay Per Click Search Engines Update:
  http://PayPerClickSearchEngines.com
- Search Engine Guide: http://www.searchengineguide.com/

III.6. Search Engine Optimization Newsletters
Newsletters on Search Engine Optimization (in alphabetical order)
- fantomNews: http://fantomaster.com/fantomnews.html
- RankWrite: http://www.rankwrite.com/
- Search Engine News: http://www.searchengine-news.com
- Search Engine Optimization and User Interface:
  http://www.cre8pc.com/seui.html
- Search Engine Quarterly: http://www.searchengineworld.com/
- Search Engine Watch: http://www.searchenginewatch.com/
- The Spider Report: http://spider-food.net/

III.7. Discussion Forums (in alphabetical order)
- AIM-Pro: http://www.aim-pro.com/cgi-bin/Ultimate.cgi
- Market Position Talk: http://www.marketpositiontalk.com/forums/
- SearchEngineBase Forum:
  http://searchenginebase.com/discussions.html
- SearchEngine Discussion Forum:
  http://searchenginediscussion.com/cgi-bin/ubb/Ultimate.cgi
- SearchEngineForums: http://www.searchengineforums.com/
- SearchEngineMatrix Forum: http://www.searchenginematrix.com/
- SearchEngineWorld Forum: http://www.webmasterworld.com/index.cgi

III.8. General Tips + Tricks (in alphabetical order)
- http://www.aim-pro.com/
- http://fantomaster.com/
- http://www.searchengineworld.com/
- http://spider-food.net/
- http://www.spiderhunter.com/

III.9. Search engine spider verification service
- http://spiderscouts.com/

---------------------------------------------------
- - - - - - - - - -PART IV. - - - - - - - - - - -
- - - - - - - MORE INFO ON THIS FAQ - - - - - - - -
---------------------------------------------------
Current release: 1.07

_________________________________________________________________
IV.1. Current version and posting-frequency.

The current version of this document can always be found at the
following:

  WWW     http://searchenginebase.com/aise-charter-faq.html
          http://search.mermaidconsulting.com/altinternetsearchengines.txt
          http://www.geocities.com/ranktips/faq.htm

  USENET  Posted three to four times per month to
          alt.internet.search-engines

_________________________________________________________________
IV.2. Suggestions and changes.

Suggestions and changes should be posted to
alt.internet.search-engines with a title of
"FAQ-Suggestion"
or
sent by email to the current maintainer:
fantomasterNOS...@NOSPAMfantomaster.com
(with frequently asked questions)
or:
martinNOS...@NOSPAMmermaidconsulting.com
(the rest)

_________________________________________________________________
IV.3. Changes in versions

1.07:
Edited I.
Added II.10 and II.11.
Added III.9.
Edited IV.

1.06:
Edited III.1.

1.05:
Removed: III.9. (due to update matters)

1.04:
Added:  III.9. (thanks Ash)

1.03:
Added:  II.9 (thanks Avi Rappoport)

1.02:
Proofread and typos removed. (Thanks, Dirk!)

1.01:
Added:  II.4 -> II.6
                III.2 -> III.8
Edited:         I.4.

_________________________________________________________________
IV.4. Contributors

Contributors include: Imran Ghory, Rupert Bowling,
Ashley Williams, Lauri Harpf, Avi Rappoport, Uksitesubmit, Ash Williams,
James Cox, Dirk, Ralph aka fantomaster, Martin Rytter Jensen, and many
others.

_________________________________________________________________
Appendix 1: The original charter.

Charter:

A group for discussing search engines.

The following will be on-topic:

Announcements relating to search engines.
Discussion on how to use search engines efficiently.
Discussion of getting URLs added to search engines.
Comparisons of search engines.
Questions(and answers) on search engine use.
General discussion of search engines.

The following will be off-topic:

Adverts
Development of search engines.(Use comp.infosystems.search)

The following are not allowed in the group even if they are
on-topic,

Binaries.
Excessive cross posts(ECP).
Posts containing or in HTML.

Justification:

Conducting a DejaNews search on the term "Search Engine" it
turned up exactly 20461 messages between the period 1st January
1999 and the 1st February 1999, showing an average of 660
messages a day. These messages were widespread over several
hundred newsgroups, but the largest concentration of them were
in comp.infosystems.www.* and alt.internet.* hierarchies.

The group comp.infosystems.search is not suitable for this
purpose as it focuses on the development and administration
side of search engines.

Proponent: Imran Ghory <Imr...@btinternet.com>

_________________________________________________________________
<END OF FILE>

