Update on indexing blogrolls
 
From: Jeremy Hylton <jhyl...@gmail.com>
Date: Fri, 19 Dec 2008 10:25:22 -0800 (PST)
Local: Fri, Dec 19 2008 1:25 pm
Subject: Update on indexing blogrolls
I wanted to give everyone a brief end-of-the-year update on the
blogroll problem.  When we switched blogsearch to indexing the full
text of posts, we started seeing a lot more results where the only
matches for a query were from the blogroll or other parts of the page
that frame the actual post.  (There's been a lot of discussion of the
problem.  You can search for [google blogsearch] using Google
Blogsearch.)

We're in the midst of deploying a solution for this problem.  The
basic approach is to analyze each blog to look for text and markup
that is common to all of the posts.  Usually, these common elements
include the blogroll, any navigational elements, and other parts of
the page that aren't part of the post.  This approach works well for a
lot of blogs, but we're continuing to improve the algorithm.  The
search results should ignore matches that only come from these common
elements.  The indexing change to implement it is deployed almost
everywhere now.

We expect users will continue to see some spurious results, but many
fewer than before.  I tried a search for my own name, which does
appear in a few blogrolls, and all the results looked good.  If you
are still seeing blogroll hits, the problem is most likely caused by
our failure to analyze a particular blog correctly.  Feel free to
follow up with examples in private email or in this forum.

Jeremy Hylton
Google Blogsearch
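
For readers who want a concrete picture of the approach described in
this post, here is a minimal sketch in Python.  It is not Google's
implementation, which the post does not describe in detail.  It assumes
each page of a blog has already been split into text blocks; the
function names, the min_fraction parameter, and the sample blog are all
hypothetical.

```python
from collections import Counter


def find_common_blocks(pages, min_fraction=1.0):
    """Return text blocks that appear on at least min_fraction of the pages."""
    counts = Counter()
    for blocks in pages:
        # Count each block once per page, however often it repeats there.
        counts.update(set(blocks))
    threshold = min_fraction * len(pages)
    return {block for block, n in counts.items() if n >= threshold}


def matches_post(query, blocks, common):
    """True if the query occurs in post-specific text, not only in the frame."""
    q = query.lower()
    return any(q in b.lower() for b in blocks if b not in common)


# Hypothetical blog: three pages that share a blogroll and an archive link.
pages = [
    ["Blogroll: Alice Example, Bob Example", "Archives", "Notes on a new release"],
    ["Blogroll: Alice Example, Bob Example", "Archives", "Alice Example gave a talk"],
    ["Blogroll: Alice Example, Bob Example", "Archives", "Holiday photos"],
]
common = find_common_blocks(pages)

for blocks in pages:
    # Only the second page mentions the name in the post itself.
    print(matches_post("alice example", blocks, common))
```

Exact comparison like this would miss a blogroll whose entries differ
slightly from page to page, so a real detector presumably needs fuzzier
matching than this sketch uses.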


 
From: tamar <puntr...@gmail.com>
Date: Fri, 26 Dec 2008 06:34:41 -0800 (PST)
Local: Fri, Dec 26 2008 9:34 am
Subject: Re: Update on indexing blogrolls
Curious - around the same time of the initial report, I started
getting Google Alerts with blogroll links.  If anything, it's become
*more* common and not less common lately.  Does the change you write
about, Jeremy, impact Google Alerts?

If not, perhaps someone should take a look.

Thanks.

On Dec 19, 1:25 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:


 
From: Kyle_Texas <Reiko.Admi...@gmail.com>
Date: Sun, 28 Dec 2008 20:10:54 -0800 (PST)
Local: Sun, Dec 28 2008 11:10 pm
Subject: Re: Update on indexing blogrolls
Tamar,

It has become even more common.  If Google Blog Search isn't finding
these blogroll hits, it is finding spam.  In the last 3 days, I have
seen exactly ONE result which was not a result from the blogroll or a
SPLOG.

On Dec 26, 8:34 am, tamar <puntr...@gmail.com> wrote:


 
From: Jeremy Hylton <jhyl...@gmail.com>
Date: Mon, 29 Dec 2008 08:35:49 -0800 (PST)
Local: Mon, Dec 29 2008 11:35 am
Subject: Re: Update on indexing blogrolls
On Dec 28, 11:10 pm, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:

> Tamar,

> It has become even more common.  If Google Blog Search isn't finding
> these blogroll hits, it is finding spam.  In the last 3 days, I have
> seen exactly ONE result which was not a result from the blogroll or a
> SPLOG.

Can you tell me the specific queries that are showing bad results?
Also, is the problem specific to alerts or do you see them in regular
blogsearch results, too?

Jeremy


 
From: Kyle_Texas <Reiko.Admi...@gmail.com>
Date: Mon, 29 Dec 2008 17:08:16 -0800 (PST)
Local: Mon, Dec 29 2008 8:08 pm
Subject: Re: Update on indexing blogrolls
I could write out a lengthy explanation of the different searches I do
in Google Blog Search, but I decided since this is all visual, it
would be more efficient just to use screenshots.

I have tagged almost all of the results with what they are, either
Blogroll results or my personal favorite, Fake DVD Review SPLOGS.  A
few that are either legit, or that I am unsure about, are left
mostly blank.

Search Term: “Reiko Aylesworth”

http://i131.photobucket.com/albums/p312/CO757300/Temp/gbs-1.jpg

http://i131.photobucket.com/albums/p312/CO757300/Temp/gbs-2.jpg

http://i131.photobucket.com/albums/p312/CO757300/Temp/gbs-3.jpg

Search Term: “Carlos Bernard”

http://i131.photobucket.com/albums/p312/CO757300/Temp/gbs-4.jpg

Search Term: “Kiefer Sutherland”

http://i131.photobucket.com/albums/p312/CO757300/Temp/gbs-5.jpg

I hope this helps.  If needed, I can write out a more detailed
explanation.

On Dec 29, 10:35 am, Jeremy Hylton <jhyl...@gmail.com> wrote:


 
From: tamar <puntr...@gmail.com>
Date: Thu, 1 Jan 2009 18:54:40 -0800 (PST)
Local: Thurs, Jan 1 2009 9:54 pm
Subject: Re: Update on indexing blogrolls
Jeremy, I'm doing searches for "tamar weinberg," my blog title name,
or link:www.domain.com (where domain.com is my blog).

I don't check blogsearch results regularly, but I just performed a
search for the purposes of giving you as much information as possible
and saw a result that showed my blog on the sidebar navigation from 4
hours ago.

That said, I'm pretty certain that this isn't fully addressed. :(

On Dec 29 2008, 11:35 am, Jeremy Hylton <jhyl...@gmail.com> wrote:


 
From: Jeremy Hylton <jhyl...@gmail.com>
Date: Wed, 7 Jan 2009 09:58:46 -0800 (PST)
Local: Wed, Jan 7 2009 12:58 pm
Subject: Re: Update on indexing blogrolls
On Jan 1, 9:54 pm, tamar <puntr...@gmail.com> wrote:

> Jeremy, I'm doing searches for "tamar weinberg," my blog title name,
> or link:www.domain.com (where domain.com is my blog).

> I don't check blogsearch results regularly, but I just performed a
> search for the purposes of giving you as much information as possible
> and saw a result that showed my blog on the sidebar navigation from 4
> hours ago.

> That said, I'm pretty certain that this isn't fully addressed. :(

I agree that the problem isn't fully addressed :-(.  I just did a
link: search for your blog.  It returned 10 results ranging from 37
minutes old to several days old (Jan 1).  There were two results that
obviously came from the blogroll, one from http://janefouts.com/ and
one from http://simplystated.realsimple.com/.  We'll have to see why
we failed to detect those links as coming from the blogroll.  There
are also a few results that came from Techcrunch posts that you
commented on.  The comment has a link to your blog.  I think those are
legitimate results, but I'd be interested to hear what users think.

So we're at 80% accuracy at this very moment (8 of the 10 results were
legitimate).  It's better than it was, but there is obviously a lot of
room for improvement.

Jeremy
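
The 80% figure in the message above follows from its own counts: 10
results, 2 of which came from blogrolls.  A small sketch of that
arithmetic, with a hypothetical helper name:

```python
def accuracy(total_results, spurious_results):
    """Fraction of results that did not come from a blogroll."""
    if total_results == 0:
        raise ValueError("no results to score")
    return (total_results - spurious_results) / total_results


# Counts taken from the message: 10 results, 2 of them blogroll hits.
print(f"{accuracy(10, 2):.0%}")  # prints 80%
```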


 
From: tamarw <puntr...@gmail.com>
Date: Thu, 8 Jan 2009 06:53:42 -0800 (PST)
Local: Thurs, Jan 8 2009 9:53 am
Subject: Re: Update on indexing blogrolls
Thanks Jeremy.  As far as comments showing up in these searches,
you're right - that may be a little out of place, but I'm actually not
averse to seeing those in my queries/alerts emails.  It's more of a
concern when I see links coming from random sidebars (repeatedly, like
simplystated.realsimple.com).

I appreciate that you're still looking into it!

On Jan 7, 12:58 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:


 
From: tamarw <puntr...@gmail.com>
Date: Sat, 10 Jan 2009 22:21:55 -0800 (PST)
Local: Sun, Jan 11 2009 1:21 am
Subject: Re: Update on indexing blogrolls
One more thing - there's a LOT of MyBlogLog stuff coming up for my
name.  I'm not sure that should be included in search results either.

On Jan 8, 9:53 am, tamarw <puntr...@gmail.com> wrote:


 
From: Holly <hollywoodo...@gmail.com>
Date: Mon, 12 Jan 2009 10:06:39 -0800 (PST)
Local: Mon, Jan 12 2009 1:06 pm
Subject: Re: Update on indexing blogrolls
In my particular case, it's a little weird. Before Blogsearch started
to index blogroll links, when everything was fine, a search using
the command link: mysite.com used to bring up 50+ backlinks.
Now, it only shows 2.
Why is that? Maybe some reset or something?

On Dec 19 2008, 4:25 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:


 
From: tamar <puntr...@gmail.com>
Date: Thu, 22 Jan 2009 06:39:42 -0800 (PST)
Local: Thurs, Jan 22 2009 9:39 am
Subject: Re: Update on indexing blogrolls
Any update?  It's been 3 weeks.

On Jan 7, 12:58 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:


 
From: tamar <puntr...@gmail.com>
Date: Tue, 27 Jan 2009 08:22:36 -0800 (PST)
Local: Tues, Jan 27 2009 11:22 am
Subject: Re: Update on indexing blogrolls
It looks like no progress has been made on this front AT ALL.  The
Google Alert emails I receive are spam and nothing but at this point.
Plus, I keep receiving the same emails again and again -- it's not
necessarily a "blogroll" issue but the same OLD content is being
treated by Google Blogsearch as new content.  On one search query,
I've received the same result at least 10 times.

Jeremy and team, please don't forget about us.

On Jan 22, 9:39 am, tamar <puntr...@gmail.com> wrote:


 
From: McCain <mccain.blogg...@yahoo.com>
Date: Tue, 27 Jan 2009 20:19:57 -0800 (PST)
Local: Tues, Jan 27 2009 11:19 pm
Subject: Re: Update on indexing blogrolls
We are having similar experiences, not just with blogroll references
but also recent post widgets and such on the blogs.  Anytime another
post is mentioned with a link, we were frequently seeing a mostly
irrelevant page substituted for a relevant page in the index.  It has
led to a lesser user experience, but we've ended up removing our
blogrolls from the sidebars, removing "recent post" references from
the sidebars, altering the recent comment widget so it does not cite
posts by title, and changing "recent/next" post references at the top
of posts so that the links are generic references rather than post
titles.   That seems to make the SERPs more appropriate but it's
really not an ideal presentation.  Hope this issue can be worked out.

On Jan 27, 8:22 am, tamar <puntr...@gmail.com> wrote:


 
From: Kyle_Texas <Reiko.Admi...@gmail.com>
Date: Wed, 28 Jan 2009 15:25:57 -0800 (PST)
Local: Wed, Jan 28 2009 6:25 pm
Subject: Re: Update on indexing blogrolls
Yep, the problem remains.  Either SPAM or Blogroll for 90% of
results.  The SPAM is actually getting worse.  It's funny to see
SPLOGS at the top of the relevancy rankings, or better yet, almost the
entire first page of relevancy rankings being SPLOGS.

On Jan 27, 10:22 am, tamar <puntr...@gmail.com> wrote:


 
From: tamar <puntr...@gmail.com>
Date: Sat, 31 Jan 2009 17:00:20 -0800 (PST)
Local: Sat, Jan 31 2009 8:00 pm
Subject: Re: Update on indexing blogrolls
Today, I got links from 2006 and 2007 in my link: query emails.

:(

On Jan 28, 6:25 pm, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:


 
From: Kyle_Texas <Reiko.Admi...@gmail.com>
Date: Sun, 1 Feb 2009 21:11:44 -0800 (PST)
Local: Mon, Feb 2 2009 12:11 am
Subject: Re: Update on indexing blogrolls
Yeah, same thing for me.  It keeps reverting to these old results
which are completely worthless.

On Jan 31, 7:00 pm, tamar <puntr...@gmail.com> wrote:


 
From: tamar <puntr...@gmail.com>
Date: Fri, 6 Feb 2009 05:07:51 -0800 (PST)
Local: Fri, Feb 6 2009 8:07 am
Subject: Re: Update on indexing blogrolls
Is anything at ALL being done about this?  I'm starting to consider
either:

1. flagging all Google Alerts sent to my Gmail inbox as spam (cuz uh,
they contain spammy results)
2. unsubscribing from Google Alerts -- since the results returned
aren't relevant and they certainly aren't fresh.  (Come on, isn't
Google's mission to organize the world's information?  This is clearly
disorganized and in a very bad way.)

Google: we've been pretty darn patient.  This thread started in
December and referenced an even older incident.  It's February now.
Is ANYONE paying attention to this?  Please?

Thanks.

(p.s. a Google Alert email just prompted this post update.  I don't
really post about this out of the blue.)

On Feb 2, 12:11 am, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:


 
From: Jeremy Hylton <jhyl...@gmail.com>
Date: Fri, 6 Feb 2009 15:03:50 -0800 (PST)
Local: Fri, Feb 6 2009 6:03 pm
Subject: Re: Update on indexing blogrolls
Tamar,

Apologies for my tardy response.  I'll be sure to give everyone an
update every week, even if we don't have much news to report.

As I mentioned, we made an initial attempt to fix the blogroll problem
in December.  It fixed some fraction of the results that were coming
from blogrolls, but was inadequate in a number of ways.  For some
blogs, the blog roll detection didn't pick anything up.  For other
blogs, it detected some items in the blog roll, but not all of them.  My
colleague Rick Klau was particularly unlucky.  His blog appears in the
blog rolls of many legal blogs.  I noticed that we often detect every
blog but his as a blogroll entry.  We've been looking at a collection
of backlink queries (with the link: operator) and still see about 50%
of the results coming from blog rolls.  So there is obviously a lot of
room for improvement.

We have been working on an improved blog roll detector.  Our internal
tests look fairly promising, but there is a lot of variability in blog
markup that we need to handle.  It's going to be a few more weeks
until we can start to deploy it.  I'll see if I can provide a better
ETA next week.

I haven't been paying attention to the Google Alerts specifically.
The accuracy I mentioned earlier was for the regular search results.
I'll make sure we add some metrics that look at Alerts quality so that
we don't forget about it again.  The basic solution is the same for
search results and for alerts, but maybe there's something more we can
do for alerts in the short term.

Jeremy

On Feb 6, 8:07 am, tamar <puntr...@gmail.com> wrote:


 
From: Jeremy Hylton <jhyl...@gmail.com>
Date: Fri, 6 Feb 2009 19:12:16 -0800 (PST)
Local: Fri, Feb 6 2009 10:12 pm
Subject: Re: Update on indexing blogrolls
On Feb 6, 6:03 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:

I wanted to clarify this point a little bit.  The problem really is
worst for people with popular blogs.  The average user is getting more
and better results as a consequence of the indexing changes that
introduced the blogroll problems.  We're returning results from blogs
with partial content feeds.  We're indexing comments.  We discover more
links.  So a lot of our internal analysis shows that most queries do
better as a result of the changes.  If there weren't some real
benefits to the indexing changes, we would have reverted to the old
version.

Jeremy


 
From: tamar <puntr...@gmail.com>
Date: Sat, 7 Feb 2009 20:00:05 -0800 (PST)
Local: Sat, Feb 7 2009 11:00 pm
Subject: Re: Update on indexing blogrolls
Thanks for the update.

A few things I noticed lately:

1. Lots of redundancy.  For example, 25 separate Google Alerts have
arrived in my inbox since 12/18/08 from a single blog source citing
the SAME exact blog post (nothing new!)
2. Old posts from 2006/2007.
3. The blogroll issue

That said, the issue seems to not necessarily be limited to the
blogroll itself.  The entire system is a mess.  And while I say Google
Alerts, I'm able to reproduce the problems every time simply by going
to blogsearch.google.com, so I don't really think you need to focus
too much on Google Alerts.  After all, it seems to be gathering data
from a system that isn't exactly returning relevant results.

Also, some of the data I actually receive is not tied to popular blogs
of mine at all.  I understand the indexing problems; I'm not
requesting that you revert to the old system, but I still contend that
the new system gives me 95% noise and 5% reasonable results, which is
pretty poor.

Hopefully Google's deployment of the fixes will address the issue.

p.s. I'll be happy to send you the *really* awkward results I've
received that illustrate all above issues if you want them...unless,
of course, you already received them. ;)

On Feb 6, 10:12 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:

From: Kyle_Texas <Reiko.Admi...@gmail.com>
Date: Thu, 19 Feb 2009 10:05:45 -0800 (PST)
Local: Thurs, Feb 19 2009 1:05 pm
Subject: Re: Update on indexing blogrolls
It seems to have been better as of late until yesterday.  All of a
sudden it reverted back to some old version, and results from 2007 are
now coming up as the most relevant.  As always, most of the recent
results have vanished if you search by date, with the majority dating
from 2 weeks to 2 months ago.

On Feb 7, 10:00 pm, tamar <puntr...@gmail.com> wrote:

From: Jeremy Hylton <jhyl...@gmail.com>
Date: Wed, 25 Feb 2009 08:22:22 -0800 (PST)
Local: Wed, Feb 25 2009 11:22 am
Subject: Re: Update on indexing blogrolls
This is just a brief status report.  We've been continuing to
experiment with blogroll detectors.  We're going to do some
user-visible experiments early next month, probably starting with link:
queries.  I'll follow up here when the experiments are running.

Jeremy

On Feb 7, 11:00 pm, tamar <puntr...@gmail.com> wrote:

From: Jeremy Hylton <jhyl...@gmail.com>
Date: Fri, 6 Mar 2009 11:22:53 -0800 (PST)
Local: Fri, Mar 6 2009 2:22 pm
Subject: Re: Update on indexing blogrolls
Unfortunately, we ran into some delays with these experiments and had
to push back the schedule a couple of weeks.

Jeremy

On Feb 25, 11:22 am, Jeremy Hylton <jhyl...@gmail.com> wrote:

From: Barry Schwartz <barry.schwa...@gmail.com>
Date: Mon, 9 Mar 2009 04:43:19 -0700 (PDT)
Local: Mon, Mar 9 2009 7:43 am
Subject: Re: Update on indexing blogrolls
thanks for the update.

On Mar 6, 3:22 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:

From: Rodrigo <rali...@gmail.com>
Date: Sat, 21 Mar 2009 10:15:53 -0700 (PDT)
Local: Sat, Mar 21 2009 1:15 pm
Subject: Re: Update on indexing blogrolls
Anything new?

On Mar 6, 3:22 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:
