Message from discussion Update on indexing blogrolls
From: Jeremy Hylton <jhyl...@gmail.com>
Date: Fri, 6 Feb 2009 15:03:50 -0800 (PST)
Local: Fri, Feb 6 2009 6:03 pm
Subject: Re: Update on indexing blogrolls
Tamar,

Apologies for my tardy response.  I'll be sure to give everyone an
update every week, even if we don't have much news to report.

As I mentioned, we made an initial attempt to fix the blogroll problem
in December.  It fixed some fraction of the results that were coming
from blogrolls, but was inadequate in a number of ways.  For some
blogs, the blog roll detection didn't pick anything up.  For other
blogs, it detected some items in the blog roll, but not all of them.  My
colleague Rick Klau was particularly unlucky.  His blog appears in the
blog rolls of many legal blogs.  I noticed that we often detect every
blog but his as a blogroll entry.  We've been looking at a collection
of backlink queries (with the link: operator) and still see about 50%
of the results coming from blog rolls.  So there is obviously a lot of
room for improvement.

We have been working on an improved blog roll detector.  Our internal
tests look fairly promising, but there is a lot of variability in blog
markup that we need to handle.  It's going to be a few more weeks
until we can start to deploy it.  I'll see if I can provide a better
ETA next week.

I haven't been paying attention to the Google Alerts specifically.
The accuracy I mentioned earlier was for the regular search results.
I'll make sure we add some metrics that look at Alerts quality so that
we don't forget about it again.  The basic solution is the same for
search results and for alerts, but maybe there's something more we can
do for alerts in the short term.

Jeremy

On Feb 6, 8:07 am, tamar <puntr...@gmail.com> wrote:

> Is anything at ALL being done about this?  I'm starting to consider
> either:

> 1. flagging all Google Alerts sent to my Gmail inbox as spam (cuz uh,
> they contain spammy results)
> 2. unsubscribing from Google Alerts -- since the results returned
> aren't relevant and they certainly aren't fresh.  (Come on, isn't
> Google's mission to organize the world's information?  This is clearly
> disorganized and in a very bad way.)

> Google: we've been pretty darn patient.  This thread started in
> December and referenced an even older incident.  It's February now.
> Is ANYONE paying attention to this?  Please?

> Thanks.

> (p.s. a Google Alert email just prompted this post update.  I don't
> really post about this out of the blue.)

> On Feb 2, 12:11 am, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:

> > Yeah, same thing for me.  It keeps reverting to these old results
> > which are completely worthless.

> > On Jan 31, 7:00 pm, tamar <puntr...@gmail.com> wrote:

> > > Today, I got links from 2006 and 2007 in my link: query emails.

> > > :(

> > > On Jan 28, 6:25 pm, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:

> > > > Yep, the problem remains.  Either SPAM or Blogroll for 90% of
> > > > results.  The SPAM is actually getting worse.  It's funny to see
> > > > SPLOGS at the top of the relevancy rankings, or better yet, almost the
> > > > entire first page of relevancy rankings being SPLOGS.

> > > > On Jan 27, 10:22 am, tamar <puntr...@gmail.com> wrote:

> > > > > It looks like no progress has been made on this front AT ALL.  The
> > > > > Google Alert emails I receive are spam and nothing but at this point.
> > > > > Plus, I keep receiving the same emails again and again -- it's not
> > > > > necessarily a "blogroll" issue but the same OLD content is being
> > > > > treated by Google Blogsearch as new content.  On one search query,
> > > > > I've received the same result at least 10 times.

> > > > > Jeremy and team, please don't forget about us.

> > > > > On Jan 22, 9:39 am, tamar <puntr...@gmail.com> wrote:

> > > > > > Any update?  It's been 3 weeks.

> > > > > > On Jan 7, 12:58 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:

> > > > > > > On Jan 1, 9:54 pm, tamar <puntr...@gmail.com> wrote:

> > > > > > > > Jeremy, I'm doing searches for "tamar weinberg," my blog title name,
> > > > > > > > > or link:www.domain.com (where domain.com is my blog).

> > > > > > > > I don't check blogsearch results regularly, but I just performed a
> > > > > > > > search for the purposes of giving you as much information as possible
> > > > > > > > and saw a result that showed my blog on the sidebar navigation from 4
> > > > > > > > hours ago.

> > > > > > > > That said, I'm pretty certain that this isn't fully addressed. :(

> > > > > > > I agree that the problem isn't fully addressed :-(.  I just did a
> > > > > > > link: search for your blog.  It returned 10 results ranging from 37
> > > > > > > minutes old to several days old (Jan 1).  There were two results that
> > > > > > > > obviously came from the blogroll, one from http://janefouts.com/ and
> > > > > > > > one from http://simplystated.realsimple.com/.  We'll have to see why
> > > > > > > we failed to detect those links as coming from the blogroll.  There
> > > > > > > are also a few results that came from Techcrunch posts that you
> > > > > > > commented on.  The comment has a link to your blog.  I think those are
> > > > > > > > legitimate results, but I'd be interested to hear what users think.

> > > > > > > So we're at 80% accuracy at this very moment.  It's better than it
> > > > > > > > was, but there is obviously a lot of room for improvement.

> > > > > > > Jeremy

> > > > > > > > On Dec 29 2008, 11:35 am, Jeremy Hylton <jhyl...@gmail.com> wrote:

> > > > > > > > > On Dec 28, 11:10 pm, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:

> > > > > > > > > > Tamar,

> > > > > > > > > > It has become even more common.  If Google Blog Search isn't finding
> > > > > > > > > > these blogroll hits, it is finding spam.  In the last 3 days, I have
> > > > > > > > > > seen exactly ONE result which was not a result from the blogroll or a
> > > > > > > > > > SPLOG.

> > > > > > > > > Can you tell me the specific queries that are showing bad results?
> > > > > > > > > Also, is the problem specific to alerts or do you see them in regular
> > > > > > > > > blogsearch results, too?

> > > > > > > > > Jeremy

> > > > > > > > > > On Dec 26, 8:34 am, tamar <puntr...@gmail.com> wrote:

> > > > > > > > > > > Curious - around the same time of the initial report, I started
> > > > > > > > > > > getting Google Alerts with blogroll links.  If anything, it's become
> > > > > > > > > > > *more* common and not less common lately.  Does the change you write
> > > > > > > > > > > about, Jeremy, impact Google Alerts?

> > > > > > > > > > > If not, perhaps someone should take a look.

> > > > > > > > > > > Thanks.

> > > > > > > > > > > On Dec 19, 1:25 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:

> > > > > > > > > > > > I wanted to give everyone a brief end-of-the-year update on the
> > > > > > > > > > > > blogroll problem.  When we switched blogsearch to indexing the full
> > > > > > > > > > > > text of posts, we started seeing a lot more results where the only
> > > > > > > > > > > > > matches for a query were from the blogroll or other parts of the page
> > > > > > > > > > > > that frame the actual post.  (There's been a lot of discussion of the
> > > > > > > > > > > > problem.  You can search for [google blogsearch] using Google
> > > > > > > > > > > > Blogsearch.)

> > > > > > > > > > > > We're in the midst of deploying a solution for this problem.  The
> > > > > > > > > > > > basic approach is to analyze each blog to look for text and markup
> > > > > > > > > > > > > that is common to all of the posts.  Usually, these common elements
> > > > > > > > > > > > include the blogroll, any navigational elements, and other parts of
> > > > > > > > > > > > the page that aren't part of the post.  This approach works well for a
> > > > > > > > > > > > lot of blogs, but we're continuing to improve the algorithm.  The
> > > > > > > > > > > > search results should ignore matches that only come from these common
> > > > > > > > > > > > elements.  The indexing change to implement it is deployed almost
> > > > > > > > > > > > everywhere now.

> > > > > > > > > > > > We expect users will continue to see some spurious results, but many
> > > > > > > > > > > > fewer than before.  I tried a search for my own name, which does
> > > > > > > > > > > > appear in a few blogrolls, and all the results looked good.  If you
> > > > > > > > > > > > are still seeing blogroll hits, the problem is most likely caused by
> > > > > > > > > > > > our failure to analyze a particular blog correctly.  Feel free to
> > > > > > > > > > > > follow up with examples in private email or in this forum.

> > > > > > > > > > > > Jeremy Hylton
> > > > > > > > > > > > Google Blogsearch
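
For readers who want a concrete picture of the approach described in the
Dec 19 message (find text that is common to all of a blog's posts, then
ignore query matches that only come from those common elements), here is
a minimal Python sketch.  It is not Google's implementation.  The
function names, the idea of splitting a page into text blocks, the
min_fraction parameter, and the sample blog data are all assumptions
made for illustration.

```python
from collections import Counter


def common_blocks(pages, min_fraction=1.0):
    """Return the text blocks that appear on at least min_fraction of pages.

    `pages` is a list of post pages from ONE blog; each page is a list of
    text blocks (hypothetical pre-split chunks such as sidebar, nav, body).
    """
    if len(pages) < 2:
        # With a single page we cannot tell the frame from the post.
        return set()
    counts = Counter()
    for blocks in pages:
        counts.update(set(blocks))  # count each block once per page
    threshold = min_fraction * len(pages)
    return {block for block, n in counts.items() if n >= threshold}


def matches_post(query, blocks, common):
    """True if the query matches text outside the common (frame) blocks."""
    q = query.lower()
    return any(q in b.lower() for b in blocks if b not in common)


# Hypothetical blog: the blogroll and nav repeat on every post page.
pages = [
    ["Blogroll: Alice Example, Bob Sample", "About | Archive",
     "Today I wrote about search quality."],
    ["Blogroll: Alice Example, Bob Sample", "About | Archive",
     "Alice Example gave a talk on legal blogging."],
    ["Blogroll: Alice Example, Bob Sample", "About | Archive",
     "Notes from a weekend hike."],
]

common = common_blocks(pages)
for blocks in pages:
    print(matches_post("alice example", blocks, common))
```

Only the second page counts as a hit for "alice example", because on the
other two pages the name appears only in the repeated blogroll block.
Exact matching of whole blocks is the weak point of a sketch like this:
any per-page variation in the sidebar markup defeats it, which is one
plausible reading of the "variability in blog markup" mentioned above.
Lowering min_fraction below 1.0 is one hypothetical way to tolerate
blocks that are missing from a few pages.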


 