Apologies for my tardy response. I'll be sure to give everyone an
update every week, even if we don't have much news to report.
We have been working on an improved blog roll detector. Our internal
tests look fairly promising, but there is a lot of variability in blog
markup that we need to handle. It's going to be a few more weeks
until we can start to deploy it. I'll see if I can provide a better
ETA next week.
> Is anything at ALL being done about this? I'm starting to consider
> either:
> 1. flagging all Google Alerts sent to my Gmail inbox as spam (cuz uh,
> they contain spammy results)
> 2. unsubscribing from Google Alerts -- since the results returned
> aren't relevant and they certainly aren't fresh. (Come on, isn't
> Google's mission to organize the world's information? This is clearly
> disorganized and in a very bad way.)
> Google: we've been pretty darn patient. This thread started in
> December and referenced an even older incident. It's February now.
> Is ANYONE paying attention to this? Please?
> Thanks.
> (p.s. a Google Alert email just prompted this post update. I don't
> really post about this out of the blue.)
> On Feb 2, 12:11 am, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:
> > Yeah, same thing for me. It keeps reverting to these old results
> > which are completely worthless.
> > On Jan 31, 7:00 pm, tamar <puntr...@gmail.com> wrote:
> > > Today, I got links from 2006 and 2007 in my link: query emails.
> > > :(
> > > On Jan 28, 6:25 pm, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:
> > > > Yep, the problem remains. Either SPAM or Blogroll for 90% of
> > > > results. The SPAM is actually getting worse. It's funny to see
> > > > SPLOGS at the top of the relevancy rankings, or better yet, almost the
> > > > entire first page of relevancy rankings being SPLOGS.
> > > > On Jan 27, 10:22 am, tamar <puntr...@gmail.com> wrote:
> > > > > It looks like no progress has been made on this front AT ALL. The
> > > > > Google Alert emails I receive are spam and nothing but at this point.
> > > > > Plus, I keep receiving the same emails again and again -- it's not
> > > > > necessarily a "blogroll" issue but the same OLD content is being
> > > > > treated by Google Blogsearch as new content. On one search query,
> > > > > I've received the same result at least 10 times.
> > > > > Jeremy and team, please don't forget about us.
> > > > > On Jan 22, 9:39 am, tamar <puntr...@gmail.com> wrote:
> > > > > > Any update? It's been 3 weeks.
> > > > > > On Jan 7, 12:58 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:
> > > > > > > On Jan 1, 9:54 pm, tamar <puntr...@gmail.com> wrote:
> > > > > > > > Jeremy, I'm doing searches for "tamar weinberg," my blog title name,
> > > > > > > > > or link:www.domain.com (where domain.com is my blog).
> > > > > > > > I don't check blogsearch results regularly, but I just performed a
> > > > > > > > search for the purposes of giving you as much information as possible
> > > > > > > > and saw a result that showed my blog on the sidebar navigation from 4
> > > > > > > > hours ago.
> > > > > > > > That said, I'm pretty certain that this isn't fully addressed. :(
> > > > > > > I agree that the problem isn't fully addressed :-(. I just did a
> > > > > > > link: search for your blog. It returned 10 results ranging from 37
> > > > > > > minutes old to several days old (Jan 1). There were two results that
> > > > > > > > obviously came from the blogroll, one from http://janefouts.com/ and
> > > > > > > > one from http://simplystated.realsimple.com/. We'll have to see why
> > > > > > > we failed to detect those links as coming from the blogroll. There
> > > > > > > are also a few results that came from Techcrunch posts that you
> > > > > > > commented on. The comment has a link to your blog. I think those are
> > > > > > > > legitimate results, but I'd be interested to hear what users think.
> > > > > > > So we're at 80% accuracy at this very moment. It's better than it
> > > > > > > > was, but there is obviously a lot of room for improvement.
> > > > > > > Jeremy
> > > > > > > > On Dec 29 2008, 11:35 am, Jeremy Hylton <jhyl...@gmail.com> wrote:
> > > > > > > > > On Dec 28, 11:10 pm, Kyle_Texas <Reiko.Admi...@gmail.com> wrote:
> > > > > > > > > > Tamar,
> > > > > > > > > > It has become even more common. If Google Blog Search isn't finding
> > > > > > > > > > these blogroll hits, it is finding spam. In the last 3 days, I have
> > > > > > > > > > seen exactly ONE result which was not a result from the blogroll or a
> > > > > > > > > > SPLOG.
> > > > > > > > > Can you tell me the specific queries that are showing bad results?
> > > > > > > > > Also, is the problem specific to alerts or do you see them in regular
> > > > > > > > > blogsearch results, too?
> > > > > > > > > Jeremy
> > > > > > > > > > On Dec 26, 8:34 am, tamar <puntr...@gmail.com> wrote:
> > > > > > > > > > > Curious - around the same time of the initial report, I started
> > > > > > > > > > > getting Google Alerts with blogroll links. If anything, it's become
> > > > > > > > > > > *more* common and not less common lately. Does the change you write
> > > > > > > > > > > about, Jeremy, impact Google Alerts?
> > > > > > > > > > > If not, perhaps someone should take a look.
> > > > > > > > > > > Thanks.
> > > > > > > > > > > On Dec 19, 1:25 pm, Jeremy Hylton <jhyl...@gmail.com> wrote:
> > > > > > > > > > > > I wanted to give everyone a brief end-of-the-year update on the
> > > > > > > > > > > > blogroll problem. When we switched blogsearch to indexing the full
> > > > > > > > > > > > text of posts, we started seeing a lot more results where the only
> > > > > > > > > > > > > matches for a query were from the blogroll or other parts of the page
> > > > > > > > > > > > that frame the actual post. (There's been a lot of discussion of the
> > > > > > > > > > > > problem. You can search for [google blogsearch] using Google
> > > > > > > > > > > > Blogsearch.)
> > > > > > > > > > > > We're in the midst of deploying a solution for this problem. The
> > > > > > > > > > > > basic approach is to analyze each blog to look for text and markup
> > > > > > > > > > > > > that is common to all of the posts. Usually, these common elements
> > > > > > > > > > > > include the blogroll, any navigational elements, and other parts of
> > > > > > > > > > > > the page that aren't part of the post. This approach works well for a
> > > > > > > > > > > > lot of blogs, but we're continuing to improve the algorithm. The
> > > > > > > > > > > > search results should ignore matches that only come from these common
> > > > > > > > > > > > elements. The indexing change to implement it is deployed almost
> > > > > > > > > > > > everywhere now.
> > > > > > > > > > > > We expect users will continue to see some spurious results, but many
> > > > > > > > > > > > fewer than before. I tried a search for my own name, which does
> > > > > > > > > > > > appear in a few blogrolls, and all the results looked good. If you
> > > > > > > > > > > > are still seeing blogroll hits, the problem is most likely caused by
> > > > > > > > > > > > our failure to analyze a particular blog correctly. Feel free to
> > > > > > > > > > > > follow up with examples in private email or in this forum.
> > > > > > > > > > > > Jeremy Hylton
> > > > > > > > > > > > Google Blogsearch
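
For readers who want a concrete picture of the approach described in the Dec 19 post (find text that is common to all posts on a blog, then ignore query matches that only come from those common elements), here is a minimal sketch in Python. It is not Google's implementation. The function names, the line-by-line comparison, and the `threshold` parameter are all assumptions made for illustration; a real detector would have to work on page markup and cope with the variability mentioned at the top of the thread.

```python
from collections import Counter


def common_elements(pages, threshold=1.0):
    """Return the lines that appear on at least `threshold` of the pages.

    Hypothetical helper: each page is treated as plain text, one element
    per line. threshold=1.0 means "common to all of the posts".
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    needed = threshold * len(pages)
    return {line for line, n in counts.items() if n >= needed}


def post_body(page, boilerplate):
    """Return the lines of a page that are not shared page furniture."""
    return [
        line.strip()
        for line in page.splitlines()
        if line.strip() and line.strip() not in boilerplate
    ]


def matches_post(page, query, boilerplate):
    """True only if the query appears outside the common elements."""
    q = query.lower()
    return any(q in line.lower() for line in post_body(page, boilerplate))
```

With two posts from the same blog that share a blogroll line naming "Tamar Weinberg", `common_elements` flags that line as boilerplate, so a search for the name matches only a post whose own text mentions it. Lowering `threshold` below 1.0 would tolerate pages where the sidebar differs slightly, at the cost of sometimes discarding real post text.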