[announcement] the future of the Refine project

David Huynh

Oct 2, 2012, 2:12:40 PM
to <google-refine@googlegroups.com>
Dear all,

When we open-sourced Freebase Gridworks almost 2.5 years ago, in May 2010, and then re-branded it to Google Refine in November 2010, little did we anticipate the diverse and active community of users and use cases that we now have: the project is starred by 2700+ users and google+'ed by 640+ people, and there are 600+ members on this forum. It is used in many newsrooms, and taught in many tutorials, hackathons, and workshops. Academic theses have been written on it; extensions built; recon APIs served; and code and design ideas borrowed. The Twitter stream keeps on buzzing month after month, and, well, you should read for yourself what people have said about Refine (http://tinyurl.com/9p834vl).

All that excitement is very surprising for a little niche data-wrangling tool.

For almost 2 years we at Google have tried hard to imagine how much better Google Refine could be if it were scaled up and integrated with other cloud-based Google products. We've arrived at two insights:

1. The desktop version of Google Refine has struck a fine balance between scale, power, and usability that would be non-trivial to improve upon. For example, increasing scale necessarily decreases power and usability. And it is risky to change that balance in one way or another unless we understand deeply all the existing and future use cases.

2. That understanding of use cases actually lies with you, the real users. You know the forms and shapes of data that you wrangle; you know other tools and services you use Refine with; you know your work flows and how your data-wrangling teams collaborate; etc. In fact, a lot of cool existing features have been suggested by users.

From these insights, we have decided to encourage this community, of both developers and users, to take on more ownership of this project. In specific terms:

1. The project will be forked and rebranded to something like "Open Refine" in order to emphasize that the project will be driven forward primarily by the community. Those of us at Google who use Refine will contribute to the project as one of the many equal contributors.

2. The code base will be transferred to GitHub to make code contributions easier.

We'll work out the logistics of the transition together. If you have ideas, opinions, or want to help out, please chime in! We are actively soliciting additional contributors; now is the time for you to get involved.

We are really looking for your participation. We at Google have taken Refine as far conceptually as we could, and the next phase lies with you.

David
On behalf of Googlers on the Refine team

Tom Morris

Oct 2, 2012, 3:14:27 PM
to google...@googlegroups.com
On Tue, Oct 2, 2012 at 2:12 PM, David Huynh <dfh...@gmail.com> wrote:

> We are really looking for your participation. We at Google have taken Refine
> as far conceptually as we could, and the next phase lies with you.

As one of the non-Google committers on the project, I'm looking
forward to ongoing involvement with the next phase of the project, but
first I'd like to thank the Google/Metaweb team members, and in
particular David, for all that they've contributed to the project so
far. Refine is what it is today because of their skill and efforts
and I hope to see them continue to be involved.

We've kind of let the community languish a little bit after a big
burst of patches, client library implementations, etc. at the
beginning, so in addition to the mechanics of choosing a new name and
other transition minutiae, I'd love to hear ideas for ways to
reinvigorate the community -- both users and developers.

Tom

p.s. I've already done a trial run of the Github conversion, so we'll
be able to spin up the new repository very quickly once we've chosen a
name (I kind of like Open Refine too).

Stefano Mazzocchi

Oct 2, 2012, 3:16:45 PM
to google-refine
+1 to OpenRefine as well.

Iain Sproat

Oct 2, 2012, 3:26:34 PM
to google...@googlegroups.com
+1 Open Refine

Thad Guidry

Oct 2, 2012, 3:37:02 PM
to google...@googlegroups.com
I also extend my warmest thanks, especially to

David H.
Stefano M.
James H.
Tom M.
Iain S.

for all their time and effort spent hacking on Google Refine and
listening & adding features specifically based on my constant
complaints & quibbles.

+1 to Open Refine - 3 votes now...is that a majority yet ? :)

As far as ideas go....

One idea that I keep hearing from the greater community is bringing
more statistical backend features into Google Refine.

R Lang is a multi-use statistical expression language that could be
leveraged. Google Refine's grid could be the input for R Lang in
various workflows. What has been talked about is perhaps having
Refine's table as either an R object, a workspace, or something
else...

R Lang does not have a nice way to clean datasets, but Refine easily
does that for most statisticians now. Many have told me that there
just needs to be some middle code that keeps both power tools working
on the same dataset back and forth. How best to accomplish that would
be up to the R Lang community itself... so speak up !

--
-Thad
http://www.freebase.com/view/en/thad_guidry

Magdmartin

Oct 2, 2012, 10:35:43 PM
to google...@googlegroups.com
I come at this more from a user's perspective, and I can see the huge enthusiasm around Google Refine day after day. More and more people are using it for a very wide variety of tasks, and I'm glad to see the usual suspects supporting this new direction.

Grefine is one of the few mature data wrangling tools out there, and there is still plenty to do to improve it. Count me in for testing, documentation and community housekeeping.

By the way, +1 for the "Open Refine" name.

Martin

Nicolas Torzec

Oct 2, 2012, 11:10:28 PM
to google...@googlegroups.com
+1 for Open Refine too.

Nicolas.

David Huynh

Oct 3, 2012, 2:34:19 AM
to google...@googlegroups.com
Thank you, everyone, for the support! And the tweets :-) We need all the help we can get. And I know there are open-source veterans on this forum; your input is much appreciated to get this transition done right.

It sounds like "Open Refine" is the winner, but let's give other folks a couple of days to chime in and suggest other names. Also, if you have opinions about the move to github, please do let us know.

There are several tasks to be done for this transition, and thanks to Martin, who has already volunteered to help out with testing / documentation / community. We'll kickstart a couple of new threads to take care of these tasks soon. For one thing, I'd like to do a quick survey of who's using Refine for what. That should help us all to understand who is in "the community".

David

Paul Makepeace

Oct 3, 2012, 2:41:02 AM
to google...@googlegroups.com


On Tue, Oct 2, 2012 at 11:12 AM, David Huynh <dfh...@gmail.com> wrote:
> 2. The code base will be transferred to GitHub to make code contributions easier.

YES.

Delighted to hear this & thanks for all the great work thus far!

Paul

Tim McNamara

Oct 3, 2012, 5:43:27 AM
to google...@googlegroups.com
Wow, big news. David, thanks for all of your brilliant work. I hope
the reasons behind this are positive.

I would personally like to see the project renamed to simply be
Refine. I prefer shorter names and the open prefix isn't really
necessary. I also find that the open prefix is a little overused. The
Google part was there because all of Google's products have the house
brand.

mawksey

Oct 3, 2012, 5:53:00 AM
to google...@googlegroups.com
My only caution with going for just 'Refine' is that it might make it harder for people to find help, tutorials, and solutions (I also dabble in R, and my search is limited to known sites).

Martin Magdinier

Oct 3, 2012, 8:25:22 AM
to google...@googlegroups.com

I support mawksey on this. I have the same experience with 'Tableau': documentation is sometimes hard to find because of the common name.

Magdmartin

Oct 4, 2012, 10:01:21 PM
to google...@googlegroups.com
We set up a survey to better understand who is using Google Refine and for what purpose.
Your answers will provide valuable insight into how the community should be built. Thanks for sharing.

I also developed my thoughts about this announcement in this post.

Martin

On Thursday, October 4, 2012 at 05:20:05 UTC-4, Mateja Verlic wrote:
A LOD-enabled version of GRefine (integrated Google Refine + specific extensions) is used in the LOD2 project (http://lod2.eu/)... and as a developer of the DBpedia extension (available on github) I say:

+1 for Open Refine and moving to github!


I'm eager to help/contribute/participate :)


Mateja

Magdmartin

Oct 11, 2012, 7:50:29 AM
to google...@googlegroups.com
Hey,

I've been reaching out to the EtherPad community, as they went through a similar process when Google deprecated EtherPad, and I got a nice and detailed answer from John McLear, who chaired the founding of the EtherPad foundation.

I guess it is worth sharing here so we are all on the same page.

Martin

yjkchicago

Oct 14, 2012, 1:15:22 AM
to google...@googlegroups.com
Hi David,

Regarding the transition to github and adopting a best-practices git workflow: the python community has embraced a common workflow which is itself on github, and which can help any new community on github get to a well-documented workflow procedure in a matter of a few variable substitutions: https://github.com/matthew-brett/gitwash

Hope this helps,

Young-Jin

Mateja Verlic

Oct 15, 2012, 4:29:52 AM
to google...@googlegroups.com
I read the tutorial on http://matthew-brett.github.com/pydagogue/gitwash_build.html for using gitwash and I think it would be very useful to have a workflow -- and tutorials -- like this for OpenRefine, especially for newbies like myself.

Thanks Young-Jin for sharing.

Best, Mateja  