OpenRefine Contributors, Git WoW, & Stability


Thad Guidry

Oct 17, 2015, 2:36:33 PM
to openrefine
Martin,

(sorry for the stupid thread subject before, forgot to change it, but changed now)

I do like the idea of #2, by the way; I always have. Tom has concerns about the Stable release (historically, this has been the Trunk branch, which gets merged into and then tagged). We have Git, yes, and we are free to explore any flow that helps the project move forward, and I am sure Tom would not be opposed to anything like that.

We want to keep the quality of Trunk at a very high level, since this is ultimately where the Stable Releases are built from after bugs are fixed etc.  Tom and I do not want to pollute it with lots of bugs.

So we need to think about how we want to deal with temporary instability against Trunk... that's easy. It's called a Branch or a Fork.

RefinePro can fork (a branch) and then continue sending PRs. This does not block or stop RefinePro or other external teams from pulling from OpenRefine's mainline (Trunk) and absorbing other PRs that are getting merged. And it really does not fracture the community that much, unless RefinePro decides not to give back with PRs, etc.

So, I want to understand: what is the major blocker for RefinePro against OpenRefine's Trunk?
Perhaps it is just that Qi needs to pull more often from Trunk? Learn Git and GitFlow and the command structure a bit better (or even use a better Git plugin in his IDE)? Needs more one-on-one time with Tom to understand the code base better?

So I think the question really is: what is preventing the OpenRefine project from moving at a faster pace and getting PRs reviewed? That would be Tom's time. And until there are more capable individuals at Tom's level, OpenRefine cannot move at a faster pace. We accept that fact now. BUT that doesn't prevent other forks, like RefinePro's, from moving ahead faster. (BTW, David Huynh and Stefano Mazzocchi no longer have the interest. I have asked them, and they're moving on to ambitious things with Google's long-term plans. They are the only two others who have a deep understanding of OpenRefine's code base.)
 

On Sat, Oct 17, 2015 at 10:14 AM, Martin Magdinier <martin.m...@gmail.com> wrote:
Thad,

If I understand correctly, your proposition is that if Tom is not answering emails or GitHub notifications, we send him more emails. This is what we have been doing since 2013. Through August and September 2015, the RefinePro team tried to reach Tom via the official discussion list (with Tom in bcc) and through people we know in common.

I am sorry, but I don't see how the proposed process will help to get the project moving. Tom's limited availability for the project is today the core reason we are in this situation. Tom is off at a hackathon for the rest of the weekend, and we have no visibility on when he will be back on the project; it might be days, weeks, or months.

Coming from a lean / agile background, I'd rather see a project moving ahead with frequent releases, where we try things and sometimes break them, than a massive three-year release process packaging hundreds of improvements. I agree that Qi doesn't have Tom's expertise in the code base, but the Git workflow leaves room to revert and correct errors in a future release.

Last August, we (RefinePro) offered to buy Tom's hours for code review and merging PRs. Today, the offer is still on the table.

In conclusion, we now have two propositions on the floor:
  1. Thad's proposal: Tom reviews and merges all PRs at his own pace, with RefinePro's offer to sponsor Tom to secure more time for OpenRefine (the decision is up to him).
  2. A more frequent release system with limited oversight from Tom and the risk of temporary instability / imperfection in the code base. Tom can jump in at his own pace to provide feedback and guidance regarding the quality of the code and the direction of the project.

Finally, on a side note, for the good of the community it would be better to have all communication regarding the governance of the project made public on the discussion list, in a transparent fashion.

PS: I think the dev group of OpenRefine is closer to 10 - 20 people, including people who write extensions, have contributed via PRs, or rely on in-house customizations they have done for their organizations. I am extremely happy to see that several developers who haven't contributed in the last 12 months are still following the project and pitch in from time to time via the mailing list, GitHub, and the Twitter feed.

PS - 2: I can't come up with a good thread name, so I left the current subject.

Martin


On 15-10-16 10:43 PM, Thad Guidry wrote:
Pushing this conversation out of the issue tracker and into Open Discussion.

Martin,

Tom and I talked about a few things in private, we are concerned about code quality and provenance, and in order to help the project, think it would be best for Tom to have final merge authority in OpenRefine.  (When I told you to merge it, I should not have, since I only did a minor review, and that is my fault. I am sorry, Martin)

Going forward, let's handle our Communication, PRs, and Code Reviews this way...(and Thanks in advance Martin for pushing us to get back into the game on OpenRefine...we have all been too busy/lazy, and even I am to blame).

1. If Tom is unresponsive for 1 week, we try to reach him any way we can (short of a plane trip :) ) ... but Tom still has to do the code review, and only he can merge into Trunk.
2. Other contributors only submit PRs, and let Tom handle the final review and merges.  (GitFlow with a master-reviewer style)
3. To repeat, Tom has final authority (since he is most familiar with the code, and has a long history with the project and other open source efforts)
4. You, Martin, and Me, Thad ... take a back seat on code reviews and only help to improve communications with Tom, blocks in development or questions that the developers may have, documentation, etc.

We know that RefinePro is pushing us to work harder and get OpenRefine more shiny and new, and we really do appreciate that motivation, Martin.  So don't stop with the PRs and Emails, etc.  Open Communication between all of us is vital to our super small niche group of now only 4-5.

All in favor, give me a +1 in reply.


On Fri, Oct 16, 2015 at 6:27 PM, Martin Magdinier <notifi...@github.com> wrote:
The rationale for merging Qi's PR is on the dev mailing list.
https://groups.google.com/forum/#!topic/openrefine-dev/PZ_nmjPwlzo
On Oct 16, 2015 4:54 PM, "Tom Morris" <notifi...@github.com> wrote:

> @magdmartin <https://github.com/magdmartin> Who reviewed this before you
> merged it? I'm happy to have you merge pull requests for translations,
> READMEs, etc, but you don't have the technical knowledge to review code
> contributions.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/OpenRefine/OpenRefine/pull/1079#issuecomment-148834271>



--
You received this message because you are subscribed to the Google Groups "OpenRefine Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine-de...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Martin Magdinier

Oct 18, 2015, 10:51:53 PM
to openr...@googlegroups.com
Thad,

I am generally reluctant to go with a separate fork, as it makes it harder to contribute between projects. OpenRefine should also be a hub for different forks to collaborate, and we should make better use of the branch system.

So here is a thought: what if we create a branch in the OpenRefine repository and have all contributors, including RefinePro's team, send their pull requests to this branch? Let's name it the "working branch" for now. PRs are merged directly into the working branch (unless there is a quality problem or the Travis CI build fails). Major changes can be merged into dedicated branches to ease the review, but they are merged. This allows every contributor to share code and engage with the project while leaving the trunk stable. Tom can merge the different branches into the trunk after a more careful review. Pre-releases can be built from that working branch. If we go with this process, we just need to document it well so contributors know where to send their code and what to expect.

Regarding this part of your email:

> What is preventing the OpenRefine project from moving at a faster pace and getting PRs reviewed? That would be Tom's time. And until there are more capable individuals at Tom's level, OpenRefine cannot move at a faster pace.

The underlying question is: what are we doing as a community to help people get to that level? But that's another discussion.


Martin

Thad Guidry

Oct 18, 2015, 11:23:45 PM
to openrefine
On Sun, Oct 18, 2015 at 9:51 PM, Martin Magdinier <martin.m...@gmail.com> wrote:
> Thad,
>
> I am generally reluctant to go with a separate fork, as it makes it harder to contribute between projects. OpenRefine should also be a hub for different forks to collaborate, and we should make better use of the branch system.

OpenRefine is already a hub, since its source is hosted on GitHub with a powerful forking system and GitFlow built in - https://docs.gradle.org/current/userguide/eclipse_plugin.html
You need to explain more to us why you think a separate fork makes it harder to contribute. It's the GitHub way and provides lots of benefits between projects, including separate issue trackers.

> So here is a thought: what if we create a branch in the OpenRefine repository and have all contributors, including RefinePro's team, send their pull requests to this branch? Let's name it the "working branch" for now. PRs are merged directly into the working branch (unless there is a quality problem or the Travis CI build fails). Major changes can be merged into dedicated branches to ease the review, but they are merged. This allows every contributor to share code and engage with the project while leaving the trunk stable. Tom can merge the different branches into the trunk after a more careful review. Pre-releases can be built from that working branch. If we go with this process, we just need to document it well so contributors know where to send their code and what to expect.

I understand what you're asking, but I think this will put more burden back onto Tom. Maybe not, but... please answer the above first.

> Regarding this part of your email:
>
>> What is preventing the OpenRefine project from moving at a faster pace and getting PRs reviewed? That would be Tom's time. And until there are more capable individuals at Tom's level, OpenRefine cannot move at a faster pace.
>
> The underlying question is: what are we doing as a community to help people get to that level? But that's another discussion.

I scream and yell about OpenRefine in the Dallas, Texas area. I have hosted intro events for it at Linux user groups, makerspaces, ThoughtWorks offices, etc.

I cannot do much more except make it easier for folks to pull down OpenRefine, build it easily, and see inside that its code is well documented and even has some high-level Javadocs (something we still lack).

Thad Guidry

Oct 18, 2015, 11:24:53 PM
to openrefine
Stupid copy paste ... ERR !

Github with a powerful forking system and GitFlow built in -

Martin Magdinier

Oct 18, 2015, 11:49:12 PM
to openr...@googlegroups.com
Unless I am missing something in Git, I cannot pull into my repository a PR that hasn't been merged into OpenRefine first.

Taking a live example from OpenRefine: if I want PR #909 (Allow longer expressions by sending them as POST), I will need to ask its author to resubmit his work to my repo, or go and extract his changes manually myself. Neither solution scales, and neither is viable.

I am looking for a solution that will
1. Keep the trunk stable under Tom's watch.
2. Allow contributors to keep working with OpenRefine and exchange code and ideas between two of Tom's code review rounds by working on separate branches.

Martin

Thad Guidry

Oct 19, 2015, 10:18:00 AM
to openrefine
On Sun, Oct 18, 2015 at 10:49 PM, Martin Magdinier <martin.m...@gmail.com> wrote:
> Unless I am missing something in Git, I cannot pull into my repository a PR that hasn't been merged into OpenRefine first.
>
> Taking a live example from OpenRefine: if I want PR #909 (Allow longer expressions by sending them as POST), I will need to ask its author to resubmit his work to my repo, or go and extract his changes manually myself. Neither solution scales, and neither is viable.


Yes, you can... Simply click on our Pull Requests to see the listing... then hover your mouse over any PR... look at the ID number in your browser status bar... then fetch it and check it out.

So, for example, to pull in Matt's Add Columns from Freebase...

git fetch origin pull/948/head:matts_branch_for_freebase_stuff
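
For readers who want to try that fetch without touching the real repository, here is a minimal sketch. It assumes a throwaway local repository standing in for OpenRefine on GitHub; the `refs/pull/948/head` ref is created by hand to imitate the read-only ref that GitHub publishes for each pull request, and the directory names, file names, and commit messages are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository (hypothetical local repo, not GitHub).
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/trunk
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit on trunk"

# Imitate an unmerged pull request: the work exists only under refs/pull/948/head.
git checkout -q -b contributor-work
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add columns feature"
git update-ref refs/pull/948/head HEAD
git checkout -q trunk
git branch -q -D contributor-work
cd ..

# A plain clone only receives branches, so the PR is not there yet.
git clone -q upstream myfork
cd myfork

# Fetch the PR ref into a local branch, then check it out.
git fetch -q origin pull/948/head:matts_branch_for_freebase_stuff
git checkout -q matts_branch_for_freebase_stuff
git log --oneline -1
```

Against the real project, only the last three commands would be needed, with `origin` pointing at the repository that hosts the pull request.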

And if you want to pull in some stuff from some other repository or Open Source project out there... easy...
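
The message does not spell out the commands for this case. One common approach, shown here as a sketch and not necessarily what was meant, is to register the other repository as a second remote and merge from it. The repository names (`upstream`, `other_fork`, `my_repo`) and file contents are hypothetical, and local directories stand in for hosted repositories.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Shared starting point (hypothetical stand-in for the central repo).
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/trunk
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit on trunk"
cd ..

# Another team's fork, carrying one extra commit.
git clone -q upstream other_fork
cd other_fork
git config user.email "other@example.com"
git config user.name "Other Team"
echo "extra" > extra.txt
git add extra.txt
git commit -q -m "Add extra feature in the other fork"
cd ..

# Our own clone: add the other fork as a second remote and merge from it.
git clone -q upstream my_repo
cd my_repo
git remote add other_fork "$work/other_fork"
git fetch -q other_fork
git merge -q --ff-only other_fork/trunk
git log --oneline -1
```

The `--ff-only` flag is a cautious choice for the sketch: the merge is refused if the histories have diverged, so nothing surprising lands in the local branch.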

But some folks find it easier to just keep repos separate and manually use file explorer tools, etc. to drop in changes, or edit them into their local repo with Notepad or an IDE.  But as you get comfortable with Git, it becomes second nature, just like any file system conventions.  Just always commit and push to your remote branches before you start attempting to screw up your local stuff in a new branch :)  That way you can always check out your working branch and get back to normal, and if you're really stuck... and want to reset... then 'git reset --hard'.
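
The safety net described above can be sketched as follows, assuming a scratch repository with a hypothetical `working` branch and a single tracked file. Note that `git reset --hard` only discards changes to tracked files; untracked files are left in place.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/working
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Commit a known-good state first, so there is something to return to.
echo "good" > notes.txt
git add notes.txt
git commit -q -m "Known-good state"

# Experiment on a separate branch and make a mess (uncommitted edit).
git checkout -q -b experiment
echo "broken" > notes.txt

# Throw away the uncommitted changes, then return to the working branch.
git reset -q --hard
git checkout -q working
cat notes.txt
```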

Martin Magdinier

Oct 19, 2015, 3:19:16 PM
to openr...@googlegroups.com
Thad,

Thanks for the clarification. As I said, this is a manual process that has to be repeated for each repository, and it still makes it hard to scale across the community. A dedicated branch would allow sharing with the whole community in three steps (1. send the PR; 2. merge it; 3. other repositories pull the change).
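
A rough sketch of those three steps, under several assumptions: local directories stand in for hosted repositories, the shared branch is called `working`, and "sending a PR" is imitated by pushing a branch named `pr-fix`. All of these names are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Central repository with a stable trunk and a shared working branch.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/trunk
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit on trunk"
git branch working
cd ..

# Two independent teams clone it.
git clone -q upstream contributor
git clone -q upstream other_team

# Step 1: a contributor prepares a change and offers it (imitated by a push).
cd contributor
git config user.email "contrib@example.com"
git config user.name "Example Contributor"
git checkout -q --no-track -b fix origin/working
echo "guidelines" > CONTRIBUTING.txt
git add CONTRIBUTING.txt
git commit -q -m "Add contributing guidelines"
git push -q origin fix:refs/heads/pr-fix
cd ..

# Step 2: the change is merged into the working branch; trunk is untouched.
cd upstream
git checkout -q working
git merge -q --ff-only pr-fix
git checkout -q trunk
cd ..

# Step 3: another team pulls the shared branch.
cd other_team
git checkout -q -b working origin/working
git pull -q --ff-only origin working
ls
```

In this sketch the trunk never moves, which matches the goal of keeping it stable until a more careful review has happened.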

Martin

Thad Guidry

Oct 19, 2015, 3:52:46 PM
to openrefine
So, you just want us to create some sort of 'develop' branch, besides 'trunk'.

That 'develop' branch IS ESSENTIALLY A fork, because some of the PRs may not land into 'trunk' after Tom reviews them.

Then what? Other repositories pull from 'develop' and may or may not have a working product, and will have to check 'trunk' for released, stable changes.
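
To make the gap between the two branches concrete, here is a small sketch, assuming a scratch repository where `develop` holds two hypothetical PRs and only the first has been accepted into `trunk`. The range `trunk..develop` lists the commits that `develop` has and `trunk` does not.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/trunk
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit on trunk"

# Two changes land on develop.
git checkout -q -b develop
echo "one" > one.txt
git add one.txt
git commit -q -m "PR A: reviewed and accepted"
echo "two" > two.txt
git add two.txt
git commit -q -m "PR B: still awaiting review"

# Only the first change is accepted into trunk.
git checkout -q trunk
git merge -q --ff-only develop~1

# List what develop has that trunk does not.
pending=$(git log --format=%s trunk..develop)
echo "$pending"
```

Anyone pulling from `develop` can run the same `git log trunk..develop` to see which changes have not yet reached the stable branch.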

Again, the point of creating a fork is to make experiments in a local 'develop' branch yourself.

It sounds like you just want to know if the Travis CI build is working or not on any particular PR? Is that what this is all about?
I have this feeling that you're trying to say something is awkward and you're not able to do something for RefinePro easily with your Git setup and Ways of Working... but I am still confused as to WHAT that pain point is for your team.

Martin Magdinier

Oct 19, 2015, 6:37:44 PM
to openr...@googlegroups.com
Thad,

The pain point is the way PRs are reviewed and merged in OpenRefine. If any team decides to actively develop OpenRefine, it is quickly cut off from the project because OpenRefine's trunk is lagging. What will happen when a fork makes technology choices that make further contribution to OpenRefine impossible (e.g., moving to Maven, refactoring part of the core)?

I am not making the case for RefinePro in this thread. We are one of the many teams that currently develop on top of OpenRefine. I am looking for a process that will help the community at large, get the project moving, and help those teams work better together. Today we have people working hard to improve OpenRefine and make it part of their workflow. They send pull requests; minor changes remain pending for months, and larger ones remain pending for years.

And going back to the same question: what is your proposition to get the work done, allow people to work together in the long term, and get new releases of OpenRefine out? Forking is just a workaround for one team, and it doesn't help the OpenRefine project in the long run.

Martin

Thad Guidry

Oct 19, 2015, 7:50:31 PM
to openrefine
A cry for help from anyone reading this email.

If ANYONE out there reading this has ANY suggestions or feels they have the chops to help OpenRefine with weekly Pull Requests & code reviews, I am all ears.

Magdmartin

Oct 26, 2015, 9:40:47 AM
to OpenRefine
Thad, all,

My concern with going with a RefinePro fork is that it keeps diluting the OpenRefine development effort and makes it harder for newcomers to understand what's happening in the community. However, today the community is already split across multiple repositories (SparkRefine, LODRefine, OpenDataRise, RefinePro ....). So let's try to find something that helps everyone.

What if we turn the openrefine.org website into a community hub promoting all the forks, and not solely the OpenRefine one? The website would act as a gateway / umbrella for the community at large, presenting the different forks and relaying their progress and news. We can update the download page on openrefine.org with a table listing all the different distributions, with:

    project name
    project description / purpose of the fork, explaining why it is different (what specific issue it addresses or what technology it uses)
    link to download the release
    last release date
    link to the repo

The OpenRefine GitHub repository remains the central place to send pull requests, and it is where we host the documentation (wiki); the discussion list remains the same. We can add a page on the wiki explaining how the different forks work together, how to merge a pull request into their repo (the workflow described by Thad previously), how to cross-reference issues between repositories, and how to update the OpenRefine website with their fork information.

Any thoughts on this?
To people with an existing fork: would you be interested in being listed directly on openrefine.org?

Martin

Thad Guidry

Oct 26, 2015, 11:14:57 AM
to openrefine
On Mon, Oct 26, 2015 at 8:40 AM, Magdmartin <martin.m...@gmail.com> wrote:
> Thad, all,
>
> My concern with going with a RefinePro fork is that it keeps diluting the OpenRefine development effort and makes it harder for newcomers to understand what's happening in the community. However, today the community is already split across multiple repositories (SparkRefine, LODRefine, OpenDataRise, RefinePro ....). So let's try to find something that helps everyone.
>
> What if we turn the openrefine.org website into a community hub promoting all the forks, and not solely the OpenRefine one? The website would act as a gateway / umbrella for the community at large, presenting the different forks and relaying their progress and news. We can update the download page on openrefine.org with a table listing all the different distributions, with:
>
>     project name
>     project description / purpose of the fork, explaining why it is different (what specific issue it addresses or what technology it uses)
>     link to download the release
>     last release date
>     link to the repo
>
> The OpenRefine GitHub repository remains the central place to send pull requests, and it is where we host the documentation (wiki); the discussion list remains the same. We can add a page on the wiki explaining how the different forks work together, how to merge a pull request into their repo (the workflow described by Thad previously), how to cross-reference issues between repositories, and how to update the OpenRefine website with their fork information.
>
> Any thoughts on this?

I am OK with what you have suggested.  Tom might be averse to listing RefinePro on there, but I am not against it actually, since folks would see all the options and it is still their choice.

Martin Magdinier

Oct 26, 2015, 11:54:27 AM
to openr...@googlegroups.com
Thad, thanks for the feedback.

Leaving the thread open until the end of the week to leave time for others to comment before updating the website.

Martin

Magdmartin

Jan 5, 2016, 11:24:07 AM
to OpenRefine
All,

Following this discussion, I will be pushing / merging two things in the coming days:
  1. A new download page featuring:
    1. 7 existing distributions (OpenRefine 2.6-beta; Google Refine 2.5; LODRefine; OpenDataRise; p3-batchrefine; SparkonRefine; and Reconciliation-and-Matching-Framework)
    2. An updated list of extensions
    3. A link to the reconciliation service wiki page
    4. Links to the four existing libraries (2 Python, Ruby, and Node.js)
  2. A replacement of the community page by a CONTRIBUTING.md file with guidelines for developers. I pushed a draft on the contributing-file branch; edits are welcome. GitHub's help documentation describes how CONTRIBUTING.md files work.

Martin.



Thad Guidry

Jan 5, 2016, 12:00:39 PM
to openrefine
Thanks so much for working on this Martin during the holidays !


Martin Magdinier

Jan 10, 2016, 4:25:15 PM
to openr...@googlegroups.com
I just updated the website and merged the CONTRIBUTING.md file.

Most changes are on the download and community pages. There is a chance (risk) that I left typos or other grammar errors behind; corrections are welcome on this repository.

I also think the website could use a good CSS update... tables don't render that well and the text is small / hard to read. If there is any CSS wizard around, help is appreciated.

Martin