Re: [julia-dev] Request for reviews on JuliaLang/DataStructures.jl


John Myles White

Nov 9, 2015, 11:01:47 AM
to juli...@googlegroups.com
FWIW, I think we have a community-wide problem of coordination. Many libraries are actively maintained, but the work isn't done on a synchronized schedule, so newcomers perceive them to be abandoned.

For myself, I've been meaning to announce a policy in which there's exactly one project I'm working on in any given month. I'll comment on PRs on that project, but no others. But I've yet to actually live up to that idea.

As is, I think the situation is totally overwhelming for many contributors since there's no way to make the required context switches and provide useful input on PRs without abandoning other projects.

-- John

> On Nov 5, 2015, at 9:49 AM, Daniel Arndt <danie...@gmail.com> wrote:
>
> Hi all,
>
> (my apologies if this is a double post, my other seemed to disappear into the nether)
>
> I'm new to Julia but looking to contribute. Since I've been using DataStructures.jl that seemed like a natural place to start.
>
> I've had a few pull requests, one being up for a bit over a week now, but haven't received much feedback. I'm reaching out here to see if I could get a bit more attention on these, as I would love any feedback. They are small to start, but I'd like to continue with these contributions where I can help. I don't want to invest too much time and have them dangle there though.
>
> The module seems to be very slow moving currently. Some pull requests are pretty old, and with few status updates. At the risk of sounding impatient, I understand we all have lots going on outside of coding (as well as other coding projects!), and some things just get lost in the noise. I hope this was a reasonable time to wait before reaching out for comments, or perhaps you can inform me of a better approach.
>
> Cheers,
> Dan

Tom Breloff

Nov 9, 2015, 11:44:17 AM
to juli...@googlegroups.com
Do you think it's worthwhile to assign a single maintainer for some of the "community" packages, so that person feels the responsibility to at least acknowledge issues/PRs and give a plan of action (even if that plan is "I'll get to this in 3 months")? It might be better than 10 people all being 10% responsible for 10 packages.

Daniel Arndt

Nov 9, 2015, 11:57:35 AM
to julia-dev
It may even be worthwhile to have a very small group of "core" maintainers, that could perhaps set expectations.

In my case, by doing a git blame I was able to identify the person who wrote the code and @mention them to get attention. But if they aren't available (even temporarily, as was the case here) I had no idea which of the list of contributors to reach out to next.

John Myles White

Nov 9, 2015, 12:01:16 PM
to juli...@googlegroups.com
I would be pretty unhappy if I were "responsible" for most of the packages I work on. The last thing I need is additional pressure from people that aren't paying my salary.

I think the best thing we can hope for is more openness about what's being worked on and more realism about what's achievable given the amount of engineering hours we can count on.

 -- John

Tom Short

Nov 9, 2015, 12:03:24 PM
to julia-dev
I agree that this is a community-wide problem. The community sites (JuliaLang, JuliaStats, ...) encourage collaboration but tend to dilute attribution and responsibility.

As a first step, the README of each package repo could have a clear statement of how the package is being maintained. This could identify a lead maintainer, a group of maintainers, or the fact that it is currently unmaintained and needs volunteers/forks.


John Myles White

Nov 9, 2015, 12:07:14 PM
to juli...@googlegroups.com
I tend to think the problem here is with our model of onboarding new developers. We shouldn't let people opt in to the projects they work on; we should only let people opt in to the mentors they work with.

This is clearly not how OSS projects typically work, but I think it gets at the core incentive conflict: new volunteers want to work on things that excite them, existing volunteers also want to do this, but may not be excited about what new volunteers are excited about. Given the very high turnover rate of new people who come to the community, it's best to give preference to the wishes of established volunteers -- however pathological that may seem.

 -- John

Tim Holy

Nov 9, 2015, 12:11:59 PM
to juli...@googlegroups.com
The big problem is not that we have 10 people 10% responsible for 10 packages,
it's that we have 10 people 70% responsible for about 50 packages. Early in
its development, many of the core packages of julia were written by a small
number of people. Many of those folks have either moved on to other things, or
very much need to move on to other things.

Now suppose I'm the original author of package A (and B and C and D and E and
F and ...). Let's say I get a PR for package A; that's fantastic, but my
attention is currently on package Q. As JMW says, it's nontrivial to context-
switch and review a PR. It's also very hard to predict in advance "is this
person going to submit the 2 things they need, and then move on?" or "is this
person someone I can groom as my successor to make sure this package keeps
thriving?" I might be a lot more excited about making that context-switch if I
had some hope that this person---and not just this pull request---would make
the package better.

There's a huge advantage in someone stepping up and officially saying "I'll take
the wheel of this package for a little while":

- It signals to the previous maintainer/author that it's worth investing in
someone, both to review their PRs and to coach them through any difficult
decisions as they assume responsibility for a package

- It signals to repo owners that here's someone who likely should have (or
soon should have) direct push access

I'm not aware of very many instances in which this has happened, however.

Best,
--Tim

Tom Breloff

Nov 9, 2015, 12:12:31 PM
to juli...@googlegroups.com
John... I certainly wasn't suggesting that someone takes on more responsibility than they want.  It's kind of the opposite... one takes responsibility for a small subset (maybe only 1) of packages that they would normally feel partially responsible for, and in return others pick up the slack elsewhere.  It's similar to the "mentor" suggestion you had, in that the community can choose a volunteer maintainer(s) to be the primary organizer for that package.

catc...@bromberger.com

Nov 9, 2015, 12:17:29 PM
to julia-dev

John, this is my perception as well. As an example, I'm running into a problem with a semi-active package that's relying on a package that has apparently been deprecated. This is causing precompilation of my package to fail, which means that it appears to the rest of the world that my package is not being properly maintained.


I can probably fix it by removing precompilation, but I'd rather take advantage of these features where I can.


The issue is one of dependency fragility, as a quick glance at the dependency graphs shows (thanks to Iain's MetadataTools package).


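Here’s the current dependency graph:

[Figure: package dependency graph; image not preserved]

If we remove all packages with no dependencies, we get the following distribution:

[Figure: distribution after removing packages with no dependencies; image not preserved]
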
Here are the top 20 packages by number of dependents:

 "Compat"
 "BinDeps"
 "Distributions"
 "StatsBase"
 "DataFrames"
 "Docile"
 "JSON"
 "Homebrew"
 "ArrayViews"
 "Colors"
 "WinRPM"
 "Images"
 "Dates"
 "DataStructures"
 "FactCheck"
 "Reexport"
 "MathProgBase"
 "Iterators"
 "PyCall"
 "FixedPointNumbers"

IMO, these should be the focus of our immediate attention. (This is not to imply that they're lacking any sort of support; the data just indicate one relative measure of importance to the Julia community.)
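
For readers who want to reproduce a ranking like the one above, here is a minimal sketch in Python rather than with MetadataTools itself. The `requires` mapping is made-up toy data, not the real METADATA contents, and `rank_by_dependents` is a hypothetical helper. It counts direct dependents only.

```python
from collections import Counter


def rank_by_dependents(requires, top=20):
    """Rank packages by how many other packages list them as a direct dependency."""
    counts = Counter()
    for pkg, deps in requires.items():
        # Count each dependency once per dependent package.
        for dep in set(deps):
            if dep != pkg:
                counts[dep] += 1
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Toy data (an assumption for illustration only).
requires = {
    "DataFrames": ["Compat", "StatsBase", "DataStructures"],
    "Distributions": ["Compat", "StatsBase"],
    "Images": ["Compat", "Colors", "FixedPointNumbers"],
    "Colors": ["Compat", "FixedPointNumbers"],
}

ranking = rank_by_dependents(requires, top=3)
for name, n in ranking:
    print(name, n)
```

Counting transitive dependents instead would probably push foundational packages even further up the list, at the cost of a graph traversal.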

Seth.

John Myles White

Nov 9, 2015, 12:27:45 PM
to juli...@googlegroups.com
Tom,

In principle, that's the direction I want to go in. But the problem is that >50% of our packages would be marked as abandoned if we did that.

 -- John

Randy Zwitch

Nov 9, 2015, 12:57:26 PM
to julia-dev
"There's a huge advantage in someone stepping up and officially saying "I'll take 
the wheel of this package for a little while": 

- It signals to the previous maintainer/author that it's worth investing in 
someone, both to review their PRs and to coach them through any difficult 
decisions as they assume responsibility for a package 

- It signals to repo owners that here's someone who likely should have (or 
soon should have) direct push access 

I'm not aware of very many instances in which this has happened, however. "


I did this with Vega.jl, only because JMW was squatting on the name ;) But seriously, the only reason I could do so was that I had his personal email, so I got push access. The flip side of this is I hate it when users send me personal emails to support my R package, so this in itself is only a half-productive counterexample...

John Myles White

Nov 9, 2015, 1:04:18 PM
to juli...@googlegroups.com
Although it's not official, ownership of DataFrames has, de facto, transferred many times. It started with Harlan Harris, went to Tom Short, then to me, then to Sean Garborg and now is sort of up in the air.

Otherwise, I agree with everything Tim said.

What makes things so hard about managing such transitions is that the problem is primarily social rather than technical: I only give commit access to people I trust. Trust requires that a person demonstrate both technical competence and respect for the aesthetics of the existing codebase. The latter issue explains why we have many more people contributing code than we have people taking on contributor status -- I'm willing to merge code from someone that I feel doesn't respect me, but I'm not willing to put them in charge of making design decisions going forward.

 -- John

Tim Holy

Nov 9, 2015, 1:28:29 PM
to juli...@googlegroups.com
I'd prefer to turn over maintenance of several packages to anyone who can
prove to me that I'm a complete idiot. That way I'd be sure the person knew
what s/he was doing :-).

--Tim

Jonathan Malmaud

Nov 9, 2015, 8:10:19 PM
to julia-dev
I think a version of this would make sense - each communal package could have a maintainer, whose username is displayed prominently in the README. They would be expected to at least respond to PRs within ~1 week, even if the response is just to ping someone else who might be in a position to review it or to say that they are busy for X more weeks and not to expect an additional response before that.

Tom Breloff

Nov 9, 2015, 8:24:25 PM
to juli...@googlegroups.com
+1 Jonathan. I don't think we should expect this person to do all the work. They should just be the point person so that questions/requests don't fall through the cracks. 

Joshua Ballanco

Nov 10, 2015, 12:17:01 AM
to juli...@googlegroups.com

Agree completely. Ruby does something like this where each piece of the core ( https://bugs.ruby-lang.org/projects/ruby/wiki/Maintainers ) and standard library ( https://bugs.ruby-lang.org/projects/ruby/wiki/MaintainersStdlib ) has an assigned “maintainer”. If you look through the mailing list or past bugs, though, you’ll find that maintainers are not expected to shoulder the burden of fixing bugs or approving PRs all on their own. They merely serve as a point of contact for anyone with questions and as the final arbiter of any disagreements (usually petty naming issues).


It’s interesting to note that a number of very important parts of Ruby’s stdlib are “unmaintained”, such as `lib/mkmf.rb`, which is used to generate Makefiles for every gem with a C-extension. Being “unmaintained” is different than being “abandoned”. It merely means that no one has seen the need for fixes or updates in a very long while.

Tim Holy

Nov 10, 2015, 4:53:51 AM
to juli...@googlegroups.com
I'm in favor of experiments in this direction.

--Tim

Jonathan Malmaud

Nov 10, 2015, 7:56:29 PM
to julia-dev
I'm happy to volunteer for the experiment by being the maintainer for DataStructures.jl if people are OK with that, although I've only contributed a little to it in the past so I might not be the best choice. 

Kevin Squire

Nov 14, 2015, 6:27:15 PM
to juli...@googlegroups.com
As someone who contributed a moderate amount of code to DataStructures, I would be very happy if you took on maintenance of it, Jonathan.

Kevin

andy hayden

Dec 20, 2015, 3:06:58 AM
to julia-dev
It seems like there ought to be some way to "score" packages based on their health, to prioritize which projects need attention*.

That may be something like:
- open issues (average duration they're open, commented, or whether a collaborator has commented)
- recent commits
- "stagnant" PRs
- tests passing and on the latest stable Julia

"importance" (as discussed above)
- number of downloads (apparently number of clones is now recorded on GH! http://stackoverflow.com/questions/4338358/github-can-i-see-the-number-of-downloads-for-a-repo)**
- number of dependencies
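
As a rough sketch of how the criteria above might combine into a single number (in Python, purely for illustration): the field names, the weights, and the `attention_score` function are all assumptions made up for this example, not an existing tool or an agreed formula.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    # All fields are hypothetical inputs one would have to collect per package.
    name: str
    open_issues: int
    days_since_last_commit: int
    stagnant_prs: int
    tests_passing: bool
    dependents: int


def attention_score(p):
    """Higher means the package more urgently needs attention."""
    # Arbitrary illustrative weights for the "health" signals.
    neglect = (
        p.open_issues
        + p.days_since_last_commit / 10
        + 5 * p.stagnant_prs
        + (0 if p.tests_passing else 20)
    )
    # Scale neglect by "importance" so widely used packages rise to the top.
    return neglect * (1 + p.dependents)


packages = [
    PackageStats("A", open_issues=10, days_since_last_commit=100,
                 stagnant_prs=2, tests_passing=True, dependents=9),
    PackageStats("B", open_issues=30, days_since_last_commit=200,
                 stagnant_prs=4, tests_passing=False, dependents=0),
]

# Packages most in need of attention first.
needs_love = sorted(packages, key=attention_score, reverse=True)
print([p.name for p in needs_love])
```

In this toy data the less neglected but widely depended-upon package A ranks ahead of B, which is the kind of trade-off between health and importance the weights would need to encode.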

As more projects are put under JuliaXXX umbrella groups, flagging projects that need attention seems more useful (as there are more people who can merge stuff, esp. the non-"design" things, which John is rightfully concerned about). Perhaps a weekly suggestion "newsletter" to the community of a few (different each week) packages that need some love would be useful.

I'm going to play around with some ideas here over the hols.

* I guess that may mean one of: updating for latest julia, fixing tests, addressing issues, reviewing PRs. They're all very different task types IMO.
** Does this mean you can count new julia users? https://github.com/JuliaLang/METADATA.jl/graphs/traffic