Community renewal and project obsolescence

Rafael Laboissière

unread,

Dec 27, 2023, 3:50:04 PM12/27/23

to

Dear Debian fellows,

This is a very simple-minded analysis about the Debian community (lack
of) renewal and project obsolescence:

https://salsa.debian.org/rafael/debian-contrib-years

containing interesting comments made by Sébastien Villemot.

Best,

Rafael Laboissière, DD

Rafael Laboissière

unread,

Dec 28, 2023, 11:00:05 AM12/28/23

to

* M. Zhou <lu...@debian.org> [2023-12-27 19:00]:

> Thanks for sharing the figure. The data seems correlated with the
> number of new Debian accounts. See the figure below:
> Python Code for this figure:
>
> ```
> # modified from ChatGPT.
> # XXX: members.csv is copy-pasted from https://nm.debian.org/members/
> import pandas as pd
> import matplotlib.pyplot as plt
> df = pd.read_csv('members.csv', sep='\t')
> df = df[df['Since'] != '(unknown)'] # filter out invalid data
> df['Since'] = pd.to_datetime(df['Since'])
> df['Year'] = df['Since'].dt.year
> account_counts = df['Year'].value_counts().sort_index()
> smoothed_counts = account_counts.rolling(window=3).mean()
> plt.figure(figsize=(10, 6))
> plt.bar(account_counts.index, account_counts.values, color='skyblue')
> plt.plot(smoothed_counts.index, smoothed_counts.values, color='orange',
> label=f'Smoothed (Window=3)')
> plt.xlabel('Year')
> plt.ylabel('Number of Accounts Created')
> plt.title('Number of Accounts Created Each Year')
> plt.legend()
> plt.savefig('nm-year.png')
> ```

Thanks for the code and the figure. Indeed, the trend is confirmed by
fitting a linear model count ~ year to the new members list. The
coefficient is -1.39 member/year, which is significantly different from
zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from year
2001, that could be interpreted as an outlier, the trend is still
siginificant, with a drop of 0.98 member/year (F[1,21] = 8.48, p < 0.01).

Best,

Rafael Laboissière

P.S.1: The correct way to do the analysis above is by using a
generalized linear model, with the count data from a Poisson distribution
(or, perhaps, by considering overdispersed data). I will eventually add
this to my code in Git.

P.S.2: In your Python code, it is possible to get the data frame directly
from the web page, without copying&pasting. Just replace the line:

df = pd.read_csv('members.csv', sep='\t')

by:

df = pd.read_html("https://nm.debian.org/members/")[0]

I am wondering whether ChatGPT could have figured this out…

Mo Zhou

unread,

Dec 28, 2023, 2:10:05 PM12/28/23

to

On 12/28/23 10:34, Rafael Laboissière wrote:

> * M. Zhou <lu...@debian.org> [2023-12-27 19:00]:
>

> Thanks for the code and the figure. Indeed, the trend is confirmed by
> fitting a linear model count ~ year to the new members list. The
> coefficient is -1.39 member/year, which is significantly different
> from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data
> from year 2001, that could be interpreted as an outlier, the trend is
> still siginificant, with a drop of 0.98 member/year (F[1,21] = 8.48, p
> < 0.01).

I thought about to use some models for population statistics, so we can
get the data about DD birth rate and DD retire/leave rate, as well as a
prediction. But since the descendants of DDs are not naturally new DDs,
the typical population models are not likely going to work well. The
birth of DD is more likely mutation, sort of.

Anyway, we do not need sophisticated math models to draw the conclusion
that Debian is an aging community. And yet, we don't seem to have a good
way to reshape the curve using Debian's funds. -- this is one of the key
problems behind the data.

> P.S.1: The correct way to do the analysis above is by using a
> generalized linear model, with the count data from a Poisson
> distribution (or, perhaps, by considering overdispersed data). I will
> eventually add this to my code in Git.

Why not integrate them into nm.debian.org when they are ready?

> P.S.2: In your Python code, it is possible to get the data frame
> directly from the web page, without copying&pasting. Just replace the
> line:
>
> df = pd.read_csv('members.csv', sep='\t')
>
> by:
>
> df = pd.read_html("https://nm.debian.org/members/")[0]
>
> I am wondering whether ChatGPT could have figured this out…

I just specified the CSV input format based on what I have copied. It
produces well-formatted code with detailed documentation in most of the
time. I deleted too much from its outputs to keep the snippet short.

I have to justify one thing to avoid giving you a wrong impression about
large language models. In fact, the performance of an LLM (such as
ChatGPT) greatly varies based on the prompt and the context people
provided to it. Exploring this in-context learning capability is still
one of the cutting edge research topics. For the status-quo LLMs, their
answers on boilerplate code like plotting (matplotlib) and simple
statistics (pandas) are terribly perfect.

Steffen Möller

unread,

Dec 29, 2023, 7:40:04 AM12/29/23

to

> Gesendet: Donnerstag, 28. Dezember 2023 um 20:02 Uhr
> Von: "Mo Zhou" <lu...@debian.org>
> An: debian-...@lists.debian.org
> Betreff: Re: Community renewal and project obsolescence

>
> On 12/28/23 10:34, Rafael Laboissière wrote:
>
> > * M. Zhou <lu...@debian.org> [2023-12-27 19:00]:
> >
> > Thanks for the code and the figure. Indeed, the trend is confirmed by
> > fitting a linear model count ~ year to the new members list. The
> > coefficient is -1.39 member/year, which is significantly different
> > from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data
> > from year 2001, that could be interpreted as an outlier, the trend is
> > still siginificant, with a drop of 0.98 member/year (F[1,21] = 8.48, p
> > < 0.01).
>
> I thought about to use some models for population statistics, so we can
> get the data about DD birth rate and DD retire/leave rate, as well as a
> prediction. But since the descendants of DDs are not naturally new DDs,
> the typical population models are not likely going to work well. The
> birth of DD is more likely mutation, sort of.
>
> Anyway, we do not need sophisticated math models to draw the conclusion
> that Debian is an aging community. And yet, we don't seem to have a good
> way to reshape the curve using Debian's funds. -- this is one of the key
> problems behind the data.

What hypothese do we have on what influences the number of active individuals?

Positive factors
* Location of DebConf (with many or not so many devs affording to attend)
* Popular platforms like the Raspberry Pi working with Debian derivative
* Debian packaging teams on salsa
* self-education
* Impression the DD status makes on outsiders/your next employer
* Pleasant interactions on mailing lists with current or past team members
* Team building with other DDs on projects of interest

Negative factors
* Advent of homebrew+conda
* Containers
* Increasing workloads as one ages and does not give packages up
* Work-life-balance
* Migrating to upstream
* Delay between what upstream releases and what is available in our distro
* Unpleasant interactions on mailing lists with current or past team members

Do you have a better list?
I keep thinking about what the last significant change in Debian may have been - to mind came salsa.debian.org. Do I miss anything?
And I think the change I would like to see the most is a variant of brew/salsa for Debian, preferably in some mostly automated way, so we have some way to install the very latest with Debian all the time.

Best,
Steffen

Daniel Gröber

unread,

Dec 29, 2023, 1:10:03 PM12/29/23

to

Hi,

On Fri, Dec 29, 2023 at 08:48:28AM -0800, Antonio Russo wrote:
> [...] my personal experience is that making contributions is like
> dropping a message in a bottle into the sea. It feels like a complete
> crap-shot whether I'll even receive a comment on any code contribution
> (including debian-devel RFS, salsa MR, or BTS patch).

This is also my experience.

A related question I've been pondering: did salsa make this worse for new
contributors because some maintainers (seem to) ignore issues/MRs there?

I figure for the many people coming from GH style platforms nowerdays being
ignored on salsa would be a major discouragment to contributing.

> If there were a single thing that could be done, in my mind it would be
> to have someone make sure that contributions do not go entirely ignored.

I've been thinking along those lines too. Perhaps we just need an
aggregator that flags mails/comments/other contributions by new people that
are being ignored.

I've been meaning to do something like that for the d-mentors list but
perhaps we need to think bigger.

--Daniel

signature.asc

Andrey Rakhmatullin

unread,

Dec 29, 2023, 2:30:04 PM12/29/23

to

On Fri, Dec 29, 2023 at 08:48:28AM -0800, Antonio Russo wrote:

> As someone who would like to participate more in the development of Debian, my personal

> experience is that making contributions is like dropping a message in a bottle into
> the sea. It feels like a complete crap-shot whether I'll even receive a comment on
> any code contribution (including debian-devel RFS, salsa MR, or BTS patch).

There are multiple reasons for that, some common to all of these, some
specific to some contribution types, but all ultimately boil down to other
people being volunteers. There is no direct way to improve this beyond
magically increasing the total amount of time spent by maintainers on
Debian work. Some processes or tools could be improved but I'm not sure
how much would that help.

> If there were a single thing that could be done, in my mind it would be to have someone

> make sure that contributions do not go entirely ignored. Even just telling someone "hey,
> none of the stuff you're submitting is really good enough for Debian" would be helpful
> because they could either work on improving, or stop trying to contribute.
There is no polite way to tell that, but also it's not a big problem for
the project if somebody who submits very bad RFSes gets those RFSes
ignored instead of being told to stop waiting for feedback on them.
Giving constructive feedback, on the other hand, can be very
time-draining, especially to first-time contributors submitting poor
quality things. This is not even specific to Debian but applies to any
open source maintainer work.

Andrey Rakhmatullin

unread,

Dec 29, 2023, 2:30:04 PM12/29/23

to

On Fri, Dec 29, 2023 at 06:49:47PM +0100, Daniel Gröber wrote:
> > [...] my personal experience is that making contributions is like
> > dropping a message in a bottle into the sea. It feels like a complete
> > crap-shot whether I'll even receive a comment on any code contribution
> > (including debian-devel RFS, salsa MR, or BTS patch).
>
> This is also my experience.
>
> A related question I've been pondering: did salsa make this worse for new
> contributors because some maintainers (seem to) ignore issues/MRs there?

Maybe, but also salsa MRs being ignored by default was an intentional
decision AFAIK, both a technical decision of not notifying maintainers
about created MRs and a policy decision of the BTS being the only
officially promoted way to contact maintainers and submit patches.
I have no idea if people are actually told that before they submit MRs.

> I figure for the many people coming from GH style platforms nowerdays being
> ignored on salsa would be a major discouragment to contributing.

Well, salsa didn't make this worse, it just added something that can be
ignored.

> > If there were a single thing that could be done, in my mind it would be
> > to have someone make sure that contributions do not go entirely ignored.
>
> I've been thinking along those lines too. Perhaps we just need an
> aggregator that flags mails/comments/other contributions by new people that
> are being ignored.

You'll still need people to provide feedback.

Sam Hartman

unread,

Dec 29, 2023, 3:00:05 PM12/29/23

to

>>>>> "Daniel" == Daniel Gröber <dx...@darkboxed.org> writes:

Daniel> Hi,

Daniel> On Fri, Dec 29, 2023 at 08:48:28AM -0800, Antonio Russo wrote:
>> [...] my personal experience is that making contributions is like
>> dropping a message in a bottle into the sea. It feels like a
>> complete crap-shot whether I'll even receive a comment on any
>> code contribution (including debian-devel RFS, salsa MR, or BTS
>> patch).

Daniel> A related question I've been pondering: did salsa make this
Daniel> worse for new contributors because some maintainers (seem
Daniel> to) ignore issues/MRs there?

I think so.

Especially for group-maintained packages, it is very easy to get into a
situation where no one is actually notified for a MR on a given
repository.

More generally, as a maintainer, when I find I'm ignoring someone it's
typically because:

* The idea has some merit; if it was complete junk I could close it as
wontfix or invalid or whatever.

* But it requires significant effort from me to get to a place where it
lands.

* And I don't care that much.

Examples include ideas where there's significant review that would be
needed; ideas where there's some rework needed; or especially ideas
where it's important to consider the implications between the new idea
and some part of the system that neither I nor the submitter understands
well.

Another common challenge is an idea that disturbs some part of something
that's been mostly chugging along fine for years, but that has entirely
inadequate test coverage to know whether this new code will break
things.
I feel bad saying "that's great, but please write a test suite to cover
your contribution as well as a significant chunk of the package you are
touching," but can rarely work up the interest in doing that test suite
myself if I don't care much about the enhancement/fix.

Another challenge is when some idea involves significant coordination
work. For example there are a few pam bugs that boil tdown to
pam-auth-update isn't quite fine grain enough to capture some
distinctions that matter.
Proposing a new design, and moving that across the archive would be a
lot of work.

Or for example there's a merge request/bug on pam to enable group write
umask by default with usergroups. Which apparently there was a
consensus to do way back in the day. I'm concerned that consensus
predates modern thinking about being restrictive in write permissions,
and something's probably going to break, but on the other hand Ubuntu
does it, and it's probably going to enable some valuable use cases.
Deciding how to act on something like that is hard.

And yet I completely get your side of things.
If you try to contribute and aren't welcomed, it totally destroys
motivation.

Charles Plessy

unread,

Dec 30, 2023, 3:30:03 PM12/30/23

to

Le Fri, Dec 29, 2023 at 01:14:29PM +0100, Steffen Möller a écrit :
>
> What hypothese do we have on what influences the number of active individuals?

When I was a kid I was playing with a lot of pirate copy of Amiga and
then PC games, and I had a bit of melancholy thinking that what appeared
to be golden days took place when I was still busy learning to walk and
speak. I wondered if I was born too late. Then I was introduced to
Linux and Debian. That was a big thing, a big challenge for me to learn
it, and a big reward to be part of it. At that time I never imagined
that the next big thing was diversity, inclusion and justice, but being
part of Debian unexpectedly connected me to it. Now when I look back I
do not worry being born too late. I would like to say to young people
that joining a thriving community is the best way to journey beyond
one's imagination.

Of course, we need to show how we are thriving. On my wishlist for
2024, there is of course AI. Can we have a DebGPT that will allow us to
interact with our mailing list archives using natural language? Can
that DebGPT produce code that we know derives from a training set that
only includes works for which peole really consented that their
copyrights and licenses will be dissolved? Can it be the single entry
point for our whole infrastructure? I wish I could say "DebGPT, please
accept all these loongarch64 patches and upload the packages now", or
"DebGPT, update debian/copyright now and show me the diff". I am not
able to develop DebGPT and confess I am not investing my time in
learning to do it. But can we attract the people who want to tinker in
this direction? Not because we are the best AI team, but because we are
one of the hearts of software freedom, and that freedom is deeply
connected to everybodys futures.

Well, it is too late for invoking Santa Claus, but this said, best
wishes for 2024 !

Charles

--
Charles Plessy Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team http://www.debian.org/devel/debian-med
Tooting from work, https://fediscience.org/@charles_plessy
Tooting from home, https://framapiaf.org/@charles_plessy

Mo Zhou

unread,

Dec 30, 2023, 10:00:04 PM12/30/23

to

On 12/30/23 15:06, Charles Plessy wrote:

Le Fri, Dec 29, 2023 at 01:14:29PM +0100, Steffen Möller a écrit :

What hypothese do we have on what influences the number of active individuals?

When I was a kid I was playing with a lot of pirate copy of Amiga and
then PC games, and I had a bit of melancholy thinking that what appeared
to be golden days took place when I was still busy learning to walk and
speak.  I wondered if I was born too late.  Then I was introduced to
Linux and Debian.

If you don't mind to share more of your story -- how are you introduced to Linux and Debian? Can we reproduce it?

For me this is not reproducible. The beginning of my story is similar to yours. Differently, at that time Windows is the only PC operating system I'm aware of. And I suffered a lot from it and its ecosystem: aggressive reboots, aggressive pop-up windows and ads completely out of my control, enormous difficulty to learn and understand its internals given very limited budget for books, enormous difficulty to learn C programming language based on it. Visual studio did a great job to confuse me with a huge amount of irrelevant details and complicated user interface when I want try the code from the K&R C book as a newbie (without any educational resource available or affordable). I forgot why I chose this book but it was a correct one to buy.

One day, out of curiosity I searched for "free of charge operating systems" so that I can get rid of Windows. Then I got Ubuntu 11.10. Its frequent "internal errors" drove me to try other linux distros in virtualbox, including Debian squeeze and Fedora. While squeeze is the ugliest among them all in terms of desktop environment, it crashes significantly less than the rest. I was happy with my choice. Linux does not reboot unless I decide to do so. It does not pop-up ads because the malwares (while being useful) are not available under linux. It does not prevent me from trying to understand how it works, even if I can hardly grasp the source code. And, `gcc hello-world.c` is ridiculously easy for learning programming compared to using visual studio.

I was confused again -- why is all of those free of charge? I tried to learn more until the Debian Social Contract, DFSG and the stuff wrote by FSF (mostly Stallman) completely blown up my mind. With the source code within my reach, I'm able to really tame my computer. The day I realized that is the day when I added "becoming a DD" to my dream list.

That was a big thing, a big challenge for me to learn
it, and a big reward to be part of it.  At that time I never imagined
that the next big thing was diversity, inclusion and justice, but being
part of Debian unexpectedly connected me to it.  Now when I look back I
do not worry being born too late.  I would like to say to young people
that joining a thriving community is the best way to journey beyond
one's imagination.

Ideally yes, but people's mind is also affected by economy.

In developing countries where most people are still struggling to survive and feeding a family, unpaid volunteer work is respected in most of the time, but seldom well-understood. One needs to build up a very strong motivation before taking actions to override the barrier of societal bias.

That's partly the one of the reasons why the number of Chinese DDs is so scarce while China has a very large number of population. And in contrast, most DDs are from developed countries.

I like the interpretations on how human society works from the book "Sapiens: a brief history of humankind". Basically, what connects people all over the world, forming this community is a commonly believed simple story -- we want to build a free and universal operating system. (I'm sad to see this sentence being removed from debian.org) The common belief is the ground on which we build trust and start collaboration.

So, essentially, renewing the community is to spread the simply story, to the young people who seek for something that Debian/FOSS can provide. I don't know how to achieve it. But I do know that my story is completely unreproducible.

Of course, we need to show how we are thriving.  On my wishlist for
2024, there is of course AI.

In case people interested in this topic does not know we have a dedicated ML for that:

https://lists.debian.org/debian-ai/

The key word GPT successfully toggled my "write-a-long-response" button. Here we go.

 Can we have a DebGPT that will allow us to
interact with our mailing list archives using natural language?

I've ever tried to ask ChatGPT about Debian related questions. While ChatGPT is very good at general linux questions, it turns that its training data does not contain much about Debian-specific knowledge. The quality of training data really matters for LLM's performance, especially the amount of book-quality data. The Debian ML is too noisy compared to wikipedia dump and books.

While the training set of the proprietary ChatGPT is a secret, you can have a peek in the Pile dataset frequently used by many "open-source" LLMs. BTW, the formal definition of "open-source AI" is still a work-in-progress by OSI. I'll get back to Debian when OSI makes the draft public for comments at some time in 2024.

https://en.wikipedia.org/wiki/The_Pile_(dataset)

The dataset contains "Ubuntu Freenode IRC" logs, but not any dump from Debian servers.

Thus, technically, in order to build the DebGPT, there are two straightforward solutions:

(1) adopt an "open-source" LLM that supports very large context length. And directly fed the dump of debian knowledge into its context. This is known as the "In-Context Learning" capability of LLMs. It enabled a wide range of prompt engineering methods without any further training of the model. In case you are interested in this, you can read OpenAI's InstructGPT paper as a start.

(2) fine-tune an "open-source" LLM on the debian knowledge dump with LoRA. This will greatly reduce the requirement of training hardware. According to the LoRA paper, the training or full fine-tuning of GPT-3 (175B) requires 1.2TB GPU memory in total (while the best consumer grade GPU provides merely 24GB). LoRA reduced it to 350GB without loosing model performance (in terms of generation quality). That said, an 7B parameter LLM is much easier with cheaper to deal with with LoRA.

All the prevelant "large" language models are Decoder-only Transformers. And the training objective is simply next word prediction. So the debian mailing lists can be organized into tree structure containing mail nodes, and the training objective is to predict the next mail node, following the next-word-prediction paradigm.

How can one download the Debian public mailing list dumps?

 Can
that DebGPT produce code that we know derives from a training set that
only includes works for which peole really consented that their
copyrights and licenses will be dissolved?

Tough cutting-edge research issue. But first, let's wait and see the result for the lawsuite of New York Times against OpenAI+Microsoft:

https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

The result of this lawsuite might be a milestone in this area. It will definitely impact future lawsuites on LLMs + copyrighted code usage.

Can it be the single entry
point for our whole infrastructure?  I wish I could say "DebGPT, please
accept all these loongarch64 patches and upload the packages now", or
"DebGPT, update debian/copyright now and show me the diff".

The training data will be the Salsa dump. What you described is actually doable.

For each git commit, the first part of the prompt is the files before modification. The user instruction is the git commit message. The expected prediction result is the git diff.

Debian Deep Learning team (debi...@l.d.o) have some AMD GPUs in the unofficial infrastructures. It is not far before we can really do something. AMD GPUs with ROCm (open-source) allows us to train neural networks in decent speed without the proprietary CUDA. The team is still working on packaging the missing dependencies for the ROCm variant of PyTorch. The CPU variant (python3-torch) and the CUDA variant (python3-torch-cuda) is already in unstable. The python3-torch-rocm is in my todo list.

PyTorch is already the most widely used training tool. Please forget tensorflow in this aspect. JAX replaces tensorflow but the user number of PyTorch is still overwhelming.

I am not
able to develop DebGPT and confess I am not investing my time in
learning to do it.  But can we attract the people who want to tinker in
this direction?

Debian funds should be able to cover the hardware requirement and training expenses even if they are slightly expensive. The more expensive thing is the time of domain experts. I can train such a model but clearly I do not have bandwidth for that.

Please help the Debian community to spread its common belief to more domain experts.

 Not because we are the best AI team, but because we are
one of the hearts of software freedom, and that freedom is deeply
connected to everybodys futures.

The academia is working hard on making large generative models (not limited to text generation) easier to customize. I'm optimistic about the future.

Well, it is too late for invoking Santa Claus, but this said, best
wishes for 2024 !

Best wishes for 2024!

Mo Zhou

unread,

Dec 30, 2023, 11:30:04 PM12/30/23

to

On 12/30/23 21:40, Mo Zhou wrote:

>> I am not
>> able to develop DebGPT and confess I am not investing my time in
>> learning to do it. But can we attract the people who want to tinker in
>> this direction?
>
> Debian funds should be able to cover the hardware requirement and
> training expenses even if they are slightly expensive. The more
> expensive thing is the time of domain experts. I can train such a
> model but clearly I do not have bandwidth for that.
>

No. I changed my mind.

I can actually quickly wrap some debian-specific prompts with an
existing chatting LLM. This is easy and does not need expensive hardware
(although it may still require 1~2 GPUs with 24GB memory for inference),
nor any training procedure.

The project repo is created here
https://salsa.debian.org/deeplearning-team/debgpt

I have enabled issues. And maybe people interested in this can redirect
the detailed discussions to the repo issues.

I'm sure it is already possible to let LLM read the long policy
document, or debhelper man pages for us, and provide some suggestions or
patches. The things I'm uncertain is (1) how well a smaller LLM, like 7B
or 13B ones can do compared to proprietary LLMs in this case; (2) how
well a smaller LLM can be when it is quantized to int8 or even int4 for
laptops.

Oh, BTW, the dependencies needed by the project are not complete in
debian archive.

Jeremy Stanley

unread,

Dec 31, 2023, 1:10:04 PM12/31/23

to

On 2023-12-30 21:40:03 -0500 (-0500), Mo Zhou wrote:
[...]

> How can one download the Debian public mailing list dumps?

[...]

I think you'd have to scrape the HTML (MHonArc) archives. The last
update I remember is that the listmasters are intentionally not
providing raw archives, though perhaps that 15 year old decision
could be revisited if there's new compelling reasons:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=161440#39

Alternatively, I suppose a DD with access to the raw archive data on
the server could (perhaps after some discussion with the
listmasters) perform LLM training on those, but would probably need
to sanitize it and weed out the spam when doing so.
--
Jeremy Stanley

signature.asc

Rafael Laboissière

unread,

Jan 3, 2024, 2:20:04 PMJan 3

to

* Rafael Laboissière <raf...@debian.org> [2023-12-27 21:12]:

> This is a very simple-minded analysis about the Debian community (lack
> of) renewal and project obsolescence:
>
> https://salsa.debian.org/rafael/debian-contrib-years
>
> containing interesting comments made by Sébastien Villemot.

First of all, I wish you all a happy 2024.

I have updated my repository at salsa.d.o (URL above), integrating some
elements discussed in the present thread, in particular the analysis
proposed by Mo Zhou and the comments made by Steffen Möller.

Best,

Rafael Laboissière

Gunnar Wolf

unread,

Jan 4, 2024, 1:00:04 PMJan 4

to

Mo Zhou dijo [Thu, Dec 28, 2023 at 02:02:18PM -0500]:

> > Thanks for the code and the figure. Indeed, the trend is confirmed by
> > fitting a linear model count ~ year to the new members list. The
> > coefficient is -1.39 member/year, which is significantly different from
> > zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from
> > year 2001, that could be interpreted as an outlier, the trend is still
> > siginificant, with a drop of 0.98 member/year (F[1,21] = 8.48, p <
> > 0.01).
>
> I thought about to use some models for population statistics, so we can get
> the data about DD birth rate and DD retire/leave rate, as well as a
> prediction. But since the descendants of DDs are not naturally new DDs, the
> typical population models are not likely going to work well. The birth of DD
> is more likely mutation, sort of.

Five years ago, I got a paper published where we analized and made
some forecasts on the curated Web-of-Trust keyrings in Debian:

https://jisajournal.springeropen.com/articles/10.1186/s13174-018-0082-7

I did the first part of the article, but the part that better fits
what you are describing was done by my coauthor, Víctor González (who
understands about statistics way better than me).

Anyway, it does not also answer to the exact question you are
presenting --- we there studied the lifetime of keys, and left for
later analysis a way to link said keys into people, in order to map
the life trajectory of an individual in the project. But it might
still be interesting or useful for your analysis.

> Anyway, we do not need sophisticated math models to draw the conclusion that
> Debian is an aging community. And yet, we don't seem to have a good way to
> reshape the curve using Debian's funds. -- this is one of the key problems
> behind the data.

And I think this is hardly an unexpected outcome. There are many
social and technological patterns that define us as a 1990s project
that continues to liveand thrive, but not necessarily with the best /
most up-to-date tooling.

Andreas Tille

unread,

Jan 26, 2024, 3:40:04 AMJan 26

to

Hi Raphael,

thanks a lot for your analysis and sorry for my late reply.

Am Wed, Jan 03, 2024 at 08:01:16PM +0100 schrieb Rafael Laboissière:
> > https://salsa.debian.org/rafael/debian-contrib-years

>
> First of all, I wish you all a happy 2024.

+1

> I have updated my repository at salsa.d.o (URL above), integrating some
> elements discussed in the present thread, in particular the analysis
> proposed by Mo Zhou and the comments made by Steffen Möller.

I'd like to share some ideas. Steffen blamed the advent of homebrew and
conda as one factor which I think is true to some extend in some fields.
But I also think that we are a victim of our own success to be the
distribution growing the most derivatives. Ubuntu and Mint might be the
most famous ones and if I'm not misleaded the number of Ubuntu users is
at least one order of magnitude higher than from (pure) Debian (if you
don't count Ubuntu users as Debian users which they actually are
indirectly). I do not want to discuss whether this is good or bad for
Debian (which would be a long list of pros and cons) but contributors
are recruited from users and we simply do not see the number of
derivatives contributors in our stats. Maybe we simply see patches
arriving from some derivative which are simply collected by a single
contributor (hopefully they really report back issues - my experience
with bug reports+patches from Ubuntu are pretty good ... but I see only
those isses that are reported since I do not check the bug trackers
there whether there are other known problems hidden from our sight).

You might know that I'm very focussed on Blends. The idea way born when
I noticed lots of dervatives dealing with the same topic as I (Debian
Med with a focus on biology and medicine). While lots of people
understood Blends wrongly as a way to create a derivative its the
contrary: Don't derive from Debian but rather create a Blend to find a
solution *inside* Debian.

Over more than 20 years of Blends effort I learned that it is pretty
hard to make this concept popular. In Debian Med we finally managed to
attract some of the derivers in this field to integrate their stuff into
Debian and by doing so they also became Debian contributors. But
meanwhile those contributors drifted away for different reasons (changed
jobs etc.) So at least their work was kept in Debian instead of beeing
lost in an orphaned derivative.

After all these years I need to confess, that my original plan about
Blends somehow failed. I assumed that every Debian Developer /
Contributor is inside the non-Debian world in some community. If this
Contributor would work hard to make Debian fit for this community new
contributors would arrise from this community inside Debian to make
Debian fit even better for the own usage. Since years I include the
proof that this *can* work in my slides of Debian Med related talks[1].
So some outsiders project - biologists and people working in medicine
are by far a lower percentage of overall Debian users than the >1% of
Debian Developers we have in this field - can attract constantly
contributors with a growing tendency in contrary to your graph! This is
a good sign that my original assumption, a Blend can attract
contributors, might be correct. To come back to the "reasons for
decreasing number of contributors": We have to less Blends done right.

My gut feeling is that this is somehow connected to the fact that
developers usually are overworked even with technical work and do less
to reach out to some community which is considered "additional work
squashing some time limit dedicated to Debian". I confirm that lots of
my outreachy acticitiy (GSoC, Outreachy, MoM, in person meetings
(sprints), other ways to contact the community) did not really led to
long term contributors. However, if I would not have started to reach
out the Debian Med project would never ever have reached its current
status of nearly 1000 packages in main[2] with a relatively low number
of RC bugs and by probably maintaining definitely more than 500 packages
in other teams (Debian Science, Debian Python Team, R Pkg tem, etc.)

I also did some investigation in team metrics[3] to see how teams
(originally targetting at Blends teams) are performing. I admit I'm
really proud about beeing "beaten" last year in the number of bugs
squashed[4]. The interesting thing in this bug squashing graph is not
only the fact that it is in contrast to your graph since the number of
contributors is increasing over years. Its also the not visible fact
that the top 3 bug squashers are not actually experts in our field.
Étienne and Nilesh just joined since its fun to work in this team.

To summarise this long mail: Another item in your list of reasons is,
that we should care better for our contributors in strong teams (either
topic related Blends or kind and inviting language teams).

Kind regards
Andreas.

[1] https://people.debian.org/~tille/talks/20230910_debconf_med-team_talk/teams_handout.pdf
-> slide 6/31 "Debian Med has attracted one developer per year"
Data are based on this survey
https://wiki.debian.org/DebianMed/Developers

[2] https://qa.debian.org/developer.php?email=debian-med...@lists.alioth.debian.org
[3] http://blends.debian.net/liststats/
[4] http://blends.debian.net/liststats/bugs_debian-med.png

--
http://fam-tille.de