Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#1060201: qa.debian.org: [udd] carnivore_emails is lacking lots of entries

3 views
Skip to first unread message

Andreas Tille

unread,
Jan 7, 2024, 7:50:04 AM1/7/24
to
Package: qa.debian.org
Severity: normal

Hi,

I tried to analyse closed bugs using done_email via carnivore_emails but realised
that this table is lacking lots of entries where I could easily add several from
my own memory:

SELECT done_email, COUNT(*) FROM (
SELECT done_email FROM archived_bugs WHERE id IN (SELECT id FROM (SELECT ab.id, ce.id AS ce_id
FROM archived_bugs ab
LEFT JOIN carnivore_emails ce ON ce.email = ab.done_email
) noid WHERE ce_id IS NULL ) AND done_email NOT IN ('ftpm...@ftp-master.debian.org','nor...@salsa.debian.org','unknown')
) miss GROUP BY done_email
ORDER BY count DESC
;

done_email | count
--------------------------------------------------------+-------
got...@sapo.pt | 5221
ba...@quantz.debian.org | 2665
do...@cs.tu-berlin.de | 2555
kit...@northeye.org | 2371
m...@Linux.IT | 2056
her...@gondor.apana.org.au | 1900
da...@merkel.debian.org | 1788
daniel....@progress-technologies.net | 1393
m...@stro.at | 1327
bu...@fs.tum.de | 1278
debia...@adam-barratt.org.uk | 1155
cch...@cheney.cx | 1031
sram...@respighi.debian.org | 992
...
zweis...@gmx.de | 1
(9075 rows)

I wonder how the carnivore_* tables are filled and whether you want me
to draft some INSERT statements filling up the most relevant emails
where I would volunteer to sort the according IDs.

Kind regards
Andreas.

Andreas Tille

unread,
Jan 7, 2024, 8:30:04 AM1/7/24
to
Control: usertag -1 udd
BTW, its probably pretty easy to resolve >900 of these missing e-mails:

CREATE TEMPORARY TABLE missing_in_carnivore_emails AS
SELECT done_email, COUNT(*) FROM (
SELECT done_email FROM archived_bugs WHERE id IN (SELECT id FROM (SELECT ab.id, ce.id AS ce_id
FROM archived_bugs ab
LEFT JOIN carnivore_emails ce ON ce.email = ab.done_email
) noid WHERE ce_id IS NULL ) AND done_email NOT IN ('ftpm...@ftp-master.debian.org','nor...@salsa.debian.org','unknown')
) miss GROUP BY done_email
ORDER BY count DESC
;

SELECT DISTINCT done_name, done_email, cn.id FROM
(SELECT BTRIM(done_name, '"') AS done_name, done_email FROM archived_bugs) ab
LEFT JOIN carnivore_names cn ON cn.name = ab.done_name
WHERE done_email in (SELECT done_email FROM missing_in_carnivore_emails WHERE count > 10)
AND done_name IS NOT NULL AND done_name != ''
AND id IS NOT null
;

done_name | done_email | id
---------------------------------+-------------------------------------------------+------
Camm Maguire | ca...@enhanced.com | 6158
Ross Vandegrift | ro...@kallisti.us | 734
Michael Ablassmeier | a...@grinser.de | 2751
Neil McGovern | mau...@halon.org.uk | 3708
Torsten Landschoff | tor...@pclab.ifg.uni-kiel.de | 6320
Agney Lopes Roth Ferraz | ag...@users.sourceforge.net | 4000
Galen Hazelwood | gal...@micron.net | 1241
Anand Kumria | wild...@progsoc.org | 4175
Adam Rogoyski | rogo...@cs.utexas.edu | 1102
Christophe Barbe | christop...@ufies.org | 2054
Yann Dirson | ydi...@fr.alcove.com | 5804
Arjan Oosting | arjano...@home.nl | 5366
Julian Gilbey | J.D.G...@qmw.ac.uk | 3875
Norman Jordan | njo...@shaw.ca | 3513
Michael Piefel | pie...@informatik.hu-berlin.de | 1111
Frederic Lepied | Lep...@debian.org | 2460
...
Neil Williams | li...@codehelp.co.uk | 1552
Christopher Martin | chrs...@freeshell.org | 2754
Andrew Lenharth | a...@cs.washington.edu | 3085
(922 rows)


This statement could be easily turned into injects and would be a first approach to enhance
the carnivore_emails table with more ids.

If you give some green light I could create such a statement and maybe more enhancements
by looking into more tables.

Kind regards
Andreas.

--
http://fam-tille.de

Lucas Nussbaum

unread,
Jan 7, 2024, 12:10:03 PM1/7/24
to
Hi,

On 07/01/24 at 14:21 +0100, Andreas Tille wrote:
> > I tried to analyse closed bugs using done_email via carnivore_emails but realised
> > that this table is lacking lots of entries where I could easily add several from
> > my own memory:
>
> [...]
>
> > I wonder how the carnivore_* tables are filled and whether you want me
> > to draft some INSERT statements filling up the most relevant emails
> > where I would volunteer to sort the according IDs.

See https://salsa.debian.org/qa/udd/-/blob/master/udd/carnivore_gatherer.py.

carnivore is a service managed by the QA team (or the MIA team?). UDD
just imports what is being produced in quantz:/org/qa.debian.org/carnivore
(see quantz:/org/qa.debian.org/carnivore/report in particular)

> This statement could be easily turned into injects and would be a first approach to enhance
> the carnivore_emails table with more ids.
>
> If you give some green light I could create such a statement and maybe more enhancements
> by looking into more tables.

Please don't: the correct way to fix that is to improve the source data
(in quantz:/org/qa.debian.org/carnivore)

Lucas

Andreas Tille

unread,
Jan 7, 2024, 12:10:03 PM1/7/24
to
Am Sun, Jan 07, 2024 at 05:58:36PM +0100 schrieb Lucas Nussbaum:
>
> See https://salsa.debian.org/qa/udd/-/blob/master/udd/carnivore_gatherer.py.

Found this meanwhile.

> > This statement could be easily turned into injects and would be a first approach to enhance
> > the carnivore_emails table with more ids.
> >
> > If you give some green light I could create such a statement and maybe more enhancements
> > by looking into more tables.
>
> Please don't: the correct way to fix that is to improve the source data
> (in quantz:/org/qa.debian.org/carnivore)

I have not found this code in Salsa. Is it true that the Python2 code I
can find at

quantz:/org/qa.debian.org/carnivore

is the code source of carnivore?
0 new messages