I hate to say this, but the migration has been stopped again. It now seems
that GitHub does not rewrite issue references properly during the
transfer (sic!). Let me show exactly what the problem is:
Consider two issues: A and B, where A will reference B and B will
reference A. In our case this is used to model various relations like
"duplicates / is duplicated by", "blocks / is blocked by", "depends on
/ required by". So, in the bz archive, A will reference B as #B and B
will reference A as #A.
Now, let's migrate A. The references are rewritten: in A, #B becomes
bz-archive#B, and in B, #A becomes llvm-project#A. However, after B is
migrated, only one reference is rewritten: llvm-project#A in B becomes
#A. The bz-archive#B link in issue A is not rewritten, so it is left
as a dangling reference.
For us this means that we will lose all links to duplicate issues and
(more importantly!) to linked issues in the meta bugs.
I informed GitHub about the bug and I am waiting for their answer.
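To make the failure mode concrete, here is a minimal Python sketch that simulates the transfer as described above. It is a toy model, not GitHub's code: the names `ARCHIVE`, `LIVE`, `transfer`, `render` and `dangling` are made up for illustration, and the rule that only issues still in the archive get their references updated is an assumption inferred from the behaviour described in this message.

```python
ARCHIVE = "bz-archive"    # shorthand used in this message for the archive repo
LIVE = "llvm-project"     # shorthand for the live repo


def transfer(issue, location, refs):
    """Move `issue` to the live repo.

    location: issue -> repo the issue currently lives in
    refs:     holder issue -> {target issue: repo the reference points at}

    Assumed rule (inferred, not confirmed): only holders that are still in
    the archive get their references to the moved issue updated.
    """
    location[issue] = LIVE
    for holder, targets in refs.items():
        if location[holder] == ARCHIVE and issue in targets:
            targets[issue] = LIVE


def render(holder, location, refs):
    """Show the references of `holder` the way they would read in its body."""
    out = []
    for target, repo in sorted(refs[holder].items()):
        out.append(f"#{target}" if repo == location[holder] else f"{repo}#{target}")
    return out


def dangling(location, refs):
    """List (holder, target) pairs whose reference points at the wrong repo."""
    return [
        (holder, target)
        for holder, targets in sorted(refs.items())
        for target, repo in sorted(targets.items())
        if repo != location[target]
    ]


if __name__ == "__main__":
    location = {"A": ARCHIVE, "B": ARCHIVE}
    refs = {"A": {"B": ARCHIVE}, "B": {"A": ARCHIVE}}
    for issue in ("A", "B"):
        transfer(issue, location, refs)
        print(issue, "moved:", render("A", location, refs), render("B", location, refs))
    print("dangling:", dangling(location, refs))
```

In this model the order of migration decides which side of the pair ends up dangling: whichever issue moves first keeps a reference into the archive that nothing updates later.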
--
With best regards, Anton Korobeynikov,
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
cfe-dev mailing list
cfe...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> This may be a dumb question, but could this just be an issue of forward references (i.e. issue A references B, but B has not been transferred yet so doesn't exist)?
> If so, could the transfer be split into a two step process:
It cannot, sorry
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
Maybe something else, but IMO the meta-bugs case is quite severe.
Essentially we will lose all links from the meta-bug to the
"downstream" bugs, or vice versa, or a mixture, depending on the relative
order in which the meta-bug and its dependees are migrated. I will be
meeting with GitHub folks on Monday to see what the solutions are.
> Time to move forward, ideally we should have done these kinds of migrations during the planning phase, but we didn’t and that’s a lesson learnt, but let’s finish up the migration as is. Move on in my view it looks good enough to use and you’ve done a good job but let’s not drag this out for 1% of bugs we might not look at much!
Well. The thing is: we checked and found lots of issues and introduced
many workarounds. I must admit that checking cyclic references after
the migration was not on my checklist, and I spotted this issue by
accident. Ordinary references (both to source code and to other issues)
are migrated properly, and this was checked. There was an assumption
that basic GitHub functionality would simply work. This was a mistake.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
Alternatively, is it really impossible to, instead of moving issues,
ask github to just move the releases into that new repository and then
swap those two repositories (forks, stars, clones, etc.)?
I think all these problems are only because of the remapping, which
will be problematic regardless, because the in-source mentions aren't
getting rewritten, so there *will* be confusion regardless of whether
github succeeds in moving issues.
Roman
On Sat, Dec 4, 2021 at 11:56 AM Anton Korobeynikov via Openmp-dev
<openm...@lists.llvm.org> wrote:
>
> > To clarify, will those bz-archive#B references just not look as nice, or do they not work at all? It was my understanding that bz-archive#B will redirect to llvm-project#B2, in which case this doesn't seem overly problematic.
> They will be just text. No redirect.
>
> --
> With best regards, Anton Korobeynikov
> Department of Statistical Modelling, Saint Petersburg State University
> _______________________________________________
> Openmp-dev mailing list
> Openm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
> Alternatively, is it really impossible to, instead of moving issues,
> ask github to just move the releases into that new repository and then
> swap those two repositories (forks, stars, clones, etc.)?
This is what we asked for as well! The answer was "there is no way".
Maybe there is a way, but it would require significant
engineering effort on their side (e.g. additional development), so
our request was refused.
> I think all these problems are only because of the remapping, which
> will be problematic regardless, because the in-source mentions aren't
> getting rewritten, so there *will* be confusion regardless of whether
> github succeeds in moving issues.
Right. Do you have an idea how we can move forward?
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
The only casualties will be unimportant things:
github stars, github forks, github release dates;
but if github can't be bothered to help with those,
it will serve as a forever reminder to the users that github is unreliable,
and false dependence should not be created on replaceable unreliable things.
* Surely it can be automated.
Roman
Surely, if the community will re-decide that these are unimportant
things we can push the existing code into a blank archive fairly
quickly.
--
With best regards, Anton Korobeynikov
> However, this was previously discussed and it
> was decided that release dates do matter as well as forks. The latter
> is even more important for downstream users.
>
> Surely, if the community will re-decide that these are unimportant
> things we can push the existing code into a blank archive fairly
> quickly.
> --
> With best regards, Anton Korobeynikov
Roman
This would seem like a sensible approach to me.
I have worked with many different graph databases, and it is quite
normal to load the nodes or entities first, and then add the
relationships or links second.
Maybe one can add all the bugs first, without any relationships/links.
Then build up a map of github IDs vs bugzilla IDs, and then use that
map to then add all the relationships afterwards using the learnt
bugzilla IDs.
Or, alternatively, use your current method, and then scan over
everything at the end, to add in any relationships that got missed
using the above approach.
Kind Regards
James
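The "nodes first, links second" idea in the message above can be sketched in a few lines of Python. This is purely an in-memory illustration: it does not call GitHub, the function name `two_pass_import` and the record layout are hypothetical, and it says nothing about rate limits, notifications or timestamps on a real tracker.

```python
from itertools import count


def two_pass_import(bugs, first_free_id=1):
    """Import `bugs` in two passes and return (bz_to_gh, issues).

    bugs: {bugzilla_id: {"title": str, "links": [bugzilla_id, ...]}}
    first_free_id: the first issue number the target tracker would hand out
                   (an assumption of this sketch).
    """
    next_id = count(first_free_id)
    bz_to_gh = {}
    issues = {}

    # Pass 1: create every issue without links and record the id it received.
    for bz_id in sorted(bugs):
        gh_id = next(next_id)
        bz_to_gh[bz_id] = gh_id
        issues[gh_id] = {"title": bugs[bz_id]["title"], "links": []}

    # Pass 2: every target now has a known id, so links can be resolved,
    # including circular ones. Links to bugs that were never imported are dropped.
    for bz_id, bug in bugs.items():
        issues[bz_to_gh[bz_id]]["links"] = [
            bz_to_gh[target] for target in bug["links"] if target in bz_to_gh
        ]
    return bz_to_gh, issues


if __name__ == "__main__":
    bugs = {
        10: {"title": "crash in parser", "links": [12]},
        12: {"title": "duplicate crash", "links": [10]},
    }
    mapping, issues = two_pass_import(bugs, first_free_id=100)
    print(mapping)
    print(issues)
```

The point of the second pass is that no link is written before its target has an id, which is exactly what a one-issue-at-a-time transfer cannot guarantee for circular references.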
> Maybe one can add all the bugs first, without any relationships/links.
> Then build up a map of github IDs vs bugzilla IDs, and then use that
> map to then add all the relationships afterwards using the learnt
> bugzilla IDs.
> Or, alternatively, use your current method, and then scan over
> everything at the end, to add in any relationships that got missed
> using the above approach.
There are a few things that you're missing here, unfortunately:
0. Everything is API-rate limited.
1. Every change triggers notifications.
2. Every change updates the "last modified" timestamp.
And no, GitHub is not a graph database that you can use the way you imagine.
Though, patches are always welcome ;)
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
> Respectfully, yet frustrated with the never-ending email thread,
I understand your frustration and please rest assured that my own
frustration is certainly no less than yours. I'm also very exhausted
at the moment, as things are beyond my control. The constant
pushing from this and similar emails does not help in resolving the
situation. I have to note that your accusations in the "we'll do
it live" section are inaccurate in many respects: if you have
not seen the outcome of the test imports, it does not mean that there
were none. I would go further: it means that they were successful,
as nothing triggered excessive notifications (we made such a mistake
once – you can even find the reports of this in the MLs). For your
information: the last "dry run import" that gated the migration was
the 14th full (52k issues) attempt. In all previous runs issues were found
and either a workaround was prepared or they were reported to GitHub.
Now let's move from emotions to the real things. I do
appreciate your willingness to help. Please see my notes
below.
> I'll test it out this week on a blank repo, with the goal of mirroring a 100-bug subset of the LLVM Bugzilla publicly visible in https://github.com/Quuxplusone/LLVMBugzillaTest/ by EOW.
First of all, a proof of concept should certainly be 10k issues on a
non-blank repo. The repo should already have closed issues / pull requests
in order to represent the real llvm-project repo.
Now down to details:
0. I would suggest not using the GitHub API. YMMV, but in our
experience the API is rate limited, and many things are outside your
control, including:
- ids
- timestamps
- notifications
1. The real migration starts from a local gitlab instance, into which you
import all bugzilla issues. You can certainly skip this step in your
own experiments and proceed directly to step 2, but it allows
you to check the outcome of the import. The script we used can be
found at https://github.com/llvm/bugzilla2gitlab/tree/llvm
2. Then you need to prepare the dump that can be consumed by the GitHub
Enterprise Migration API:
https://docs.github.com/en/rest/reference/migrations
We are using the gitlab-to-github scripts provided by GitHub. I'm not sure
I can share them as they are not public – I will ask GH support engineers
on Monday and get back to you.
3. After the dump is prepared you need to upload it via the GitHub
Enterprise Migration API. Note that import is only possible into an empty
repo (the repo is essentially created by the import). If the import fails
you'd need to ask GitHub engineers whether the error is real or whether it
can be ignored. If the error is real, then you'd likely need to restart from
scratch – it is possible to resume, but practice shows that this might
create duplicate comments.
4. After the import has finished, check the results: the number of objects
(issues, comments, attachments) that were imported. If any
objects failed to import, then you need to figure out which ones and
what to do. Your options are: ignore them or restart the import. Here is
the checklist I'm using for the content:
https://docs.google.com/document/d/1G6DZ6AxzSaOlrtTxoxtqYKnD4Myv40QfKK4wj54y8ms/edit
5. At this point one should have something similar to
https://github.com/llvm/llvm-bugzilla-archive
6. In order to transfer issues from the archive to the live repo there
are two options:
- Use the rate-limited GitHub API
- Ask GitHub folks
The first variant triggers notifications to everyone who is mentioned in,
assigned to, or has commented on the issue. There is no way to silence
these notifications.
In our case we are relying on GitHub support engineers, who do
this migration step for us. There is no API, no script, nothing that
is within our control. We did several test migrations from the dry-run
repo to another repo (and this is how we found all the bugs wrt issue
transfer in the past). As I already said, circular reference
rewriting was not included in my original checklist: I expected
this feature to "just work", and the problem was only spotted later.
Hope this helps. Should you have more questions, I will certainly be
happy to help you. I'm interested in finishing this 2+ year project
more than anyone else.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
> 1) Can't we calculate in advance the eventual ID of each issue. can't we determine that bugzilla PR12345 = GH12345 + (some offset caused by previous issues in GH)? - (assuming we always import in the same order, oldest first 1 at a time)
The thing is... it's not a simple "+" here. Things are a bit more
complex, as we have gaps in the bugzilla ids as well: some of the
issues were removed due to spam or GDPR requests. So we'd need to
track these, but this is doable, provided that the id mapping
done by GitHub is predictable. I... cannot be 100% sure, as the
final transfer is done not by me but by GitHub support engineers
(in order not to trigger notifications on all 52k+ issues).
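As an illustration of why a plain offset is not enough, here is a small Python sketch that predicts target ids in the presence of gaps. It assumes that issues are transferred strictly oldest first and that each one receives the next free number in the target repo; neither assumption is confirmed anywhere in this thread. The function name and the example numbers are hypothetical.

```python
def predict_github_ids(bugzilla_ids, last_used_github_id):
    """Map each surviving Bugzilla id to the id it would get after transfer.

    Assumptions of this sketch (not confirmed behaviour):
      * issues are transferred in increasing Bugzilla id order, and
      * every transfer takes the next unused number in the target repo.
    """
    mapping = {}
    next_id = last_used_github_id + 1
    for bz_id in sorted(bugzilla_ids):
        mapping[bz_id] = next_id
        next_id += 1
    return mapping


if __name__ == "__main__":
    # Ids 3, 5 and 6 are missing, e.g. removed as spam.
    surviving = [1, 2, 4, 7]
    predicted = predict_github_ids(surviving, last_used_github_id=1000)
    for bz_id in surviving:
        naive = bz_id + 1000  # the simple "+ offset" guess
        print(bz_id, "naive:", naive, "predicted:", predicted[bz_id])
```

Every gap shifts all later ids by one relative to the naive offset, so the mapping has to be tracked per issue rather than computed from a single constant.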
> Is that something that might be worth a try? or do you do this already and GH is messing it up?
The latter, essentially. The references were built properly, but
they point to the original archive repo. It is assumed that GH will rewrite
them during the transfer. This is standard functionality and I was
assured that it works properly, that it was tested, deployed, and has worked
for many years, etc. etc. etc. Now we are caught halfway, as we have already
migrated ~13k issues to the main LLVM repo. As I said, I spotted the
problem by chance: checking the rewriting of circular links was not
on my checklist. When I checked link rewriting during the test
migration I essentially checked "one way" and everything was ok (one
needs to migrate both sides of a reference in order to see the
problem, and apparently there were no such pairs among the 100 test issues
we migrated).
I'm meeting with GitHub folks today (morning Pacific Time) to discuss
the options. One option is to proceed with the transfer and rewrite
the stale links afterwards. But I'm wondering whether there is a way to fix
the issue on the GH side and what the ETA is.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
If I understand the GitHub process correctly, PR / issue numbers are
monotonic integers that cannot be rolled back or modified because they
are used in cross-referencing and are externally visible. The mapping
from repository to name, however, is mutable (with big warnings when you
press the button because it breaks the aforementioned cross-referencing
and external links).
It should be possible to:
1. Create a new empty private GitHub project.
2. Import all bugs, with the same bug numbers.
3. Make the project public.
4. Pull the entire contents of the current repo to the new project.
5. Delete the llvm-project project (or rename it to llvm-project.old
or something).
6. Rename the new project to llvm-project
Steps 5 and 6 can't be atomic, so this will break everything that tries
to access the repo between steps 5 and 6, but that should be about 30
seconds of downtime. The end result should be a llvm/llvm-project
GitHub project containing the current git repo and the issues from
Bugzilla but not any of the existing issues / PRs on that repo.
David
> Steps 5 and 6 can't be atomic, so this will break everything that tries
> to access the repo between steps 5 and 6, but that should be about 30
> seconds of downtime. The end result should be a llvm/llvm-project
> GitHub project containing the current git repo and the issues from
> Bugzilla but not any of the existing issues / PRs on that repo.
The problem is that a repo is not only code + issues. There are also:
- Releases
- Forks
- Stars / Watches
- All kinds of tokens and integrations
- Maybe something else which I forgot right now
So, we'd also need to re-create the releases, but we would lose all
metadata (e.g. there is no way to recreate releases with the correct
release dates), and there is no way to restore forks, so this would
affect all downstream users – they would essentially need to re-create
all their forks and move their private changes there. Currently there
are ~4.5k forks. I do not have information on which forks are active.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
> I also suggested that, but the thing that apparently throws the wrench
> at that is that (as per @aKor's previous mail), the bugzilla issues id's
> aren't consecutive, there are gaps, so just importing into a clean repo
> (even without having to worry about moving issues into a new repo)
> still won't result in 1:1 match to bugzilla issue id's.
Importing into an empty repo preserves the ids (as in our bugzilla
archive). However, new issues / pull requests will re-use spare ids, so
we'll need to pad for this as well.
The real issue is other parts of the repo which are not code / issues.
> It would have been nice to know this beforehand. This means that missing issue id's would need to
> be padded with empty issues.
See above. It's not a problem
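For what it's worth, finding the ids that would need padding is a one-liner. The Python sketch below is only an illustration: `ids_to_pad` is a hypothetical helper, and it assumes ids start at 1 and that every id missing below the highest imported one needs a placeholder.

```python
def ids_to_pad(bugzilla_ids):
    """Return the ids below the highest imported id that have no issue.

    Assumes ids start at 1; these are the numbers a placeholder issue
    would have to occupy so that new issues cannot re-use them.
    """
    present = set(bugzilla_ids)
    if not present:
        return []
    return [i for i in range(1, max(present) + 1) if i not in present]


if __name__ == "__main__":
    print(ids_to_pad([1, 2, 4, 7]))
```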
PS: I do have a backup copy of bugzilla archive repo on GitHub. So
importing + renaming is essentially a matter of "git push" +
recreation of all releases. But for the releases I'd certainly
appreciate Tom's opinion.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
I would be in favor of proceeding and trying to rewrite the links later.
As for the metabugs, I was planning to use Milestones instead of metabugs
once the migration was complete, and I would be fine with converting all
the old metabugs to Milestones, so I don't consider broken metabugs to
be a blocker.
-Tom
> 1) we already migrated in 1300 issues
Way more – 13k.
> 3) its not possible to remove the existing issues and start again
Right
> 4) so if any of the links are wrong in the 1300 then we can't do anything with them other than correct by hand? (is that correct?)
Likely. GitHub engineers are still investigating what the options are here.
> 5) GitHub say they don't recommend post-migration writing? Do they mean they don't recommend using an api to do that? Or doing it by hand?
Both, actually. There are multiple concerns, including the notifications
that will be sent and the last changed time being updated. Also, before
rewriting anything ourselves we will need to build a map from bz id to
github id, so that we know the target issue id.
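Once such a map exists, the text rewriting itself is mechanical. The following Python sketch shows one way it might look. It is only an assumption-laden illustration: it works on plain strings, does not touch GitHub, and does nothing about notifications or timestamps. It relies on the archive issue number being equal to the Bugzilla id, and `rewrite_stale_refs` and the example ids are hypothetical.

```python
import re

# Qualified reference into the archive repo, e.g. llvm/llvm-bugzilla-archive#8125
ARCHIVE_REF = re.compile(r"\bllvm/llvm-bugzilla-archive#(\d+)\b")


def rewrite_stale_refs(text, bz_to_gh):
    """Replace archive references with local '#N' references where known.

    bz_to_gh: {bugzilla id: issue number in the live repo}.
    References whose target is not in the map are left untouched.
    """
    def replace(match):
        gh_id = bz_to_gh.get(int(match.group(1)))
        return f"#{gh_id}" if gh_id is not None else match.group(0)

    return ARCHIVE_REF.sub(replace, text)


if __name__ == "__main__":
    body = "Duplicate of llvm/llvm-bugzilla-archive#8125, see also llvm/llvm-bugzilla-archive#9"
    print(rewrite_stale_refs(body, {8125: 8500}))
```

Leaving unknown targets untouched keeps the rewrite safe to run repeatedly as more issues are transferred and the map grows.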
> 6) We can edit the comments by hand (can you only edit your own comments or can we edit someone else's comments, I'm thinking its only our own based on testing I've done with other repos)
Yes, only admins can edit everything.
> Assuming there is no obvious/immediate fix, Do we have any choice but to move ahead with the existing import and fix the comments by hand retrospectively (assuming 6)
This is what I asked GitHub engineers. They essentially asked for yet
another day to figure out the possible options. My rough estimate is that
at least 5k issues will have broken links.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
> 6) We can edit the comments by hand (can you only edit your own comments or can we edit someone else's comments, I'm thinking its only our own based on testing I've done with other repos)
> - isn't this a requirement in order to fix up the "code-blocks"?
Yes, only admins can edit everything.
> Anton: I see about 35,000 issues in
> https://github.com/llvm/llvm-bugzilla-archive/issues
> but only 228 (i.e. essentially none, presumably just historical noise from newbie GitHub users) in
> https://github.com/llvm/llvm-project/issues
> Where are the 13,000 issues you are saying have already been migrated?
You cannot see them because issues are currently disabled in the
llvm-project repo, to keep things intact while we are waiting for
suggestions from GitHub engineers. What you're seeing are pull requests
(note the header).
> IIUC, it's very fortunate that there aren't yet 13,000 issues in https://github.com/llvm/llvm-project/issues
They are, see above.
> Only once the whole migration has been tested end-to-end on a test repo, would I recommend starting the migration into the production repo https://github.com/llvm/llvm-project.
> Those make it clear that someone's done a little bit of work to script this stuff; but the Google Doc also makes it clear that there is a long way to go to accomplish a "deploy plan": someone needs to take that English description and turn it into code (Python or even Bash or whatever) that
Do you want me to bash script the work which is done by GitHub engineers?
> Step 1, getting the XML files from Bugzilla, turns out to be super easy because there's a public API for that:
> https://github.com/Quuxplusone/BugzillaToGithub
> Step 3, transforming XML to GitHub's JSON schema, requires knowing what GitHub's schema looks like. I've found
> https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import
> although it's not real clear what the schema is or if that even still works (I haven't tried yet). Also, there seems to be no way for one GitHub user to create a comment or issue putatively authored by some other GitHub user. (Which certainly makes sense.)
Well, the current approach we're using certainly handles this well.
Though, I would certainly like to see the migrated 10k issues at
https://github.com/Quuxplusone/ at the end of the week as you promised
and compare with what we already have in the llvm-bugzilla-archive.
So this would result in issues and comments filed by "LLVM Import
Bot" or whatever... but I think that's fine, and might even avoid some
issues that you'd have otherwise, with scenarios like "Joe User
created his GitHub account in 2015, but was making comments on LLVM
issues back in 2012."
> Vice versa, btw, you've currently got some issues being incorrectly imported with the reporter listed in the issue summary itself as "LLVM Bugzilla Contributor"; e.g. this one from Chris Burel.
Chris Burel did not fill in the survey; therefore the data is anonymised.
> Some thoughts that just crossed my mind: what if we instead rewrite every link (not only backward reference) to http://llvm.org/PR<XXXX> ; since these links will continue to work and redirect, that would make all possible link working as expected wouldn't it?
Yes, or just make relative links absolute, e.g. instead of
llvm/llvm-bugzilla-archive#8125 use
https://github.com/llvm/llvm-bugzilla-archive/issues/8125 which is
essentially the same. As far as I know, GitHub engineers are
investigating the second option now (i.e. making all references
absolute during the issue transfer).
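The relative-to-absolute conversion mentioned above can be sketched in Python as follows. This is not what GitHub does internally; it is a hypothetical text transformation (`absolutize` is a made-up name) that turns `owner/repo#N` into the corresponding issue URL and leaves everything else alone, including bare `#N` references.

```python
import re

# owner/repo#N, not preceded by a character that would make it part of a
# longer path or URL.
QUALIFIED_REF = re.compile(r"(?<![\w/.-])([\w.-]+)/([\w.-]+)#(\d+)\b")


def absolutize(text):
    """Rewrite 'owner/repo#N' references as absolute issue URLs."""
    return QUALIFIED_REF.sub(r"https://github.com/\1/\2/issues/\3", text)


if __name__ == "__main__":
    print(absolutize("Blocks llvm/llvm-bugzilla-archive#8125 and #12"))
```

An absolute URL does not depend on which repository the containing issue currently lives in, which is why it survives a transfer unchanged.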
The current script seems to be forgetting that GitHub issues use Markdown, and so every existing Bugzilla comment needs to be wrapped in triple-backticks to preserve its semantics. (You could do cleverer things, like "don't wrap comments that are only one line long," but doing anything less-clever will be a non-starter.)
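A minimal Python sketch of the wrapping suggested above, including the "don't wrap one-line comments" refinement. The function name `wrap_comment` is hypothetical and this is not part of any existing migration script. The one detail worth showing is that the fence has to be longer than any run of backticks already present in the comment, otherwise the comment text could close the block early.

```python
import re


def wrap_comment(text):
    """Wrap a multi-line Bugzilla comment in a Markdown code fence.

    Single-line comments are returned unchanged. The fence is at least
    three backticks and always one longer than the longest backtick run
    found in the comment itself.
    """
    if "\n" not in text.strip():
        return text
    longest_run = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest_run + 1)
    return fence + "\n" + text.rstrip("\n") + "\n" + fence


if __name__ == "__main__":
    print(wrap_comment("int main() {\n  return 0;\n}"))
    print(wrap_comment("LGTM"))
```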
[...] btw, you've currently got some issues being incorrectly imported with the reporter listed in the issue summary itself as "LLVM Bugzilla Contributor"; e.g. this one from Chris Burel. https://github.com/llvm/llvm-bugzilla-archive/issues/52567
It certainly makes sense that you won't have a GitHub username for some people, but you still shouldn't throw away the information about their human name just because we're migrating from one platform to another.
> - Bugzilla comments are numbered, so people sometimes say e.g. "see comment 16"; GitHub comments are not numbered. The migration script might consider automagically turning these references into hyperlinks similar to how Bugzilla does it.
It cannot. The URL is not known before the issue is on GitHub as the
URL is assigned by GitHub.
Absolutely, if Arthur implements such a script, one which makes all the
necessary changes, does not produce spurious notifications and does
not screw up the last modified timestamp, then I do not see any reason
why it cannot be run post-migration to perform the changes as implemented.
On Sat, Dec 4, 2021 at 5:46 AM Anton Korobeynikov via cfe-dev <cfe...@lists.llvm.org> wrote:
> [...]
> Surely, if the community will re-decide that these are unimportant
> things we can push the existing code into a blank archive fairly
> quickly.
Please, test the above claim this week, on a blank repo. Let's actually find out whether it works, instead of relying on "Surely...".
At this point I'm offering my own technical assistance, just to get the thing done and stop getting these emails every day. Send me your Bugzilla export script; I'll test it out this week on a blank repo, with the goal of mirroring a 100-bug subset of the LLVM Bugzilla publicly visible in https://github.com/Quuxplusone/LLVMBugzillaTest/ by EOW.
On Fri, Dec 10, 2021 at 3:00 PM Arthur O'Dwyer
--
Thanks for the try!
From a quick scan:
1. There are no labels
2. Attachments are not real – they are just links to bugzilla and will
break if bugzilla is e.g. down
3. Each attachment results in 2 comments, one of which is redundant
4. CC list is strange, e.g.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12187 CC's to
"mail.sandbox.de"
5. All text is in verbatim boxes (e.g.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12092) making
it almost impossible to read due to horizontal scroll
6. There are no "depends on" / "blocks on" references (see
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10900)
7. There are no cross-references in case of duplicates (see
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729)
Yes, apparently I did. Sorry. I'll attach the logs to that issue instead :)
*** This bug has been marked as a duplicate of bug 9072 ***
$ grep -hor '[*][*][*] .* [*][*][*]' xml/ > out
$ sed 's/[0-9][0-9]*/9/g' out | sort | uniq -c | sort -rn | eyeballing-by-arthur
2563 *** Bug 9 has been marked as a duplicate of this bug. ***
2504 *** This bug has been marked as a duplicate of bug 9 ***
76 *** This bug has been marked as a duplicate of 9 ***
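For readers who prefer to reproduce this tally outside the shell, here is a rough Python equivalent of the grep / sed / sort / uniq pipeline above. It is a sketch only: `count_marker_shapes` is a hypothetical name, and it takes an iterable of text lines rather than walking the `xml/` directory.

```python
import re
from collections import Counter

# Same pattern as the grep call: three asterisks, anything, three asterisks.
MARKER = re.compile(r"[*]{3} .* [*]{3}")


def count_marker_shapes(lines):
    """Count '*** ... ***' markers after collapsing every number to 9.

    Returns (shape, count) pairs, most frequent first, mirroring
    `sed 's/[0-9][0-9]*/9/g' | sort | uniq -c | sort -rn`.
    """
    counts = Counter()
    for line in lines:
        for marker in MARKER.findall(line):
            counts[re.sub(r"[0-9]+", "9", marker)] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "*** This bug has been marked as a duplicate of bug 9072 ***",
        "*** This bug has been marked as a duplicate of bug 12 ***",
        "*** Bug 77 has been marked as a duplicate of this bug. ***",
        "an ordinary comment line",
    ]
    for shape, n in count_marker_shapes(sample):
        print(n, shape)
```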
...
It's pretty straightforward to get to the present state and there are
tools for this; we were at this point in 2019 (see e.g.
https://github.com/asl/llvm-bugzilla/issues as it was outlined in the LLVM
DevMtg 2019 roundtable discussion). The non-trivial part is to
work around various GitHub issues, which also differ depending on
the API used.