[llvm-dev] Bugzilla migration is stopped again

Anton Korobeynikov via llvm-dev

Dec 3, 2021, 7:19:43 PM

to llvm-dev, clang developer list, polly-dev, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org)

Dear All,

I hate to say this, but the migration was stopped again. Now it seems
that GitHub does not rewrite issue references properly during the
transfer (sic!). Let me show what the problem is exactly:

Consider two issues, A and B, where A references B and B references A.
In our case this is used to model various relations like
"duplicates / is duplicated by", "blocks / is blocked by", "depends on
/ required by". So, in the bz archive, A references B as #B and B
references A as #A.

Now, let's migrate A. The references will be rewritten: #B =>
bz-archive#B (in A) and #A => llvm-project#A (in B). However, after the
migration of B only one reference is rewritten: llvm-project#A => #A.
The bz-archive#B link in issue A is not rewritten, and therefore a
dangling reference appears.
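
The sequence above can be replayed on a toy model. This is only a sketch of the behaviour as described in this message, not GitHub's actual transfer logic: the `archive` and `project` dictionaries, the `transfer` helper, the issue numbers, and the literal `bz-archive#` / `llvm-project#` prefixes are all illustrative assumptions.

```python
import re

# Toy model: each repo maps issue number -> body text.
archive = {1: "Duplicate of #2", 2: "Duplicated by #1"}
project = {}


def transfer(number, new_number):
    """Move one issue from archive to project, mimicking the described rewriting."""
    body = archive.pop(number)
    # Same-repo references in the moved issue now point back at the archive.
    body = re.sub(r"(?<![\w-])#(\d+)", r"bz-archive#\1", body)
    # References to issues that were moved earlier become local again.
    body = re.sub(r"llvm-project#(\d+)", r"#\1", body)
    project[new_number] = body
    # Issues still in the archive that mention the moved issue follow it.
    for n, text in archive.items():
        archive[n] = re.sub(
            rf"(?<![\w-])#{number}\b", f"llvm-project#{new_number}", text
        )
    # Missing step (the reported problem): nothing revisits issues already in
    # `project` that still say "bz-archive#<number>".


transfer(1, 101)  # migrate A
transfer(2, 102)  # migrate B
print(project)
```

After both transfers, issue 102 correctly says `#101`, while issue 101 still says `bz-archive#2` even though issue 2 no longer lives in the archive.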

For us this means that we will lose all links to duplicate issues, and
(more important!) to linked issues in the meta bugs.

I informed GitHub about the bug and I am waiting for their answer.

--
With best regards, Anton Korobeynikov,
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Nemanja Ivanovic via llvm-dev

Dec 3, 2021, 7:54:30 PM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

This may be a dumb question, but could this just be an issue of forward references (i.e. issue A references B, but B has not been transferred yet so doesn't exist)?

If so, could the transfer be split into a two step process:

1. Open all issues with summaries only

2. Populate description, comments, labels, etc.

Please note that I have no idea how any of this GitHub or Bugzilla stuff works, so this suggestion may be completely absurd.


Anton Korobeynikov via llvm-dev

Dec 4, 2021, 2:47:23 AM

to Nemanja Ivanovic, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

Hello

> This may be a dumb question, but could this just be an issue of forward references (i.e. issue A references B, but B has not been transferred yet so doesn't exist)?
> If so, could the transfer be split into a two step process:

It cannot, sorry

--
With best regards, Anton Korobeynikov

Department of Statistical Modelling, Saint Petersburg State University

MyDeveloper Day via llvm-dev

Dec 4, 2021, 3:35:41 AM

to Anton Korobeynikov, llvm-dev, clang developer list, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

How many issues with circular dependencies are we talking about? Few enough for us to fix by hand on an as-needed basis? At present, the lack of a bug tracker for 10 days is starting to be more painful than giving up 100% correctness of the data. My experience from multiple migrations to JIRA systems from various legacy bug trackers is that this is an iterative process: at some point you say “That’s good enough” and conclude that historical issues aren’t looked at often enough to worry about, as long as users can get back to the legacy system to cross-reference if necessary.

Once you open this up, most issues we tackle will be new issues created in GitHub.

Time to move forward. Ideally we should have done these kinds of migrations during the planning phase, but we didn’t, and that’s a lesson learnt. Let’s finish up the migration as is and move on. In my view it looks good enough to use, and you’ve done a good job, but let’s not drag this out for the 1% of bugs we might not look at much!

My 2p worth

MyDeveloperDay

Nikita Popov via llvm-dev

Dec 4, 2021, 3:50:24 AM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

On Sat, Dec 4, 2021 at 1:19 AM Anton Korobeynikov via llvm-dev <llvm...@lists.llvm.org> wrote:

> Dear All,
>
> I hate to say this, but the migration was stopped again. Now it seems
> that GitHub does not rewrite issue references properly during the
> transfer (sic!). Let me show what the problem is exactly:
>
> Consider two issues: A and B, where A will reference B and B will
> reference A. In our case this is used to model various relations like
> "duplicates / is duplicated by", "blocks / is blocked by", "depends on
> / required by". So, in bz archive A will reference B as #B and B
> will reference #A.
>
> Now, let's migrate A. The references will be rewritten. #B =>
> bz-archive#B and #A => llvm-project#A. However, after migration of B
> only one reference is rewritten llvm-project#A => #A, the bz-archive#B
> link in the issue A will not be rewritten and therefore a dangling
> reference will appear.

To clarify, will those bz-archive#B references just not look as nice, or do they not work at all? It was my understanding that bz-archive#B will redirect to llvm-project#B2, in which case this doesn't seem overly problematic.

Regards,

Nikita

Anton Korobeynikov via llvm-dev

Dec 4, 2021, 3:56:16 AM

to MyDeveloper Day, llvm-dev, clang developer list, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

> How many issues are we talking about with circular dependencies? Small enough for us to fix by hand on a need to basis? At present the lack of bug tracker for 10 days is starting to be more painful than the 100% correctness of the data.

This affects:
- All issues closed due to being duplicates
- All meta bugs (including release-tracking ones)

Maybe something else, but IMO the meta-bugs case is quite severe.
Essentially we will lose all links from the meta-bug to the
"downstream" bugs, or vice versa, or a mixture, depending on the relative
order of migration of the meta-bug and its dependees. I will be meeting
with GitHub folks on Monday to see what the solutions are.

> Time to move forward, ideally we should have done these kinds of migrations during the planning phase, but we didn’t and that’s a lesson learnt, but let’s finish up the migration as is. Move on in my view it looks good enough to use and you’ve done a good job but let’s not drag this out for 1% of bugs we might not look at much!

Well. The thing is: we checked and found lots of issues and introduced
many workarounds. I must admit that checking cyclic references after
the migration was not in my checklist and I spotted this issue by
accident. Ordinary references are migrated properly (both to source
code and other issues) and this was checked. There was an assumption
that basic github functionality would simply work. This was a mistake.

--

With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Anton Korobeynikov via llvm-dev

Dec 4, 2021, 3:56:48 AM

to Nikita Popov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

> To clarify, will those bz-archive#B references just not look as nice, or do they not work at all? It was my understanding that bz-archive#B will redirect to llvm-project#B2, in which case this doesn't seem overly problematic.

They will be just text. No redirect.

--

With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Roman Lebedev via llvm-dev

Dec 4, 2021, 4:53:20 AM

to Anton Korobeynikov, llvm-dev, clang developer list, polly-dev, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org)

Is it really impossible to just completely remove all the current
issues and PR's in a repository and reset the counter, so that none of
this remapping is necessary in the first place?

Alternatively, is it really impossible to, instead of moving issues,
ask github to just move the releases into that new repository and then
swap those two repositories (forks, stars, clones, etc.)?

I think all these problems are only because of the remapping, which
will be problematic regardless, because the in-source mentions aren't
getting rewritten, so there *will* be confusion regardless of whether
github succeeds in moving issues.

Roman

On Sat, Dec 4, 2021 at 11:56 AM Anton Korobeynikov via Openmp-dev
<openm...@lists.llvm.org> wrote:
>
> > To clarify, will those bz-archive#B references just not look as nice, or do they not work at all? It was my understanding that bz-archive#B will redirect to llvm-project#B2, in which case this doesn't seem overly problematic.
> They will be just text. No redirect.
>
> --
> With best regards, Anton Korobeynikov
> Department of Statistical Modelling, Saint Petersburg State University
> _______________________________________________

> Openmp-dev mailing list
> Openm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev

Anton Korobeynikov via llvm-dev

Dec 4, 2021, 4:58:42 AM

to Roman Lebedev, llvm-dev, clang developer list, polly-dev, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org)

> Is it really impossible to just completely remove all the current
> issues and PR's in a repository and reset the counter, so that none of
> this remapping is necessary in the first place?

I asked this question many times at different levels. As far as I was
told – yes. The bulk import could only happen to the empty repo. If
you know how it could be done in another way – please let us know.

> Alternatively, is it really impossible to, instead of moving issues,
> ask github to just move the releases into that new repository and then
> swap those two repositories (forks, stars, clones, etc.)?

This is what we asked as well! The answer was "there is no way".
Maybe there is a way, but it would require some significant
engineering effort from their side (e.g. additional development), so
our request was refused.

> I think all these problems are only because of the remapping, which
> will be problematic regardless, because the in-source mentions aren't
> getting rewritten, so there *will* be confusion regardless of whether
> github succeeds in moving issues.

Right. Do you have an idea how we can move forward?

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Roman Lebedev via llvm-dev

Dec 4, 2021, 5:41:15 AM

to Anton Korobeynikov, llvm-dev, clang developer list, polly-dev, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org)

On Sat, Dec 4, 2021 at 12:58 PM Anton Korobeynikov
<an...@korobeynikov.info> wrote:
>
> > Is it really impossible to just completely remove all the current
> > issues and PR's in a repository and reset the counter, so that none of
> > this remapping is necessary in the first place?
> I asked this question many times at different levels. As far as I was
> told – yes. The bulk import could only happen to the empty repo. If
> you know how it could be done in another way – please let us know.
>
> > Alternatively, is it really impossible to, instead of moving issues,
> > ask github to just move the releases into that new repository and then
> > swap those two repositories (forks, stars, clones, etc.)?
> This is what we asked as well!. The answer was "there is no way".
> Maybe there is a way, but it would require some significant
> engineering effort from their side (e.g. additional development), so
> our request was refused.
>
> > I think all these problems are only because of the remapping, which
> > will be problematic regardless, because the in-source mentions aren't
> > getting rewritten, so there *will* be confusion regardless of whether
> > github succeeds in moving issues.
> Right. Do you have an idea how we can move forward?

Once the issues are imported into a clean llvm-project-NEW repository,
push tags into it, manually* recreate the github releases - why do their dates
matter? - by manually* re-uploading all the manually uploaded assets,
then lock down the old llvm-project, rename it to llvm-project-obsolete,
mirror its branches into the new repo, and finally rename the new
llvm-project-NEW to llvm-project. And delete llvm-project-obsolete.
As far as git is concerned, the llvm-project repo is by then
identical to the old one.

The only casualties will be unimportant things:
github stars, github forks, github release dates;
but if github can't be bothered to help with those,
it will serve as a forever reminder to the users that github is unreliable,
and false dependence should not be created on replaceable unreliable things.

* Surely it can be automated.

> --
> With best regards, Anton Korobeynikov
> Department of Statistical Modelling, Saint Petersburg State University

Roman

Anton Korobeynikov via llvm-dev

Dec 4, 2021, 5:46:42 AM

to Roman Lebedev, llvm-dev, clang developer list, polly-dev, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org)

> The only casualties will be unimportant things:
> github stars, github forks, github release dates;
> but if github can't be bothered to help with those,
> it will serve as a forever reminder to the users that github is unreliable,
> and false dependence should not be created on replaceable unreliable things.

Thanks for your opinion. However, this was previously discussed and it
was decided that release dates do matter as well as forks. The latter
is even more important for downstream users.

Surely, if the community will re-decide that these are unimportant
things we can push the existing code into a blank archive fairly
quickly.

--
With best regards, Anton Korobeynikov

Roman Lebedev via llvm-dev

Dec 4, 2021, 6:05:22 AM

to Anton Korobeynikov, llvm-dev, clang developer list, polly-dev, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org)

On Sat, Dec 4, 2021 at 1:46 PM Anton Korobeynikov
<an...@korobeynikov.info> wrote:
>
> > The only casualties will be unimportant things:
> > github stars, github forks, github release dates;
> > but if github can't be bothered to help with those,
> > it will serve as a forever reminder to the users that github is unreliable,
> > and false dependence should not be created on replaceable unreliable things.
> Thanks for your opinion.

Yep, it is indeed *just* *my* opinion, formed by observing the situation.

> However, this was previously discussed and it
> was decided that release dates do matter as well as forks. The latter
> is even more important for downstream users.
>
> Surely, if the community will re-decide that these are unimportant
> things we can push the existing code into a blank archive fairly
> quickly.
> --
> With best regards, Anton Korobeynikov

Roman

James Dutton via llvm-dev

Dec 4, 2021, 6:19:33 AM

to Nemanja Ivanovic, llvm-dev, polly-dev, openmp-dev (openmp-dev@lists.llvm.org), clang developer list, Flang Development List

On Sat, 4 Dec 2021 at 00:54, Nemanja Ivanovic via cfe-dev
<cfe...@lists.llvm.org> wrote:
>
> This may be a dumb question, but could this just be an issue of forward references (i.e. issue A references B, but B has not been transferred yet so doesn't exist)?
> If so, could the transfer be split into a two step process:
> 1. Open all issues with summaries only
> 2. Populate description, comments, labels, etc.
>
> Please note that I have no idea how any of this GitHub or Bugzilla stuff works, so this suggestion may be completely absurd.
>

This would seem like a sensible approach to me.
I have worked with many different graph databases, and it is quite
normal to load the nodes or entities first, and then add the
relationships or links second.
Maybe one can add all the bugs first, without any relationships/links.
Then build up a map of github IDs vs bugzilla IDs, and then use that
map to then add all the relationships afterwards using the learnt
bugzilla IDs.
Or, alternatively, use your current method, and then scan over
everything at the end, to add in any relationships that got missed
using the above approach.
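
A minimal sketch of the nodes-first, links-second idea, under stated assumptions: `FakeTracker` is an in-memory stand-in rather than a GitHub client, the `bugzilla` dictionary and the "bug N" reference syntax are made up for illustration, and the sketch ignores rate limits, notifications, and modification timestamps entirely.

```python
import re

# Hypothetical Bugzilla export: id -> (summary, description using "bug N" refs).
bugzilla = {
    10: ("Crash in parser", "Blocks bug 12"),
    12: ("Meta: release blockers", "Depends on bug 10"),
}


class FakeTracker:
    """In-memory stand-in for the target tracker (not a real GitHub client)."""

    def __init__(self, first_number):
        self.next_number = first_number
        self.issues = {}

    def create(self, title):
        number = self.next_number
        self.next_number += 1
        self.issues[number] = {"title": title, "body": ""}
        return number

    def set_body(self, number, body):
        self.issues[number]["body"] = body


def two_pass_import(bugs, tracker):
    # Pass 1: create every issue with its summary only and record the new number.
    id_map = {bz_id: tracker.create(bugs[bz_id][0]) for bz_id in sorted(bugs)}

    def translate(match):
        target = int(match.group(1))
        # Leave references to unknown bugs untouched.
        return f"#{id_map[target]}" if target in id_map else match.group(0)

    # Pass 2: fill in descriptions, translating references through the map.
    for bz_id, (_, description) in bugs.items():
        tracker.set_body(id_map[bz_id], re.sub(r"bug (\d+)", translate, description))
    return id_map


tracker = FakeTracker(first_number=500)
id_map = two_pass_import(bugzilla, tracker)
print(id_map, tracker.issues)
```

Because every issue exists before any body is written, a reference in either direction can be resolved through `id_map`, which is what makes mutual references unproblematic in this model.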

Kind Regards

James

Anton Korobeynikov via llvm-dev

Dec 4, 2021, 6:40:59 AM

to James Dutton, llvm-dev, polly-dev, openmp-dev (openmp-dev@lists.llvm.org), clang developer list, Flang Development List

Hello,

> Maybe one can add all the bugs first, without any relationships/links.
> Then build up a map of github IDs vs bugzilla IDs, and then use that
> map to then add all the relationships afterwards using the learnt
> bugzilla IDs.
> Or, alternatively, use your current method, and then scan over
> everything at the end, to add in any relationships that got missed
> using the above approach.

There are a few things that you're missing here, unfortunately:

0. Everything is API-rate limited.
1. Every change triggers notifications.
2. Every change updates the "last modified" timestamp.

And no, GitHub is not a graph database that you can manipulate however you like.

Though, patches are always welcome ;)

--

With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Arthur O'Dwyer via llvm-dev

Dec 4, 2021, 9:16:45 AM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

On Sat, Dec 4, 2021 at 5:46 AM Anton Korobeynikov via cfe-dev <cfe...@lists.llvm.org> wrote:

> [...]
>
> Surely, if the community will re-decide that these are unimportant
> things we can push the existing code into a blank archive fairly
> quickly.

Please, test the above claim this week, on a blank repo. Let's actually find out whether it works, instead of relying on "Surely...".

At this point I'm offering my own technical assistance, just to get the thing done and stop getting these emails every day. Send me your Bugzilla export script; I'll test it out this week on a blank repo, with the goal of mirroring a 100-bug subset of the LLVM Bugzilla publicly visible in https://github.com/Quuxplusone/LLVMBugzillaTest/ by EOW.

(Credentials: I was SRE at Mixpanel for ~3 years and performed several 100GB cluster migrations with zero downtime. I have seen the "We'll do it live!" attitude be successful, but I have also seen it fail spectacularly. The alternative "plan it carefully, write down your deploy plan, test what can be tested ahead of time, do a practice run, then do it live" approach usually works better. At this point it looks like Anton's initial pass at "We'll do it live!" clearly was not successful, in the sense that if it were successful the repo would have been migrated circa Thanksgiving weekend. So this is the giant-honking-red-flashing-light alert that it's time to shift from "We'll do it live!" to "Let's make a deploy plan.")

Respectfully, yet frustrated with the never-ending email thread,

–Arthur

Anton Korobeynikov via llvm-dev

Dec 4, 2021, 12:06:33 PM

to Arthur O'Dwyer, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

Dear Arthur,

> Respectfully, yet frustrated with the never-ending email thread,

I understand your frustration and please rest assured that my own
frustration is certainly not less than yours. I'm also very exhausted
at the moment, as things are beyond my control. The constant
pushing from this and similar emails does not help in resolving the
situation. I certainly have to note that your accusations in the "We'll do
it live" section are not quite accurate in many respects: if you have
not seen the outcome of the test imports, it does not mean that there
were none. I would say even more: it means that they were successful,
as nothing triggered excessive notifications (we made such a mistake
once; you can even find the reports of this on the MLs). For your
information: the last "dry run import" which gated the migration was
the 14th full (52k issues) attempt. In all previous runs issues were found and
either a workaround was prepared or they were reported to GitHub.

Now let's proceed from the emotions to the real things. I do
appreciate your willingness to provide the help. Please see my notes
below.

> I'll test it out this week on a blank repo, with the goal of mirroring a 100-bug subset of the LLVM Bugzilla publicly visible in https://github.com/Quuxplusone/LLVMBugzillaTest/ by EOW.

First of all, a proof of concept should certainly be 10k issues on a
non-blank repo. It should already have closed issues / pull requests
in order to represent the real llvm-project repo.

Now down to details:
0. I would suggest not using the GitHub API. YMMV, but from our
experience the API is rate limited, and many things are outside your
control, including:
- ids
- timestamps
- notifications
1. The real migration starts from a local gitlab instance, into which you
import all bugzilla issues. You can certainly skip this step in your
own experiments and proceed directly to step 2, but it allows
you to check the outcome of the import. The script we used can be
found at https://github.com/llvm/bugzilla2gitlab/tree/llvm

2. Then you need to prepare the dump which could be consumed by GitHub
Enterprise Migration API:
https://docs.github.com/en/rest/reference/migrations We are using
gitlab-to-github scripts provided by GitHub. I'm not sure I can share
them as they are not public – I will ask GH support engineers on
Monday and get back to you.

3. After the dump is prepared you need to upload it via the GitHub
Enterprise Migration API. Note that import is only possible into an empty
repo (the repo is essentially created by the import). If the import fails you'd need to
ask GitHub engineers whether the error is real or whether it can be
ignored. If the error is real, then you'd likely need to restart from
scratch. It is possible to resume, but practice shows that this might
create duplicate comments.

4. After the import finished check the results: number of objects
(issues, comments, attachments) that were imported. If there are any
objects that failed import, then you need to figure out which ones and
what to do. Your options are: ignore or restart the import. Here is
the checklist I'm using for the content:
https://docs.google.com/document/d/1G6DZ6AxzSaOlrtTxoxtqYKnD4Myv40QfKK4wj54y8ms/edit

5. At this point one should have something similar to
https://github.com/llvm/llvm-bugzilla-archive

6. In order to transfer issues from the archive to the live repo there
are two options:
- Use GitHub rate-limited API
- Ask GitHub folks
The first variant triggers notifications to everyone mentioned,
assigned or commented on the issue. There is no way to silence these
notifications.
In our case we are relying on GitHub support engineers who do
this migration step for us. There is no API, no script, nothing that
is within our control. We did several test migrations from the dry-run
repo to another repo (and this is how we found all the bugs wrt issue
transfer in the past). As I already said, circular reference
rewriting was not included in my original checklist: I expected
this feature to "just work", and the problem was only spotted later.
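
The object-count check in step 4 could look roughly like the sketch below. It is not the checklist linked above: the `compare_import` helper and the sample id lists are hypothetical, and it assumes you can list the ids of exported and imported objects per type.

```python
def compare_import(expected, imported):
    """Return, per object type, the ids that were exported but not imported."""
    missing = {}
    for kind, ids in expected.items():
        lost = set(ids) - set(imported.get(kind, ()))
        if lost:
            missing[kind] = sorted(lost)
    return missing


# Made-up sample data: one comment did not survive the import.
expected = {"issues": [1, 2, 3], "comments": [10, 11, 12, 13], "attachments": [100]}
imported = {"issues": [1, 2, 3], "comments": [10, 12, 13], "attachments": [100]}
report = compare_import(expected, imported)
print(report)
```

An empty report means the counts match for every object type; a non-empty one names the objects to investigate before deciding whether to ignore them or restart the import.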

Hope this helps. Should you have more questions, I will certainly be
happy to help you. I'm interested in finishing this 2+ year project
more than anyone else.

--
With best regards, Anton Korobeynikov

Department of Statistical Modelling, Saint Petersburg State University

MyDeveloper Day via llvm-dev

Dec 6, 2021, 3:33:59 AM

to Anton Korobeynikov, llvm-dev, clang developer list, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

Trying to think of a solution that might help while we wait for github.

1) Can't we calculate the eventual ID of each issue in advance? Can't we determine that bugzilla PR12345 = GH12345 + (some offset caused by previous issues in GH), assuming we always import in the same order, oldest first, one at a time?

2) Could you build up this mapping a priori?

3) Could you then spin through the bugzilla issues prior to migration (assuming they are in some form you can manipulate: JSON, XML, TXT, etc.), programmatically changing the links to what they will ultimately be before doing the migration?

4) Then import that into GitHub (the new bugzilla-archive)

5) Then copy those issues from one repo to another?

6) Assuming all ducks are lined up correctly, wouldn't these broken links then point to the correct issues?

Is that something that might be worth a try? or do you do this already and GH is messing it up?

MyDeveloperDay.

Anton Korobeynikov via llvm-dev

Dec 6, 2021, 4:07:13 AM

to MyDeveloper Day, llvm-dev, clang developer list, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

Hello

> 1) Can't we calculate in advance the eventual ID of each issue. can't we determine that bugzilla PR12345 = GH12345 + (some offset caused by previous issues in GH)? - (assuming we always import in the same order, oldest first 1 at a time)

The thing is... it's not a simple "+" here. Things are a bit more
complex, as we do have gaps in bugzilla id's as well: some of the
issues were removed due to spam or GDPR requests. So we'd need to
track these, but this is doable, provided that the id mapping
done by GitHub is predictable. I... cannot be 100% sure, as the
final transfer is done not by myself but by GitHub support engineers
(in order not to trigger notifications on all 52k+ issues).
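
To illustrate why a constant offset breaks once ids have gaps, here is a small sketch. It rests on exactly the assumption that cannot be guaranteed here: that issues are transferred oldest-first and receive consecutive numbers, with nobody else opening issues or pull requests in between. `predict_numbers` and the sample ids are hypothetical.

```python
def predict_numbers(bugzilla_ids, first_free_number):
    """Map each surviving Bugzilla id to the number it would get if issues
    were transferred oldest-first and numbered consecutively."""
    ordered = sorted(bugzilla_ids)
    return {bz_id: first_free_number + i for i, bz_id in enumerate(ordered)}


# Ids 3 and 5 are missing (e.g. removed as spam), so "id + 1000" is wrong
# for everything after the first gap.
surviving = [1, 2, 4, 6]
mapping = predict_numbers(surviving, first_free_number=1001)
print(mapping)
```

With this input, bug 4 lands on 1003 rather than 1004, so any precomputed link based on a fixed offset would point one issue too far after the first gap and two too far after the second.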

> Is that something that might be worth a try? or do you do this already and GH is messing it up?

The latter, essentially. The references were properly built, but
towards the original archive repo. It is assumed that GH will rewrite
them during the transfer. This is standard functionality, and I was
assured that it works properly, that it was tested, deployed, and has worked
for many years, etc. etc. etc. Now we are caught halfway, as we have already
migrated ~13k issues to the main LLVM repo. As I said, I spotted the
problem by chance. Checking the rewriting of circular links was not
on my checklist: when I checked link rewriting during the test
migration I essentially checked "one way", and everything was OK (one
needs to migrate both sides of the reference in order to see the
problem, and apparently there were no such pairs in the 100 test issues
we migrated).
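
One way to make sure a test subset exercises this case would be to look for mutually referencing pairs up front. The sketch below assumes the references have already been extracted into a plain dictionary; `mutual_pairs` and the sample data are illustrative only.

```python
def mutual_pairs(refs):
    """refs maps issue id -> set of ids it mentions. Return the sorted list of
    pairs (a, b), a < b, where each issue mentions the other."""
    pairs = set()
    for a, targets in refs.items():
        for b in targets:
            if a != b and a in refs.get(b, ()):
                pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)


# Issue 3 mentions 1, but 1 does not mention 3, so only (1, 2) is mutual.
refs = {1: {2}, 2: {1}, 3: {1}, 4: set()}
pairs = mutual_pairs(refs)
print(pairs)
```

Including both members of at least one such pair in a trial migration would expose the dangling-reference behaviour that a one-way check misses.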

I'm meeting with GitHub folks today (morning Pacific Time) to discuss
the options. One option is to proceed with the transfer and rewrite
the stale links afterwards. But I'm wondering if there is a way to fix
the issue on the GH side and what the ETA is.
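
The "rewrite the stale links afterwards" option might be sketched as a text substitution like the one below. The `bz-archive#N` pattern is this thread's shorthand and the real reference text may look different; `rewrite_stale` and the `moved` mapping are hypothetical, and actually writing the result back would again run into the notification and timestamp concerns.

```python
import re

# Assumed textual form of a stale reference, following the thread's shorthand.
STALE = re.compile(r"bz-archive#(\d+)")


def rewrite_stale(body, moved):
    """Replace bz-archive#N with #M when N was transferred as M; leave
    references to issues that were not transferred untouched."""

    def fix(match):
        old = int(match.group(1))
        return f"#{moved[old]}" if old in moved else match.group(0)

    return STALE.sub(fix, body)


moved = {2: 102}  # archive issue 2 became issue 102 in the main repo
fixed = rewrite_stale("Duplicate of bz-archive#2, see also bz-archive#7", moved)
print(fixed)
```

Only references with a known destination are touched, so the pass can be rerun safely as more issues are transferred.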

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

David Chisnall via llvm-dev

Dec 6, 2021, 4:31:19 AM

to llvm...@lists.llvm.org

On 04/12/2021 09:58, Anton Korobeynikov via llvm-dev wrote:
>> Is it really impossible to just completely remove all the current
>> issues and PR's in a repository and reset the counter, so that none of
>> this remapping is necessary in the first place?
> I asked this question many times at different levels. As far as I was
> told – yes. The bulk import could only happen to the empty repo. If
> you know how it could be done in another way – please let us know.
>

If I understand the GitHub process correctly, PR / issue numbers are
monotonic integers that cannot be rolled back or modified because they
are used in cross-referencing and are externally visible. The mapping
from repository to name, however, is mutable (with big warnings when you
press the button because it breaks the aforementioned cross-referencing
and external links).

It should be possible to:

1. Create a new empty private GitHub project.
2. Import all bugs, with the same bug numbers.
3. Make the project public.
4. Pull the entire contents of the current repo to the new project.
5. Delete the llvm-project project (or rename it to llvm-project.old
or something).
6. Rename the new project to llvm-project

Steps 5 and 6 can't be atomic, so this will break everything that tries
to access the repo between steps 5 and 6, but that should be about 30
seconds of downtime. The end result should be a llvm/llvm-project
GitHub project containing the current git repo and the issues from
Bugzilla but not any of the existing issues / PRs on that repo.

David

Roman Lebedev via llvm-dev

Dec 6, 2021, 4:41:26 AM

to David Chisnall, llvm...@lists.llvm.org

I also suggested that, but the thing that apparently throws a wrench
into that is that (as per @aKor's previous mail) the bugzilla issue ids
aren't consecutive, there are gaps, so just importing into a clean repo
(even without having to worry about moving issues into a new repo)
still won't result in a 1:1 match to bugzilla issue ids. It would have been nice
to know this beforehand. This means that missing issue ids would need to
be padded with empty issues.

> David
Roman

Anton Korobeynikov via llvm-dev

Dec 6, 2021, 4:41:59 AM

to David Chisnall, llvm...@lists.llvm.org

Hello David,

> Steps 5 and 6 can't be atomic, so this will break everything that tries
> to access the repo between steps 5 and 6, but that should be about 30
> seconds of downtime. The end result should be a llvm/llvm-project
> GitHub project containing the current git repo and the issues from
> Bugzilla but not any of the existing issues / PRs on that repo.

The problem is that a repo is not only code + issues. There are also:
- Releases
- Forks
- Stars / Watches
- All kinds of tokens and integrations
- Maybe something else which I forgot right now

So, we'd also need to re-create releases, but we will lose all
metadata (e.g. there is no way to recreate releases using the correct
release dates) and there is no way to restore forks, so this will
affect all downstream users – they will essentially need to re-create
all their forks and move their private changes there. Currently there
are ~4.5k forks. I do not have information which forks are active.

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Anton Korobeynikov via llvm-dev

Dec 6, 2021, 4:45:52 AM

to Roman Lebedev, llvm...@lists.llvm.org

Roman,

> I also suggested that, but the thing that apparently throws the wrench
> at that is that (as per @aKor's previous mail), the bugzilla issues id's
> aren't consecutive, there are gaps, so just importing into a clean repo
> (even without having to worry about moving issues into a new repo)
> still won't result in 1:1 match to bugzilla issue id's.

The import to empty repo preserves the ids (like on our bugzilla
archive). Though new issues / pull requests will re-use spare id's, so
we'll need to pad for this as well.
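
Working out which numbers would need padding is straightforward once the list of imported ids is known. A small sketch, with `missing_ids` and the sample ids being hypothetical:

```python
def missing_ids(existing_ids):
    """Ids between 1 and the highest existing id that have no issue."""
    present = set(existing_ids)
    return [n for n in range(1, max(present) + 1) if n not in present]


gaps = missing_ids([1, 2, 4, 6])
print(gaps)
```

Each number in `gaps` would need a placeholder issue so that a later issue or pull request cannot take over the number of a removed bug.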

The real issue is other parts of the repo which are not code / issues.

> It would have been nice to know this beforehand. This means that missing issue id's would need to
> be padded with empty issues.

See above. It's not a problem

PS: I do have a backup copy of bugzilla archive repo on GitHub. So
importing + renaming is essentially a matter of "git push" +
recreation of all releases. But for the releases I'd certainly
appreciate Tom's opinion.

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Tom Stellard via llvm-dev

Dec 6, 2021, 12:01:29 PM

to Anton Korobeynikov, MyDeveloper Day, llvm-dev, Flang Development List, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

I would be in favor of proceeding and trying to rewrite the links later.
As for the metabugs, I was planning to use Milestones instead of metabugs
once the migration was complete, and I would be fine with converting all
the old metabugs to Milestones, so I don't consider broken metabugs to
be a blocker.

-Tom

MyDeveloper Day via llvm-dev

Dec 7, 2021, 5:12:33 AM

to Anton Korobeynikov, llvm-dev, clang developer list, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

I read your Dec 7th updates in the Google doc.

Could I summarize based on my understanding:

1) we already migrated in 1300 issues

2) there were a handful of issues in there previously (erroneously entered because they didn't realize we had bugzilla before we hid them)

3) it's not possible to remove the existing issues and start again

4) so if any of the links are wrong in the 1300 then we can't do anything with them other than correct by hand? (is that correct?)

5) GitHub say they don't recommend post-migration writing? Do they mean they don't recommend using an api to do that? Or doing it by hand?

6) We can edit the comments by hand (can you only edit your own comments or can we edit someone else's comments, I'm thinking its only our own based on testing I've done with other repos)

- isn't this a requirement in order to fix up the "code-blocks"?

7) We can't really go back to bugzilla now that we've imported the 1300; otherwise, if those 1300 get edited, they'd be out of date (I assume future updates would be impossible)

Assuming there is no obvious/immediate fix, do we have any choice but to move ahead with the existing import and fix the comments by hand retrospectively (assuming 6)?

If we could identify the items needing editing (a list) I'd be happy to volunteer to do some of them by hand. (Assuming we can edit the comments of others)

MyDeveloperDay.

Anton Korobeynikov via llvm-dev

Dec 7, 2021, 7:32:56 AM

to MyDeveloper Day, llvm-dev, clang developer list, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

Hello

> 1) we already migrated in 1300 issues

Way more – 13k.

> 3) its not possible to remove the existing issues and start again

Right

> 4) so if any of the links are wrong in the 1300 then we can't do anything with them other than correct by hand? (is that correct?)

Likely. GitHub engineers are still investigating what the options are here.

> 5) GitHub say they don't recommend post-migration writing? Do they mean they don't recommend using an api to do that? Or doing it by hand?

Both, actually. There are multiple concerns, including the notifications
that will be sent and the last-changed time being updated. Also, before
rewriting anything ourselves we will need to build a map from bz id to
github id, so that we know the target issue id.
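
As a rough illustration of that last point, here is a sketch that rewrites archive-style references once such a map exists. It assumes the dangling references have the form `llvm/llvm-bugzilla-archive#N` and that `bz_to_gh` is a plain dict built elsewhere; the function name and the map are hypothetical.

```python
import re

# Reference form assumed for dangling links left behind by the transfer.
ARCHIVE_REF = re.compile(r"llvm/llvm-bugzilla-archive#(\d+)")


def rewrite_refs(body, bz_to_gh):
    """Replace archive references with local '#N' references when the
    target issue is known; leave unknown references untouched."""
    def repl(match):
        gh_id = bz_to_gh.get(int(match.group(1)))
        return f"#{gh_id}" if gh_id is not None else match.group(0)

    return ARCHIVE_REF.sub(repl, body)
```

Leaving unmapped references as they are means the rewrite can be re-run safely as more issues are migrated.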

> 6) We can edit the comments by hand (can you only edit your own comments or can we edit someone else's comments, I'm thinking its only our own based on testing I've done with other repos)

Yes, only admins can edit everything.

> Assuming there is no obvious/immediate fix, Do we have any choice but to move ahead with the existing import and fix the comments by hand retrospectively (assuming 6)

This is what I asked GitHub engineers. They essentially asked for yet
another day to figure out the possible options. My rough estimate is
that at least 5k issues will have broken links.

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Arthur O'Dwyer via llvm-dev

Dec 7, 2021, 10:54:48 AM

to Anton Korobeynikov, llvm-dev, polly-dev, openmp-dev (openmp-dev@lists.llvm.org), clang developer list, Flang Development List

On Tue, Dec 7, 2021 at 7:33 AM Anton Korobeynikov via cfe-dev <cfe...@lists.llvm.org> wrote:

> > 6) We can edit the comments by hand (can you only edit your own comments or can we edit someone else's comments, I'm thinking its only our own based on testing I've done with other repos)

> > - isn't this a requirement in order to fix up the "code-blocks"?

> Yes, only admins can edit everything.

I noticed this yesterday with the existing test migration: compare

https://bugs.llvm.org/show_bug.cgi?id=52598

versus

https://github.com/llvm/llvm-bugzilla-archive/issues/52598

The current script seems to be forgetting that GitHub issues use Markdown, and so every existing Bugzilla comment needs to be wrapped in triple-backticks to preserve its semantics.

(You could do cleverer things, like "don't wrap comments that are only one line long," but doing anything less-clever will be a non-starter.)
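
A minimal sketch of the wrapping rule described above, assuming comments arrive as plain strings. The function name `fence_comment` is hypothetical; the fence is lengthened if the comment already contains a run of backticks, so the wrapper cannot be closed early.

```python
def fence_comment(text):
    """Wrap a multi-line Bugzilla comment in a Markdown code fence.
    Single-line comments are returned unwrapped."""
    stripped = text.strip("\n")
    if "\n" not in stripped:
        return stripped
    fence = "`" * 3
    # Grow the fence until it no longer appears inside the comment.
    while fence in stripped:
        fence += "`"
    return f"{fence}\n{stripped}\n{fence}"
```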

> > Assuming there is no obvious/immediate fix, Do we have any choice but to move ahead with the existing import and fix the comments by hand retrospectively (assuming 6)
> This is what I asked GitHub engineers. They essentially asked for yet
> another day to figure out the possible options. My rough estimate that
> at least 5k issues will have broken links.

Anton: I see about 35,000 issues in

https://github.com/llvm/llvm-bugzilla-archive/issues

but only 228 (i.e. essentially none, presumably just historical noise from newbie GitHub users) in

https://github.com/llvm/llvm-project/issues

Where are the 13,000 issues you are saying have already been migrated?

IIUC, it's very fortunate that there aren't yet 13,000 issues in https://github.com/llvm/llvm-project/issues . That means that it is still an option to do a "practice" migration into a test repo, e.g., https://github.com/llvm/llvm-bugzilla-archive2 (and then if it works as intended, you can either "blow away https://github.com/llvm/llvm-bugzilla-archive and rename https://github.com/llvm/llvm-bugzilla-archive2 to https://github.com/llvm/llvm-bugzilla-archive", or "blow away https://github.com/llvm/llvm-bugzilla-archive and repeat the migration just to prove it works reproducibly").

Only once the whole migration has been tested end-to-end on a test repo, would I recommend starting the migration into the production repo https://github.com/llvm/llvm-project.

Thanks for the links to https://github.com/llvm/bugzilla2gitlab/tree/llvm and https://docs.google.com/document/d/1G6DZ6AxzSaOlrtTxoxtqYKnD4Myv40QfKK4wj54y8ms/edit .

Those make it clear that someone's done a little bit of work to script this stuff; but the Google Doc also makes it clear that there is a long way to go to accomplish a "deploy plan": someone needs to take that English description and turn it into code (Python or even Bash or whatever) that can be

(A) reviewed for correctness, without running it

(B) run multiple times with guaranteed same behavior, with no risk that some human will accidentally forget a step in the middle

Step 1, getting the XML files from Bugzilla, turns out to be super easy because there's a public API for that:

https://github.com/Quuxplusone/BugzillaToGithub

Step 3, transforming XML to GitHub's JSON schema, requires knowing what GitHub's schema looks like. I've found

https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import

although it's not real clear what the schema is or if that even still works (I haven't tried yet). Also, there seems to be no way for one GitHub user to create a comment or issue putatively authored by some other GitHub user. (Which certainly makes sense.) So this would result in issues and comments filed by "LLVM Import Bot" or whatever... but I think that's fine, and might even avoid some issues that you'd have otherwise, with scenarios like "Joe User created his GitHub account in 2015, but was making comments on LLVM issues back in 2012."

Vice versa, btw, you've currently got some issues being incorrectly imported with the reporter listed in the issue summary itself as "LLVM Bugzilla Contributor"; e.g. this one from Chris Burel.

https://github.com/llvm/llvm-bugzilla-archive/issues/52567

It certainly makes sense that you won't have a GitHub username for some people, but you still shouldn't throw away the information about their human name just because we're migrating from one platform to another.

–Arthur

Anton Korobeynikov via llvm-dev

Dec 7, 2021, 11:04:04 AM

to Arthur O'Dwyer, llvm-dev, polly-dev, openmp-dev (openmp-dev@lists.llvm.org), clang developer list, Flang Development List

> I noticed this yesterday with the existing test migration: compare
> https://bugs.llvm.org/show_bug.cgi?id=52598
> versus
> https://github.com/llvm/llvm-bugzilla-archive/issues/52598
>
> The current script seems to be forgetting that GitHub issues use Markdown, and so every existing Bugzilla comment needs to be wrapped in triple-backticks to preserve its semantics.

No it is not. This was discussed at one of the roundtables and it was
decided that the conversion will be done verbatim. If necessary for
some issues it could be converted to proper Markdown by the reporters.

> Anton: I see about 35,000 issues in
> https://github.com/llvm/llvm-bugzilla-archive/issues
> but only 228 (i.e. essentially none, presumably just historical noise from newbie GitHub users) in
> https://github.com/llvm/llvm-project/issues
> Where are the 13,000 issues you are saying have already been migrated?

You cannot see them because issues are currently disabled in the
llvm-project repo to keep things intact while we are waiting for
suggestions from GitHub engineers. What you're seeing are pull requests
(note the header).

> IIUC, it's very fortunate that there aren't yet 13,000 issues in https://github.com/llvm/llvm-project/issues

They are, see above.

> Only once the whole migration has been tested end-to-end on a test repo, would I recommend starting the migration into the production repo https://github.com/llvm/llvm-project.

> Those make it clear that someone's done a little bit of work to script this stuff; but the Google Doc also makes it clear that there is a long way to go to accomplish a "deploy plan": someone needs to take that English description and turn it into code (Python or even Bash or whatever) that

Do you want me to bash script the work which is done by GitHub engineers?

> Step 1, getting the XML files from Bugzilla, turns out to be super easy because there's a public API for that:
> https://github.com/Quuxplusone/BugzillaToGithub
> Step 3, transforming XML to GitHub's JSON schema, requires knowing what GitHub's schema looks like. I've found
> https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import
> although it's not real clear what the schema is or if that even still works (I haven't tried yet). Also, there seems to be no way for one GitHub user to create a comment or issue putatively authored by some other GitHub user. (Which certainly makes sense.)

Well, the current approach we're using certainly handles this well.
Though, I would certainly like to see the migrated 10k issues at
https://github.com/Quuxplusone/ at the end of the week as you promised
and compare with what we already have in the llvm-bugzilla-archive.

> So this would result in issues and comments filed by "LLVM Import
> Bot" or whatever... but I think that's fine, and might even avoid some
> issues that you'd have otherwise, with scenarios like "Joe User
> created his GitHub account in 2015, but was making comments on LLVM
> issues back in 2012."

> Vice versa, btw, you've currently got some issues being incorrectly imported with the reporter listed in the issue summary itself as "LLVM Bugzilla Contributor"; e.g. this one from Chris Burel.

Chris Burel did not fill in the survey, therefore the data is anonymised.

Mehdi AMINI via llvm-dev

Dec 7, 2021, 12:20:44 PM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

Hi,

Some thoughts that just crossed my mind: what if we instead rewrite every link (not only backward references) to http://llvm.org/PR<XXXX>? Since these links will continue to work and redirect, that would make every link work as expected, wouldn't it?

--

Mehdi

Anton Korobeynikov via llvm-dev

Dec 7, 2021, 12:23:56 PM

to Mehdi AMINI, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

Hi Mehdi,

> Some thoughts that just crossed my mind: what if we instead rewrite every link (not only backward reference) to http://llvm.org/PR<XXXX> ; since these links will continue to work and redirect, that would make all possible link working as expected wouldn't it?

Yes, or just make the relative links absolute, e.g. instead of
llvm/llvm-bugzilla-archive#8125 use
https://github.com/llvm/llvm-bugzilla-archive/issues/8125 which is
essentially the same. As far as I know, GitHub engineers are
investigating the second option now (i.e. making all references
absolute during the issue transfer).
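
For illustration only, a sketch of what making references absolute could look like if done by a script. It assumes cross-repo references have the `owner/repo#N` shape shown above; the function name is hypothetical, and this is not a description of what GitHub does internally.

```python
import re

# Matches cross-repository shorthand such as llvm/llvm-bugzilla-archive#8125.
SHORT_REF = re.compile(r"(?<![\w/])([\w.-]+)/([\w.-]+)#(\d+)")


def absolutize_refs(body):
    """Turn 'owner/repo#N' shorthand into a full GitHub issue URL."""
    return SHORT_REF.sub(r"https://github.com/\1/\2/issues/\3", body)
```

An absolute URL does not depend on which repository the issue currently lives in, which is why it survives a transfer.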

--

Anton Korobeynikov via llvm-dev

Dec 8, 2021, 3:14:42 AM

to MyDeveloper Day, llvm-dev, clang developer list, Flang Development List, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

> > 4) so if any of the links are wrong in the 1300 then we can't do anything with them other than correct by hand? (is that correct?)
> Likely. GitHub engineers are still investigating what are the options here.

I've been told that the bug was fixed on the GitHub side. After
checking, we might proceed with the migration.

Christian Kühnel via llvm-dev

Dec 8, 2021, 3:31:20 AM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List

Hi folks,

Anton, thank you so much for driving the migration and all the effort you have invested!

We discussed the current situation and the options in the Infrastructure Working Group meeting yesterday [1]. Our proposal is to give GitHub another 48 hours (i.e. Thursday EOB) to resolve the issue regarding missing links in circular dependencies.

If it can't be resolved by that time, we propose to move forward with the migration anyway. The data is there, we're only missing the links. From our perspective this is something we can live with.

Best,

Christian on behalf of the Infrastructure Working Group

[1] https://github.com/llvm/llvm-iwg/issues/56#issuecomment-988090092

Arthur O'Dwyer via llvm-dev

Dec 8, 2021, 5:48:05 PM

to Anton Korobeynikov, llvm-dev, clang developer list, polly-dev

On Tue, Dec 7, 2021 at 10:54 AM Arthur O'Dwyer <arthur....@gmail.com> wrote:

> The current script seems to be forgetting that GitHub issues use Markdown, and so every existing Bugzilla comment needs to be wrapped in triple-backticks to preserve its semantics. (You could do cleverer things, like "don't wrap comments that are only one line long," but doing anything less-clever will be a non-starter.)

> [...] btw, you've currently got some issues being incorrectly imported with the reporter listed in the issue summary itself as "LLVM Bugzilla Contributor"; e.g. this one from Chris Burel. https://github.com/llvm/llvm-bugzilla-archive/issues/52567

> It certainly makes sense that you won't have a GitHub username for some people, but you still shouldn't throw away the information about their human name just because we're migrating from one platform to another.

Two more things I've noticed while spot-checking:

https://github.com/llvm/llvm-bugzilla-archive/issues/36617

- Bugzilla lets you attach files; GitHub doesn't. Attachments are not preserved by the migration.

- Bugzilla comments are numbered, so people sometimes say e.g. "see comment 16"; GitHub comments are not numbered. The migration script might consider automagically turning these references into hyperlinks similar to how Bugzilla does it.

–Arthur

Anton Korobeynikov via llvm-dev

Dec 8, 2021, 5:51:20 PM

to Arthur O'Dwyer, llvm-dev, clang developer list, polly-dev

> Two more things I've noticed while spot-checking:
> https://github.com/llvm/llvm-bugzilla-archive/issues/36617
> - Bugzilla lets you attach file attachments; GitHub doesn't. Attachments are not preserved by the migration.

You are wrong again. Attachments are preserved:
https://github.com/llvm/llvm-bugzilla-archive/issues/36617#issuecomment-980994333
Will you please next time check your claims more carefully?

> - Bugzilla comments are numbered, so people sometimes say e.g. "see comment 16"; GitHub comments are not numbered. The migration script might consider automagically turning these references into hyperlinks similar to how Bugzilla does it.

It cannot. The URL is not known before the issue is on GitHub as the
URL is assigned by GitHub.

Geoffrey Martin-Noble via llvm-dev

Dec 8, 2021, 6:54:51 PM

to Anton Korobeynikov, Arthur O'Dwyer, llvm-dev, clang developer list, polly-dev

A couple of potentially relevant pieces of information. At least under the default notification settings, neither adding a label, nor editing a comment sends a notification. So it should be possible to fix things that require doing either of these things after the fact. (e.g. Arthur's suggestion could be implemented *after* the issue is in GitHub, when the URL is known)
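
A sketch of how such a post-migration pass might link "comment N" references, assuming a per-issue mapping from Bugzilla comment number to the GitHub comment URL has been collected after the import. The mapping `comment_urls` and the function name are hypothetical.

```python
import re

# Matches phrases such as "comment 16" or "Comment #3".
COMMENT_REF = re.compile(r"\bcomment #?(\d+)\b", re.IGNORECASE)


def link_comment_refs(body, comment_urls):
    """Turn 'comment N' into a Markdown link when the URL for comment N
    is known; otherwise leave the text unchanged."""
    def repl(match):
        url = comment_urls.get(int(match.group(1)))
        return f"[{match.group(0)}]({url})" if url else match.group(0)

    return COMMENT_REF.sub(repl, body)
```

Because the URLs only exist once the comments are on GitHub, this kind of pass can only run as an edit after the import.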

Anton Korobeynikov via llvm-dev

Dec 8, 2021, 6:58:07 PM

to Geoffrey Martin-Noble, Arthur O'Dwyer, llvm-dev, clang developer list, polly-dev

Geoffrey,

Absolutely, if Arthur implements a script that makes all the
necessary changes, does not trigger spurious notifications, and does
not clobber the last-modified timestamp, then I do not see any reason
why it cannot be run post-migration.

Arthur O'Dwyer via llvm-dev

Dec 10, 2021, 7:01:13 AM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list

On Sat, Dec 4, 2021 at 9:16 AM Arthur O'Dwyer <arthur....@gmail.com> wrote:

> On Sat, Dec 4, 2021 at 5:46 AM Anton Korobeynikov via cfe-dev <cfe...@lists.llvm.org> wrote:
> > [...]
> > Surely, if the community will re-decide that these are unimportant
> > things we can push the existing code into a blank archive fairly
> > quickly.
>
> Please, test the above claim this week, on a blank repo. Let's actually find out whether it works, instead of relying on "Surely...".
>
> At this point I'm offering my own technical assistance, just to get the thing done and stop getting these emails every day. Send me your Bugzilla export script; I'll test it out this week on a blank repo, with the goal of mirroring a 100-bug subset of the LLVM Bugzilla publicly visible in https://github.com/Quuxplusone/LLVMBugzillaTest/ by EOW.

The promised EOW update: I have written Python scripts for the Export, Transform, and (dumbed-down, see below) Load stages of a bugzilla-to-github migration. You can find them at

https://github.com/Quuxplusone/BugzillaToGithub#bugzilla-to-github

and the resulting GitHub issues list (which is just partial, so far) lives at

https://github.com/Quuxplusone/LLVMBugzillaTest/issues

This is merely the result of five evenings of work, so e.g. the formatting of message bodies still isn't perfect, and as of this morning I'm aware of at least one bug (that GitHub's import API doesn't like a comment to have empty string as its `body`). And of course the biggest issue is that I was noodling around without special access to GitHub staff, who are the only people able to forge issue/comment authorship; so my script just puts everything under the username of the person-or-bot that runs it. I guarantee GitHub SRE can help with that.

Arthur

Anton Korobeynikov via llvm-dev

Dec 10, 2021, 7:34:01 AM

to Arthur O'Dwyer, llvm-dev, polly-dev, clang developer list

Thanks for the try!

From the quick scan:

1. There are no labels
2. Attachments are not real – they are just links to bugzilla and will
be obsolete if bugzilla is e.g. down
3. Each attachment results in 2 comments, one of which is redundant
4. CC list is strange, e.g.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12187 CC's to
"mail.sandbox.de"
5. All text is in verbatim boxes (e.g.
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12092) making
it almost impossible to read due to horizontal scroll
6. There are no "depends on" / "blocks on" references (see
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10900)
7. There are no cross-references in case of duplicates (see
https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729)
...

It's pretty straightforward to get to the present state, and there are
tools for this; we were at this point in 2019 (see e.g.
https://github.com/asl/llvm-bugzilla/issues as it was outlined in the LLVM
DevMtg 2019 roundtable discussion). The non-trivial part is working
around various GitHub issues, which also differ depending on the
API used.

On Fri, Dec 10, 2021 at 3:00 PM Arthur O'Dwyer

--

Arthur O'Dwyer via llvm-dev

Dec 10, 2021, 11:42:35 AM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list

On Fri, Dec 10, 2021 at 7:33 AM Anton Korobeynikov <an...@korobeynikov.info> wrote:

> Thanks for the try!
>
> From the quick scan:
>
> 1. There are no labels

There are labels, but only according to the "keywords" field from Bugzilla.

https://github.com/Quuxplusone/LLVMBugzillaTest/issues?q=is%3Aopen+is%3Aissue+label%3Aaccepts-invalid

I agree it would make sense to apply more labels in Step 3 (e.g. according to the "Product" field).

If you document the mapping somewhere, it would be trivial to add to my script and I could have 10,000 issues regenerated in about 3 hours.

Also needed: the mapping from Bugzilla usernames to GitHub usernames.

> 2. Attachments are not real – they are just links to bugzilla and will
> be obsolete if bugzilla is e.g. down

Right. This is part of the "dumbed-down Load step", i.e. "take the actual data and munge it into the closest possible thing that can be loaded using the public API": GitHub's beta Issues Import API doesn't support adding files to issues. Also, e.g.:

- forging authorship of comments is impossible using the public API

- for cross-referencing to other issues, I'm currently using links back into the old Bugzilla's show_bug.cgi; but really these links should go to something like https://reviews.llvm.org/PR1234, which would be under our control and could be HTTP-redirected to their corresponding GitHub issues

> 3. Each attachment results in 2 comments, one of each is redundant

Ack. I wrote code to fix this for the very simplest "Created attachment 1234" auto-comments, but had not noticed that sometimes the auto-comment is more complicated.

E.g. https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729#issuecomment-990590574

This wouldn't be hard to fix.

> 4. CC list is strange, e.g.
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12187 CC's to
> "mail.sandbox.de"

That's partly an artifact of my lack of mapping from Bugzilla usernames to GitHub usernames (the relevant codepath is just a stub), but also something super weird...!

The email addresses from Bugzilla show up in the XML when viewed in Chrome, but not when fetched in Python or curl.

https://stackoverflow.com/questions/70307092/fetching-xml-from-bugzilla-gives-different-results-with-curl-versus-browser

> 5. All text is in verbatim boxes (e.g.
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/12092) making
> it almost impossible to read due to horizontal scroll

The monospace font is intentional on my part, and important even for https://bugs.llvm.org/show_bug.cgi?id=12092 because a big part of the initial comment is indented C++ code. However, I should implement linebreaking: looks like Bugzilla's website layout breaks around 84 characters, and 80 would be perfectly sensible.

Will fix.
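
A minimal sketch of that linebreaking, assuming an 80-column limit and that only over-long lines should be touched so that indented code keeps its layout. It uses Python's standard `textwrap` module; the function name `wrap_comment` is hypothetical.

```python
import textwrap


def wrap_comment(text, width=80):
    """Break lines longer than `width`, leaving shorter lines untouched.
    Continuation lines reuse the original line's leading indentation."""
    out = []
    for line in text.split("\n"):
        if len(line) <= width:
            out.append(line)
            continue
        indent = line[: len(line) - len(line.lstrip())]
        wrapped = textwrap.wrap(
            line,
            width=width,
            subsequent_indent=indent,
            break_long_words=False,   # never split a long token or URL
            break_on_hyphens=False,
        )
        out.extend(wrapped or [""])
    return "\n".join(out)
```

With `break_long_words=False`, a single token longer than the limit (a URL, say) stays on its own over-long line rather than being cut.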

> 6. There are no "depends on" / "blocks on" references (see
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10900)

Ack.

(This is an artifact of my not knowing that the <dependson> element exists. I should have thought to grep and get a list of all the tags that exist in the XML (that is, in the 51567 "xml/*.xml" files produced during Step 1 in the README), to make sure I understood each of them.)

Will fix, at least for the <dependson> tag.
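
The "list every tag that exists" check mentioned above can be sketched with the standard library. This assumes each exported file parses as one XML document; the sample document in the usage lines is a made-up illustration, not real Bugzilla output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_documents):
    """Count how often each element name occurs across XML strings."""
    counts = Counter()
    for doc in xml_documents:
        for elem in ET.fromstring(doc).iter():
            counts[elem.tag] += 1
    return counts


# Hypothetical sample; real export files would be read from disk instead.
sample = "<bugzilla><bug><bug_id>1</bug_id><dependson>2</dependson><dependson>3</dependson></bug></bugzilla>"
tag_counts = count_tags([sample])
```

Sorting the resulting counter by frequency gives a quick inventory of fields the Transform step may be ignoring.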

> 7. There are no cross-references in case of duplicates (see
> https://github.com/Quuxplusone/LLVMBugzillaTest/issues/10729)

Ack.

I thought about mangling the duplicate-bug-number into the "Status" line, like Bugzilla does, but decided not to worry about it in the interest of being-done-by-my-self-imposed-EOW-deadline. :)

There's also a harder issue on bug 10729's final comment, where it says

    Yes, apparently I did. Sorry. I'll attach the logs to that issue instead :)
    *** This bug has been marked as a duplicate of bug 9072 ***

where we want that to be both monospaced and hyperlinked — Markdown can't do hyperlinks inside triple-backticks.

The obvious solution is for the script to special-case Bugzilla's auto-comment and pull it outside of the triple-backticked section.

I should grep for all the different Bugzilla auto-comments too. It looks like there are only three possible auto-comments:

    $ grep -hor '[*][*][*] .* [*][*][*]' xml/ > out
    $ sed 's/[0-9][0-9]*/9/g' out | sort | uniq -c | sort -rn | eyeballing-by-arthur
    2563 *** Bug 9 has been marked as a duplicate of this bug. ***
    2504 *** This bug has been marked as a duplicate of bug 9 ***
      76 *** This bug has been marked as a duplicate of 9 ***
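
A sketch of the special-casing idea, assuming the three auto-comment shapes listed above are the only ones. It separates auto-comment lines from the rest of the comment so that the rest can be monospaced while the auto-comment is rendered (and hyperlinked) outside the fence. The names are hypothetical.

```python
import re

# The three auto-comment shapes found by the grep above.
AUTO_COMMENT = re.compile(
    r"^\*\*\* (?:Bug \d+ has been marked as a duplicate of this bug\."
    r"|This bug has been marked as a duplicate of (?:bug )?\d+) \*\*\*$"
)


def split_auto_comments(text):
    """Return (body, autos): the comment without Bugzilla auto-comment
    lines, and the list of auto-comment lines that were pulled out."""
    body, autos = [], []
    for line in text.split("\n"):
        if AUTO_COMMENT.match(line.strip()):
            autos.append(line.strip())
        else:
            body.append(line)
    return "\n".join(body), autos
```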

> ...
>
> It's pretty straightforward to come to the present state and there are
> tools for this, we've been at this point in 2019 (see e.g.
> https://github.com/asl/llvm-bugzilla/issues as it was outlined in LLVM
> DevMtg 2019 roundtable discussion). The non-trivial part is to
> workaround various GitHub issues which are also different depending on
> API used.

Nice! Yeah, steps 1, 2, 3 (Export and Transform) are possible for literally anyone to do — and also relatively simple, in that I wrote those scripts in a single week of evenings. :) Step 4, the Load step, is equally simple but requires special magic powers that only a GitHub SRE would have — e.g., forging comment authorship. If I were doing this migration for real, I'd ask what API they plan to use, and ask them to test it out on a blank repo in exactly the same way that you and I have now both done with

https://github.com/asl/llvm-bugzilla/issues

and

https://github.com/Quuxplusone/LLVMBugzillaTest/issues

That is, write the script that's going to be used, and then test it out, repeatedly, until it works perfectly... and then test once more, just for safety's sake, before doing it live.

The mantras here are

- "With enough eyeballs, all bugs are shallow" (we've both identified deficiencies in each other's scripts, and can now fix them!)

- "Measure twice, cut once" (rehearse the entire deploy plan in blank repos until it's perfect, then do only the perfect version live)

(Also, ideally, someone involved with LLVM would just get hired at GitHub, to cut down on round-trip time. But I'm not volunteering. ;))

–Arthur

Arthur O'Dwyer via llvm-dev

Dec 11, 2021, 6:20:46 PM

to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list

Just to close the loop on this subthread: I did eventually fix the issues identified below, updated my (by-now-obviously-too-late-to-help-anyone) deploy plan, and blogged a useless little post-mortem. :) https://quuxplusone.github.io/blog/2021/12/11/llvm-bugzilla-fan-edit/

Anyway, long live the official llvm/llvm-project GitHub! I'm happy to be once more able to file Clang bugs. :)

–Arthur
