Status of Bugzilla Migration

Anton Korobeynikov

Nov 30, 2021, 6:49:19 AM
to llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List
Dear All,

I would like to give you a status update on our bugzilla migration.
First of all, unfortunately it was affected by the major GitHub outage
that happened over the weekend, and our schedule slipped a bit: we had
to wait until the outage ended and restart the import, and the GitHub
support engineers who were watching our migration were reassigned to
fix the issues with the service.

As of now we have:
- All bugzilla content has been imported into the bugzilla archive GitHub
repo: https://github.com/llvm/llvm-bugzilla-archive/issues
- All issue numbers are preserved with respect to this repo. This means that
Bugzilla PR 1234 can be found at
https://github.com/llvm/llvm-bugzilla-archive/issues/1234. This repo
will be the basis of our stable numbering of issue IDs.
- The llvm.org/PR redirect is installed. It will redirect to the
bugzilla archive for all PRs with IDs <= 52601 and to the main
llvm-project repo for everything else.
- The llvm.org/bz redirect is installed. Links like llvm.org/bz1234
will forward to the read-only bugzilla instance.
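A minimal sketch of the llvm.org/PR forwarding rule described above, assuming it is a plain threshold on the issue number. The 52601 cutoff comes from this announcement; the function name and the exact llvm-project URL shape are illustrative assumptions, not the actual redirect configuration:

```python
# Hypothetical model of the llvm.org/PR redirect rule. Only the cutoff
# (52601) and the archive URL come from the announcement; the rest is
# assumed for illustration.
ARCHIVE = "https://github.com/llvm/llvm-bugzilla-archive/issues"
MONOREPO = "https://github.com/llvm/llvm-project/issues"
LAST_BUGZILLA_ID = 52601


def pr_redirect_target(pr_id: int) -> str:
    """Return where llvm.org/PR<pr_id> would be forwarded."""
    if pr_id <= 0:
        raise ValueError("issue ids are positive integers")
    if pr_id <= LAST_BUGZILLA_ID:
        # Old bugzilla ids keep their number in the archive repo.
        return f"{ARCHIVE}/{pr_id}"
    # Anything newer is assumed to be a native llvm-project issue.
    return f"{MONOREPO}/{pr_id}"


print(pr_redirect_target(1234))   # .../llvm-bugzilla-archive/issues/1234
print(pr_redirect_target(52602))  # .../llvm-project/issues/52602
```

Note the inclusive `<=`: under this model, issue 52601 is still the last one served from the archive.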

The last remaining step is the transfer of the issues from the bugzilla
archive to the main llvm-project repo. This way we will have all the
issues in a single repo, albeit with different numbers. This is a
compromise we have to accept: we can only guarantee preservation of
the issue numbers in an empty repo, and we cannot lose the releases,
etc. in the existing repo. However, URLs relative to the archive repo
will allow us to keep the original bugzilla issue numbers, and GitHub
will rewrite all references inside the comments for us.

This step is in some sense final: once we start it, we cannot undo it.
Also, while importing issues into the empty repo does not trigger any
notifications, the transfer will trigger all kinds of notifications.
We certainly do not want everyone who contributed to any issue in the
past to receive multiple notifications. For some active community
members the volume of notifications would be excessive: thousands of
emails.

GitHub has a way to disable notifications; however, they recently found
some issues with it. We are waiting for them to resolve those issues and
expect the final migration to happen within the next 48 hours.

--
With best regards, Anton Korobeynikov

Mehdi AMINI

Nov 30, 2021, 1:11:54 PM
to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List
Thanks for the update!

Something isn't clear to me here and I'm concerned about this setup: how are we gonna go from the "archived number" to the "real" monorepo issues?
I don't even understand the need for the archive repository right now: why can't we just redirect llvm.org/PR to the monorepo issue?

I hope we don't lose this mapping, that seems quite critical to me.

Thanks,

-- 
Mehdi

_______________________________________________
flang-dev mailing list
flan...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/flang-dev

Anton Korobeynikov

Nov 30, 2021, 1:23:02 PM
to Mehdi AMINI, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List
Hi Mehdi,

> Something isn't clear to me here and I'm concerned about this setup: how are we gonna go from the "archived number" to the "real" monorepo issues?
> I don't even understand the need for the archive repository right now: why can't we just redirect llvm.org/PR to the monorepo issue?
>
> I hope we don't lose this mapping, that seems quite critical to me.
Sorry for not being 100% clear as this is a bit tricky :)

After the transfer from the archive repo to llvm-project, the issues
will be renumbered and there is no way to control this (sic!).
However, GitHub will maintain the mapping for us and enable the
redirect.

E.g. assume that the original bugzilla issue 12345 is transferred
to llvm/llvm-project/issues/45678. It would seem that the original
number is lost, but it is not: llvm/llvm-bugzilla-archive/issues/12345
will redirect to llvm/llvm-project/issues/45678. So llvm.org/PRNNNN
will redirect to the final monorepo issue regardless of whether it was
from bugzilla or the "new" GitHub-only variant (it redirects to the
archive for "low" NNNN numbers and to the monorepo for everything else).
Also, during the transfer all GitHub-internal issue links will be
updated, so all references to #12345 will be rewritten to the new
monorepo issue ID.
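The two-hop lookup explained above can be modelled as follows. This is a sketch, assuming the `transferred` dict stands in for the redirect table GitHub keeps internally (we cannot read that table directly); the function name is hypothetical, and the numbers are the example from this message:

```python
from __future__ import annotations

# Hypothetical model of llvm.org/PR -> archive repo -> llvm-project.
ARCHIVE = "https://github.com/llvm/llvm-bugzilla-archive/issues"
MONOREPO = "https://github.com/llvm/llvm-project/issues"
LAST_BUGZILLA_ID = 52601


def resolve(pr_id: int, transferred: dict[int, int]) -> str:
    """Follow llvm.org/PR<pr_id> to the URL a browser would end up at."""
    if pr_id > LAST_BUGZILLA_ID:
        # Native GitHub issues are addressed directly in the monorepo.
        return f"{MONOREPO}/{pr_id}"
    new_id = transferred.get(pr_id)
    if new_id is None:
        # Not (yet) transferred: the archive URL is the final stop.
        return f"{ARCHIVE}/{pr_id}"
    # Transferred: GitHub forwards the archive URL to the new number.
    return f"{MONOREPO}/{new_id}"


print(resolve(12345, {12345: 45678}))  # .../llvm-project/issues/45678
```

The point of the design is that the archive number stays the stable key: only the second hop changes when issues are renumbered.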

Hope this is a bit more clear now.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Mehdi AMINI

Nov 30, 2021, 1:26:13 PM
to Anton Korobeynikov, llvm-dev, polly-dev, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), Flang Development List
Thanks! Very clear and addressing my concern perfectly :)

Anton Korobeynikov

Dec 2, 2021, 2:36:44 AM
to llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
Dear All,

Some of you who are checking the migration notes
(https://bit.ly/3HVjr7a) might already have noticed that we're stuck
again. Let me provide more information about what is going on now and
what the plans are.

As a reminder, we previously imported all issues into the archive repo,
and essentially only the very last step remained: migration to the live
llvm-project repo. This step is crucial and one-way: once started, we
cannot undo it. We also have to rely on GitHub here, as we cannot do it
via rate-limited API calls.

During the final checks two issues were revealed:
- Notifications are still sent in some cases
- Migration updates the last modification date of closed issues (it
looks like it was implemented as "re-open issue, transfer, and close
again"). As a result, all closed issues effectively got sorted
chronologically ahead of the actual open ones.

These issues were fixed on GitHub's side and we proceeded with
re-checking everything. It turned out that another issue had appeared:
the labels were silently lost and the migrated issues were completely
unlabeled, despite being annotated with the 140+ labels we had
originally. For now this is a show-stopper. The issue was reported to
and acknowledged by GitHub; however, no ETA was provided.

Our current options are:
1. Abandon the migration
2. Wait until the issue is resolved on GitHub's side
3. Try to find alternative solutions to work around the GitHub issue

Option 2 is essentially not an option. I propose abandoning the
migration and unlocking bugzilla if no solution is found by the end of
this week.

The only alternative I see is to apply the labels post-migration.
There are important downsides:
- This has to be done via the GitHub API and we're rate limited to ~5000
requests per hour, so labelling will take ~20 hours. I was told that
there is no way for us to have the API rate limit increased.
- This might trigger notifications. A quick check via the web UI
suggests it does not, but I cannot be 100% sure of anything here.
- (the most important) This will mess up the "last modified" timestamp,
as setting a label is an event that is recorded in the issue. There is
no way to set an "old" timestamp; it is assigned by GitHub
automatically.
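A minimal sketch of what post-migration labelling could look like, assuming the GitHub REST endpoint `POST /repos/{owner}/{repo}/issues/{number}/labels` with a `{"labels": [...]}` body, paced to stay under ~5000 requests per hour. The function names, the token, and the pacing policy are illustrative assumptions; this does not address the notification or timestamp concerns, since each call still records a label event:

```python
import json
import time
import urllib.request

API = "https://api.github.com"
REQUESTS_PER_HOUR = 5000
MIN_INTERVAL = 3600 / REQUESTS_PER_HOUR  # 0.72 s between requests


def label_request(owner, repo, number, labels, token):
    """Build (but do not send) a request adding `labels` to one issue."""
    body = json.dumps({"labels": labels}).encode("utf-8")
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/issues/{number}/labels",
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # hypothetical token
            "Content-Type": "application/json",
        },
    )


def send_paced(reqs, send=urllib.request.urlopen, sleep=time.sleep):
    """Send requests one by one, no faster than the hourly budget allows."""
    for i, req in enumerate(reqs):
        if i:
            sleep(MIN_INTERVAL)  # spread calls evenly across the hour
        send(req)
```

Injecting `send` and `sleep` keeps the pacing logic testable without touching the network.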

For now I'm testing the script for option 3 and waiting for any news from GitHub.

I will keep you updated.

--
With best regards, Anton Korobeynikov
On behalf of LLVM Foundation

MyDeveloper Day

Dec 2, 2021, 3:18:30 AM
to Anton Korobeynikov, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
What bad stuff happens if you just open up https://github.com/llvm/llvm-bugzilla-archive/issues (even if you then make another historical archive later) to use as the bug tracker until you and GitHub have ironed out all the issues with migrating from one project to another? That seems better than going all the way back to bugzilla, which would then impose another multi-day migration at a later point.

In my mind I've already divorced from bugzilla, I'm ready to move on with my life with github!

MyDeveloperDay

_______________________________________________
cfe-dev mailing list
cfe...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Anton Korobeynikov

Dec 2, 2021, 4:07:07 AM
to MyDeveloper Day, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
Well, this is another alternative, yes, but it's up to the community to decide.

Aaron Ballman

Dec 2, 2021, 9:33:39 AM
to Anton Korobeynikov, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
Thank you for all of the hard work you've put into this so far, and
thank you for the detailed update on the unfortunate place we're at.

When you say "abandon the migration", do you mean temporarily or
permanently? I'd be strongly in favor of temporarily abandoning the
migration so that we can continue to do useful work against bugs while
we sort this out. If you're thinking of abandoning permanently, I
could be in support of that as well, but I'd want to know what our
aspirational goals are for the bug database long-term before giving my
support.

~Aaron

Philip Reames

Dec 2, 2021, 11:05:41 AM
to MyDeveloper Day, Anton Korobeynikov, llvm-dev, Flang Development List, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), polly-dev

This thought had occurred to me as well. Using a separate repo for bug tracking seems reasonable as an intermediate step. Unless there's a complexity here I'm missing, I'd probably vote for that over going all the way back to bugzilla.

Philip

p.s. Anton, thank you for the update and all the work that has gone into this. 

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

paul.r...@sony.com

Dec 2, 2021, 11:21:13 AM
to list...@philipreames.com, mydevel...@gmail.com, an...@korobeynikov.info, llvm...@lists.llvm.org, flan...@lists.llvm.org, openm...@lists.llvm.org, poll...@googlegroups.com

If there are new issues created directly in llvm-bugzilla-archive, and they have cross-references to other (new or old) issues, we’d want to make sure they get fixed up along with the originally-from-bugzilla references. (Recall that all issues will be renumbered when they move to llvm-project.)
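To make the concern concrete, here is a sketch of the kind of fix-up involved: rewriting `#NNNN` references through an old-to-new number mapping. GitHub is expected to do this itself during the transfer; the regex and function below are an illustrative model for spot-checking results, not GitHub's actual rule:

```python
from __future__ import annotations

import re

# Assumed pattern: '#' plus digits, not preceded by a word char or '/'.
REF = re.compile(r"(?<![\w/])#(\d+)\b")


def rewrite_refs(text: str, mapping: dict[int, int]) -> str:
    """Replace #old with #new for every issue number in `mapping`."""
    def sub(m: re.Match) -> str:
        old = int(m.group(1))
        # References to unmapped issues are left untouched.
        return f"#{mapping.get(old, old)}"
    return REF.sub(sub, text)


print(rewrite_refs("Dup of #12345, see also #7.", {12345: 45678}))
# Dup of #45678, see also #7.
```

The same mapping would have to cover issues filed directly in the archive repo, otherwise their references would silently keep pointing at the old numbers.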

 

It would be mildly annoying to have the bug repo move twice instead of once, but if the reference re-writing works correctly then I don’t have any real objection.

--paulr

Reid Kleckner

Dec 2, 2021, 12:24:05 PM
to Anton Korobeynikov, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
Thanks for all the work and info, Anton. Based on your writeup, I think option 3 is best.

Losing the last update timestamps on all the issues is unfortunate, but I think it's OK. We already know the migration doesn't have perfect fidelity, and that's OK.

I also think we can wait a day to get labels on the migrated issues. I think my bigger concern with the rate-limited APIs is that it's hard to test scripts that take 20 hours to run, so there is some risk that the label migration script fails or mislabels issues. Still, I would just hope for the best here. It's not critical to get labels on old issues on day 1. Maybe one way to deal with this is to apply labels to recently modified issues first.

Notifications are concerning, but your test via the web UI gives me enough confidence to want to push forward.

Finally, you are sort of the one in the hot seat here doing the work, so I favor any solution that takes the pressure off you. :) That means either going back to bugzilla temporarily, or moving forward with the migration and fixing the labels as best we can over time.

Anton Korobeynikov

Dec 2, 2021, 12:26:56 PM
to Aaron Ballman, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
> When you say "abandon the migration", do you mean temporarily or
> permanently? I'd be strongly in favor of temporarily abandoning the
> migration so that we can continue to do useful work against bugs while
> we sort this out. If you're thinking of abandoning permanently, I
> could be in support of that as well, but I'd want to know what our
> aspirational goals are for the bug database long-term before giving my
> support.
Well, here is the key point: all "temporary" solutions (e.g. a temporary
return to bugzilla or using the current archive) rely on the assumption
that the issues we're facing will be fixed one day. However, here we
depend on GitHub, which might have its own priorities / plans, and we
have no way to influence their decisions besides sharing concerns and
asking questions. So, I'd personally not go this way until we know how
long this interim solution will be in use. Otherwise it could stay in
that state forever, e.g. if GitHub decides to keep the status quo.

As for "permanent abandoning": I think in such a situation we'd need
to take a step back and seriously reconsider all the infrastructure
we have, maybe even checking what the alternatives are platform-wise.

--
With best regards, Anton Korobeynikov

Anton Korobeynikov

Dec 2, 2021, 12:28:10 PM
to paul.r...@sony.com, list...@philipreames.com, mydevel...@gmail.com, llvm...@lists.llvm.org, flan...@lists.llvm.org, openm...@lists.llvm.org, poll...@googlegroups.com
Paul,

Yes, during the migration all references should be rewritten. At least
this is how it is documented; I'm not 100% sure that it actually works
that way ;)

Anton Korobeynikov

Dec 2, 2021, 12:34:37 PM
to Reid Kleckner, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
Reid,

> I also think we can wait a day to get labels on the migrated issues. I think my bigger concern with the rate-limited APIs is that it's hard to test scripts that take 20 hours to run, so there is some risk that the label migration script fails or mislabels issues. Still, I would just hope for the best here. It's not critical to get labels on old issues on day 1. Maybe one way to deal with this is to apply labels to recently modified issues first.
I think we need to apply labels in chronological order, i.e. first
apply the labels to the issues whose last modification lies furthest in
the past. That way we will at least have the sorting in the proper
order. I definitely have the creation time of each issue, but I'm not
sure about the last modification timestamp (there is a timestamp for
when the issue was closed, so at least for some issues we do have such
a timestamp at hand).
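The ordering argument can be sketched as a sort key: since every label change bumps an issue's "updated" time, labelling the least recently modified issues first keeps the relative order of last-modified times. The field names below are hypothetical, and the fallback from last-modified to closed to created time follows the availability caveat above:

```python
from __future__ import annotations

from datetime import datetime


def sort_key(issue: dict) -> datetime:
    """Best available 'last touched' time for an issue (assumed fields)."""
    for field in ("last_modified", "closed_at", "created_at"):
        if issue.get(field):
            return issue[field]
    raise ValueError(f"issue {issue.get('id')} has no timestamp")


def labelling_order(issues: list[dict]) -> list[int]:
    """Issue ids, oldest first, so the newest get the latest label event."""
    return [i["id"] for i in sorted(issues, key=sort_key)]


issues = [
    {"id": 3, "created_at": datetime(2021, 1, 5)},
    {"id": 1, "created_at": datetime(2010, 3, 1),
     "closed_at": datetime(2015, 6, 1)},
    {"id": 2, "created_at": datetime(2012, 7, 9),
     "last_modified": datetime(2020, 2, 2)},
]
print(labelling_order(issues))  # [1, 2, 3]
```

Applying labels in the opposite order (most recent first) would invert the "recently updated" view, which is why the oldest-first order matters here.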

Another thing that I need to check is how everything works after the
migration. I do have the labels for each issue in the archive. However,
after the migration the issues won't be there anymore. So, an
additional question is whether API requests will be redirected or
whether I will need to build the mapping first. Given the rate limit of
5k requests per hour, a complete sweep over all issues will take 11
hours.
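The 11-hour figure is consistent with simple arithmetic, assuming one request per issue and roughly 52,601 issues (the bugzilla ID cutoff used for the redirect; the real count may be somewhat lower). By the same arithmetic, the ~20-hour labelling estimate corresponds to a budget of about 100,000 requests:

```latex
t_{\text{sweep}} \approx \frac{52\,601\ \text{requests}}{5\,000\ \text{requests/h}} \approx 10.5\ \text{h},
\qquad
20\ \text{h} \times 5\,000\ \text{requests/h} = 100\,000\ \text{requests}.
```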


--
With best regards, Anton Korobeynikov

Jeff Miller

Dec 2, 2021, 12:47:10 PM
to an...@korobeynikov.info, llvm...@lists.llvm.org, cfe...@lists.llvm.org, flan...@lists.llvm.org, poll...@googlegroups.com, openm...@lists.llvm.org

> - This has to be done via GitHub API and we're rate limited to ~5000
> requests per hour, so this means that the labelling will take ~20
> hours. I was told that there is no way for us to have the API rate
> limit increased.

This 5000 requests per hour limit: is that per repo or per access token? Could we potentially pool access tokens from multiple GitHub accounts to sidestep the issue? Say, 20 tokens to do the migration in 1 hour?

--Jeff Miller

Arthur O'Dwyer

Dec 2, 2021, 12:54:49 PM
to an...@korobeynikov.info, llvm-dev, Clang Dev, flan...@lists.llvm.org, poll...@googlegroups.com, openmp-dev (openmp-dev@lists.llvm.org)
FWIW, "20 hours" or "11 hours" or "three days" is like nothing, compared to what the migration has already been doing. If it only requires taking Bugzilla down for 24 hours to do it, IMO you should just do it already — whatever "it" is.

Also, re timestamps: The choices seem to be
- Wait for GitHub to offer us some way of importing timestamps, then do the migration; or
- Do the migration, then wait for GitHub to offer us some way of retroactively changing some of the timestamps.
Neither is perfect, but the latter is clearly better for LLVM's purposes.

–Arthur

Anton Korobeynikov

Dec 2, 2021, 2:19:22 PM
to Arthur O'Dwyer, llvm-dev, Clang Dev, flan...@lists.llvm.org, poll...@googlegroups.com, openmp-dev (openmp-dev@lists.llvm.org)
Hello Arthur,

> FWIW, "20 hours" or "11 hours" or "three days" is like nothing, compared to what the migration has already been doing. If it only requires taking Bugzilla down for 24 hours to do it, IMO you should just do it already — whatever "it" is.
Well, that's for a single sweep. So, if we needed to do this, say, 5
times, then things start to get very interesting.

> Also, re timestamps: The choices seem to be
> - Wait for GitHub to offer us some way of importing timestamps, then do the migration; or
> - Do the migration, then wait for GitHub to offer us some way of retroactively changing some of the timestamps.
> Neither is perfect, but the latter is clearly better for LLVM's purposes.
Not the timestamps, the labels. And note that, in general, nothing can
be done in GitHub retroactively, at least not by us, as I've been told.
If that were possible, we'd simply import into an empty repo, add the
git repo, add releases (backdating them) and be done...

--
With best regards, Anton Korobeynikov

Geoffrey Martin-Noble

Dec 2, 2021, 2:52:10 PM
to Anton Korobeynikov, Arthur O'Dwyer, llvm-dev, flan...@lists.llvm.org, Clang Dev, openmp-dev (openmp-dev@lists.llvm.org), poll...@googlegroups.com
From my experience, adding a label to an issue does not trigger any notifications (though it can trigger webhooks), so I think that shouldn't cause problems. I also agree that GitHub letting us retroactively edit the last-edited time is almost certainly not going to happen, whereas GitHub fixing their repo migration to preserve labels seems likely. One question, Anton: did you create the labels in the target repo before trying the migration? Just a vague hypothesis that perhaps it might preserve them if the labels already exist, but drop them if they don't (pure speculation, but plausible enough to be worth testing out IMO).

Anton Korobeynikov

Dec 2, 2021, 2:55:07 PM
to Geoffrey Martin-Noble, Arthur O'Dwyer, llvm-dev, flan...@lists.llvm.org, Clang Dev, openmp-dev (openmp-dev@lists.llvm.org), poll...@googlegroups.com
The labels do exist. I got confirmation that they drop all labels.

Mehdi AMINI

Dec 2, 2021, 9:08:59 PM
to Anton Korobeynikov, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
Seems reasonable to me!

> The only alternative I'm seeing is to apply the labels post-migration.
> There are important downsides:
>   - This has to be done via GitHub API and we're rate limited to ~5000
> requests per hour, so this means that the labelling will take ~20
> hours. I was told that there is no way for us to have the API rate
> limit increased.
>   - This might trigger notifications. My quick check via web ui does
> not, but I cannot be 100% with anything here
>   - (the most important) This will screw the "last modified" timestamp
> as label setting is an event that is recorded in the issue. There is
> no way to set some "old" timestamp, it is assigned by GitHub
> automatically.
>
> For now I'm testing the script for 3. and waiting for any news from GitHub.

Thanks for the work :)
I hope you can get your script working! 
Maybe if you can share this on a public repo, others here can help to do small test runs in private forks and cross-validate or help fix issues with it?

-- 
Mehdi

Anton Korobeynikov

Dec 3, 2021, 3:57:36 AM
to Mehdi AMINI, llvm-dev, clang developer list, Flang Development List, polly-dev, openmp-dev (openmp-dev@lists.llvm.org)
Mehdi,

> Maybe if you can share this on a public repo, others here can help to do small test runs in private forks and cross-validate or help fix issues with it?
I certainly could do this, but I doubt it will be useful, as the
input will be a local bugzilla dump.

--
With best regards, Anton Korobeynikov

Arthur O'Dwyer

Dec 3, 2021, 1:24:36 PM
to Anton Korobeynikov, Mehdi AMINI, llvm-dev, Flang Development List, clang developer list, openmp-dev (openmp-dev@lists.llvm.org), polly-dev
On Fri, Dec 3, 2021 at 3:57 AM Anton Korobeynikov via cfe-dev <cfe...@lists.llvm.org> wrote:
> Mehdi wrote:
>> Maybe if you can share this on a public repo, others here can help to do small test runs in private forks and cross-validate or help fix issues with it?
>
> I certainly could do this, but I doubt this will be useful as the
> input will be a local bugzilla dump..

IMHO it would be a really good idea to do this!
If the "bugzilla dump" is in some reasonably sane format such as JSON, then people could even hand-craft sample input scenarios to try out the import script on.
There are basically two devops operations here:
- Export a Bugzilla instance into (e.g. JSON)
- Load (e.g. JSON) into a GitHub instance
The ultimate migration will do the first step and then the second, (A) on the official LLVM Bugzilla and the official LLVM GitHub, (B) during a single atomic period where both are protected against tampering by random users.
But before then, it would certainly be easy to test the second step on people's own personal GitHub instances. And I would have expected Thanksgiving weekend's aborted migration to have completed the first step and produced an (e.g. JSON) data file as a side effect, so people would even have some sample data to try out. (Of course they'd want to use only a subset of it, because the whole (e.g. JSON) data file is probably on the order of (50,000 bugs x let's say 100KB per bug) ~= 5GB of data.)
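For test runs against a personal fork, the load step only needs a prefix of the dump. A sketch, assuming a hypothetical JSON Lines format (one bugzilla bug per line, e.g. `{"id": 1234, "summary": ..., "labels": [...], "comments": [...]}`); the actual dump format was not published in this thread. Given the estimate above (50,000 bugs × 100 KB ≈ 5 GB), the reader streams lines instead of parsing the whole file:

```python
from __future__ import annotations

import itertools
import json
from pathlib import Path


def load_subset(path: Path, limit: int) -> list[dict]:
    """Parse at most `limit` bugs from a JSON Lines dump, skipping blanks."""
    with path.open(encoding="utf-8") as f:
        lines = (line for line in f if line.strip())
        # islice stops reading once `limit` records have been consumed.
        return [json.loads(line) for line in itertools.islice(lines, limit)]
```

A hand-crafted file with a few records works just as well as a slice of the real dump, which is what makes publishing the loader useful on its own.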

IIUC, none of the data being exported from Bugzilla is "private" in any sense, so there's no particular concern with publishing the (e.g. JSON) data.

It occurs to me that it would also be a really really good idea to have a script that can compare a Bugzilla against a GitHub and verify that they contain the same data, so that we can know whether the migration succeeded. That script can also be published and tested ahead of time.
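A sketch of such a verification script, assuming both sides have already been fetched into dicts keyed by the original bugzilla ID. The field names (`summary`, `title`, `labels`, `comments`) are hypothetical, and a real script would pull the GitHub side through the same rate-limited API:

```python
from __future__ import annotations


def diff_issue(bug: dict, issue: dict) -> list[str]:
    """List the differences between one bugzilla bug and its GitHub issue."""
    problems = []
    if bug["summary"] != issue["title"]:
        problems.append("title differs")
    missing = set(bug["labels"]) - set(issue["labels"])
    if missing:
        problems.append(f"missing labels: {sorted(missing)}")
    if len(bug["comments"]) != len(issue["comments"]):
        problems.append(
            f"comment count {len(bug['comments'])} != {len(issue['comments'])}"
        )
    return problems


def verify(bugs: dict[int, dict], issues: dict[int, dict]) -> dict[int, list[str]]:
    """Map bugzilla id -> problems; an empty result means everything matched."""
    report = {}
    for bug_id, bug in bugs.items():
        issue = issues.get(bug_id)
        problems = ["not migrated"] if issue is None else diff_issue(bug, issue)
        if problems:
            report[bug_id] = problems
    return report
```

A check like the label comparison would have flagged the silently dropped labels described earlier in this thread on the very first issue.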

–Arthur