I would like to give you a status update on our bugzilla migration.
First of all, it was unfortunately affected by the major GitHub outage
that happened over the weekend, and our schedule was delayed a bit: we
had to wait until the outage ended and restart the import, and the
GitHub support engineers who were watching our migration were moved to
fixing the issues with the service.
As of now we have:
- All bugzilla content is imported to the bugzilla archive GitHub
repo: https://github.com/llvm/llvm-bugzilla-archive/issues
- All issue numbers are preserved with respect to this repo. This means that
Bugzilla PR 1234 can be found at
https://github.com/llvm/llvm-bugzilla-archive/issues/1234. This repo
will be the basis of our stable numbering of the issue IDs.
- The llvm.org/PR redirect is installed. It will redirect to the
bugzilla archive for all PRs with ids <= 52601 and to the main
llvm-project repo for everything else.
- The llvm.org/bz redirect is installed. Links like llvm.org/bz1234
will forward to the read-only bugzilla instance.
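The llvm.org/PR rule in the list above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the actual llvm.org server configuration (which is not shown in this thread). The function and constant names are made up for the example; the 52601 threshold and the two repositories come from the text above.

```python
# Illustrative sketch only: the real redirect lives in the llvm.org web
# server configuration, which is not part of this thread.
LAST_BUGZILLA_ID = 52601  # highest id sent to the archive (from the text above)

ARCHIVE_URL = "https://github.com/llvm/llvm-bugzilla-archive/issues/{}"
MONOREPO_URL = "https://github.com/llvm/llvm-project/issues/{}"


def resolve_pr(number: int) -> str:
    """Return the URL that llvm.org/PR<number> is described as redirecting to."""
    if number <= 0:
        raise ValueError("issue numbers are positive")
    if number <= LAST_BUGZILLA_ID:
        # Old Bugzilla ids keep their number in the archive repo.
        return ARCHIVE_URL.format(number)
    # Everything above the threshold is a GitHub-native issue.
    return MONOREPO_URL.format(number)
```

For example, resolve_pr(1234) yields the archive URL quoted in the list, while resolve_pr(52602) points at the main llvm-project repo.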
The last remaining step is the transfer of the issues from the bugzilla
archive to the main llvm-project repo. This way we will have all the
issues in a single repo, albeit with different numbers. This is a
compromise we have to accept: we can ensure the preservation of the
issue numbers only in an empty repo, and we cannot lose the releases,
etc. in the existing repo. However, the URLs relative to the archive
repo will let us keep the original bugzilla issue numbers, and GitHub
will rewrite all references inside the comments for us.
This step is in some sense final: we cannot undo it once we start.
Also, while the import of issues into the empty repo does not trigger
any notifications, the transfer will trigger all kinds of them. We
certainly do not want everyone who contributed to any issue in the
past to receive multiple notifications. For some active community
members the volume of notifications would be excessive: thousands of
emails.
GitHub has a way to disable notifications; however, they found some
issues with it quite recently. We are waiting for them to resolve these
and expect the final migration to happen within the next 48 hours.
--
With best regards, Anton Korobeynikov
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> Something isn't clear to me here and I'm concerned about this setup: how are we gonna go from the "archived number" to the "real" monorepo issues?
> I don't even understand the need for the archive repository right now: why can't we just redirect llvm.org/PR to the monorepo issue?
>
> I hope we don't lose this mapping, that seems quite critical to me.
Sorry for not being 100% clear, as this is a bit tricky :)
After the transfer from the archive repo to llvm-project, the issues
will be renumbered and there is no way to control this (sick!).
However, GitHub will maintain the mapping for us and enable the
redirect.
E.g. assume that the original bugzilla issue 12345 is transferred
to llvm/llvm-project/issues/45678. It would seem that the original
number is lost; however, it is not:
llvm/llvm-bugzilla-archive/issues/12345 will redirect to
llvm/llvm-project/issues/45678. So, llvm.org/PRNNNNN
will redirect to the final monorepo issue regardless of whether it came
from bugzilla or is the "new" GitHub-only variant (it redirects to the
archive for "low" NNNNN numbers and to the monorepo for everything else).
Also, during the transfer all GitHub-internal issue links will be
updated, so all references to #12345 will be rewritten to the new
monorepo issue id.
Hope this is a bit clearer now.
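To make the example concrete, here is a toy Python model of the two things GitHub is described as doing: keeping an old-number to new-number mapping, and rewriting #NNNNN references in comment text. Only the 12345 -> 45678 pair comes from the example above. The function names, the mapping structure and the regular expression are assumptions made for illustration; how GitHub implements this internally is not described in this thread.

```python
import re

# Hypothetical mapping: archive (Bugzilla) number -> new llvm-project number.
# Only the 12345 -> 45678 pair comes from the example above.
TRANSFER_MAP = {12345: 45678}

# '#' followed by digits, not preceded by a word character or a slash.
ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def rewrite_refs(text: str, mapping: dict[int, int]) -> str:
    """Replace '#old' with '#new' for every number present in the mapping."""
    def swap(match: re.Match) -> str:
        old = int(match.group(1))
        # Numbers without a mapping entry are left unchanged.
        return f"#{mapping.get(old, old)}"
    return ISSUE_REF.sub(swap, text)


def archive_redirect(old: int, mapping: dict[int, int]) -> str:
    """Model the archive issue URL redirecting to the transferred issue."""
    return f"https://github.com/llvm/llvm-project/issues/{mapping[old]}"
```

With this model, rewrite_refs("Duplicate of #12345", TRANSFER_MAP) gives "Duplicate of #45678", and archive_redirect(12345, TRANSFER_MAP) gives the llvm-project URL for issue 45678.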
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
Some of you who are checking the migration notes
(https://bit.ly/3HVjr7a) might already have noticed that we're stuck
again. Let me provide more information about what is going on now and
what the plans are.
As a reminder, we previously imported all issues into the archive repo,
and essentially only the very last step remained: migration to the live
llvm-project repo. This step is crucial and one-way: once started, we
cannot undo the steps we have made. We also have to rely on GitHub here,
as we cannot do it via rate-limited API calls.
During the final checks two issues were revealed:
- Notifications are still sent in some cases.
- Migration sets the last modification date of the closed issues (it
looks like it was implemented as "re-open the issue, transfer and close
again"). As a result, all closed issues essentially got sorted
chronologically before the real open ones.
These issues were fixed on GitHub's side and we proceeded with
re-checking everything. It turned out that another issue appeared: the
labels were silently lost and the migrated issues were completely
labelless, despite being annotated with the 140+ labels we had originally.
For now this is a show-stopper. The issue was reported to and
acknowledged by GitHub; however, no ETA was provided.
Our current options are:
1. Abandon the migration
2. Wait until the issue is resolved on GitHub's side
3. Try to find alternative solutions to work around the GitHub issue
Option 2 is essentially not an option. I am proposing to abandon the
migration and unlock the bugzilla if a solution is not found by
the end of this week.
The only alternative I'm seeing is to apply the labels post-migration.
There are important downsides:
- This has to be done via GitHub API and we're rate limited to ~5000
requests per hour, so this means that the labelling will take ~20
hours. I was told that there is no way for us to have the API rate
limit increased.
- This might trigger notifications. My quick check via the web UI does
not, but I cannot be 100% sure of anything here.
- (the most important) This will screw up the "last modified" timestamp,
as label setting is an event that is recorded in the issue. There is
no way to set some "old" timestamp; it is assigned by GitHub
automatically.
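The time estimate in the first bullet is simple arithmetic. Below is a small Python sketch, assuming roughly 52601 issues (the last Bugzilla id mentioned earlier in the thread) and a hypothetical two API requests per issue for labelling. The per-issue request count is an assumption chosen for illustration; it is not stated anywhere in this thread.

```python
import math

RATE_LIMIT_PER_HOUR = 5000  # approximate REST API limit quoted above


def sweep_hours(issues: int, requests_per_issue: int = 1,
                rate_limit: int = RATE_LIMIT_PER_HOUR) -> float:
    """Hours needed to issue all requests at the given hourly rate limit."""
    return issues * requests_per_issue / rate_limit


issues = 52601  # assumption: one issue per Bugzilla id up to 52601
print(round(sweep_hours(issues, requests_per_issue=2), 1))  # 21.0
print(math.ceil(sweep_hours(issues)))                        # 11
```

Under these assumptions, two requests per issue lands near the ~20 hours quoted above, and a single request per issue gives about 11 hours for one full sweep.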
For now I'm testing the script for 3. and waiting for any news from GitHub.
I will keep you updated.
--
With best regards, Anton Korobeynikov
On behalf of LLVM Foundation
Thank you for all of the hard work you've put into this so far, and
thank you for the detailed update on the unfortunate place we're at.
When you say "abandon the migration", do you mean temporarily or
permanently? I'd be strongly in favor of temporarily abandoning the
migration so that we can continue to do useful work against bugs while
we sort this out. If you're thinking of abandoning permanently, I
could be in support of that as well, but I'd want to know what our
aspirational goals are for the bug database long-term before giving my
support.
~Aaron
This thought had occurred to me as well. Using a separate repo for bug tracking seems reasonable as an intermediate step. Unless there's a complexity here I'm missing, I'd probably vote for that over going all the way back to bugzilla.
Philip
p.s. Anton, thank you for the update and all the work that has
gone into this.
If there are new issues created directly in llvm-bugzilla-archive, and they have cross-references to other (new or old) issues, we’d want to make sure they get fixed up along with the originally-from-bugzilla references. (Recall that all issues will be renumbered when they move to llvm-project.)
It would be mildly annoying to have the bug repo move twice instead of once, but if the reference re-writing works correctly then I don’t have any real objection.
--paulr
As for "permanent abandoning": I think in such a situation we'd need
to take one step back and seriously reconsider all the infrastructure
we have. Maybe even check what the alternatives are
platform-wise.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
Yes, during the migration all references should be rewritten. At least
this is how it is documented; I'm not 100% sure now that this is indeed
so ;)
Department of Statistical Modelling, Saint Petersburg State University
> I also think we can wait a day to get labels on the migrated issues. I think my bigger concern with the rate-limited APIs is that it's hard to test scripts that take 20 hours to run, so there is some risk that the label migration script fails or mislabels issues. Still, I would just hope for the best here. It's not critical to get labels on old issues on day 1. Maybe one way to deal with this is to apply labels to recently modified issues first.
I think we need to apply labels in chronological order, i.e. first
apply the labels to the issues that were last modified longest ago.
That way we will at least have the sorting in the proper
order. I definitely have the creation time of each issue, but I am not sure
about the last modification timestamp (there is a timestamp for when the
issue was closed, so at least for some issues we do have such a timestamp
at hand).
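A minimal Python sketch of that ordering, assuming a hypothetical per-issue record with a creation time and optional closed / last-modified times. The record layout and the fallback order (modified, then closed, then created) are assumptions for illustration, not the layout of any real dump.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IssueRecord:
    # Hypothetical record layout; field names are not from any real dump.
    number: int
    created_at: datetime
    closed_at: Optional[datetime] = None
    modified_at: Optional[datetime] = None


def best_timestamp(issue: IssueRecord) -> datetime:
    """Best available guess at the last-modified time of an issue."""
    return issue.modified_at or issue.closed_at or issue.created_at


def labelling_order(issues: list[IssueRecord]) -> list[int]:
    """Issue numbers sorted oldest-first, so the most recently modified
    issues are labelled last and end up with the newest update time."""
    return [i.number for i in sorted(issues, key=best_timestamp)]
```

Labelling in this order means the "last modified" timestamps that GitHub assigns during labelling should at least preserve the relative order of the issues.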
Another thing that I need to check is how everything works after the
migration. I do have the labels for each issue in the archive. However,
after the migration the issues won't be there anymore. So, an additional
question is whether API requests will be redirected or whether I will
need to build the mapping first. Given the rate limit of 5k requests per
hour, a complete sweep over all issues will take 11 hours.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
> - This has to be done via GitHub API and we're rate limited to ~5000
> requests per hour, so this means that the labelling will take ~20
> hours. I was told that there is no way for us to have the API rate
> limit increased.
This 5000 requests per hour limit, is that per repo or per access token? Could we potentially pool access tokens from multiple GitHub accounts to sidestep the issue? Say, 20 tokens to do the migration in 1 hour?
> FWIW, "20 hours" or "11 hours" or "three days" is like nothing, compared to what the migration has already been doing. If it only requires taking Bugzilla down for 24 hours to do it, IMO you should just do it already — whatever "it" is.
Well, that's for a single sweep. So, if we need to do this, say, 5 times,
then everything starts to get very interesting.
> Also, re timestamps: The choices seem to be
> - Wait for GitHub to offer us some way of importing timestamps, then do the migration; or
> - Do the migration, then wait for GitHub to offer us some way of retroactively changing some of the timestamps.
> Neither is perfect, but the latter is clearly better for LLVM's purposes.
Not the timestamps, the labels. And note that, in general, nothing
can be done in GitHub retroactively. At least not by us, as
I've been told. If this were possible, we'd simply import into an
empty repo, add the git repo, add the releases (dating them into the
past) and we'd be done...
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
> Maybe if you can share this on a public repo, others here can help to do small test runs in private forks and cross-validate or help fix issues with it?
I certainly could do this, but I doubt it would be useful, as the
input will be a local bugzilla dump.
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
> Also, does GitHub's GraphQL API v4 offer higher throughput than their Rest API v3 for such labeling? See https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit .
Indeed, the GraphQL API has different limits, and some APIs are available
only via GraphQL endpoints (e.g. the majority of the migration API is only
there).
> (I work on Visual C++ so I don't know anything special about GitHub - although I haven't used GraphQL to perform modifications, I learned enough JS/GraphQL to perform read-only queries for a status chart. According to my understanding, applying labels through GraphQL mutation should consume far fewer "points" than individual REST calls consume the v3 limits.)
This certainly requires testing, right. Thanks for the suggestion!
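As a starting point for such testing, here is a hedged Python sketch that only builds the text of a batched GraphQL request; it sends nothing. It uses GitHub's addLabelsToLabelable mutation with one alias per issue. The helper name and the node ids in the usage note are hypothetical placeholders, and whether batching like this actually costs fewer rate-limit points is exactly what would need to be measured.

```python
import json


def build_label_mutation(batch: list[tuple[str, list[str]]]) -> str:
    """Build one GraphQL document containing an aliased
    addLabelsToLabelable mutation per (issue node id, label node ids) pair."""
    parts = []
    for index, (issue_id, label_ids) in enumerate(batch):
        # json.dumps produces double-quoted strings and bracketed lists,
        # which are also valid GraphQL literals for plain ids.
        parts.append(
            f"  m{index}: addLabelsToLabelable(input: {{"
            f"labelableId: {json.dumps(issue_id)}, "
            f"labelIds: {json.dumps(label_ids)}}}) "
            f"{{ clientMutationId }}"
        )
    return "mutation {\n" + "\n".join(parts) + "\n}"
```

For example, build_label_mutation([("ISSUE_A", ["LABEL_X"])]) returns a single mutation document with one aliased entry m0; real node ids would first have to be looked up through the API.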
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University