Issue 14631 in skia: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs


kjlub… via monorail

Jul 21, 2023, 10:03:09 AM
to bu...@skia.org
Updates:
Summary: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs

Comment #2 on issue 14631 by kjlu...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c2

Full notes: https://docs.google.com/document/d/1haK6mEL7fYtx1k0AZf6DuTeRDaM5RTFFdMEIYo2aY-I/edit?usp=sharing

Events on Jul 14 2023
03:36:XX ET Commit feab8545 (PR #130414) lands
03:46:XX ET Commit a65e73948 lands (flutter engine -> flutter)
05:32:46 ET Job / Linux framework_tests_libraries / 12141 starts
05:49:14 ET Commit feab8545 processed by gitilesfollower
    - Up until this time, e90f980f40 was reported to be the latest commit. Then suddenly 4 commits were seen, up through a65e73948.
    - Regular heartbeats were seen, so it's not like the process froze.
    - ***This must mean the gitiles mirror that syncs GitHub flutter -> gitiles was running behind.***
05:50:14 ET Job 12141 fails
10:15:XX ET Commit b8fa92338 lands
10:21:XX ET Commit b8fa92338 processed by gitilesfollower



Gold as it currently works does not poll GitHub directly for commit data; instead it polls a Google-hosted mirror https://chromium.googlesource.com/external/github.com/flutter/flutter [1].

The mirroring latency is normally not an issue [citation needed]. However, on July 14th the mirror fell 2+ hours behind, so the post-submit tests that kicked off after the commit landed did not know the digests had been triaged.

Possible courses of action:
- Do nothing. Trust in the Gitiles mirror.
- Follow up with Gitiles mirror owners to see if any configuration can be fixed or if the SLA is acceptable.
- Write a GitHub follower that polls data directly from GitHub about 1/minute and use that instead of the mirror.
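
As a rough illustration of the third option, here is a minimal Python sketch of a follower that polls GitHub directly about once a minute. It is not how gitilesfollower works and is not taken from the Gold codebase; the function names (fetch_recent_shas, new_commits, follow) and the repo/branch arguments are hypothetical. It assumes the public GitHub REST endpoint for listing commits, which returns the newest commits first.

```python
import json
import time
import urllib.request
from typing import Callable, List, Optional

# Public GitHub REST endpoint for listing commits on a branch, newest first.
COMMITS_URL = "https://api.github.com/repos/{repo}/commits?sha={branch}&per_page={n}"


def fetch_recent_shas(repo: str, branch: str, n: int = 30) -> List[str]:
    """Return up to n of the most recent commit SHAs on branch, newest first."""
    url = COMMITS_URL.format(repo=repo, branch=branch, n=n)
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return [entry["sha"] for entry in json.load(resp)]


def new_commits(recent_newest_first: List[str], last_seen: Optional[str]) -> List[str]:
    """Return the commits that landed after last_seen, oldest first.

    If last_seen is None or is not in the fetched window, every fetched
    commit is treated as new (a real follower would page further back).
    """
    if last_seen in recent_newest_first:
        recent_newest_first = recent_newest_first[: recent_newest_first.index(last_seen)]
    return list(reversed(recent_newest_first))


def follow(
    fetch: Callable[[], List[str]],
    handle: Callable[[str], None],
    interval_s: float = 60.0,
    max_polls: Optional[int] = None,
) -> Optional[str]:
    """Poll fetch() every interval_s seconds and pass each new SHA to handle()."""
    last_seen: Optional[str] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        for sha in new_commits(fetch(), last_seen):
            handle(sha)  # hypothetical hook, e.g. record the commit as landed
            last_seen = sha
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return last_seen
```

A caller could run something like follow(lambda: fetch_recent_shas("flutter/flutter", "master"), print), with the branch name being an assumption. Unauthenticated GitHub API requests are limited to 60 per hour, so a once-a-minute poller would sit right at that limit and would likely need an authenticated token in practice.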

[1] Why? GitHub support was added to Gold later, and an effort was made to re-use as much as possible. Since there was already a way to scrape these Gitiles mirrors and there already was a flutter mirror, it seemed like the easy choice at the time. Also, when onboarding a project and loading many commits, using the mirror does not cause quota or rate-limiting issues.


kjlub… via monorail

Jul 21, 2023, 10:05:05 AM
to bu...@skia.org

Comment #3 on issue 14631 by kjlu...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c3

To clarify a point made earlier, gitilesfollower does indeed update the triage time to be as of the time the commit landed (not when the commit was processed or when the triage was made pre-commit).

https://github.com/google/skia-buildbot/blob/d729880665ff36a3fc1b2d7ad348f79ca1f9ecfb/golden/cmd/gitilesfollower/gitilesfollower.go#L472-L476

kjlub… via monorail

Jul 21, 2023, 1:50:05 PM
to bu...@skia.org

Comment #4 on issue 14631 by kjlu...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c4

I asked the oncall of "chrome-git-admins" about it and they pointed me to some internal documentation which said "Using this method, replication delay [of the mirror] can be arbitrarily long by default."

I am asking to see if we can make this a well-specified, finite amount, say 10 minutes.

kjlub… via monorail

Jul 21, 2023, 3:51:50 PM
to bu...@skia.org

Comment #5 on issue 14631 by kjlu...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c5

A git admin ran a command that we think makes the sync time 10 minutes. I'll check on Monday to see how that played out in practice.

kjlub… via monorail

Jul 24, 2023, 8:49:33 AM
to bu...@skia.org

Comment #6 on issue 14631 by kjlu...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c6

Still seeing sync issues and following up with the git admin.

It looks like while new commits are usually synced within 10 minutes, there were at least 2 times over the weekend where it stretched to multiple hours. For example, commit 23d9a6299bfb [1] landed at 2023-07-23 07:59:24 +0000 UTC but wasn't seen by my gitiles polling script (which polls every minute) until 2023-07-23T10:34:14.063420298Z. That was shortly after a second commit, 31e46c5bb77b [2], landed at 2023-07-23 10:28:23 +0000 UTC.

[1] https://github.com/flutter/flutter/commit/23d9a6299bfb261aec68a973de61017ccfefd799
[2] https://github.com/flutter/flutter/commit/31e46c5bb77b44b3a3dd62890e5cd4f951b62b56
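
To make the delay concrete, the following Python sketch computes the mirror lag for both commits from the timestamps quoted above. The parse_utc helper and the TARGET constant are illustrative names, not part of the polling script; the sketch assumes both commits became visible at the single "seen" time reported above, and it drops fractional seconds for simplicity.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a UTC timestamp.

    Accepts both '2023-07-23 07:59:24 +0000 UTC' and
    '2023-07-23T10:34:14.063420298Z'; fractional seconds are dropped.
    """
    core = ts.replace("T", " ")[:19]
    return datetime.strptime(core, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


# Hoped-for sync interval from this thread (10 minutes).
TARGET = timedelta(minutes=10)

seen = parse_utc("2023-07-23T10:34:14.063420298Z")
first_lag = seen - parse_utc("2023-07-23 07:59:24 +0000 UTC")   # 23d9a6299bfb
second_lag = seen - parse_utc("2023-07-23 10:28:23 +0000 UTC")  # 31e46c5bb77b

print(first_lag, first_lag > TARGET)    # 2:34:50 True
print(second_lag, second_lag > TARGET)  # 0:05:51 False
```

Under that assumption, the first commit waited about 2 hours 35 minutes, while the second was picked up in under 6 minutes.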

kjlub… via monorail

Jul 24, 2023, 10:29:19 AM
to bu...@skia.org

Comment #7 on issue 14631 by kjlu...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c7

Thanks to jrn@, I was able to run the correct commands which should make flutter/flutter and flutter/engine sync every 10 minutes (they had been configured for every ~275 minutes, but through a best-effort process would sync faster than that most of the time).

kjlub… via monorail

Jul 24, 2023, 10:29:49 AM
to bu...@skia.org
Updates:
Status: Fixed

Comment #8 on issue 14631 by kjlu...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c8

If we need faster than 10 minutes or the syncing is not actually fixed, we can re-open this issue.

katel… via monorail

Jul 24, 2023, 2:20:19 PM
to bu...@skia.org

Comment #9 on issue 14631 by katel...@google.com: GitHub -> Gitiles Mirroring causes occasional failures on primary branch due to missing triage actions from landed PRs
https://bugs.chromium.org/p/skia/issues/detail?id=14631#c9

Thank you!! There is already a minimum 5 minute hold before a change that has triaged images can actually land, due to the timing of the cron job that checks on it; 10 minutes will hopefully be just fine. Thanks again.