Let's talk about reliability -- in 30m!


Steve McGhee

Jan 16, 2024, 12:07:22 PM
to Reliability discussion group
TODAY at 0930 PT: r9y.dev/discuss

We talk about reliability (r9y, sorry), resilience, SRE, DevOps, Platforms, our feels, oncall, toil, etc.
See you soon!

[r9y.dev/discuss] Let's talk about Reliability Engineering
Tuesday, January 16 · 09:30 – 10:30
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/kdk-hnmf-yjp
Or dial: (US) +1 609-491-2429 PIN: 478 299 241#
More phone numbers: https://tel.meet/kdk-hnmf-yjp?pin=4362527073963

Steve McGhee

Jan 16, 2024, 5:00:05 PM
to Reliability discussion group

Thank you to everyone who joined today! A summary of today's discussion is below.


See you next time: Tuesday, Feb 20th!




Date: 16 January 2024

Topics discussed:

Oncall Handovers
- See Chad Todd's talk at SREcon23 Americas
- https://www.usenix.org/conference/srecon23americas/presentation/todd
- Use the handoff meeting to create tasks that improve the system, each tagged with severity/urgency (e.g., an alert that turned out to be non-actionable)
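To make the handoff idea concrete, here is a minimal sketch of how items reviewed in a handoff meeting could become follow-up tasks with a severity and an urgency. The schema (`HandoffItem`, `FollowUpTask`, `triage`, `handoff_tasks`) and the triage rules are hypothetical illustrations. They are not taken from the talk.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Urgency(IntEnum):
    LATER = 1
    THIS_WEEK = 2
    NOW = 3


@dataclass
class HandoffItem:
    """An alert reviewed in the handoff meeting (hypothetical fields)."""
    alert_name: str
    actionable: bool
    customer_impact: bool
    pages_this_shift: int


@dataclass
class FollowUpTask:
    title: str
    severity: Severity
    urgency: Urgency


def triage(item: HandoffItem) -> Optional[FollowUpTask]:
    """Turn one handoff item into an improvement task, or None if nothing needs fixing."""
    if item.customer_impact:
        # Customer-facing problems go to the top of the queue.
        return FollowUpTask(f"Reduce customer impact behind {item.alert_name}",
                            Severity.HIGH, Urgency.NOW)
    if not item.actionable:
        # Paged for nothing: fix or delete the alert; repeated pages raise the urgency.
        urgency = Urgency.THIS_WEEK if item.pages_this_shift > 1 else Urgency.LATER
        return FollowUpTask(f"Fix or remove non-actionable alert {item.alert_name}",
                            Severity.MEDIUM, urgency)
    return None


def handoff_tasks(items: List[HandoffItem]) -> List[FollowUpTask]:
    """Collect follow-up tasks, most severe and most urgent first."""
    tasks = [t for t in map(triage, items) if t is not None]
    return sorted(tasks, key=lambda t: (t.severity, t.urgency), reverse=True)
```

In this sketch, an actionable alert with no customer impact produces no task. Anything that paged someone without being actionable becomes work to fix or remove that alert.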

Test to Production Environments: for reliable code delivery, does anyone use a production alpha environment?
- Option: a preview mode that customers can opt into to try new things.
- We use our own CIO/IT organization as Client-0 before we open to clients. For some critical services, we ask all employees to hammer the system during a defined window.
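The two approaches above can be combined in one routing rule. Internal users act as Client-0 and always get the new release. Customers get it only once the preview is opened to them and they have opted in. This is a minimal sketch: `Requester`, `ReleaseRouter`, and the version strings are hypothetical names, not any particular product's API.

```python
from dataclasses import dataclass


@dataclass
class Requester:
    """Who is calling the service (hypothetical fields)."""
    user_id: str
    is_employee: bool = False
    opted_into_preview: bool = False


@dataclass
class ReleaseRouter:
    """Chooses which release track serves a request (hypothetical design)."""
    stable_version: str
    preview_version: str
    # Flip to True once Client-0 (internal users) has exercised the preview build.
    preview_open_to_customers: bool = False

    def version_for(self, r: Requester) -> str:
        if r.is_employee:
            # Client-0: internal users always get the new release first.
            return self.preview_version
        if self.preview_open_to_customers and r.opted_into_preview:
            # Customers see the preview only if they explicitly opted in.
            return self.preview_version
        return self.stable_version
```

For example, with `ReleaseRouter("v41", "v42")`, an employee is served `"v42"`. An opted-in customer stays on `"v41"` until `preview_open_to_customers` is set to `True`.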

Hiring market right now?

What AI tools could we use to improve reliability?
- Idea: start with the data! What do you already have?
- Similar incidents, commands, and culprits; risk assessments of a given release.
- https://www.usenix.org/conference/srecon19emea/presentation/underwood
- Anomaly detection
- Is this already AI, or simply statistics?
- Story generation: it might be right? At least it's a good place to start.
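On the "AI or simply statistics?" question, a basic anomaly detector needs no machine learning at all. The sketch below uses a rolling z-score: it flags a point that sits more than `threshold` standard deviations from the mean of the preceding `window` points. The function name and the default parameters are illustrative assumptions, not a recommendation from the discussion.

```python
import statistics
from typing import List


def zscore_anomalies(values: List[float], window: int = 30, threshold: float = 3.0) -> List[int]:
    """Return indices whose value is more than `threshold` standard deviations
    from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Perfectly flat history: any change at all counts as unusual.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies
```

For example, 30 latency samples cycling through 100, 102, 98, 101, 99 ms, followed by a single 180 ms sample, yield `[30]`. Whether to call that "AI" is the open question; mathematically it is a mean and a standard deviation.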