TOMORROW 0930PT http://r9y.dev/discuss


Steve McGhee

Aug 14, 2023, 1:11:21 PM
to Reliability discussion group
I’m fresh off a long vacation and — what do you know — tomorrow is a reliability-discuss! I love these. We talk about reliability (r9y, sorry), resilience, SRE, DevOps, Platforms, our feels, oncall, toil, and maybe a little Taylor Swift.

[r9y.dev/discuss] Let's talk about Reliability Engineering
Tuesday, August 15 · 09:30 – 10:30
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/kdk-hnmf-yjp
Or dial: (US) +1 609-491-2429 PIN: 478 299 241#
More phone numbers: https://tel.meet/kdk-hnmf-yjp?pin=4362527073963

See you tomorrow!

Steve McGhee

Aug 16, 2023, 5:00:38 PM
to Reliability discussion group

Thank you to everyone who joined today! A summary of the discussion is below.


See you next time! – Tuesday 19th Sept


r9y.dev/discuss


r9y-discuss Date and Time: 15 August 2023 17:31 (UTC+00:00)

Topics discussed:

How has r9y changed since 2020? Any predictions that came true? Surprises?

  • hot take "nothing has really improved" - apart from talking

  • another take: awareness (words) is up, if not real understanding or action.

  • was: Ops with a new name

  • now: more like the book!

  • e.g. SLOs can be useful in theory, but sometimes not actually/directly useful! Is there a discussion happening around this?

  • are SLOs just a proxy/abstraction for "measurement" ?
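
A minimal sketch of the "SLO as a proxy for measurement" point, not something shown in the session. The function names, the event counts, and the 99.9% target are all made up for illustration. The SLI is the measurement; the SLO only adds a target and the error budget that follows from it.

```python
def sli(good_events: int, total_events: int) -> float:
    """The measurement itself: fraction of events that were good."""
    if total_events == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Share of the error budget left: 1.0 untouched, 0.0 spent, negative overspent."""
    allowed_bad = (1.0 - slo_target) * total_events  # bad events the SLO tolerates
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% target (or zero traffic) leaves no budget to spend.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 requests, 50 failures, 99.9% target.
    print(sli(99_950, 100_000))
    print(error_budget_remaining(99_950, 100_000, 0.999))
```

With these made-up numbers about half the budget is left. The same measurement with no target attached would still be a measurement, which is roughly the question the bullet raises.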



What reliability/SRE conferences are you looking forward to next year?


Any experience with r9y in air-gapped envs?

  • flash drives and sneakernet!

  • art of slo, distributed image server - links go here.

  • removal of toil!

  • To test airgapped environments for reliability I've heard the US gov't has experimented with https://litmuschaos.io/ but this is only for testing not CI/CD.

  • Target Corp built a nice platform called TAP. This presentation is an oldie but a goodie: https://www.youtube.com/watch?v=cnHfK4MZA2Y&t=1260s They now use TAP to deploy/release to multiple cloud providers, on-premises, and edge, all while the developer doesn't have to think much about the workload the app is going to


Do you have a Terralith? How did you get there, and what now?

  • yes! it happens.

  • maybe this is due to the tool (tf).

  • "pink unicorn" thinking - project an ideal world

  • tools need to do the user validation: is this actually solving the problem?

  • can come from team silos. infra team has one pipeline for all things.

  • platform teams can suffer similarly, release platform too slowly, with big bang. ouch. 

  • as a solution? SOA - provide hooks to service owning developers. 

  • does "my terraform" for my service really need to be mine? esp. if I don't understand it. thus it goes back to a central team who creates a 'lith

  • ETOOMUCHSTUFF

  • why reduce understanding of production? controls / knowledge? if it needs so much custom knowledge, adapt the tools. make it less hostile. reduce need for an interface team.

  • instead of ops helping dev make good choices, a bad model might be "here go play with this". not abstracted enough? just a weird language.

  • functions > modules.

  • keptn.sh



"Is platform engineering becoming part of/intersecting with reliability roles/topics?" Is Platform Reliability Engineering a thing now?

  • do all org desires need to be a role?
