Thursday, July 8, 2021

Maxims - for a service in production

tl;dr: maxims that have served me well as an SRE: single version in production, rollout/rollback testing, symptom-based alerting, SLIs and SLOs, hitless upgrades, and auditable production changes.

Over the last 8 years, these are some of the patterns I have repeatedly found valuable. I have tagged each one with the operational metric(s) it affects, written as #metric.

Rollback testing - it is not sufficient to list rollback in a production readiness checklist; it should be exercised regularly to verify correctness, limitations and timing. Other dimensions to cover include, but are not limited to, configurations, binaries and the qualification process. #mttr
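As a rough illustration of exercising a rollback for correctness and timing, here is a minimal Python sketch. The names `run_rollback_drill`, `rollback`, `is_healthy` and `budget_seconds` are hypothetical; the two callables stand in for whatever your deployment tooling and health checks actually provide.

```python
import time
from typing import Callable, NamedTuple


class DrillResult(NamedTuple):
    succeeded: bool      # did the health check pass after the rollback?
    seconds: float       # wall-clock time for rollback plus health check
    within_budget: bool  # succeeded and finished inside the time target?


def run_rollback_drill(
    rollback: Callable[[], None],
    is_healthy: Callable[[], bool],
    budget_seconds: float,
) -> DrillResult:
    """Run one rollback, check health afterwards, and time the whole thing.

    `rollback` and `is_healthy` are placeholders (assumptions) for real
    deployment and health-check hooks.
    """
    start = time.monotonic()
    rollback()
    healthy = is_healthy()
    elapsed = time.monotonic() - start
    return DrillResult(healthy, elapsed, healthy and elapsed <= budget_seconds)
```

Running a drill like this on a schedule, and recording `seconds` over time, gives a measured number to put next to #mttr rather than an estimate.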

Single version in production - when running a production service whose core characteristic is feature velocity, I have seen this principle keep the chaos in check. Other considerations, such as engineering team discipline, team maturity and the modularity of the code base, must be factored in for this policy to be adopted successfully. #cognitive-load #mttr

Symptom-based alerting - no news is good news for a service whose SLOs are well tested and represent the end-to-end customer experience. In my experience, happy SLOs are a destination on the journey, nowhere near the beginning. #diagnostic #triage #tools #observability
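One common way to turn a customer-facing symptom into an alert is to page on how fast errors are consuming the error budget. The sketch below is one possible shape for that idea, not a prescribed method; the function names and the default threshold of 10 are illustrative assumptions.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how many times faster than sustainable the budget is being spent.

    A value of 1.0 means the error budget would last exactly the SLO period.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 0.0  # no traffic in the window, nothing to judge
    error_ratio = bad_events / total_events
    budget_ratio = 1.0 - slo_target
    return error_ratio / budget_ratio


def should_page(bad_events: int, total_events: int, slo_target: float,
                threshold: float = 10.0) -> bool:
    """Page only when the user-visible symptom is burning budget quickly.

    The threshold is an assumed example value; tune it per service.
    """
    return burn_rate(bad_events, total_events, slo_target) >= threshold
```

For example, with a 99.9% target, 20 failed requests out of 1000 in the window is a burn rate of about 20, which would page; 5 out of 1000 is about 5, which would not.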

Service Level Objectives (SLOs) - an all-important tool for establishing an agreement with the consumers of the service. Service Level Indicators (SLIs) are an essential prerequisite to setting objectives, so iterating over SLIs is essential to determining a good objective. And a good objective is not subjective - there is a value at which the customer is happy. The gap between that objective and 100% provides an error margin that the service can use for maintenance and upgrades. #observability #objective-measure
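To make the error margin concrete, here is a small Python sketch of the arithmetic, assuming a time-based availability SLO measured over a rolling window. The function names and the 30-day default are assumptions for illustration.

```python
def error_budget_minutes(slo_target: float, period_days: int = 30) -> float:
    """Minutes of allowed unavailability for a time-based availability SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = period_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining_minutes(slo_target: float, downtime_minutes: float,
                             period_days: int = 30) -> float:
    """Budget left after the downtime already spent; negative means overspent."""
    return error_budget_minutes(slo_target, period_days) - downtime_minutes
```

With a 99.9% objective over 30 days, the window has 43,200 minutes, so the budget is about 43.2 minutes. If 10 of those have already been spent on incidents, roughly 33.2 minutes remain for maintenance and upgrades.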

Hitless upgrades - running a service in production means it will need to be updated periodically for feature changes and bug fixes. Maintenance and dependency management also involve operations that need a restart in place, ranging from machine repair to operating system upgrades. A hitless upgrade lets all of these happen without visible impact on users. #maintenance #toil
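One ingredient of a hitless upgrade is restarting replicas in small batches so that some capacity always stays in service. The sketch below only plans those batches; `rolling_batches` and `max_unavailable` are hypothetical names, and a real rollout would also need to drain connections and wait for health checks between batches.

```python
from typing import List


def rolling_batches(replicas: List[str], max_unavailable: int) -> List[List[str]]:
    """Split replicas into restart batches of at most `max_unavailable` each.

    Refuses any plan that would take every replica down at the same time.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    if max_unavailable >= len(replicas):
        raise ValueError("plan would leave no replica serving traffic")
    return [
        replicas[i:i + max_unavailable]
        for i in range(0, len(replicas), max_unavailable)
    ]
```

For five replicas `a` to `e` with `max_unavailable=2`, this yields three batches: `a, b`, then `c, d`, then `e`.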

I hope to add more detailed notes on each of these in subsequent posts.
