Salim Virji, Google LLC and Murali Suriar, Snowflake
Most of us have seen this in our organisations: we build a quick solution to an immediate problem, and eventually find the software fulfilling other needs. This is the story of Chubby, Google's distributed lock service, and how it began as a mechanism to provide leader election for infrastructure and evolved rapidly to provide service discovery, config-file distribution, and other production-critical services.
During this talk, we will explore the evolution and maturity of the field of Site Reliability Engineering through the lens of this specific piece of infrastructure software. The audience will hear about our foundational experiences, both good and bad, with monitoring, caching, proxying, and isolation, along with suggestions for the direction that SRE practice will take in the near future.
Salim Virji, Google LLC
Salim Virji develops reliable engineering practices and processes for Google’s SRE organization, and has previously built distributed consensus and storage systems. Salim’s other interests include machine learning and composting.
author = {Salim Virji and Murali Suriar},
title = {Looking at {SRE} Needs and Trends over Two Decades with a Single Service},
year = {2023},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}