Book Review: The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2, by Thomas Limoncelli, Strata Chalup, and Christina J. Hogan (2015)

Camilo Matajira

February 6, 2021

1. Authors’ purpose

Limoncelli, Chalup, and Hogan wrote the book to help the reader “(…) build and run the best cloud-scale service possible” (p. 1).

To do this, they concentrate not on current technological trends but on fundamental principles and practices that are timeless (p. xxiv). In other words, the authors do not recommend specific products or technologies; instead, they provide the principles that should guide the reader in selecting one technology over another (p. xxiv). The idea is to “prepare you for a long career where technology changes over time but you are always prepared” (p. xxiv).

2. Main themes of the book

The authors divide the book into two parts. In the first, they discuss the different architectural patterns for implementing a distributed service. The main topics covered are: 1) web applications with distributed computing (“Distributed computing is the art of building large systems that divide the work over many machines”, p. 9); 2) design patterns for scaling; and 3) design patterns for resiliency.

The second part of the book covers how to achieve operational excellence in a team that administers cloud infrastructure. The authors stress the importance of developing a DevOps culture.

3. Commonalities with other books

The book shares the DevOps philosophy with other books on the subject, as its subtitle makes clear (DevOps and SRE Practices for Web Services). DevOps advocates shared responsibility for code between developers and operations; limiting work in progress; putting business goals first; delivering value faster by automating CI/CD pipelines; treating infrastructure as code; and focusing improvements on the constraint (the bottleneck). In this sense, some lessons of the book are similar to those of other DevOps books, such as Gene Kim’s The Phoenix Project.

4. The uniqueness of the book

The book makes at least two main contributions. The first is its thorough treatment of distributed computing architecture. Specifically:

The fundamental patterns for distributed computing (Load Balancer with multiple backend replicas, server with multiple backends, server tree).
Web application architectures (single machine web server, Three-Tier Web Service, Four-Tier Web Service).
How to scale a service at a global level.
How to design for scaling (AKF scaling cube, caching, data sharding, etc.).
How to design for resiliency.
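
To make the first pattern in the list concrete, here is a minimal sketch of a load balancer in front of multiple backend replicas. It is not taken from the book: the class name `RoundRobinBalancer`, its methods, and the replica names are hypothetical, and round-robin is only one of several possible ways to pick a backend.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer that rotates requests across backend replicas.

    Hypothetical illustration: names and behaviour are assumptions,
    not an API described in the book.
    """

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.healthy = set(self.backends)  # all replicas start healthy
        self._ring = cycle(self.backends)  # endless rotation over replicas

    def mark_down(self, backend):
        """Stop sending traffic to a replica that failed a health check."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Resume sending traffic to a known replica."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy replica, skipping unhealthy ones."""
        # Try each replica at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["replica-a", "replica-b", "replica-c"])
first_round = [lb.pick() for _ in range(3)]
lb.mark_down("replica-b")  # simulate a failed replica
after_failure = [lb.pick() for _ in range(4)]
print(first_round)
print(after_failure)
```

When one replica is marked down, the remaining replicas keep serving requests, which is the basic reason this pattern helps with both scaling and resiliency.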

The second contribution of the book is its treatment of the human, or soft, side of managing a system administration, operations, or infrastructure team. The book excels at this topic. I would say it is a book that prepares you to manage an infrastructure team. In this book the reader will learn:

How the best teams are doing daily operations in the cloud.
How to create Key Performance Indicators (KPIs) for an infrastructure team.
How to achieve operational excellence.
How to document projects and architectural decisions using a Design Document.
How to handle on-call duty within your team.
How to prepare your team for disasters.

5. Were the authors successful in achieving their goal?

Yes, I think the authors fulfilled their purpose, which is to help the reader build and run the best cloud-scale service possible (p. 1).

In the first part of the book, the authors provide a complete discussion of the architectural patterns needed to build a distributed computing service. In the second part, they describe the day-to-day operations that prepare a team to manage that infrastructure.

6. Recommendation of the book

I recommend this book to managers of infrastructure teams because it is rich in strategies to improve the day-to-day operations of their team. This book will give managers a vision of where to lead their team by applying the latest advances in distributed computing, DevOps, and cloud architecture, along with the best practices of companies like Google or Twitter. Managers will also find it interesting how these companies train their staff to handle emergencies.

The book also provides tools (see the appendices) that managers can use to assess their team’s current level of performance and to document project proposals, architecture decisions, and postmortems.

In addition, the discussion of architecture patterns for distributed computing provides the principles that will allow a manager to make better-informed decisions about building scalable and resilient services in the cloud.

7. What are the three most important ideas to remember?

The most reliable systems are built using cheap, unreliable components.
Software resiliency beats hardware reliability.
As a system administrator, automating the work that needs to be done should account for the majority of your job.
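
The first two ideas can be illustrated with a small back-of-the-envelope calculation. The sketch below is not from the book: it assumes that replicas fail independently and that the service is up whenever at least one replica is up, so the combined availability is 1 - (1 - a)^n. The function name `combined_availability` and the 99% figure are hypothetical, chosen only for illustration.

```python
def combined_availability(single, replicas):
    """Availability of a service that is up when at least one replica is up.

    Assumes replica failures are independent, which real systems only
    approximate (shared power, network, or software bugs break this).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only if every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


# Illustrative numbers: each cheap machine is assumed to be up 99% of the time.
for n in (1, 2, 3):
    print(n, "replica(s):", f"{combined_availability(0.99, n):.6f}")
```

Under these assumptions, a few cheap machines coordinated by software can reach an availability that a single, more expensive machine would struggle to match, provided the software handles failover correctly.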
