Murphy’s law is universal and constant: if something can go wrong, it will go wrong, and this is especially true of distributed, heterogeneous systems. Failures take many forms, from a complete service breakdown, to intermittent service failures, to a single high-latency service causing a cascading, catastrophic failure for your users. This talk discusses how to build resilient, highly available systems using the circuit breaker and bulkhead design patterns, which help preserve service and user guarantees even when the quality of service (QoS) of individual services breaks down. See how visualizing the telemetry around service interactions, latency, and failures can provide valuable early insight into growing problems before they affect your customers. Learn how Netflix, one of the largest examples of a distributed system, implements these patterns at scale, and how you can apply them to your own infrastructure, big or small. Failure is now an option when implemented the right way!